Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writing complex numbers to netCDF #9246

Open
ZedThree opened this issue Jul 15, 2024 · 6 comments · May be fixed by #9509
Open

Writing complex numbers to netCDF #9246

ZedThree opened this issue Jul 15, 2024 · 6 comments · May be fixed by #9509

Comments

@ZedThree
Copy link
Contributor

Is your feature request related to a problem?

Currently, the only supported method for saving complex numbers to netCDF is to use the h5netcdf engine with invalid_netcdf=True.

The latest release of netCDF4, 1.7.1, now has support for complex numbers. Also, because it's built on the up-coming netCDF 4.9.3, it includes support for reading files written using h5netcdf with invalid_netcdf=True (and I think for pretty much all other types that fall under this, including enums and bools).

This means that there are now more options for users wanting to read/write complex numbers, although this depends on the version of netCDF4 in use and the flags used to open files.

Just to note that the complex number support in netCDF4 is done via converting to/from a compound datatype (or a complex dimension for netCDF-3 files), and is completely standard netCDF -- it's basically just a wrapper in the Python API.

Describe the solution you'd like

Either expose the auto_complex keyword for netCDF4.Dataset to the backend or set this automatically. This will either require using netCDF4>=1.7.1 or detecting the version.

Additionally, invalid_netcdf will no longer be strictly necessary for bools and complex numbers for h5netcdf, although again this will be dependent on the version of netCDF used -- this is sort of outside of xarray though.

I'm happy to implement this as it's pretty trivial, but is it better

Describe alternatives you've considered

No response

Additional context

There is currently an issue mixing netCDF4==1.7.1 and h5py==3.11.0 in the same python process, see: Unidata/netcdf4-python#1343 and h5py/h5py#2453. This might need to be resolved before xarray can require netCDF4==1.7.1

Any insights on how to resolve this would be much appreciated!

@shoyer
Copy link
Member

shoyer commented Jul 16, 2024

Very exciting to see progress on this! Thanks for pushing this upstream.

Just to confirm, what is the minimum version of netCDF-C required for this functionality?

@ZedThree
Copy link
Contributor Author

Just to confirm, what is the minimum version of netCDF-C required for this functionality?

For complex written via netCDF4: no minimum version, uses standard netCDF functionality (does depend on which netCDF-3/4 file format you use though)

For complex written via h5netcdf/h5py: 4.9.3 (unreleased, current main branch on github, but release expected imminently)

@shoyer
Copy link
Member

shoyer commented Jul 17, 2024

What about support for tools that read netCDF via the netCDF-C library without using Python, e.g., ncdump?

I ask because I want to clarify that support for complex numbers is "official" for netCDF, which as far as I understand basically means that it's in the netCDF-C library. Right now, the documentation for netCDF4-Python suggests that complex number support requires the nc-complexhttps://github.com/PlasmaFAIR/nc-complex).

@ZedThree
Copy link
Contributor Author

Ah sorry, to be clear, support for complex numbers is still not "official" for netCDF and not available directly in the netCDF-C library. The official Python interface has some helper functionality through the auto_complex keyword. The representation on disk uses existing netCDF-C functionality, with nc-complex handling the conversions between complex types and netCDF datatypes. This means that using nc-complex to write complex numbers doesn't affect other tools ability to read those files.

netCDF4.Dataset has auto_complex=False by default to emphasise that this is a helper utility.

ncdump will show the underlying representation:

import netCDF4

complex_array = np.array([0 + 0j, 1 + 0j, 0 + 1j, 1 + 1j, 0.25 + 0.75j])

with netCDF4.Dataset("complex.nc", "w", auto_complex=True) as nc:
    nc.createDimension("x", size=len(complex_array))
    var = nc.createVariable("data", "c16", ("x", ))
    var[:] = complex_array

With the current netCDF-C release, 4.9.2:

$ ncdump complex.nc 
netcdf complex {
types:
  compound _PFNC_DOUBLE_COMPLEX_TYPE {
    double r ;
    double i ;
  }; // _PFNC_DOUBLE_COMPLEX_TYPE
dimensions:
        x = 5 ;
variables:
        _PFNC_DOUBLE_COMPLEX_TYPE data(x) ;
data:

 data = {0, 0}, {1, 0}, {0, 1}, {1, 1}, {0.25, 0.75} ;
}

There's nothing special about this file, and it should be readable by any tool that can read standard netCDF-4 files, but they'll get a compound datatype:

with netCDF4.Dataset("complex.nc", "r") as nc:
    print(nc["data"])
    print(nc["data"][:])

# <class 'netCDF4._netCDF4.Variable'>
# compound data(x)
# compound data type: {'names': ['r', 'i'], 'formats': ['<f8', '<f8'], 'offsets': [0, 8], 'itemsize': 16, 'aligned': True}
# unlimited dimensions: 
# current shape = (5,)
# [(0.  , 0.  ) (1.  , 0.  ) (0.  , 1.  ) (1.  , 1.  ) (0.25, 0.75)]


with netCDF4.Dataset("complex.nc", "r", auto_complex=True) as nc:
    print(nc["data"])
    print(nc["data"][:])

# <class 'netCDF4._netCDF4.Variable'>
# compound data(x)
# compound data type: complex128
# unlimited dimensions: 
# current shape = (5,)
# [0.  +0.j   1.  +0.j   0.  +1.j   1.  +1.j   0.25+0.75j]

Creating the same file with h5netcdf and invalid_netcdf=True, and using the development version of netCDF-C's ncdump:

netcdf complex {
types:
  compound _AnonymousCompound1 {
    double r ;
    double i ;
  }; // _AnonymousCompound1
dimensions:
        x = 5 ;
variables:
        _AnonymousCompound1 data(x) ;
data:

 data = {0, 0}, {1, 0}, {0, 1}, {1, 1}, {0.25, 0.75} ;
}

Literally the only difference here is netCDF4/nc-complex creates a named type, while h5py uses an unnamed type that the netCDF-C library gives a temporary name to. Any tool that uses this version of netCDF-C (or uses HDF5) will be able to read this file.

(In principle, h5py could commit this datatype to the file, and then current versions of netCDF would be able to read it too)


Lastly, HDF5 is now adding native support for complex numbers that will also be able to read the {r, i} compound datatype used by h5py and nc-complex. I believe this will be in 1.15. After that, we might be able to get native support into netCDF-C, but I suspect that will take some time.

Sorry for the info dump, there's multiple libraries and versions thereof, so I just wanted to be really clear about what all the different capabilities are!

@shoyer
Copy link
Member

shoyer commented Jul 17, 2024

(In principle, h5py could commit this datatype to the file, and then current versions of netCDF would be able to read it too)

This would be very nice!

Overall, it seems that things have changed for the better since when I first added support for writing invalid netCDF into h5netcdf, because netCDF-C can now read the data associated with these HDF5 enum types.

At this point, I would say complex number support could be considered a "convention" for writing netCDF files. That means we could add direct support for reading/writing complex numbers in Xarray, without requiring invalid_netcdf=True.

@ZedThree
Copy link
Contributor Author

Ok, I'll have a stab at a PR for this then!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants