@d70-t : "Maybe it's not good timing, but I thought I'd post it here for reference. I was looking through several documents on how the IPFS storage layer works with respect to chunking (IPFS splits files into chunks which are transported independently). The thing is that zarr also creates chunks, and chunks made by zarr are most probably better than chunks made by the IPFS storage layer, as zarr is able to take the multi-dimensional nature of the data into account. But if both systems are creating chunks, chances are high that the chunk sizes don't match up, which results in more chunks and more unnecessarily transferred data. In particular, if a file contains only one chunk, there is only one item to transfer; but as soon as a second chunk is added, another item (the chunk index) is added to the system, which effectively results in 3 times the number of network transfers.
So for storing zarr on IPFS, what I think works best is if each zarr chunk corresponds to one IPFS chunk. Currently (and for quite some time now), IPFS has used a default chunk size of 256 KiB, or exactly 262144 bytes. This can be changed manually, but there is a hard limit of 1 MiB per transferred item. There is currently also a 14-byte header per data item (which is proposed to be removed in the future). Technically the 14 bytes are in addition to the 256 KiB, but there are some ideas why it could be more performant to limit the total of chunk size + header to a power of 2.
The bottom line is: if you plan to put JOANNE on IPFS via zarr, I'd expect optimal performance if each of the zarr data chunk files is less than or equal to 262130 bytes in size. If that doesn't work out for some reason, then the chunk size should be much larger (i.e. at least 4 to 8 times that size), as for example 262145 bytes would already result in 3 times as many individual transfers with default IPFS settings."
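The sizing arithmetic above can be sketched in a few lines. This is an illustrative helper (not from the thread): it derives the 262130-byte budget from the default 256 KiB IPFS chunk size minus the 14-byte header, and computes how many array elements of a given dtype fit within it. Note this assumes uncompressed chunks; with a zarr compressor, the on-disk chunk file size is what matters, not the raw element count.

```python
# Sketch: choose a zarr chunk element count so each (uncompressed) chunk
# file stays within a single default-size IPFS chunk.
# Constants follow the figures quoted above; the function name is illustrative.

IPFS_BLOCK = 256 * 1024       # default IPFS chunk size: 262144 bytes
HEADER = 14                   # current per-item header, bytes
BUDGET = IPFS_BLOCK - HEADER  # 262130 bytes usable per zarr chunk file

def max_chunk_elems(itemsize: int, budget: int = BUDGET) -> int:
    """Largest number of array elements whose raw bytes fit the budget."""
    return budget // itemsize

# float64 is 8 bytes per element: at most 32766 elements per chunk
print(max_chunk_elems(8))   # → 32766
# float32 is 4 bytes per element: at most 65532 elements per chunk
print(max_chunk_elems(4))   # → 65532
```

For a multi-dimensional array you would then split this element budget across the chunked dimensions (e.g. a float64 chunk of shape (181, 181) has 32761 elements and just fits).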
This is still a bit rough, but I've collected a few scripts which should assist with this issue at d70-t/ipfszarr. I've converted JOANNE v0.9.2 for testing purposes using nc2zarr.py -O2, and it could be worse :-)
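To check whether a converted store actually respects the size target from the comment above, one can scan the zarr directory store and flag oversized chunk files. This is a hypothetical helper, not part of the ipfszarr scripts; it assumes a zarr v2 directory layout where metadata files start with a dot (.zarray, .zattrs, .zgroup, .zmetadata) and everything else is chunk data.

```python
# Sketch: report zarr chunk files larger than the per-IPFS-chunk budget.
# Illustrative code, assuming a zarr v2 directory store on local disk.
import os

BUDGET = 262130  # 256 KiB IPFS chunk minus the 14-byte header

def oversized_chunks(store_path, limit=BUDGET):
    """Yield (path, size) for chunk files exceeding `limit` bytes.

    Files whose names start with '.' are zarr metadata, not data chunks,
    and are skipped.
    """
    for root, _dirs, files in os.walk(store_path):
        for name in files:
            if name.startswith("."):
                continue
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size > limit:
                yield path, size
```

Running this over a store and getting no results would indicate that every data chunk fits into a single default-size IPFS chunk.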