Question about using xapres with large ApRES datasets #80
Comments
Thank you for the quick and detailed reply, this sounds great! The documentation is already really helpful, and so far xapres has been quite convenient for loading the data. As you suggested, I will start working on a subset while looking forward to your documentation on dealing with big data. Our time series data for one site and year are ~11 GB in size. I will probably do the computing on a virtual desktop infrastructure, which mimics a local computer but runs on a server. I therefore assume additional cloud computing will not be needed, but I can't really tell before testing it. Because of this, I thought it best to first figure out how to work locally with zarr files. Finally, I noticed that the latest release of the package was in 2023 and that some code examples in the documentation currently do not work when you install xapres via pip. You probably have a new release planned anyway, and I solved this by manually installing the current version, but I thought I would let you know.
Yes, I recently released a new version (0.3.0) that should have all the features described in the docs. As your data is 'only' 11 GB, you can probably load all the chirps into memory at once, but you might struggle to compute the profiles in one go without using dask. I had a very similar situation with some ApRES data from Thwaites recently. My workflow is described here: https://github.com/ldeo-glaciology/TG_apres/blob/main/notebooks/dats_to_zarr_unattended.ipynb Working locally might be a good way of doing it, yes.
Correction: I have had some difficulty getting the release to work properly, so I recommend continuing to use the GitHub version, which is the latest version. Looking forward to hearing how you get on! 😀
The issue with the release has been solved (#95). If you need it
Sorry for the late reply; this week and next I have to work mainly on another project. Thank you for sharing the notebook. By strictly following this code, I managed to store the data as a zarr file. My main problem was that I had only tried to store the data after further processing, at which point the to_zarr function was no longer defined. So working with zarr now works for a subset of the data. In February, I will try to expand it to the whole dataset. Great, thank you for fixing the release!
That's great you got the data saved as a zarr. Just for the future, if your processing outputs results as xarray datasets or dataarrays, you should be able to save them to zarr in just the same way.
Hi Jonny, I was finally able to get back to processing my time series and got everything working: storing the data as zarr in subsets, and then combining them into one zarr after running the FFT processing on the subsets. I mostly followed your example scripts but adjusted them a bit to run locally. I attach the code below in case it's helpful to anyone. And you are right, storing dataarrays to zarr works fine. I just had trouble reloading them, because then the built-in processing methods no longer worked for me. I probably just did something wrong with the reloading, but I noticed that I don't really need it anyway. With this, I think this issue can be closed. Thank you again for your help!
That's great news! When you mention
after loading the results into memory from a zarr, do you mean that when you called a method, e.g., In other words, there is nothing special about the xarray dataarrays and datasets that the xapres package produces, but the package does add these methods when it is loaded. So if you save the data, start a new session, and then reload the data from the zarr, the methods will not be there (unfortunately, the methods are not saved with the data in the zarrs). If something different is happening and the methods aren't working for a different reason, I would like to work out why. Thanks again!
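To illustrate the general idea that such methods belong to the Python session rather than to the stored data, here is a minimal sketch using xarray's public accessor registration. This is an assumption for illustration only: xapres may attach its methods by a different mechanism, and the accessor name `demo` and the method `to_db` are hypothetical.

```python
import numpy as np
import xarray as xr


# Registering an accessor attaches extra methods to every DataArray in the
# current session. Nothing about it is written to disk with the data.
@xr.register_dataarray_accessor("demo")
class DemoAccessor:
    def __init__(self, da):
        self._da = da

    def to_db(self):
        """Convert amplitudes to decibels (hypothetical helper)."""
        return 20 * np.log10(np.abs(self._da))


da = xr.DataArray(np.array([1.0, 10.0, 100.0]), dims="x")
db = da.demo.to_db()
```

In a fresh session, `da.demo` would not exist until the code that registers it has been run again. By the same logic, importing xapres in the new session before calling its methods on reloaded data should presumably make them available again.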
@oraschewski started the following conversation about using xapres with large ApRES datasets. I moved it here so it can be useful to others.
@oraschewski:
@jkingslake: