[feature] : Save ImageGrainCrops in .topostats files #1102

Open
SylviaWhittle opened this issue Mar 4, 2025 · 2 comments

Is your feature request related to a problem?

We need a way to save and load ImageGrainCrops objects to and from .topostats files for reproducible work.

@ns-rse and I have been pondering it and think a simple solution of nesting keys and data in a structure that mirrors how the classes are laid out would be sensible. It would require boilerplate code, but is probably simpler than trying to bake the class mechanics into the file directly.

Max also needs this, when we can get to it.

Describe the solution you would like.

Add simple packer/unpacker functions to allow ImageGrainCrops class objects to be serialized into and out of the .topostats HDF5 format.
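A rough sketch of what such a pair might look like (assuming h5py; the function names and the GrainCrop constructor keywords are illustrative, while the attribute names image, mask, bbox, padding and pixel_to_nm_scaling match those used in the diff further down this thread):

import h5py
import numpy as np

from topostats import grains  # GrainCrop is referenced as grains.GrainCrop in the diff below


def pack_grain_crop(group: h5py.Group, crop: "grains.GrainCrop") -> None:
    """Write a single GrainCrop's attributes into an open HDF5 group."""
    group["image"] = crop.image
    group["mask"] = crop.mask
    group["bbox"] = np.asarray(crop.bbox)
    group["padding"] = crop.padding
    group["pixel_to_nm_scaling"] = crop.pixel_to_nm_scaling


def unpack_grain_crop(group: h5py.Group) -> "grains.GrainCrop":
    """Rebuild a GrainCrop from a group written by pack_grain_crop.

    The keyword names are assumed here, not the confirmed constructor signature.
    """
    return grains.GrainCrop(
        image=group["image"][()],
        mask=group["mask"][()],
        bbox=tuple(group["bbox"][()]),
        padding=int(group["padding"][()]),
        pixel_to_nm_scaling=float(group["pixel_to_nm_scaling"][()]),
    )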

Describe the alternatives you have considered.

Trying to be clever and recording the class behavior in the file structure itself - I believe TensorFlow might do this with their models? I.e. storing which class an object should be as metadata alongside the data, as opposed to hardcoding it in a SCHEMA in TopoStats. But this would be more planning and work.
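For comparison, a rough sketch of that alternative with h5py, storing the class name as an attribute alongside the data and mapping it back via a loader-side registry (everything here is illustrative, not a proposed API):

import h5py


def save_with_class_metadata(parent: h5py.Group, name: str, obj) -> None:
    """Store an object's attributes plus its class name as group metadata."""
    group = parent.create_group(name)
    group.attrs["class"] = type(obj).__name__
    for key, value in vars(obj).items():
        group[key] = value  # assumes every attribute is already HDF5-serialisable


def load_with_class_metadata(group: h5py.Group, registry: dict):
    """Rebuild an object using the stored class name, e.g. registry={"GrainCrop": GrainCrop}."""
    cls = registry[group.attrs["class"]]
    return cls(**{key: group[key][()] for key in group})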

Additional context

N/A

@SylviaWhittle SylviaWhittle added the enhancement New feature or request label Mar 4, 2025
@ns-rse ns-rse self-assigned this Mar 5, 2025

ns-rse commented Mar 5, 2025

After discussion, @SylviaWhittle and I have agreed that we should be looking to structure the HDF5 output differently from its current layout.

Current Structure

❱ h5glance tests/resources/file.topostats -d 5 | cat
tests/resources/file.topostats
├filename	[ASCII string: scalar]
├grain_masks
│ └above	[int64: 1024 × 1024]
├grain_trace_data
│ └above
│   ├cropped_images
│   │ ├0	[float64: 63 × 83]
│   │ ├1	[float64: 85 × 64]
│   │ ├10	[float64: 50 × 114]
│   │ ├11	[float64: 64 × 104]
│   │ ├12	[float64: 100 × 66]
│   │ ├13	[float64: 95 × 79]
│   │ ├14	[float64: 56 × 108]
│   │ ├15	[float64: 99 × 70]
│   │ ├16	[float64: 70 × 99]
│   │ ├17	[float64: 73 × 100]
│   │ ├18	[float64: 94 × 77]
│   │ ├19	[float64: 87 × 83]
│   │ ├2	[float64: 65 × 111]
│   │ ├20	[float64: 61 × 104]
│   │ ├3	[float64: 92 × 83]
│   │ ├4	[float64: 92 × 85]
│   │ ├5	[float64: 90 × 64]
│   │ ├6	[float64: 94 × 75]
│   │ ├7	[float64: 82 × 97]
│   │ ├8	[float64: 75 × 86]
│   │ └9	[float64: 96 × 80]
│   ├ordered_trace_cumulative_distances
│   │ ├0	[float64: 95]
│   │ ├1	[float64: 116]
│   │ ├10	[float64: 97]
│   │ ├11	[float64: 157]
│   │ ├12	[float64: 199]
│   │ ├13	[float64: 197]
│   │ ├14	[float64: 173]
│   │ ├15	[float64: 170]
│   │ ├16	[float64: 200]
│   │ ├17	[float64: 139]
│   │ ├18	[float64: 191]
│   │ ├19	[float64: 150]
│   │ ├2	[float64: 171]
│   │ ├20	[float64: 146]
│   │ ├3	[float64: 116]
│   │ ├4	[float64: 182]
│   │ ├5	[float64: 116]
│   │ ├6	[float64: 182]
│   │ ├7	[float64: 84]
│   │ ├8	[float64: 110]
│   │ └9	[float64: 148]
│   ├ordered_trace_heights
│   │ ├0	[float64: 95]
│   │ ├1	[float64: 116]
│   │ ├10	[float64: 97]
│   │ ├11	[float64: 157]
│   │ ├12	[float64: 199]
│   │ ├13	[float64: 197]
│   │ ├14	[float64: 173]
│   │ ├15	[float64: 170]
│   │ ├16	[float64: 200]
│   │ ├17	[float64: 139]
│   │ ├18	[float64: 191]
│   │ ├19	[float64: 150]
│   │ ├2	[float64: 171]
│   │ ├20	[float64: 146]
│   │ ├3	[float64: 116]
│   │ ├4	[float64: 182]
│   │ ├5	[float64: 116]
│   │ ├6	[float64: 182]
│   │ ├7	[float64: 84]
│   │ ├8	[float64: 110]
│   │ └9	[float64: 148]
│   ├ordered_traces
│   │ ├0	[int64: 95 × 2]
│   │ ├1	[int64: 116 × 2]
│   │ ├10	[int64: 97 × 2]
│   │ ├11	[int64: 157 × 2]
│   │ ├12	[int64: 199 × 2]
│   │ ├13	[int64: 197 × 2]
│   │ ├14	[int64: 173 × 2]
│   │ ├15	[int64: 170 × 2]
│   │ ├16	[int64: 200 × 2]
│   │ ├17	[int64: 139 × 2]
│   │ ├18	[int64: 191 × 2]
│   │ ├19	[int64: 150 × 2]
│   │ ├2	[int64: 171 × 2]
│   │ ├20	[int64: 146 × 2]
│   │ ├3	[int64: 116 × 2]
│   │ ├4	[int64: 182 × 2]
│   │ ├5	[int64: 116 × 2]
│   │ ├6	[int64: 182 × 2]
│   │ ├7	[int64: 84 × 2]
│   │ ├8	[int64: 110 × 2]
│   │ └9	[int64: 148 × 2]
│   └splined_traces
│     ├0	[float64: 1330 × 2]
│     ├1	[float64: 1624 × 2]
│     ├10	[float64: 1358 × 2]
│     ├11	[float64: 2198 × 2]
│     ├12	[float64: 2786 × 2]
│     ├13	[float64: 2758 × 2]
│     ├14	[float64: 2422 × 2]
│     ├15	[float64: 2380 × 2]
│     ├16	[float64: 2800 × 2]
│     ├17	[float64: 1946 × 2]
│     ├18	[float64: 2674 × 2]
│     ├19	[float64: 2100 × 2]
│     ├2	[float64: 2394 × 2]
│     ├20	[float64: 2044 × 2]
│     ├3	[float64: 1624 × 2]
│     ├4	[float64: 2548 × 2]
│     ├5	[float64: 1624 × 2]
│     ├6	[float64: 2548 × 2]
│     ├7	[float64: 1176 × 2]
│     ├8	[float64: 1540 × 2]
│     └9	[float64: 2072 × 2]
├image	[float64: 1024 × 1024]
├image_original	[float64: 1024 × 1024]
├img_path	[ASCII string: scalar]
├pixel_to_nm_scaling	[float64: scalar]
└topostats_file_version	[float64: scalar]

This means that each time something "new" is to be incorporated, a group is added at the third level of nesting, and within
that an entry is made for each grain.

Proposed Structure

We are proposing switching this around so that the third level of nesting is the grain, and each grain then has a series
of properties. The unit of interest is not really the whole scan but the individual grains about which we wish to know
something (the multiple grains from a scan form a population from which certain metrics can be estimated).

Instead of the above structure we are suggesting the following...

❱ h5glance tests/resources/file.topostats -d 5 | cat
tests/resources/file.topostats
├filename	[ASCII string: scalar]
├grain_masks
│ └above	[int64: 1024 × 1024]
├grain_trace_data
│ └above
│   ├0
│   │ ├cropped_images                       [float64: 63 × 83]
│   │ ├ordered_trace_cumulative_distances   [float64: 95]
│   │ ├ordered_trace_heights                [float64: 95]
│   │ ├ordered_traces                       [int64: 95 × 2]
│   │ ├splined_traces                       [float64: 1330 × 2]
│   │ ├grain_statistics                     [type: n x c]
│   │ ├something_yet_to_be_determined       [type: n x c]
│   │ └tensor                               [int32: 63 x 83 x c]
│   ├1
│   │ ├cropped_images                       [float64: 85 × 64]
│   │ ├ordered_trace_cumulative_distances   [float64: 116]
│   │ ├ordered_trace_heights                [float64: 116]
│   │ ├ordered_traces                       [int64: 116 × 2]
│   │ ├splined_traces                       [float64: 1624 × 2]
│   │ ├grain_statistics                     [type: n x c]
│   │ ├something_yet_to_be_determined       [type: n x c]
│   │ └tensor                               [int32: 85 x 64 x c]
...
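A minimal h5py sketch of how this per-grain layout could be written (the grains_above list is a toy stand-in for wherever the per-grain results end up living):

import h5py
import numpy as np

# Toy stand-in for the per-grain results of a single (above) direction.
grains_above = [
    {
        "cropped_images": np.zeros((63, 83)),
        "ordered_trace_cumulative_distances": np.zeros(95),
        "ordered_trace_heights": np.zeros(95),
        "ordered_traces": np.zeros((95, 2), dtype=np.int64),
        "splined_traces": np.zeros((1330, 2)),
    },
]

with h5py.File("output.topostats", "w") as f:
    # One group per grain, each holding its own properties, rather than one
    # group per property holding an entry for every grain.
    for grain_index, grain_data in enumerate(grains_above):
        group = f.create_group(f"grain_trace_data/above/{grain_index}")
        for name, value in grain_data.items():
            group[name] = value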


ns-rse commented Mar 11, 2025

Started looking at this and I think there are two components.

  1. Modify io.dict_to_hdf5() to handle ImageGrainCrops, GrainCropsDirection and GrainCrop.
  2. Restructure the following items to add their output to the correct GrainCrop...
  • processing.run_disordered_tracing() - currently returns disordered_traces, a dictionary with above and below
    keys which in turn contain nested data.
  • processing.run_nodestats() - currently returns nodestats_whole_data, again a dictionary with above and below
    keys which in turn contain nested data.
  • processing.run_ordered_tracing() - currently returns ordered_tracing_image_data, again a dictionary with above
    and below keys which in turn contain nested data.
  • processing.run_splining() - currently returns splined_image_data, again a dictionary with above and below keys
    which in turn contain nested data.
  • processing.run_curvature_stats() - currently returns all_directions_grains_curvature_stats_dict, again a
    dictionary with above and below keys which in turn contain nested data.

Modify io.dict_to_hdf5()

Made a start on this but it's untested, see 917b1d6b2 on ns-rse/1102-imagegraincrops-to-hdf5, but to save having to
look that up the additions (along with more debugging statements) are...

+        if isinstance(
+            item,
+            (
+                list,
+                str,
+                int,
+                float,
+                np.ndarray,
+                Path,
+                dict,
+                grains.GrainCrop,
+                grains.GrainCropsDirection,
+                grains.ImageGrainCrops,
+            ),
+        ):  # noqa: UP038
             # Lists need to be converted to numpy arrays
             if isinstance(item, list):
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
                 item = np.array(item)
                 open_hdf5_file[group_path + key] = item
             # Strings need to be encoded to bytes
             elif isinstance(item, str):
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
                 open_hdf5_file[group_path + key] = item.encode("utf8")
             # Integers, floats and numpy arrays can be added directly to the hdf5 file
             # Ruff wants us to use the pipe operator here but it isn't supported by python 3.9
             elif isinstance(item, (int, float, np.ndarray)):  # noqa: UP038
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
                 open_hdf5_file[group_path + key] = item
             # Path objects need to be encoded to bytes
             elif isinstance(item, Path):
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
                 open_hdf5_file[group_path + key] = str(item).encode("utf8")
+            # Extract ImageGrainCrops
+            elif isinstance(item, grains.ImageGrainCrops):
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.above)
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.below)
+            elif isinstance(item, grains.GrainCropsDirection):
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.crops)
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.full_mask_tensor)
+            elif isinstance(item, grains.GrainCrop):
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.image)
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.mask)
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.bbox)
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.pixel_to_nm_scaling)
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.padding)
             # Dictionaries need to be recursively saved
             elif isinstance(item, dict):  # a sub-dictionary, so we need to recurse
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
                 dict_to_hdf5(open_hdf5_file, group_path + key + "/", item)

...my thinking being that as we process a scan, ImageGrainCrops replaces the dictionaries that we currently build and
pass around, and at the end these are expanded when writing out. The above isn't fully formed though, as key will need
expanding to include the type of item that is being passed.
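One possible way to do that expansion (purely a sketch, not what is on the branch) is to flatten each GrainCrop into a plain dict keyed by attribute name before recursing, so that the existing dict branch of dict_to_hdf5() handles the rest:

import numpy as np


def grain_crop_to_dict(crop) -> dict:
    """Flatten a GrainCrop into {attribute_name: value} so the attribute names
    become the HDF5 keys; attribute names follow the diff above."""
    return {
        "image": crop.image,
        "mask": crop.mask,
        "bbox": np.asarray(crop.bbox),
        "padding": crop.padding,
        "pixel_to_nm_scaling": crop.pixel_to_nm_scaling,
    }


# ...the GrainCrop branch of dict_to_hdf5() would then become something like:
#     elif isinstance(item, grains.GrainCrop):
#         dict_to_hdf5(open_hdf5_file, group_path + key + "/", grain_crop_to_dict(item))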

Can probably write some tests soon to get this tested and working.

Restructuring

This is going to be a much more involved piece of work, I think, and will require...

  • Adding @property and @<property>.setter to the GrainCrop class for each derived object we subsequently wish to store (a minimal sketch follows below).
  • Passing ImageGrainCrops into each step call and setting the derived items in the correct place.
  • Removing the building of dictionaries in each of the processing.run_*() functions mentioned above.

This will take a bit longer and is more involved, but is totally worth doing as it makes each grain a "unit" with all of its information in one place, rather than a series of dictionaries each of which holds some information on every grain. It then becomes considerably easier to access all the information about a given grain.
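As a rough illustration of the first bullet above, a property/setter pair on GrainCrop for one hypothetical derived item (grain_statistics is borrowed from the proposed file structure earlier in the thread; the real class already lives in topostats.grains):

from __future__ import annotations

import numpy as np


class GrainCrop:
    """Sketch only -- shows the property/setter pattern for a single derived item."""

    def __init__(self, image: np.ndarray, mask: np.ndarray) -> None:
        self.image = image
        self.mask = mask
        self._grain_statistics: dict | None = None

    @property
    def grain_statistics(self) -> dict | None:
        """Per-grain statistics, set by the relevant processing step."""
        return self._grain_statistics

    @grain_statistics.setter
    def grain_statistics(self, value: dict) -> None:
        if not isinstance(value, dict):
            raise TypeError("grain_statistics must be a dict")
        self._grain_statistics = value

Each processing.run_*() step could then set grain_crop.grain_statistics = ... (or its equivalent) directly on the relevant crop, rather than building a separate dictionary.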
