[feature] : Save ImageGrainCrops in .topostats files #1102

Open
SylviaWhittle opened this issue Mar 4, 2025 · 2 comments

Is your feature request related to a problem?

We need a way to save and load ImageGrainCrops objects to and from .topostats files for reproducible work.

@ns-rse and I have been pondering it and think a simple solution of nesting keys and data in a structure that mirrors how the classes are laid out would be sensible. It would require boilerplate code, but is probably simpler than trying to bake the class mechanics into the file directly.

Max also needs this, when we can get to it.

Describe the solution you would like.

Add simple packer/unpacker functions to allow ImageGrainCrops class objects to be serialized into and out of the .topostats HDF5 format.
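A rough sketch of what such a pair might look like (assuming h5py; the function names and the GrainCrop constructor keywords are illustrative, while the attribute names image, mask, bbox, padding and pixel_to_nm_scaling match those used in the diff further down this thread):

import h5py
import numpy as np

from topostats import grains  # GrainCrop is referenced as grains.GrainCrop in the diff below


def pack_grain_crop(group: h5py.Group, crop: "grains.GrainCrop") -> None:
    """Write a single GrainCrop's attributes into an open HDF5 group."""
    group["image"] = crop.image
    group["mask"] = crop.mask
    group["bbox"] = np.asarray(crop.bbox)
    group["padding"] = crop.padding
    group["pixel_to_nm_scaling"] = crop.pixel_to_nm_scaling


def unpack_grain_crop(group: h5py.Group) -> "grains.GrainCrop":
    """Rebuild a GrainCrop from a group written by pack_grain_crop.

    The keyword names are assumed here, not the confirmed constructor signature.
    """
    return grains.GrainCrop(
        image=group["image"][()],
        mask=group["mask"][()],
        bbox=tuple(group["bbox"][()]),
        padding=int(group["padding"][()]),
        pixel_to_nm_scaling=float(group["pixel_to_nm_scaling"][()]),
    )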

Describe the alternatives you have considered.

Trying to be clever and recording the class behavior in the file structure itself - I believe TensorFlow might do this with their models? I.e. storing which class an object should be as metadata alongside the data, as opposed to hardcoding it in a SCHEMA in TopoStats. But this would be more planning and work.
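For comparison, a rough sketch of that alternative with h5py, storing the class name as an attribute alongside the data and mapping it back via a loader-side registry (everything here is illustrative, not a proposed API):

import h5py


def save_with_class_metadata(parent: h5py.Group, name: str, obj) -> None:
    """Store an object's attributes plus its class name as group metadata."""
    group = parent.create_group(name)
    group.attrs["class"] = type(obj).__name__
    for key, value in vars(obj).items():
        group[key] = value  # assumes every attribute is already HDF5-serialisable


def load_with_class_metadata(group: h5py.Group, registry: dict):
    """Rebuild an object using the stored class name, e.g. registry={"GrainCrop": GrainCrop}."""
    cls = registry[group.attrs["class"]]
    return cls(**{key: group[key][()] for key in group})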

Additional context

N/A

@SylviaWhittle SylviaWhittle added the enhancement New feature or request label Mar 4, 2025
@ns-rse ns-rse self-assigned this Mar 5, 2025

ns-rse commented Mar 5, 2025

After discussion, @SylviaWhittle and I have agreed that we should be looking to structure the HDF5 output differently from its current layout.

Current Structure

❱ h5glance tests/resources/file.topostats -d 5 | cat
tests/resources/file.topostats
├filename	[ASCII string: scalar]
├grain_masks
│ └above	[int64: 1024 × 1024]
├grain_trace_data
│ └above
│   ├cropped_images
│   │ ├0	[float64: 63 × 83]
│   │ ├1	[float64: 85 × 64]
│   │ ├10	[float64: 50 × 114]
│   │ ├11	[float64: 64 × 104]
│   │ ├12	[float64: 100 × 66]
│   │ ├13	[float64: 95 × 79]
│   │ ├14	[float64: 56 × 108]
│   │ ├15	[float64: 99 × 70]
│   │ ├16	[float64: 70 × 99]
│   │ ├17	[float64: 73 × 100]
│   │ ├18	[float64: 94 × 77]
│   │ ├19	[float64: 87 × 83]
│   │ ├2	[float64: 65 × 111]
│   │ ├20	[float64: 61 × 104]
│   │ ├3	[float64: 92 × 83]
│   │ ├4	[float64: 92 × 85]
│   │ ├5	[float64: 90 × 64]
│   │ ├6	[float64: 94 × 75]
│   │ ├7	[float64: 82 × 97]
│   │ ├8	[float64: 75 × 86]
│   │ └9	[float64: 96 × 80]
│   ├ordered_trace_cumulative_distances
│   │ ├0	[float64: 95]
│   │ ├1	[float64: 116]
│   │ ├10	[float64: 97]
│   │ ├11	[float64: 157]
│   │ ├12	[float64: 199]
│   │ ├13	[float64: 197]
│   │ ├14	[float64: 173]
│   │ ├15	[float64: 170]
│   │ ├16	[float64: 200]
│   │ ├17	[float64: 139]
│   │ ├18	[float64: 191]
│   │ ├19	[float64: 150]
│   │ ├2	[float64: 171]
│   │ ├20	[float64: 146]
│   │ ├3	[float64: 116]
│   │ ├4	[float64: 182]
│   │ ├5	[float64: 116]
│   │ ├6	[float64: 182]
│   │ ├7	[float64: 84]
│   │ ├8	[float64: 110]
│   │ └9	[float64: 148]
│   ├ordered_trace_heights
│   │ ├0	[float64: 95]
│   │ ├1	[float64: 116]
│   │ ├10	[float64: 97]
│   │ ├11	[float64: 157]
│   │ ├12	[float64: 199]
│   │ ├13	[float64: 197]
│   │ ├14	[float64: 173]
│   │ ├15	[float64: 170]
│   │ ├16	[float64: 200]
│   │ ├17	[float64: 139]
│   │ ├18	[float64: 191]
│   │ ├19	[float64: 150]
│   │ ├2	[float64: 171]
│   │ ├20	[float64: 146]
│   │ ├3	[float64: 116]
│   │ ├4	[float64: 182]
│   │ ├5	[float64: 116]
│   │ ├6	[float64: 182]
│   │ ├7	[float64: 84]
│   │ ├8	[float64: 110]
│   │ └9	[float64: 148]
│   ├ordered_traces
│   │ ├0	[int64: 95 × 2]
│   │ ├1	[int64: 116 × 2]
│   │ ├10	[int64: 97 × 2]
│   │ ├11	[int64: 157 × 2]
│   │ ├12	[int64: 199 × 2]
│   │ ├13	[int64: 197 × 2]
│   │ ├14	[int64: 173 × 2]
│   │ ├15	[int64: 170 × 2]
│   │ ├16	[int64: 200 × 2]
│   │ ├17	[int64: 139 × 2]
│   │ ├18	[int64: 191 × 2]
│   │ ├19	[int64: 150 × 2]
│   │ ├2	[int64: 171 × 2]
│   │ ├20	[int64: 146 × 2]
│   │ ├3	[int64: 116 × 2]
│   │ ├4	[int64: 182 × 2]
│   │ ├5	[int64: 116 × 2]
│   │ ├6	[int64: 182 × 2]
│   │ ├7	[int64: 84 × 2]
│   │ ├8	[int64: 110 × 2]
│   │ └9	[int64: 148 × 2]
│   └splined_traces
│     ├0	[float64: 1330 × 2]
│     ├1	[float64: 1624 × 2]
│     ├10	[float64: 1358 × 2]
│     ├11	[float64: 2198 × 2]
│     ├12	[float64: 2786 × 2]
│     ├13	[float64: 2758 × 2]
│     ├14	[float64: 2422 × 2]
│     ├15	[float64: 2380 × 2]
│     ├16	[float64: 2800 × 2]
│     ├17	[float64: 1946 × 2]
│     ├18	[float64: 2674 × 2]
│     ├19	[float64: 2100 × 2]
│     ├2	[float64: 2394 × 2]
│     ├20	[float64: 2044 × 2]
│     ├3	[float64: 1624 × 2]
│     ├4	[float64: 2548 × 2]
│     ├5	[float64: 1624 × 2]
│     ├6	[float64: 2548 × 2]
│     ├7	[float64: 1176 × 2]
│     ├8	[float64: 1540 × 2]
│     └9	[float64: 2072 × 2]
├image	[float64: 1024 × 1024]
├image_original	[float64: 1024 × 1024]
├img_path	[ASCII string: scalar]
├pixel_to_nm_scaling	[float64: scalar]
└topostats_file_version	[float64: scalar]

This means that each time something "new" is to be incorporated, a group is added at the third level of nesting, and within
that an entry is made for each grain.

Proposed Structure

We are proposing switching this around so that the third level of nesting is the grain, and each grain then has a series
of properties. The unit of interest is not really the whole scan but the individual grains about which we wish to know
something (the multiple grains from a scan form a population from which certain metrics can be estimated).

Instead of the above structure we are suggesting the following...

❱ h5glance tests/resources/file.topostats -d 5 | cat
tests/resources/file.topostats
├filename	[ASCII string: scalar]
├grain_masks
│ └above	[int64: 1024 × 1024]
├grain_trace_data
│ └above
│   ├0
│   │ ├cropped_images                       [float64: 63 × 83]
│   │ ├ordered_trace_cumulative_distances   [float64: 95]
│   │ ├ordered_trace_heights                [float64: 95]
│   │ ├ordered_traces                       [int64: 95 × 2]
│   │ ├splined_traces                       [float64: 1330 × 2]
│   │ ├grain_statistics                     [type: n x c]
│   │ ├something_yet_to_be_determined       [type: n x c]
│   │ └tensor                               [int32: 63 x 83 x c]
│   ├1
│   │ ├cropped_images                       [float64: 85 × 64]
│   │ ├ordered_trace_cumulative_distances   [float64: 116]
│   │ ├ordered_trace_heights                [float64: 116]
│   │ ├ordered_traces                       [int64: 116 × 2]
│   │ ├splined_traces                       [float64: 1624 × 2]
│   │ ├grain_statistics                     [type: n x c]
│   │ ├something_yet_to_be_determined       [type: n x c]
│   │ └tensor                               [int32: 85 x 64 x c]
...
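A minimal h5py sketch of how this per-grain layout could be written (the grains_above list is a toy stand-in for wherever the per-grain results end up living):

import h5py
import numpy as np

# Toy stand-in for the per-grain results of a single (above) direction.
grains_above = [
    {
        "cropped_images": np.zeros((63, 83)),
        "ordered_trace_cumulative_distances": np.zeros(95),
        "ordered_trace_heights": np.zeros(95),
        "ordered_traces": np.zeros((95, 2), dtype=np.int64),
        "splined_traces": np.zeros((1330, 2)),
    },
]

with h5py.File("output.topostats", "w") as f:
    # One group per grain, each holding its own properties, rather than one
    # group per property holding an entry for every grain.
    for grain_index, grain_data in enumerate(grains_above):
        group = f.create_group(f"grain_trace_data/above/{grain_index}")
        for name, value in grain_data.items():
            group[name] = value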


ns-rse commented Mar 11, 2025

Started looking at this and I think there are two components.

  1. Modify io.dict_to_hdf5() to handle ImageGrainCrops, GrainCropsDirection and GrainCrop.
  2. Restructure the following items to add their output to the correct GrainCrop...
  • processing.run_disordered_tracing() - currently returns disordered_traces, a dictionary with above and below
    keys which in turn contain nested data.
  • processing.run_nodestats() - currently returns nodestats_whole_data, again a dictionary with above and below
    keys which in turn contain nested data.
  • processing.run_ordered_tracing() - currently returns ordered_tracing_image_data, again a dictionary with above
    and below keys which in turn contain nested data.
  • processing.run_splining() - currently returns splined_image_data, again a dictionary with above and below keys
    which in turn contain nested data.
  • processing.run_curvature_stats() - currently returns all_directions_grains_curvature_stats_dict, again a
    dictionary with above and below keys which in turn contain nested data.

Modify io.dict_to_hdf5()

Made a start on this but it's untested, see 917b1d6b2 on ns-rse/1102-imagegraincrops-to-hdf5, but to save having to
look that up the additions (along with more debugging statements) are...

+        if isinstance(
+            item,
+            (
+                list,
+                str,
+                int,
+                float,
+                np.ndarray,
+                Path,
+                dict,
+                grains.GrainCrop,
+                grains.GrainCropsDirection,
+                grains.ImageGrainCrops,
+            ),
+        ):  # noqa: UP038
             # Lists need to be converted to numpy arrays
             if isinstance(item, list):
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
                 item = np.array(item)
                 open_hdf5_file[group_path + key] = item
             # Strings need to be encoded to bytes
             elif isinstance(item, str):
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
                 open_hdf5_file[group_path + key] = item.encode("utf8")
             # Integers, floats and numpy arrays can be added directly to the hdf5 file
             # Ruff wants us to use the pipe operator here but it isn't supported by python 3.9
             elif isinstance(item, (int, float, np.ndarray)):  # noqa: UP038
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
                 open_hdf5_file[group_path + key] = item
             # Path objects need to be encoded to bytes
             elif isinstance(item, Path):
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
                 open_hdf5_file[group_path + key] = str(item).encode("utf8")
+            # Extract ImageGrainCrops
+            elif isinstance(item, grains.ImageGrainCrops):
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.above)
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.below)
+            elif isinstance(item, grains.GrainCropsDirection):
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.crops)
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.full_mask_tensor)
+            elif isinstance(item, grains.GrainCrop):
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.image)
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.mask)
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.bbox)
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.pixel_to_nm_scaling)
+                dict_to_hdf5(open_hdf5_file, group_path + key + "/", item.padding)
             # Dictionaries need to be recursively saved
             elif isinstance(item, dict):  # a sub-dictionary, so we need to recurse
+                LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
                 dict_to_hdf5(open_hdf5_file, group_path + key + "/", item)

...my thinking being that as we process a scan, ImageGrainCrops replaces the dictionaries that we currently build and
pass around, and at the end these are expanded when writing out. The above isn't fully formed though, as key will need
expanding to include the type of item that is being passed.
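One possible way to do that expansion (purely a sketch, not what is on the branch) is to flatten each GrainCrop into a plain dict keyed by attribute name before recursing, so that the existing dict branch of dict_to_hdf5() handles the rest:

import numpy as np


def grain_crop_to_dict(crop) -> dict:
    """Flatten a GrainCrop into {attribute_name: value} so the attribute names
    become the HDF5 keys; attribute names follow the diff above."""
    return {
        "image": crop.image,
        "mask": crop.mask,
        "bbox": np.asarray(crop.bbox),
        "padding": crop.padding,
        "pixel_to_nm_scaling": crop.pixel_to_nm_scaling,
    }


# ...the GrainCrop branch of dict_to_hdf5() would then become something like:
#     elif isinstance(item, grains.GrainCrop):
#         dict_to_hdf5(open_hdf5_file, group_path + key + "/", grain_crop_to_dict(item))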

Can probably write some tests soon to get this tested and working.

Restructuring

This is going to be a much more involved piece of work, I think, and will require...

  • Adding @property and @<property>.setter to the GrainCrop class for each derived object we subsequently wish to store (a minimal sketch follows below).
  • Passing ImageGrainCrops into each step call and setting the derived items in the correct place.
  • Removing the building of dictionaries in each of the processing.run_*() functions mentioned above.

This will take a bit longer and is more involved, but is totally worth doing as it makes each grain a "unit" with all of its information in one place, rather than a series of dictionaries each of which holds some information on every grain. It then becomes considerably easier to access all the information about a given grain.
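As a rough illustration of the first bullet above, a property/setter pair on GrainCrop for one hypothetical derived item (grain_statistics is borrowed from the proposed file structure earlier in the thread; the real class already lives in topostats.grains):

from __future__ import annotations

import numpy as np


class GrainCrop:
    """Sketch only -- shows the property/setter pattern for a single derived item."""

    def __init__(self, image: np.ndarray, mask: np.ndarray) -> None:
        self.image = image
        self.mask = mask
        self._grain_statistics: dict | None = None

    @property
    def grain_statistics(self) -> dict | None:
        """Per-grain statistics, set by the relevant processing step."""
        return self._grain_statistics

    @grain_statistics.setter
    def grain_statistics(self, value: dict) -> None:
        if not isinstance(value, dict):
            raise TypeError("grain_statistics must be a dict")
        self._grain_statistics = value

Each processing.run_*() step could then set grain_crop.grain_statistics = ... (or its equivalent) directly on the relevant crop, rather than building a separate dictionary.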
