Support reading and writing DICOM whole slide images #3742

melissalinkert · 2021-11-10T17:04:57Z

Significantly reworks the DICOM reader and adds a new writer so that WSI data is supported.

Relevant test data is in inbox/dicom-wsi. There are a ton of datasets there, but in general the newer sets (LUNG..., pixelmed-converted, WG26/3DHISTECH, WG26/Leica_GT450) will be better for testing by hand. Most are copied from ftp://dicom.nema.org/MEDICAL/Dicom/DataSets/WG26, so publicly available.

Whole slide datasets will have one or more files per resolution, with optional additional files that contain the label, overview, etc. Multiple pyramids per dataset are possible if an extended depth of field image was created from a Z stack. Both brightfield and fluorescence slides are supported.

Converting non-DICOM slides to DICOM is supported by the new DicomWriter, using typical bfconvert options. The expected output is one file per resolution or extra (label/macro/etc.) image. Some changes were needed in ImageConverter to make the tile writing logic less TIFF-specific.

There is also a new bfconvert option to turn off the sequential-writing assumption (3208d92). This doesn't change the actual tile writing order, just the assumption that the writer uses. For DICOM, this is helpful for testing the two tile position storage modes: TILED_FULL and TILED_SPARSE. The former implicitly defines the positions; all tiles are stored, and left-to-right, top-to-bottom. The latter explicitly records positions, which results in a larger file (more metadata), but allows missing tiles and/or non-sequential storage.

There is obviously a lot to go through here, so happy to have a discussion if that's helpful.

This will definitely impact memo files so should not be considered for a patch release. Most test issues will be addressed in a forthcoming config PR, and are primarily updates to image names/descriptions and a few pixel hash changes due to differences in file sorting (by metadata now instead of by file name/pattern). There are a couple of specific files which will still cause failures, and a proposed solution will be noted on the config PR.

I have it on my to-do list to go through outstanding DICOM issues to verify whether or not this helps. In particular:

Very basic support so far, still needs quite a bit of work (see TODOs).

Basic tests similar to "bfconvert test.fake test.dcm" should result in a file that dcdump and ImageMagick can handle.

…me directory

"bfconvert -tilex 512 -tiley 512 input output.dcm" should now produce 512x512 tiles as expected.

The pixel data should now be properly encapsulated as required by the transfer syntax. See http://dicom.nema.org/dicom/2013/output/chtml/part05/sect_A.4.html

Still need to figure out a better way to set the Image Type for extra images though.

-no-sequential won't change the order in which planes/tiles are written, just the writer's assumption of plane/tile order.

There are a few compromises here. One is that the DicomAttribute enum does not contain the entire dictionary, but a subset that is most likely to be explicitly referenced in the reader/writer. The very large size of the dictionary combined with Java's limitations on bytecode size means that a single enum with the whole dictionary is not possible. The other compromise is that the original metadata table does not contain every tag in every file of a dataset, as that can quickly lead to memory exhaustion (especially once the table is included in OME-XML). DicomReader now has a "List<DicomTag> getTags()" method that can be used to access the tag hierarchy directly, should an application need to do so.

sbesson · 2021-11-11T14:00:26Z

components/formats-bsd/src/loci/formats/in/DicomTag.java

+          return 8;
+        }
+      default:
+        throw new IllegalArgumentException(vr.toString());


As a result of the DICOM reader priority increase in readers.txt, this exception is now being thrown across several file formats during the isThisType check.

See e.g. bd-pathway, deltavision for examples of affected filesets.

I'd expect f75b8dc, 84e1a80, and fc13799 to resolve unexpected test failures, but will check the results tomorrow now that builds are running again.

fedorov · 2021-11-12T21:40:37Z

@melissalinkert would it be possible to acknowledge that development of this functionality was supported by NCI Imaging Data Commons, and include the wording for the funding source?

melissalinkert · 2021-11-15T18:28:29Z

@fedorov : we can certainly add a funding acknowledgement. Typically we do this in the release notes via a pull request to https://github.com/ome/bio-formats-documentation. An example is the New file formats section of the 6.6.0 notes:

https://docs.openmicroscopy.org/bio-formats/6.7.0/about/whats-new.html#december

If you have specific acknowledgement text you prefer, or a line in the release notes is insufficient, just let us know.

fedorov · 2021-11-15T18:47:33Z

Melissa, thank you for the pointer, this is very helpful. Let me check with your project leads at NCI/Leidos what is their preference, and I will get back to you towards the end of the week.

sbesson · 2021-11-16T13:58:12Z

components/formats-bsd/src/loci/formats/in/DicomReader.java

@@ -65,78 +66,30 @@
 import ome.units.quantity.Length;
 import ome.units.UNITS;

+import static loci.formats.in.DicomAttribute.*;
+import static loci.formats.in.DicomVR.*;


As mentioned during the weekly Formats discussion, immediate question is whether this API handling the specifics of the DICOM specification should be managed in a dedicated package similarly to loci.formats.tiff.

625f8cc reorganizes into a new loci.formats.dicom package.

snoopycrimecop · 2021-11-18T00:05:07Z

Conflicting PR. Removed from build BIOFORMATS-push#1041. See the console output for more details.
Possible conflicts:

PR Use multiple-of-16 tile dimensions when writing TIFFs #3709 jmuhlich 'Use multiple-of-16 tile dimensions when writing TIFFs'
- components/bio-formats-tools/src/loci/formats/tools/ImageConverter.java

--conflicts

snoopycrimecop · 2021-11-19T00:04:47Z

Conflicting PR. Removed from build BIOFORMATS-push#1042. See the console output for more details.
Possible conflicts:

PR Use multiple-of-16 tile dimensions when writing TIFFs #3709 jmuhlich 'Use multiple-of-16 tile dimensions when writing TIFFs'
- components/bio-formats-tools/src/loci/formats/tools/ImageConverter.java

--conflicts

snoopycrimecop · 2021-11-25T00:04:00Z

Conflicting PR. Removed from build BIOFORMATS-push#1049. See the console output for more details.
Possible conflicts:

PR Use multiple-of-16 tile dimensions when writing TIFFs #3709 jmuhlich 'Use multiple-of-16 tile dimensions when writing TIFFs'
- components/bio-formats-tools/src/loci/formats/tools/ImageConverter.java

~~--conflicts~~ Conflict resolved in build BIOFORMATS-push#1054. See the console output for more details.

…icom-wsi-develop

fedorov · 2021-11-30T21:25:08Z

Melissa, I am not sure how documentation updates are handled - would that be done in a separate repo after this PR is merged, to document improvements to DICOM support?

fedorov · 2021-11-30T22:07:51Z

Melissa, regarding the acknowledgment, I checked with our federal leads, and they would like to have the following acknowledgment, if possible:

"This project has been funded in whole or in part with Federal funds from the National Cancer Institute,
National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN2612015000031."

cc: @ulrikew

My personal wish would also be to hyperlink "Imaging Data Commons", so I suggest the following wording:

"This functionality was implemented through collaboration with NCI Imaging Data Commons, and has been funded in whole or in part with Federal funds from the National Cancer Institute,
National Institutes of Health, under Task Order No. HHSN26110071 under Contract No. HHSN2612015000031."

dgault · 2021-12-02T19:38:36Z

Builds and tests have been green with this PR included and the merge-ci repo tests are now green also with the associated config. Overall the PR looks to be in good shape and suitable for inclusion with the 6.8.0 release.

While testing the writer I noticed a few times that converted files would produce the below exception. This appears to have been due to files being grouped with others I was testing, when the files were moved to a separate directory they opened without exception. This is likely a similar issue that required some of the config files to be moved.

Exception in thread "main" java.lang.ArithmeticException: / by zero
	at loci.formats.MetadataTools.populatePixelsOnly(MetadataTools.java:315)
	at loci.formats.MetadataTools.populateMetadata(MetadataTools.java:262)
	at loci.formats.MetadataTools.populatePixels(MetadataTools.java:151)
	at loci.formats.MetadataTools.populatePixels(MetadataTools.java:116)
	at loci.formats.in.DicomReader.initFile(DicomReader.java:905)

I manually tested the reader across a subset of the new sample files:

CPTAC-LSCC_C3L-00965-26
Dataset opened and displayed without exception, containing 1 series with 8 resolutions.

CPTAC-LSCC_C3L-00965-26_PixelMedConvertedSVSSuppliedPyramid
Dataset opened and displayed without exception, containing 3 series, the first with 3 resolutions, then a thumbnail and overview image. However when attempting to open the file CPTAC-LSCC_C3L-00965-26_DICOMDIR an exception was thrown as below. Based on the config PR this file was set with test=false so I assume it is not expected to be able to open this file.

java.lang.ArithmeticException: / by zero
	at loci.formats.in.DicomReader.initFile(DicomReader.java:719)

LUNG-1-LN^ [LUNG-1-LN]
Each series could be opened and displayed without exceptions. Each had 5 resolutions.

WG26/3DHISTECH/FL_20191127/DICOM_TILED_FULL/056_ATM
The dataset was able to opened and displayed without exception. It contained 4 series. The first series has 10 resolutions then an overview image, another image with 10 resolutions and the final series is labelled REGIONLOCALIZER

WG26/3DHISTECH/FL_20191127/DICOM_TILED_SPARSE/056_ATM
The dataset was able to opened and displayed without exception matching that of the TILED_FULL dataset

WG26/3DHISTECH/FL_20191127/DICOM_TILED_FULL/FluorCell_4
The dataset was able to opened and displayed without exception. It contained 3 series. The first series has 10 resolutions then an image REGIONLOCALIZER followed by a label image.

WG26/3DHISTECH/FL_20191127/DICOM_TILED_SPARSE/FluorCell_4
The dataset was able to opened and displayed without exception matching that of the TILED_FULL dataset

WG26/3DHISTECH/FL_20191127/DICOM_TILED_FULL/Liver_v1
The dataset was able to opened and displayed without exception. It contained 4 series. The first series has 10 resolutions then an overview image, another image with 10 resolutions and the final series is labelled REGIONLOCALIZER

WG26/3DHISTECH/FL_20191127/DICOM_TILED_SPARSE/Liver_v1
The dataset was able to opened and displayed without exception matching that of the TILED_FULL dataset

WG26/3DHISTECH/FL_20191127/DICOM_TILED_FULL/17361-13-FISH
The dataset was able to opened and displayed without exception. It contained 4 series. The first series has 10 resolutions then an overview image, another image with 10 resolutions and the final series is labelled REGIONLOCALIZER

WG26/3DHISTECH/FL_20191127/DICOM_TILED_SPARSE/17361-13-FISH
The dataset was able to opened and displayed without exception matching that of the TILED_FULL dataset

For the writer I tested the bfconvert workflow against a range of our public data.

For the below datasets with multiple timepoints a format exception was thrown. This is as expected:

Exception in thread "main" loci.formats.FormatException: Multiple timepoints not supported
	at loci.formats.out.DicomWriter.setId(DicomWriter.java:335)

For other datasets the conversion was run both with and without the new option. Each of the resulting converted files opened and displayed as expected without any exceptions:

When running the converter using the -no-sequential across a range of the public sample data the resulting files matched those converted without the flag. However when running the converter with the flag and trying to generate resolutions I would run into the below exception (the index is 1 greater than the num planes):

bfconvert -no-sequential -pyramid-resolutions 2 -pyramid-scale 2

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 511 out of bounds for length 511
	at loci.formats.out.PyramidOMETiffWriter.close(PyramidOMETiffWriter.java:128)
	at loci.formats.ImageWriter.close(ImageWriter.java:469)
	at loci.formats.tools.ImageConverter.testConvert(ImageConverter.java:758)
However this error does not appear to be related to the changes in the ImageConverter code or the new option, it simply makes it easier to occur. You can achieve the same error using some of our example code and setting sequential writing to false. This is a bug that should probably be logged as a separate issue.

dgault · 2021-12-02T19:45:35Z

@fedorov, the addition to the version history (which will include the funding acknowledgement) mentioned by Melissa will be opened as a separate PR against https://github.com/ome/bio-formats-documentation. This usually happens once all of the PR's for a given release have been merged and the contents of that PR is then also used as the basis for the public announcements on our website and image.sc etc .

If there are any other areas of documentation you feel need updated please do let me know.

melissalinkert · 2021-12-02T21:10:07Z

Thanks for the thorough review, @dgault.

While testing the writer I noticed a few times that converted files would produce the below exception. This appears to have been due to files being grouped with others I was testing, when the files were moved to a separate directory they opened without exception. This is likely a similar issue that required some of the config files to be moved.

I definitely wouldn't recommend trying to put multiple converted DICOM pyramids in the same directory, since the file grouping is logic is mostly best guess. Would it help to have the writer issue a warning if it will be writing to a non-empty directory?

CPTAC-LSCC_C3L-00965-26_PixelMedConvertedSVSSuppliedPyramid
Dataset opened and displayed without exception, containing 3 series, the first with 3 resolutions, then a thumbnail and overview image. However when attempting to open the file CPTAC-LSCC_C3L-00965-26_DICOMDIR an exception was thrown as below. Based on the config PR this file was set with test=false so I assume it is not expected to be able to open this file.

Correct, I would not expect using the *DICOMDIR file for this particular dataset to work.

dgault · 2021-12-03T11:27:36Z

Thanks Melissa, getting this merged for now for 6.8.0

dclunie · 2023-09-05T20:05:30Z

@melissalinkert does this include support for DICOM z-stack images? E.g., see discussion at news:comp.protocols.dicom.

melissalinkert · 2023-09-12T20:15:38Z

@dclunie : yes, Z stacks are expected to be supported here. Several of the WG26 datasets referenced above include Z stacks, and those are included in nightly regression tests. If you (or anyone else following) have Z stacks that aren't being correctly read or written by Bio-Formats, we'll definitely want to know so that we can investigate the issue.

melissalinkert added 30 commits November 4, 2021 09:10

First pass at some refactoring to expand DICOM support

6b23803

Preserve attribute types (VRs) instead of converting to/from strings

9d1f19e

More refactoring and initial whole slide support

03ac89b

Read each pyramid resolution from a separate file

d7dc07e

Fix several file grouping and minor parsing issues

d3a7754

Support label/overview images and fix a few more file grouping issues

ac74f73

Fix AT VR reading

182965f

A few more file grouping and mapping fixes

15a6fdb

Refactoring to support multiple Z sections and reduce tile loading time

de97180

Add some comments, especially TODOs around known deficiencies

bceaeb1

Fix a bunch of lingering tag reading issues

db48ec7

Initial DICOM writer

7a48782

Very basic support so far, still needs quite a bit of work (see TODOs).

Fix tag sorting and make string value alignment easier

e35be5a

Basic tests similar to "bfconvert test.fake test.dcm" should result in a file that dcdump and ImageMagick can handle.

Small resolution sorting and pixel data offset calculation fixes

1354bf6

Move DicomReader above TiffDelegateReader, to fix DICOM-TIFF files

4acad95

Fix some issues with reading uncompressed tiles

8045be4

Use the specimen identifier to distinguish between pyramids in the sa…

8ffd46c

…me directory

Add support for reading concatenated files

969581a

Add bfconvert TODOs for writing DICOM tile-wise

8cf0d87

Reduce log level for debugging statement

634f5aa

Add writer TODOs and fix a few metadata bugs

d099d6c

Add DICOM tiled writing support in bfconvert

b3fc7e9

"bfconvert -tilex 512 -tiley 512 input output.dcm" should now produce 512x512 tiles as expected.

Fix writing JPEG and JPEG-2000 data

6461daa

The pixel data should now be properly encapsulated as required by the transfer syntax. See http://dicom.nema.org/dicom/2013/output/chtml/part05/sect_A.4.html

Add support for writing a full pyramid with extra images

707626e

Still need to figure out a better way to set the Image Type for extra images though.

Fix Image Type writing and reading

001e6d0

Optical path reading/writing and TILED_SPARSE writing

25debdd

Make setWriteSequentially(...) call configurable in bfconvert

3208d92

-no-sequential won't change the order in which planes/tiles are written, just the writer's assumption of plane/tile order.

Allow more pixel types for uncompressed data, don't allow SizeT > 1

5a4cf2b

Fix up metadata population and a few small lingering issues

f938958

Fix a few edge cases when using DICOMDIR files

e32b115

This was referenced Nov 10, 2021

Allow DICOM writer to accept additional DICOM-specific metadata #3744

Closed

Update readers.txt ordering #3720

Closed

sbesson reviewed Nov 11, 2021

View reviewed changes

Don't allow IllegalArgumentException to be thrown in isThisType

f75b8dc

sbesson reviewed Nov 16, 2021

View reviewed changes

melissalinkert added 3 commits November 16, 2021 14:57

Fix some edge cases in DICOM isThisType checks

84e1a80

Fix original metadata list handling

fc13799

Move DICOM helper classes to new loci.formats.dicom package

625f8cc

dgault added this to the 6.8.0 milestone Nov 24, 2021

Merge branch 'develop' of github.com:openmicroscopy/bioformats into d…

4c1f33c

…icom-wsi-develop

dgault merged commit 4485dae into ome:develop Dec 3, 2021

dgault mentioned this pull request Dec 7, 2021

Format documentation updates for 6.8.0 ome/bio-formats-documentation#215

Merged

melissalinkert deleted the dicom-wsi-develop branch January 20, 2022 16:47

melissalinkert mentioned this pull request Jul 27, 2023

DICOM writer: initial support for dual personality DICOM/TIFF #3911

Merged

melissalinkert mentioned this pull request Sep 14, 2023

shortLut causes NullPointException #3450

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support reading and writing DICOM whole slide images #3742

Support reading and writing DICOM whole slide images #3742

melissalinkert commented Nov 10, 2021

sbesson Nov 11, 2021

melissalinkert Nov 17, 2021

fedorov commented Nov 12, 2021

melissalinkert commented Nov 15, 2021

fedorov commented Nov 15, 2021

sbesson Nov 16, 2021

melissalinkert Nov 17, 2021

snoopycrimecop commented Nov 18, 2021

snoopycrimecop commented Nov 19, 2021

snoopycrimecop commented Nov 25, 2021 •

edited

Loading

fedorov commented Nov 30, 2021

fedorov commented Nov 30, 2021 •

edited

Loading

dgault commented Dec 2, 2021

dgault commented Dec 2, 2021

melissalinkert commented Dec 2, 2021

dgault commented Dec 3, 2021

dclunie commented Sep 5, 2023

melissalinkert commented Sep 12, 2023

Support reading and writing DICOM whole slide images #3742

Support reading and writing DICOM whole slide images #3742

Conversation

melissalinkert commented Nov 10, 2021

sbesson Nov 11, 2021

Choose a reason for hiding this comment

melissalinkert Nov 17, 2021

Choose a reason for hiding this comment

fedorov commented Nov 12, 2021

melissalinkert commented Nov 15, 2021

fedorov commented Nov 15, 2021

sbesson Nov 16, 2021

Choose a reason for hiding this comment

melissalinkert Nov 17, 2021

Choose a reason for hiding this comment

snoopycrimecop commented Nov 18, 2021

snoopycrimecop commented Nov 19, 2021

snoopycrimecop commented Nov 25, 2021 • edited Loading

fedorov commented Nov 30, 2021

fedorov commented Nov 30, 2021 • edited Loading

dgault commented Dec 2, 2021

dgault commented Dec 2, 2021

melissalinkert commented Dec 2, 2021

dgault commented Dec 3, 2021

dclunie commented Sep 5, 2023

melissalinkert commented Sep 12, 2023

snoopycrimecop commented Nov 25, 2021 •

edited

Loading

fedorov commented Nov 30, 2021 •

edited

Loading