Support reading and writing DICOM whole slide images #3742
Very basic support so far, still needs quite a bit of work (see TODOs).
Basic tests similar to "bfconvert test.fake test.dcm" should result in a file that dcdump and ImageMagick can handle.
"bfconvert -tilex 512 -tiley 512 input output.dcm" should now produce 512x512 tiles as expected.
The pixel data should now be properly encapsulated as required by the transfer syntax. See http://dicom.nema.org/dicom/2013/output/chtml/part05/sect_A.4.html
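As a rough illustration of the encapsulation described in the linked section of the standard, here is a minimal sketch that wraps already-compressed tiles in an explicit-VR little-endian Pixel Data element. The class and method names are hypothetical and are not taken from DicomWriter. It assumes an empty Basic Offset Table and pads odd-length fragments with a zero byte, which is a simplification (a real codec may need to pad inside the compressed stream instead).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Bio-Formats. */
final class EncapsulatedPixelDataSketch {

  /** Writes an item-style header: group FFFE, the given element, then a 32-bit length. */
  private static void writeItemHeader(ByteArrayOutputStream out, int element, int length) {
    ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
    buf.putShort((short) 0xFFFE);
    buf.putShort((short) element);
    buf.putInt(length);
    out.write(buf.array(), 0, 8);
  }

  /** Wraps each compressed tile in its own fragment item. */
  static byte[] encapsulate(List<byte[]> compressedTiles) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Pixel Data (7FE0,0010), VR "OB", two reserved bytes, undefined length.
    ByteBuffer header = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
    header.putShort((short) 0x7FE0);
    header.putShort((short) 0x0010);
    header.put((byte) 'O');
    header.put((byte) 'B');
    header.putShort((short) 0);
    header.putInt(0xFFFFFFFF);
    out.write(header.array(), 0, 12);

    // First item is the Basic Offset Table; left empty in this sketch.
    writeItemHeader(out, 0xE000, 0);

    for (byte[] tile : compressedTiles) {
      // Fragment lengths must be even, so pad odd-length data with one byte.
      int padded = tile.length + (tile.length % 2);
      writeItemHeader(out, 0xE000, padded);
      out.write(tile, 0, tile.length);
      if (padded != tile.length) {
        out.write(0);
      }
    }

    // Sequence Delimitation Item (FFFE,E0DD) with zero length closes the element.
    writeItemHeader(out, 0xE0DD, 0);
    return out.toByteArray();
  }
}
```

Calling `EncapsulatedPixelDataSketch.encapsulate(tiles)` with one compressed buffer per tile yields the bytes that would follow the rest of the dataset; a populated offset table would let readers seek directly to a frame.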
Still need to figure out a better way to set the Image Type for extra images though.
-no-sequential won't change the order in which planes/tiles are written, just the writer's assumption of plane/tile order.
There are a few compromises here. One is that the DicomAttribute enum does not contain the entire dictionary, but a subset that is most likely to be explicitly referenced in the reader/writer. The very large size of the dictionary combined with Java's limitations on bytecode size means that a single enum with the whole dictionary is not possible. The other compromise is that the original metadata table does not contain every tag in every file of a dataset, as that can quickly lead to memory exhaustion (especially once the table is included in OME-XML). DicomReader now has a "List<DicomTag> getTags()" method that can be used to access the tag hierarchy directly, should an application need to do so.
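To make the "subset enum" compromise concrete, here is a small sketch of an enum that covers only a few attributes and reports anything else as absent rather than failing. The enum name, its constants, and `fromTag` are hypothetical and do not reflect the actual DicomAttribute API; only the three tag values are standard DICOM tags.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical subset of the DICOM dictionary; not the real DicomAttribute enum. */
enum AttributeSubsetSketch {
  ROWS(0x00280010),
  COLUMNS(0x00280011),
  PIXEL_DATA(0x7FE00010);

  private static final Map<Integer, AttributeSubsetSketch> BY_TAG = new HashMap<>();

  static {
    // Runs after all constants are constructed, so values() is safe here.
    for (AttributeSubsetSketch attribute : values()) {
      BY_TAG.put(attribute.tag, attribute);
    }
  }

  private final int tag;

  AttributeSubsetSketch(int tag) {
    this.tag = tag;
  }

  int getTag() {
    return tag;
  }

  /** Returns the matching constant, or empty when the tag is outside the subset. */
  static Optional<AttributeSubsetSketch> fromTag(int tag) {
    return Optional.ofNullable(BY_TAG.get(tag));
  }
}
```

A caller that needs a tag outside the subset would fall back to walking the tag hierarchy, which is the role the PR gives to `getTags()`.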
return 8;
}
default:
throw new IllegalArgumentException(vr.toString());
As a result of the DICOM reader priority increase in readers.txt, this exception is now being thrown across several file formats during the isThisType check.
See e.g. bd-pathway, deltavision for examples of affected filesets.
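One possible way to keep a type check from propagating this kind of exception is sketched below. This is not the fix adopted in the PR; every name here is hypothetical, the VR list is deliberately incomplete, and the no-preamble branch is far too permissive for real use. It only shows the pattern of treating an unrecognized VR as "not this format" instead of letting the exception escape.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical type-sniffing sketch; not the Bio-Formats implementation. */
final class DicomSniffSketch {

  // A handful of standard VR codes, incomplete on purpose.
  private static final Set<String> KNOWN_VRS = new HashSet<>(Arrays.asList(
      "AE", "CS", "DA", "LO", "OB", "OW", "PN", "SH", "SQ", "UI", "UL", "US"));

  /** Mimics a strict lookup that throws on unknown codes. */
  static String requireKnownVr(String code) {
    if (!KNOWN_VRS.contains(code)) {
      throw new IllegalArgumentException(code);
    }
    return code;
  }

  /** Returns false instead of throwing when the header does not parse as DICOM. */
  static boolean looksLikeDicom(byte[] header) {
    // Part 10 files carry a 128-byte preamble followed by the ASCII magic "DICM".
    if (header.length >= 132
        && "DICM".equals(new String(header, 128, 4, StandardCharsets.US_ASCII))) {
      return true;
    }
    // Without a preamble, assume an explicit-VR element: 4 tag bytes, then a 2-character VR.
    if (header.length < 6) {
      return false;
    }
    try {
      requireKnownVr(new String(header, 4, 2, StandardCharsets.US_ASCII));
      return true;
    } catch (IllegalArgumentException e) {
      // An unknown VR means "not DICOM" here, so other readers still get a chance.
      return false;
    }
  }
}
```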
@melissalinkert would it be possible to acknowledge that development of this functionality was supported by NCI Imaging Data Commons, and include the wording for the funding source?
@fedorov : we can certainly add a funding acknowledgement. Typically we do this in the release notes via a pull request to https://github.com/ome/bio-formats-documentation. See https://docs.openmicroscopy.org/bio-formats/6.7.0/about/whats-new.html#december for an example. If you have specific acknowledgement text you prefer, or a line in the release notes is insufficient, just let us know.
Melissa, thank you for the pointer, this is very helpful. Let me check with your project leads at NCI/Leidos what their preference is, and I will get back to you towards the end of the week.
@@ -65,78 +66,30 @@
import ome.units.quantity.Length;
import ome.units.UNITS;

import static loci.formats.in.DicomAttribute.*;
import static loci.formats.in.DicomVR.*;
As mentioned during the weekly Formats discussion, the immediate question is whether this API, which handles the specifics of the DICOM specification, should be managed in a dedicated package similarly to loci.formats.tiff.
625f8cc reorganizes into a new loci.formats.dicom package.
Conflicting PR. Removed from build BIOFORMATS-push#1041. See the console output for more details.
--conflicts
Conflicting PR. Removed from build BIOFORMATS-push#1042. See the console output for more details.
--conflicts
Conflicting PR. Removed from build BIOFORMATS-push#1049. See the console output for more details.
Melissa, I am not sure how documentation updates are handled - would that be done in a separate repo after this PR is merged, to document improvements to DICOM support?
Melissa, regarding the acknowledgment, I checked with our federal leads, and they would like to have the following acknowledgment, if possible: "This project has been funded in whole or in part with Federal funds from the National Cancer Institute, cc: @ulrikew My personal wish would also be to hyperlink "Imaging Data Commons", so I suggest the following wording:
Builds and tests have been green with this PR included, and the merge-ci repo tests are now also green with the associated config. Overall the PR looks to be in good shape and suitable for inclusion with the 6.8.0 release. While testing the writer I noticed a few times that converted files would produce the below exception. This appears to have been due to files being grouped with others I was testing; when the files were moved to a separate directory they opened without exception. This is likely similar to the issue that required some of the config files to be moved.
I manually tested the reader across a subset of the new sample files:
For the writer I tested the bfconvert workflow against a range of our public data. For the below datasets with multiple timepoints a format exception was thrown. This is as expected:
For other datasets the conversion was run both with and without the new option. Each of the resulting converted files opened and displayed as expected without any exceptions:
When running the converter with the -no-sequential flag across a range of the public sample data, the resulting files matched those converted without the flag. However, when running the converter with the flag and trying to generate resolutions, I would run into the below exception (the index is 1 greater than the number of planes):
@fedorov, the addition to the version history (which will include the funding acknowledgement) mentioned by Melissa will be opened as a separate PR against https://github.com/ome/bio-formats-documentation. This usually happens once all of the PRs for a given release have been merged, and the contents of that PR are then also used as the basis for the public announcements on our website, image.sc, etc. If there are any other areas of documentation you feel need updating, please do let me know.
Thanks for the thorough review, @dgault.
I definitely wouldn't recommend trying to put multiple converted DICOM pyramids in the same directory, since the file grouping logic is mostly a best guess. Would it help to have the writer issue a warning if it will be writing to a non-empty directory?
Correct, I would not expect using the
Thanks Melissa, getting this merged for now for 6.8.0
@melissalinkert does this include support for DICOM z-stack images? E.g., see discussion at news:comp.protocols.dicom.
@dclunie : yes, Z stacks are expected to be supported here. Several of the WG26 datasets referenced above include Z stacks, and those are included in nightly regression tests. If you (or anyone else following) have Z stacks that aren't being correctly read or written by Bio-Formats, we'll definitely want to know so that we can investigate the issue.
Significantly reworks the DICOM reader and adds a new writer so that WSI data is supported.
Relevant test data is in inbox/dicom-wsi. There are a ton of datasets there, but in general the newer sets (LUNG..., pixelmed-converted, WG26/3DHISTECH, WG26/Leica_GT450) will be better for testing by hand. Most are copied from ftp://dicom.nema.org/MEDICAL/Dicom/DataSets/WG26, so publicly available.
Whole slide datasets will have one or more files per resolution, with optional additional files that contain the label, overview, etc. Multiple pyramids per dataset are possible if an extended depth of field image was created from a Z stack. Both brightfield and fluorescence slides are supported.
Converting non-DICOM slides to DICOM is supported by the new DicomWriter, using typical bfconvert options. The expected output is one file per resolution or extra (label/macro/etc.) image. Some changes were needed in ImageConverter to make the tile writing logic less TIFF-specific.
There is also a new bfconvert option to turn off the sequential-writing assumption (3208d92). This doesn't change the actual tile writing order, just the assumption that the writer uses. For DICOM, this is helpful for testing the two tile position storage modes: TILED_FULL and TILED_SPARSE. The former implicitly defines the positions; all tiles are stored, in left-to-right, top-to-bottom order. The latter explicitly records positions, which results in a larger file (more metadata), but allows missing tiles and/or non-sequential storage.
There is obviously a lot to go through here, so happy to have a discussion if that's helpful.
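The implicit positioning that TILED_FULL relies on can be sketched as plain arithmetic: with every tile present and stored left-to-right, top-to-bottom, a frame index alone determines where the tile sits. The class and method names below are hypothetical, the code covers a single plane only, and it returns 0-based pixel offsets for simplicity.

```java
/** Hypothetical illustration of implicit tile positions; not taken from DicomWriter. */
final class TiledFullSketch {

  /** Number of tile columns needed to cover the image width, counting a partial last tile. */
  static int tilesPerRow(int imageWidth, int tileWidth) {
    return (imageWidth + tileWidth - 1) / tileWidth;
  }

  /**
   * Returns {x, y}, the 0-based pixel offset of the tile's top-left corner,
   * for a frame index within one plane stored in row-major tile order.
   */
  static int[] tileOrigin(int frameIndex, int imageWidth, int tileWidth, int tileHeight) {
    int perRow = tilesPerRow(imageWidth, tileWidth);
    int column = frameIndex % perRow;
    int row = frameIndex / perRow;
    return new int[] {column * tileWidth, row * tileHeight};
  }
}
```

For a 1200-pixel-wide image with 512x512 tiles there are three tiles per row, so `TiledFullSketch.tileOrigin(4, 1200, 512, 512)` lands on the second tile of the second row. TILED_SPARSE cannot use this shortcut because tiles may be missing or out of order, which is why it stores a position per frame.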
This will definitely impact memo files so should not be considered for a patch release. Most test issues will be addressed in a forthcoming config PR, and are primarily updates to image names/descriptions and a few pixel hash changes due to differences in file sorting (by metadata now instead of by file name/pattern). There are a couple of specific files which will still cause failures, and a proposed solution will be noted on the config PR.
I have it on my to-do list to go through outstanding DICOM issues to verify whether or not this helps. In particular: