The following is the list of the columns included in index. You can use them to select cohorts and subset the data. index is series-based, i.e., it has one row per DICOM series.
- non-DICOM attributes assigned/curated by IDC:
  - collection_id: short string with the identifier of the collection the series belongs to
  - analysis_result_id: this string is not empty if the specific series is part of an analysis results collection; analysis results can be added to a given collection over time
  - source_DOI: Digital Object Identifier of the dataset that contains the given series; note that a given collection can include one or more DOIs, since analysis results added to the collection would typically have independent DOI values
  - instanceCount: number of files in the series (typically, this matches the number of slices in cross-sectional modalities)
  - license_short_name: short name of the license that governs the use of the files corresponding to the series
  - series_aws_url: location of the series files in a public AWS bucket
  - series_size_MB: total disk size needed to store the series
- DICOM attributes extracted from the files:
  - PatientID: identifier of the patient
  - PatientAge and PatientSex: attributes containing patient age and sex
  - StudyInstanceUID: unique identifier of the DICOM study
  - StudyDescription: textual description of the study content
  - StudyDate: date of the study (note that those dates are shifted, and are not real dates when images were acquired, to protect patient privacy)
  - SeriesInstanceUID: unique identifier of the DICOM series
  - SeriesDate: date when the series was acquired
  - SeriesDescription: textual description of the series content
  - SeriesNumber: series number
  - BodyPartExamined: body part imaged
  - Modality: acquisition modality
  - Manufacturer: manufacturer of the equipment that generated the series
  - ManufacturerModelName: model name of the equipment
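As a minimal sketch of how these columns can drive cohort selection, the example below filters series by collection_id and Modality, then summarizes the result. The table is represented as a plain list of dictionaries so the sketch runs without any dependency; that representation, the helper name select_cohort, and all row values are assumptions made for illustration and are not taken from IDC.

```python
# Hypothetical rows that mimic a few columns of the index table.
# The values are made up for illustration and do not come from IDC.
rows = [
    {"collection_id": "collection_a", "PatientID": "P1", "Modality": "CT",
     "SeriesInstanceUID": "1.2.3.1", "series_size_MB": 120.5},
    {"collection_id": "collection_a", "PatientID": "P1", "Modality": "MR",
     "SeriesInstanceUID": "1.2.3.2", "series_size_MB": 80.0},
    {"collection_id": "collection_a", "PatientID": "P2", "Modality": "CT",
     "SeriesInstanceUID": "1.2.3.3", "series_size_MB": 99.5},
    {"collection_id": "collection_b", "PatientID": "P3", "Modality": "CT",
     "SeriesInstanceUID": "1.2.3.4", "series_size_MB": 200.0},
]


def select_cohort(rows, collection_id, modality):
    """Return the series rows that match a collection and a modality."""
    return [
        r for r in rows
        if r["collection_id"] == collection_id and r["Modality"] == modality
    ]


cohort = select_cohort(rows, "collection_a", "CT")

# Because the table has one row per series, the row count is the series count.
n_series = len(cohort)
# A patient can have several series, so patients are counted by unique PatientID.
n_patients = len({r["PatientID"] for r in cohort})
# series_size_MB can be summed to estimate the download size of the cohort.
total_size_mb = sum(r["series_size_MB"] for r in cohort)

print(n_series, n_patients, total_size_mb)
```

Summing series_size_MB before downloading is a cheap way to check that a cohort fits on the available disk.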
The following is the list of the columns included in sm_index. sm_index is series-based, i.e., it has one row per DICOM series, but only includes series with slide microscopy data.
- DICOM attributes extracted from the files:
  - SeriesInstanceUID: unique identifier of the DICOM series: one DICOM series = one slide
  - embeddingMedium: describes in what medium the slide was embedded before the image was obtained
  - tissueFixative: describes tissue fixatives used before the image was obtained
  - staining_usingSubstance: describes staining steps the specimen underwent before the image was obtained
  - max_TotalPixelMatrixColumns: width of the image at the maximum resolution
  - max_TotalMatrixRows: height of the image at the maximum resolution
  - min_PixelSpacing_2sf: pixel spacing in mm at the maximum resolution layer, rounded to 2 significant figures
  - ObjectiveLensPower: power of the objective lens of the equipment used to digitize the slide
  - primaryAnatomicStructure: anatomic location from where the imaged specimen was collected
  - primaryAnatomicStructureModifier: additional characteristics of the specimen, such as whether it is a tumor or normal tissue
  - illuminationType: specifies the type of illumination used when obtaining the image
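The size and spacing columns can be combined to estimate the physical extent of a slide. The sketch below multiplies the pixel dimensions at maximum resolution by min_PixelSpacing_2sf, and keeps only slides scanned at a chosen objective power. It assumes the spacing is the same along rows and columns, uses the column names exactly as listed above, and works on made-up rows in a plain list of dictionaries; physical_size_mm is a hypothetical helper.

```python
# Hypothetical sm_index-like rows; values are illustrative only.
slides = [
    {"SeriesInstanceUID": "1.2.3.10", "ObjectiveLensPower": 40,
     "max_TotalPixelMatrixColumns": 100000, "max_TotalMatrixRows": 80000,
     "min_PixelSpacing_2sf": 0.00025},
    {"SeriesInstanceUID": "1.2.3.11", "ObjectiveLensPower": 20,
     "max_TotalPixelMatrixColumns": 50000, "max_TotalMatrixRows": 40000,
     "min_PixelSpacing_2sf": 0.0005},
]


def physical_size_mm(slide):
    """Approximate (width, height) in mm at the maximum resolution.

    Assumes isotropic pixel spacing, i.e. the same spacing in both directions.
    """
    spacing = slide["min_PixelSpacing_2sf"]
    width = slide["max_TotalPixelMatrixColumns"] * spacing
    height = slide["max_TotalMatrixRows"] * spacing
    return width, height


# Keep only the slides digitized with at least a 40x objective.
high_power = [s for s in slides if s["ObjectiveLensPower"] >= 40]

for s in high_power:
    width_mm, height_mm = physical_size_mm(s)
    print(s["SeriesInstanceUID"], round(width_mm, 2), round(height_mm, 2))
```

Since the spacing is rounded to 2 significant figures, the result is an estimate rather than an exact measurement.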
For embeddingMedium, tissueFixative, staining_usingSubstance, primaryAnatomicStructure, primaryAnatomicStructureModifier and illuminationType, the attributes exist with the suffixes _code_designator_value_str and _CodeMeaning, which indicate whether the column contains CodeSchemeDesignator and CodeValue, or CodeMeaning. If this is new to you, a brief explanation of the three-value coding scheme in DICOM can be found at https://learn.canceridc.dev/dicom/coding-schemes.
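To make the naming rule concrete, the sketch below derives the suffixed column names from the attribute names and suffixes given above, then filters on one of the _CodeMeaning columns. The helper coded_column_names and the sample rows (including the fixative values) are hypothetical, and the cells are assumed to hold plain strings.

```python
# Attribute names and suffixes as described in the text above.
CODED_ATTRIBUTES = [
    "embeddingMedium",
    "tissueFixative",
    "staining_usingSubstance",
    "primaryAnatomicStructure",
    "primaryAnatomicStructureModifier",
    "illuminationType",
]
SUFFIXES = ["_code_designator_value_str", "_CodeMeaning"]


def coded_column_names(attributes=CODED_ATTRIBUTES, suffixes=SUFFIXES):
    """Return every attribute/suffix combination as a column name."""
    return [attr + suffix for attr in attributes for suffix in suffixes]


columns = coded_column_names()
print(len(columns))  # 6 attributes x 2 suffixes = 12 column names

# Hypothetical rows; the fixative values are made up for illustration.
slides = [
    {"SeriesInstanceUID": "1.2.3.10", "tissueFixative_CodeMeaning": "Formalin"},
    {"SeriesInstanceUID": "1.2.3.11", "tissueFixative_CodeMeaning": "Other"},
]

# Filtering on the human-readable meaning is convenient for exploration.
formalin_slides = [
    s["SeriesInstanceUID"] for s in slides
    if s["tissueFixative_CodeMeaning"] == "Formalin"
]
print(formalin_slides)
```

The _CodeMeaning column is easier to read, while the designator and value pair is the more stable key when the same concept may be worded differently.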
The following is the list of the columns included in sm_instance_index. sm_instance_index is instance-based, i.e., it has one row per DICOM instance (pyramid level of a slide, plus potentially thumbnail or label images), but only includes DICOM instances of the slide microscopy modality.
- DICOM attributes extracted from the files:
  - SOPInstanceUID: unique identifier of the DICOM instance: one DICOM instance = one level/label/thumbnail image of the slide
  - SeriesInstanceUID: unique identifier of the DICOM series: one DICOM series = one slide
  - embeddingMedium: describes in what medium the slide was embedded before the image was obtained
  - tissueFixative: describes tissue fixatives used before the image was obtained
  - staining_usingSubstance: describes staining steps the specimen underwent before the image was obtained
  - max_TotalPixelMatrixColumns: width of the image at the maximum resolution
  - max_TotalMatrixRows: height of the image at the maximum resolution
  - PixelSpacing_0: pixel spacing in mm
  - ImageType: specifies further characteristics of the image in a list, including as the third value whether it is a VOLUME, LABEL, OVERVIEW or THUMBNAIL image
  - TransferSyntaxUID: specifies the encoding scheme used for the image data
  - instance_size: specifies the DICOM instance's size in bytes
- non-DICOM attributes assigned/curated by IDC:
  - crdc_instance_uuid: globally unique, versioned identifier of the DICOM instance
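Because the third value of ImageType distinguishes pyramid levels from label, overview and thumbnail images, it can be used to restrict a selection to the actual image data. The sketch below keeps only VOLUME instances and sums instance_size per series. It assumes ImageType is available as a Python list of strings; the rows and the helper image_flavor are hypothetical.

```python
from collections import defaultdict

# Hypothetical sm_instance_index-like rows; values are illustrative only.
instances = [
    {"SOPInstanceUID": "1.2.3.10.1", "SeriesInstanceUID": "1.2.3.10",
     "ImageType": ["ORIGINAL", "PRIMARY", "VOLUME", "NONE"],
     "instance_size": 500_000_000},
    {"SOPInstanceUID": "1.2.3.10.2", "SeriesInstanceUID": "1.2.3.10",
     "ImageType": ["DERIVED", "PRIMARY", "VOLUME", "RESAMPLED"],
     "instance_size": 120_000_000},
    {"SOPInstanceUID": "1.2.3.10.3", "SeriesInstanceUID": "1.2.3.10",
     "ImageType": ["ORIGINAL", "PRIMARY", "LABEL", "NONE"],
     "instance_size": 1_000_000},
    {"SOPInstanceUID": "1.2.3.10.4", "SeriesInstanceUID": "1.2.3.10",
     "ImageType": ["ORIGINAL", "PRIMARY", "THUMBNAIL", "RESAMPLED"],
     "instance_size": 2_000_000},
]


def image_flavor(instance):
    """Return the third ImageType value (VOLUME, LABEL, OVERVIEW or THUMBNAIL)."""
    return instance["ImageType"][2]


# Total size in bytes of the VOLUME instances (pyramid levels) of each slide.
volume_bytes = defaultdict(int)
for inst in instances:
    if image_flavor(inst) == "VOLUME":
        volume_bytes[inst["SeriesInstanceUID"]] += inst["instance_size"]

for series_uid, size in volume_bytes.items():
    print(series_uid, size)
```

Skipping LABEL images can also matter for privacy-sensitive workflows, since slide labels may carry handwritten or printed annotations.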
For embeddingMedium, tissueFixative, and staining_usingSubstance, the attributes exist with the suffixes _code_designator_value_str and _CodeMeaning, which indicate whether the column contains CodeSchemeDesignator and CodeValue, or CodeMeaning. If this is new to you, a brief explanation of the three-value coding scheme in DICOM can be found at https://learn.canceridc.dev/dicom/coding-schemes.
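All three tables described here contain SeriesInstanceUID, so rows from the instance-level table can be related to their series-level rows through that column. The following sketch shows one way to do this with plain dictionaries; the rows are made up, and the lookup-based join is only an illustration, not a description of how any particular library implements it.

```python
# Hypothetical series-level rows (as in sm_index).
sm_index_rows = [
    {"SeriesInstanceUID": "1.2.3.10", "ObjectiveLensPower": 40},
    {"SeriesInstanceUID": "1.2.3.11", "ObjectiveLensPower": 20},
]

# Hypothetical instance-level rows (as in sm_instance_index).
sm_instance_rows = [
    {"SOPInstanceUID": "1.2.3.10.1", "SeriesInstanceUID": "1.2.3.10",
     "PixelSpacing_0": 0.00025},
    {"SOPInstanceUID": "1.2.3.10.2", "SeriesInstanceUID": "1.2.3.10",
     "PixelSpacing_0": 0.001},
    {"SOPInstanceUID": "1.2.3.11.1", "SeriesInstanceUID": "1.2.3.11",
     "PixelSpacing_0": 0.0005},
]

# Index the series-level rows by SeriesInstanceUID for constant-time lookup.
series_by_uid = {row["SeriesInstanceUID"]: row for row in sm_index_rows}

# Attach the series-level attributes to every instance of that series.
# Instance-level values win if both rows define the same key.
joined = [
    {**series_by_uid[inst["SeriesInstanceUID"]], **inst}
    for inst in sm_instance_rows
    if inst["SeriesInstanceUID"] in series_by_uid
]

for row in joined:
    print(row["SOPInstanceUID"], row["ObjectiveLensPower"], row["PixelSpacing_0"])
```

The same key links both slide microscopy tables back to index, which is where collection, license and storage location are recorded.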