|
| 1 | +# Metadata attributes in `idc-index`'s index tables |
| 2 | + |
| 3 | +## `index` |
| 4 | + |
| 5 | +The following is the list of the columns included in `index`. You can use these |
| 6 | +columns to select cohorts and subset data. `index` is series-based, i.e., it has |
| 7 | +one row per DICOM series. |
| 8 | + |
| 9 | +- non-DICOM attributes assigned/curated by IDC: |
| 10 | + |
| 11 | + - `collection_id`: short string with the identifier of the collection the |
| 12 | + series belongs to |
| 13 | + - `analysis_result_id`: this string is not empty if the specific series is |
| 14 | + part of an analysis results collection; analysis results can be added to a |
| 15 | + given collection over time |
| 16 | + - `source_DOI`: Digital Object Identifier of the dataset that contains the |
| 17 | + given series; note that a given collection can include one or more DOIs, |
| 18 | + since analysis results added to the collection would typically have |
| 19 | + independent DOI values |
| 20 | + - `instanceCount`: number of files in the series (typically, this matches the |
| 21 | + number of slices in cross-sectional modalities) |
| 22 | + - `license_short_name`: short name of the license that governs the use of the |
| 23 | + files corresponding to the series |
| 24 | + - `series_aws_url`: location of the series files in a public AWS bucket |
| 25 | + - `series_size_MB`: total disk size needed to store the series |
| 26 | + |
| 27 | +- DICOM attributes extracted from the files: |
| 28 | + - `PatientID`: identifier of the patient |
| 29 | + - `PatientAge` and `PatientSex`: attributes containing patient age and sex |
| 30 | + - `StudyInstanceUID`: unique identifier of the DICOM study |
| 31 | + - `StudyDescription`: textual description of the study content |
| 32 | + - `StudyDate`: date of the study (note that those dates are shifted, and are |
| 33 | + not real dates when images were acquired, to protect patient privacy) |
| 34 | + - `SeriesInstanceUID`: unique identifier of the DICOM series |
| 35 | + - `SeriesDate`: date when the series was acquired |
| 36 | + - `SeriesDescription`: textual description of the series content |
| 37 | + - `SeriesNumber`: series number |
| 38 | + - `BodyPartExamined`: body part imaged |
| 39 | + - `Modality`: acquisition modality |
| 40 | + - `Manufacturer`: manufacturer of the equipment that generated the series |
| 41 | + - `ManufacturerModelName`: model name of the equipment |
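As a rough illustration of how the columns above can be combined to select a cohort, here is a minimal sketch in plain Python. The rows are made-up stand-ins for a few `index` columns, and `select_cohort` and `summarize` are hypothetical helpers written for this example, not part of `idc-index`. In practice the table would typically be handled as a data frame, but the filtering logic is the same.

```python
# Hypothetical rows mimicking a few `index` columns; all values are made up.
index_rows = [
    {"collection_id": "collection_a", "PatientID": "P1", "SeriesInstanceUID": "1.2.3.1",
     "Modality": "CT", "instanceCount": 120, "series_size_MB": 63.0},
    {"collection_id": "collection_a", "PatientID": "P1", "SeriesInstanceUID": "1.2.3.2",
     "Modality": "MR", "instanceCount": 40, "series_size_MB": 21.5},
    {"collection_id": "collection_a", "PatientID": "P2", "SeriesInstanceUID": "1.2.3.3",
     "Modality": "CT", "instanceCount": 200, "series_size_MB": 105.0},
    {"collection_id": "collection_b", "PatientID": "P3", "SeriesInstanceUID": "1.2.3.4",
     "Modality": "CT", "instanceCount": 80, "series_size_MB": 42.5},
]


def select_cohort(rows, modality, collection_id=None):
    """Keep the series with the given modality and, optionally, collection."""
    return [
        row for row in rows
        if row["Modality"] == modality
        and (collection_id is None or row["collection_id"] == collection_id)
    ]


def summarize(rows):
    """Count series, patients and files, and add up the disk size in MB."""
    return {
        "series": len(rows),
        "patients": len({row["PatientID"] for row in rows}),
        "instances": sum(row["instanceCount"] for row in rows),
        "size_MB": sum(row["series_size_MB"] for row in rows),
    }


ct_summary = summarize(select_cohort(index_rows, "CT"))
ct_a_summary = summarize(select_cohort(index_rows, "CT", collection_id="collection_a"))
print(ct_summary)
print(ct_a_summary)
```

Summing `series_size_MB` before downloading gives a quick estimate of the disk space a cohort will need.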
| 42 | + |
| 43 | +## `sm_index` |
| 44 | + |
| 45 | +The following is the list of the columns included in `sm_index`. `sm_index` is |
| 46 | +series-based, i.e., it has one row per DICOM series, but only includes series |
| 47 | +with slide microscopy data. |
| 48 | + |
| 49 | +- DICOM attributes extracted from the files: |
| 50 | + - `SeriesInstanceUID`: unique identifier of the DICOM series: one DICOM series |
| 51 | + = one slide |
| 52 | + - `embeddingMedium`: describes in what medium the slide was embedded before |
| 53 | + the image was obtained |
| 54 | + - `tissueFixative`: describes tissue fixatives used before the image was |
| 55 | + obtained |
| 56 | + - `staining_usingSubstance`: describes staining steps the specimen underwent |
| 57 | + before the image was obtained |
| 58 | + - `max_TotalPixelMatrixColumns`: width of the image at the maximum resolution |
| 59 | + - `max_TotalMatrixRows`: height of the image at the maximum resolution |
| 60 | + - `min_PixelSpacing_2sf`: pixel spacing in mm at the maximum resolution layer, |
| 61 | + rounded to 2 significant figures |
| 62 | + - `ObjectiveLensPower`: power of the objective lens of the equipment used to |
| 63 | + digitize the slide |
| 64 | + - `primaryAnatomicStructure`: anatomic location from where the imaged specimen |
| 65 | + was collected |
| 66 | + - `primaryAnatomicStructureModifier`: additional characteristics of the |
| 67 | + specimen, such as whether it is a tumor or normal tissue |
| 68 | + - `illuminationType`: specifies the type of illumination used when obtaining |
| 69 | + the image |
| 70 | + |
| 71 | +For `embeddingMedium`, `tissueFixative`, `staining_usingSubstance`, |
| 72 | +`primaryAnatomicStructure`, `primaryAnatomicStructureModifier` and |
| 73 | +`illuminationType`, each attribute is stored in two columns: one with the suffix |
| 74 | +`_code_designator_value_str`, which contains CodingSchemeDesignator and |
| 75 | +CodeValue, and one with the suffix `_CodeMeaning`, which contains CodeMeaning. |
| 76 | +If this is new to you, a brief explanation of the three-value coding scheme in |
| 77 | +DICOM can be found at https://learn.canceridc.dev/dicom/coding-schemes. |
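Because `sm_index` and `index` are both series-based, they can be combined on `SeriesInstanceUID`. The sketch below shows one way to do that in plain Python. All rows are made up, `join_on_series` is a hypothetical helper written for this example, and storing the `_CodeMeaning` values as plain strings is an assumption made for simplicity; the real columns may use a different representation.

```python
# Hypothetical `index` rows (subset of columns); all values are made up.
index_rows = [
    {"SeriesInstanceUID": "1.2.3.10", "collection_id": "collection_a",
     "Modality": "SM", "series_size_MB": 850.0},
    {"SeriesInstanceUID": "1.2.3.11", "collection_id": "collection_a",
     "Modality": "SM", "series_size_MB": 1200.0},
    {"SeriesInstanceUID": "1.2.3.12", "collection_id": "collection_b",
     "Modality": "CT", "series_size_MB": 60.0},
]

# Hypothetical `sm_index` rows; plain-string code meanings are an assumption.
sm_index_rows = [
    {"SeriesInstanceUID": "1.2.3.10", "ObjectiveLensPower": 20,
     "primaryAnatomicStructure_CodeMeaning": "Lung"},
    {"SeriesInstanceUID": "1.2.3.11", "ObjectiveLensPower": 40,
     "primaryAnatomicStructure_CodeMeaning": "Breast"},
]


def join_on_series(sm_rows, general_rows):
    """Attach the matching `index` columns to every `sm_index` row."""
    by_uid = {row["SeriesInstanceUID"]: row for row in general_rows}
    joined = []
    for sm_row in sm_rows:
        general = by_uid.get(sm_row["SeriesInstanceUID"])
        if general is not None:
            # Later keys win, so the slide-specific columns are kept as-is.
            joined.append({**general, **sm_row})
    return joined


slides = join_on_series(sm_index_rows, index_rows)

# Example selection: slides digitized with an objective lens power of at least 40.
high_power = [row for row in slides if row["ObjectiveLensPower"] >= 40]
print([(row["SeriesInstanceUID"], row["collection_id"]) for row in high_power])
```

Only series that appear in both tables survive the join, so non-microscopy series such as the CT row drop out.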
| 78 | + |
| 79 | +## `sm_instance_index` |
| 80 | + |
| 81 | +The following is the list of the columns included in `sm_instance_index`. |
| 82 | +`sm_instance_index` is instance-based, i.e., it has one row per DICOM instance |
| 83 | +(pyramid level of a slide, plus potentially thumbnail or label images), but only |
| 84 | +includes DICOM instances of the slide microscopy modality. |
| 85 | + |
| 86 | +- DICOM attributes extracted from the files: |
| 87 | + |
| 88 | + - `SOPInstanceUID`: unique identifier of the DICOM instance: one DICOM |
| 89 | + instance = one level/label/thumbnail image of the slide |
| 90 | + - `SeriesInstanceUID`: unique identifier of the DICOM series: one DICOM series |
| 91 | + = one slide |
| 92 | + - `embeddingMedium`: describes in what medium the slide was embedded before |
| 93 | + the image was obtained |
| 94 | + - `tissueFixative`: describes tissue fixatives used before the image was |
| 95 | + obtained |
| 96 | + - `staining_usingSubstance`: describes staining steps the specimen underwent |
| 97 | + before the image was obtained |
| 98 | + - `max_TotalPixelMatrixColumns`: width of the image at the maximum resolution |
| 99 | + - `max_TotalMatrixRows`: height of the image at the maximum resolution |
| 100 | + - `PixelSpacing_0`: pixel spacing in mm |
| 101 | + - `ImageType`: specifies further characteristics of the image in a list, |
| 102 | + including as the third value whether it is a VOLUME, LABEL, OVERVIEW or |
| 103 | + THUMBNAIL image. |
| 104 | + - `TransferSyntaxUID`: specifies the encoding scheme used for the image data |
| 105 | + - `instance_size`: specifies the DICOM instance's size in bytes |
| 106 | + |
| 107 | +- non-DICOM attributes assigned/curated by IDC: |
| 108 | + - `crdc_instance_uuid`: globally unique, versioned identifier of the DICOM |
| 109 | + instance |
| 110 | + |
| 111 | +For `embeddingMedium`, `tissueFixative`, and `staining_usingSubstance`, each |
| 112 | +attribute is stored in two columns: one with the suffix |
| 113 | +`_code_designator_value_str`, which contains CodingSchemeDesignator and |
| 114 | +CodeValue, and one with the suffix `_CodeMeaning`, which contains CodeMeaning. |
| 115 | +If this is new to you, a brief explanation of the three-value coding scheme in |
| 116 | +DICOM can be found at https://learn.canceridc.dev/dicom/coding-schemes. |
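Since `sm_instance_index` has one row per instance, a common task is to pick a single pyramid level per slide. The sketch below finds, for each slide, the VOLUME instance with the smallest pixel spacing and the total size of all of its instances. The rows are made up, `base_level_per_slide` and `size_per_slide` are hypothetical helpers, and representing `ImageType` as a Python list and a missing `PixelSpacing_0` as `None` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical `sm_instance_index` rows (subset of columns); values are made up.
instance_rows = [
    {"SOPInstanceUID": "1.2.3.10.1", "SeriesInstanceUID": "1.2.3.10",
     "PixelSpacing_0": 0.00025, "instance_size": 500_000_000,
     "ImageType": ["ORIGINAL", "PRIMARY", "VOLUME", "NONE"]},
    {"SOPInstanceUID": "1.2.3.10.2", "SeriesInstanceUID": "1.2.3.10",
     "PixelSpacing_0": 0.001, "instance_size": 40_000_000,
     "ImageType": ["DERIVED", "PRIMARY", "VOLUME", "RESAMPLED"]},
    {"SOPInstanceUID": "1.2.3.10.3", "SeriesInstanceUID": "1.2.3.10",
     "PixelSpacing_0": None, "instance_size": 200_000,
     "ImageType": ["ORIGINAL", "PRIMARY", "LABEL", "NONE"]},
    {"SOPInstanceUID": "1.2.3.11.1", "SeriesInstanceUID": "1.2.3.11",
     "PixelSpacing_0": 0.0005, "instance_size": 300_000_000,
     "ImageType": ["ORIGINAL", "PRIMARY", "VOLUME", "NONE"]},
    {"SOPInstanceUID": "1.2.3.11.2", "SeriesInstanceUID": "1.2.3.11",
     "PixelSpacing_0": 0.008, "instance_size": 1_000_000,
     "ImageType": ["DERIVED", "PRIMARY", "THUMBNAIL", "RESAMPLED"]},
]


def base_level_per_slide(rows):
    """Map each slide to its VOLUME instance with the smallest pixel spacing."""
    best = {}
    for row in rows:
        # The third ImageType value distinguishes VOLUME from LABEL etc.
        if row["ImageType"][2] != "VOLUME" or row["PixelSpacing_0"] is None:
            continue
        uid = row["SeriesInstanceUID"]
        if uid not in best or row["PixelSpacing_0"] < best[uid]["PixelSpacing_0"]:
            best[uid] = row
    return {uid: row["SOPInstanceUID"] for uid, row in best.items()}


def size_per_slide(rows):
    """Add up `instance_size` (bytes) over all instances of each slide."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["SeriesInstanceUID"]] += row["instance_size"]
    return dict(totals)


base_levels = base_level_per_slide(instance_rows)
slide_sizes = size_per_slide(instance_rows)
print(base_levels)
print(slide_sizes)
```

Filtering on `ImageType` first keeps label, overview and thumbnail images from being mistaken for pyramid levels.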