TiNDA

Tumor in Normal Detection Analysis

Overview

This is an R package to rescue somatic variants called as germline due to tumor DNA contamination in the patient's blood/control sample.

TiNDA uses Canopy's EM clustering function to partition the variants into clusters, then classifies each cluster as somatic or germline based on the following assumptions:

  1. The variant allele frequency (VAF) of a somatic variant is higher in the tumor sample than in the contaminated control sample.
  2. Contamination exceeding a certain threshold (max_control_af: 0.25) is difficult to separate from germline VAFs.

An area of interest (AOI) is defined in the control vs. tumor VAF 2D space. Clusters with a majority (min_clst_members: 0.85) of their members within this AOI are defined as 'somatic rescue'.
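The cluster-labelling rule above can be sketched in base R. This is a minimal illustration, not TiNDA's implementation: the AOI here is a simplified rectangle-like region (tumor VAF above control VAF, control VAF below max_control_af) standing in for the package's actual polygon, and the toy data, column names, and the "germline" fallback label are assumptions made for the example.

```r
# Illustrative only: a simplified AOI, not the polygon TiNDA actually uses.
max_control_af   <- 0.25  # threshold named in the text above
min_clst_members <- 0.85  # cluster-membership fraction named in the text above

# Hypothetical toy data: per-variant VAFs with a cluster label.
variants <- data.frame(
  cluster     = c(1, 1, 1, 1, 2, 2, 2, 2),
  control_vaf = c(0.02, 0.05, 0.10, 0.08, 0.45, 0.50, 0.52, 0.10),
  tumor_vaf   = c(0.30, 0.40, 0.35, 0.45, 0.48, 0.50, 0.47, 0.40)
)

# Assumed AOI: tumor VAF above control VAF, and control VAF below the threshold.
variants$in_aoi <- variants$tumor_vaf > variants$control_vaf &
  variants$control_vaf < max_control_af

# Fraction of each cluster's members that fall inside the AOI.
frac_in_aoi <- sapply(split(variants$in_aoi, variants$cluster), mean)

# A cluster is labelled 'somatic rescue' when enough of its members are in the AOI.
cluster_label <- ifelse(frac_in_aoi >= min_clst_members,
                        "somatic rescue", "germline")
print(cluster_label)
```

In this toy data, every member of cluster 1 lies inside the assumed AOI, so it is labelled 'somatic rescue'. Only one of four members of cluster 2 does, so it falls back to 'germline'.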

Area of Interest

In the tumor VAF vs. control VAF space, the AOIs for somatic and CHiP variants are shown in the following image. The "golden" polygon defines the somatic region and the "red" polygon defines the CHiP region; the remaining area corresponds to germline variants.

AOI

Key Features

  • Rescuing Misclassified Variants: TiNDA rescues somatic variants that are misclassified due to tumor-in-normal contamination.
  • Detecting CHiP Clusters: TiNDA identifies CHiP clusters by distinguishing germline variants from genuine somatic mutations in blood.
  • Visualization: TiNDA provides visualization tools to help users assess the quality of the clustering.

Installation

Install directly from GitHub:

devtools::install_github("nagacombio/tinda")

Usage

Workflow

The TiNDA input consists of read counts for rare and private variants, including both germline and somatic variants. These variants should be identified through the joint analysis of tumor and control samples, and they must be filtered to remove common SNPs and technical artifacts. If the dataset is still too large after filtering, consider using only exonic variants to speed up clustering and plotting.

An ideal workflow with TiNDA:

TiNDA workflow

Input data format

The input data for TiNDA is a data frame containing the following columns:

  • CHR - Chromosome name
  • POS - Variant position
  • Control_ALT_DP - Read depth of the variant's alternate allele in the control sample
  • Control_DP - Total read depth of the variant in the control sample
  • Tumor_ALT_DP - Read depth of the variant's alternate allele in the tumor sample
  • Tumor_DP - Total read depth of the variant in the tumor sample

Note: The input table must use exactly these column names.
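A quick check before running the analysis can catch misnamed columns early. The helper below is a hypothetical sketch and not part of TiNDA: the function name check_tinda_input is invented for illustration, and the rule that an alternate-allele depth should not exceed the total depth is an assumption that follows from the column definitions above.

```r
# Hypothetical helper (not part of TiNDA): basic sanity checks on an input table.
required_cols <- c("CHR", "POS", "Control_ALT_DP", "Control_DP",
                   "Tumor_ALT_DP", "Tumor_DP")

check_tinda_input <- function(df) {
  # Every required column must be present under its exact name.
  missing_cols <- setdiff(required_cols, colnames(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  # Assumption: alternate-allele depth cannot exceed total depth at a site.
  bad <- df$Control_ALT_DP > df$Control_DP | df$Tumor_ALT_DP > df$Tumor_DP
  if (any(bad)) {
    stop("ALT depth exceeds total depth in ", sum(bad), " row(s)")
  }
  invisible(TRUE)
}

# A single-row table with the expected column names passes the check.
example_df <- data.frame(
  CHR = "1", POS = 1039001,
  Control_ALT_DP = 20, Control_DP = 40,
  Tumor_ALT_DP = 23, Tumor_DP = 46
)
check_tinda_input(example_df)
```

Dropping or renaming any of the six columns makes the helper stop with a message listing what is missing.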

An example table:

CHR  POS      Control_ALT_DP  Control_DP  Tumor_ALT_DP  Tumor_DP
1    1039001  20              40          23            46
1    2123023  12              32          14            23
1    3343543  23              56          34            67
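For reference, the example table can be built as a base R data frame, and the VAFs that define the control vs. tumor space follow from the read counts. This is a sketch: it assumes the conventional definition of VAF as alternate-allele depth divided by total depth, and storing CHR as a character column is a choice made here, not a documented requirement.

```r
# The three example rows from the table above.
example_df <- data.frame(
  CHR            = c("1", "1", "1"),  # character type is an assumption
  POS            = c(1039001, 2123023, 3343543),
  Control_ALT_DP = c(20, 12, 23),
  Control_DP     = c(40, 32, 56),
  Tumor_ALT_DP   = c(23, 14, 34),
  Tumor_DP       = c(46, 23, 67)
)

# VAF = alternate-allele depth / total depth (conventional definition).
control_vaf <- example_df$Control_ALT_DP / example_df$Control_DP
tumor_vaf   <- example_df$Tumor_ALT_DP / example_df$Tumor_DP

round(control_vaf, 3)  # 0.500 0.375 0.411
round(tumor_vaf, 3)    # 0.500 0.609 0.507
```

Three variants are enough to show the format but far too few to cluster. The test data generated in the next section is a better starting point for trying the package.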

Example TiNDA analysis

# Generate data to test the package
library(TiNDA)
data(hg19_length)
test_df <- generate_test_data(hg19_length, num_variants = 500)

Run the TiNDA function

# Check the documentation for the parameters
tinda_object <- TiNDA(test_df)

Plotting the results

# Plot the results of the canopy cluster analysis
canopy_clst_plot(tinda_object)

canopy_clst_plot

# Plot the TiNDA cluster assignment
tinda_clst_plot(tinda_object)

tinda_clst_plot

# Plot the linear plot of the TiNDA results
tinda_linear_plot(tinda_object)

tinda_linear_plot

# Plot a summary of the TiNDA results: canopy clusters, TiNDA cluster assignment, and linear plots
tinda_summary_plot(tinda_object)

tinda_summary_plot