Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monocle3 as input to tradeSeq #247

Open
jiangzh-coder opened this issue Jan 5, 2024 · 3 comments
Open

Monocle3 as input to tradeSeq #247

jiangzh-coder opened this issue Jan 5, 2024 · 3 comments

Comments

@jiangzh-coder
Copy link

Hi All,

I am trying to use Monocle3 as input to tradeSeq. But i met following error:

sce <- tradeSeq::fitGAM(counts = counts,
pseudotime = pseudotime,
cellWeights = cellWeights)
#Error in .assignCells(cellWeights) :

Some cells have no positive cell weights.

Full codes were as following;
-----------https://statomics.github.io/tradeSeq/articles/Monocle.html

#########Extracting the pseudotimes and cell weights for tradeSeq
library(tradeSeq)
library(RColorBrewer)
library(SingleCellExperiment)

For reproducibility

palette(brewer.pal(8, "Dark2"))
#counts <- as.matrix(countMatrix)
#data(celltype, package = "tradeSeq")
library(magrittr)

note the colnames(counts) must match rownames(meta)

data <- GetAssayData(scRNA66, assay = 'RNA', slot = 'counts')
cell_metadata <- [email protected]
gene_annotation <- data.frame(gene_short_name = rownames(data))
rownames(gene_annotation) <- rownames(data)
cds <- new_cell_data_set(expression_data = as.matrix(data),
cell_metadata = cell_metadata,
gene_metadata = gene_annotation)

access the gene names

rownames(cds)[1:5]

access the meta data

head(colData(cds))

access the assay data

head(assay(cds)[,1:3])

preprocess_cds does both 1) normalization and 2) preliminary dimension reduction. By default 1) gene expression count for each cell is divided by the total counts of that cell, multiplied by a scale factor, and log-transformed. This is done to address differences in sequencing depths for different cells. 2) PCA is used to calculate a lower dimensional space that will be used as the input for downstream steps like UMAP visualization.

cds <- preprocess_cds(cds,
num_dim = 30)

look at the percentage of variance explained by our principal components

plot_pc_variance_explained(cds)

Note here we are using the results of PCA dimension reduction in the last step and visualizing the results in 2 dimensions.

umap.fast_sgd=FALSE and cores = 1 are needed for reproducible results

cds <- reduce_dimension(cds,
preprocess_method = 'PCA',
umap.fast_sgd=FALSE,
cores = 1)

let's take another look at our cell data set object

cds

let's examine our new reducedDimNames slot!

head(reducedDims(cds)$UMAP)

view the cell types in our UMAP plot

plot_cells(cds,
color_cells_by = "celltype",
show_trajectory_graph = FALSE,
group_label_size = 3,
cell_size = 0.5)

Grouping cells into clusters is an important step in identifying the cell types represented in your data. Monocle uses a technique called community detection to group cells into cluster and partitions.

cds <- cluster_cells(cds,
k = 20,
partition_qval = 0.05)

plot_cells(cds,
color_cells_by = "partition",
show_trajectory_graph = FALSE,
group_label_size = 3,
cell_size = 0.5)

additionally we will run learn_graph to calculate trajectories on our subset!

cds <- learn_graph(cds,
use_partition = FALSE,
close_loop = TRUE)

now that we have subset out data, re-run our monocle workflow, and calculated

trajectories, let's see where we should set our root node!

plot_cells(cds,
color_cells_by = "celltype",
cell_size = 0.5,
labels_per_group = 0)

cluster.before.traj <-plot_cells(cds,
color_cells_by = "celltype",
label_groups_by_cluster = T,
group_label_size = 5) +
theme(legend.position = "right")

cluster.before.traj

head(colData(cds))
table(colData(cds)$celltype)

a helper function to identify the root principal points:

get_earliest_principal_node <- function(cds, celltype="C2_immNK1"){
cell_ids <- which(colData(cds)[, "celltype"] == "C2_immNK1")

closest_vertex <-
cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
root_pr_nodes <-
igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names
(which.max(table(closest_vertex[cell_ids,]))))]

root_pr_nodes
}

cds = order_cells(cds, root_pr_nodes=get_earliest_principal_node(cds))

#cds <- order_cells(cds, reduction_method = "UMAP",

root_cells = colnames(cds[, clusters(cds) == 5]))

plot_cells(cds, color_cells_by = "pseudotime", label_groups_by_cluster = T,
label_branch_points = T, label_roots = F, label_leaves = F)

plot pseudo time after choosing root nodes

(defined as the bottom in the cycling progenitors cell type)

plot_cells(cds,
show_trajectory_graph = T,
color_cells_by = "pseudotime",
graph_label_size=3,
cell_size = 0.5)
plot_cells(cds,
show_trajectory_graph = T,
color_cells_by = "celltype",
graph_label_size=3,
cell_size = 0.5)
##############

how are our wild-type and mutant cells distributed in this graph?

plot_cells(cds_2,
color_cells_by = "treat",
show_trajectory_graph = F,
label_cell_groups = F,
cell_size = 0.5)

cds@colData$pseudotime=pseudotime(cds)

We find all the cells that are close to the starting point

cell_ids <- colnames(cds)[cds@colData$celltype == "C2_immNK1"]
closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
closest_vertex <- closest_vertex[cell_ids, ]
closest_vertex <- as.numeric(names(which.max(table(closest_vertex))))
mst <- principal_graph(cds)$UMAP
root_pr_nodes <- igraph::V(mst)$name[closest_vertex]

We compute the trajectory

cds <- order_cells(cds, root_pr_nodes = root_pr_nodes)
plot_cells(cds, color_cells_by = "pseudotime")

library(magrittr)

Get the closest vertice for every cell

y_to_cells <- principal_graph_aux(cds)$UMAP$pr_graph_cell_proj_closest_vertex %>%
as.data.frame()
y_to_cells$cells <- rownames(y_to_cells)
y_to_cells$Y <- y_to_cells$V1

Get the root vertices

It is the same node as above

root <- cds@principal_graph_aux$UMAP$root_pr_nodes

Get the other endpoints

endpoints <- names(which(igraph::degree(mst) == 1))
endpoints <- endpoints[!endpoints %in% root]

For each endpoint

cellWeights <- lapply(endpoints, function(endpoint) {

We find the path between the endpoint and the root

path <- igraph::shortest_paths(mst, root, endpoint)$vpath[[1]]
path <- as.character(path)

We find the cells that map along that path

df <- y_to_cells[y_to_cells$Y %in% path, ]
df <- data.frame(weights = as.numeric(colnames(cds) %in% df$cells))
colnames(df) <- endpoint
return(df)
}) %>% do.call(what = 'cbind', args = .) %>%
as.matrix()
rownames(cellWeights) <- colnames(cds)
pseudotime <- matrix(pseudotime(cds), ncol = ncol(cellWeights),
nrow = ncol(cds), byrow = FALSE)

counts <- as.matrix(data)
counts <- as.matrix(Biobase::exprs(cds))
sce <- tradeSeq::fitGAM(counts = counts,
pseudotime = pseudotime,
cellWeights = cellWeights)
#Error in .assignCells(cellWeights) :

Some cells have no positive cell weights.

i will be greatly appreciated if you can help me!

best
jiang

@koenvandenberge
Copy link
Member

Hi @jiangzh-coder

Could you please check how cellWeights looks like? The error message is suggesting that some cells have either all zero or negative weights. This does not allow us to assign cells to lineages.
Perhaps check how often this occurs, and where these cells may reside within your trajectory can be helpful.

@mahwish2
Copy link

Hi, sorry, If I should not enter this thread, but I am having same error following the mentioned vignette. I sure have zero weights in my dataset frequently. I am wondering how to sort his out. My data looks like this
Y_1 Y_2 Y_71 Y_94 Y_96
Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
Median :0.00000 Median :0.0000 Median :0.0000 Median :0.00000 Median :0.0000
Mean :0.06106 Mean :0.2091 Mean :0.1103 Mean :0.04645 Mean :0.2709
3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:1.0000
Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.0000
Y_97 Y_112 Y_117 Y_193 Y_267
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
Mean :0.1771 Mean :0.1873 Mean :0.2127 Mean :0.1709 Mean :0.1776
3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000

this is summary of cell weights

@koenvandenberge
Copy link
Member

Hi @mahwish2

It is fine to have weights of zero, but the sum of the weights across all lineages should not be zero. In that case, we cannot assign a cell to any lineage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants