Monocle3 as input to tradeSeq #247

jiangzh-coder · 2024-01-05T20:33:46Z

Hi All,

I am trying to use Monocle3 as input to tradeSeq. But i met following error:

sce <- tradeSeq::fitGAM(counts = counts,
pseudotime = pseudotime,
cellWeights = cellWeights)
#Error in .assignCells(cellWeights) :

Some cells have no positive cell weights.

Full codes were as following；
-----------https://statomics.github.io/tradeSeq/articles/Monocle.html

#########Extracting the pseudotimes and cell weights for tradeSeq
library(tradeSeq)
library(RColorBrewer)
library(SingleCellExperiment)

For reproducibility

palette(brewer.pal(8, "Dark2"))
#counts <- as.matrix(countMatrix)
#data(celltype, package = "tradeSeq")
library(magrittr)

note the colnames(counts) must match rownames(meta)

data <- GetAssayData(scRNA66, assay = 'RNA', slot = 'counts')
cell_metadata <- [email protected]
gene_annotation <- data.frame(gene_short_name = rownames(data))
rownames(gene_annotation) <- rownames(data)
cds <- new_cell_data_set(expression_data = as.matrix(data),
cell_metadata = cell_metadata,
gene_metadata = gene_annotation)

access the gene names

rownames(cds)[1:5]

access the meta data

head(colData(cds))

access the assay data

head(assay(cds)[,1:3])

preprocess_cds does both 1) normalization and 2) preliminary dimension reduction. By default 1) gene expression count for each cell is divided by the total counts of that cell, multiplied by a scale factor, and log-transformed. This is done to address differences in sequencing depths for different cells. 2) PCA is used to calculate a lower dimensional space that will be used as the input for downstream steps like UMAP visualization.

cds <- preprocess_cds(cds,
num_dim = 30)

look at the percentage of variance explained by our principal components

plot_pc_variance_explained(cds)

Note here we are using the results of PCA dimension reduction in the last step and visualizing the results in 2 dimensions.

umap.fast_sgd=FALSE and cores = 1 are needed for reproducible results

cds <- reduce_dimension(cds,
preprocess_method = 'PCA',
umap.fast_sgd=FALSE,
cores = 1)

let's take another look at our cell data set object

cds

let's examine our new `reducedDimNames` slot!

head(reducedDims(cds)$UMAP)

view the cell types in our UMAP plot

plot_cells(cds,
color_cells_by = "celltype",
show_trajectory_graph = FALSE,
group_label_size = 3,
cell_size = 0.5)

Grouping cells into clusters is an important step in identifying the cell types represented in your data. Monocle uses a technique called community detection to group cells into cluster and partitions.

cds <- cluster_cells(cds,
k = 20,
partition_qval = 0.05)

plot_cells(cds,
color_cells_by = "partition",
show_trajectory_graph = FALSE,
group_label_size = 3,
cell_size = 0.5)

additionally we will run learn_graph to calculate trajectories on our subset!

cds <- learn_graph(cds,
use_partition = FALSE,
close_loop = TRUE)

now that we have subset out data, re-run our monocle workflow, and calculated

trajectories, let's see where we should set our root node!

plot_cells(cds,
color_cells_by = "celltype",
cell_size = 0.5,
labels_per_group = 0)

cluster.before.traj <-plot_cells(cds,
color_cells_by = "celltype",
label_groups_by_cluster = T,
group_label_size = 5) +
theme(legend.position = "right")

cluster.before.traj

head(colData(cds))
table(colData(cds)$celltype)

a helper function to identify the root principal points:

get_earliest_principal_node <- function(cds, celltype="C2_immNK1"){
cell_ids <- which(colData(cds)[, "celltype"] == "C2_immNK1")

closest_vertex <-
cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
root_pr_nodes <-
igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names
(which.max(table(closest_vertex[cell_ids,]))))]

root_pr_nodes
}

cds = order_cells(cds, root_pr_nodes=get_earliest_principal_node(cds))

#cds <- order_cells(cds, reduction_method = "UMAP",

root_cells = colnames(cds[, clusters(cds) == 5]))

plot_cells(cds, color_cells_by = "pseudotime", label_groups_by_cluster = T,
label_branch_points = T, label_roots = F, label_leaves = F)

plot pseudo time after choosing root nodes

(defined as the bottom in the cycling progenitors cell type)

plot_cells(cds,
show_trajectory_graph = T,
color_cells_by = "pseudotime",
graph_label_size=3,
cell_size = 0.5)
plot_cells(cds,
show_trajectory_graph = T,
color_cells_by = "celltype",
graph_label_size=3,
cell_size = 0.5)
##############

how are our wild-type and mutant cells distributed in this graph?

plot_cells(cds_2,
color_cells_by = "treat",
show_trajectory_graph = F,
label_cell_groups = F,
cell_size = 0.5)

cds@colData$pseudotime=pseudotime(cds)

We find all the cells that are close to the starting point

cell_ids <- colnames(cds)[cds@colData$celltype == "C2_immNK1"]
closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
closest_vertex <- closest_vertex[cell_ids, ]
closest_vertex <- as.numeric(names(which.max(table(closest_vertex))))
mst <- principal_graph(cds)$UMAP
root_pr_nodes <- igraph::V(mst)$name[closest_vertex]

We compute the trajectory

cds <- order_cells(cds, root_pr_nodes = root_pr_nodes)
plot_cells(cds, color_cells_by = "pseudotime")

library(magrittr)

Get the closest vertice for every cell

y_to_cells <- principal_graph_aux(cds)$UMAP$pr_graph_cell_proj_closest_vertex %>%
as.data.frame()
y_to_cells$cells <- rownames(y_to_cells)
y_to_cells$Y <- y_to_cells$V1

Get the root vertices

It is the same node as above

root <- cds@principal_graph_aux$UMAP$root_pr_nodes

Get the other endpoints

endpoints <- names(which(igraph::degree(mst) == 1))
endpoints <- endpoints[!endpoints %in% root]

For each endpoint

cellWeights <- lapply(endpoints, function(endpoint) {

We find the path between the endpoint and the root

path <- igraph::shortest_paths(mst, root, endpoint)$vpath[[1]]
path <- as.character(path)

We find the cells that map along that path

df <- y_to_cells[y_to_cells$Y %in% path, ]
df <- data.frame(weights = as.numeric(colnames(cds) %in% df$cells))
colnames(df) <- endpoint
return(df)
}) %>% do.call(what = 'cbind', args = .) %>%
as.matrix()
rownames(cellWeights) <- colnames(cds)
pseudotime <- matrix(pseudotime(cds), ncol = ncol(cellWeights),
nrow = ncol(cds), byrow = FALSE)

counts <- as.matrix(data)
counts <- as.matrix(Biobase::exprs(cds))
sce <- tradeSeq::fitGAM(counts = counts,
pseudotime = pseudotime,
cellWeights = cellWeights)
#Error in .assignCells(cellWeights) :

Some cells have no positive cell weights.

i will be greatly appreciated if you can help me!

best
jiang

The text was updated successfully, but these errors were encountered:

koenvandenberge · 2024-01-19T09:05:15Z

Hi @jiangzh-coder

Could you please check how cellWeights looks like? The error message is suggesting that some cells have either all zero or negative weights. This does not allow us to assign cells to lineages.
Perhaps check how often this occurs, and where these cells may reside within your trajectory can be helpful.

mahwish2 · 2024-04-10T13:06:20Z

Hi, sorry, If I should not enter this thread, but I am having same error following the mentioned vignette. I sure have zero weights in my dataset frequently. I am wondering how to sort his out. My data looks like this
Y_1 Y_2 Y_71 Y_94 Y_96
Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
Median :0.00000 Median :0.0000 Median :0.0000 Median :0.00000 Median :0.0000
Mean :0.06106 Mean :0.2091 Mean :0.1103 Mean :0.04645 Mean :0.2709
3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:1.0000
Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.0000
Y_97 Y_112 Y_117 Y_193 Y_267
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
Mean :0.1771 Mean :0.1873 Mean :0.2127 Mean :0.1709 Mean :0.1776
3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000

this is summary of cell weights

koenvandenberge · 2024-06-21T13:04:24Z

Hi @mahwish2

It is fine to have weights of zero, but the sum of the weights across all lineages should not be zero. In that case, we cannot assign a cell to any lineage.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Monocle3 as input to tradeSeq #247

Monocle3 as input to tradeSeq #247

jiangzh-coder commented Jan 5, 2024

koenvandenberge commented Jan 19, 2024

mahwish2 commented Apr 10, 2024

koenvandenberge commented Jun 21, 2024

Monocle3 as input to tradeSeq #247

Monocle3 as input to tradeSeq #247

Comments

jiangzh-coder commented Jan 5, 2024

Some cells have no positive cell weights.

For reproducibility

note the colnames(counts) must match rownames(meta)

access the gene names

access the meta data

access the assay data

look at the percentage of variance explained by our principal components

Note here we are using the results of PCA dimension reduction in the last step and visualizing the results in 2 dimensions.

umap.fast_sgd=FALSE and cores = 1 are needed for reproducible results

let's take another look at our cell data set object

let's examine our new reducedDimNames slot!

view the cell types in our UMAP plot

Grouping cells into clusters is an important step in identifying the cell types represented in your data. Monocle uses a technique called community detection to group cells into cluster and partitions.

additionally we will run learn_graph to calculate trajectories on our subset!

now that we have subset out data, re-run our monocle workflow, and calculated

trajectories, let's see where we should set our root node!

a helper function to identify the root principal points:

root_cells = colnames(cds[, clusters(cds) == 5]))

plot pseudo time after choosing root nodes

(defined as the bottom in the cycling progenitors cell type)

how are our wild-type and mutant cells distributed in this graph?

We find all the cells that are close to the starting point

We compute the trajectory

Get the closest vertice for every cell

Get the root vertices

It is the same node as above

Get the other endpoints

For each endpoint

We find the path between the endpoint and the root

We find the cells that map along that path

Some cells have no positive cell weights.

koenvandenberge commented Jan 19, 2024

mahwish2 commented Apr 10, 2024

koenvandenberge commented Jun 21, 2024

let's examine our new `reducedDimNames` slot!