Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discrepancy between plotSmoothers and predictSmooth #232

Open
nickhir opened this issue Apr 5, 2023 · 5 comments
Open

Discrepancy between plotSmoothers and predictSmooth #232

nickhir opened this issue Apr 5, 2023 · 5 comments

Comments

@nickhir
Copy link

nickhir commented Apr 5, 2023

Hi,

I am using tradeSeq to identify TF which are differentially expressed along a lineage of interested. This revealed several significant hits. However, when visualizing the results for a specific gene, I noticed something odd:
When I use the plotSmoothers function for the gene dysf it looks like the lineage4_notch is higher expressed throughout the whole lineage (see image below). This is also supported when I simply look at the expression on the PCA which I used as the input for slingshot.

plotSmoothers(gam_fit,  counts(slingshot_sce), gene="dysf")

image

However, if I use predictSmooth to get expression for dysf along lineage4, it looks like at some points dysf is more strongly expressed in the control condition which confused me.

expression_prediction <- predictSmooth(gam_fit_7, gene = c("klu","dysf"), nPoints = 1500, tidy = FALSE)

# we have 4 lineages. We are only interested in lineage 4 so filter select accordingly:
expression_prediction_subset <- expression_prediction[,grepl("lineage4_",colnames(expression_prediction))]

length(expression_prediction_subset) # -> 6000, because we predict 1500 points for both conditions and genes.
column_names <-  c(rep("Control",1500),
                   rep("notch", 1500))
ha = HeatmapAnnotation(perturbation = 
                         anno_block(gp = gpar(fill = c("green",
                                                       "orange")),
                                    labels = c("Control", "notch"),
                                    labels_gp = gpar(col = "black", 
                                                     fontsize = 12,
                                                     fontface="bold")))

rownames(expression_prediction_subset_scaled) <- rownames(expression_prediction_subset)
heatmap <- Heatmap(expression_prediction_subset,
                   cluster_rows = T, 
                   cluster_columns = F, 
                   show_row_names = T, 
                   show_column_names = F,
                   show_row_dend = F,
                   name="Z-score",
                   border = "black",
                   top_annotation = ha,
                   row_names_gp = grid::gpar(fontsize = 8),
                   column_split = column_names,
                   use_raster = F,
                   column_title_gp = gpar(fontsize = 7, fontface="bold"),
                   column_title_rot = 0,
                   show_heatmap_legend = T)

image

Any idea why this might happen? For all other genes I have checked, predictSmooth and plotSmoothers agreed perfectly.

Cheers!

@koenvandenberge
Copy link
Member

Hi @nickhir

Thank you for reporting this.
My first thought is that in the plotSmoothers plot, the smoother is showing us the estimated average expression.
The heatmap, instead, looks at some scaled version of this smoother. Possibly the scaling is causing the differences you are seeing?

Though this is probably too simplistic. It would help to see the numbers of predictSmooth as well as the predictions of plotSmoothers, and e.g. plotting these against each other when using the same number of grid points (nPoints argument).

If this does not enlighten us, I may be able to take a look at this. In that case, it would be helpful if you could share the SingleCellExperiment object obtained after running fitGAM, preferably with a recent tradeSeq version.

@koenvandenberge
Copy link
Member

Hi @nickhir

I have just fixed a bug in predictSmooth, and I wonder if that would solve your issue as well. See #237

@Alexis-Varin
Copy link

Alexis-Varin commented Jul 11, 2024

Hi @koenvandenberge , I just got the same problem, one of my key gene of interest is highly overexpressed in my diseased condition, I expect predictSmooth to reflect that, however the smoothed values (before log1p(), scales::rescale() etc, just straight output from predictSmooth) are higher in the ctrl (max yhat is 0.45) than diseased (max yhat 0.29) with nPoints = 100, and the heatmap is therefore completely wrong, showing higher in ctrl. plotSmoothers shows however a very clear overexpression in diseased condition, consistent with actual biological effect expected.

Upon looking at your other issues, I saw that #230 also had some problems but unrelated to ours as it was with his custom code. Upon looking at his notebook, I noticed that the main difference between me and @nickhir and @flde is that we used tidy = FALSE, while he used tidy = TRUE. So I tested with tidy = TRUE and the discrepencies vanished, as now the output of predictSmooth shows a way higher yhat in diseased vs ctrl (max 1.64 in ctrl and max 6.30 in diseased).

So while now I am now able to plot my heatmap fine, I think there is still a big issue if the yhat values straight out of predictSmooth are so widely different whether we use tidy = FALSE or TRUE. Here are the values :

From tidy = FALSE for my gene of interest (row of matrix), 1 lineage 2 conditions 100 nPoints, rounded to 3 digits, straight from predictSmooth :

[1] 0.107 0.085 0.068 0.054 0.043 0.035 0.028 0.022 0.018 0.015 0.012 0.010 0.009 0.007 0.006 0.005
[17] 0.005 0.004 0.004 0.004 0.003 0.003 0.003 0.003 0.003 0.003 0.003 0.003 0.004 0.004 0.004 0.005
[33] 0.006 0.006 0.007 0.009 0.010 0.012 0.015 0.018 0.021 0.025 0.030 0.036 0.043 0.051 0.060 0.071
[49] 0.083 0.096 0.111 0.127 0.145 0.164 0.184 0.206 0.228 0.252 0.276 0.299 0.323 0.346 0.368 0.388
[65] 0.406 0.422 0.434 0.444 0.449 0.451 0.449 0.444 0.434 0.421 0.404 0.385 0.363 0.339 0.314 0.287
[81] 0.261 0.234 0.208 0.183 0.159 0.138 0.118 0.100 0.084 0.071 0.059 0.049 0.040 0.033 0.027 0.022
[97] 0.018 0.015 0.012 0.010

[101] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[113] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[129] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.001 0.001 0.001 0.001 0.001 0.002
[145] 0.002 0.003 0.003 0.004 0.005 0.006 0.007 0.008 0.010 0.012 0.015 0.019 0.023 0.027 0.033 0.040
[161] 0.048 0.058 0.069 0.081 0.095 0.110 0.127 0.146 0.165 0.185 0.205 0.225 0.243 0.260 0.274 0.285
[177] 0.292 0.295 0.293 0.286 0.274 0.258 0.239 0.217 0.193 0.168 0.144 0.120 0.098 0.078 0.061 0.046
[193] 0.034 0.025 0.017 0.012 0.008 0.005 0.003 0.002

From tidy = TRUE for my gene of interest (column yhat of data frame), 1 lineage 2 conditions 100 nPoints, rounded to 3 digits, straight from predictSmooth :

[1] 0.002 0.002 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001
[17] 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001
[33] 0.002 0.002 0.002 0.002 0.002 0.002 0.003 0.003 0.003 0.004 0.004 0.005 0.005 0.006 0.007 0.008
[49] 0.009 0.011 0.013 0.015 0.018 0.021 0.025 0.029 0.035 0.042 0.050 0.060 0.072 0.087 0.104 0.124
[65] 0.149 0.178 0.211 0.251 0.296 0.349 0.408 0.475 0.550 0.632 0.721 0.816 0.917 1.021 1.125 1.229
[81] 1.327 1.418 1.498 1.562 1.609 1.636 1.640 1.620 1.577 1.512 1.426 1.323 1.206 1.081 0.950 0.820
[97] 0.695 0.576 0.469 0.373

[101] 0.034 0.032 0.031 0.029 0.028 0.027 0.025 0.024 0.023 0.022 0.021 0.021
[113] 0.020 0.019 0.018 0.018 0.017 0.017 0.016 0.016 0.016 0.015 0.015 0.015 0.015 0.015 0.015 0.015
[129] 0.015 0.015 0.015 0.015 0.015 0.016 0.016 0.016 0.017 0.018 0.018 0.019 0.020 0.022 0.023 0.025
[145] 0.027 0.029 0.032 0.035 0.039 0.044 0.049 0.055 0.063 0.073 0.084 0.098 0.116 0.137 0.163 0.195
[161] 0.235 0.283 0.342 0.413 0.500 0.605 0.731 0.882 1.059 1.268 1.510 1.788 2.102 2.454 2.840 3.256
[177] 3.695 4.146 4.597 5.031 5.430 5.776 6.048 6.229 6.304 6.265 6.108 5.836 5.460 4.998 4.472 3.908
[193] 3.332 2.770 2.243 1.767 1.353 1.007 0.727 0.509

Exact same parameters for predictSmooth between both, with the exception of tidy :
yhatSmooth <- predictSmooth(deg.all, gene = features3E, nPoints = 100, tidy = F)
tradeSeq version is 1.16.0.

@nickhir
Copy link
Author

nickhir commented Jul 26, 2024

This is a really interesting find @Alexis-Varin! Do you have an idea which values "correct"/more appropriate? The ones with tidy=FALSE or with tidy=TRUE? I assume you are now using tidy=TRUE?

@Alexis-Varin
Copy link

Hi @nickhir, The correct one is tidy = TRUE, I used @flde code to generate the same figure from plotSmoothers, it only is identical if tidy=TRUE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants