Discrepancy between `plotSmoothers` and `predictSmooth` #232

nickhir · 2023-04-05T14:00:25Z

Hi,

I am using tradeSeq to identify TF which are differentially expressed along a lineage of interested. This revealed several significant hits. However, when visualizing the results for a specific gene, I noticed something odd:
When I use the plotSmoothers function for the gene dysf it looks like the lineage4_notch is higher expressed throughout the whole lineage (see image below). This is also supported when I simply look at the expression on the PCA which I used as the input for slingshot.

plotSmoothers(gam_fit,  counts(slingshot_sce), gene="dysf")

However, if I use predictSmooth to get expression for dysf along lineage4, it looks like at some points dysf is more strongly expressed in the control condition which confused me.

expression_prediction <- predictSmooth(gam_fit_7, gene = c("klu","dysf"), nPoints = 1500, tidy = FALSE)

# we have 4 lineages. We are only interested in lineage 4 so filter select accordingly:
expression_prediction_subset <- expression_prediction[,grepl("lineage4_",colnames(expression_prediction))]

length(expression_prediction_subset) # -> 6000, because we predict 1500 points for both conditions and genes.
column_names <-  c(rep("Control",1500),
                   rep("notch", 1500))
ha = HeatmapAnnotation(perturbation = 
                         anno_block(gp = gpar(fill = c("green",
                                                       "orange")),
                                    labels = c("Control", "notch"),
                                    labels_gp = gpar(col = "black", 
                                                     fontsize = 12,
                                                     fontface="bold")))

rownames(expression_prediction_subset_scaled) <- rownames(expression_prediction_subset)
heatmap <- Heatmap(expression_prediction_subset,
                   cluster_rows = T, 
                   cluster_columns = F, 
                   show_row_names = T, 
                   show_column_names = F,
                   show_row_dend = F,
                   name="Z-score",
                   border = "black",
                   top_annotation = ha,
                   row_names_gp = grid::gpar(fontsize = 8),
                   column_split = column_names,
                   use_raster = F,
                   column_title_gp = gpar(fontsize = 7, fontface="bold"),
                   column_title_rot = 0,
                   show_heatmap_legend = T)

Any idea why this might happen? For all other genes I have checked, predictSmooth and plotSmoothers agreed perfectly.

Cheers!

The text was updated successfully, but these errors were encountered:

koenvandenberge · 2023-07-04T11:38:54Z

Hi @nickhir

Thank you for reporting this.
My first thought is that in the plotSmoothers plot, the smoother is showing us the estimated average expression.
The heatmap, instead, looks at some scaled version of this smoother. Possibly the scaling is causing the differences you are seeing?

Though this is probably too simplistic. It would help to see the numbers of predictSmooth as well as the predictions of plotSmoothers, and e.g. plotting these against each other when using the same number of grid points (nPoints argument).

If this does not enlighten us, I may be able to take a look at this. In that case, it would be helpful if you could share the SingleCellExperiment object obtained after running fitGAM, preferably with a recent tradeSeq version.

koenvandenberge · 2023-09-22T11:00:26Z

Hi @nickhir

I have just fixed a bug in predictSmooth, and I wonder if that would solve your issue as well. See #237

Alexis-Varin · 2024-07-11T20:55:07Z

Hi @koenvandenberge , I just got the same problem, one of my key gene of interest is highly overexpressed in my diseased condition, I expect predictSmooth to reflect that, however the smoothed values (before log1p(), scales::rescale() etc, just straight output from predictSmooth) are higher in the ctrl (max yhat is 0.45) than diseased (max yhat 0.29) with nPoints = 100, and the heatmap is therefore completely wrong, showing higher in ctrl. plotSmoothers shows however a very clear overexpression in diseased condition, consistent with actual biological effect expected.

Upon looking at your other issues, I saw that #230 also had some problems but unrelated to ours as it was with his custom code. Upon looking at his notebook, I noticed that the main difference between me and @nickhir and @flde is that we used tidy = FALSE, while he used tidy = TRUE. So I tested with tidy = TRUE and the discrepencies vanished, as now the output of predictSmooth shows a way higher yhat in diseased vs ctrl (max 1.64 in ctrl and max 6.30 in diseased).

So while now I am now able to plot my heatmap fine, I think there is still a big issue if the yhat values straight out of predictSmooth are so widely different whether we use tidy = FALSE or TRUE. Here are the values :

From tidy = FALSE for my gene of interest (row of matrix), 1 lineage 2 conditions 100 nPoints, rounded to 3 digits, straight from predictSmooth :

[1] 0.107 0.085 0.068 0.054 0.043 0.035 0.028 0.022 0.018 0.015 0.012 0.010 0.009 0.007 0.006 0.005
[17] 0.005 0.004 0.004 0.004 0.003 0.003 0.003 0.003 0.003 0.003 0.003 0.003 0.004 0.004 0.004 0.005
[33] 0.006 0.006 0.007 0.009 0.010 0.012 0.015 0.018 0.021 0.025 0.030 0.036 0.043 0.051 0.060 0.071
[49] 0.083 0.096 0.111 0.127 0.145 0.164 0.184 0.206 0.228 0.252 0.276 0.299 0.323 0.346 0.368 0.388
[65] 0.406 0.422 0.434 0.444 0.449 0.451 0.449 0.444 0.434 0.421 0.404 0.385 0.363 0.339 0.314 0.287
[81] 0.261 0.234 0.208 0.183 0.159 0.138 0.118 0.100 0.084 0.071 0.059 0.049 0.040 0.033 0.027 0.022
[97] 0.018 0.015 0.012 0.010

[101] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[113] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
[129] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.001 0.001 0.001 0.001 0.001 0.002
[145] 0.002 0.003 0.003 0.004 0.005 0.006 0.007 0.008 0.010 0.012 0.015 0.019 0.023 0.027 0.033 0.040
[161] 0.048 0.058 0.069 0.081 0.095 0.110 0.127 0.146 0.165 0.185 0.205 0.225 0.243 0.260 0.274 0.285
[177] 0.292 0.295 0.293 0.286 0.274 0.258 0.239 0.217 0.193 0.168 0.144 0.120 0.098 0.078 0.061 0.046
[193] 0.034 0.025 0.017 0.012 0.008 0.005 0.003 0.002

From tidy = TRUE for my gene of interest (column yhat of data frame), 1 lineage 2 conditions 100 nPoints, rounded to 3 digits, straight from predictSmooth :

[1] 0.002 0.002 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001
[17] 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001
[33] 0.002 0.002 0.002 0.002 0.002 0.002 0.003 0.003 0.003 0.004 0.004 0.005 0.005 0.006 0.007 0.008
[49] 0.009 0.011 0.013 0.015 0.018 0.021 0.025 0.029 0.035 0.042 0.050 0.060 0.072 0.087 0.104 0.124
[65] 0.149 0.178 0.211 0.251 0.296 0.349 0.408 0.475 0.550 0.632 0.721 0.816 0.917 1.021 1.125 1.229
[81] 1.327 1.418 1.498 1.562 1.609 1.636 1.640 1.620 1.577 1.512 1.426 1.323 1.206 1.081 0.950 0.820
[97] 0.695 0.576 0.469 0.373

[101] 0.034 0.032 0.031 0.029 0.028 0.027 0.025 0.024 0.023 0.022 0.021 0.021
[113] 0.020 0.019 0.018 0.018 0.017 0.017 0.016 0.016 0.016 0.015 0.015 0.015 0.015 0.015 0.015 0.015
[129] 0.015 0.015 0.015 0.015 0.015 0.016 0.016 0.016 0.017 0.018 0.018 0.019 0.020 0.022 0.023 0.025
[145] 0.027 0.029 0.032 0.035 0.039 0.044 0.049 0.055 0.063 0.073 0.084 0.098 0.116 0.137 0.163 0.195
[161] 0.235 0.283 0.342 0.413 0.500 0.605 0.731 0.882 1.059 1.268 1.510 1.788 2.102 2.454 2.840 3.256
[177] 3.695 4.146 4.597 5.031 5.430 5.776 6.048 6.229 6.304 6.265 6.108 5.836 5.460 4.998 4.472 3.908
[193] 3.332 2.770 2.243 1.767 1.353 1.007 0.727 0.509

Exact same parameters for predictSmooth between both, with the exception of tidy :
yhatSmooth <- predictSmooth(deg.all, gene = features3E, nPoints = 100, tidy = F)
tradeSeq version is 1.16.0.

nickhir · 2024-07-26T10:21:17Z

This is a really interesting find @Alexis-Varin! Do you have an idea which values "correct"/more appropriate? The ones with tidy=FALSE or with tidy=TRUE? I assume you are now using tidy=TRUE?

Alexis-Varin · 2024-07-29T08:40:39Z

Hi @nickhir, The correct one is tidy = TRUE, I used @flde code to generate the same figure from plotSmoothers, it only is identical if tidy=TRUE.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discrepancy between `plotSmoothers` and `predictSmooth` #232

Discrepancy between `plotSmoothers` and `predictSmooth` #232

nickhir commented Apr 5, 2023

koenvandenberge commented Jul 4, 2023

koenvandenberge commented Sep 22, 2023

Alexis-Varin commented Jul 11, 2024 •

edited

Loading

nickhir commented Jul 26, 2024

Alexis-Varin commented Jul 29, 2024

Discrepancy between plotSmoothers and predictSmooth #232

Discrepancy between plotSmoothers and predictSmooth #232

Comments

nickhir commented Apr 5, 2023

koenvandenberge commented Jul 4, 2023

koenvandenberge commented Sep 22, 2023

Alexis-Varin commented Jul 11, 2024 • edited Loading

nickhir commented Jul 26, 2024

Alexis-Varin commented Jul 29, 2024

Discrepancy between `plotSmoothers` and `predictSmooth` #232

Discrepancy between `plotSmoothers` and `predictSmooth` #232

Alexis-Varin commented Jul 11, 2024 •

edited

Loading