Inflation in peptide IDs after group specific FDR #522

mkirankumar45 · 2025-02-10T01:22:45Z

Hello FragPipe team,

I am running a non-specific HLA search against a large database to identify peptides derived from non-canonical genomic regions, with the group-specific FDR option enabled. For the same sample, the large-database search identifies more peptides than a search against the standard canonical human proteome database (33,709 vs. 21,924), and all of the identified peptides come from canonical proteins.
I would expect that, if group FDR is functioning correctly, both searches should yield a similar number of peptides from canonical protein sequences. Could you confirm whether this assumption is correct? If so, what might be causing the discrepancy?

log_2025-01-19_15-06-25_large_database.txt
log_2025-02-09_18-23-01_Standard_database.txt

anesvi · 2025-02-10T02:44:44Z

Please first use one of our prebuilt workflows. It looks like you made some changes: I see you have --prot 1 (i.e., no protein FDR) and you changed to the 2D filter. These two settings are incompatible.

We have some tutorials on our FragPipe page; please follow this one without making any changes:
https://fragpipe.nesvilab.org/docs/tutorial_group_fdr.html

mkirankumar45 · 2025-02-10T03:57:31Z

Dear Alexey,

Thank you for your response. I am using the default nonspecific HLA workflow and followed the tutorial to set up group FDR. Does this mean that group FDR is not compatible with the nonspecific HLA workflow?

anesvi · 2025-02-10T04:07:13Z

It is compatible, but as I said, something is not right. I see you made at least some changes to the workflows. Please provide more details on how you annotated the sequences in the database with PE numbers, which workflow you loaded, and what other changes you made.

mkirankumar45 · 2025-02-10T23:30:05Z

I loaded the default nonspecific HLA workflow and removed the nQ, NE and Cysteinylation modifications for both searches. For the large database search, the split database option was increased to 100 and the group FDR option was enabled as described in the tutorial.

The database was annotated with PE=1 for canonical sequences and PE=2 for non-canonical sequences using a Python script. We used FragPipe headless mode on Linux for the large database search.
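
A minimal sketch of this kind of PE annotation, not the exact script used here. It assumes UniProt-style FASTA headers, and the rule that treats `>sp|` entries as canonical is hypothetical; `set_pe` and `annotate_fasta` are illustrative names.

```python
import re

# Matches an existing "PE=<digits>" field in a FASTA header.
PE_PATTERN = re.compile(r"\bPE=\d+")


def set_pe(header: str, pe: int) -> str:
    """Return the header with its PE field set to `pe`.

    An existing PE field is overwritten; otherwise one is appended.
    """
    tag = f"PE={pe}"
    if PE_PATTERN.search(header):
        return PE_PATTERN.sub(tag, header, count=1)
    return f"{header.rstrip()} {tag}"


def annotate_fasta(lines, is_canonical):
    """Yield FASTA lines with PE=1 (canonical) or PE=2 (non-canonical) headers.

    `is_canonical` is a caller-supplied predicate on the header line;
    sequence lines pass through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield set_pe(line, 1 if is_canonical(line) else 2)
        else:
            yield line


if __name__ == "__main__":
    fasta = [
        ">sp|P00001|EXAMPLE_HUMAN Example protein OS=Homo sapiens PE=3 SV=1",
        "MKTAYIAK",
        ">nc|ORF_0001 hypothetical non-canonical ORF",
        "MSLLPR",
    ]
    # Hypothetical rule: only Swiss-Prot style entries count as canonical.
    for out in annotate_fasta(fasta, lambda h: h.startswith(">sp|")):
        print(out)
```

Checking a few annotated headers from each group by eye before searching is a cheap way to confirm that the grouping matches what was intended.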

I don't recollect any other changes made to the workflow. I am attaching the workflows for your review.

Thank you.

fragpipe_large_database.txt
fragpipe_standard_database.txt

mkirankumar45 · 2025-02-18T04:02:56Z

Dear Alexey and the FragPipe team,

Just following up to see if you have any suggestions regarding the workflow and the reason behind the increase in the number of peptide identifications after applying group-specific FDR.

Thank you.

anesvi · 2025-02-18T04:08:14Z

Sorry, this is hard for me to debug via GitHub. Perhaps you can send more details by email.

mkirankumar45 · 2025-03-03T05:31:34Z

Thank you, Alexey. Sure, I will send you an email with more details.
