Inflation in peptide IDs after group specific FDR #522

Open
mkirankumar45 opened this issue Feb 10, 2025 · 7 comments

Comments

@mkirankumar45

Hello FragPipe team,

I am running a nonspecific HLA search against a large database, with the group-specific FDR option enabled, to identify peptides derived from non-canonical genomic regions. For the same sample, this search identifies more peptides than a search against the standard canonical human proteome database (33,709 vs. 21,924), and all of the identified peptides come from canonical proteins.
I would expect that, if group FDR is functioning correctly, both searches should yield a similar number of peptides from canonical protein sequences. Could you confirm whether this assumption is correct? If so, what might be causing the discrepancy?
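
To make that expectation concrete, here is a minimal sketch of group-wise target-decoy filtering. It is not FragPipe's or Philosopher's implementation: the tuple layout, the function name `passing_targets_by_group`, and the plain decoys/targets estimate are all assumptions chosen for illustration.

```python
from collections import defaultdict


def passing_targets_by_group(psms, fdr=0.01):
    """Count target PSMs accepted per group at the given FDR.

    psms: iterable of (score, is_decoy, group); a higher score is better.
    Each group is thresholded on its own targets and decoys only.
    """
    by_group = defaultdict(list)
    for score, is_decoy, group in psms:
        by_group[group].append((score, is_decoy))

    accepted = {}
    for group, items in by_group.items():
        items.sort(key=lambda item: item[0], reverse=True)
        targets = decoys = best = 0
        for _score, is_decoy in items:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
            # Estimated FDR at this threshold: decoys / targets.
            if targets and decoys / targets <= fdr:
                best = targets
        accepted[group] = best
    return accepted
```

In this toy model, adding PE=2 entries does not change the PE=1 count as long as the PE=1 scores themselves are unchanged. In a real search a larger database can change which candidate wins each spectrum, so the canonical counts would be expected to be similar rather than identical.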

log_2025-01-19_15-06-25_large_database.txt
log_2025-02-09_18-23-01_Standard_database.txt

@anesvi

anesvi commented Feb 10, 2025

Please first use one of our prebuilt workflows. It looks like you made some changes: I see you have --prot 1 (i.e. no protein FDR) and you changed to the 2D filter. These two settings are incompatible.

We have tutorials on our FragPipe page; please follow this one without making any changes:
https://fragpipe.nesvilab.org/docs/tutorial_group_fdr.html

@mkirankumar45
Author

Dear Alexey,

Thank you for your response. I am using the default nonspecific HLA workflow and followed the tutorial to set up group FDR. Does this mean that group FDR is not compatible with the nonspecific HLA workflow?

@anesvi

anesvi commented Feb 10, 2025

It is compatible but, as I said, something is not right. I see you made at least some changes to the workflows. Please provide more details on how you annotated the sequences in the database with PE numbers, which workflow you loaded, and what other changes you made.
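
One quick way to report how a database is annotated is to tally the PE values found in its FASTA headers. The sketch below assumes the value appears as a `PE=<number>` token in the header line, as in UniProt-style headers; the function name `count_pe_values` is hypothetical and not part of FragPipe.

```python
import re
from collections import Counter

PE_VALUE = re.compile(r"\bPE=(\d+)")


def count_pe_values(lines):
    """Tally PE values across FASTA header lines; 'missing' if no PE tag."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            match = PE_VALUE.search(line)
            counts[match.group(1) if match else "missing"] += 1
    return counts
```

Calling it on an open file handle, for example `count_pe_values(open(path))` with `path` pointing at the search database, gives per-value header counts that can be pasted into a reply.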

@mkirankumar45
Author

I loaded the default nonspecific HLA workflow and removed the nQ, NE and Cysteinylation modifications for both searches. For the large database search, the split database option was increased to 100 and the group FDR option was enabled as described in the tutorial.

The database was annotated with PE=1 for canonical sequences and PE=2 for non-canonical sequences using a Python script. We used FragPipe in headless mode on Linux for the large database search.

I don't recall making any other changes to the workflow. I am attaching the workflows for your review.
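
For readers following along, an annotation step of this kind could look roughly like the sketch below. It is not the script used in this thread: the helper names and the `is_canonical` callback are hypothetical, and appending the tag at the end of headers that lack one is an assumption, so check the group FDR tutorial for the header layout FragPipe expects.

```python
import re

PE_TAG = re.compile(r"\bPE=\d+")


def set_pe(header, pe):
    """Return a FASTA header with its PE value set to `pe`."""
    header = header.rstrip("\n")
    if PE_TAG.search(header):
        # Overwrite an existing PE tag in place.
        return PE_TAG.sub(f"PE={pe}", header, count=1)
    # Assumption: append the tag when the header has none.
    return f"{header} PE={pe}"


def annotate_fasta(lines, is_canonical):
    """Yield FASTA lines with PE=1 on canonical headers and PE=2 on the rest."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield set_pe(line, 1 if is_canonical(line) else 2)
        else:
            yield line
```

Here `is_canonical` is any function that decides from the header text whether an entry is canonical, for example `lambda h: h.startswith(">sp|")` if the canonical entries are the only ones with a Swiss-Prot style prefix.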

Thank you.

fragpipe_large_database.txt
fragpipe_standard_database.txt

@mkirankumar45
Author

Dear Alexey and the FragPipe team,

Just following up to see if you have any suggestions regarding the workflow, or about the reason for the increase in the number of peptide identifications after applying group-specific FDR.

Thank you.

@anesvi

anesvi commented Feb 18, 2025

Sorry, this is hard for me to debug via GitHub. Perhaps you can send more details by email.

@mkirankumar45
Author

Thank you, Alexey. Sure, I will send you an email with more details.
