Questions about contaminants which are used in proteomic searches #510

Closed
RobAlbn opened this issue Nov 18, 2024 · 3 comments

Comments


RobAlbn commented Nov 18, 2024

I am running proteomic searches with FragPipe v22.0, and I have some questions about contaminants.

After adding decoys and contaminants to my protein database, I removed the proteins of the starting database and the decoys. In this way, I obtained a FASTA file of contaminants ("contaminants.fasta"). Then, I downloaded the cRAP database from this link: ftp://ftp.thegpm.org/fasta/cRAP/crap.fasta. However, while my FASTA file ("contaminants.fasta") contains 118 proteins, the cRAP FASTA file ("crap.fasta") contains 116 proteins. Both files are attached as text files. Why are there differences between these two FASTA files? In general, could you provide the FASTA file of contaminants that are added by FragPipe?

After adding decoys and contaminants, the headers of contaminants in the resulting FASTA file do not start with "contam_" or a similar prefix. Can I manually add "contam_" or a similar prefix to the headers of contaminants before running a proteomic search? Could this affect the search results?

Finally, when running FragPipe in headless mode, which database should I specify in the workflow file, i.e., the database with decoys and contaminants or the database without decoys and contaminants? In other words, when running FragPipe in headless mode, does it automatically add decoys and contaminants to the database that is specified in the workflow file?

Thank you for any help and support on this.

Best regards,
Roberto Albanese

contaminants.txt
crap.txt

fcyu transferred this issue from Nesvilab/FragPipe on Nov 18, 2024
Member

fcyu commented Nov 18, 2024

> After adding decoys and contaminants to my protein database, I removed the proteins of the starting database and the decoys. In this way, I obtained a FASTA file of contaminants ("contaminants.fasta"). Then, I downloaded the cRAP database from this link: ftp://ftp.thegpm.org/fasta/cRAP/crap.fasta. However, while my FASTA file ("contaminants.fasta") contains 118 proteins, the cRAP FASTA file ("crap.fasta") contains 116 proteins. Both files are attached as text files. Why are there differences between these two FASTA files? In general, could you provide the FASTA file of contaminants that are added by FragPipe?

Maybe @AimeeD90 can take a look at this one

> After adding decoys and contaminants, the headers of contaminants in the resulting FASTA file do not start with "contam_" or a similar prefix.

As far as I know, adding the contam_ prefix only works when you download the database using Philosopher.

> Can I manually add "contam_" or a similar prefix to headers of contaminants before running a proteomic search? Could this affect the search results?

Yes, you can.
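
For anyone who wants to script the manual prefixing, here is a minimal Python sketch. It assumes you already know which header lines belong to contaminants (for example, from a contaminants-only FASTA) and only rewrites headers that match exactly. The function name, the toy headers, and the placeholder sequences are made up for illustration; this is not how Philosopher or FragPipe does it internally.

```python
def add_contam_prefix(fasta_text, contaminant_headers, prefix="contam_"):
    """Insert `prefix` right after '>' on headers listed in contaminant_headers.

    All other lines (sequences, targets, decoys) are passed through unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:] in contaminant_headers:
            line = ">" + prefix + line[1:]
        out.append(line)
    return "\n".join(out) + "\n"


# Toy database: sequences are placeholders, not real protein sequences.
combined = (
    ">sp|P12345|TARGET_HUMAN Example target\n"
    "MKTAYIAK\n"
    ">sp|P00761|TRYP_PIG Trypsin\n"
    "FPTDDDDK\n"
    ">rev_sp|P00761|TRYP_PIG Trypsin\n"
    "KDDDDTPF\n"
)
contaminants = {"sp|P00761|TRYP_PIG Trypsin"}

print(add_contam_prefix(combined, contaminants))
```

Because the match is on the exact header text, decoy entries (which carry their own tag in front) are left alone in this sketch. Whether they should also be relabeled is a separate decision that this example does not make.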

> Finally, when running FragPipe in headless mode, which database should I specify in the workflow file, i.e., the database with decoys and contaminants or the database without decoys and contaminants? In other words, when running FragPipe in headless mode, does it automatically add decoys and contaminants to the database that is specified in the workflow file?

Specify the one with targets, decoys, and contaminants.
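
Since the database passed to a headless run needs to contain the decoys already, a quick sanity check before launching can help. The sketch below counts decoy and non-decoy entries in a list of FASTA lines. It assumes the decoy tag is `rev_`; adjust `decoy_tag` if your database was built with a different one. The function name and the toy entries are hypothetical.

```python
def count_decoys(fasta_lines, decoy_tag="rev_"):
    """Return (n_decoys, n_other) based on header prefixes.

    Assumption: decoy headers start with `decoy_tag` right after '>'.
    """
    n_decoys = 0
    n_other = 0
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        if line[1:].startswith(decoy_tag):
            n_decoys += 1
        else:
            n_other += 1
    return n_decoys, n_other


# Toy input with placeholder sequences; in practice, pass the lines of the real file,
# e.g. count_decoys(open(path)) where path points to your database.
lines = [
    ">sp|P12345|TARGET_HUMAN Example target",
    "MKTAYIAK",
    ">rev_sp|P12345|TARGET_HUMAN Example target",
    "KAIYATKM",
]
n_decoys, n_other = count_decoys(lines)
if n_decoys == 0:
    print("No decoys found: this database is probably not ready for a headless run.")
else:
    print(f"{n_decoys} decoys, {n_other} targets/contaminants")
```

A database with zero decoys, or with very different decoy and non-decoy counts, is a hint that the wrong file is about to be specified.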

Best,

Fengchao

Author

RobAlbn commented Nov 20, 2024

Hi again @AimeeD90 and @fcyu ,
To get the FASTA file of contaminants, I downloaded the cRAPS.zip archive from https://github.com/Nesvilab/philosopher/tree/dev/lib/dat. The archive contains the file "crap-gpmdb.fas", which I am attaching as a text file. This FASTA file contains 119 sequences, i.e., all 118 sequences in my "contaminants.fasta" plus one additional sequence ("sp|P15636|API_ACHLY Protease 1 OS=Achromobacter lyticus OX=224 PE=1 SV=1"). If I need the FASTA file of contaminants that is used by FragPipe v22.0, should I include this protein sequence too? In general, which FASTA file of contaminants does FragPipe use? Thank you again.
Best regards,
Roberto Albanese

crap-gpmdb.txt
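
A comparison like the one described above (119 vs. 118 entries) can be reproduced with a few lines of Python. This is only a sketch: it compares entries by sequence rather than by header, on the assumption that header formats may differ between sources, and it uses toy records with placeholder sequences instead of the attached files. The function names are illustrative.

```python
def read_fasta(text):
    """Parse FASTA text into a dict {header: sequence}.

    If the same header occurs twice, the later entry replaces the earlier one.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def only_in_first(first, second):
    """Headers in `first` whose sequence does not occur anywhere in `second`."""
    seqs_second = set(second.values())
    return sorted(h for h, seq in first.items() if seq not in seqs_second)


# Toy stand-ins for crap-gpmdb.fas and contaminants.fasta (placeholder sequences).
crap_gpmdb = read_fasta(
    ">sp|P00761|TRYP_PIG Trypsin\n"
    "FPTDDDDK\n"
    ">sp|P15636|API_ACHLY Protease 1 OS=Achromobacter lyticus OX=224 PE=1 SV=1\n"
    "MKRICGSL\n"
)
my_contaminants = read_fasta(
    ">sp|P00761|TRYP_PIG Trypsin\n"
    "FPTDDDDK\n"
)

for header in only_in_first(crap_gpmdb, my_contaminants):
    print(header)
```

Running `only_in_first` in both directions shows which entries are unique to each file, which is usually enough to explain a difference in entry counts.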

Author

RobAlbn commented Nov 21, 2024

Hi again @AimeeD90 and @fcyu ,
I managed to find a solution to the only open question: I added decoys and contaminants with Philosopher on the command line, as explained here: https://github.com/Nesvilab/philosopher/wiki/How-to-Prepare-a-Protein-Database. I am going to close this issue now. Thank you very much for your answers to the other questions.
