=Stage 6: Split Fasta= no fasta records identified, exiting. #51

Open
mengmm777 opened this issue Aug 8, 2024 · 2 comments

Comments

@mengmm777

Hello,

I encountered an issue while running the software. The specific output log is as follows:

Use of uninitialized value $file_count in concatenation (.) or string at /public/home/soft/orthomcl-pipeline/orthomcl-pipeline-master/bin/../scripts/orthomcl-pipeline.pl line 371.
Warning: directory "/public/home/pep_orthomcl/" already exists, are you sure you want to store data here [Y]? Starting OrthoMCL pipeline on: Thu Aug  8 16:25:17 2024
Git commit: unknown
=Stage 1: Validate Files =
Validated  files
Stage 1 took 0.00 minutes 
=Stage 2: Validate Database=
Warning: some tables exist already in database dbi:mysql:orthomcl:10.10.101.6:mysql_local_infile, user=orthomcl, database_name=orthomcl. Do you want to remove (y/n)? Executing: 'drop database orthomcl'
Executing: 'create database orthomcl'
Successfully removed old database entries
Stage 2 took 0.02 minutes 
=Stage 3: Load OrthoMCL Database Schema=
/public/home/soft/orthomclSoftware-v2.0.9/bin/orthomclInstallSchema "/public/home/soft/orthomcl-pipeline/orthomcl-pipeline-master/orthomcl.conf" "/public/home/pep_orthomcl/log/orthomclSchema.log" 1>/public/home/pep_orthomcl/log/3.loadschema.stdout.log 2>/public/home/pep_orthomcl/log/3.loadschema.stderr.log
Stage 3 took 0.02 minutes 
=Stage 4: Adjust Fasta=
Stage 4 took 0.00 minutes 
=Stage 5: Filter Fasta=
/public/home/soft/orthomclSoftware-v2.0.9/bin/orthomclFilterFasta "/public/home/pep_orthomcl/compliant_fasta" 10 20
Stage 5 took 0.00 minutes 
=Stage 6: Split Fasta=
splitting /public/home/pep_orthomcl/blast_dir/goodProteins.fasta into 4 pieces
no fasta records identified, exiting.

On checking, I found that the goodProteins.fasta file is empty. However, I manually verified that the compliant_fasta directory is not empty: it contains my input files.
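
A quick way to confirm this kind of observation is to count the FASTA records in each file of a directory. The following is only a minimal sketch, not part of orthomcl-pipeline: the function names count_fasta_records and summarize_fasta_dir are hypothetical, and a record is assumed to be any line that starts with ">".

```python
from pathlib import Path


def count_fasta_records(path):
    """Count header lines (lines starting with '>') in one FASTA file."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_fasta_dir(directory):
    """Map each regular file name in `directory` to its FASTA record count.

    Hypothetical helper for diagnosis only; it does not reproduce any
    filtering that OrthoMCL itself applies.
    """
    return {
        entry.name: count_fasta_records(entry)
        for entry in sorted(Path(directory).iterdir())
        if entry.is_file()
    }
```

Calling summarize_fasta_dir on the compliant_fasta directory and on the blast_dir directory would show which files contain records and which contain none, for example a goodProteins.fasta entry with a count of 0.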

Here is the command I ran:

orthomcl-pipeline -i /public/home/pep_sequences/ -o /public/home/pep_orthomcl/ -m /public/home/soft/orthomcl-pipeline/orthomcl-pipeline-master/orthomcl.conf --yes #--nocompliant

Here is a sample format of my input files:

>Solyc00T000002.1
MPVIPLFFFLLAFVWQAAVNCVMLTLKL......
>Solyc00T000003.1
MVTIRADEISNIIRERIEQYNREVKIVNTG.....
>Solyc00T000004.1

Could the header format in my protein file be causing the issue, or is there some other problem with my input? I have only one input file, which contains the protein sequences of all genes of the species that can be mapped to the reference genome. Could having a single input file affect the process?
Could you please help me understand what might be causing this issue? I look forward to your response and appreciate your assistance.

Thank you very much.

@apetkau
Owner

apetkau commented Aug 8, 2024

Is there a reason you have #--nocompliant commented out in your command? That option should re-adjust the sequence headers of your input fasta files to include the file name as a prefix.
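
For illustration only, the kind of header adjustment described here can be sketched in Python. This is a minimal sketch and not the pipeline's own code: the function names and the taxon code "tomato" are hypothetical, and the taxon|id header layout with a "|" separator is an assumption based on OrthoMCL's compliant FASTA convention.

```python
def make_compliant_header(header_line, taxon_code):
    """Rewrite '>Solyc00T000002.1 description' as '>taxon_code|Solyc00T000002.1'.

    Assumption: the sequence ID is the first whitespace-delimited token
    after '>', and the prefix is joined to it with '|'.
    """
    fields = header_line[1:].split()
    if not fields:
        raise ValueError("FASTA header has no sequence ID")
    return f">{taxon_code}|{fields[0]}"


def adjust_fasta_lines(lines, taxon_code):
    """Yield FASTA lines with every header prefixed by `taxon_code`."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield make_compliant_header(line, taxon_code)
        else:
            # Sequence lines are passed through unchanged.
            yield line
```

With the sample input shown earlier and the hypothetical taxon code "tomato", the header >Solyc00T000002.1 would become >tomato|Solyc00T000002.1.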

@mengmm777
Author

mengmm777 commented Aug 8, 2024

Yes, I used this parameter before but ran into errors, so I commented it out.

There are two differences between the results with and without the --nocompliant parameter:

1. With --nocompliant, the output log includes an additional section, "=Stage 4: Adjust Fasta=" followed by "Stage 4 took 0.00 minutes". Without the parameter, this section is absent and the log goes directly from Stage 3 to Stage 5. (The rest of the output log is the same.)
2. With --nocompliant, the /public/home/pep_orthomcl/compliant_fasta folder is empty. Without it, the folder contains a single file whose size matches my input protein sequence FASTA file.

Thank you for your response, I appreciate it.
