OrthoFinder fails with MMseqs #269

Closed
davidemms opened this issue Jun 10, 2019 · 1 comment

@davidemms
Owner

OrthoFinder can fail at the orthogroups stage when MMseqs is used for the sequence search. For example:

$ python2 ~/software/OrthoFinder-2.3.3_source/orthofinder/orthofinder.py -f Fish_shorter/ -S mmseqs -og -t 32 -a 8

OrthoFinder version 2.3.3 Copyright (C) 2014 David Emms

2019-06-10 14:33:26 : Starting OrthoFinder
32 thread(s) for highly parallel tasks (BLAST searches etc.)
8 thread(s) for OrthoFinder algorithm

Checking required programs are installed
----------------------------------------
Test can run "mcl -h" - ok

Dividing up work for BLAST for parallel processing
--------------------------------------------------
2019-06-10 14:33:30 : Creating mmseqs database 1 of 11
2019-06-10 14:33:32 : Creating mmseqs database 2 of 11
2019-06-10 14:33:34 : Creating mmseqs database 3 of 11
2019-06-10 14:33:37 : Creating mmseqs database 4 of 11
2019-06-10 14:33:40 : Creating mmseqs database 5 of 11
2019-06-10 14:33:42 : Creating mmseqs database 6 of 11
2019-06-10 14:33:44 : Creating mmseqs database 7 of 11
2019-06-10 14:33:47 : Creating mmseqs database 8 of 11
2019-06-10 14:33:49 : Creating mmseqs database 9 of 11
2019-06-10 14:33:51 : Creating mmseqs database 10 of 11
2019-06-10 14:33:54 : Creating mmseqs database 11 of 11

Running mmseqs all-versus-all
-----------------------------
Using 32 thread(s)
2019-06-10 14:33:56 : This may take some time....
2019-06-10 14:33:56 : Done 0 of 121
2019-06-10 14:39:21 : Done 10 of 121
2019-06-10 14:39:27 : Done 20 of 121
2019-06-10 14:39:40 : Done 30 of 121
2019-06-10 14:44:42 : Done 40 of 121
2019-06-10 14:45:00 : Done 50 of 121
2019-06-10 14:45:34 : Done 60 of 121
2019-06-10 14:49:57 : Done 70 of 121
2019-06-10 14:50:35 : Done 80 of 121
2019-06-10 14:51:25 : Done 90 of 121
2019-06-10 14:57:41 : Done all-versus-all sequence search

Running OrthoFinder algorithm
-----------------------------
2019-06-10 14:57:42 : Initial processing of each species

ERROR: Query or hit sequence ID in BLAST results file was missing or incorrectly formatted.
Malformatted line in /lv01/data/emms/NOBACKUP/Fish_shorter/OrthoFinder/Results_Jun10_1/WorkingDirectory/Blast4_0.txt
Offending line was:
4_7649_0	0_4529	0.271	1921	1322	0	10063	11983	1533	3346	3.71E-207	723
Process Process-38:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/lv01/home/emms//software/OrthoFinder-2.3.3_source/orthofinder/orthofinder.py", line 505, in Worker_ProcessBlastHits
    WaterfallMethod.ProcessBlastHits(*args, qDoubleBlast=qDoubleBlast)
  File "/lv01/home/emms//software/OrthoFinder-2.3.3_source/orthofinder/orthofinder.py", line 492, in ProcessBlastHits
    Bij = BlastFileProcessor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
  File "/lv01/home/emms/software/OrthoFinder-2.3.3_source/orthofinder/scripts/blast_file_processor.py", line 68, in GetBLAST6Scores
    sequence1ID = int(row[iQ].split(sep, 1)[1])
ValueError: invalid literal for int() with base 10: '7649_0'
ERROR: An error occurred, please review error messages for more information.
@davidemms
Owner Author

This is caused by MMseqs splitting longer sequences and renaming the split parts, so OrthoFinder no longer recognises their IDs. For example, if "gene25" is split, its parts are renamed "gene25_0", "gene25_1", etc. I will modify the OrthoFinder code so that it ignores these suffixes.
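A minimal sketch of suffix-tolerant ID parsing, in Python 3. The function name `parse_seq_id_tolerant` and the regular expression are hypothetical illustrations, not the change made in OrthoFinder. They assume an internal ID is always `species_sequence` (two integers) and that MMseqs only ever appends `_<n>` part suffixes to it.

```python
import re
from typing import Tuple

# Hypothetical pattern: "species_sequence", optionally followed by one or more
# "_<n>" suffixes like those MMseqs adds to the parts of a split sequence.
_SEQ_ID = re.compile(r"(\d+)_(\d+)(?:_\d+)*")


def parse_seq_id_tolerant(seq_id: str) -> Tuple[int, int]:
    """Return (species index, sequence index), ignoring split-part suffixes."""
    match = _SEQ_ID.fullmatch(seq_id)
    if match is None:
        # Anything else is still treated as malformed, as before.
        raise ValueError(f"unrecognised sequence ID: {seq_id!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_seq_id_tolerant("0_4529"))    # (0, 4529)
print(parse_seq_id_tolerant("4_7649_0"))  # (4, 7649)
```

Because every part maps back to the same ID, hits from different parts of one sequence collapse onto a single query/hit pair. How those hits should then be combined is a separate decision that this sketch does not make.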
