OrthoFinder fails with MMseqs #269

davidemms · 2019-06-10T14:54:01Z

OrthoFinder can fail at the orthogroups stage when using MMseqs. For example:

$ python2 ~/software/OrthoFinder-2.3.3_source/orthofinder/orthofinder.py -f Fish_shorter/ -S mmseqs -og -t 32 -a 8

OrthoFinder version 2.3.3 Copyright (C) 2014 David Emms

2019-06-10 14:33:26 : Starting OrthoFinder
32 thread(s) for highly parallel tasks (BLAST searches etc.)
8 thread(s) for OrthoFinder algorithm

Checking required programs are installed
----------------------------------------
Test can run "mcl -h" - ok

Dividing up work for BLAST for parallel processing
--------------------------------------------------
2019-06-10 14:33:30 : Creating mmseqs database 1 of 11
2019-06-10 14:33:32 : Creating mmseqs database 2 of 11
2019-06-10 14:33:34 : Creating mmseqs database 3 of 11
2019-06-10 14:33:37 : Creating mmseqs database 4 of 11
2019-06-10 14:33:40 : Creating mmseqs database 5 of 11
2019-06-10 14:33:42 : Creating mmseqs database 6 of 11
2019-06-10 14:33:44 : Creating mmseqs database 7 of 11
2019-06-10 14:33:47 : Creating mmseqs database 8 of 11
2019-06-10 14:33:49 : Creating mmseqs database 9 of 11
2019-06-10 14:33:51 : Creating mmseqs database 10 of 11
2019-06-10 14:33:54 : Creating mmseqs database 11 of 11

Running mmseqs all-versus-all
-----------------------------
Using 32 thread(s)
2019-06-10 14:33:56 : This may take some time....
2019-06-10 14:33:56 : Done 0 of 121
2019-06-10 14:39:21 : Done 10 of 121
2019-06-10 14:39:27 : Done 20 of 121
2019-06-10 14:39:40 : Done 30 of 121
2019-06-10 14:44:42 : Done 40 of 121
2019-06-10 14:45:00 : Done 50 of 121
2019-06-10 14:45:34 : Done 60 of 121
2019-06-10 14:49:57 : Done 70 of 121
2019-06-10 14:50:35 : Done 80 of 121
2019-06-10 14:51:25 : Done 90 of 121
2019-06-10 14:57:41 : Done all-versus-all sequence search

Running OrthoFinder algorithm
-----------------------------
2019-06-10 14:57:42 : Initial processing of each species

ERROR: Query or hit sequence ID in BLAST results file was missing or incorrectly formatted.
Malformatted line in /lv01/data/emms/NOBACKUP/Fish_shorter/OrthoFinder/Results_Jun10_1/WorkingDirectory/Blast4_0.txt
Offending line was:
4_7649_0	0_4529	0.271	1921	1322	0	10063	11983	1533	3346	3.71E-207	723
Process Process-38:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/lv01/home/emms//software/OrthoFinder-2.3.3_source/orthofinder/orthofinder.py", line 505, in Worker_ProcessBlastHits
    WaterfallMethod.ProcessBlastHits(*args, qDoubleBlast=qDoubleBlast)
  File "/lv01/home/emms//software/OrthoFinder-2.3.3_source/orthofinder/orthofinder.py", line 492, in ProcessBlastHits
    Bij = BlastFileProcessor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
  File "/lv01/home/emms/software/OrthoFinder-2.3.3_source/orthofinder/scripts/blast_file_processor.py", line 68, in GetBLAST6Scores
    sequence1ID = int(row[iQ].split(sep, 1)[1])
ValueError: invalid literal for int() with base 10: '7649_0'
ERROR: An error occurred, please review error messages for more information.


davidemms · 2019-06-10T14:58:13Z

This is caused by MMseqs splitting longer sequences and renaming the resulting parts, so that OrthoFinder no longer recognises them. For example, if "gene25" is split, its parts are renamed "gene25_0", "gene25_1", etc. I will modify the OrthoFinder code so that it ignores these suffixes.
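
One way to ignore such a suffix is to read only the first two underscore-separated fields of each ID. The sketch below illustrates that idea and is not necessarily what the actual fix does. The helper name `parse_sequence_id` is hypothetical, and it assumes IDs have the form `<species>_<sequence>` with an optional trailing `_<part>`.

```python
def parse_sequence_id(raw_id, sep="_"):
    """Return (species_index, sequence_index) for an ID like '4_7649' or '4_7649_0'.

    Hypothetical helper: any fields after the first two are assumed to be a
    suffix added when a long sequence was split, and are ignored.
    """
    fields = raw_id.split(sep)
    if len(fields) < 2:
        raise ValueError("Malformed sequence ID: %r" % raw_id)
    return int(fields[0]), int(fields[1])


# IDs taken from the offending line in the error report.
print(parse_sequence_id("4_7649_0"))
print(parse_sequence_id("0_4529"))
```

With this approach, hits for every part of a split sequence are attributed to the original sequence index.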

davidemms closed this as completed in 52c1770 Jun 10, 2019

davidemms mentioned this issue Jun 10, 2019

Cannot run diamond #268

Closed
