Error in species tree / GetOrthologues #293

Closed
LovellHAGSC opened this issue Oct 17, 2019 · 1 comment

@LovellHAGSC

Hi David,
Thanks for your work on OrthoFinder.
I have run a few different datasets and have sometimes come across an error when running the full OrthoFinder program. The OrthoFinder call, progress output and error are pasted below.

Basically, for a couple of orthogroups (e.g. OG0011624), we get:
looks to be complete: Distances/OG0011624.phy_fastme_stat.txt
looks to be complete: Distances/OG0011624.phy
file does not exist: Distances_SpeciesTree/OG0011624_tree_id.txt.dist.phylip_fastme_stat.txt
file does not exist: Distances_SpeciesTree/OG0011624__tree_id.txt.dist.phylip
file exists but is empty: OG0011624_tree_id.txt
file does not exist: OG0011624_tree_id.txt.tre
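
To list every orthogroup affected in this way, rather than spotting them one at a time in the log, a minimal Python sketch could report the `*_tree_id.txt` files that are empty. It assumes the tree files sit together in one directory and follow the `OGxxxxxxx_tree_id.txt` naming seen above; the function name `find_empty_tree_files` and the directory passed to it are hypothetical, not part of OrthoFinder.

```python
from pathlib import Path


def find_empty_tree_files(trees_dir):
    """Return sorted names of *_tree_id.txt files that are empty or whitespace-only.

    Hypothetical helper: assumes all per-orthogroup tree files live in trees_dir.
    """
    empty = []
    for path in sorted(Path(trees_dir).glob("*_tree_id.txt")):
        # An empty file is what the log above reports as "file exists but is empty".
        if not path.read_text().strip():
            empty.append(path.name)
    return empty
```

For example, `find_empty_tree_files("path/to/trees")` (placeholder path) would return names such as `OG0011624_tree_id.txt` for this dataset.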

Any help is much appreciated.
John

$ orthofinder -fg OrthoFinder/Results_Oct17 -a 6

OrthoFinder version 2.3.3 Copyright (C) 2014 David Emms

2019-10-17 10:04:01 : Starting OrthoFinder
8 thread(s) for highly parallel tasks (BLAST searches etc.)
6 thread(s) for OrthoFinder algorithm

Checking required programs are installed

Analysing Orthogroups

Calculating gene distances

2019-10-17 10:04:44 : Done

Inferring gene and species trees

2019-10-17 10:04:45 : Done 0 of 21061
2019-10-17 10:04:48 : Done 1000 of 21061
2019-10-17 10:04:51 : Done 2000 of 21061
2019-10-17 10:04:54 : Done 3000 of 21061
2019-10-17 10:04:57 : Done 4000 of 21061
2019-10-17 10:05:00 : Done 5000 of 21061
2019-10-17 10:05:03 : Done 6000 of 21061
2019-10-17 10:05:06 : Done 7000 of 21061
2019-10-17 10:05:09 : Done 8000 of 21061
2019-10-17 10:05:12 : Done 9000 of 21061
2019-10-17 10:05:15 : Done 10000 of 21061
2019-10-17 10:05:18 : Done 11000 of 21061
2019-10-17 10:05:21 : Done 12000 of 21061
2019-10-17 10:05:24 : Done 13000 of 21061
2019-10-17 10:05:27 : Done 14000 of 21061
2019-10-17 10:05:30 : Done 15000 of 21061
2019-10-17 10:05:33 : Done 16000 of 21061
2019-10-17 10:05:36 : Done 17000 of 21061
2019-10-17 10:05:39 : Done 18000 of 21061
2019-10-17 10:05:42 : Done 19000 of 21061
2019-10-17 10:05:45 : Done 20000 of 21061
2019-10-17 10:05:48 : Done 21000 of 21061

OG0011624_tree_id.txt - WARNING: ETE could not interpret tree file, it will be ignored
OG0012756_tree_id.txt - WARNING: ETE could not interpret tree file, it will be ignored
OG0004302_tree_id.txt - WARNING: ETE could not interpret tree file, it will be ignored
13731 trees had all species present and will be used by STAG to infer the species tree

Traceback (most recent call last):
  File "/anaconda3/envs/genespace/bin/orthofinder", line 1682, in
    GetOrthologues(speciesInfoObj, options, program_caller)
  File "/anaconda3/envs/genespace/bin/orthofinder", line 1482, in GetOrthologues
    options.name)
  File "/anaconda3/envs/genespace/bin/scripts/orthologues.py", line 932, in OrthologuesWorkflow
    spTreeFN_ids, qSTAG = db.RunAnalysis(userSpeciesTree == None)
  File "/anaconda3/envs/genespace/bin/scripts/orthologues.py", line 464, in RunAnalysis
    stag.Run_ForOrthoFinder(files.FileHandler.GetOGsTreeDir(), files.FileHandler.GetWorkingDirectory_Write(), self.ogSet.seqsInfo.speciesToUse, spTreeFN_ids)
  File "/anaconda3/envs/genespace/bin/scripts/stag.py", line 252, in Run_ForOrthoFinder
    InferSpeciesTree(dir_trees_out, gene_to_species.species, speciesTreeIds_FN_out)
  File "/anaconda3/envs/genespace/bin/scripts/stag.py", line 238, in InferSpeciesTree
    t = cons.ConsensusTree(tree_dir)
  File "/anaconda3/envs/genespace/bin/scripts/consensus_tree.py", line 249, in ConsensusTree
    splits_lengths, taxa_index, taxa_ordered, nTrees = GetAllSplits(trees_dir)
  File "/anaconda3/envs/genespace/bin/scripts/consensus_tree.py", line 159, in GetAllSplits
    t = tree.Tree(treeFN)
  File "/anaconda3/envs/genespace/bin/scripts/tree.py", line 217, in __init__
    read_newick(newick, root_node = self, format=format)
  File "/anaconda3/envs/genespace/bin/scripts/newick.py", line 212, in read_newick
    'Unexisting tree file or Malformed newick tree structure.'
scripts.newick.NewickError: Unexisting tree file or Malformed newick tree structure.

@davidemms
Owner

Hi John

Thanks for letting me know about this. I guess there are two issues here: (1) why there is a problem with those trees, and (2) stopping OrthoFinder from failing if some trees have a problem. The second one is the more straightforward. I've submitted a fix for this, which is available on the master branch, and I will probably create a new release today or Monday that includes it too. I think with this fix everything should be fine for your dataset. There were 13731 orthogroups being used to infer the species tree, so if a few of these have to be skipped that's not going to be a problem.
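
The general idea of skipping unreadable trees instead of aborting can be illustrated with a minimal Python sketch. This is not OrthoFinder's actual fix: `check_newick` is a hypothetical stand-in for a real Newick parser (such as the one in ETE), performing only a cheap structural check, and `load_usable_trees` is likewise a hypothetical name.

```python
import sys
from pathlib import Path


def check_newick(text):
    """Hypothetical stand-in for a real Newick parser.

    Raises ValueError for empty input, a missing trailing ';' or unbalanced
    parentheses; otherwise returns the stripped tree string.
    """
    s = text.strip()
    if not s:
        raise ValueError("empty tree file")
    if not s.endswith(";"):
        raise ValueError("missing terminating ';'")
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return s


def load_usable_trees(paths, parse=check_newick):
    """Parse each tree file, skipping (with a warning) any that fail instead of aborting."""
    usable, skipped = [], []
    for path in map(Path, paths):
        try:
            usable.append(parse(path.read_text()))
        except (OSError, ValueError) as err:
            # Missing files raise OSError; malformed or empty ones raise ValueError.
            skipped.append(path.name)
            print(f"{path.name} - WARNING: {err}, it will be ignored", file=sys.stderr)
    return usable, skipped
```

Passing the parser in as an argument keeps the skip-and-warn logic independent of which Newick reader is actually used; with thousands of usable trees, dropping a handful should not materially affect a consensus species tree.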

As to why the trees failed in the first place, I wonder if some infinities or very small numbers made it into the distance matrix, or if something else caused a problem for FastME. Would you be able to send me Distances/OG0011624.phy_fastme_stat.txt and Distances/OG0011624.phy so I can take a look?
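
Before sending the files, a quick check for the kinds of values suspected here could be sketched in Python. It assumes a relaxed PHYLIP square distance matrix (first line is the taxon count, each following line is a whitespace-separated name followed by its distances); whether OrthoFinder writes exactly this layout is an assumption, and the `tiny` threshold of 1e-10 and the function name `scan_phylip_distances` are hypothetical choices.

```python
import math


def scan_phylip_distances(text, tiny=1e-10):
    """Return (row_name, column_index, token, problem) for suspicious matrix entries.

    Assumes a relaxed PHYLIP square distance matrix; `tiny` is an arbitrary cut-off.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])
    problems = []
    for line in lines[1:n + 1]:
        name, *tokens = line.split()
        for j, tok in enumerate(tokens):
            try:
                value = float(tok)
            except ValueError:
                problems.append((name, j, tok, "not a number"))
                continue
            if not math.isfinite(value):
                problems.append((name, j, tok, "non-finite"))  # inf or nan
            elif value < 0:
                problems.append((name, j, tok, "negative"))
            elif 0 < value < tiny:
                problems.append((name, j, tok, "very small"))
    return problems
```

Usage: `scan_phylip_distances(open("Distances/OG0011624.phy").read())` returns an empty list if no entry looks suspicious.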

Many thanks
David
