Some clarification on xenologues and putative HGT #288
Hi, thanks for bringing this up. I'll add descriptions to the README file. Here's a brief summary; let me know if anything needs expanding upon. In my experience, many of these genes arise not from horizontal gene transfer but from contamination when the genome of a species is sequenced. Thus, a bacterial gene may be incorrectly included in a metazoan proteome. This gene will appear out of place within the gene tree and, if the species also has its own copy of the gene, can make it look as if a gene duplication has taken place, implying that a clade of genes are paralogs rather than orthologs. For this reason OrthoFinder needs to identify these genes so that they don't interfere with ortholog assignment. To answer your questions:
No. The Putative_Horizontal_Gene_Transfer.txt file lists the genes that appear either not to come from E4U35 at all (contamination) or to have been horizontally transferred into that species. OrthoFinder doesn't try to say where they came from (although see my comments about the gene tree below).
For the genes that have been identified as putative horizontal transfers, their apparent orthology relationships, as seen in the tree, are written to the Putative_Xenologues tab-delimited file rather than to the orthologues tab-delimited file where they would otherwise have been listed. However, to avoid creating N² new files, all the putative xenologs from all the other species are listed in a single file for each species.
Yes. This means that if a pair of genes is listed as putative xenologs in the tab-delimited file, then in general only one of them will be a potential horizontal transfer; the other will just be a gene that appears to be a xenolog of it in the tree.
Yes, the species X file is the set of genes in the input proteome (FASTA) file for species X that appear to have originated elsewhere.
Gene trees
To see this in action, have a look at the OG0001588 gene tree. You should see that E4U35_001896 is suspiciously placed and appears to be most closely related to genes E4U32_004495, Scere_5090 and Uvire_1881. If Ecoen_23832 is listed in your E4U35 Putative_Horizontal_Gene_Transfer.txt file, then it must have been in your E4U35.fa, unless something has gone horribly wrong. OrthoFinder is saying that it doesn't look like it originated in the E4U35 lineage, based on its position in the tree. If you decide that it is OK, then all you need to do is treat all putative xenolog relationships for that gene as orthologs.
All the best
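The last step in the reply is to treat the putative xenolog relationships of a gene you have judged genuine as orthologs. The Python sketch below shows one way to do that. It rests on several assumptions that are not documented OrthoFinder behaviour. Each Putative_Xenologues file is assumed to have a header row with the columns Orthogroup, Species and Other, as described in the question. Several gene IDs in one cell are assumed to be comma-separated. The function name `split_xenologs` is hypothetical.

```python
from __future__ import annotations

import csv
from typing import Iterable


def split_xenologs(lines: Iterable[str], accepted: set[str]):
    """Split putative xenolog pairs into (treat_as_orthologs, still_xenologs).

    Assumptions (not documented OrthoFinder behaviour): a header row with the
    columns Orthogroup, Species and Other, and comma-separated gene IDs in a cell.
    """
    as_orthologs: list[tuple[str, str, str]] = []
    still_xenologs: list[tuple[str, str, str]] = []
    for row in csv.DictReader(lines, delimiter="\t"):
        genes = [g.strip() for g in row["Species"].split(",") if g.strip()]
        others = [g.strip() for g in row["Other"].split(",") if g.strip()]
        for gene in genes:
            for other in others:
                pair = (row["Orthogroup"], gene, other)
                # Usually only one member of a pair is the flagged gene; if you
                # have decided that gene is genuine, treat the pair as orthologs.
                if gene in accepted or other in accepted:
                    as_orthologs.append(pair)
                else:
                    still_xenologs.append(pair)
    return as_orthologs, still_xenologs


# Example usage (the path is a placeholder for your own output file):
# with open("Orthologues/Putative_Xenologues/LM60.csv", newline="") as fh:
#     orthologs, xenologs = split_xenologs(fh, {"E4U35_001896"})
```

Only pairs that involve a gene in `accepted` are moved. Every other pair stays a putative xenolog, which matches the point that usually only one member of each pair is the flagged gene.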
David, thank you, this clears things up. I did take a look in my E4U35.fa, and I do not have Ecoen_23832 (or any of the other non-E4U35 genes, i.e. E4U10_001773, E4U61_005231, E4U22_008391) in the E4U35.fa file. I also looked in SequenceIDs.txt and could not find any of these gene names associated with the '104' ID for this species. Attached are my input FASTA, Species104.fa, SequenceIDs.txt and SpeciesIDs.txt. Note that I do have some #'s in SpeciesIDs.txt, as I made some mistakes and needed to remove some FASTA files from the analysis and replace them with the correct versions. So if you are saying that I shouldn't be seeing Ecoen_23832 or E4U10_001773, E4U61_005231, E4U22_008391 in my Putative_Horizontal_Gene_Transfer.txt file for species E4U35, then I am not sure what is happening. I am actually re-running the ortholog step with the newest version. STRIDE didn't pick the correct outgroup, so I had to manually re-root the species tree and re-run using -ft and -s.
LM60.fa.gz
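The check described here can be scripted: confirm that every gene listed for a species actually occurs in that species' input FASTA. The reply above makes the same point when it says a gene listed for E4U35 must have been in E4U35.fa. The Python sketch below assumes two things. First, the gene list (Putative_Horizontal_Gene_Transfer.txt or a Phylogenetically_Misplaced_Genes file) has one gene ID per line. Second, the gene ID is the first whitespace-delimited token of each FASTA header. Both function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def fasta_ids(fasta_lines: Iterable[str]) -> set[str]:
    """Collect sequence IDs, taken as the first whitespace-delimited token after '>'."""
    ids: set[str] = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def missing_from_input(listed_genes: Iterable[str], fasta_lines: Iterable[str]) -> list[str]:
    """Return gene IDs from a one-ID-per-line list that are absent from the FASTA input."""
    present = fasta_ids(fasta_lines)
    listed = (g.strip() for g in listed_genes)
    return [g for g in listed if g and g not in present]


# Example usage (both paths are placeholders for your own files):
# with open("LM60.fa") as fa, open("Orthologues_E4U35/Putative_Horizontal_Gene_Transfer.txt") as hgt:
#     print(missing_from_input(hgt, fa))
```

A non-empty result means the list names genes that were never in that species' input, which is the symptom reported in this thread.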
Also, I just looked at the OG0001588 gene tree. I do see that E4U35_001896 (red line) is placed fairly close to E4U32_004495 (blue line), Scere_5090 and Uvire_1881. However, I see it grouped even more closely with other genes. When I look in the Putative_Xenologues files, for example LM207, I do not find E4U10_001774 also classified as a xenologue of E4U32_004495 (blue line), Scere_5090 and Uvire_1881. Nor is E4U10_001774 classified as a xenologue of E4U35_001896. Is this common? This might also be a problem because this was run with an incorrectly rooted species tree: my initial species tree was not rooted with Saccharomyces and Ustilago as the outgroups. So this may be cleared up in my re-run. EDIT: I have attached that specific gene tree file.
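The first reply explains that a misplaced gene can make it look as if a duplication has taken place. One way to see why is the species-overlap rule, a common heuristic for calling duplications in a rooted gene tree: an internal node is treated as a duplication if its two child clades share at least one species. The Python sketch below is a simplified illustration of that rule, not a claim about OrthoFinder's exact procedure. It assumes a binary tree written as nested tuples and species taken from the gene-ID prefix before the first underscore. The gene names are hypothetical. It also hints at why rooting matters: the parent/child splits the rule inspects depend on where the tree is rooted.

```python
from __future__ import annotations

from typing import Union

# A rooted binary gene tree as nested 2-tuples; leaves are gene IDs.
Tree = Union[str, tuple]


def species_of(gene: str) -> str:
    # Assumption: the species is the gene-ID prefix before the first underscore.
    return gene.split("_", 1)[0]


def species_set(tree: Tree) -> set[str]:
    if isinstance(tree, str):
        return {species_of(tree)}
    left, right = tree
    return species_set(left) | species_set(right)


def duplication_nodes(tree: Tree) -> list[Tree]:
    """Return internal nodes whose two child clades share at least one species."""
    if isinstance(tree, str):
        return []
    left, right = tree
    found = duplication_nodes(left) + duplication_nodes(right)
    if species_set(left) & species_set(right):
        found.append(tree)
    return found


clean = (("SpA_1", "SpB_1"), ("SpC_1", "SpD_1"))
# SpA_2 stands in for a contaminant gene placed among bacterial genes.
contaminated = (("SpA_1", "SpB_1"), ("SpA_2", "BacX_1"))
print(len(duplication_nodes(clean)))         # 0
print(len(duplication_nodes(contaminated)))  # 1
```

In the second tree the root is called a duplication only because the misplaced SpA_2 shares a species with SpA_1, even though no real duplication occurred.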
Update: I re-ran my gene reconciliation with the latest version of OrthoFinder, and in my Phylogenetically_Misplaced_Genes/LM60.txt (which corresponds to the E4U35 locus prefix) I have these results:
I double-checked my input FASTA files again and don't find any of these non-E4U35 genes in my LM60.fa. However, with the correct species tree, E4U35_001896 is no longer a putative xenologue of E4U32_004495, Scere_5090 and Uvire_1881. In fact, it is now shown as an ortholog of LM207's E4U10_001774. Attached is the new gene tree for OG0001588 from this latest re-run with a manual re-root.
Hi, there was an error in writing out these genes to the Phylogenetically_Misplaced_Genes files, which I've submitted a fix for. Thanks for chasing this up! It resulted in not all genes being listed in these files and in genes being written to the wrong species' file. If you try out the code from the master branch, it should write them all out correctly.
All the best
Hello,
My run from before was able to finish, but I just wanted to ask for some clarification of some of the output files, as I don't see anything in the README that specifically explains them.
Within the Orthologues/Putative_Xenologues directory there are .csv files for each species, which are tab-delimited with the columns Orthogroup \t Species \t Other, such as:
If you have time, could you give a brief summary of what is occurring here? I understand that xenologues potentially arise from horizontal gene transfers or lineage fusions. How is that being described in this tab-delimited file? Is it something like: gene E4U35_001896 was potentially transferred to the species in the "Other" column?
Secondly, within each species directory inside Orthologues/ there is a Putative_Horizontal_Gene_Transfer.txt, which simply lists genes that I assume are potential HGTs. Does this list contain genes that were potentially transferred from another species to this species, or vice versa? For example:
Within my Orthologues_E4U35 directory, the .txt file shows this:
Is my assumption correct that Ecoen_23832 was a gene transferred from Ecoen to E4U35? Also, why do I have E4U35 genes listed as potentially horizontally transferred genes within E4U35 itself?