Some clarification on xenologues and putative HGT #288

Closed
PlantDr430 opened this issue Oct 2, 2019 · 5 comments

@PlantDr430

Hello,

My run from before was able to finish but I just wanted to ask for some clarifications of some of the output files as I don't see anything in the README that specifically explains them.

Within the Orthologues/Putative_Xenologues directory we have .csv files for each species, which are tab-delimited with the columns Orthogroup \t Species \t Other, such as:

OG0001588	E4U35_001897	E4U10_001773
OG0001588	E4U35_001896	E4U32_004495
OG0001588	E4U35_001896	Scere_5090
OG0001588	E4U35_001896	Uvire_1881
OG0001588	E4U35_001896	Umayd_2380, Umayd_624

If you have time, would you be able to give a brief summary of what is occurring here? I understand that xenologues potentially arise from horizontal gene transfers or lineage fusions. In this tab-delimited file, how is that being described? Is it something like gene E4U35_001896 was potentially transferred to the species in the "Other" column?

Secondly, within each species directory inside Orthologues/ there is a Putative_Horizontal_Gene_Transfer.txt which simply lists genes that I assume are potential HGTs. Is this a list of genes that were potentially transferred from another species to this species, or vice versa? For example:

Within my Orthologues_E4U35 directory, the .txt file shows this:

E4U35_000848
E4U49_000404
Ecoen_23832
E4U35_005859
E4U35_005965
E4U35_007277
E4U35_006719
E4U35_001229
E4U35_008393
E4U10_001773
E4U61_005231
E4U22_008391
E4U35_000478

Is my assumption correct to say that Ecoen_23832 was a gene that was transferred from Ecoen to E4U35? Also, why do I have E4U35 genes within E4U35 listed as potential horizontally transferred genes?

@davidemms
Owner

Hi

Thanks for bringing this up, I'll add descriptions to the README file. Here's a brief summary, let me know if anything needs expanding upon:

In my experience a lot of these genes arise not from horizontal gene transfer but from contamination when sequencing the genome of a species. Thus, you may get a bacterial gene incorrectly included in a metazoan proteome. This gene will appear out of place within the gene tree and, if the species also has its own copy of the gene, can make it look like a gene duplication has taken place, implying that a clade of genes are paralogs rather than orthologs. For this reason OrthoFinder needs to identify these genes so that they don't interfere with ortholog assignment.
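
To illustrate why a misplaced gene can look like a duplication, here is a minimal sketch using a simple species-overlap rule: a node is treated as a duplication when its child clades share a species. This is an illustration only, not OrthoFinder's actual procedure, and the gene names (`A_1`, `C_1`, ...) and toy trees are hypothetical.

```python
# Toy gene trees as nested tuples; leaves are gene names "<species>_<id>".
# Illustrative species-overlap rule only, NOT OrthoFinder's own algorithm.

def species_of(gene):
    """Species label = everything before the last underscore."""
    return gene.rsplit("_", 1)[0]


def leaf_species(tree):
    """Set of species found among the leaves of a (sub)tree."""
    if isinstance(tree, str):
        return {species_of(tree)}
    found = set()
    for child in tree:
        found |= leaf_species(child)
    return found


def duplication_nodes(tree):
    """Return every subtree whose child clades share at least one species."""
    if isinstance(tree, str):
        return []
    child_sets = [leaf_species(child) for child in tree]
    overlap = any(
        child_sets[i] & child_sets[j]
        for i in range(len(child_sets))
        for j in range(i + 1, len(child_sets))
    )
    nodes = [tree] if overlap else []
    for child in tree:
        nodes.extend(duplication_nodes(child))
    return nodes


# Hypothetical trees: in the second one, A_2 is a contaminant grouping with C_1.
clean = (("A_1", "B_1"), "C_1")
contaminated = (("A_1", "B_1"), ("C_1", "A_2"))

print(len(duplication_nodes(clean)))         # no overlapping clades
print(len(duplication_nodes(contaminated)))  # the root now looks like a duplication
```

In the second tree the root is flagged because species A appears on both sides, so `B_1` and `C_1` would be read as paralogs unless `A_2` is set aside first.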

For your questions

> Is it something like gene E4U35_001896 was potentially transferred to the species in the "Other" column?

No, the Putative_Horizontal_Gene_Transfer.txt file lists the genes that appear not to come from E4U35 (contamination) or to have been horizontally transferred into that species. OrthoFinder doesn't try to say where they came from (although see my comments about the gene tree below).

> In this tab delimited file what is being described?

For the genes that have been identified as 'putative horizontal transfers', their apparent orthology relationships, as seen in the tree, are placed in the putative xenologs tab-delimited file rather than the orthologs tab-delimited file where they would otherwise have been listed. However, to avoid creating N² new files, all the putative xenologs for the gene from all the other species are listed in a single file.

Ed. Anything else to add

Yes, this means that if a pair of genes is listed as putative xenologs in the tab-delimited file then, in general, only one of them will be a potential horizontal transfer, whereas the other will just be a gene that appears to be a xenolog of it in the tree.
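
As a rough sketch of how such a file could be read, the snippet below groups the putative xenologs by gene. It assumes three tab-separated columns (orthogroup, gene in this species, comma-separated genes in another species) as in the sample rows earlier in the thread, and it assumes any header row starts with `Orthogroup`. The function name is hypothetical.

```python
import csv
from collections import defaultdict


def read_putative_xenologs(lines):
    """Map (orthogroup, gene) -> list of putative xenologs in other species."""
    pairs = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank/malformed rows and an assumed header row.
        if len(row) != 3 or row[0] == "Orthogroup":
            continue
        orthogroup, gene_field, other_field = row
        others = [g.strip() for g in other_field.split(",") if g.strip()]
        for gene in gene_field.split(","):
            gene = gene.strip()
            if gene:
                pairs[(orthogroup, gene)].extend(others)
    return dict(pairs)


# Rows copied from the sample in the original question.
sample = [
    "OG0001588\tE4U35_001897\tE4U10_001773",
    "OG0001588\tE4U35_001896\tE4U32_004495",
    "OG0001588\tE4U35_001896\tScere_5090",
    "OG0001588\tE4U35_001896\tUvire_1881",
    "OG0001588\tE4U35_001896\tUmayd_2380, Umayd_624",
]

xenologs = read_putative_xenologs(sample)
for (orthogroup, gene), others in xenologs.items():
    print(orthogroup, gene, others)
```

The same function accepts an open file handle in place of `sample`. Note that a pair appearing here does not say which of the two genes is the misplaced one.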

> Is this list, genes that were potentially transferred from another species to this species

Yes, the species X file is the set of genes in the input proteome (FASTA) file for species X that appear to have originated elsewhere.

Gene trees

To see this in action, have a look at the OG0001588 gene tree. You should see that E4U35_001896 is suspiciously placed and appears to be most closely related to genes E4U32_004495, Scere_5090 and Uvire_1881.

If Ecoen_23832 is listed in your E4U35 Putative_Horizontal_Gene_Transfer.txt file then it must have been in your E4U35.fa, unless something has gone horribly wrong. OrthoFinder is saying that it doesn't look like it originated from the E4U35 lineage, based on its position in the tree. If you decide that it is OK then all you need to do is treat all putative xenolog relationships for that gene as orthologs.
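
One way to check that every listed gene really is in the species' FASTA file is sketched below. It assumes the gene name is the first whitespace-delimited token of each FASTA header. The function names and the toy sequences are hypothetical; only the gene names come from this thread.

```python
def fasta_ids(fasta_lines):
    """Collect the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def genes_missing_from_fasta(gene_lines, fasta_lines):
    """Return listed genes (in order) that have no matching FASTA header."""
    ids = fasta_ids(fasta_lines)
    genes = [g.strip() for g in gene_lines if g.strip()]
    return [g for g in genes if g not in ids]


# Toy data: the sequences are made up, the gene names are from the thread.
toy_fasta = [
    ">E4U35_000848 hypothetical protein",
    "MKTAYIAK",
    ">E4U35_005859",
    "MSLLTEVE",
]
toy_gene_list = ["E4U35_000848", "Ecoen_23832", "E4U35_005859"]

print(genes_missing_from_fasta(toy_gene_list, toy_fasta))

# With real files (paths are placeholders for your own):
# with open("Putative_Horizontal_Gene_Transfer.txt") as genes, open("E4U35.fa") as fa:
#     print(genes_missing_from_fasta(genes, fa))
```

An empty result is the expected outcome; any gene reported as missing suggests that the list was written incorrectly.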

All the best
David

@PlantDr430
Author

PlantDr430 commented Oct 3, 2019

David,

Thank you, this clears things up.

I did take a look in my E4U35.fa and I do not have Ecoen_23832 (or any of the other non-E4U35 genes, i.e. E4U10_001773, E4U61_005231, E4U22_008391) in the E4U35.fa file.

I also looked in the SequenceIDs.txt and could not find any of these gene names associated with the given '104' ID for this species. Attached are my input fasta, the Species104.fa, SequenceIDs.txt, and SpeciesIDs.txt. Note that I do have some #'s in the SpeciesIDs.txt as I made some mistakes and needed to remove some fasta files from the analysis and replace them with the correct versions.
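
A lookup like this can be scripted along the following lines. The sketch assumes each SequenceIDs.txt line has the form `<speciesIndex>_<sequenceIndex>: <original FASTA header>`. That layout is an assumption here and worth checking against your own file. The function name, the species index 17 and the toy entries are hypothetical.

```python
def genes_for_species(sequence_id_lines, species_id):
    """Gene names (first header token) whose internal ID belongs to species_id."""
    prefix = f"{species_id}_"
    genes = set()
    for line in sequence_id_lines:
        internal_id, sep, header = line.partition(": ")
        if not sep or not internal_id.startswith(prefix):
            continue
        tokens = header.split()
        if tokens:
            genes.add(tokens[0])
    return genes


# Toy entries in the assumed format; real files have one line per sequence.
toy_sequence_ids = [
    "104_0: E4U35_000001",
    "104_1: E4U35_000002 hypothetical protein",
    "17_0: Ecoen_23832",
]

species_104 = genes_for_species(toy_sequence_ids, 104)
print(sorted(species_104))
print("Ecoen_23832" in species_104)
```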

So if you are saying that I shouldn't be seeing Ecoen_23832, E4U10_001773, E4U61_005231 or E4U22_008391 in my Putative_Horizontal_Gene_Transfer.txt file for species E4U35, then I am not sure what is happening. I am actually re-running the ortholog step with the newest version, as STRIDE didn't pick the correct outgroup, so I had to manually re-root the species tree and re-run using -ft and -s.

LM60.fa.gz
SequenceIDs.txt.gz
Species104.fa.gz
SpeciesIDs.txt

@PlantDr430
Author

PlantDr430 commented Oct 3, 2019

Also,

I just looked at the OG0001588 gene tree and saw this. I do see that E4U35_001896 (red line) is placed somewhat close to E4U32_004495 (blue line), Scere_5090 and Uvire_1881. However, I see it grouped more closely with other genes. When I look into the Putative_Xenologues files, for example LM207, I do not find E4U10_001774 classified as a xenologue of E4U32_004495 (blue line), Scere_5090 or Uvire_1881. Also, E4U10_001774 is not classified as a xenologue of E4U35_001896. Is this common? This might also be a problem, as this was run with an incorrectly rooted species tree: my initial species tree did not root Saccharomyces and Ustilago as the outgroups. So this may be cleared up in my re-run.

Gene_tree_OG0001588

EDIT: I have attached that specific gene tree file
non_recon_OG0001588_tree.txt
recon_OG0001588_tree.txt

@PlantDr430
Author

Update:

I re-ran my gene reconciliation with the latest version of OrthoFinder, and in my Phylogenetically_Misplaced_Genes/LM60.txt (which corresponds to the E4U35 locus prefix) I have these results:

E4U35_000848
E4U49_000404
Ecoen_23832
E4U35_005859
E4U35_005965
E4U35_007277
E4U35_001229
E4U35_008393
E4U61_005231
E4U22_008391
E4U35_000478

I double-checked my input FASTA files again and don't find any of these non-E4U35 genes within my LM60.fa.
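
For reference, the two lists posted in this thread can be compared with a short script. It assumes that the locus-tag prefix (the part before the last underscore) identifies the species, which holds for this dataset but not in general. The function name is hypothetical.

```python
# Gene lists copied from the two runs reported in this thread.
first_run = [
    "E4U35_000848", "E4U49_000404", "Ecoen_23832", "E4U35_005859",
    "E4U35_005965", "E4U35_007277", "E4U35_006719", "E4U35_001229",
    "E4U35_008393", "E4U10_001773", "E4U61_005231", "E4U22_008391",
    "E4U35_000478",
]
rerun = [
    "E4U35_000848", "E4U49_000404", "Ecoen_23832", "E4U35_005859",
    "E4U35_005965", "E4U35_007277", "E4U35_001229", "E4U35_008393",
    "E4U61_005231", "E4U22_008391", "E4U35_000478",
]


def foreign_genes(genes, expected_prefix):
    """Genes whose locus-tag prefix differs from the species' own prefix."""
    return [g for g in genes if g.rsplit("_", 1)[0] != expected_prefix]


# Genes that should not be in the E4U35 (LM60) file at all.
foreign = foreign_genes(rerun, "E4U35")
# Genes listed in the first run but no longer listed after re-rooting.
dropped = sorted(set(first_run) - set(rerun))

print(foreign)
print(dropped)
```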

However, with the correct species tree E4U35_001896 is no longer a putative xenologue to E4U32_004495, Scere_5090, and Uvire_1881. In fact it is now showing as an ortholog to LM207's E4U10_001774.

Attached is the new gene tree for OG0001588 from this latest re-run with a manual re-root.
OG0001588_tree.txt

@davidemms
Owner

Hi

There was an error in writing out these genes to the Phylogenetically_Misplaced_Genes files which I've submitted a fix for, thanks for chasing up on this! It resulted in not all genes being listed in these files and genes being written to the wrong species file. If you try out the code from the master branch then it should write them all out correctly.

All the best
David
