Recipe Advice for Annotating Unknown Genes from RNAseq Analysis Using Orthologs from Related Species #538

MauriAndresMU1313 · 2024-10-02T16:38:35Z

Hi everyone, this is not an issue report; I’m looking for advice on applying the recipes in the "A few recipes" section of the v2.1.5 to v2.1.12 documentation.

A little context: I ran an RNAseq analysis, and my output is the count-genes.tsv file. Because I used RefSeq reference genomes, most genes in this file are annotated well, i.e., mapped to their corresponding gene symbol. However, some genes have no associated gene symbol and appear as LOCXXXXXXX (where each X is a digit).
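
A minimal sketch for listing the LOC-type genes in the count table. It assumes (hypothetically) that count-genes.tsv is tab-separated, has one header row, and holds the gene identifier in the first column; adjust to the real layout. In RefSeq annotations, a LOC symbol is "LOC" followed by the NCBI Gene ID, so the sketch also keeps that number, which is useful later for linking to the GFF GeneID field. The function name `find_loc_genes` is illustrative only.

```python
import csv
import io
import re
from typing import Dict, Iterable

# RefSeq placeholder symbols: "LOC" followed by the numeric NCBI Gene ID.
LOC_PATTERN = re.compile(r"^LOC(\d+)$")


def find_loc_genes(handle: Iterable[str]) -> Dict[str, str]:
    """Map each LOC-type symbol in the first column to its numeric Gene ID.

    Assumes a tab-separated table with one header row and the gene
    identifier in the first column (adjust to the real file layout).
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    loc_genes = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        match = LOC_PATTERN.match(row[0])
        if match:
            loc_genes[row[0]] = match.group(1)
    return loc_genes


if __name__ == "__main__":
    example = io.StringIO(
        "gene\tsample1\tsample2\n"
        "TP53\t120\t98\n"
        "LOC100123456\t15\t22\n"
        "LOC105987654\t3\t0\n"
    )
    print(find_loc_genes(example))
```

With the example table this prints `{'LOC100123456': '100123456', 'LOC105987654': '105987654'}`; in practice you would pass an open file handle instead of the `StringIO` object.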

I plan to find orthologs for those genes in related species to increase the number of annotated genes. With this in mind, I ran OrthoFinder with related (mammalian) species. In short, the output is a set of orthogroup FASTA files, each containing the orthologous proteins of one orthogroup, with protein IDs in the format NP_XXXXXXXX or XP_XXXXXXXX. The next step is to use eggnog-mapper to obtain functional annotations for the proteins in each orthogroup.
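
A sketch of how the orthogroup FASTA files could be turned into a protein-to-orthogroup table, so that each NP_/XP_ accession can later be traced back to its orthogroup. It assumes one FASTA file per orthogroup named like `OG0000001.fa` (the file stem is taken as the orthogroup ID) and that each header contains a RefSeq protein accession; both are assumptions to check against your OrthoFinder output. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, List

# RefSeq protein accessions: NP_ (curated) or XP_ (model), optional ".version".
# The lookbehind avoids matching inside longer alphanumeric tokens.
ACCESSION = re.compile(r"(?<![A-Za-z0-9])([NX]P_\d+(?:\.\d+)?)")


def parse_accessions(fasta_text: str) -> List[str]:
    """Extract the first RefSeq protein accession from each FASTA header."""
    accessions = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            match = ACCESSION.search(line)
            if match:
                accessions.append(match.group(1))
    return accessions


def protein_to_orthogroup(fasta_dir: Path) -> Dict[str, str]:
    """Map each protein accession to the orthogroup file it appears in.

    Assumes one FASTA per orthogroup with a ".fa" suffix; the file stem
    (e.g. "OG0000001") is used as the orthogroup ID.
    """
    mapping = {}
    for path in sorted(fasta_dir.glob("*.fa")):
        for accession in parse_accessions(path.read_text()):
            mapping[accession] = path.stem
    return mapping
```

Keeping the version suffix (`.1`, `.3`) is deliberate: the same accession string is what a RefSeq GFF usually reports in `protein_id`, which makes the later joins exact.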

Here’s where I’m a little confused: I will get the annotations, but how can I trace each functional annotation back to its gene and determine whether that gene is a LOCXXXXXXX-type gene? For example, the "A few recipes" section includes options like:

- Run search and annotation, using MMseqs after translating input CDS to proteins. Add the search and annotation results to the attributes of an existing GFF file (GFF decoration), using the GeneID field to link features from the GFF to the annotation results. (This seems the most appropriate to me, because I can download GFF files from RefSeq genomes.)
- Run gene prediction using a genome to train Prodigal
- Repeat the annotation step, using specific taxa as target and reporting the one-to-one orthologs found. (This seems like another option, but I’m concerned that it depends on the number of species in the phylogeny, and I don’t have many.)
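
For the GFF-based route, the link between a protein accession and its gene can be read directly from the RefSeq GFF3. This sketch assumes the attribute names commonly seen on CDS rows of NCBI RefSeq GFF3 files (`protein_id`, `gene`, and `Dbxref=GeneID:...`); verify them against your own file. It is an illustration of the lookup, not eggnog-mapper's own GFF decoration logic, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple
from urllib.parse import unquote


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string into a key -> value dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes special characters
    return attrs


def protein_to_gene(gff_lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map protein_id -> (GeneID, gene symbol) using the CDS rows of a GFF3.

    A protein spanning several exons has several CDS rows; they all carry
    the same attributes, so overwriting the entry is harmless.
    """
    mapping = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        protein = attrs.get("protein_id")
        if not protein:
            continue
        gene_id = ""
        for xref in attrs.get("Dbxref", "").split(","):
            if xref.startswith("GeneID:"):
                gene_id = xref.split(":", 1)[1]
        mapping[protein] = (gene_id, attrs.get("gene", ""))
    return mapping
```

Because the LOC number and the GeneID are the same value, this table also lets you cross-check each protein against the LOC genes found in the count table.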

Do you think these ideas are realistic? Even after I get the functional annotations of the orthologs, I will need to trace them back to their respective positions on the chromosome and check whether the gene symbol is unknown. Then, maybe I can use a threshold parameter to decide when it is safe to replace the unknown gene symbol with that of its ortholog.
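
A sketch of the final join: read the eggnog-mapper annotations, keep only proteins that map to a LOC gene, and propose a replacement symbol when the hit passes a cutoff. It assumes the v2.1.x `.emapper.annotations` layout (comment lines starting with `##`, a header row starting with `#query`, and `evalue` and `Preferred_name` columns, with `-` for missing values). The `max_evalue` cutoff and its default are hypothetical, and an e-value alone is a weak criterion for transferring a gene symbol, so treat the result as candidates to review. Proteins from other species simply have no entry in the protein-to-gene table and are skipped.

```python
import csv
import re
from typing import Dict, Iterable, Tuple

LOC_SYMBOL = re.compile(r"^LOC\d+$")


def read_emapper(lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Return query -> (Preferred_name, evalue) from .emapper.annotations lines.

    Assumes '##' comment lines and a '#query' header row (v2.1.x layout).
    """
    rows = (line for line in lines if not line.startswith("##"))
    reader = csv.DictReader(rows, delimiter="\t")
    return {
        row["#query"]: (row["Preferred_name"], float(row["evalue"]))
        for row in reader
    }


def propose_symbols(
    annotations: Dict[str, Tuple[str, float]],
    protein_to_gene: Dict[str, Tuple[str, str]],
    max_evalue: float = 1e-10,  # hypothetical cutoff; tune for your data
) -> Dict[str, Tuple[str, float]]:
    """Propose LOC symbol -> (new name, evalue), keeping the best hit per gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for protein, (name, evalue) in annotations.items():
        gene = protein_to_gene.get(protein)
        if gene is None:
            continue  # protein not from the annotated genome
        _gene_id, symbol = gene
        if not LOC_SYMBOL.match(symbol):
            continue  # gene already has an informative symbol
        if name in ("", "-") or evalue > max_evalue:
            continue  # no preferred name, or hit too weak
        if symbol not in best or evalue < best[symbol][1]:
            best[symbol] = (name, evalue)
    return best
```

Keeping the lowest e-value per gene handles the case where several isoforms (several XP_ accessions) of one LOC gene receive annotations.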
In general, I’m looking for guidance on using eggnog-mapper for the workflow I have in mind. I’m posting here because some papers have used eggnog-mapper to map genes to their orthologs.

Any comment, suggestion, or idea is more than welcome!

MauriAndresMU1313 closed this as completed Oct 11, 2024

MauriAndresMU1313 reopened this Oct 11, 2024
