collapse_annotation.py cannot process the gtf file generated by gffread #71

Open
biozzq opened this issue Jun 27, 2022 · 2 comments

@biozzq
biozzq commented Jun 27, 2022

Dear all,

To prepare the GTF file used by RNA-SeQC, I first converted my GFF file to GTF with the following command:

gffread-0.12.7.Linux_x86_64/gffread -T -o out.gtf input.gff

However, running collapse_annotation.py out.gtf collapse.gtf then gives this error:

Traceback (most recent call last):
  File "collapse_annotation.py", line 294, in <module>
    annotation = Annotation(args.transcript_gtf)
  File "collapse_annotation.py", line 89, in __init__
    attributes.pop('transcript_type'), g, start_pos, end_pos)
KeyError: 'transcript_type'

Based on the above error message, I added gene_biotype and transcript_biotype attributes to the end of each line:
perl -e 'while(<>){chomp; print $_," gene_biotype \"protein_coding\"; transcript_biotype \"protein_coding\";\n"}' out.gtf >processed.gtf

Finally, when running collapse_annotation.py processed.gtf collapse.gtf, another error occurred:

Traceback (most recent call last):
  File "collapse_annotation.py", line 294, in <module>
    annotation = Annotation(args.transcript_gtf)
  File "collapse_annotation.py", line 89, in __init__
    attributes.pop('transcript_type'), g, start_pos, end_pos)
UnboundLocalError: local variable 'g' referenced before assignment

I attached the processed.gtf here. How should this be handled?
processed.zip

Thank you in advance.
Best wishes,
Zheng zhuqing

@francois-a
Collaborator

RNA-SeQC requires a GTF in the format specified at https://www.gencodegenes.org/pages/data_format.html, with a gene > transcript > exon hierarchy in the feature type column (additional features such as CDS are also supported). Your GTF is missing gene features; it only has transcript and exon features.
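
For a GTF that has transcript and exon rows but no gene rows, one possible workaround is to synthesize the gene rows from the gene_id attribute. The Python sketch below illustrates the idea. `add_gene_features` is a hypothetical helper, not part of RNA-SeQC or gffread. It assumes tab-separated rows that all carry a gene_id attribute, and it writes only gene_id on the new rows, so other attributes listed in the GENCODE format (for example gene_type and gene_name) would still need to be added separately.

```python
import re

# Matches the gene_id attribute in column 9 of a GTF row.
GENE_ID_RE = re.compile(r'(?:^|;)\s*gene_id "([^"]+)"')


def add_gene_features(lines):
    """Return GTF lines with one synthesized 'gene' row ahead of each gene's rows.

    Hypothetical helper: assumes tab-separated GTF rows that all carry a
    gene_id attribute. Rows are regrouped by gene, in first-seen order.
    """
    comments = []
    genes = {}  # gene_id -> summary of that gene; dicts keep insertion order
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            comments.append(line)
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line}")
        match = GENE_ID_RE.search(fields[8])
        if match is None:
            raise ValueError(f"no gene_id attribute: {line}")
        gene_id = match.group(1)
        start, end = int(fields[3]), int(fields[4])
        gene = genes.setdefault(gene_id, {
            "chrom": fields[0], "source": fields[1], "strand": fields[6],
            "start": start, "end": end, "has_gene_row": False, "rows": [],
        })
        # The gene spans the smallest start to the largest end of its rows.
        gene["start"] = min(gene["start"], start)
        gene["end"] = max(gene["end"], end)
        if fields[2] == "gene":
            gene["has_gene_row"] = True
        gene["rows"].append(line)

    out = list(comments)
    for gene_id, gene in genes.items():
        if not gene["has_gene_row"]:
            out.append("\t".join([
                gene["chrom"], gene["source"], "gene",
                str(gene["start"]), str(gene["end"]),
                ".", gene["strand"], ".", f'gene_id "{gene_id}";',
            ]))
        out.extend(gene["rows"])
    return out
```

To apply it to a file, read the lines of out.gtf, pass them to `add_gene_features`, and write the returned lines (each followed by a newline) to a new file. Whether the result is then accepted by collapse_annotation.py depends on which attributes that script expects, which this sketch does not check.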

@KristinaGagalova

Is there a tool to convert a gtf to the required format? I am also having issues with that.
Thank you in advance
