Missing or interrupted b-allele tracks for Gens #2027

Closed
dnil opened this issue Apr 13, 2023 · 10 comments

dnil commented Apr 13, 2023

Describe the bug
Occasionally, B-allele plots in Gens are missing data for parts of chromosomes or for several entire chromosomes.
It could be an issue with the internal processing in Gens, but since Gens runs stably at another site, an interaction issue with the pipeline seems more likely. Maybe some job crash reports are not getting through? Perhaps certain DeepVariant calls differ slightly from the GATK ones?

Screenshots

[Screenshot 2023-04-13 at 11 29 59]

[Screenshot 2023-04-13 at 11 49 33]


Software version:

  • MIP: 11.1.3

Additional context
See e.g. cases fastemu, ingrouse, finerlemur, rightbee and alerthermit (missing everything), or neatsheep, where the track stops mid-chromosome.

dnil added the Bug label Apr 13, 2023

jemten commented Apr 13, 2023

Hmm, generate_gens_data.pl spews out a ton of warnings without failing. I'll have to look into what's actually happening. I think we run the standard script from Lund; maybe @raysloks can confirm? Do you have a case that actually succeeded?


dnil commented Apr 13, 2023

> Hmm the generate_gens_data.pl spews out a ton of warnings, without failing. Will have to look into what's actually happening. I think we run the standard script from Lund, maybe @raysloks can confirm? Do you have a case that actually succeeded?

Excellent. Most cases actually do work; either click through a couple in Scout, or I can help with that a little later!


dnil commented Apr 13, 2023

E.g. heroicracer looks fine. For a trio, see novelpanda (ignore chr15; I think that has biological reasons 😁): the index case is good, but the parents are very slow to load. poeticmullet is also very slow to load; missing index files, perhaps?

But I will say the likelihood of coming across a partly broken one seems rather high.


raysloks commented Apr 14, 2023

> I think we run the standard script from Lund, maybe @raysloks can confirm?

Sort of: that script is the same as the one used in Lund. However, it invokes gvcfvaf.pl, which I've modified to support INFO subfields appearing in any order.

Edit: FORMAT/sample subfields, not INFO subfields.
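To illustrate order-independent FORMAT parsing, here is a minimal Python sketch; it is not the actual gvcfvaf.pl change, and the function name `parse_sample` is hypothetical. It assumes a standard tab-separated (g)VCF data line, where the ninth column holds the colon-separated FORMAT keys and each following column holds one sample. Looking values up by key means GT, AD and DP are found wherever they appear.

```python
def parse_sample(vcf_line: str, sample_index: int = 0) -> dict[str, str]:
    """Map FORMAT keys to one sample's values, whatever their order."""
    cols = vcf_line.rstrip("\n").split("\t")
    format_keys = cols[8].split(":")  # e.g. ['AD', 'GT', 'DP']
    sample_values = cols[9 + sample_index].split(":")
    # zip() stops at the shorter list, so a sample column with dropped
    # trailing fields yields fewer keys instead of an out-of-range read.
    return dict(zip(format_keys, sample_values))


# FORMAT subfields deliberately out of the usual GT-first order.
line = "1\t1000\t.\tA\tG\t50\tPASS\t.\tAD:GT:DP\t7,5:0/1:12"
fields = parse_sample(line)
print(fields["GT"], fields["AD"], fields.get("DP"))  # prints: 0/1 7,5 12
```

The VCF specification allows trailing sample fields to be omitted, which is why the sketch relies on `zip()` rather than indexing by position.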


dnil commented Apr 14, 2023

We can confirm missing .baf.bed.gz.tbi for the parents in novelpanda.
[Screenshot 2023-04-14 at 09 36 15]
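One way to find other affected samples is to look for BAF files that have no tabix index. The Python sketch below assumes the `.baf.bed.gz` / `.baf.bed.gz.tbi` naming mentioned above, with the index written next to the data file; the function name `missing_tabix_indexes` is hypothetical, and it only checks for `.tbi` (not `.csi`) indexes.

```python
from pathlib import Path


def missing_tabix_indexes(root: str) -> list[Path]:
    """List every *.baf.bed.gz under root that has no matching .tbi index."""
    missing = []
    for bed in sorted(Path(root).rglob("*.baf.bed.gz")):
        # tabix writes the index alongside the data file as <name>.tbi
        if not bed.with_name(bed.name + ".tbi").exists():
            missing.append(bed)
    return missing
```

For example, `missing_tabix_indexes("/path/to/cases")` (a placeholder path) returns the BAF files whose index build failed or never ran.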

raysloks commented:

I've looked at a couple of error logs from gens_generatedata.
This one results in a missing index file:

Calculating coverage data
Calculating BAFs from gvcf...
Argument "." isn't numeric in numeric ne (!=) at /bin/gvcfvaf.pl line 97, <GVCF> line 160.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 94, <GVCF> line 160.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 101, <GVCF> line 160.
Argument "." isn't numeric in numeric ne (!=) at /bin/gvcfvaf.pl line 97, <GVCF> line 231.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 94, <GVCF> line 231.
...
...
...
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 101, <GVCF> line 153308377.
Argument "." isn't numeric in numeric ne (!=) at /bin/gvcfvaf.pl line 97, <GVCF> line 153308381.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 94, <GVCF> line 153308381.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 101, <GVCF> line 153308381.
Argument "." isn't numeric in numeric ne (!=) at /bin/gvcfvaf.pl line 97, <GVCF> line 153308423.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 94, <GVCF> line 153308423.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 101, <GVCF> line 153308423.
22 variants skipped!
Outputting BAF o...
Outputting BAF a...
Outputting BAF b...
Outputting BAF c...
Outputting BAF d...
[E::hts_idx_push] Unsorted positions on sequence #10: 103801078 followed by 92729143
tbx_index_build failed: /home/proj/production/rare-disease/cases/amazedjay/analysis/ACC11462A1/gens_generatedata/ACC11462A1_lanes_1234_sorted_md.baf.bed.gz

This one results in an extremely tiny file:

Calculating coverage data
Calculating BAFs from gvcf...
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 94, <GVCF> line 165.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 101, <GVCF> line 165.
Argument "." isn't numeric in numeric ne (!=) at /bin/gvcfvaf.pl line 97, <GVCF> line 226.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 94, <GVCF> line 226.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 101, <GVCF> line 226.
...
...
...
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 101, <GVCF> line 139635282.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 94, <GVCF> line 139635286.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 101, <GVCF> line 139635286.
Argument "." isn't numeric in numeric ne (!=) at /bin/gvcfvaf.pl line 97, <GVCF> line 139635321.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 94, <GVCF> line 139635321.
Use of uninitialized value within @fmt in string eq at /bin/gvcfvaf.pl line 101, <GVCF> line 139635321.
21 variants skipped!
Outputting BAF o...
readline() on closed filehandle $fh at /bin/generate_gens_data.pl line 60.
Outputting BAF a...
readline() on closed filehandle $fh at /bin/generate_gens_data.pl line 60.
Outputting BAF b...
readline() on closed filehandle $fh at /bin/generate_gens_data.pl line 60.
Outputting BAF c...
readline() on closed filehandle $fh at /bin/generate_gens_data.pl line 60.
Outputting BAF d...
readline() on closed filehandle $fh at /bin/generate_gens_data.pl line 60.


jemten commented Apr 21, 2023

Are you looking into this error, @raysloks? Otherwise, I'll have a look at fixing the issue in the Perl script.

raysloks commented:

Yes, I started looking into it more closely yesterday.
I have found the reason for the warnings.

One of them was present in the original script from Lund (it can't handle '.' as a value in the GT-subfield).

The other was introduced when I made it able to handle arbitrarily ordered FORMAT/sample subfields. The loops ran past the end of the @fmt array, which originally didn't matter because they exited early (upon reaching the DP subfield). Due to my limited experience with Perl, I didn't notice this when copying the for loop.

However, I don't know if fixing those two issues will fix the output.
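Both warnings come from comparing or reading values that are not there: a '.' where a number is expected, and an array index past the end of @fmt. A minimal Python sketch of the defensive pattern follows. It assumes the sample's FORMAT keys are already mapped to their values, and computes a B-allele fraction as alt / (ref + alt) from the first two AD values; that formula is an illustrative assumption, gvcfvaf.pl's actual calculation may differ, and `b_allele_fraction` is a hypothetical name.

```python
from typing import Optional


def b_allele_fraction(fields: dict[str, str]) -> Optional[float]:
    """Return alt / (ref + alt) from AD, or None if the site must be skipped.

    Sites are skipped when GT is missing or contains a no-call allele
    ('.', './.', '.|.'), when AD is absent or holds '.', or when the depth
    is zero. Using dict.get() means an absent subfield never reads past
    the end of a list.
    """
    # Normalise phased genotypes ('0|1') to the unphased separator.
    alleles = fields.get("GT", ".").replace("|", "/").split("/")
    if "." in alleles:
        return None  # no-call: nothing numeric to compare against

    depths = fields.get("AD", ".").split(",")
    if len(depths) < 2 or "." in depths[:2]:
        return None  # AD missing or incomplete

    ref, alt = int(depths[0]), int(depths[1])
    if ref + alt == 0:
        return None  # avoid division by zero at uncovered sites
    return alt / (ref + alt)
```

Returning `None` lets the caller count and skip such sites, similar to the "variants skipped!" summary in the logs, instead of emitting a warning per line.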


dnil commented Apr 21, 2023

Good, that's a start. You will most likely also need to fix the sort-order issue that tabix runs into at https://github.com/Clinical-Genomics/gens/blob/ef1c512d9246c7ee9022845d9e2db611cf1c96db/utils/generate_gens_data.pl#L46
I guess that is the source of this kind of error:

[E::hts_idx_push] Unsorted positions on sequence #10: 103801078 followed by 92729143
tbx_index_build failed: /home/proj/production/rare-disease/cases/amazedjay/analysis/ACC11462A1/gens_generatedata/ACC11462A1_lanes_1234_sorted_md.baf.bed.gz
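tabix requires each sequence's records to form one contiguous block ordered by start position; the error means a record on the same sequence started before the one preceding it. The thread does not establish why the script emitted records out of order, so the following is only a Python sketch of the required ordering, with a hypothetical function name. Chromosomes keep their first-seen order, and records within a chromosome are sorted numerically by the BED start column.

```python
def sort_bed_lines(lines: list[str]) -> list[str]:
    """Order BED lines so each chromosome is one block sorted by start."""
    # Remember the order in which chromosomes first appear.
    first_seen: dict[str, int] = {}
    for line in lines:
        chrom = line.split("\t", 1)[0]
        first_seen.setdefault(chrom, len(first_seen))

    def key(line: str) -> tuple[int, int]:
        cols = line.split("\t")
        # Numeric, not lexical, comparison of the start column.
        return first_seen[cols[0]], int(cols[1])

    return sorted(lines, key=key)
```

A common shell equivalent is `sort -k1,1 -k2,2n` before `bgzip` and `tabix -p bed`; it orders chromosomes lexically, which tabix also accepts as long as each chromosome stays contiguous.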

raysloks self-assigned this Apr 24, 2023

jemten commented May 9, 2023

Hopefully solved with #2039

jemten closed this as completed May 9, 2023