
Expand cattle-outbreak beyond B3.13 #139

Open
jameshadfield opened this issue Feb 24, 2025 · 0 comments


jameshadfield commented Feb 24, 2025

Now that B3.13 filtering defines the cattle-outbreak genome build, we can no longer include strains with fewer than 8 sequenced segments (and thus the implementation in #111 is outdated). We will also drop some strains because their GenoFLU calls aren't B3.13. Compared with the last successful cattle-flu dataset, the following strains will be dropped for not being B3.13:

$ cat data/ncbi/metadata.tsv | csvtk grep -t -f strain -P auspice.strains.tsv | csvtk cut -t -f strain,genoflu | grep -v B3.13 | csvtk pretty -t
strain                                  genoflu                                                                           
-------------------------------------   ----------------------------------------------------------------------------------
A/cattle/Texas/24-009499-002/2024       Not assigned: No Matching Genotypes                                               
A/cattle/Texas/24-009308-003/2024       Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/cattle/NewMexico/24-010195-004/2024   Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/cattle/Colorado/Broad_MD_041/2024     Not assigned: Only 6 segments >98.0% match found of total 8 segments in input file
A/cattle/Colorado/Broad_ME_003/2024     Not assigned: Only 6 segments >98.0% match found of total 8 segments in input file
A/cattle/Idaho/Broad_ME_018/2024        Not assigned: Only 6 segments >98.0% match found of total 8 segments in input file
A/cattle/Idaho/Broad_ME_020/2024        Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/cattle/Colorado/Broad_MF_011/2024     Not assigned: Only 5 segments >98.0% match found of total 8 segments in input file
A/cattle/Missouri/Broad_MD_031/2024     Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/cattle/Texas/Broad_MD_027/2024        Not assigned: Only 5 segments >98.0% match found of total 8 segments in input file
A/cattle/Colorado/Broad_MF_016/2024     Not assigned: Only 6 segments >98.0% match found of total 8 segments in input file
A/cattle/Michigan/Broad_ME_010/2024     Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/Cattle/USA/24-031346-001/2024         Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/Cattle/USA/24-032636-001/2024         Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/Cattle/USA/24-034010-002/2024         Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/Cattle/USA/24-034010-001/2024         Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/Cattle/USA/24-033997-001/2024         Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/PETFOOD/USA/24-037325-013/2024        Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/PETFOOD/USA/24-037325-012/2024        Not assigned: Only 5 segments >98.0% match found of total 8 segments in input file
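
To see at a glance how far each dropped strain is from a full match, the `genoflu` values can be grouped by the number of segments that matched. The following is a minimal sketch, assuming the message format stays exactly as printed in the table above; `summarize_genoflu` and `PARTIAL` are hypothetical names, not part of the workflow.

```python
import re
from collections import Counter

# Matches the partial-match message format seen in the genoflu column above.
PARTIAL = re.compile(
    r"Only (\d+) segments >[\d.]+% match found of total (\d+) segments"
)


def summarize_genoflu(calls):
    """Count genoflu values, grouping partial matches by segments matched."""
    counts = Counter()
    for call in calls:
        match = PARTIAL.search(call)
        if match:
            counts[f"{match.group(1)}/{match.group(2)} segments matched"] += 1
        else:
            # Anything else (e.g. "No Matching Genotypes") is counted verbatim.
            counts[call] += 1
    return counts


# A few values copied from the table above.
example_calls = [
    "Not assigned: No Matching Genotypes",
    "Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file",
    "Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file",
    "Not assigned: Only 5 segments >98.0% match found of total 8 segments in input file",
]

for label, n in summarize_genoflu(example_calls).most_common():
    print(n, label)
```

In practice the input would be the `genoflu` column of `data/ncbi/metadata.tsv` after the same strain filtering as the command above.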

Including full genome strains which don't match B3.13

We may wish to relax the 98% cutoff. Looking at some of the examples above, the number of Ns may be behind their exclusion:

  • A/cattle/Texas/24-009499-002/2024 has 4.5 kb of Ns on the branch leading to it but few mutations, which suggests it is likely part of the outbreak
  • A/cattle/Texas/24-009308-003/2024 similarly has 4.5 kb of Ns
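
One way to check the Ns hypothesis more broadly would be to measure the N content of each dropped strain's sequences. A rough sketch, assuming the sequences are already loaded into a dict of name to sequence string; the function names and the `max_fraction` threshold are hypothetical, and whether Ns actually lower GenoFLU's match percentage is only suspected here, not confirmed.

```python
def n_summary(sequence):
    """Return (number of Ns, fraction of Ns) for a nucleotide sequence."""
    seq = sequence.upper()
    n_count = seq.count("N")
    fraction = n_count / len(seq) if seq else 0.0
    return n_count, fraction


def flag_high_n(sequences, max_fraction=0.05):
    """Return sorted names of sequences whose N fraction exceeds max_fraction.

    max_fraction is an arbitrary illustrative threshold, not a GenoFLU setting.
    """
    return sorted(
        name
        for name, seq in sequences.items()
        if n_summary(seq)[1] > max_fraction
    )


# Toy sequences for illustration only (not real data).
toy = {
    "strain_with_gaps": "ACGTNNNNNN",
    "strain_complete": "ACGTACGTAC",
}
print(flag_high_n(toy))
```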

Including strains with fewer than 8 segments sequenced

If we modify GenoFLU to report segment-level calls for strains with <8 segments, then we can match on (e.g.) "7 segments sequenced and all agree with B3.13 constellation". This improvement to GenoFLU was also mentioned here as being desirable more generally.
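
The matching rule could look something like the sketch below. This assumes a hypothetical output format in which the modified GenoFLU reports one label per sequenced segment (and nothing for missing segments); GenoFLU does not currently provide this, and the `label-…` values are placeholders rather than real B3.13 constellation labels.

```python
SEGMENTS = ("pb2", "pb1", "pa", "ha", "np", "na", "mp", "ns")


def agrees_with_constellation(segment_calls, constellation, min_segments=7):
    """True if enough segments were sequenced and every one matches.

    segment_calls: dict of segment name -> label, with None (or a missing
    key) for segments that were not sequenced. Hypothetical format.
    constellation: dict of segment name -> expected label for the genotype.
    """
    sequenced = {
        seg: call for seg, call in segment_calls.items() if call is not None
    }
    if len(sequenced) < min_segments:
        return False
    return all(constellation.get(seg) == call for seg, call in sequenced.items())


# Placeholder constellation; real labels would come from GenoFLU's tables.
placeholder_b313 = {seg: f"label-{seg}" for seg in SEGMENTS}

# A strain missing NS whose other 7 segments all agree.
seven_matching = {seg: placeholder_b313[seg] for seg in SEGMENTS if seg != "ns"}

# Same strain, but HA carries a different label.
seven_one_mismatch = dict(seven_matching, ha="label-other")

print(agrees_with_constellation(seven_matching, placeholder_b313))
print(agrees_with_constellation(seven_one_mismatch, placeholder_b313))
```

Lowering `min_segments` would admit strains with fewer sequenced segments, at the cost of weaker evidence that they belong to the constellation.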
