Ingest extra sources #35

Closed
wants to merge 6 commits into from
15 changes: 11 additions & 4 deletions README.md
@@ -60,18 +60,25 @@ Note that you may need to remove any existing data in `results/` in order for sn

#### Using locally ingested data (instead of downloading from S3)

Run the pipeline with `--config 'local_ingest=True'` to use the locally available files produced by the ingest pipeline (see `./ingest/README.md` for details on how to run).
Specifically, the files needed are `ingest/results/metadata.tsv` and `ingest/results/sequences_{SEGMENT}.fasta`.
Run the pipeline with `--config 'local_ingest=fauna'` to use the locally available files produced by the ingest pipeline (see `./ingest/README.md` for details on how to run).
Specifically, the files needed are `ingest/results/fauna/metadata.tsv` and `ingest/results/fauna/sequences_{SEGMENT}.fasta`.
Replace "fauna" with "genome" (or any other namespace which ingest can produce) as needed.


#### Running full genome builds

Run full genome builds with the following command.
Run full genome builds using the data on S3 (fauna) with the following command:

``` bash
```bash
nextstrain build . --snakefile Snakefile.genome
```

To include non-fauna data, first run the steps under "Ingest for whole genome builds" (see `ingest/README.md`), then run:
```bash
nextstrain build . --snakefile Snakefile.genome --config 'local_ingest=genome'
```


Currently this is only set up for the "h5n1-cattle-outbreak" build, and restricts the build to a set of strains where we think there's no reassortment (`config/include_strains_h5n1-cattle-outbreak.txt`). Output files will be placed in `results/h5n1-cattle-outbreak/genome`. See `Snakefile.genome` for more details.
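The `local_ingest` values described above act as a namespace: each one selects a subdirectory of `ingest/results/`. The sketch below makes that mapping concrete. It is a minimal illustration assuming the layout stated in this README; the helper `local_ingest_paths` is hypothetical and is not how the workflow itself resolves inputs.

```python
from pathlib import PurePosixPath


def local_ingest_paths(namespace: str, segments: list) -> dict:
    """Hypothetical helper: ingest outputs for one local_ingest namespace."""
    base = PurePosixPath("ingest/results") / namespace
    return {
        "metadata": str(base / "metadata.tsv"),
        # One FASTA per segment, e.g. sequences_ha.fasta
        "sequences": {seg: str(base / f"sequences_{seg}.fasta") for seg in segments},
    }


paths = local_ingest_paths("genome", ["ha", "na"])
print(paths["metadata"])         # ingest/results/genome/metadata.tsv
print(paths["sequences"]["na"])  # ingest/results/genome/sequences_na.fasta
```

Switching between `local_ingest=fauna` and `local_ingest=genome` only changes the `namespace` component; the file names inside each namespace stay the same.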


3 changes: 3 additions & 0 deletions config/include_strains_h5n1-cattle-outbreak.txt
@@ -222,6 +222,9 @@ A/raccoon/NewMexico/24009496002/2024

A/Texas/37/2024

# Following strains are sourced from ingest/source-data
A/environment/USA/CO-UW-9084466/2024

# Dropping these strains from include due to excess private mutations
# A/dairycattle/NorthCarolina/24010327002/2024
# A/dairycattle/Texas/24009495007/2024
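The include list mixes strain names with `#` comment lines, and strains are dropped by commenting them out. The sketch below shows one way to read such a file, assuming that everything after `#` is a comment and that blank lines are ignored. As far as we know, augur's strain-list files follow the same `#` convention, but `read_strain_list` is an illustrative helper, not augur's code.

```python
import io


def read_strain_list(handle) -> set:
    """Return strain names from an include-style list, ignoring '#' comments."""
    strains = set()
    for line in handle:
        # Drop everything after the first '#', then surrounding whitespace.
        name = line.split("#", 1)[0].strip()
        if name:
            strains.add(name)
    return strains


example = io.StringIO(
    "A/Texas/37/2024\n"
    "\n"
    "# Following strains are sourced from ingest/source-data\n"
    "A/environment/USA/CO-UW-9084466/2024\n"
    "# A/dairycattle/Texas/24009495007/2024\n"
)
print(sorted(read_strain_list(example)))
# ['A/Texas/37/2024', 'A/environment/USA/CO-UW-9084466/2024']
```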
17 changes: 16 additions & 1 deletion ingest/README.md
@@ -28,7 +28,7 @@ nextstrain build \
.
```

This command produces one metadata file, `results/metadata.tsv`, and one sequences file per gene segment like `results/sequences_ha.fasta`.
This command produces one metadata file, `results/fauna/metadata.tsv`, and one sequences file per gene segment like `results/fauna/sequences_ha.fasta`.
Each file represents all available subtypes.

Add the `upload_all` target to the command above to run the complete ingest pipeline _and_ upload results to AWS S3.
@@ -56,6 +56,21 @@ nextstrain build . merge_andersen_segment_metadata

The results will be available in `results/andersen-lab/`.


### Ingest for whole genome builds

> This section is in flux

To produce ingest files specifically tailored for the H5N1 cattle outbreak whole genome build, which combines fauna data
with extra data in `source-data`, run the following (you may need rethink credentials as above):

```sh
nextstrain build . all_genome
```

The results will be available in `results/genome/`.


## Configuration

### Environment Variables
17 changes: 13 additions & 4 deletions ingest/Snakefile
@@ -6,14 +6,23 @@ wildcard_constraints:
    segment = "|".join(config["segments"])

rule all:
    # As of 2024-05-16 the default ingest only ingests data from fauna
    input:
        sequences=expand("results/metadata_{segment}.tsv", segment=config["segments"]),
        metadata="results/metadata.tsv",
        sequences=expand("results/fauna/sequences_{segment}.fasta", segment=config["segments"]),
        metadata="results/fauna/metadata.tsv",

rule upload_all:
    # As of 2024-05-16 the default upload only uploads data from fauna
    input:
        sequences=expand("s3/sequences_{segment}.done", segment=config["segments"]),
        metadata="s3/metadata.done",
        sequences=expand("s3/fauna/sequences_{segment}.done", segment=config["segments"]),
        metadata="s3/fauna/metadata.done",

rule all_genome:
    input:
        sequences=expand("results/genome/sequences_{segment}.fasta", segment=config["segments"]),
        metadata="results/genome/metadata.tsv",

include: "rules/upload_from_fauna.smk"
include: "rules/ingest_andersen_lab.smk"
include: "rules/ingest_genome_data.smk"
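`rule all_genome` requests one merged FASTA per segment through Snakemake's `expand()`, which fills the `{segment}` wildcard with each entry of `config["segments"]`, while `wildcard_constraints` restricts `{segment}` to an alternation of those same names. The plain-Python sketch below approximates both. The segment list is an assumption for illustration (the config file is not part of this diff), and `re.fullmatch` only approximates how Snakemake applies the constraint inside the full file pattern.

```python
import re

# Assumed segment list for illustration only; the workflow reads config["segments"].
SEGMENTS = ["pb2", "pb1", "pa", "ha", "np", "na", "mp", "ns"]


def expand_segments(pattern: str, segments: list) -> list:
    """Fill a single {segment} wildcard, as expand(pattern, segment=segments) would."""
    return [pattern.format(segment=segment) for segment in segments]


# Inputs requested by `rule all_genome` under the assumed segment list.
all_genome_inputs = expand_segments("results/genome/sequences_{segment}.fasta", SEGMENTS)
all_genome_inputs.append("results/genome/metadata.tsv")

# wildcard_constraints builds an alternation such as "pb2|pb1|...|ns".
SEGMENT_PATTERN = "|".join(SEGMENTS)


def is_valid_segment(value: str) -> bool:
    """Approximate the constraint check by requiring the whole value to match."""
    return re.fullmatch(SEGMENT_PATTERN, value) is not None


print(all_genome_inputs[0])                             # results/genome/sequences_pb2.fasta
print(is_valid_segment("ha"), is_valid_segment("ha2"))  # True False
```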

31 changes: 31 additions & 0 deletions ingest/rules/ingest_genome_data.smk
@@ -0,0 +1,31 @@

## This ruleset is in flux and will change often
## Currently it merges hardcoded (committed) source-data with fauna-derived data
## No checking is done for duplicate strains

rule merge_genome_metadata:
    input:
        fauna = "results/fauna/metadata.tsv"
    params:
        source_data = "source-data/metadata.tsv"
    output:
        metadata = "results/genome/metadata.tsv"
    shell:
        """
        diff <(head -n 1 {params.source_data}) <(head -n 1 {input.fauna}) &&
        cp {params.source_data} {output.metadata} && \
        tail -n +2 {input.fauna} >> {output.metadata}
        """

rule merge_genome_sequences:
    input:
        fauna = "results/fauna/sequences_{segment}.fasta"
    params:
        source_data = "source-data/sequences_{segment}.fasta"
    output:
        metadata = "results/genome/sequences_{segment}.fasta"
    shell:
        """
        cat {params.source_data} {input.fauna} > {output.metadata}
        """

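`merge_genome_metadata` guards the merge with a header comparison (`diff` on the first line of each file), copies the source-data file, and appends the fauna rows without their header; `merge_genome_sequences` simply concatenates FASTA files. The Python sketch below mirrors the metadata logic on in-memory strings and adds a report of strains present more than once, a check the ruleset's own comment says is not done. Both function names and the toy inputs are hypothetical.

```python
def merge_metadata(source_tsv: str, fauna_tsv: str) -> str:
    """Append fauna rows to source-data rows after checking the headers match."""
    source_lines = source_tsv.splitlines(keepends=True)
    fauna_lines = fauna_tsv.splitlines(keepends=True)
    # Like `diff <(head -n 1 a) <(head -n 1 b)`: headers must be identical,
    # including their line endings.
    if source_lines[:1] != fauna_lines[:1]:
        raise ValueError("metadata headers differ")
    # Like `cp source output && tail -n +2 fauna >> output`.
    return "".join(source_lines + fauna_lines[1:])


def duplicate_strains(merged_tsv: str) -> list:
    """Return strain names (first column) that occur more than once."""
    seen, duplicates = set(), []
    for line in merged_tsv.splitlines()[1:]:  # skip the header
        strain = line.split("\t", 1)[0]
        if strain in seen and strain not in duplicates:
            duplicates.append(strain)
        seen.add(strain)
    return duplicates


# Toy inputs, not real data.
source = "strain\tvirus\nA/example/1/2024\tavian_flu\n"
fauna = "strain\tvirus\nA/example/2/2024\tavian_flu\nA/example/1/2024\tavian_flu\n"
merged = merge_metadata(source, fauna)
print(duplicate_strains(merged))  # ['A/example/1/2024']
```

Because the comparison includes line endings, a header ending in `\r\n` will not match one ending in `\n`, which is also true of the `diff` in the shell rule.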
25 changes: 13 additions & 12 deletions ingest/rules/upload_from_fauna.smk
@@ -1,8 +1,9 @@
rule download_segment:
    output:
        sequences = "data/{segment}.fasta",
        sequences = "data/fauna/{segment}.fasta",
    params:
        fasta_fields = "strain virus accession collection_date region country division location host domestic_status subtype originating_lab submitting_lab authors PMID gisaid_clade h5_clade",
        output_dir = "data/fauna",
    benchmark:
        "benchmarks/download_segment_{segment}.txt"
    shell:
@@ -12,16 +13,16 @@ rule download_segment:
            --virus avian_flu \
            --fasta_fields {params.fasta_fields} \
            --select locus:{wildcards.segment} \
            --path data \
            --path {params.output_dir} \
            --fstem {wildcards.segment}
        """

rule parse_segment:
    input:
        sequences = "data/{segment}.fasta",
        sequences = "data/fauna/{segment}.fasta",
    output:
        sequences = "results/sequences_{segment}.fasta",
        metadata = "results/metadata_{segment}.tsv",
        sequences = "results/fauna/sequences_{segment}.fasta",
        metadata = "results/fauna/metadata_{segment}.tsv",
    params:
        fasta_fields = "strain virus isolate_id date region country division location host domestic_status subtype originating_lab submitting_lab authors PMID gisaid_clade h5_clade",
        prettify_fields = "region country division location host originating_lab submitting_lab authors PMID"
@@ -44,10 +45,10 @@ rule merge_segment_metadata:
    for each segment, but that would be a nice improvement.
    """
    input:
        segments = expand("results/metadata_{segment}.tsv", segment=config["segments"]),
        metadata = "results/metadata_ha.tsv",
        segments = expand("results/fauna/metadata_{segment}.tsv", segment=config["segments"]),
        metadata = "results/fauna/metadata_ha.tsv",
    output:
        metadata = "results/metadata.tsv",
        metadata = "results/fauna/metadata.tsv",
    shell:
        """
        python scripts/add_segment_counts.py \
@@ -58,9 +59,9 @@ rule merge_segment_metadata:

rule upload_sequences:
    input:
        sequences="results/sequences_{segment}.fasta",
        sequences="results/fauna/sequences_{segment}.fasta",
    output:
        flag=touch("s3/sequences_{segment}.done"),
        flag=touch("s3/fauna/sequences_{segment}.done"),
    params:
        s3_dst=config["s3_dst"],
    shell:
@@ -73,9 +74,9 @@ rule upload_sequences:

rule upload_metadata:
    input:
        metadata="results/metadata.tsv",
        metadata="results/fauna/metadata.tsv",
    output:
        flag=touch("s3/metadata.done"),
        flag=touch("s3/fauna/metadata.done"),
    params:
        s3_dst=config["s3_dst"],
    shell:
2 changes: 1 addition & 1 deletion ingest/scripts/add_segment_counts.py
@@ -63,7 +63,7 @@ def summary(strain_count):
row[column]=strain_count[row['strain']]

with open(args.output, 'w') as fh:
writer = csv.DictWriter(fh, fieldnames=fieldnames, delimiter='\t')
writer = csv.DictWriter(fh, fieldnames=fieldnames, delimiter='\t', lineterminator='\n')
writer.writeheader()
for row in rows:
writer.writerow(row)
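Python's `csv` writers end every row with the dialect's `lineterminator`, which defaults to `\r\n`. Without the `lineterminator='\n'` argument, `add_segment_counts.py` would therefore write CRLF line endings into `results/fauna/metadata.tsv` on POSIX systems. One plausible reason this matters here, assuming `source-data/metadata.tsv` uses `\n` endings: `merge_genome_metadata` compares header lines with `diff`, and a trailing `\r` would make the headers differ. The sketch below shows the two behaviours side by side; `write_tsv` is an illustrative helper, not part of the script.

```python
import csv
import io


def write_tsv(rows, fieldnames, **writer_kwargs) -> str:
    """Write dict rows as TSV into a string and return the text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t", **writer_kwargs)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


rows = [{"strain": "A/Texas/37/2024", "n_segments": 8}]
default_text = write_tsv(rows, ["strain", "n_segments"])
unix_text = write_tsv(rows, ["strain", "n_segments"], lineterminator="\n")
print(repr(default_text))  # 'strain\tn_segments\r\nA/Texas/37/2024\t8\r\n'
print(repr(unix_text))     # 'strain\tn_segments\nA/Texas/37/2024\t8\n'
```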
2 changes: 2 additions & 0 deletions ingest/source-data/metadata.tsv
@@ -0,0 +1,2 @@
strain virus isolate_id date region country division location host domestic_status subtype originating_lab submitting_lab authors PMID gisaid_clade h5_clade n_segments
A/environment/USA/CO-UW-9084466/2024 avian_flu PP796043 2024-04-XX North America Usa Colorado Colorado Environment domestic h5n1 University of Washington University of Washington Roychoudhury,P., Han,P., Kong,K., Xie,H., Gamboa,L., Rodriguez-Salas,L., Ellis,S.E., Greninger,A., Bedford,T., Starita,L. and Chu,H. ? 2.3.4.4b ? 8
2 changes: 2 additions & 0 deletions ingest/source-data/sequences_ha.fasta
@@ -0,0 +1,2 @@
>A/environment/USA/CO-UW-9084466/2024
GATCAGATTTGCATTGGTTACCATGCAAACAATTCGACAGAGCAAGTTGACACGATAATGGAAAAGAACGTCACTGTTACACATGCCCAAGACATACTGGAAAAAACACACAACGGGAAGCTATGCGACCTAAATGGGGTGAAGCCACTGATTTTAAAGGACTGCAGTGTAGCTGGATGGCTCCTCGGAAACCCAATGTGCGACGAATTCATCAGAGTGCCGGAATGGTCTTACATAGTGGAGCGGGCTAACCCAGCTAATGACCTCTGTTACCCAGGGAGCCTCAATGACTATGAAGAACTGAAACACATGTTGAGCAGAATAAATCATTTTGAGAAGATTCAGATCATTCCCAAGAGTTCCTGGCCAAATCATGAAACATCACTAGGGGTGAGCGCAGCTTGTCCATACCAGGGAGCACCCTCCTTTTTCAGAAATGTGGTGTGGCTTATCAAAAAGAACGATGCATACCCAACAATAAAGATAAGCTACAATAATACTAATCGGGAAGATCTCTTGATACTGTGGGGGATTCATCATTCCAACAATGCAGAAGAGCAGACAAATCTCTACAAAAACCCAATCACCTACATTTCAGTTGGAACATCAACTTTAAACCAGAGGKTGGCACCAAAAATAGCTACTAGATCCCAAGTAAACGGGCAACGTGGAAGAATGGACTTCTTCTGGACAATCTTAAAACCAGATGATGCAATCCATTTCGAGAGTAACGGAAATTTCATTGCTCCAGAATATGCATACAAAATTGTTAAGAAAGGGGACTCGACAATTATGAAAAGTGGAGTGGAATATGGCCATTGCAACACCAAATGTCAAACCCCAGTAGGTGCGATAAATTCTAGTATGCCATTTCACAACATACATCCTCTCACCATTGGGGAATGCCCCAAATACGTGAAATCAAACAAGTTGGTCCTTGCGACTGGGCTCAGAAATAGTCCTCTAAGAGAAAAGAGAAGAAAAAGAGGTCTGTTTGGGGCGATAGCAGGGTTTATAGAGGGAGGATGGCAGGGAATGGTTGATGGTTGGTATGGGTACCATCATAGCAATGAGCAGGGGAGTGGGTACGCTGCGGACAAAGAATCCACCCAAAAGGCAATAGATGGAGTTACCAATAAGGTCAACTCAATCATTGACAAAATGAACACTCAATTTGAGGCAGTTGGAAGGGAGTTTAATAACTTAGAAAGGAGGATAGAGAATTTGAACAAGAAAATGGAAGACGGATTCCTAGATGTCTGGACATATAATGCTGAACTTCTAGTTCTCATGGAAAACGAGAGGACTCTAGATTTCCATGATTCAAATGTCAAGAACCTTTACGACAAAGTCAGATTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGCTGTTTCGAATTCTATCACAAATGTGATAATGAATGTATGGAAAGTGTGAGAAATGGGACGTATGACTACCCTCAGTATTCAGAAGAAGCAAGATTAAAAAGAGAAGAAATAAGCGGAGTGAAATTAGAATCAGTAGGAACTTACCAGATACTGTCAATTTATTCAACAGCGGCAAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGATGTGCTCCAATGGGTCGTTACAATGCAGAATTTGCATTTAGATTTATGAGCTCAGATTGTAGTTAAAAACACC
2 changes: 2 additions & 0 deletions ingest/source-data/sequences_mp.fasta
@@ -0,0 +1,2 @@
>A/environment/USA/CO-UW-9084466/2024
GATATTGAAAGATGAGTCTTCTAACCGAGGTCGAAACGTACGTTCTCTCTATCGTCCCGTCGGGCCCCCTCAAAGCCGAGATCGCGCAGAGACTTGAAGATGTCTTTGCAGGGAAGAACACCGATCTTGAGGCTCTCATGGAATGGCTAAAGACAAGACCAATCCTGTCACCTCTGACTAAGGGGATTTTGGGATTTGTGTTCACGCTCACCGTGCCCAGTGAGCGAGGACTGCAGCGTAGACGCTTTGTCCAAAGTGCCCTAAGTGGAACTGGAGACCCAAACAACATGGACAGAGCAGTCAAGTTGTACAGGAAACTGAAGAGAGAGATAACATTCCATGGGGCTAAAGAAGTTGCACTCAGTTACTCAACCGGTGCACTTGCCAGTTGTATGGGTCTCATATACAACAGGATGGGGACGGTGACCGCAGAAGTGGCATTGGGCCTAGTGTGTGCCACCTGTGAACAGATTGCTGATTCACAGCATCGGTCTCACAGACAGATAGCTACCACCACCAACCCACTGATCAGACATGAAAACAGAATGGTTTTGGCCAGTACTACAGCTAAGGCTATGGAGCAGATGGCTGGATCGAGTGAGCAAGCAGTGGAAGCCATGGAGGTTGCTAGTCAGGCTAGGCAGATGGTGCAGGCGATGAGGACCATTGGAACTCATCCTAGCTCCAGTACCGGTCTGAGAGATGATCTCCTTGAAAATTTGCAGGCCTACCAAAAACGGATGGGAGTGCAACTGCAGCGATTCAAGTGATCCTCTCGTTATTGCCGCAAGTATCATTGGGATCTTGCACTTGATATTGTGGATTCTTGATCGCCTTTTCTTCAAATGCGTTTATCGTCGCCTTAAATACGGTTTGAAAGGAGGGCCTTCTACGGAAGGAGTACCTGAGTCCATGAGGGAAGAGTACCGGCAGGAACAGCAGAGTGCTGTGGATGTTGACAATGGTCATTTTGTCAACATAGAGCTGGAGTAGAAACAAGGTAGTTTTTTACT
2 changes: 2 additions & 0 deletions ingest/source-data/sequences_na.fasta
@@ -0,0 +1,2 @@
>A/environment/USA/CO-UW-9084466/2024
GTATGGTAATTGGGATAGTCAGCTTGATGCTGCAAATTGGGAACATAATCTCAATATGGGTTAGCCATTCAATCCAAACAGGGAATCAATACCAGCCTGAACCATGCAATCAAAGCATCATTACCTATGAGAACAACACCTGGGTAAATCAGACGTATATCAACATCAGCAGTACCAATTTTCTTGCTGAGCAGGCTGTTACTTCGGTAACATTAGCGGGCAATTCATCTCTTTGCCCTATTAGTGGGTGGGCAATATACAGTAAGGACAACGGTATAAGAATTGGGTCTAAGGGGGATGTGTTTGTTATAAGAGAACCATTCATCTCATGCTCCCACTTGGAATGCAGAACCTTTTTCCTGACCCAGGGAGCTCTGCTGAATGACAAACATTCTAATGGGACAGTTAAGGATAGAAGCCCTTATAGAACTTTGATGAGTTGTCCCGTGGGTGAGGCTCCTTCCCCGTACAATTCAAGATTTGAGTCTGTTGCTTGGTCGGCAAGTGCTTGTCATGATGGCATCAGTTGGTTGACAATCGGTATTTCTGGTCCAGACAATGGAGCTGTGGCTGTATTGAAGTACAATGGCATAATAACGGATACTATCAAGAGTTGGAGAAACAACATTTTGAGAACTCAAGAATCTGAATGTGCTTGCGTAAATGGCTCCTGCTTCACCGTAATGACTGATGGACCAAGCAATGGGCAGGCCTCATATAAAATCTTCAAGATAGAGAAAGGGAAAGTTGTCAAATCAGTTGAAATGAATGCCCCTAATTACCACTACGAGGAATGCTCCTGTTATCCTGATGCGGGTGATATTATGTGTGTGTGCAGGGACAATTGGCATGGCTCGAACCGGCCGTGGGTATCTTTTAATCAAAATCTGGAGTATCAAATAGGATATATATGCAGTGGGATTTTCGGGGACAATCCCCGCCCCAATGATGGAACAGGCAGTTGCAGTCCAATGCCCTCTAATGGGGCATATGGGGTAAAAGGGTTTTCATTTAAGTACGGTAATGGGGTTTGGATCGGAAGAACAAAAAGCACTAGTTCCAGAAGCGGCTTTGAGATGATTTGGGATCCGAATGGGTGGACTGAGACGGACAGTAGTTTCTCAGTGAAGCAAGACATTGTAGAAATAACTGACTGGTCAGGATATAGTGGGAGTTTTGTCCAGCATCCAGAACTGACAGGATTAGATTGCATGAGGCCTTGTTTCTGGGTTGAGCTAATTAGAGGGAGGCCCAAAGAGAATACAATTTGGACTAGCGGGAGCAGCATATCCTTTTGTGGTGTAAATAGTGACACTGTGGGTTGGTCTTGGCCAGACGGTGCTGAGTTGCCATTCACCATTGACAAGTAG
2 changes: 2 additions & 0 deletions ingest/source-data/sequences_np.fasta
@@ -0,0 +1,2 @@
>A/environment/USA/CO-UW-9084466/2024
CTGAGTGACATCCACATCATGGCGTCTCAAGGCACCAAACGATCCTATGAACAAATGGAAACTGGTGGGGAACGCCAGAATGCCACTGAAATCAGAGCATCTGTTGGAAGAATGGTTGGCGGAATCGGGAGATTCTACATACAGATGTGCACTGAGCTCAAACTCAGTGATCACGAAGGGAGGCTGATCCAAAACAGCATAACCATAGAAAGGATGGTTCTCTCGGCATTTGATGAAAGGAGGAACAAGTATCTGGAGGAACATCCCAGTGCTGGAAAGGATCCCAAGAAGACTGGAGGTCCAATCTACAGGAGGAGAGATGGCAAATGGATGAGAGAGTTGATCCTCTACGACAAAGAAGAGATCAGAAGAATTTGGCGTCAAGCTAATAATGGAGAGGATGCAACTGCTGGTCTCACTCACTTGATGATTTGGCATTCCAATCTGAATGATGCCACATACCAGAGAACAAGGGCACTTGTGCGTACTGGAATGGATCCTAGGATGTGCTCACTGATGCAAGGCTCAACCCTCCCTAGGAGATCCGGGGCTGCTGGAGCGGCAGTGAAAGGAGTTGGAACAATGGTGATGGAATTGATTCGAATGATCAAACGAGGAATCAATGATCGGAATTTCTGGAGAGGTGAAAACGGACGGAGAACCAGGATTGCCTACGAGAGAATGTGCAACATCCTCAAGGGAAAGTTCCAAACAGCAGCACAACGAGCAATGATGGACCAAGTGAGGGAAAGCCGGAATCCTGGGAATGCTGAAATTGAAGATCTCATCTTTCTCGCACGATCTGCTCTCATCCTGAGGGGATCAGTGGCTCATAAGTCCTGTCTGCCTGCTTGCGTGTATGGACTTGCTGTAGCCAGTGGATATGACTTTGAAAGAGAGGGATACTCTCTAGTCGGAATTGATCCTTTCCGTCTGCTCCAGAACAGTCAAGTTTTCAGTCTCATCAGACCGAATGAAAATCCAGCTCACAAAAGTCAGCTGGTATGGATGGCATGCCACTCTGCAGCATTTGAGGATCTGAGAGTGTCAAGCTTCATCAGAGGAACAAGAGTAGTCCCAAGAGGACAACTGTCCACCAGAGGAGTTCAGATTGCTTCAAATGAAAACATGGAGACAATGGATTCCAGTACTCTTGAACTGAGGAGCAGATACTGGGCTATAAGAACAAGAAGTGGAGGAAACACCAACCAACAGAGAGCATCTGCAGGACAAATCAGCGTACAGCCCACATTCTCTGTGCAGAGAAACCTCCCATTCGAGAGAGCAACCATCATGGCAGCATTTACGGGAAACACTGAAGGCAGAACTTCAGACATGAGAACTGAGATCATAAGGATGATGGAAAATGCCAGACCTGAAGATGTGTCTTTCCAGGGGCGGGGAGTCTTCGAGCTCTCGGACGAAAAGGCAACGAACCCGATCGTGCCTTCCTTTGACATGAACAATGAAGGATCTTATTTCTTCGGAGACAATGCAGAGGAGTATGACAATTAAAGAAAAATAC
2 changes: 2 additions & 0 deletions ingest/source-data/sequences_ns.fasta
@@ -0,0 +1,2 @@
>A/environment/USA/CO-UW-9084466/2024
GGATTCCAACACTGTGTTAAGCTTTCAGGTAGACTGCTTTCTTTGGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTCCGCCGAGACCAGAAGTCTCTAAGAGGAAGAGGCAGCACTCTTGGTCTGGACATCGAGACGGCCACTCGTGCTGGGAAGCAGATAGTGGAGAGGATTCTGGAGGAAGAATCCGACGAGGCACTCAAAATGACTATTGCCTCTGTGCCTGCTCCACGCTACCTAACTGACATGACTCTTGAAGAGATGTCAAGAGACTGGTTCATGCTCATGCCCAAGCAAAAAGTGGCAGGCTCCCTCAGTATCAGAATGGACCAGGCGATTATGGATAAGAACATCATACTGAAGGCAAACTTCAGTGTGATCTTCAATCGGCTGGAGACACTAATACTACTCAGAGCTTTCACTGAAGAGGGAGCAATTGTCGGCGAAATTTCACCATTGCCTTCTCTTCCAGGACATACTGATGAGGATGTCAAAAATGCAATTGGGGTCCTCATCGGAGGACTTGAATGGAATGATAACACAGTTCGAGTCTCTGAAACTTTACAGAGATTCGCTTGGAGAAGCAGTAATGAGGATGGGAGACCTCCACTCCCTCCAAAGCAGAAACGGAAAATGGAGAGGACAATTGAGTCAGAAGTTTGAAGAAATAAGGTGGCTGATTGAAGAAGTGCGACACAGACTAAAGATCACAGAAAATAGTTTTGAACAAATAACATTTATGCAAGCCTTACAACTACTGCTTGAAGTGGAGCAAGAGNNNNNNNNNNNNTCGTTTCAGCTTTTTTATYATTAAATAA
2 changes: 2 additions & 0 deletions ingest/source-data/sequences_pa.fasta
@@ -0,0 +1,2 @@
>A/environment/USA/CO-UW-9084466/2024
ATTCAAAATGGAAGACTTTGTGCGACAATGCTTCAATCCAATGATTGTCGAGCTTGCGGAAAAAGCAATGAAAGAATATGGGGAAGATCCGAAAATCGAGACAAACAAATTTGCCGCAATATGCACACACTTAGAAGTCTGTTTCATGTATTCGGATTTCCATTTTATTGACGAACGAGGCGAATCAATGATTGTAGAATCTGGCGATCCAAATGCATTATTGAAACACCGATTTGAGATAATCGAAGGGAGAGACCGAGCAATGGCCTGGACAGTGGTGAATAGTATCTGCAACACCACAGGGGTCGAAAAGCCCAAATTCCTCCCTGATTTGTATGACTACAGAGAGAACAGATTCATTGAAATTGGAGTAACGCGAAGGGAAGTTCACATATACTATTTGGAAAAAGCCAACAAGATAAAATCAGAGAAAACACATATTCACATATTCTCATTCACTGGAGAGGAAATGGCCACCAAGGCGGACTACACCCTTGATGAAGAGAGCAGAGCAAGAATAAAAACCAGACTGTTCACTATAAGACAAGAAATGGCCAGTAGAGGTCTATGGGATTCCTTTCGTCAATCCGAGAGAGGCGAAGAGACAATTGAAGAAAGATTTGAAATCACAGGAACCATGCGCAGGCTTGCCGACCAAAGTATTCCACCGAACTTCTCCAGCCTTGAAAACTTTAGAGCCTATGTGGATGGATTCGAACCGAACGGCTGCATTGAGGGCAAGCTTTCTCAAATGTCAAAAGAGGTGAACGCCAGAATTGAGCCATTTCTGAAGACAACACCACGCCCTCTCAGATTACCTGATGGGCCTCCCTGTCCTCAGCGGTCGAAGTTCTTGCTGATGGATGCCCTTAAGTTGAGCATCGAAGACCCTAGTCATGAGGGGGAGGGCATACCGCTGTATGATGCAATCAAATGCATGAAGACATTTTTTGGCTGGAAAGAGCCCAACATCGTAAAGCCGCATGAGAAAGGCATAAACCCTAATTACCTCCTGGCTTGGAAGCAGGTGCTGGCAGAACTTCAAGACATTGAAAATGAGGAGAAAATTCCAAAAACAAAGAACATGAAGAAAACAAGCCAATTGAAGTGGGCACTTGGTGAGAACATGGCTCCAGAAAAAGTGGACTTTGAGGACTGCAAAGATGTTAGCGATCTAAGACAGTACGACAGTGACGAACCAGAGTCTAGATCACTAGCAAGCTGGATTCAGAGTGAATTCAACAAGGCATGCGAACTGACAGATTCGAGTTGGATTGAACTTGATGAGATAGGGGAAGACGTTGCTCCAATCGAACACATTGCGAGTGTGAGGAGGAACTATTTCACAGCGGAGGTATCCCATTGCAGGGCCACTGAATACATAATGAAGGGAGTATACATAAACACAGCCCTATTGAATGCATCCTGTGCAGCCATGGATGACTTCCAATTGATTCCAATGATAAGTAAGTGCAGAACTAAAGAAGGAAGACGGAGGACAAATCTGTATGGATTCATTATAAAAGGAAGATCCCATTTGAGGAATGACACCGATGTGGTAAACTTTGTGAGCATGGAATTCTCTCTAACTGACCCGAGGCTAGAGCCACACAAATGGGAAAARTACTGTGTTCTTGAAATAGGAGACATGCTATTGAGGACTGCGATAGGTCAAGTGTCGAGGCCCATGTTCCTRTATGTGAGAACCAATGGAACYTCCAARATCAARATGAAATGGGGCATGGARATGAGGCGMTGCCTTCTTCAGTCCCTTCAACAAATTGAGAGCATGATTGAGGCCGAATCTTCTGTCAAAGAGAAGGACATGTCCAAGGAATTCTTTGAAAACAAATCAGAAACATGGCCAATTGGAGAATCACCCAAAGGGGTGGAGGAAGGCTCTATTGGGAAAGTATGCAGAACATTGCTAGCAAAGTCTGTGTTCAACAGCCTATATGCATCTCCTCAACTCGAGGGGTTTTCAGCTGAATCAAGAAAAT
TGCTTCTCATTGTTCAGGCACTTAGGGACAACCTGGAACCTGGAACCTTCGATCTTGGGGGGCTATATGAAGCAATTGAGGAGTGCCTGATTAACGATCCCTGGGTTTTGCTTAATGCATCTTGGTTCAACTCCTTCCTCACACATGCACTGAAATAGTTGTGGCAATGCTACTATTTGCTATCCATACTGTCCAAACAAGGTACTTTTTTGGAC
2 changes: 2 additions & 0 deletions ingest/source-data/sequences_pb1.fasta
@@ -0,0 +1,2 @@
>A/environment/USA/CO-UW-9084466/2024
GAATGGATGTCAATCCGACCTTACTCTTCTTGAAAGTTCCAGCGCAAAATGCCATAAGCACCACATTCCCGTATACTGGAGATCCTCCATACAGCCATGGAACAGGAACAGGATATACCATGGACACAGTTAACAGAACACATCAATATTCAGAAAAAGGGAAATGGACAACAAACTCAGAAACCGGGGCACCTCAACTCAATCCAATTGATGGACCATTGCCTGATGACAATGAGCCAAGTGGATATGCACAAACGGACTGCGTCCTTGAAGCAATGGCTTTCCTTGAAGAATCCCATCCAGGAATCTTTGAAAACTCGTGTCTTGAAACGATGGAAGTTGTTCAACAAACAAGAGTGGACAAGTTGACCCAAGGCCGTCAGACTTATGATTGGACATTAAACAGAAATCAGCCGGCTGCAACTGCATTAGCTAATACTATAGAGGTCTTCAGATCGAACGGTCTTACAGCTAATGAATCAGGAAGGCTAATAGATTTCCTCAAGGATGTGGTGGAATCAATGGATAAAGAGGAAATAGAAATAACAACGCATTTCCAAAGGAAAAGAAGAGTGAGAGACAACATGACCAAGAAAATGGTCACACAACGGACGATAGGAAAGAAGAAACAAAGGTTAAACAAAAGGAGCTATCTGATAAGAGCATTGACACTGAACACAATGACAAAAGACGCCGAAAGAGGCAAATTAAAGAGAAGGGCAATTGCAACACCCGGAATGCAAATCAGAGGGTTTGTGTACTTTGTTGAAACATTAGCAAGGAGCATTTGTGAGAAACTTGAACAATCTGGACTCCCAGTTGGAGGCAATGAAAAGAAGGCCAAACTAGCAAATGTTGTGAGAAAGATGATGACTAATTCGCAAGACACAGAGCTCTCTTTCACAATCACGGGAGACAACACTAAATGGAATGAGAACCAGAATCCTAGGATGTTTCTGGCAATGATAACATAYATWACAAGGAACCAACCTGAATGGTTCAGGAATGTATTGAGCATTGCACCTATAATGTTCTCAAACAAAATGGCAAGACTAGGGAAAGGATACATGTTCGAAAGTAAGAGCATGAAGCTTCGAACACAAATACCGGCAGAAATGCTAGCGAGCATTGATCTGAAATACTTCAATGAGTCAACAAGGAAGAAAATAGAGAAGATAAGACCTCTTCTAATAGATGGTACGGCCTCATTAAGCCCTGGAATGATGATGGGCATGTTCAACATGCTGAGTACAGTTCTGGGAGTTTCGATTCTAAATCTAGGGCAAAAGAAGTACACCAAAACAACATACTGGTGGGATGGACTACAATCTTCTGATGACTTTGCTCTCATCGTGAATGCTCCAAATCATGAGGGAATACAAGCAGGAGTAGACAGATTCTATAGAACCTGCAAGCTGGTAGGAATCAATATGAGCAAAAAGAAGTCATACATAAACAGGACAGGAACATTTGAATTCACAAGTTTTTTCTATCGCTATGGATTTGTAGCCAATTTCAGCATGGAGTTGCCCAGCTTTGGAGTTTCTGGGATCAATGAATCTGCAGACATGAGCATTGGAGTAACAGTGATAAAGAACAACATGATCAACAATGATCTTGGACCAGCAACAGCCCAAATGGCTCTACAGCTATTCATCAAGGATTACAGATACACATATCGATGTCACAGAGGAGACACACAAATTCAAACAAGGAGGTCATTCGAGCTGAAAAAGTTATGGGAACAAACCCGCTCAAAACCAGGACTGCTGGTCTCAGATGGAGGGCCAAATCTATACAATATCCGAAATCTCCACATTCCGGAAGTCTGCTTAAAATGGGAGCTAATGGACGAAGACTATCAGGGAAGGCTTTGTAATCCCCTGAATCCGTTTGTAAGCCACAAAGAAATAGAGTCTGTGAACAATGCTGTGGTGATGCCAGCTCATGGCCCAGCTAAGAGTATGGAATATGATGCTGTTGCCACCACTCACTCCTGG
ATCCCTAAGAGGAACCGCTCTATTCTTAATACAAGCCAAAGGGGAATCCTTGAAGACGAACAGATGTATCAAAAGTGCTGCAATCTATTTGAAAAATTCTTCCCTAGCAGTTCATACAGGAGGCCGGTTGGAATTTCCAGCATGGTGGAGGCCATGGTTTCTAGGGCCCGAATTGATGCACGAATTGACTTCGAATCTGGACGGATTAAGAAGGAGGAGTTTGCTGAGATCATGAAGATCTGTTCCACCATTGAAGAGCTCAGACGGCAGAAATAGTGAATTTAGCTTGTCCTTCATGAAAAAATG