output format #14

Open
antoine4ucsd opened this issue Dec 7, 2022 · 2 comments

@antoine4ucsd

Hello,
Is it possible to output the DRMs per sequence in CSV format rather than JSON (or to convert the JSON afterward)?
I used HIV-DRLink_github.pl to get an overall summary, but I'd like to output the DRMs for each sequence in my alignment (similar to the DRM report on the Stanford website).

Thank you!

@amolares

amolares commented Aug 3, 2023

Hi,
I'm using this client to analyse HIV-1 samples:
sierrapy --virus HIV1 seqreads path-to-codfreqs-files

Once I have the JSON files, how can I convert them to a friendlier format (for example CSV or TSV)? An example JSON file is attached.
11191381_S1_L001_001.report.zip
Any suggestions are very welcome.

Thank you

@gpcr

gpcr commented Mar 11, 2025

import json
import pandas as pd

# Load JSON file
json_file = "11191381_S1_L001_001.report.json"
with open(json_file, "r") as f:
    data = json.load(f)

# Extract Subtype and Consensus Sequence
subtype = data.get("bestMatchingSubtype", {}).get("display", "Unknown")
consensus_seq = data.get("assembledConsensus", "Not available")

# Extract Available Genes
available_genes = [gene["name"] for gene in data.get("availableGenes", [])]

# Extract Gene Mutations
mutations_list = []
for gene_data in data.get("allGeneSequenceReads", []):
    gene_name = gene_data["gene"]["name"]
    for mutation in gene_data.get("mutations", []):
        if not mutation.get("isUnsequenced", False):  # Exclude unsequenced positions
            mutations_list.append([
                gene_name,
                mutation.get("reference", "-"),
                mutation.get("position", "-"),
                mutation.get("AAs", "-"),
                mutation.get("isInsertion", False),
                mutation.get("isDeletion", False)
            ])

mutations_df = pd.DataFrame(mutations_list, columns=["Gene", "Ref AA", "Position", "Mutated AA", "Insertion", "Deletion"])

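# Include genes without mutations as placeholder rows (built once, appended once)
genes_with_mutations = set(mutations_df["Gene"])
placeholder_rows = [[gene, "-", "-", "-", False, False]
                    for gene in available_genes if gene not in genes_with_mutations]
if placeholder_rows:
    mutations_df = pd.concat(
        [mutations_df, pd.DataFrame(placeholder_rows, columns=mutations_df.columns)],
        ignore_index=True)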

# Extract Drug Resistance Data
drug_resistance_list = []
for resistance_data in data.get("drugResistance", []):
    gene_name = resistance_data.get("gene", {}).get("name", "Unknown")
    for drug in resistance_data.get("drugScores", []):
        drug_resistance_list.append([
            gene_name,
            drug.get("drugClass", {}).get("name", "Unknown"),
            drug.get("drug", {}).get("name", "Unknown"),
            drug.get("drug", {}).get("displayAbbr", "N/A"),
            drug.get("score", 0.0),
            drug.get("level", "Unknown"),
            drug.get("text", "Unknown")
        ])

drug_resistance_df = pd.DataFrame(drug_resistance_list, columns=["Gene", "Drug Class", "Drug Name", "Abbreviation", "Score", "Level", "Interpretation"])


# Extract Unique Mutation Comments
mutation_comments_set = set()  # To prevent duplicates
for resistance_data in data.get("drugResistance", []):
    for drug in resistance_data.get("drugScores", []):
        for partial_score in drug.get("partialScores", []):
            for mutation in partial_score.get("mutations", []):
                for comment in mutation.get("comments", []):
                    mutation_comments_set.add((
                        mutation.get("text", "Unknown"),
                        comment.get("type", "Unknown"),
                        comment.get("text", "No comment provided")
                    ))

mutation_comments_df = pd.DataFrame(sorted(mutation_comments_set), columns=["Mutation", "Type", "Comment"])

# Display Results
print("#" * 60)
print("HIV-1 Drug Resistance By Next Generation Sequencing")
print("#" * 60)

print("\n# Subtype Information")
print(f"HIV Subtype: {subtype}")

print("\n# Consensus Sequence")
print(consensus_seq)
# To shorten the output, print only the first 100 bases instead:
# print(f"{consensus_seq[:100]}... (truncated)")

print("\n# Available Genes")
print(", ".join(available_genes))

print("\n# Gene Mutations")
print(mutations_df.to_string(index=False))  # Neat table format

print("\n# Drug Resistance Interpretation")
print(drug_resistance_df.to_string(index=False))

print("\n# Mutation Comments")
print(mutation_comments_df.to_string(index=False))
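
Since the question in this thread is about CSV rather than printed tables, the DataFrames built above could also be written to disk with pandas' `DataFrame.to_csv`. The following is only a minimal sketch: the helper name `write_tables`, the folder name `sierrapy_csv`, and the two-row stand-in table are illustrative assumptions, not part of sierrapy or of the script above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_tables(tables, out_dir):
    """Write each DataFrame in `tables` to <out_dir>/<name>.csv and return the paths.

    `write_tables` is a hypothetical helper, not a sierrapy function.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in tables.items():
        target = out_path / f"{name}.csv"
        df.to_csv(target, index=False)  # index=False leaves out the pandas row index
        written.append(target)
    return written


# Stand-in for mutations_df (two rows copied from the output shown in this thread)
sample_mutations = pd.DataFrame(
    [["RT", "M", 184, "V", False, False], ["IN", "Q", 148, "H", False, False]],
    columns=["Gene", "Ref AA", "Position", "Mutated AA", "Insertion", "Deletion"],
)

# Written to the system temp folder here; any writable folder works
demo_dir = Path(tempfile.gettempdir()) / "sierrapy_csv"
paths = write_tables({"mutations": sample_mutations}, demo_dir)
```

With the script above, the call would presumably look like `write_tables({"mutations": mutations_df, "drug_resistance": drug_resistance_df, "mutation_comments": mutation_comments_df}, "some_folder")`. Passing `sep="\t"` to `to_csv` would give TSV instead.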

Potential Output:

############################################################
HIV-1 Drug Resistance By Next Generation Sequencing
############################################################

# Subtype Information

HIV Subtype: B (3.42%)

# Consensus Sequence

ATCACTCTTTGGCAACGACCCCTCGTCCCAATAAAGATAGGGGGGCAAATAAAGGAAGCTCTACTAGATACAGGAGCAGATGATACAGTATTAGAAGAGATAAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAATATGATCAGATACCCATAGAAATTTGTGGACATAAAGCTATAGGTACAGTATTAATAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGCTTGGTTGTACTTTAAATTTTCCCATTAGTCCTATTGAAACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAGAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAATAGAAATTTGTACAGAAATGGAAAAGGAAGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAAGACAGCACTAAATGGAGAAAATTAGTAGACTTCAGGGAGCTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCGGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGTGGGTGATGCATATTTTTCAGTTCCCTTGGATAAAGACTTCAGGAAGTACACTGCATTTACCATACCTAGTACAAACAATGAGACACCAGGGATTAGATATCAGTACAACGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAAAATTTTAGAACCTTTTAGAAAACAAAATCCAGACATAATTATCTATCAATACGTGGATGATTTGTATGTAGGGTCTGACTTAGAAATAGGACAGCATAGAGCAAAAATAGAGGAACTGAGACAACATCTGTTGAAGTGGGGGTTTACCACACCAGACAAAAAACATCAGAAAGAACCTCCATTTCTCTGGATGGGTTATGAACTCCATCCTGATAGATGGACAGTACAGCCTATAATGCTGCCAGAAAAAGACAGCTGGACTGTCAATGATATACAGAAGTTAGTGGGAAAATTGAATTGGGCAAGTCAGATTTATTCAGGGATCAAGGTAAGACAATTATGTAAACTCCTTAGAGGAACCAAAGCACTAACAGAAGTAGTATCATTAACAAGAGAAGCAGAGCTAGAACTGGCAGAAAACAGAGAAATTCTAAAAGAACCAGTATATGGAGTATATTATGACCCATCAAAAGATTTAGTAGCAGAAATACAGAAGCAAGAGCAAGGCCAATGGACATATCAAATTTATCAAGAGCCATTTAAAAATCTGAAGACAGGAAAGTATGCAAGAATGAGGGGTGCCCACACTAATGATGTAAGACAGTTAACAGAGGCAGTGCAAAAAATTGCACAAGAAAGCATAGTAATATGGGGAAAGACTCCTAAATTTAAACTACCCATACAAAAGGAAACATGGGAAGCATGGTGGACAGAGTATTGGCAAGCCACCTGGATTCCTGAGTGGGAGTTTGTCAATACCCCTCCCTTAGTGAAATTATGGTACCAGTTAGAGAAAGAACCCATAGTAGGGGCAGAAACTTTCTATGTAGATGGAGCAGCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGAGGAAATGAACAAGTAGATAAATTAGTCAGTACTGGAATTAGGAAAGTACTATTTTTAGATGGAATAGATAAGGCCCAAGA
AGACCATGAGAAATATCACAGCAATTGGAGAGCAATGGCTAATGAATTCAACCTACCACCTATAGTAGCAAAGGAAATAGTAGCCAGCTGTGATAAATGTCAGCTAAAAGGTGAAGCCATACATGGACAAGTAGACTGTAGTCCAGGAATATGGCAACTAGATTGTACACATTTAGAAGGAAAAATTATCCTGGTAGCAGTTCATGTAGCCAGTGGATATATAGAAGCAGAAGTTATCCCAGCAGAGACAGGGCAGGAAACAGCATACTTTCTTTTAAAATTAGCAGGGAGATGGCCAGTAACAACAATACATACAGACAATGGCCGCAACTTCACCAGTACTGTGGTTAAAGCCGCCTGCTGGTGGGCAGGGATCAAGCAGGAATTTAGCATTCCCTACAATCCCCAAAGTCACGGGGTGGTAGAATCTATGCATAAAGAATTAAAGAAAATTATAGGACAGGTAAGAGATCAGGCTGAACATCTTAAAACAGCAGTACAAATGGCAGTATTCATTCACAATTTTAAAAGAAAAGGGGGGATTGGGGGATACAGTGCAGGGGAAAGAATAATAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTACAGGGACAGCAGAGAACCACTTTGGAAAGGACCAGCAAAGCTGCTCTGGAAAGGTGAAGGGGCAGTAGTAATACAAGATAATAGTGAAATAAAAGTAGTGCCAAGAAGAAAAGCAAAGATCATTAGGGATTATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAAGTAGACAGGATGAAGATTAAAACATGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCTGTTAAATGGCAGTCTAGCAGAGAAAGTAATRMTWAGATCTSASAATTTCTCRRACAATGCTAAAAMCATAATAGTACARCTGARCRAARCTGTAAWAATTAATTGTACAAGACCCRRCAAYAAYACAAGAAAASTATAMAWWTRGGACCAGGGAGAGCATWTKRGCAACAGGAGGCRWAATAGTAGGARAYATAAGACARGCACAYTGTAAYATTAGTRGWACARAATGGAATSAWACTYTARAARAGATAGTTRWAAAATTAAGAGARCAATTTAATAAAACAATARTCTTTAAYMRYTCCTCAGGAGGGGACCCAGAAATT

# Available Genes

gag, pol, PR, RT, IN, vif, env

# Gene Mutations

Gene  Ref AA  Position  Mutated AA  Insertion  Deletion
PR    T       12        P           False      False
PR    L       19        I           False      False
PR    M       36        I           False      False
PR    N       37        S           False      False
PR    L       63        P           False      False
PR    V       77        I           False      False
PR    I       93        L           False      False
RT    K       20        R           False      False
RT    V       35        I           False      False
RT    I       135       T           False      False
RT    V       179       I           False      False
RT    M       184       V           False      False
RT    T       200       A           False      False
RT    R       211       K           False      False
RT    K       238       R           False      False
RT    V       245       M           False      False
RT    A       272       S           False      False
RT    K       277       R           False      False
RT    I       293       V           False      False
RT    P       294       S           False      False
RT    E       297       R           False      False
RT    H       315       Y           False      False
RT    I       326       V           False      False
RT    G       333       E           False      False
RT    K       366       R           False      False
RT    T       377       Q           False      False
RT    A       554       T           False      False
IN    E       11        D           False      False
IN    S       24        N           False      False
IN    D       25        E           False      False
IN    V       31        I           False      False
IN    M       50        I           False      False
IN    K       111       T           False      False
IN    S       119       R           False      False
IN    T       125       V           False      False
IN    G       140       S           False      False
IN    Q       148       H           False      False
IN    N       155       H           False      False
IN    V       201       I           False      False
IN    D       232       E           False      False
IN    D       256       E           False      False
gag   -       -         -           False      False
pol   -       -         -           False      False
vif   -       -         -           False      False
env   -       -         -           False      False

# Drug Resistance Interpretation

Gene  Drug Class  Drug Name  Abbreviation  Score  Level  Interpretation
PR    PI          ATV        ATV/r         0.0    1      Susceptible
PR    PI          DRV        DRV/r         0.0    1      Susceptible
PR    PI          FPV        FPV/r         0.0    1      Susceptible
PR    PI          IDV        IDV/r         0.0    1      Susceptible
PR    PI          LPV        LPV/r         0.0    1      Susceptible
PR    PI          NFV        NFV           0.0    1      Susceptible
PR    PI          SQV        SQV/r         0.0    1      Susceptible
PR    PI          TPV        TPV/r         0.0    1      Susceptible
RT    NRTI        ABC        ABC           15.0   3      Low-Level Resistance
RT    NRTI        AZT        AZT           -10.0  1      Susceptible
RT    NRTI        D4T        D4T           -10.0  1      Susceptible
RT    NRTI        DDI        DDI           10.0   2      Potential Low-Level Resistance
RT    NRTI        FTC        FTC           60.0   5      High-Level Resistance
RT    NRTI        LMV        3TC           60.0   5      High-Level Resistance
RT    NRTI        TDF        TDF           -10.0  1      Susceptible
RT    NNRTI       DOR        DOR           0.0    1      Susceptible
RT    NNRTI       EFV        EFV           0.0    1      Susceptible
RT    NNRTI       ETR        ETR           0.0    1      Susceptible
RT    NNRTI       NVP        NVP           0.0    1      Susceptible
RT    NNRTI       RPV        RPV           0.0    1      Susceptible
IN    INSTI       BIC        BIC           75.0   5      High-Level Resistance
IN    INSTI       CAB        CAB           105.0  5      High-Level Resistance
IN    INSTI       DTG        DTG           75.0   5      High-Level Resistance
IN    INSTI       EVG        EVG           150.0  5      High-Level Resistance
IN    INSTI       RAL        RAL           150.0  5      High-Level Resistance

# Mutation Comments

Mutation Type Comment
G140S Major G140S/A/C are non-polymorphic mutations that usually occur with Q148 mutations. Alone, they have minimal effects on INSTI susceptibility. However, in combination with Q148 mutations they are associated with high-level resistance to RAL and EVG and intermediate reductions in DTG and BIC susceptibility.
M184V NRTI M184V/I cause high-level in vitro resistance to 3TC and FTC and low/intermediate resistance to ABC (3-fold reduced susceptibility). M184V/I are not contraindications to continued treatment with 3TC or FTC because they increase susceptibility to AZT and TDF and are associated with clinically significant reductions in HIV-1 replication.
N155H Major N155H is a common nonpolymorphic INSTI-resistance mutations. It has been reported in a high proportion of persons developing VF and HIVDR while receiving RAL, EVG, DTG, and CAB. Alone, it reduces RAL and EVG susceptibility about 10 and 30-fold, respectively. It has minimal effect on susceptibility to DTG, BIC, and CAB.
Q148H Major Q148H/K/R are nonpolymorphic mutations reported in persons receiving RAL, EVG, CAB, and DTG. They nearly always occur in combination with G140A/S or E138K. In this setting they are associated with near complete resistance to RAL and EVG, high-levels of reduction in CAB susceptibility, and low-to-intermediate reductions in DTG and BIC susceptibility.
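
To get a single table across many samples, closer to the per-sequence report asked for at the top of this thread, the same extraction could be looped over every JSON file in a folder. This is a rough sketch under stated assumptions: each file is assumed to hold one report object with the `drugResistance` / `drugScores` keys used in the script above, and the names `drug_score_rows`, `collect_reports`, and the `*.json` pattern are hypothetical.

```python
import json
from pathlib import Path

import pandas as pd

COLUMNS = ["Sample", "Gene", "Drug", "Score", "Interpretation"]


def drug_score_rows(sample, report):
    """Yield one row per drug from a single report dict (key names are assumed)."""
    for resistance in report.get("drugResistance", []):
        gene = resistance.get("gene", {}).get("name", "Unknown")
        for drug_score in resistance.get("drugScores", []):
            yield [
                sample,
                gene,
                drug_score.get("drug", {}).get("displayAbbr", "N/A"),
                drug_score.get("score", 0.0),
                drug_score.get("text", "Unknown"),
            ]


def collect_reports(folder, pattern="*.json"):
    """Combine the drug scores of every matching JSON file into one long DataFrame."""
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        with open(path, "r") as handle:
            report = json.load(handle)
        # The file name without its last extension serves as the sample label
        rows.extend(drug_score_rows(path.stem, report))
    return pd.DataFrame(rows, columns=COLUMNS)


# Tiny in-memory report reusing two values from the output above
example_report = {
    "drugResistance": [
        {
            "gene": {"name": "RT"},
            "drugScores": [
                {"drug": {"name": "LMV", "displayAbbr": "3TC"}, "score": 60.0,
                 "text": "High-Level Resistance"},
                {"drug": {"name": "AZT", "displayAbbr": "AZT"}, "score": -10.0,
                 "text": "Susceptible"},
            ],
        }
    ]
}
long_table = pd.DataFrame(
    list(drug_score_rows("11191381_S1_L001_001", example_report)), columns=COLUMNS
)

# One row per sample, one column per drug
wide_table = long_table.pivot(index="Sample", columns="Drug", values="Interpretation")
```

The long table keeps one row per sample and drug, which is convenient for filtering. The pivoted table gives one row per sample, and either can be saved with `to_csv`.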
