Representation of uncertainty in JSONs #386

Open
jameshadfield opened this issue Oct 25, 2019 · 3 comments
Labels
priority: high To be resolved before other issues

Comments

@jameshadfield
Member

jameshadfield commented Oct 25, 2019

Currently, uncertainty in a trait (e.g. location for node X) is represented in augur along the lines of:

```js
/* traits.json */
{X: {location: "blue", location_confidence: {blue: 1.0}}}
/* v1 tree JSON */
{strain: "X", attr: {location: "blue", location_confidence: {blue: 1.0}}}
/* v2 JSON */
{name: "X", node_attrs: {location: {value: "blue", confidence: {blue: 1.0}}}}
```

Temporal confidence uses slightly different formatting but is conceptually identical. This representation is independent of the model employed.

Importantly, if node X had location "blue" from the metadata, the output is indistinguishable from the case where it was inferred, with 100% confidence, to be in location "blue".
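A minimal Python sketch of this ambiguity. `export_trait_current` is a hypothetical helper, not augur's API; it assumes, as in the JSON above, that a value known from metadata is written with a confidence of 1.0:

```python
def export_trait_current(value, confidence=None):
    """Hypothetical stand-in for how a discrete trait is exported today.

    A value taken from metadata gets a degenerate confidence of 1.0;
    an inferred value carries whatever the model produced.
    """
    if confidence is None:  # value came from metadata
        confidence = {value: 1.0}
    return {"value": value, "confidence": confidence}


known = export_trait_current("blue")                     # location from metadata
inferred = export_trait_current("blue", {"blue": 1.0})   # inferred with 100% confidence

# Both produce the same node attr, so the JSON alone can't tell them apart.
print(known == inferred)  # True
```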

[Image: example tree]
For this example ☝️, all nodes would look like X above, and auspice wouldn't know whether to say "Node A: inferred as blue with 100% confidence" or "Node A: blue". This is even more problematic for tip sampling dates, where we have some code in auspice that tries to guess the true meaning:

```js
if (date && dateUncertainty && dateUncertainty[0] !== dateUncertainty[1]) {
```
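The auspice condition can be transliterated into Python to make its behaviour, and its blind spot, explicit. `looks_inferred` is a hypothetical name; note that Python treats an empty list as falsy whereas JavaScript does not, which doesn't matter for the inputs below:

```python
def looks_inferred(date, date_uncertainty):
    """Python transliteration of the auspice condition quoted above:
    a date counts as inferred only if its confidence interval has width."""
    return bool(date and date_uncertainty and date_uncertainty[0] != date_uncertainty[1])


# Exact sampling date: the interval collapses to a point, so treated as known.
print(looks_inferred(2019.81, [2019.81, 2019.81]))  # False
# Ambiguous date resolved by timetree with a real interval: treated as inferred.
print(looks_inferred(2019.5, [2019.1, 2019.9]))     # True
# Blind spot: an inferred date exported without confidences looks known.
print(looks_inferred(2019.5, None))                 # False
```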

Proposed solution

Modify augur traits and augur refine so that non-inferred nodes have no associated confidence values. This will then be carried through augur export {v1,v2}. Auspice's v1->v2 JSON conversion function should implement the code above to remove confidence values from tips it believes were not inferred.
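A sketch of the proposed convention, assuming two hypothetical helpers: `export_trait_proposed` (writer side) and `describe` (reader side). The key names follow the v2 JSON above, but the functions are illustrative only:

```python
def export_trait_proposed(value, confidence=None):
    """Sketch of the proposal: only inferred nodes carry a `confidence`."""
    attr = {"value": value}
    if confidence is not None:  # value was inferred by the model
        attr["confidence"] = confidence
    return attr


def describe(name, attr):
    """How a consumer could word its info panel under this convention."""
    if "confidence" in attr:
        pct = round(100 * attr["confidence"].get(attr["value"], 0))
        return f"Node {name}: inferred as {attr['value']} with {pct}% confidence"
    return f"Node {name}: {attr['value']}"


print(describe("A", export_trait_proposed("blue")))
# Node A: blue
print(describe("A", export_trait_proposed("blue", {"blue": 1.0})))
# Node A: inferred as blue with 100% confidence
```

With this convention the presence of the key, not its value, carries the "inferred" signal, so a 100% confidence no longer needs to be interpreted.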

jameshadfield added a commit to nextstrain/auspice that referenced this issue Oct 25, 2019
This can be much improved upon resolution of nextstrain/augur#386. See that issue for more information.
jameshadfield added a commit to nextstrain/auspice that referenced this issue Oct 25, 2019
Given the way augur currently exports confidence values for tips, it's largely impossible to know whether a tip's trait with 100% confidence was inferred or known (i.e. defined by the metadata). Since the majority of tips on which DTA is run have data, we assume that the value is provided.

This can be much improved upon resolution of nextstrain/augur#386.
@rneher
Member

rneher commented Oct 26, 2019

The issue I see here is that timetree confidences are not always inferred (for performance reasons). However, augur refine exports raw_date, which could be compared to the inferred date. Similarly, augur traits could write the input value into the JSON if it exists. I would prefer this over signaling inference through the absence of confidence values.
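One way the date comparison suggested here could work, sketched in Python under the assumption that raw dates use the YYYY-MM-DD form with XX for unknown parts (other formats, such as date ranges, are not handled). `raw_date_bounds` and `date_was_inferred` are hypothetical helpers:

```python
import calendar
from datetime import date


def raw_date_bounds(raw_date):
    """Earliest and latest calendar dates consistent with a raw date string.

    Assumes the YYYY-MM-DD form with "XX" for unknown month/day,
    e.g. "2019-10-25", "2019-10-XX" or "2019-XX-XX".
    """
    year, month, day = raw_date.split("-")
    y = int(year)
    if month == "XX":  # only the year is known
        return date(y, 1, 1), date(y, 12, 31)
    m = int(month)
    if day == "XX":  # year and month are known
        last_day = calendar.monthrange(y, m)[1]
        return date(y, m, 1), date(y, m, last_day)
    exact = date(y, m, int(day))
    return exact, exact


def date_was_inferred(raw_date, inferred):
    """True unless the raw date pins down a single day equal to the inferred date."""
    earliest, latest = raw_date_bounds(raw_date)
    return not (earliest == latest == inferred)


print(date_was_inferred("2019-10-25", date(2019, 10, 25)))  # False: known from metadata
print(date_was_inferred("2019-10-XX", date(2019, 10, 12)))  # True: day chosen by timetree
```

Applying this to the exported num_date would additionally require converting the decimal year back to a calendar date.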

@huddlej huddlej added the needs triage Needs triage by a Nextstrain team member label Jul 4, 2020
@jameshadfield
Member Author

jameshadfield commented Feb 17, 2025

Following up with two related (I think... 5 years later) requests:

  • In the Auspice JSON, an inferred num_date looks like "num_date": {"value": 2025.13, "confidence": [2025.027, 2025.13]}. We should add the underlying (metadata) date (in this example, 2025-XX-XX) to the exported JSON. Key name suggestions: raw_value or raw? This would preserve whatever values we allow in augur (e.g. see Allow precise date ranges #1304).

    • (This would require changes to Auspice. A short-term solution would be to add it as a separate attr.)
  • In parallel, but broader in scope, having an `inferred: boolean` key/value in the node attrs would be immensely helpful. I think that's basically what the original issue here is talking about.
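A sketch of what an exported num_date attribute could look like with both suggestions applied. `raw_value` and `inferred` are the key names proposed in this comment, not part of the current Auspice JSON schema, and `num_date_attr` is a hypothetical helper:

```python
import json


def num_date_attr(value, confidence=None, raw_value=None, inferred=None):
    """Build a num_date node attr with the proposed (hypothetical) keys.

    `raw_value` holds the metadata date string; `inferred` states whether
    the value came from the model. Keys are omitted when not supplied.
    """
    attr = {"value": value}
    if confidence is not None:
        attr["confidence"] = confidence
    if raw_value is not None:
        attr["raw_value"] = raw_value
    if inferred is not None:
        attr["inferred"] = inferred
    return attr


attr = num_date_attr(2025.13, [2025.027, 2025.13], raw_value="2025-XX-XX", inferred=True)
print(json.dumps({"num_date": attr}))
# {"num_date": {"value": 2025.13, "confidence": [2025.027, 2025.13], "raw_value": "2025-XX-XX", "inferred": true}}
```

An explicit boolean spares consumers from reverse-engineering inference from confidences or raw strings, at the cost of one extra key per attribute.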

@jameshadfield jameshadfield added priority: high To be resolved before other issues and removed needs triage Needs triage by a Nextstrain team member labels Feb 17, 2025
@jameshadfield
Member Author

> We should add the underlying (metadata) date (in this example, 2025-XX-XX) to the exported JSON

Sketching out what this might look like:

Augur patch
```diff
diff --git a/augur/export_v2.py b/augur/export_v2.py
index 6484eca7..7e2245e4 100644
--- a/augur/export_v2.py
+++ b/augur/export_v2.py
@@ -859,6 +859,11 @@ def set_node_attrs_on_tree(data_json, node_attrs, additional_metadata_columns):
         if is_valid(raw_data.get("num_date", None)): # it's ok not to have temporal information
             node["node_attrs"]["num_date"] = {"value": format_number(raw_data["num_date"])}
             node["node_attrs"]["num_date"].update(attr_confidence(node["name"], raw_data, "num_date"))
+            # We aim to know whether the date has been inferred via timetree. The following approach is
+            # temporary - ideally `augur refine` would add an `inferred: boolean` value.
+            original_value = raw_data.get("raw_date", "")
+            if original_value and not re.match(r"^\d{4}-\d{2}-\d{2}$", original_value):
+                node["node_attrs"]["num_date"]["raw_value"] = original_value

     def _transfer_url_accession(node, raw_data):
         for prop in ["url", "accession"]:
```
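The heuristic in the augur patch can be exercised on its own. This standalone sketch reuses the patch's regular expression; `raw_value_for` is a hypothetical name, and `raw_data` is assumed to be the per-node dict used in the patch. Since the patch calls `re.match`, the `re` module must be imported in `export_v2.py`:

```python
import re

# The patch's pattern: a fully specified YYYY-MM-DD date.
EXACT_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def raw_value_for(raw_data):
    """Return the metadata date to export as `raw_value`, or None.

    Mirrors the patch: the raw date is exported only when it exists and is
    not an exact date, i.e. when timetree must have inferred part of it.
    """
    original_value = raw_data.get("raw_date", "")
    if original_value and not EXACT_DATE.match(original_value):
        return original_value
    return None


print(raw_value_for({"raw_date": "2025-XX-XX"}))  # 2025-XX-XX
print(raw_value_for({"raw_date": "2025-02-17"}))  # None
print(raw_value_for({}))                          # None
```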
Auspice patch
```diff
diff --git a/src/components/tree/infoPanels/click.js b/src/components/tree/infoPanels/click.js
index 8a423499..bffe488c 100644
--- a/src/components/tree/infoPanels/click.js
+++ b/src/components/tree/infoPanels/click.js
@@ -177,12 +177,14 @@ const SampleDate = ({isTerminal, node, t}) => {
   const date = getTraitFromNode(node, "num_date");
   if (!date) return null;

+  const original = getTraitFromNode(node, "num_date", {raw: true});
   const dateUncertainty = getTraitFromNode(node, "num_date", {confidence: true});
   if (date && dateUncertainty && dateUncertainty[0] !== dateUncertainty[1]) {
     return (
       <>
         {item(t(isTerminal ? "Inferred collection date" : "Inferred date"), numericToCalendar(date))}
         {item(t("Date Confidence Interval"), `(${numericToCalendar(dateUncertainty[0])}, ${numericToCalendar(dateUncertainty[1])})`)}
+        {original && item(t("Raw date"), original)}
       </>
     );
   }
diff --git a/src/util/treeMiscHelpers.js b/src/util/treeMiscHelpers.js
index ef71a66c..6960bd8c 100644
--- a/src/util/treeMiscHelpers.js
+++ b/src/util/treeMiscHelpers.js
@@ -25,10 +25,10 @@ james hadfield, nov 2019.
  * NOTE: do not use this for "div", "vaccine" or other traits set on `node_attrs`
  * which don't share the same structure as traits. See the JSON spec for more details.
  */
-export const getTraitFromNode = (node, trait, {entropy=false, confidence=false}={}) => {
+export const getTraitFromNode = (node, trait, {entropy=false, confidence=false, raw=false}={}) => {
   if (!node.node_attrs) return undefined;

-  if (!entropy && !confidence) {
+  if (!entropy && !confidence && !raw) {
     if (!node.node_attrs[trait]) {
       if (trait === strainSymbol) return node.name;
       return undefined;
@@ -42,6 +42,9 @@ export const getTraitFromNode = (node, trait, {entropy=false, confidence=false}=
   } else if (confidence) {
     if (node.node_attrs[trait]) return node.node_attrs[trait].confidence;
     return undefined;
+  } else if (raw) {
+    if (node.node_attrs[trait]) return node.node_attrs[trait].raw_value;
+    return undefined;
   }
   return undefined;
 };
```
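For reference, the decision logic of the patched SampleDate component, transliterated into Python. `sample_date_rows` is hypothetical and `str` stands in for auspice's `numericToCalendar`; the branch for non-inferred dates is not part of the patch, so the sketch returns None there:

```python
def sample_date_rows(node, is_terminal, to_calendar=str):
    """Rows the patched SampleDate would render for an inferred date.

    Returns None when there is no date, or when the date does not look
    inferred (that branch of SampleDate is not shown in the patch).
    """
    attr = node.get("node_attrs", {}).get("num_date") or {}
    date = attr.get("value")
    if not date:
        return None  # mirrors `if (!date) return null;`
    uncertainty = attr.get("confidence")  # getTraitFromNode(..., {confidence: true})
    original = attr.get("raw_value")      # getTraitFromNode(..., {raw: true})
    if uncertainty and uncertainty[0] != uncertainty[1]:
        label = "Inferred collection date" if is_terminal else "Inferred date"
        rows = [
            (label, to_calendar(date)),
            ("Date Confidence Interval",
             f"({to_calendar(uncertainty[0])}, {to_calendar(uncertainty[1])})"),
        ]
        if original:  # only shown when augur exported a raw_value
            rows.append(("Raw date", original))
        return rows
    return None


node = {"node_attrs": {"num_date": {"value": 2025.13,
                                    "confidence": [2025.027, 2025.13],
                                    "raw_value": "2025-XX-XX"}}}
for label, text in sample_date_rows(node, is_terminal=True):
    print(f"{label}: {text}")
```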
