Representation of uncertainty in JSONs #386

jameshadfield · 2019-10-25T00:39:02Z

Currently uncertainty in a trait, e.g. location for node X, is represented in augur along the lines of:

/* traits.json */
{X: {location: "blue", location_confidence: {blue: 1.0}}}
/* v1 tree JSON */
{strain: "X", attr: {location: "blue", location_confidence: {blue: 1.0}}}
/* v2 JSON */
{name: "X", node_attrs: {location: {value: "blue", confidence: {blue: 1.0}}}}

Temporal confidence uses slightly different formatting but is conceptually identical. This is independent of the model employed.

Importantly, if node X had location "blue" via metadata, the output is indistinguishable from the output produced when the location was inferred as "blue" with 100% confidence.

For this example ☝️ all nodes would look like X above, and auspice wouldn't know whether to say "Node A: inferred as blue with 100% confidence" or "Node A: blue". This is even more problematic with tip sampling dates, where we have some code in auspice to try to guess the true meaning:

if (date && dateUncertainty && dateUncertainty[0] !== dateUncertainty[1]) {
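For readers following the augur side, the same guess can be restated in Python. This is only a sketch of the condition quoted above, not Auspice's code: `date_looks_inferred` is a hypothetical name, and the node shape assumes the v2 `num_date` structure with `value` and `confidence` keys.

```python
def date_looks_inferred(node_attrs):
    """Treat a date as inferred only when a confidence interval exists
    and its lower and upper bounds differ."""
    num_date = node_attrs.get("num_date")
    if not num_date or num_date.get("value") is None:
        return False
    confidence = num_date.get("confidence")
    return bool(confidence) and confidence[0] != confidence[1]


# Hypothetical example nodes
exact_tip = {"num_date": {"value": 2019.5, "confidence": [2019.5, 2019.5]}}
ambiguous_tip = {"num_date": {"value": 2025.13, "confidence": [2025.027, 2025.13]}}
no_confidence = {"num_date": {"value": 2019.5}}

print(date_looks_inferred(exact_tip))      # False
print(date_looks_inferred(ambiguous_tip))  # True
print(date_looks_inferred(no_confidence))  # False
```

The weakness is visible in the first case: a date that was inferred but whose interval happens to collapse to a single point is reported as known.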

Proposed solution

Modify augur traits and augur refine to produce output where non-inferred nodes do not have associated confidences. This will then be carried through augur export {v1,v2}. Auspice's v1->v2 JSON conversion function would apply the check above to remove confidence values for tips it believes aren't inferred.
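A minimal sketch of what this could mean for a traits.json-style mapping, assuming a metadata dict keyed by node name. The function name `drop_confidence_for_known` and the metadata shape are hypothetical and are not taken from augur.

```python
def drop_confidence_for_known(traits, metadata, trait="location"):
    """Return a copy of `traits` in which nodes whose trait value is
    supplied by metadata carry no `<trait>_confidence` entry."""
    confidence_key = f"{trait}_confidence"
    cleaned = {}
    for name, attrs in traits.items():
        attrs = dict(attrs)  # shallow copy so the input is left untouched
        if metadata.get(name, {}).get(trait):  # value was provided, not inferred
            attrs.pop(confidence_key, None)
        cleaned[name] = attrs
    return cleaned


traits = {
    "X": {"location": "blue", "location_confidence": {"blue": 1.0}},
    "Y": {"location": "blue", "location_confidence": {"blue": 1.0}},
}
# Hypothetical metadata: Y has no location, so its value must have been inferred.
metadata = {"X": {"location": "blue"}}

result = drop_confidence_for_known(traits, metadata)
print(result["X"])  # {'location': 'blue'}
print(result["Y"])  # {'location': 'blue', 'location_confidence': {'blue': 1.0}}
```

With output like this, the presence of a confidence entry is what tells a consumer that the value was inferred.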



Because of the way augur currently exports confidence values for tips, it is largely impossible to know whether a tip's trait with 100% confidence is inferred or known (i.e. defined by the metadata). Since the majority of tips for which DTA is run have data, we assume that the value is provided. This can be much improved upon resolution of nextstrain/augur#386.

rneher · 2019-10-26T10:36:33Z

The issue I see here is that time tree confidences are not always inferred (for performance reasons). But augur refine exports raw_date, and that could be compared to the inferred date. Similarly, traits could write the input value into the JSON if it exists. I would prefer this over signaling inference through the absence of confidence values.
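The traits half of this suggestion might look roughly like the sketch below. The `<trait>_input` key, both function names, and the metadata shape are hypothetical and chosen only for illustration; nothing here reflects an existing augur output format.

```python
def attach_input_value(name, inferred_attrs, metadata, trait="location"):
    """Copy the inferred attrs for one node and record the metadata value,
    if there is one, under a hypothetical `<trait>_input` key."""
    attrs = dict(inferred_attrs)
    provided = metadata.get(name, {}).get(trait)
    if provided:
        attrs[f"{trait}_input"] = provided
    return attrs


def is_inferred(attrs, trait="location"):
    """A value counts as inferred when no input value was recorded for it."""
    return f"{trait}_input" not in attrs


metadata = {"X": {"location": "blue"}}  # Y is absent, so its location is inferred
inferred = {"location": "blue", "location_confidence": {"blue": 1.0}}

node_x = attach_input_value("X", inferred, metadata)
node_y = attach_input_value("Y", inferred, metadata)

print(is_inferred(node_x), is_inferred(node_y))  # False True
```

Confidence values stay in place for every node under this approach, so their absence never has to carry meaning.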

jameshadfield · 2025-02-17T00:27:45Z

Following up with two related (I think... 5 years later) requests:

- In the Auspice JSON an inferred num_date looks like "num_date": {"value": 2025.13, "confidence": [2025.027, 2025.13]}. We should add the underlying (metadata) date (in this example, 2025-XX-XX) to the exported JSON. Suggested key names: raw_value or raw. This would preserve whatever values we allow in augur (e.g. see "Allow precise date ranges" #1304).
  - (This would require changes to Auspice. A short-term solution would be to add it as a separate attr.)
- In parallel, but broader in scope, having an inferred: boolean key/value in the node attr would be immensely helpful. I think that's basically what the original issue here is talking about.
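To make the two requests concrete, here is a sketch of the node attr they would produce together. Both `raw_value` and `inferred` are proposals from this thread rather than established schema keys, and `num_date_attr` is a hypothetical helper.

```python
import json


def num_date_attr(value, confidence=None, raw_value=None, inferred=None):
    """Build a v2-style num_date attr, adding the optional keys only when given."""
    attr = {"value": value}
    if confidence is not None:
        attr["confidence"] = list(confidence)
    if raw_value is not None:
        attr["raw_value"] = raw_value  # proposed key: the metadata date as written
    if inferred is not None:
        attr["inferred"] = inferred    # proposed key: explicit inference flag
    return attr


ambiguous = num_date_attr(2025.13, [2025.027, 2025.13], raw_value="2025-XX-XX", inferred=True)
exact = num_date_attr(2019.5)

print(json.dumps({"num_date": ambiguous}, indent=2))
print(json.dumps({"num_date": exact}))
```

Keeping both keys optional means existing JSONs without them remain valid inputs.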

jameshadfield · 2025-02-17T01:08:28Z

We should add the underlying (metadata) date (in this example, 2025-XX-XX) to the exported JSON

Sketching out what this may look like

Augur patch

diff --git a/augur/export_v2.py b/augur/export_v2.py
index 6484eca7..7e2245e4 100644
--- a/augur/export_v2.py
+++ b/augur/export_v2.py
@@ -859,6 +859,11 @@ def set_node_attrs_on_tree(data_json, node_attrs, additional_metadata_columns):
         if is_valid(raw_data.get("num_date", None)): # it's ok not to have temporal information
             node["node_attrs"]["num_date"] = {"value": format_number(raw_data["num_date"])}
             node["node_attrs"]["num_date"].update(attr_confidence(node["name"], raw_data, "num_date"))
+            # We aim to know whether the date has been inferred via timetree. The following approach is
+            # temporary - ideally `augur refine` would add a `inferred: boolean` value.
+            original_value = raw_data.get("raw_date", "")
+            if original_value and not re.match(r"^\d{4}-\d{2}-\d{2}$", original_value):
+                node["node_attrs"]["num_date"]["raw_value"] = original_value

     def _transfer_url_accession(node, raw_data):
         for prop in ["url", "accession"]:
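The effect of the regular expression in the patch can be checked in isolation. The sketch below reuses the pattern from the diff; the wrapper `raw_value_to_export` and the sample date strings are assumptions made for illustration.

```python
import re

# Same pattern as in the patch: a fully specified YYYY-MM-DD date.
EXACT_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def raw_value_to_export(raw_date):
    """Return the raw date string when it is not a fully specified date,
    otherwise None (meaning no raw_value would be exported)."""
    if raw_date and not EXACT_DATE.match(raw_date):
        return raw_date
    return None


for raw in ["2025-02-17", "2025-02-XX", "2025-XX-XX", "2025", ""]:
    print(repr(raw), "->", repr(raw_value_to_export(raw)))
```

Only the exact date and the empty string yield None; every ambiguous form is passed through unchanged. The pattern checks the shape of the string only, so a value such as 2025-99-99 would also be treated as exact.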

Auspice patch

diff --git a/src/components/tree/infoPanels/click.js b/src/components/tree/infoPanels/click.js
index 8a423499..bffe488c 100644
--- a/src/components/tree/infoPanels/click.js
+++ b/src/components/tree/infoPanels/click.js
@@ -177,12 +177,14 @@ const SampleDate = ({isTerminal, node, t}) => {
   const date = getTraitFromNode(node, "num_date");
   if (!date) return null;

+  const original = getTraitFromNode(node, "num_date", {raw: true});
   const dateUncertainty = getTraitFromNode(node, "num_date", {confidence: true});
   if (date && dateUncertainty && dateUncertainty[0] !== dateUncertainty[1]) {
     return (
       <>
         {item(t(isTerminal ? "Inferred collection date" : "Inferred date"), numericToCalendar(date))}
         {item(t("Date Confidence Interval"), `(${numericToCalendar(dateUncertainty[0])}, ${numericToCalendar(dateUncertainty[1])})`)}
+        {original && item(t("Raw date"), original)}
       </>
     );
   }
diff --git a/src/util/treeMiscHelpers.js b/src/util/treeMiscHelpers.js
index ef71a66c..6960bd8c 100644
--- a/src/util/treeMiscHelpers.js
+++ b/src/util/treeMiscHelpers.js
@@ -25,10 +25,10 @@ james hadfield, nov 2019.
  * NOTE: do not use this for "div", "vaccine" or other traits set on `node_attrs`
  * which don't share the same structure as traits. See the JSON spec for more details.
  */
-export const getTraitFromNode = (node, trait, {entropy=false, confidence=false}={}) => {
+export const getTraitFromNode = (node, trait, {entropy=false, confidence=false, raw=false}={}) => {
   if (!node.node_attrs) return undefined;

-  if (!entropy && !confidence) {
+  if (!entropy && !confidence && !raw) {
     if (!node.node_attrs[trait]) {
       if (trait === strainSymbol) return node.name;
       return undefined;
@@ -42,6 +42,9 @@ export const getTraitFromNode = (node, trait, {entropy=false, confidence=false}=
   } else if (confidence) {
     if (node.node_attrs[trait]) return node.node_attrs[trait].confidence;
     return undefined;
+  } else if (raw) {
+    if (node.node_attrs[trait]) return node.node_attrs[trait].raw_value;
+    return undefined;
   }
   return undefined;
 };

jameshadfield added a commit to nextstrain/auspice that referenced this issue Oct 25, 2019

interpret certain trait values on tips as known not inferred

6b9b5c0

This can be much improved upon resolution of nextstrain/augur#386. See that issue for more information.

huddlej added the "needs triage" label Jul 4, 2020

jameshadfield added the "priority: high" label and removed the "needs triage" label Feb 17, 2025

This was referenced Feb 18, 2025

Read optional date ambiguity node attrs nextstrain/auspice#1943

Merged

Export ambiguous date strings for tips #1760

Merged
