Add data semantics #82

dkuebric · 2016-03-19T19:47:15Z

This is a cleaned-up version of #61 to continue the path towards merge, after discussion there and in #75.

I haven't incorporated the auto-generation suggested in #76 yet but that would be a possible next step.

One question in my mind is whether this document should get listed in the top-level nav? For now, I've left it out and referenced from the Log and Span sections of spec.md.

yurishkuro · 2016-03-19T22:21:54Z

data-semantics.md

+
+As an example, consider the common case of a HTTP-based application server. The URL of an incoming request that the application is handling is often useful for diagnostics, as well as the HTTP verb and the resultant status code. An instrumentor could choose to report the URL in a tag named `URL`, or perhaps named `http.url`--either would be valid from the pure API perspective. But if the Tracer wishes to add intelligence, such as indexing on the URL value or sampling proactively for requests to a particular endpoint, it must know where to look for relevant data. In short, when tag names and other instrumentor-provided values are used consistently, the tracers on the other side of the API can employ more intelligence.
+
+The guidelines contained describe a common ground on which instrumentors and tracer authors can build beyond pure data collection. Adherence to the guidelines is optional but highly recommended for instrumetors.


"contained here", and typo in the last word "instrumetors"

yurishkuro · 2016-03-19T22:46:31Z

Overall LGTM, a few minor typos, and a couple questions.

I think it makes sense to include in the top level menu, under Semantic Spec link.

bhs · 2016-03-21T20:53:59Z

data-semantics.md

+
+### Span Naming
+
+Spans carry tags, logs, and attributes, but they also have a top-level **operation name**. This should be a low-cardinality string value representing the type of work being done in the span. Typically, this is the local type of work being done. Examples include `GET` or `POST` for a HTTP span, `SELECT` or `INSERT` for a SQL span, and so on.


s/attributes/baggage/

I see the need for GET/SELECT/etc as operation names when nothing else is available, but in an ideal world the operation name would have more semantic meaning... e.g., in an RPC-based system it would be the name of the RPC (thrift/grpc/whatever) method. Maybe that would be a better example?

I agree, that would be a better example, especially as a guideline. In practice, when implicit instrumentation is used, it's nearly impossible to come up with business-specific names, but if an RPC framework itself is instrumented, it often does know the name of the endpoint it is calling.

We had a situation where we had a thick client for a storage service, which was communicating to the server via http. The implicit instrumentation picked the default get/post names at the http level spans, but the explicit instrumentation in the thick client was wrapping them into better-named spans.

bhs · 2016-03-21T21:04:50Z

Looking great! I just made a bunch of nits.

bhs · 2016-04-12T04:47:26Z

(@dkuebric friendly ping on this)

dkuebric · 2016-04-16T20:48:49Z

@yurishkuro @bensigelman thanks for all the thoughtful comments (and close reading). I've taken most of your suggestions directly. I'm hoping this is mergeable right now, as I've heard some others had changes they wanted to start merging in. Here's my thoughts on some of the less obvious changes I made:

Status code. @yurishkuro -- could conceivably either be int or string. I don't know that one is a clear winner, but based on a quick survey of AppNeta typed instrumentation (java, c-based), int is more common format at framework/handler level. Looked at apache, nginx, PHP, netty, and servlet interfaces. So I'd vote we suggest int for now--seems simpler than support for both/either.
Query string. @yurishkuro -- first off, thanks for catching the immediate inconsistency of the advice ;). Second, I think that the idea to split it into 2 parts is perhaps not something to tackle at this layer--it comes from some specific requests we have gotten to not report query string data gathered from instrumentation. This could be handled by tracer internals. So, I've removed http.query as a separate tag and suggested inclusion in base URI tag.
Error payload. @bensigelman -- agreed this may be best handled via a dedicated method. To the extent that log_event is used, I've updated it to suggest a map-based type (dict, etc) per @yurishkuro suggestion.
client and server string. Expanded from c and s, which I stole from zipkin. This actually feels like a good candidate for encoding in ext tags?

bhs · 2016-04-16T20:59:15Z

data-semantics.md

@@ -32,7 +32,7 @@ Span structure is also important: what do spans represent, and what relationship

 ## Span Tag Use-Cases

-The following tags are recommended for instrumentors who are trying to represent a particular type of data. Tag names follow a general structure of namespacing; the values are determined purely on a tag-by-tag basis.
+The following tags are recommended for instrumentors who are trying to represent a particular type of data. Tag names follow a general structure of namespacing.

 The recommended tags below are accompanied by `const` values included in an `ext` module for each opentracing implementation.  These `ext` values should be used in place of the strings below, as tracers may choose to use different underlying representations for these common concepts.  The symbols are similar in each implementation: ([Go](https://github.com/opentracing/opentracing-go/blob/master/ext/tags.go), [Python](https://github.com/opentracing/opentracing-python/blob/master/opentracing/ext/tags.py), etc.)


Just FYI (i.e., no need to change this PR), #76 seems like a very wise idea and – once realized – would be something to reference here... a canonical YAML file and codegen seems like the only sane way to go in any case.

yurishkuro · 2016-04-16T21:08:33Z

LGTM

client and server string. Expanded from c and s, which I stole from zipkin. This actually feels like a good candidate for encoding in ext tags?

Already there, but as c/s. We can change that, since there tags are normally only used in-memory. https://github.com/opentracing/opentracing-go/blob/master/ext/tags.go#L43

bhs · 2016-04-16T22:36:43Z

(LGTM as well)

yurishkuro · 2016-04-18T16:31:01Z

data-semantics.md

+### Exception
+
+* Event: `exception`
+* Payload - map


I think map is too language specific (i.e. python). If this was Java/JavaScript, I would simply do span.log("exception", exc) and let the tracer figure out how to record it.

Agree that languages would benefit from some convenience methods that automatically extract features from commonly-used error/exception objects. However, would that be implemented via a log_event overload, or via a log_error or similar?

I suppose secretly I've been assuming that payloads should are maps/arrays/objects in the way that spans are. (An event might have an arbitrary set of data to report with it.) If not, instrumentors are going to have to figure out what types of payloads might be valid for various tracer implementations, I guess?

However, would that be implemented via a log_event overload, or via a log_error or similar?

No, I was thinking just plain old log method: span.log("exception", exc)

If not, instrumentors are going to have to figure out what types of payloads might be valid for various tracer implementations, I guess?

If we were shooting for maps, they I would suggest changing the API to take a map rather than interface{} / Object. But it feels that people would question this choice; after all, an instance of Exception is a natural representation of the exception, why make more work for people to break it apart into a map?

We have stayed away from defining semantics of the payload so far. It's definitely something that we need to collectively address and be more prescriptive, to clarify the contract between instrumentors and implementors. But I would suggest a different issue/PR for that. Until then, an opaque Object is the least restrictive form.

FWIW, I have generally thought of the payload first and foremost as a place for the OpenTracing caller to stick "FYI"-level structured data that might help explain or contextualize the trace.

bhs · 2016-05-04T03:25:36Z

@dkuebric can you merge this soon?

…ssion (opentracing#82)

dkuebric · 2016-05-04T19:22:43Z

@bensigelman Made changes per discussions of other week. I don't have write access but I think the PR is ready, pending a quick review.

bhs · 2016-05-04T19:54:23Z

@dkuebric oh dear, sorry! I'll review and likely merge shortly.

dkuebric · 2016-05-04T21:39:56Z

👍

dkuebric added 2 commits March 19, 2016 19:44

Add data semantics page based on discussion in opentracing#61

2ac2f65

Reference data semantics page from main spec

49959a9

This was referenced Mar 19, 2016

Rough, rough draft of data semantics guidelines for discussion #61

Closed

Top-level tags: "operation_name" and/or "component"? #75

Closed

yurishkuro reviewed Mar 19, 2016
View reviewed changes

yurishkuro mentioned this pull request Mar 20, 2016

[GH-75][GH-76] Document "component" tag and all other standardized tags in spec #81

Closed

bhs reviewed Mar 21, 2016
View reviewed changes

Update data semantics reference based on discussion in opentracing#82

0ad3784

bhs reviewed Apr 16, 2016
View reviewed changes

Elevate span.kind in data semantics documentation

9dff09c

yurishkuro reviewed Apr 18, 2016
View reviewed changes

dkuebric added 2 commits May 4, 2016 19:21

Remove error semantics from data semantic guide pending further discu…

9db4afb

…ssion (opentracing#82)

Merge branch 'master' into add-data-semantics

82b452f

bhs merged commit ebd1020 into opentracing:master May 4, 2016

dkuebric deleted the add-data-semantics branch May 4, 2016 21:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add data semantics #82

Add data semantics #82

dkuebric commented Mar 19, 2016

yurishkuro Mar 19, 2016

yurishkuro commented Mar 19, 2016

bhs Mar 21, 2016

bhs Mar 21, 2016

yurishkuro Mar 21, 2016

bhs commented Mar 21, 2016

bhs commented Apr 12, 2016

dkuebric commented Apr 16, 2016

bhs Apr 16, 2016

yurishkuro commented Apr 16, 2016

bhs commented Apr 16, 2016

yurishkuro Apr 18, 2016

dkuebric Apr 18, 2016

yurishkuro Apr 18, 2016

bhs Apr 18, 2016

bhs commented May 4, 2016

dkuebric commented May 4, 2016

bhs commented May 4, 2016

dkuebric commented May 4, 2016


		As an example, consider the common case of a HTTP-based application server. The URL of an incoming request that the application is handling is often useful for diagnostics, as well as the HTTP verb and the resultant status code. An instrumentor could choose to report the URL in a tag named `URL`, or perhaps named `http.url`--either would be valid from the pure API perspective. But if the Tracer wishes to add intelligence, such as indexing on the URL value or sampling proactively for requests to a particular endpoint, it must know where to look for relevant data. In short, when tag names and other instrumentor-provided values are used consistently, the tracers on the other side of the API can employ more intelligence.

		The guidelines contained describe a common ground on which instrumentors and tracer authors can build beyond pure data collection. Adherence to the guidelines is optional but highly recommended for instrumetors.


		### Span Naming

		Spans carry tags, logs, and attributes, but they also have a top-level operation name. This should be a low-cardinality string value representing the type of work being done in the span. Typically, this is the local type of work being done. Examples include `GET` or `POST` for a HTTP span, `SELECT` or `INSERT` for a SQL span, and so on.

Add data semantics #82

Add data semantics #82

Conversation

dkuebric commented Mar 19, 2016

Choose a reason for hiding this comment

yurishkuro commented Mar 19, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bhs commented Mar 21, 2016

bhs commented Apr 12, 2016

dkuebric commented Apr 16, 2016

Choose a reason for hiding this comment

yurishkuro commented Apr 16, 2016

bhs commented Apr 16, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bhs commented May 4, 2016

dkuebric commented May 4, 2016

bhs commented May 4, 2016

dkuebric commented May 4, 2016