Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add data semantics #82

Merged
merged 6 commits into from
May 4, 2016
Merged

Add data semantics #82

merged 6 commits into from
May 4, 2016

Conversation

dkuebric
Copy link
Contributor

This is a cleaned-up version of #61 to continue the path towards merge, after discussion there and in #75.

I haven't incorporated the auto-generation suggested in #76 yet but that would be a possible next step.

One question in my mind is whether this document should get listed in the top-level nav? For now, I've left it out and referenced from the Log and Span sections of spec.md.


As an example, consider the common case of a HTTP-based application server. The URL of an incoming request that the application is handling is often useful for diagnostics, as well as the HTTP verb and the resultant status code. An instrumentor could choose to report the URL in a tag named `URL`, or perhaps named `http.url`--either would be valid from the pure API perspective. But if the Tracer wishes to add intelligence, such as indexing on the URL value or sampling proactively for requests to a particular endpoint, it must know where to look for relevant data. In short, when tag names and other instrumentor-provided values are used consistently, the tracers on the other side of the API can employ more intelligence.

The guidelines contained describe a common ground on which instrumentors and tracer authors can build beyond pure data collection. Adherence to the guidelines is optional but highly recommended for instrumetors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"contained here", and typo in the last word "instrumetors"

@yurishkuro
Copy link
Member

Overall LGTM, a few minor typos, and a couple questions.

I think it makes sense to include in the top level menu, under Semantic Spec link.


### Span Naming

Spans carry tags, logs, and attributes, but they also have a top-level **operation name**. This should be a low-cardinality string value representing the type of work being done in the span. Typically, this is the local type of work being done. Examples include `GET` or `POST` for a HTTP span, `SELECT` or `INSERT` for a SQL span, and so on.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/attributes/baggage/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the need for GET/SELECT/etc as operation names when nothing else is available, but in an ideal world the operation name would have more semantic meaning... e.g., in an RPC-based system it would be the name of the RPC (thrift/grpc/whatever) method. Maybe that would be a better example?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, that would be a better example, especially as a guideline. In practice, when implicit instrumentation is used, it's nearly impossible to come up with business-specific names, but if an RPC framework itself is instrumented, it often does know the name of the endpoint it is calling.

We had a situation where we had a thick client for a storage service, which was communicating to the server via http. The implicit instrumentation picked the default get/post names at the http level spans, but the explicit instrumentation in the thick client was wrapping them into better-named spans.

@bhs
Copy link
Contributor

bhs commented Mar 21, 2016

Looking great! I just made a bunch of nits.

@bhs
Copy link
Contributor

bhs commented Apr 12, 2016

(@dkuebric friendly ping on this)

@dkuebric
Copy link
Contributor Author

@yurishkuro @bensigelman thanks for all the thoughtful comments (and close reading). I've taken most of your suggestions directly. I'm hoping this is mergeable right now, as I've heard some others had changes they wanted to start merging in. Here's my thoughts on some of the less obvious changes I made:

  1. Status code. @yurishkuro -- could conceivably either be int or string. I don't know that one is a clear winner, but based on a quick survey of AppNeta typed instrumentation (java, c-based), int is more common format at framework/handler level. Looked at apache, nginx, PHP, netty, and servlet interfaces. So I'd vote we suggest int for now--seems simpler than support for both/either.
  2. Query string. @yurishkuro -- first off, thanks for catching the immediate inconsistency of the advice ;). Second, I think that the idea to split it into 2 parts is perhaps not something to tackle at this layer--it comes from some specific requests we have gotten to not report query string data gathered from instrumentation. This could be handled by tracer internals. So, I've removed http.query as a separate tag and suggested inclusion in base URI tag.
  3. Error payload. @bensigelman -- agreed this may be best handled via a dedicated method. To the extent that log_event is used, I've updated it to suggest a map-based type (dict, etc) per @yurishkuro suggestion.
  4. client and server string. Expanded from c and s, which I stole from zipkin. This actually feels like a good candidate for encoding in ext tags?

@@ -32,7 +32,7 @@ Span structure is also important: what do spans represent, and what relationship

## Span Tag Use-Cases

The following tags are recommended for instrumentors who are trying to represent a particular type of data. Tag names follow a general structure of namespacing; the values are determined purely on a tag-by-tag basis.
The following tags are recommended for instrumentors who are trying to represent a particular type of data. Tag names follow a general structure of namespacing.

The recommended tags below are accompanied by `const` values included in an `ext` module for each opentracing implementation. These `ext` values should be used in place of the strings below, as tracers may choose to use different underlying representations for these common concepts. The symbols are similar in each implementation: ([Go](https://github.com/opentracing/opentracing-go/blob/master/ext/tags.go), [Python](https://github.com/opentracing/opentracing-python/blob/master/opentracing/ext/tags.py), etc.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI (i.e., no need to change this PR), #76 seems like a very wise idea and – once realized – would be something to reference here... a canonical YAML file and codegen seems like the only sane way to go in any case.

@yurishkuro
Copy link
Member

LGTM

client and server string. Expanded from c and s, which I stole from zipkin. This actually feels like a good candidate for encoding in ext tags?

Already there, but as c/s. We can change that, since there tags are normally only used in-memory. https://github.com/opentracing/opentracing-go/blob/master/ext/tags.go#L43

@bhs
Copy link
Contributor

bhs commented Apr 16, 2016

(LGTM as well)

### Exception

* Event: `exception`
* Payload - map
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think map is too language specific (i.e. python). If this was Java/JavaScript, I would simply do span.log("exception", exc) and let the tracer figure out how to record it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that languages would benefit from some convenience methods that automatically extract features from commonly-used error/exception objects. However, would that be implemented via a log_event overload, or via a log_error or similar?

I suppose secretly I've been assuming that payloads should are maps/arrays/objects in the way that spans are. (An event might have an arbitrary set of data to report with it.) If not, instrumentors are going to have to figure out what types of payloads might be valid for various tracer implementations, I guess?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, would that be implemented via a log_event overload, or via a log_error or similar?

No, I was thinking just plain old log method: span.log("exception", exc)

If not, instrumentors are going to have to figure out what types of payloads might be valid for various tracer implementations, I guess?

If we were shooting for maps, they I would suggest changing the API to take a map rather than interface{} / Object. But it feels that people would question this choice; after all, an instance of Exception is a natural representation of the exception, why make more work for people to break it apart into a map?

We have stayed away from defining semantics of the payload so far. It's definitely something that we need to collectively address and be more prescriptive, to clarify the contract between instrumentors and implementors. But I would suggest a different issue/PR for that. Until then, an opaque Object is the least restrictive form.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I have generally thought of the payload first and foremost as a place for the OpenTracing caller to stick "FYI"-level structured data that might help explain or contextualize the trace.

@bhs
Copy link
Contributor

bhs commented May 4, 2016

@dkuebric can you merge this soon?

@dkuebric
Copy link
Contributor Author

dkuebric commented May 4, 2016

@bensigelman Made changes per discussions of other week. I don't have write access but I think the PR is ready, pending a quick review.

@bhs
Copy link
Contributor

bhs commented May 4, 2016

@dkuebric oh dear, sorry! I'll review and likely merge shortly.

@bhs bhs merged commit ebd1020 into opentracing:master May 4, 2016
@dkuebric dkuebric deleted the add-data-semantics branch May 4, 2016 21:39
@dkuebric
Copy link
Contributor Author

dkuebric commented May 4, 2016

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants