Ianb/dogstreams aggregate #3732

Merged
merged 4 commits into from
Aug 2, 2018
@ian28223 ian28223 commented May 7, 2018

What does this PR do?

Includes the tags (or the metric attributes, if there are no tags) in the key used for metric aggregation.

Motivation

Given the following logs, dogstreams submits only one of the metrics (usually the last one) and drops the others, even if they are tagged differently.

my.vcurrency 1525651200 100 tags=curr:usd
my.vcurrency 1525651202 100 tags=curr:usd
my.vcurrency 1525651203 200 tags=curr:aud
my.vcurrency 1525651204 100 tags=curr:usd
my.vcurrency 1525651205 200 tags=curr:aud
my.vcurrency 1525651206 300 tags=curr:eur

Expected metrics to be sent to DD after aggregation (3 unique):

my.vcurrency 1525651204 100 tags=curr:usd
my.vcurrency 1525651205 200 tags=curr:aud
my.vcurrency 1525651206 300 tags=curr:eur

Actual result: (just one)

my.vcurrency 1525651206 300 tags=curr:eur
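
The effect can be reproduced with a minimal sketch. It assumes points shaped like `(metric, timestamp, value, attributes)` tuples and a grouping step that keeps the last point of each group. The names `aggregate`, `key_without_tags` and `key_with_tags` are hypothetical and are not the agent's actual code. The three points share a timestamp so that only the tags distinguish them.

```python
from itertools import groupby

# Hypothetical points: (metric, timestamp, value, attributes).
POINTS = [
    ("my.vcurrency", 1525651200, 100, {"tags": ["curr:usd"]}),
    ("my.vcurrency", 1525651200, 200, {"tags": ["curr:aud"]}),
    ("my.vcurrency", 1525651200, 300, {"tags": ["curr:eur"]}),
]


def key_without_tags(p):
    # Group by timestamp and metric name only.
    return (p[1], p[0])


def key_with_tags(p):
    # Also include the tags, as a sorted tuple, in the grouping key.
    return (p[1], p[0], tuple(sorted(p[3].get("tags") or [])))


def aggregate(points, key):
    # Keep the last point of every group (a simplification for this sketch).
    ordered = sorted(points, key=key)
    return [list(group)[-1] for _, group in groupby(ordered, key=key)]


print(len(aggregate(POINTS, key_without_tags)))  # a single group
print(len(aggregate(POINTS, key_with_tags)))     # one group per tag set
```

With the tags left out of the key, all three points collapse into one group and only the last survives; with the tags in the key, each tag set keeps its own point.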

ian28223 added 3 commits May 4, 2018 15:25
Issue: if multiple metrics with the same metric_name and timestamp but different tags are read, only one is submitted. Other metrics with other tags are lost
return (p[1], p[0], p[3].get('host_name', None), p[3].get('device_name', None))

# Sort and group by timestamp, metric name, host_name, device_name, (tags or attributes)
tags = p[3].get('tags', None)
Member

Since None is the default return value of get, we could simplify to:

tags = p[3].get("tags")
attribs = sorted(tags.split(",") if tags else p[3])

Same on line 32: we can remove the second parameter of get.
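
A small sketch of the suggested simplification, for illustration only. `dict.get` returns `None` when the key is missing and no default is given, so the explicit `None` argument is redundant. The helper name `normalize_attribs` is hypothetical, and the assumption that `tags` arrives as a comma-separated string follows the snippet above.

```python
def normalize_attribs(attributes):
    # dict.get(key) already defaults to None, same as dict.get(key, None).
    tags = attributes.get("tags")
    # With tags: a sorted list of tag strings.
    # Without tags: sorted() over a dict yields its sorted keys only.
    return sorted(tags.split(",") if tags else attributes)


print(normalize_attribs({"tags": "env:prod,app:web"}))
print(normalize_attribs({"env": "prod", "app": "web"}))
```

Note that in the no-tags branch only the attribute names survive, so two points that differ only in attribute values would produce the same result here.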

Contributor Author

Thanks for that. It was just the pattern of the original code; let me make the changes.

# Sort and group by timestamp, metric name, host_name, device_name
return (p[1], p[0], p[3].get('host_name', None), p[3].get('device_name', None))

# Sort and group by timestamp, metric name, host_name, device_name, (tags or attributes)
Member

Why do we try to find tags rather than use attribs all the time? Is there a special case where, if tags is present in the attributes, the code will use it later?

Contributor Author

@ian28223 ian28223 Jul 19, 2018

Here is the reason why I did it this way.

As per our docs, the canonical format is
metric unix_timestamp value [attribute1=v1 attributes2=v2 ...]

And for ParserFunction-parsed logs, attributes contain tags.

    metric_attributes = {
        'tags': tags,
        'metric_type': 'gauge',
    }
    return (metric_name, date, metric_value, metric_attributes)

So ideally, in canonical format, metric tags should be sent as shown below, which is why I look for tags:
mymetric 1531900000 99 tags=env:prod,app:web metric_type=gauge

BUT most of the time in practice (and in an example we provided in old KB content), the tags are sent as follows. In this case there is no tags attribute, so we assume that the attribs could contain the tags.

applications.function.runtime_seconds 1464462187 24 application=myapp code_author=gus
mymetric 1531900000 99 env=prod app=web
mymetric 1531900001 99 env=prod app=web
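
To make the two input shapes concrete, here is a rough Python 3 sketch, not the agent's parser. `parse_canonical` and `grouping_attribs` are hypothetical helpers that assume whitespace-separated fields and `key=value` attributes as in the lines above.

```python
def parse_canonical(line):
    # Split "metric unix_timestamp value [attr1=v1 attr2=v2 ...]" on whitespace.
    metric, timestamp, value, *pairs = line.split()
    # Each remaining token is "key=value"; split on the first "=" only.
    attributes = dict(pair.split("=", 1) for pair in pairs)
    return (metric, int(timestamp), float(value), attributes)


def grouping_attribs(attributes):
    # Prefer an explicit tags attribute; otherwise fall back to all attributes.
    tags = attributes.get("tags")
    return sorted(tags.split(",")) if tags else attributes


with_tags = parse_canonical(
    "mymetric 1531900000 99 tags=env:prod,app:web metric_type=gauge"
)
without_tags = parse_canonical("mymetric 1531900000 99 env=prod app=web")

print(grouping_attribs(with_tags[3]))     # sorted list of tags
print(grouping_attribs(without_tags[3]))  # the attribute dict itself
```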

Why not attribs all the time?
Because then we would have to assume that the tags (if present) always arrive in the same order, which isn't always the case. The examples below show where using attribs alone doesn't work.
1. With the canonical format, these look like 2 unique metrics and will not be aggregated; groupby() apparently sees them that way.

mymetric 1531900000 99 tags=env:prod,app:web metric_type=gauge
mymetric 1531900000 99 tags=app:web,env:prod metric_type=gauge

2. With a parser function, same thing. If the tags are submitted in a different order, the points will not be aggregated, and we can't assume that tags are always sorted when returned.

user.crashes|2016-05-28 20:24:43.111|24|LotusNotes,Outlook,Explorer
user.crashes|2016-05-28 20:24:43.222|24|LotusNotes,Explorer,Outlook
    tags = extras.split(',')
    metric_attributes = {
        'tags': tags,
        'metric_type': 'gauge',
    }
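
The ordering problem can be shown with a short sketch. It assumes a parser function that returns tags as a list, as in the snippet above; `raw_key`, `sorted_key` and `count_groups` are hypothetical names, and the timestamp is a placeholder.

```python
from itertools import groupby

TS = 1  # placeholder timestamp shared by both points

# Same metric and timestamp, same tags, different tag order.
POINTS = [
    ("user.crashes", TS, 24, {"tags": ["LotusNotes", "Outlook", "Explorer"]}),
    ("user.crashes", TS, 24, {"tags": ["LotusNotes", "Explorer", "Outlook"]}),
]


def raw_key(p):
    # Tags exactly as received: list order affects equality.
    return (p[1], p[0], p[3]["tags"])


def sorted_key(p):
    # Tags sorted first, so reordered tag lists produce equal keys.
    return (p[1], p[0], sorted(p[3]["tags"]))


def count_groups(points, key):
    ordered = sorted(points, key=key)
    return sum(1 for _ in groupby(ordered, key=key))


print(count_groups(POINTS, raw_key))     # reordered tags land in separate groups
print(count_groups(POINTS, sorted_key))  # sorted tags land in the same group
```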

Member

Thanks a lot for that great explanation!

Shouldn't we also sort p[3] if tags is not set? Right now we're only sorting tags.

Contributor Author

Apparently, there's no need. Per my testing, we don't need to sort a dict; sorting is only needed for a list. Took a while for me to figure this out: ¯_(ツ)_/¯

  • attribs returned is a list if tags were found
  • attribs returned is a dict if from p[3].
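
The difference between the two cases comes down to how equality works for each type, which this sketch illustrates. The final check is a caveat that applies on Python 3 only: dicts there support `==` but not `<`, so a sort key containing dicts can raise `TypeError` once the comparison reaches them. Python 2 allowed ordering dicts. `dicts_are_orderable` is a hypothetical helper.

```python
# Lists compare element by element, so order matters.
assert ["env:prod", "app:web"] != ["app:web", "env:prod"]
assert sorted(["env:prod", "app:web"]) == sorted(["app:web", "env:prod"])

# Dicts compare by their key/value pairs, regardless of insertion order.
assert {"env": "prod", "app": "web"} == {"app": "web", "env": "prod"}


def dicts_are_orderable():
    # Try an ordering comparison between two dicts and report whether it works.
    try:
        {"a": 1} < {"b": 2}
    except TypeError:
        return False
    return True


print(dicts_are_orderable())
```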

@hush-hush hush-hush added this to the 5.27.0 milestone Jul 20, 2018
@hush-hush hush-hush merged commit 0960ba7 into master Aug 2, 2018
@hush-hush hush-hush deleted the ianb/dogstreams_aggregate branch August 2, 2018 19:30