[#9] Add Duration of DAG Runs #11

Merged: 4 commits into epoch8:master from feature/add-dag-run-duration on Sep 12, 2018

Conversation

hydrosquall (Contributor)

This targets feature request #9. It reports how long currently running DagRuns have been running.

This is useful for alerting on DagRuns that have been running longer than expected.
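
Roughly, the idea looks like the sketch below. This is not the exact code in this PR, just a minimal illustration assuming Airflow 1.10-style imports (airflow.settings.Session, airflow.models.DagRun, airflow.utils.timezone); the helper name running_dag_run_durations is made up for this example.

```python
from airflow.models import DagRun
from airflow.settings import Session
from airflow.utils import timezone
from airflow.utils.state import State


def running_dag_run_durations():
    """Return seconds elapsed since start_date for every RUNNING DagRun."""
    session = Session()
    try:
        running = (
            session.query(DagRun)
            .filter(DagRun.state == State.RUNNING)
            .all()
        )
        now = timezone.utcnow()
        # One entry per (dag_id, run_id); end_date is not used because it is
        # only set once a run finishes.
        return {
            (run.dag_id, run.run_id): (now - run.start_date).total_seconds()
            for run in running
        }
    finally:
        session.close()
```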

@hydrosquall force-pushed the feature/add-dag-run-duration branch from c1bb88c to 51f7c27 on September 10, 2018 at 20:42
@hydrosquall changed the title from "[#9] Add dag run duration" to "[#9] Add Duration of DAG Runs" on Sep 10, 2018
@elephantum (Contributor)

@hydrosquall do you think it's possible to have a generalized metric, "duration of the oldest DagRun for a specific status"?

For example, sometimes we have a problem where a DagRun gets stuck in the "queued" state; we'd like to alert when a DagRun has been queued for more than one hour.

@hydrosquall (Contributor, Author)

Hi @elephantum -

I checked the DagRun model, and it looks like there are only 3 possible DagRun states (running, success, failure). I wanted to capture a generalized duration for all 3 states, but the problem is that end_date is not always stored on the DagRun.

https://github.com/apache/incubator-airflow/blob/1f038a7919207338471d31890f76e71e5cb4571c/airflow/utils/state.py#L60

Alerts on the queued status are possible, but that felt to me like it belongs to a "duration of TaskInstance" metric instead.
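
To make that concrete, here is an illustrative helper (not code from this PR) showing why duration only generalizes cleanly to the running state; dag_run_duration is a hypothetical name for this example.

```python
from airflow.utils.state import State


def dag_run_duration(run, now):
    """Illustrative only: how long a DagRun has taken, by state."""
    if run.state == State.RUNNING:
        # Always well defined: measure from start_date to now.
        return (now - run.start_date).total_seconds()
    if run.end_date is not None:
        # Only works when end_date was actually recorded.
        return (run.end_date - run.start_date).total_seconds()
    return None  # finished run with no end_date: duration unknown, skip it
```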

@elephantum (Contributor)

You're right. I was thinking about TaskInstance while your PR is about DagRun. I'm checking it locally and merging.

@elephantum (Contributor) commented Sep 11, 2018

@hydrosquall One more question: as I see it, in a situation where we have three simultaneous DagRuns for the same dag_id, we'll have three metrics for that dag_id. Will this actually be useful?

What is your target scenario for monitoring?

@hydrosquall (Contributor, Author) commented Sep 11, 2018

Good question! I believe each one of the DagRuns will create a unique row with its own run_id, and it's completely fine for multiple run_ids to share the same dag_id.

In the scenario where I'm using this, one dag_id will only have one active run at a time. However, I don't believe multiple concurrent DagRuns would break the sort of alerting I want, since I'm interested in being notified if any DagRun has been running beyond a certain period of time.
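
As an illustration of that alerting pattern (not code from this PR), concurrent runs for the same dag_id can simply be reduced to the longest-running one before comparing against a threshold; the dict shape mirrors the earlier sketch and is an assumption, as is the helper name.

```python
def dag_ids_over_threshold(durations, threshold_seconds):
    """durations maps (dag_id, run_id) -> seconds that run has been going."""
    longest_per_dag = {}
    for (dag_id, _run_id), seconds in durations.items():
        longest_per_dag[dag_id] = max(longest_per_dag.get(dag_id, 0), seconds)
    # Alert if *any* run of a dag exceeds the threshold, i.e. if the max does.
    return sorted(dag_id for dag_id, seconds in longest_per_dag.items()
                  if seconds > threshold_seconds)
```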

@elephantum (Contributor)

Ok, I can't see any issues with this approach.

@elephantum merged commit 355cd45 into epoch8:master on Sep 12, 2018
@hydrosquall deleted the feature/add-dag-run-duration branch on September 12, 2018 at 16:16