Augur log data oversaturation #3008

Open
cdolfi opened this issue Feb 20, 2025 · 4 comments

@cdolfi

cdolfi commented Feb 20, 2025

Debugging can be hard because the logs cannot be kept for long: thousands of jobs generate hundreds of lines of stack trace, all at once, after a potentially useful error log (like index corruption messages) has already passed. There might be a design problem/change that would help the useful logs stick around longer. @GregSutcliffe tagged to provide more context, and lmk if I should edit this description.

@Ulincsys
Contributor

Hi Cali,

The primary log stream receives the combined output of every logging interface within Augur, so it does go very quickly on an active instance.

However, in the logs/ directory (or elsewhere, if configured otherwise) each logging unit should have its own log file in the form of logging_unit.info and logging_unit.error for separate stdout/stderr logs, or logging_unit.out for a combined log stream.
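As a rough illustration of the per-unit layout described above, here is a minimal sketch using Python's standard `logging` module. It is not Augur's actual implementation: the function name `make_unit_logger`, the `sketch.` logger prefix, and the choice to let the `.info` file receive everything at INFO and above are all assumptions made for the example.

```python
import logging
import tempfile
from pathlib import Path


def make_unit_logger(log_dir: Path, unit: str) -> logging.Logger:
    """Hypothetical helper: write <unit>.info and <unit>.error under log_dir/<unit>/."""
    unit_dir = log_dir / unit
    unit_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(f"sketch.{unit}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the sketch's records out of the root logger

    # Assumption: the .info file gets INFO and above, the .error file only ERROR and above.
    info_handler = logging.FileHandler(unit_dir / f"{unit}.info")
    info_handler.setLevel(logging.INFO)
    error_handler = logging.FileHandler(unit_dir / f"{unit}.error")
    error_handler.setLevel(logging.ERROR)

    for handler in (info_handler, error_handler):
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


log_dir = Path(tempfile.mkdtemp())
unit_logger = make_unit_logger(log_dir, "collect_events")
unit_logger.info("collection started")
unit_logger.error("index corruption detected")

info_text = (log_dir / "collect_events" / "collect_events.info").read_text()
error_text = (log_dir / "collect_events" / "collect_events.error").read_text()
print(error_text)
```

With a layout like this, a rare error stays findable in the small `.error` file even when the combined stream has long since scrolled past it.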

If you are running in a docker container, then you'll need to mount logs/ as a volume (which I believe that @GregSutcliffe has mentioned doing previously), or you can access the container via an interactive shell with

docker exec -it <container_name> bash

@sgoggins
Member

I think setting the LOGLEVEL to ERROR in the augur_operations.config table will also reduce logging to something more manageable, which is what you would expect in a production instance.
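To show the effect of raising the level, here is a small sketch with Python's standard `logging` module. It assumes Augur's LOGLEVEL setting maps onto standard Python logging levels, which this thread does not confirm; the helper `emit_sample` and the sample messages are invented for the example.

```python
import io
import logging


def emit_sample(level: int) -> list[str]:
    """Hypothetical helper: log three records at the given level, return what was written."""
    stream = io.StringIO()
    logger = logging.getLogger(f"sketch.level.{level}")
    logger.setLevel(level)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)

    logger.debug("fetching page 3 of 40")
    logger.info("collected 100 pull requests")
    logger.error("index corruption detected")

    logger.removeHandler(handler)
    return stream.getvalue().splitlines()


verbose_lines = emit_sample(logging.DEBUG)  # all three records are written
quiet_lines = emit_sample(logging.ERROR)    # only the ERROR record is written
print(len(verbose_lines), quiet_lines)
```

The trade-off is that at ERROR the surrounding DEBUG/INFO context is gone, so this suits a stable production instance better than one that is still being debugged.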

@cdolfi
Author

cdolfi commented Feb 24, 2025

@GregSutcliffe Do you know what our logging level is?

@GregSutcliffe
Collaborator

Sorry for the delay in getting to this, it's been a busy week :)

So, there are a couple of issues here that I think are worth tackling. Firstly, yes, we do use a fairly low logging level, because we've been having issues. That's obviously part of the problem, but it's an expected one, and I'm happy to agree that we would want a higher level at a later time.

However, this part is actually the issue:

> However, in the logs/ directory (or elsewhere, if configured otherwise) each logging unit should have its own log file in the form of logging_unit.info and logging_unit.error for separate stdout/stderr logs, or logging_unit.out for a combined log stream.

This process, while useful for debugging, is a major pain from an operational view. Standard Linux practice is to write a logrotate.d/ file for the log files, so that Logrotate can manage them and rotate them (and gzip the old ones) on a variety of criteria, including file size. However, Logrotate cannot recurse or recurse-glob, so I currently have to specify every file manually in my snippet, because there are 2 files per task and they reside in their own directories. Quick example:

...
├── start_tasks
│   ├── augur_collection_monitor
│   │   ├── augur_collection_monitor.error
│   │   ├── augur_collection_monitor.info
│   │   ├── augur_collection_monitor.info.1.gz
│   ├── augur_collection_update_weights
│   │   ├── augur_collection_update_weights.error
│   │   ├── augur_collection_update_weights.info
│   │   ├── augur_collection_update_weights.info.1.gz
│   ├── create_collection_status_records
│   │   ├── create_collection_status_records.error
│   │   └── create_collection_status_records.info
...

This leads to a very brittle logrotate config - if a new version of Augur adds a new task, I won't necessarily know to update my logging configuration. It currently reads:

Logrotate.conf

root@augur:~# cat /etc/logrotate.d/augur
/opt/logs/events/collect_events/collect_events.info /opt/logs/traffic/collect_github_repo_clones_data/collect_github_repo_clones_data.info /opt/logs/events_task/collect_gitlab_merge_request_events/collect_gitlab_merge_request_events.info /opt/logs/events_task/collect_gitlab_issue_events/collect_gitlab_issue_events.info /opt/logs/start_tasks/augur_collection_monitor/augur_collection_monitor.info /opt/logs/start_tasks/augur_collection_update_weights/augur_collection_update_weights.info /opt/logs/start_tasks/non_repo_domain_tasks/non_repo_domain_tasks.info /opt/logs/start_tasks/create_collection_status_records/create_collection_status_records.info /opt/logs/start_tasks/retry_errored_repos/retry_errored_repos.info /opt/logs/augur.info /opt/logs/tasks/detect_github_repo_move_core/detect_github_repo_move_core.info /opt/logs/tasks/process_scc_value_metrics/process_scc_value_metrics.info /opt/logs/tasks/collect_repo_info/collect_repo_info.info /opt/logs/tasks/process_pull_request_commits/process_pull_request_commits.info /opt/logs/tasks/process_ossf_dependency_metrics/process_ossf_dependency_metrics.info /opt/logs/tasks/process_dependency_metrics/process_dependency_metrics.info /opt/logs/tasks/collect_linux_badge_info/collect_linux_badge_info.info /opt/logs/tasks/process_libyear_dependency_metrics/process_libyear_dependency_metrics.info /opt/logs/tasks/collect_releases/collect_releases.info /opt/logs/tasks/process_pull_request_files/process_pull_request_files.info /opt/logs/tasks/collect_pull_requests/collect_pull_requests.info /opt/logs/tasks/insert_facade_contributors/insert_facade_contributors.info /opt/logs/tasks/collect_pull_request_review_comments/collect_pull_request_review_comments.info /opt/logs/tasks/detect_github_repo_move_secondary/detect_github_repo_move_secondary.info /opt/logs/tasks/collect_pull_request_reviews/collect_pull_request_reviews.info 
/opt/logs/merge_request_task/collect_merge_request_reviewers/collect_merge_request_reviewers.info /opt/logs/merge_request_task/collect_merge_request_files/collect_merge_request_files.info /opt/logs/merge_request_task/collect_gitlab_merge_requests/collect_gitlab_merge_requests.info /opt/logs/merge_request_task/collect_merge_request_comments/collect_merge_request_comments.info /opt/logs/merge_request_task/collect_merge_request_metadata/collect_merge_request_metadata.info /opt/logs/merge_request_task/collect_merge_request_commits/collect_merge_request_commits.info /opt/logs/core_task_failure.info /opt/logs/messages/collect_github_messages/collect_github_messages.info /opt/logs/populate_repo_src_id/populate_repo_src_id_task/populate_repo_src_id_task.info /opt/logs/facade_tasks/trim_commits_post_analysis_facade_task/trim_commits_post_analysis_facade_task.info /opt/logs/facade_tasks/git_update_commit_count_weight/git_update_commit_count_weight.info /opt/logs/facade_tasks/facade_analysis_init_facade_task/facade_analysis_init_facade_task.info /opt/logs/facade_tasks/facade_fetch_missing_commit_messages/facade_fetch_missing_commit_messages.info /opt/logs/facade_tasks/trim_commits_facade_task/trim_commits_facade_task.info /opt/logs/facade_tasks/analyze_commits_in_parallel/analyze_commits_in_parallel.info /opt/logs/facade_tasks/git_repo_updates_facade_task/git_repo_updates_facade_task.info /opt/logs/facade_tasks/clone_repos/clone_repos.info /opt/logs/facade_tasks/facade_error_handler/facade_error_handler.info /opt/logs/facade_tasks/facade_start_contrib_analysis_task/facade_start_contrib_analysis_task.info /opt/logs/facade_tasks/facade_analysis_end_facade_task/facade_analysis_end_facade_task.info /opt/logs/server.info /opt/logs/facade_task_failure.info /opt/logs/collection_util/facade_clone_success_util/facade_clone_success_util.info /opt/logs/collection_util/issue_pr_task_update_weight_util/issue_pr_task_update_weight_util.info 
/opt/logs/collection_util/secondary_task_success_util/secondary_task_success_util.info /opt/logs/collection_util/ml_task_success_util/ml_task_success_util.info /opt/logs/collection_util/task_failed_util/task_failed_util.info /opt/logs/collection_util/facade_task_success_util/facade_task_success_util.info /opt/logs/collection_util/core_task_success_util/core_task_success_util.info /opt/logs/secondary_task_failure.info /opt/logs/contributors/grab_comitters/grab_comitters.info /opt/logs/contributors/process_contributors/process_contributors.info /opt/logs/frontend/add_gitlab_repos/add_gitlab_repos.info /opt/logs/frontend/add_repo/add_repo.info /opt/logs/frontend/add_org_repo_list/add_org_repo_list.info /opt/logs/frontend/add_github_orgs_and_repos/add_github_orgs_and_repos.info /opt/logs/frontend/add_org/add_org.info /opt/logs/issues/collect_issues/collect_issues.info /opt/logs/issues_task/collect_gitlab_issue_comments/collect_gitlab_issue_comments.info /opt/logs/issues_task/collect_gitlab_issues/collect_gitlab_issues.info /opt/logs/refresh_materialized_views/refresh_materialized_views/refresh_materialized_views.info /opt/logs/augur_view.info /opt/logs/contributor_breadth_worker/contributor_breadth_model/contributor_breadth_model.info /opt/logs/events/collect_events/collect_events.error /opt/logs/traffic/collect_github_repo_clones_data/collect_github_repo_clones_data.error /opt/logs/events_task/collect_gitlab_merge_request_events/collect_gitlab_merge_request_events.error /opt/logs/events_task/collect_gitlab_issue_events/collect_gitlab_issue_events.error /opt/logs/start_tasks/augur_collection_monitor/augur_collection_monitor.error /opt/logs/start_tasks/augur_collection_update_weights/augur_collection_update_weights.error /opt/logs/start_tasks/non_repo_domain_tasks/non_repo_domain_tasks.error /opt/logs/start_tasks/create_collection_status_records/create_collection_status_records.error /opt/logs/start_tasks/retry_errored_repos/retry_errored_repos.error 
/opt/logs/augur_view.error /opt/logs/tasks/detect_github_repo_move_core/detect_github_repo_move_core.error /opt/logs/tasks/process_scc_value_metrics/process_scc_value_metrics.error /opt/logs/tasks/collect_repo_info/collect_repo_info.error /opt/logs/tasks/process_pull_request_commits/process_pull_request_commits.error /opt/logs/tasks/process_ossf_dependency_metrics/process_ossf_dependency_metrics.error /opt/logs/tasks/process_dependency_metrics/process_dependency_metrics.error /opt/logs/tasks/collect_linux_badge_info/collect_linux_badge_info.error /opt/logs/tasks/process_libyear_dependency_metrics/process_libyear_dependency_metrics.error /opt/logs/tasks/collect_releases/collect_releases.error /opt/logs/tasks/process_pull_request_files/process_pull_request_files.error /opt/logs/tasks/collect_pull_requests/collect_pull_requests.error /opt/logs/tasks/insert_facade_contributors/insert_facade_contributors.error /opt/logs/tasks/collect_pull_request_review_comments/collect_pull_request_review_comments.error /opt/logs/tasks/detect_github_repo_move_secondary/detect_github_repo_move_secondary.error /opt/logs/tasks/collect_pull_request_reviews/collect_pull_request_reviews.error /opt/logs/merge_request_task/collect_merge_request_reviewers/collect_merge_request_reviewers.error /opt/logs/merge_request_task/collect_merge_request_files/collect_merge_request_files.error /opt/logs/merge_request_task/collect_gitlab_merge_requests/collect_gitlab_merge_requests.error /opt/logs/merge_request_task/collect_merge_request_comments/collect_merge_request_comments.error /opt/logs/merge_request_task/collect_merge_request_metadata/collect_merge_request_metadata.error /opt/logs/merge_request_task/collect_merge_request_commits/collect_merge_request_commits.error /opt/logs/server.error /opt/logs/messages/collect_github_messages/collect_github_messages.error /opt/logs/populate_repo_src_id/populate_repo_src_id_task/populate_repo_src_id_task.error 
/opt/logs/facade_tasks/trim_commits_post_analysis_facade_task/trim_commits_post_analysis_facade_task.error /opt/logs/facade_tasks/git_update_commit_count_weight/git_update_commit_count_weight.error /opt/logs/facade_tasks/facade_analysis_init_facade_task/facade_analysis_init_facade_task.error /opt/logs/facade_tasks/facade_fetch_missing_commit_messages/facade_fetch_missing_commit_messages.error /opt/logs/facade_tasks/trim_commits_facade_task/trim_commits_facade_task.error /opt/logs/facade_tasks/analyze_commits_in_parallel/analyze_commits_in_parallel.error /opt/logs/facade_tasks/git_repo_updates_facade_task/git_repo_updates_facade_task.error /opt/logs/facade_tasks/clone_repos/clone_repos.error /opt/logs/facade_tasks/facade_error_handler/facade_error_handler.error /opt/logs/facade_tasks/facade_start_contrib_analysis_task/facade_start_contrib_analysis_task.error /opt/logs/facade_tasks/facade_analysis_end_facade_task/facade_analysis_end_facade_task.error /opt/logs/collection_util/facade_clone_success_util/facade_clone_success_util.error /opt/logs/collection_util/issue_pr_task_update_weight_util/issue_pr_task_update_weight_util.error /opt/logs/collection_util/secondary_task_success_util/secondary_task_success_util.error /opt/logs/collection_util/ml_task_success_util/ml_task_success_util.error /opt/logs/collection_util/task_failed_util/task_failed_util.error /opt/logs/collection_util/facade_task_success_util/facade_task_success_util.error /opt/logs/collection_util/core_task_success_util/core_task_success_util.error /opt/logs/secondary_task_failure.error /opt/logs/facade_task_failure.error /opt/logs/augur.error /opt/logs/contributors/grab_comitters/grab_comitters.error /opt/logs/contributors/process_contributors/process_contributors.error /opt/logs/core_task_failure.error /opt/logs/frontend/add_gitlab_repos/add_gitlab_repos.error /opt/logs/frontend/add_repo/add_repo.error /opt/logs/frontend/add_org_repo_list/add_org_repo_list.error 
/opt/logs/frontend/add_github_orgs_and_repos/add_github_orgs_and_repos.error /opt/logs/frontend/add_org/add_org.error /opt/logs/issues/collect_issues/collect_issues.error /opt/logs/issues_task/collect_gitlab_issue_comments/collect_gitlab_issue_comments.error /opt/logs/issues_task/collect_gitlab_issues/collect_gitlab_issues.error /opt/logs/refresh_materialized_views/refresh_materialized_views/refresh_materialized_views.error /opt/logs/contributor_breadth_worker/contributor_breadth_model/contributor_breadth_model.error {
    maxsize 100M
    hourly
    missingok
    rotate 5
    compress
    notifempty
    nocreate
    copytruncate
    su root root
}
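Until something changes in Augur itself, one possible stopgap is to regenerate the file list above from the directory tree instead of maintaining it by hand. The sketch below is only an illustration: the function name `build_logrotate_stanza` is hypothetical, the directives are copied from the config above, and the demo runs against a throwaway temporary directory rather than /opt/logs.

```python
import tempfile
from pathlib import Path

# Directives copied from the existing /etc/logrotate.d/augur stanza.
DIRECTIVES = [
    "maxsize 100M", "hourly", "missingok", "rotate 5", "compress",
    "notifempty", "nocreate", "copytruncate", "su root root",
]


def build_logrotate_stanza(log_root: Path) -> str:
    """Return one stanza covering every *.info and *.error file under log_root."""
    # Matching on the final suffix skips rotated files such as foo.info.1.gz.
    paths = sorted(
        str(p) for p in log_root.rglob("*")
        if p.is_file() and p.suffix in {".info", ".error"}
    )
    if not paths:
        raise ValueError(f"no .info or .error files found under {log_root}")
    body = "\n".join(f"    {directive}" for directive in DIRECTIVES)
    return "\n".join(paths) + " {\n" + body + "\n}\n"


# Demo against a temporary tree that mimics the layout shown earlier.
log_root = Path(tempfile.mkdtemp())
unit_dir = log_root / "start_tasks" / "augur_collection_monitor"
unit_dir.mkdir(parents=True)
for name in (
    "augur_collection_monitor.info",
    "augur_collection_monitor.error",
    "augur_collection_monitor.info.1.gz",
):
    (unit_dir / name).touch()
(log_root / "augur.info").touch()

stanza = build_logrotate_stanza(log_root)
print(stanza)
```

Running something like this from cron shortly before Logrotate would pick up tasks added by a new Augur version, though it is still a workaround rather than a fix for the underlying layout.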

Not great, hard to manage. As such, I would like the option to disable the separate log files (probably as a config option, because I recognise their use in other circumstances), and have them all go to a single file, so I can write a simpler and unchanging file.
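To make the request concrete, here is a minimal sketch of what such a switch could look like with Python's standard `logging` module. Nothing here is Augur's real API: the `single_file` flag, the `configure_logging` function, and the `augur_combined.log` filename are all hypothetical.

```python
import logging
import tempfile
from pathlib import Path


def configure_logging(log_dir: Path, units: list[str], single_file: bool) -> dict[str, logging.Logger]:
    """Hypothetical config switch: one shared file, or one file per logging unit."""
    log_dir.mkdir(parents=True, exist_ok=True)
    formatter = logging.Formatter("%(name)s %(levelname)s %(message)s")

    shared = None
    if single_file:
        shared = logging.FileHandler(log_dir / "augur_combined.log")
        shared.setFormatter(formatter)

    loggers = {}
    for unit in units:
        logger = logging.getLogger(f"sketch.single.{unit}")
        logger.setLevel(logging.INFO)
        logger.propagate = False
        logger.handlers.clear()  # avoid stacking handlers if called twice
        if shared is not None:
            logger.addHandler(shared)
        else:
            handler = logging.FileHandler(log_dir / f"{unit}.info")
            handler.setFormatter(formatter)
            logger.addHandler(handler)
        loggers[unit] = logger
    return loggers


log_dir = Path(tempfile.mkdtemp())
loggers = configure_logging(log_dir, ["collect_events", "clone_repos"], single_file=True)
loggers["collect_events"].info("collected 100 events")
loggers["clone_repos"].error("clone failed")

combined_lines = (log_dir / "augur_combined.log").read_text().splitlines()
print(combined_lines)
```

One caveat: Python's logging documentation notes that writing to a single file from multiple processes is not supported out of the box, so a multi-process deployment would likely need a queue- or socket-based handler feeding one writer rather than a plain shared `FileHandler`.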

In Docker-land specifically, I would also like to get the stdout/stderr into that directory too (instead of to stdout) because as it stands the Docker internal log buffer can also consume all the disk space - I had to add:

root@augur:~# cat /etc/docker/daemon.json 
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

to the Docker Engine config to get it to stop doing that. Having that output in the logdir would make it easier to manage, and safe out-of-the-box for other Docker users.
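For reference, the daemon.json settings above bound the json-file log size per container at max-size times max-file. The short sketch below does that arithmetic; the helper names are invented, and treating the `m` suffix as binary megabytes (MiB) is an assumption.

```python
import json

# Same settings as the daemon.json shown above.
DAEMON_JSON = """
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
"""

# Assumption: k/m/g suffixes are binary multiples.
UNIT_FACTORS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size such as '10m' into bytes."""
    text = text.strip().lower()
    if text and text[-1] in UNIT_FACTORS:
        return int(text[:-1]) * UNIT_FACTORS[text[-1]]
    return int(text)


def per_container_cap_bytes(config: dict) -> int:
    """Upper bound on log size for one container: max-size * max-file."""
    opts = config["log-opts"]
    return parse_size(opts["max-size"]) * int(opts["max-file"])


cap = per_container_cap_bytes(json.loads(DAEMON_JSON))
print(f"{cap / 1024**2:.0f} MiB per container")
```

Note that daemon-level logging defaults only apply to containers created after the change, so existing containers need to be recreated to pick up the limit.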
