-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for JSON formatted logs #5799
Conversation
This adds a new extra package dependency on `json-logging` and an environment variable, which when set to "true" regardless of case, will try to use the json-logging non-web app log formatter. If the `json-logging` package is not installed but the environment variable is True, something like this will be logged but it will not crash the application: ``` $ ENABLE_JSON_LOGGING=true jupyter notebook Unable to use json logging due to missing packages. Run "pip install notebook[json-logging]" to fix. Traceback (most recent call last): File "/home/osboxes/jupyter/notebook/notebook/notebookapp.py", line 144, in <module> import json_logging ModuleNotFoundError: No module named 'json_logging' ``` Initially I tried to add a new Bool config option to toggle this but the problem is (from my limited understanding of traitlets and tornado) is that `_log_formatter_cls` needs to be set early and trying to se the log formatter later in `init_logging` is too late - or at least I couldn't figure out a way to reset the log formatter dynamically (I tried calling `_log_format_changed` [1] but that didn't work). With this working you get logs like this: ``` {"written_at": "2020-10-07T21:10:51.606Z", "written_ts": 1602105051606265000, "msg": "404 GET /nbextensions/widgets/notebook/js/extension.js (127.0.0.1) 9.26ms referer=http://localhost:8888/notebooks/Untitled.ipynb", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 49} {"written_at": "2020-10-07T21:10:54.443Z", "written_ts": 1602105054443309000, "msg": "Starting buffering for f260ddd7-938c-42d0-ac3b-455bea76694f:49f30b53fc4b4ec6a8f2fb748a171613", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "INFO", "module": "kernelmanager", "line_no": 222} {"written_at": "2020-10-07T21:10:54.446Z", "written_ts": 1602105054446264000, "msg": "Kernel shutdown: f260ddd7-938c-42d0-ac3b-455bea76694f", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "INFO", "module": "multikernelmanager", "line_no": 201} ``` An obvious improvement here would to be able to split the `msg` fields apart so that we can log things like response status code, request method, request URL, response_time_ms etc. That should be possible with the `JSONLogWebFormatter` from `json-logging` but when I tried using that I was getting errors from the library about a correlation id (which is based on a request header we don't use). The `json-logging` library supports several web frameworks like Flask but unfortunately does not have built in support for Tornado, but it does support custom formatters so that might be a follow up option to improve on this. [1] https://github.com/ipython/traitlets/blob/4.3.3/traitlets/config/application.py#L195 Closes jupyter#5798
This is the error I get when trying to use
And it looks like that's because I'm not going through one of the https://github.com/bobbui/json-logging-python/blob/1.2.8/json_logging/__init__.py#L132 |
This gets us closer to logging parts of the request as separate json fields but not all the way there. For example: ```json {"written_at": "2020-10-07T21:52:16.975Z", "written_ts": 1602107536975497000, "msg": "{'status': 404, 'method': 'GET', 'ip': '127.0.0.1', 'uri': '/nbextensions/widgets/notebook/js/extension.js', 'request_time': 1.9807815551757812, 'referer': 'http://localhost:8888/notebooks/test.ipynb'}", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 54} ```
Getting a little further by changing the {"written_at": "2020-10-07T21:52:16.975Z", "written_ts": 1602107536975497000, "msg": "{'status': 404, 'method': 'GET', 'ip': '127.0.0.1', 'uri': '/nbextensions/widgets/notebook/js/extension.js', 'request_time': 1.9807815551757812, 'referer': 'http://localhost:8888/notebooks/test.ipynb'}", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 54} But the |
notebook/log.py
Outdated
@@ -46,5 +50,11 @@ def log_request(handler): | |||
if status >= 500 and status != 502: | |||
# log all headers if it caused an error | |||
log_method(json.dumps(dict(request.headers), indent=2)) | |||
log_method(msg.format(**ns)) | |||
if _enable_json_logs: | |||
# FIXME: this still logs the msg as a serialized json string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could use a partial to pass log
in here:
notebook/notebook/notebookapp.py
Line 269 in b617d91
log_function=log_request, |
Because that'd be the NotebookApp
logger which is using our json log formatter.
notebook/log.py
Outdated
@@ -46,5 +50,11 @@ def log_request(handler): | |||
if status >= 500 and status != 502: | |||
# log all headers if it caused an error | |||
log_method(json.dumps(dict(request.headers), indent=2)) | |||
log_method(msg.format(**ns)) | |||
if _enable_json_logs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because this is true doesn't mean that json-logging is installed. If it's not, the request log line ends up looking like this:
[W 17:04:57.097 NotebookApp] {'status': 404, 'method': 'GET', 'ip': '127.0.0.1', 'uri': '/nbextensions/widgets/notebook/js/extension.js', 'request_time': 6.743431091308594, 'referer': 'http://localhost:8888/notebooks/Untitled.ipynb?kernel_name=python3'}
Also note that with the json logging we're not formatting the request time to 2 decimal places:
'request_time': 6.743431091308594
Thanks Matt. This is an interesting conundrum on a couple counts.
I think these can be resolved in a more application-friendly manner by first introducing a boolean configurable Traitlets also provide the ability to validate a config option. This is where we could see if the We could let the default value for the I apologize for these details, but because there were so many unknowns I had, it was quicker just to try things out. I have some working code - but I'd rather help you navigate this if you don't mind. Otherwise, just let me know. |
@kevin-bates thanks for those detailed pointers.
Yup, no problem and I definitely appreciate the patience with helping me learn here (I don't really have experience with tornado or traitlets, mostly copy/tweak/test when I need to make changes). |
@kevin-bates I worked on your suggestions a bit this morning and captured the diff here since it's not working. A few notes:
I think that's useful for someone looking at logging options to have those grouped together for context.
Thanks again for the help. |
Thanks Matt. The "tickle" should actually change the format. In this case, I removed the color-coding attributes. Here's the code I had put together the other day. Since you've changed the traitlet's name, that change would need to be reflected, as well as perhaps logging that the import failed during the validation. Lastly, the convention for envs is that they be prefixed with I found no changes necessary other than in notebookapp.py, but would prefer that you confirm the same. json_logging = Bool(config=True)
@default('json_logging')
def _default_json_logging(self):
return os.getenv('JUPYTER_ENABLE_JSON_LOGGING', 'false').lower() == 'true'
@validate('json_logging')
def _validate_json_logging(self, proposal):
value = proposal['value']
if value is True:
try:
import json_logging
self._log_formatter_cls = json_logging.JSONLogFormatter
self.log_format = u"[%(levelname)1.1s %(asctime)s.%(msecs).03d %(name)s] %(message)s"
except ImportError:
value = False
return value My concern with the change to |
Got it. I'm not totally sure about that solution since if I set
Done.
I see a bug in my code, I wasn't using
I renamed the config option to simply I'll tinker with this locally and see what I come up with. I'm also going to see what happens if we call https://github.com/bobbui/json-logging-python/blob/1.2.8/json_logging/__init__.py#L153 |
This adds the `NotebookApp.log_json` CLI option which defaults to False. The default value comes from the `JUPYTER_ENABLE_JSON_LOGGING` environment variable. A validation hook is added such that if json logging is enabled but the `json-logging` package is not installed, an error is logged and the effective `log_json` value is False. If enabled and the `json-logging` package is installed, the `init_logging` method will initialize the non-web [1] json logging formatter. [1] https://github.com/bobbui/json-logging-python#21-non-web-application-log
This works and avoids the need to change Still need to figure out how to get
and if you do it logs the request parameters as a serialized json string:
|
In order to use the same logger from the notebook app we need to pass it through to the log_request function. To do that we need to use a partial function so the `log` kwarg is available. If not provided the default logger is the same as before, the tornado web access logger. This is necessary for the `log_json` settings to be used for both the notebook app logs and the web app request logging. There is still a todo needed here so that the individual request fields get formatted separately rather than dumped into the `msg` field.
I'd like to add tests for this but I'm not sure if there is a great way to test it. Maybe something like start the notebook app server with the env var set and verify the logged message can be serialized and then deserialized JSON? Though that would also mean having |
This updates the log_request method to get the request properties logged separately if `log_json=True`. This ends up looking like: ``` {"written_at": "2020-10-09T20:59:52.044Z", "written_ts": 1602277192044732000, "msg": "", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 50, "status": 404, "method": "GET", "ip": "127.0.0.1", "uri": "/api/kernels/a6dd447b-282a-4e1c-ab65-eb340ad12965/channels?session_id=472907c27ed94e2cbfbae86501f7d159", "request_time": 8.63, "referer": "None"} ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Matt. This is coming along, but I think still too spread out. I've added some comments that should help isolate some of these changes. Thanks.
notebook/notebookapp.py
Outdated
@@ -250,9 +251,11 @@ def init_settings(self, jupyter_app, kernel_manager, contents_manager, | |||
# collapse $HOME to ~ | |||
root_dir = '~' + root_dir[len(home):] | |||
|
|||
# Use the NotebookApp logger and its formatting for tornado request logging. | |||
log_function = functools.partial(log_request, log=log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not finding this to be necessary. If I restore to the original, I get the same output.
This PR:
{"written_at": "2020-10-09T21:11:30.439Z", "written_ts": 1602277890439353000, "msg": "200 GET /api/contents?type=directory&_=1602277851454 (127.0.0.1) 22.49ms", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "DEBUG", "module": "log", "line_no": 49}
with changes reverted:
{"written_at": "2020-10-09T21:19:03.578Z", "written_ts": 1602278343578833000, "msg": "200 GET /api/contents?type=directory&_=1602278343433 (127.0.0.1) 12.31ms", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "DEBUG", "module": "log", "line_no": 49}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes coming soon.
Just use the app logger since that has the default LogFormatter logging until we change it to JSON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi Matt - this is looking great. Just had the one comment regarding the debug statement that doesn't seem to get displayed.
Make sure the default value is handled properly based on environment variables. Also checks the validation code based on whether or not json_logging is imported.
Basic test just to make sure log_request handles log_json=True properly.
The intent of the test is to make sure that log_json defaults to True when JUPYTER_ENABLE_JSON_LOGGING is set to "true" so this makes the test more explicit and to pass when json-logging is not installed so that does not interfere due to validation.
Hi Matt - would you mind merging with the latest master and pushing that commit? Your changes with legitimate test failures coincided with a general build issue and I'd like to have this PR be 'green' prior to its merge into master. Sorry for the inconvenience. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - thank you Matt.
Thanks for all of your help Kevin. |
@kevin-bates quick question, do you have any idea of a timeline for the next planned release? Looking at recent releases it looks like you've been doing monthly releases which is cool. My team would like to avoid pulling this change in via a git hash so if there is a release planned on the horizon we'll wait, just wondering. |
Would getting a release out in the next couple of weeks be sufficient - say last week of October/first week of November? Although convenient, I'd rather not set a fixed cadence unless there are other maintainers that can assist. |
Yup, that would be great. |
This adds a new extra package dependency on
json-logging
and an environment variable, which when set to "true" regardless
of case, will try to use the json-logging non-web app log formatter.
A
NotebookApp.log_json
config option is also added which getsits default value from the
JUPYTER_ENABLE_JSON_LOGGING
environment variable.
If the
json-logging
package is not installed butlog_json
is True,something like this will be logged but it will not crash the application:
With this working you get logs like this:
Web app requests get logged with separate properties like this:
The nice thing about that is you can filter request logs on certain error
status codes or request times that take longer than some threshold to
setup alerts in your logging infrastructure (ELK, LogDNA, etc).
Closes #5798