IPYNB parsing: auto-register scrapbook scraps somehow #4
This may well be possible with executablebooks/MyST-Parser#47
Having looked at scrapbook, I don't think it does quite what we want. But actually, using the same kind of mechanics, it's super easy to achieve this 'content recording'. Basically paste this code into a notebook, execute and save:

```python
def get_mimetypes(obj):
    if hasattr(obj, "_repr_mimebundle_"):
        return obj._repr_mimebundle_()
    mimebundle = {}
    for mimetype, method in (
        ("text/plain", "__str__"),
        ("text/html", "_repr_html_"),
        ("application/json", "_repr_json_"),
        ("image/jpeg", "_repr_jpeg_"),
        ("image/png", "_repr_png_"),
        ("image/svg+xml", "_repr_svg_"),
        ("text/latex", "_repr_latex_"),
    ):
        if hasattr(obj, method):
            mime_content = getattr(obj, method)()
            if mime_content is not None:
                mimebundle[mimetype] = mime_content
    return mimebundle


def record_outputs(obj, key, metadata=None):
    from IPython.display import display

    mimebundle = get_mimetypes(obj)
    if not mimebundle:
        raise ValueError("No mimebundle available")
    metadata = metadata or {}
    metadata.update({"record_key": key})
    display(
        {"recorded/" + k: v for k, v in mimebundle.items()},
        raw=True,
        metadata=metadata,
    )


a = "abc"
record_outputs(a, "mytext")

import pandas as pd

record_outputs(pd.DataFrame([1, 2, 3]), "mytable")
```

If you look at the notebook JSON, you'll see the outputs have all been saved, along with the keys in the metadata:

```json
{
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 14,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "recorded/text/plain": "abc"
                    },
                    "metadata": {
                        "record_key": "mytext"
                    },
                    "output_type": "display_data"
                },
                {
                    "data": {
                        "recorded/text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>0</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>3</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
                        "recorded/text/plain": "   0\n0  1\n1  2\n2  3"
                    },
                    "metadata": {
                        "record_key": "mytable"
                    },
                    "output_type": "display_data"
                }
            ]
        }
    ]
}
```

From this it's then super easy to query for record keys, given a reference.
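To make that querying step concrete, here's a minimal sketch using only the notebook's JSON structure. The helper name `find_recorded` and its shape are illustrative, not part of any package:

```python
def find_recorded(nb, key, prefix="recorded/"):
    """Search a notebook dict for display outputs recorded under `key`.

    Returns the mimebundle with the "recorded/" prefix stripped,
    or None if no output carries that record_key.
    (Hypothetical helper, matching the record_outputs sketch above.)
    """
    for cell in nb.get("cells", []):
        for output in cell.get("outputs", []):
            if output.get("metadata", {}).get("record_key") == key:
                return {
                    mime[len(prefix):]: content
                    for mime, content in output.get("data", {}).items()
                    if mime.startswith(prefix)
                }
    return None


# A minimal notebook dict mirroring the JSON above
nb = {
    "cells": [
        {
            "cell_type": "code",
            "outputs": [
                {
                    "data": {"recorded/text/plain": "abc"},
                    "metadata": {"record_key": "mytext"},
                    "output_type": "display_data",
                }
            ],
        }
    ]
}

print(find_recorded(nb, "mytext"))  # {'text/plain': 'abc'}
```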
@choldgraf, following our discussion, I looked at the scrapbook code, and I see how they do it now. Combined with my example above, you can literally condense it to a couple of lines of code:

```python
def glue(obj, name, display=False):
    import IPython
    from IPython.display import display as ipy_display

    mimebundle, metadata = IPython.core.formatters.format_display_data(obj)
    mime_prefix = "" if display else "application/papermill.record/"
    metadata["scrapbook"] = dict(name=name, mime_prefix=mime_prefix)
    ipy_display(
        {mime_prefix + k: v for k, v in mimebundle.items()},
        raw=True,
        metadata=metadata,
    )


a = "abc"
glue(a, "mytext")

import pandas as pd

glue(pd.DataFrame([1, 2, 3]), "mytable", display=True)
```

It just seems a lot of overhead to depend on a separate package for such a simple function.
@chrisjsewell yea, that makes sense to me. So we only want a tiny fraction of what scrapbook offers (aka, "only try to store the display information of an object, or store the raw object values if it's a text object"). I'm less concerned with which technical stack we use, and more concerned with what kinds of behavior we'd ask users to adopt in order to use our stack. The main reason I was suggesting scrapbook is that it's the only pre-existing API I know of for "store some information inside the notebook to be re-used later", but perhaps we can piggy-back off that pattern while using a much-simplified stack?
Yeh, as I see it the primary functionality is: "store the output mime-bundles of any object with a unique key identifier, in a consistent format for later querying" (text objects here are treated the same as any other object), and the potential secondary functionality is: "store this data without displaying it in the frontend notebook" (this is where you need to add a prefix to the mime type). You can obviously do that with scrapbook, but then I fear you make the whole of your stack dependent on scrapbook; e.g. you would need to read in all notebooks with the scrapbook reader, rather than the standard nbformat one.
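To make that point concrete, here's a sketch of reading glued scraps back using nothing but the plain notebook JSON structure, i.e. no scrapbook reader required. The function name `collect_scraps` is illustrative, and the metadata layout follows the `glue()` sketch above:

```python
def collect_scraps(nb):
    """Collect every glued scrap in a notebook dict into {name: mimebundle}.

    Relies only on the standard notebook JSON structure: any display
    output carrying "scrapbook" metadata is a scrap, and its mime_prefix
    is stripped off the data keys. (Hypothetical helper, not scrapbook's API.)
    """
    scraps = {}
    for cell in nb.get("cells", []):
        for output in cell.get("outputs", []):
            meta = output.get("metadata", {}).get("scrapbook")
            if not meta:
                continue
            mime_prefix = meta.get("mime_prefix", "")
            scraps[meta["name"]] = {
                mime[len(mime_prefix):]: content
                for mime, content in output.get("data", {}).items()
            }
    return scraps


# A minimal notebook dict, as glue("mytext") above would produce
nb = {
    "cells": [
        {
            "outputs": [
                {
                    "data": {"application/papermill.record/text/plain": "'abc'"},
                    "metadata": {
                        "scrapbook": {
                            "name": "mytext",
                            "mime_prefix": "application/papermill.record/",
                        }
                    },
                    "output_type": "display_data",
                }
            ]
        }
    ]
}

print(collect_scraps(nb))  # {'mytext': {'text/plain': "'abc'"}}
```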
@chrisjsewell I've been thinking about this a bit more - do you imagine that this should be in …?
You could do something like process the stored outputs for these scraps at commit time, then store them in the DB commit record for fast lookup. However, a complication is that the cache can store multiple versions of a notebook, which would lead to key clashes, and the cached notebooks don't necessarily relate to the notebooks being used in the Sphinx build. Therefore, I think this may be better handled in the MyST-NB parser.
so do you imagine something like:

and it'd store the mimebundle outputs in the notebook metadata (maybe …). For a simplest implementation, we could do something like:

Over time, this could be modified to use a smarter caching mechanism than a dictionary, but I'm just imagining a quick working prototype. WDYT?
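The "simplest implementation" described here, a dictionary built up as notebooks are parsed, might be sketched like this. The class name and methods are hypothetical, not MyST-NB's actual implementation; it raises on duplicate keys, since that's exactly the clash complication discussed above:

```python
class ScrapRegistry:
    """Simplest-possible cache: a dict mapping scrap key -> mimebundle.

    Hypothetical prototype sketch. Each notebook registers its glued
    outputs as it is parsed; a repeated key raises, flagging the
    cross-notebook clash problem mentioned earlier in the thread.
    """

    def __init__(self):
        self._scraps = {}  # key -> (docname, mimebundle)

    def add(self, docname, key, mimebundle):
        if key in self._scraps:
            raise KeyError(
                f"scrap key {key!r} already registered by "
                f"{self._scraps[key][0]!r}"
            )
        self._scraps[key] = (docname, mimebundle)

    def get(self, key):
        docname, mimebundle = self._scraps[key]
        return mimebundle


registry = ScrapRegistry()
registry.add("notebook-a", "mytext", {"text/plain": "abc"})
print(registry.get("mytext"))  # {'text/plain': 'abc'}
```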
It would probably just store it in the outputs, but with a prefix on the mimetype, so that it's ignored by general renderers (as scrapbook does).
Ah yeah, and then as we loop through the cells, check the outputs for a mimetype meant for caching and then store it?
But yeh, that's the general idea.
committed on behalf of @choldgraf

This adds prototype functionality for "glue and paste" with MyST-NB. It closes #4.

You glue things into a notebook's metadata like this:

```python
from myst_nb import glue

glue("your-key", an_object)
```

And it will run IPython's display on the object, then store the mimebundle at the key you specify. When the notebooks are parsed with `MyST-NB`, it builds up a registry of all the keys across all notebooks, so that you can then refer to them in the following ways.

You can paste it in markdown with a directive like this:

````
```{paste} your-key
```
````

Or you can add it in-line like this:

```
{paste}`your-key`
```

optionally:

```
{paste}`your-key:format-string`
```

See the documentation for more details.
An idea I just had when thinking about how we can reference content from one notebook to another: what if, as part of the Sphinx parser, we automatically generated docutils targets from any scraps that are in a notebook during the parsing process? That way, references to those targets would already exist, and users could reference them with a role, similar to what we do with `:ref:`. I'm imagining something like:

**In a notebook**

Somebody writes an analysis that generates a plot they'd like to include elsewhere in their docs. They run `scrapbook.glue('myplot')`.

**In a MyST document**

Somebody wants to include that "scrap" from the notebook. They only need to do something like:

and Sphinx just looks up `myplot` against a list of scraps that it has found across any parsed notebook.
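The auto-target idea above can be sketched in plain Python. This `ScrapTargets` class is entirely hypothetical (the real thing would live in the Sphinx environment and emit docutils targets), but it shows the lookup a `{paste}`-style role would perform:

```python
class ScrapTargets:
    """Hypothetical sketch of auto-generated scrap targets.

    As each notebook is parsed, every glued scrap is registered under
    its name, so that a role like {paste}`myplot` can later resolve it
    without the user declaring any explicit target.
    """

    def __init__(self):
        self._targets = {}  # scrap name -> (docname, mimebundle)

    def register_notebook(self, docname, nb):
        # Scan the notebook's outputs for scrapbook-style metadata
        for cell in nb.get("cells", []):
            for output in cell.get("outputs", []):
                meta = output.get("metadata", {}).get("scrapbook")
                if meta:
                    self._targets[meta["name"]] = (docname, output.get("data", {}))

    def resolve(self, name):
        # What a paste role would call to look up a scrap by name
        if name not in self._targets:
            raise KeyError(f"unknown scrap: {name!r}")
        return self._targets[name]


# A minimal parsed notebook containing one glued plot
nb = {
    "cells": [
        {
            "outputs": [
                {
                    "data": {"image/png": "<base64 bytes>"},
                    "metadata": {"scrapbook": {"name": "myplot"}},
                    "output_type": "display_data",
                }
            ]
        }
    ]
}

targets = ScrapTargets()
targets.register_notebook("analysis/plots", nb)
print(targets.resolve("myplot")[0])  # analysis/plots
```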