WIP: Proposed improvement/alternative to jupyter-sphinx
#47
The only addition is that the SphinxRenderer handles cross references with the `pending_xref` node.
This is really cool! After I understand it better, I think it would be great if this could go to jupyter-sphinx, and fix a bunch of issues there!
@chrisjsewell a few questions:
I'll take a look at this as well, it sounds like quite an interesting proposal! At a first glance, my main concern is with maintainability and using an SQLite database. It may be the best technical solution to this, but we should weigh that against the likelihood that potential maintainers will have experience with SQLite. I can get a better feel for this after looking through the implementation code. To me the question is always "will contributing to this project be accessible to somebody with only volunteer time, and a PhD's worth of Python experience in a scientific field?"
Well I think sqlalchemy makes it relatively easy to work with SQL databases, and hides a lot of the complexity. I only learnt it in January and didn't find it too traumatising. But have a look at the code and see what you think.
The idea of this is that there is no inherent constraint that the outputs of a code cell must have one defined visualisation. Along the lines of the classic Model-View pattern for data visualisation, ideally the data and the views would be split. Granted that for simple use cases, the user may want to reduce the syntax, and only use the single
Thanks for the explanation. I agree that storing all outputs, and referring to them from the How would the user control rendering of a single output? By including a separate I would like to suggest sticking to the constraint that the cell outputs only occur after the cell. This choice was a result of a lengthy discussion when we were working on jupyter-sphinx, and the main motivation was robustness. When working on documentation, it is usually easy to keep in your memory what happens within a single kernel, and manipulate its outputs explicitly. Allowing outputs to be included elsewhere may lead to subtle errors because the effects of the changes become much more nonlocal. Similarly, I propose to make the inline role also execute a small cell, rather than refer to some code. I expect that the inline executable code would mainly yield text or latex output, and would be relatively short. This approach (inline code rather than inline references) is also used by Rmd.
Also I'd like to provide more background about the jupyter-sphinx design: a principle I found really useful was to critically examine the amount of configurability we allow. For example, I see that whether errors are tolerated is controlled on a per-kernel basis in your implementation, whereas jupyter-sphinx only has a global setting or a per-cell setting. This is a result of considering usage scenarios outlined in jupyter/jupyter-sphinx#73. Basically, since adding features is much easier than removing them, we opted for implementing a minimal set that we consider definitely useful, and to wait for requests before implementing anything else.
# Document Elements
app.add_directive("jupyter-kernel", JupyterKernel)
app.add_directive("jupyter-exec", JupyterExec)
Another question: is there a reason why you use `jupyter-exec`, and not `jupyter-execute`? I personally don't have a preference either way, but if possible I'd like to avoid changing conventions. Or is there another package that introduces a `jupyter-exec` directive?
No, it was absolutely an arbitrary choice, and can be changed to `jupyter-execute` 👍
Thanks 👍, I will certainly take this into consideration. Actually the only reason I added the
Yeh that's the idea:

.. jupyter-exec::
    :timeout: 20
    :show-code:

    print("x")
    a = 1
    a

.. jupyter-view::

Although as I mentioned before, for simple use cases this may not be ideal; to have to put a
jupyter-sphinx now defines some basic options like
Well you still at least need to make sure that the code/process is structured in such a way that it can be extended like this. Also, to my mind there are already some potential needs for it: adding things like captions and labels to outputs, allowing for complementary latex/PDF output, etc.
Ah, but I have nothing against the view node or the code structure. Rather I think the view directive may be omitted at first, since it faces the users and complicates the interface. Instead, I imagine the view node would be inserted during the execution.
Also: what are output captions and labels? Is that similar to latex figure captions and labels? If that's the case, wouldn't it be better specified outside of the jupyter-specific directives?
Yeh no problem, I wouldn't be completely against that lol
Well it would be like the You mean wrapping the
I think a round-trip conversion would be nearly impossible to achieve regardless of that, due to the notebook format limitations. Both v2 and v2.1 component listings in executablebooks/meta#21 don't include round-tripping. See also discussions in #27. Come to think of it, even following this PR, a text representation with more than one kernel cannot round-trip to a notebook.
Well it should be round-trippable for a basic subset of the text format. Yes, you won't be able to do a 1:1 mapping for multiple kernels, but then generally if you're just using them for separate sections you can just split those sections into multiple documents (it would be nice in Sphinx if multiple source documents could end up as a single HTML page). For single-kernel pages particularly, it might be nice (and facilitate this round trip) if you could specify the kernel in the front matter. In RST it would look like this:

:kernel_name: name

In myst:

---
kernel_name: name
---

This would be available (after parsing the whole document) in
Sure, but why would figure captions and labels belong to this basic subset? My concern here is that this way a directive for code execution also takes responsibility for something unrelated to execution, which is also provided by another directive. EDIT: kernel in the frontmatter sounds like a great idea.
Well surely this was my whole point with the This is not already provided by another directive; the
@chrisjsewell great point. Since the caption itself may include arbitrary rst content, it also seems hard to implement a wrapper that would create figures. Continuing with the usage of Also: how would this interact with thebelab? Since its users aren't the code devs, they will be confused if the input code is separated in the document from the result it produces (also thebelab only inserts the output right after the input). Should the code appear in the figure body after thebelab is activated? What should happen to the original code? The same questions apply to the case when there are multiple references to the execute call, or when the execute and view are separated in the document. Currently in jupyter-sphinx the code only moves when thebelab is activated if it was initially rendered below the output, and that's not too far. Further questions: do we expect that the users will need to tweak the rendering of the mime bundle on a case-by-case basis, or would a global configuration suffice?
mime: OrmMimeBundle = session.query(OrmMimeBundle).filter_by(pk=pk).one()
source = mime.source
if mime.mimetype == "html":
    source = dedent(source)
Is this always a safe operation?
What do you mean safe? It’s not actually executing the source?
Sorry, that's a noob question: is dedenting html guaranteed to not change its looks?
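For what it's worth, a small sketch of the concern, assuming `dedent` in the diff is `textwrap.dedent` (the HTML string below is made up for illustration). In most HTML, leading whitespace collapses and dedenting is harmless, but inside whitespace-sensitive elements such as `<pre>` the common indentation that gets stripped is part of what is displayed:

```python
from textwrap import dedent

# Hypothetical HTML output whose <pre> body shares the 4-space margin
# of the surrounding markup.
html = (
    "    <div>\n"
    "      <pre>\n"
    "    x = 1\n"
    "        y = 2\n"
    "    </pre>\n"
    "    </div>"
)

dedented = dedent(html)

# The common 4-space margin is removed from every line, including the
# lines inside <pre>, so the preformatted text loses its leading spaces
# (relative indentation between lines is preserved).
print(dedented)
```

So dedenting is probably fine for typical rich outputs, but it does not look guaranteed to be display-neutral for every HTML payload.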
What's the status of this PR? Is it superseded by the existence of
Eventually it will be, but I'll leave it open for now since it has some different/additional code than is currently in
Now being implemented in
@choldgraf @akhmerov following discussion in #42 I've created a proposal here for how I would foresee the execution working.
To recap, the Sphinx build phases and Sphinx core events can be summarised as:
`jupyter_sphinx.main` contains the extension setup, and utilises this approach:

In (2) Setup a `JupyterDB`, that is stored within the doctrees build folder. This is imported from a separate package `jupyter_db` whose sole purpose is to maintain a database of all the kernels, code cells, and their outputs. It doesn't know anything about sphinx, hence having it as a separate package. I have written this as an SQL database (with sqlalchemy to provide an OO layer on top), since I think that's best for fast look-up and asynchronous access. But you could essentially write this in any way you see fit, as long as it has the same interface (i.e. class methods).

During (3) any documents (and their related cells, kernels, outputs) that have been changed/removed are removed from `JupyterDB`.

During (7) directives save kernels and code cells to the `JupyterDB` and the `JupyterDB` returns primary keys (pk) to those elements, which are saved on the `docutils` elements for later lookup.

During (10) the `JupyterDB` is passed to an entry-point defined function: `run_execution`. This function's job is to act 'in-place' on `JupyterDB` to populate it with outputs. Again it doesn't know anything about sphinx, so could be external to `jupyter-sphinx`. To decide which kernels need re-running, it can check which kernels / cells no longer have outputs, or could even check the outputs 'last modified time'.

During (12) The primary keys are used to access the outputs, and insert them where necessary.
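
The flow above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: it uses the standard-library `sqlite3` module instead of sqlalchemy, the class `MiniJupyterDB`, its table layout, and its method names are all hypothetical, and `run_execution` here is a fake executor that just echoes the cell source rather than running a kernel.

```python
import sqlite3


class MiniJupyterDB:
    """Hypothetical stand-in for the JupyterDB interface (not the real one)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # Needed so that ON DELETE CASCADE is enforced by SQLite.
        self.conn.execute("PRAGMA foreign_keys = ON")
        self.conn.executescript(
            """
            CREATE TABLE kernel (pk INTEGER PRIMARY KEY, docname TEXT, name TEXT);
            CREATE TABLE cell (
                pk INTEGER PRIMARY KEY,
                kernel_pk INTEGER REFERENCES kernel(pk) ON DELETE CASCADE,
                source TEXT);
            CREATE TABLE output (
                pk INTEGER PRIMARY KEY,
                cell_pk INTEGER REFERENCES cell(pk) ON DELETE CASCADE,
                mimetype TEXT, data TEXT);
            """
        )

    def add_kernel(self, docname, name):
        # Phase (7): a directive registers a kernel and gets its pk back.
        cur = self.conn.execute(
            "INSERT INTO kernel (docname, name) VALUES (?, ?)", (docname, name))
        return cur.lastrowid

    def add_cell(self, kernel_pk, source):
        # Phase (7): a directive registers a code cell and gets its pk back.
        cur = self.conn.execute(
            "INSERT INTO cell (kernel_pk, source) VALUES (?, ?)", (kernel_pk, source))
        return cur.lastrowid

    def remove_document(self, docname):
        # Phase (3): drop a changed/removed document; cells and outputs cascade.
        self.conn.execute("DELETE FROM kernel WHERE docname = ?", (docname,))

    def cells_without_outputs(self):
        return self.conn.execute(
            "SELECT pk, source FROM cell "
            "WHERE pk NOT IN (SELECT cell_pk FROM output) ORDER BY pk"
        ).fetchall()

    def add_output(self, cell_pk, mimetype, data):
        self.conn.execute(
            "INSERT INTO output (cell_pk, mimetype, data) VALUES (?, ?, ?)",
            (cell_pk, mimetype, data))

    def get_outputs(self, cell_pk):
        # Phase (12): look outputs up by the pk stored on the docutils node.
        return self.conn.execute(
            "SELECT mimetype, data FROM output WHERE cell_pk = ? ORDER BY pk",
            (cell_pk,)).fetchall()


def run_execution(db):
    """Phase (10): act in place on the database (fake executor, no kernel)."""
    for cell_pk, source in db.cells_without_outputs():
        db.add_output(cell_pk, "text/plain", f"executed: {source}")


db = MiniJupyterDB()
kernel_pk = db.add_kernel("index", "python3")
cell_pk = db.add_cell(kernel_pk, "a = 1")
run_execution(db)
outputs = db.get_outputs(cell_pk)
db.remove_document("index")
remaining = db.get_outputs(cell_pk)
```

The point of the sketch is only that the database layer and the executor need no knowledge of Sphinx: the extension holds on to primary keys, and everything else goes through a small set of methods.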