Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate highlighter and entities generator to Python 3 #227

Merged
merged 3 commits into from
May 28, 2020

Conversation

sideshowbarker
Copy link
Member

Fixes #220

@sideshowbarker sideshowbarker force-pushed the sideshowbarker/python3-updates branch from f6b7e47 to 40181a7 Compare May 21, 2020 09:24
@sideshowbarker
Copy link
Member Author

CI is failing, I think because of this part:

https://travis-ci.org/github/whatwg/html-build/builds/689576444#L2199

fatal: not a git repository (or any of the parent directories): .git

I think that’s because the wattsi build script expects to be run from within a clone of the wattsi repo, and the wattsi directory the CI is building wattsi from is apparently not a git clone?

@domenic
Copy link
Member

domenic commented May 21, 2020

I don't think that's it. That's just trying to figure out the Wattsi version number, and always happens on Docker or on CI (because if you put a .git directory into a Docker container bad things happen). You can see the same error in the previous successful build at https://travis-ci.org/github/whatwg/html-build/builds/686853995#L2250.

Since it's failing between the Highlighting and saving index-html and the Splitting... steps, I'm suspecting it's something about the highlighter still, but we'll need a bit more work to debug. I can try to do so (since I have Docker all set up and everything).

@domenic
Copy link
Member

domenic commented May 21, 2020

I think this is working now. The missing step was to pip install the highlighter package, since it now gets is dependencies (like widlparser) from pip, instead of vendoring them.

I also had to rejigger some of the base image stuff since debian:stable doesn't let you apt-get Python 3.8 out of the box.

I sent tabatkins/highlighter#18 which should land first. Then, we might want to consider pip installing the highlighter itself (instead of git cloning it and pip installing the local folder), if @tabatkins does a publish.

@tabatkins
Copy link

Latest code has been published 👍

@domenic
Copy link
Member

domenic commented May 21, 2020

@tabatkins dumb question, but if I do pip install bs-highlighter, how do I invoke the resulting server.py from the command line?

@tabatkins
Copy link

Not a dumb question: you can't. I'll need to add a command for that. ^_^

@domenic
Copy link
Member

domenic commented May 22, 2020

(FYI, Travis is failing because my Python 3.8 stuff is conflicting with the JDK stuff. I'll fix it!)

Copy link
Member Author

@sideshowbarker sideshowbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes thus far LGTM

@tabatkins
Copy link

All right, just pushed a (breaking) change to pypi. The command for invoking the highlighter manually is now bs-highlighter (same as the package name now); the command to invoke the server is bs-highlighter-server.

This also updates the install process for the highlighter to pull from pip.
@domenic domenic force-pushed the sideshowbarker/python3-updates branch from 29a918a to eea5915 Compare May 27, 2020 17:34
@domenic
Copy link
Member

domenic commented May 27, 2020

Alright, this seems to work. I think I tested all of the following configurations, although another round of testing would be appreciated:

  • ./build.sh with Wattsi only, pip3 only, Wattsi+pip3, Wattsi+pip3+bs-highlighter installed
  • ./build.sh --docker
  • ./html-build/ci-deploy/outside-container.sh (this is what is tested on CI so probably doesn't need a double-check)

I decided to go back to python 3.7, i.e. the Python that is default-available on Debian's apt-get repository. Raising the bar to 3.8 adds pain for both our CI and potentially our users, for no gain at the present. If we find features in 3.8 that we want, we can revisit this decision.

@domenic domenic force-pushed the sideshowbarker/python3-updates branch from eea5915 to 918c88d Compare May 27, 2020 17:57
build.sh Outdated
if [[ "$LOCAL_WATTSI" == "true" ]]; then
HIGHLIGHT_SERVER_ARGS=()
$QUIET && HIGHLIGHT_SERVER_ARGS+=( --quiet )
bs-highlighter-server "${HIGHLIGHT_SERVER_ARGS[@]}" &
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bs-highlighter-server "${HIGHLIGHT_SERVER_ARGS[@]}" &
bs-highlighter-server ${HIGHLIGHT_SERVER_ARGS[@]+"${HIGHLIGHT_SERVER_ARGS[@]}"} &

For non-quiet builds, without using the ${HIGHLIGHT_SERVER_ARGS[@]+"${HIGHLIGHT_SERVER_ARGS[@]}" expression, the build fails in my environment (and I would think in anybody else’s too…) — because we’re using set -o nounset (aka set -u), and so we get the following error :

./build.sh: line 665: HIGHLIGHT_SERVER_ARGS[@]: unbound variable

${HIGHLIGHT_SERVER_ARGS[@]+"${HIGHLIGHT_SERVER_ARGS[@]}"} means “Expand the HIGHLIGHT_SERVER_ARGS[@] variable only if it’s not unset”.

"${HIGHLIGHT_SERVER_ARGS[@]}" by itself means “Unconditionally try to expand the HIGHLIGHT_SERVER_ARGS[@] variable, even if it’s unset.

If we weren’t using set -o nounset, then bs-highlighter-server "${HIGHLIGHT_SERVER_ARGS[@]}" would work fine too. But because we are using set -o nounset, then if HIGHLIGHT_SERVER_ARGS[@] is unset, it fails.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting. It did not fail on my environment (both Windows Git Bash and Debian 10 via Windows Subsystem for Linux). I guess the idea is that empty arrays count as "unset"?

Why do we not need to do a similar thing for UNZIP_ARGS on line 638?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's because this was fixed in recent (2016+) Bash versions? https://stackoverflow.com/a/39687362/3191

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's because this was fixed in recent (2016+) Bash versions? https://stackoverflow.com/a/39687362/3191

Yeah, probably so. I know the bash that ships in macOS is quite old.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting. It did not fail on my environment (both Windows Git Bash and Debian 10 via Windows Subsystem for Linux). I guess the idea is that empty arrays count as "unset"?

Yes, to sh and bash they do at least.

Why do we not need to do a similar thing for UNZIP_ARGS on line 638?

I think we do need to do a similar thing with that too. I guess the only reason none of us has run into it causing a failure is that none of us has run the build script with the verbose switch since whenever it was that set -o nounset was added to the script

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I'll push a separate commit that adds that, although I'll want to rearrange and squash it before merging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, actually I ended up force-pushing, because I realized I had messed up the commit message for the third commit anyway. PTAL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we not need to do a similar thing for UNZIP_ARGS on line 638?

I think we do need to do a similar thing with that too.

Actually, I think I misspoke about that. Line 639 is what matters, because that’s where UNZIP_ARGS is actually referenced. And when it’s referenced there, it’s always going to have been assigned/set — because line 638 adds "$HTML_TEMP/wattsi-output.zip" -d "$2" to its value unconditionally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, that makes sense! Force-pushed to undo that line (and remove the discussion of it from the commit message).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, that makes sense! Force-pushed to undo that line (and remove the discussion of it from the commit message).

ah, OK — re-pulled now and will review and test it again to be sure

@domenic domenic force-pushed the sideshowbarker/python3-updates branch from 918c88d to 42f42d7 Compare May 28, 2020 14:03
In particular, this only attempts to install and start the server if local Wattsi is in use. As part of this, we no longer check for the existence and version of Wattsi each time before we invoke it, but instead only once.
@domenic domenic force-pushed the sideshowbarker/python3-updates branch from 42f42d7 to 89852f5 Compare May 28, 2020 14:26
Copy link
Member Author

@sideshowbarker sideshowbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed and tested — LGTM

Copy link
Member Author

@sideshowbarker sideshowbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK yep, reviewed again and re-tested and still LGTM

@domenic domenic merged commit df0c72b into master May 28, 2020
@domenic domenic deleted the sideshowbarker/python3-updates branch May 28, 2020 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Migrate to Python 3
3 participants