-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve black startup to format a single file. Fixes #2564 #2656
Conversation
Thanks. I like the idea. I don't know if we (maintainers collective) want lazy imports all over the code base tho. I think it's worth it potentially, but for cases where latency really matters we recommended blackd. Some of your import moving has broke tests ... I did expect more work to be needed in some of those places. This is the fun and games with moving imports for speed introduces. Bringing in @ichard26 to ensure this does not break mypyc - I think it's time we add an action to build with mypyc @ichard26 - I'm happy to help here to avoid regressions there. I am interested to hear others thoughts. |
13aa647
to
1f77f13
Compare
1f77f13
to
0aad245
Compare
Ok, I've just fixed the tests and added back methods which had been removed from the public API. Also, lazy imports are added very judiciously in the PR, so, I hope this would be OK (I personally nowadays prefer lazy imports project-wide, but this really needs to be a conscious decision from the project and I understand this isn't the case for black, so, I tried to leave changes to a minimum to make it work yet still bring in substantial benefits for the use case where those modules aren't used at all). As for the tests which were failing due to the imports, I don't think that something as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this must be a positive. All of these dynamic imports makes me a bit iffy style-wise, but if the performance gains are deemed worth it, then it's fine by me 👍 Some comments mostly about the ignore handling below.
except GitWildMatchPatternError: | ||
ctx.exit(1) | ||
|
||
assert exclude_and_gitignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reasoning for this assert? If try fails then we exit, so something else? If it needs to be there, we should probably give users a proper message.
@@ -558,6 +570,17 @@ def get_sources( | |||
|
|||
sources.add(p) | |||
elif p.is_dir(): | |||
if exclude_and_gitignore is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I somewhat dislike that this is in a loop. I get that we'd like to compute the ignore only if we have a directory. That could be better expressed in the original place with any(Path(s).is_dir() for s in src)
or something like that. We could perhaps even make the conversions to Path
beforehand so that it isn't duplicated in the loop. Maybe.
|
||
try: | ||
exclude_and_gitignore = compute_exclude_and_gitignore(exclude, root) | ||
except GitWildMatchPatternError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that you've moved the error inward! But it's not all the way is it? Surely it could be wrapped around get_gitignore
directly.
gitignore = get_gitignore(root) | ||
else: | ||
gitignore = None | ||
return exclude, gitignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I appreciate the idea of making small functions, I feel this is more indirection because the decision of computing the exclude pattern is moved to the computing function. And if the decision was moved back, this would be a two-line wrapper to two simple calls. But this is highly debatable. Thoughts?
This might be better if we get rid of the exclude_and_gitignore
sentinel as I suggested in another comment, because we could unpack directly: ex, ign = compute(ex, root)
.
# Load reformat_many as lazily as possible as loading it will | ||
# require support for asyncio and concurrent.futures which | ||
# are slow to import. | ||
from black.reformat_many import reformat_many |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this, seems like a good win for essentially nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First review pass, nothing majorly wrong but I got quite a few questions.
I'll take a closer look at this tomorrow -- in particular I'll make sure this doesn't break compilation with mypyc which brings its own import time reductions :)
Thank you for working on this!
@@ -1361,9 +1264,50 @@ def patch_click() -> None: | |||
module._verify_python_env = lambda: None # type: ignore | |||
|
|||
|
|||
def shutdown(loop: "asyncio.AbstractEventLoop") -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these two utility functions used anywhere given reformat_many.py uses the ones available in concurrency.py?
return concurrency.cancel(tasks) | ||
|
||
|
||
async def schedule_formatting( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing, is this used anywhere? Right now Black doesn't have an official API (and even then I doubt multiprocessing code should be treated as public).
for arg in sys.argv: | ||
if "multiprocessing" in arg: | ||
# Don't import multiprocessing unless really needed. | ||
from multiprocessing import freeze_support | ||
|
||
freeze_support() | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there documentation that confirms this is a valid way of using freeze_support
on Windows? This feels exceedingly sketchy.
FYI I picked up your PR and pushed it to completion, hopefully you don't mind. Thanks for the PR in the first place! |
@ichard26 thank you for taking up the PR to completion (I'm sorry I didn't have more time available to continue the work to get it to completion and I really appreciate that you were able to do that). |
Make the startup of black faster when formatting a single file by importing less modules and skipping loading of
.gitignore
.With this patch the time to format one file went from a baseline of:
(min /max time from doing many runs on my machine)
0.29s -> 0.47s
to0.17s -> 0.32s
-- Note that this includes the time to format too, not just the startup time. Also, this depends a lot on the target machine (slower machines will probably have more to gain)... in my machine I'd say there's around a 50% increase in the startup speed. There's still room for improvement, but I'd say that the low-hanging fruits have been taken care of.
The number of imported modules for doing a run in this case went from
268
to202
.It went something as:
Modules imported (baseline):
268
Without asyncio:
235
Without concurrent.futures
222
Without pathspec:
214
Without multiprocessing:
202