-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prepare for 1.6 release #3352
Prepare for 1.6 release #3352
Conversation
OK. I think I have tracked it down. In the terminal, things look as follows:
and this is OK. The problem is that I run previous tests under VSCode. For reference, the instructions to check these things are here #3248. And the output is (under VSCode):
So the question is - how can we make sure that DataFrames.jl cleanly loads under VSCode? Any tips would be welcome. CC @davidanthoff @pfitzseb (as maybe you saw something similar previously with other packages) |
I'm not sure if there's anything to do here. We're defining a new struct IJuliaStdio{IO_t <: IO,F} <: Base.AbstractPipe
io::IOContext{IO_t}
send_callback::F
end
# ...
Base.unwrapcontext(io::IJuliaStdio) = Base.unwrapcontext(io.io) which seems entirely correct (not quite harmless though, apparently). I can look into only loading the |
Would that then cause the precompile cache for DataFrames to get invalidated whenever one switches between the VS Code Julia REPL and notebook in VS Code? Or are we already in a really bad situation there because it also gets invalidated if one switches just generally between a plain REPL and the VS Code Julia REPL? Or is this just not an issue at all? |
@davidanthoff The issue is between VSCode Julia REPL and even plain REPL in VSCode: Julia REPL in VSCode (the one integrated with the editor for code passing)
plain REPL in VSCode (started
(and do not look at absolute time as it will vary machine by machine - now I am on a very weak computer but what percentage of load time is compilation time and how much of it is recompilation) I also suspect that this issue is not only DataFrames.jl specific. Here is an example with CSV.jl (even worse as 60% of time is spent in compilation under Julia REPL) Julia REPL in VSCode (the one integrated with the editor for code passing)
plain REPL in VSCode (started
|
So on my system it is actually consistently faster to run things from the integrated REPL in VS Code... From the VS Code Julia REPL:
From a normal REPL:
I tried this a couple of times, and I always get results roughly similar to that. Does that make any sense?? These runs are on Julia 1.9.1 on Windows, with project that only has My worry was a different thing though, namely that different things would be cached depending on whether the precompile is triggered from a normal REPL or the VS Code integrated REPL. And that one would then end up overwriting the cache files every time one switches between the VS Code and standard REPL. But at least from a cursory trial I don't think that is actually happening, so that was probably a false worry. |
Yeah, Julia can't cache methods from non-dependency packages, so we don't need to worry about that. I didn't time load time, but can confirm the invalidations. |
For me, it is strange that you consistently see shorter timings in VS Code Julia REPL, as they do more allocations, allocate more, and trigger recompilation. Maybe there is some other setting (e.g. like number of threads used - if it affects anything here) that could influence this. |
I just tried again, and double checked any VS Code specific settings, but there aren't any, as far as I can tell... I also don't have a But maybe that is a side discussion. For the original specific question the only thing we could consider is a conditional exclude of the IJuliaCore stuff in VSCodeServer, I guess? But that might not be so easy, I assume that would then somehow mess with the precompile cache for |
I have run such tests on Julia 1.9.1 Windows and I see a similar difference in loading times (yet in my case number of allocations when running However, IMO, the differences in loading times between VSCode and REPL can be easily explained. The REPL inside VS code starts with having the package
and VS Code
|
@pszufe - this is interesting and surprising, as DataFrames.jl and HTTP.jl are only @KristofferC - I would expect that things in |
If I print out what gets compiled I see
So something seems to invalidate
So the difference in VSCode and REPL is that Revise is loaded in VSCode. |
No, that is not so :) Both of these are test only dependencies and will definitely not be loaded by default in the Julia VS Code REPL.
I get the times that I reported even when I completely remove Revise from any project that might be loaded, so I don't think that can explain it. |
My timings are without Revise also.
Yes, but what @pszufe suggests (and this might be the case) is that "they should not be loaded" but maybe some signatures from them get loaded although they should not (I agree this is unlikely so this was just his working hypothesis). |
Can you run the VSCode REPL with |
Stand alone Julia session:
VSCode Julia REPL (I leave out a lot of stuff before
(note that there are extra |
Probably these then
Something has invalidated the package loading code itself. Should be fixed on 1.10 at least (package loading code is shielded from invalidations there). |
OK. Thank you. So in summary. We can close this discussion in DataFrames.jl and the precompilation issue:
Thank you! (so those who are interested in this PR can unfollow it, as likely now we will discuss with @nalimilan what we should have in precompilation statements for the next release) |
@nalimilan - so now concentrating on the proposed change under Julia 1.9.2. So the proposed change: increases initial compilation by 16 seconds and load time by 0.2 seconds. What the changes are (in short). Additionally cover:
In precompilation (they are common when working with CSV and large files). The question is if we want to pay this cost. The detailed timings are: Prepared release 1.6Initial compilation time
imports time after initial compilation
total time after initial compilation
Current release 1.5Initial compilation time
imports time after initial compilation
total time after initial compilation
|
It would be useful to quantify the benefit too (in addition to the cost shown above). That is, how much faster are common operations? |
This depends on what one sees as "common". As commented, the difference will be visible if you use CSV.jl. E.g. even one DataFrames.jl 1.5
DataFrames.jl 1.6 proposal
While if you do not use CSV.jl then most likely you do not use InlineStrings.jl and PooledArrays.jl (at least a typical user) and it will be just a cost for you. So, again, in short - this is an optimization for CSV.jl users. |
I might be misunderstanding, but if the user does a second |
because you can do other operations than Second |
I haven't thought about this too deeply, but my gut feel is that the increase in compilation time is too large to warrant this. As you say this is for CSV users only so many use cases won't benefit at all, and equally one should keep in mind that in practice compilation doesn't tend to be a one time cost, as eg transitory dependencies update and force a recompilation, people use Pluto notebooks which create new and often slightly different environments, and there might still be issues with false positives when detecting stale cache files and that triggering recompilation. What are the timings on 1.10/nightly? Anecdotally compilation times have come down quite a bit so might that make the tradeoff more palatable? |
I have made some more benchmarking. My conclusion is to stay with the status quo, given the discussion. When Julia 1.10 is out we can make some additional testing and make a change if we decide we want to (it will be a patch, as it will not change anything). Here is the timing of the status quo (current state of this PR) on Julia 1.9.2 vs Julia 1.10-alpha1 (in general Julia 1.10 promises indeed faster precompilation and load times): Julia 1.9.2
Julia 1.10-alpha1
In summary - my proposal would be to merge this PR as it is now (only bumping package version) and go back to the precompilation discussion after Julia 1.10 is out. After this PR is merged I would make 1.6 release of DataFrames.jl. |
Thank you! |
@timholy - I have the following problem when preparing for 1.6 release of DataFrames.jl on Julia 1.9.1.
When I do
@time using DataFrames
from this branch in a fresh Julia session, I get the following:My question is how I can track down what causes recompilation (we discussed it some time ago, I think, but I wanted to make sure what is the current best practice). Thank you!
CC @nalimilan