Allow for shallow updates #319
This is possible if you use a branch for the project. However, if you use a SHA, you can't do this as easily, because not all Git servers let you fetch a SHA directly. GitHub is one of these servers. One potential workaround is to add a new project attribute that says "the SHA in the revision field is in this branch over here" (#344), but that's a little tricky because the branch could get ahead of the SHA in the revision field, e.g. (for nrf) after merging a PR in the zephyr project before updating nrf/west.yml to the new SHA. So I agree this would be nice, but it's not totally trivial. |
Yes but Also I'm getting this when using
So it would be nice to have an option to override |
This was fixed in master last week, sorry about that.
I understand and agree. You get what I'm saying about why that isn't really possible to do 100% effectively with SHA revisions, though, right? |
Yes, I understand it and wonder if we can somehow make a fallback here - if it failed with limited depth try to do a full fetch or something like that. |
I think something like a combination of specifying the upstream branch plus a fallback strategy would probably be needed, indeed. That makes the feature complex to implement (and test) and less predictable in its results (and thus less useful IMO), which is why it hasn't been a priority, but I do agree it's a valid use case. |
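For illustration only, such a fallback could look like this hypothetical helper (not west's code): attempt a shallow fetch of the pinned SHA, and fetch all branches at full depth when the server refuses, demonstrated against a throwaway local repo:

```shell
set -e
# fetch_rev: hypothetical fallback helper (NOT west's implementation).
# Try a cheap shallow fetch of an exact SHA; if the server refuses
# unadvertised objects (as GitHub-style servers may), fall back to a
# full fetch of all branches, then verify the SHA is present.
fetch_rev() {
    url=$1; rev=$2
    git fetch -q --depth=1 "$url" "$rev" 2>/dev/null ||
        git fetch -q "$url" 'refs/heads/*:refs/remotes/origin/*'
    git cat-file -e "$rev"  # fail if the revision still isn't here
}

# Local demo: a server that does NOT allow direct SHA fetches (the default).
tmp=$(mktemp -d); cd "$tmp"
git init -q server && cd server
git config user.email demo@example.com
git config user.name demo
echo one > f; git add f; git commit -qm one
REV=$(git rev-parse HEAD)       # pinned, non-tip revision
echo two > f; git commit -qam two
cd ..

git init -q client && cd client
if fetch_rev "file://$tmp/server" "$REV"; then HAVE=yes; else HAVE=no; fi
echo "pinned revision present after fallback: $HAVE"
```

The unpredictability discussed above shows up here too: whether the cheap path or the expensive path runs depends on the server, not on anything west controls.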
For a repo with many branches this would be a useful and hopefully simple optimization, even before looking at more complex shallow optimizations. #139 (comment)
BTW, which refs does
I feel like this has already been answered somewhere buried in the long #331 discussion, but I'm afraid I just fell asleep trying to understand what @marc-hb wrote there at the time. It should probably be documented? |
Just for fun if you're a git nerd (if not, why are you reading this page?), here's another case where shallow optimization fails and needs some kind of CI/validation fallback: when the clone is too shallow and is missing the merge base! thesofproject/linux@53b27a20cd5d45544c3 |
Hi, since multiple people are asking for this, I will try to prioritize it.
There was an RFC for this that I never got around to finishing. I will see about resurrecting it:
One thing I just realized is that it's pretty useless for branches that may rebase, but that's not everybody, so it may be useful.
It fetches everything, including tags. Information on getting the details is in: https://docs.zephyrproject.org/latest/guides/west/repo-tool.html#troubleshooting
That's because we use
It's currently deliberately left as an implementation detail to try to avoid boxing us into a particular strategy. |
Doh! How did I miss #344? Because it was closed? Thanks, it also happens to be much easier to read than #331... but still very long; IMHO better to start a brand new issue that resumes where #331 left off. That's possible because #331 did stop at a fairly complete proposal.
I recommend "volatile branch or volatile SHA1" as a lighter and better name than "branches-that-may-rebase" or "rewritable history". When I'm amending or discarding a git commit I'm conceptually not rebasing anything, yet the relevant net effect on the SHA1s is the same. In fact I don't even need to use the "git rebase" command when amending the last commit - yet the relevant net effect is again the same: I just changed SHA1s / rewrote history. "git rebase" is yet another poorly named git command because it describes only one use case. It should have been "git rewrite" or "git replay". So better not to stretch this already bad name to even worse things like "rebasable branch". Back to your point then: yes, if the user tries to "catch" a specific SHA1 by pointing at a volatile branch that keeps jumping around the place, then for sure it's likely to fail at some point. This is IMHO just PEBKAC and the best that can be done then is to provide a good error message. The whole
Or were you referring to something different? Maybe let's discuss this in a new issue?
Yet |
No, it lets the user's policy on that rule the day. |
Yep, this is getting out of scope. |
Now that GitHub supports fetching arbitrary references, the plan is to implement this in the near future. |
Just FYI: |
The current proposition for the implementation of shallow cloning in west.
The following pythonish pseudocode explains what should happen when the west tool tries to clone a git repository:
|
I don't understand the rationale for calling shallow cloning the "new approach" and unlimited depth the "old approach": is there any other difference between them besides the
Must the depth really be hardcoded to 1? With a bit of depth it's much easier to tell the difference between an original commit and its cherry-picks, and generally easier to "locate" the current HEAD without looking at SHAs. It also allows per-commit CI checks for small enough pull requests.
How does this interact with the existing
You can
I'm also wondering what happens when you repeatedly fetch a moving repo with a depth of 1... does that leave gaps? Admittedly a pure git, not-west question, but related to this. |
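The "gaps" question can be answered empirically with a throwaway local repo: repeated `--depth=1` fetches of a moving branch do leave holes in the local history, which `--unshallow` repairs (a sketch; repo names are made up):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# "Moving" origin repo: starts with commit A.
git init -q origin && cd origin
git config user.email demo@example.com
git config user.name demo
echo A > f; git add f; git commit -qm A
cd ..

# Shallow clone at depth 1 (has only A).
git clone -q --depth=1 "file://$tmp/origin" shallow

# Origin moves on: A -> B -> C.
cd origin
echo B > f; git commit -qam B
B=$(git rev-parse HEAD)
echo C > f; git commit -qam C
cd ../shallow

# Another depth-1 fetch grabs only C; B is never transferred: a gap.
git fetch -q --depth=1 origin
GAP=$(git cat-file -e "$B" 2>/dev/null && echo present || echo missing)

# --unshallow backfills the missing history.
git fetch -q --unshallow origin
AFTER=$(git cat-file -e "$B" 2>/dev/null && echo present || echo missing)
echo "B before --unshallow: $GAP, after: $AFTER"
```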
I think @czeslawmakarski means fetching a particular ref, note this: |
Thanks @carlescufi for reminding me that shallow cloning of a SHA depends on fetching SHAs directly, which I think is not designed yet. Implementing the former before the latter would feel like jumping the gun to me. Fetching SHAs directly seems simple but it's not; see the "fallback" discussion above and the long discussions in #344
PS: with some repos, fetching SHAs directly can offer large performance improvements even before shallow cloning. |
Is this actually something we should specify in a manifest file?
A project maintainer cannot in any sensible way determine the right use-case for everyone, making it hard to write a proper manifest file. To me, this looks like something that should be a west configuration setting, perhaps like (all settings are True / False):
Project setting takes precedence over remote setting, remote setting takes precedence over workspace setting. That way a user wanting to do shallow cloning on all projects in remote
The decision on using shallow cloning is probably quite constant, based on the user's preference; it's not a use case where a user wants to do a shallow update today, tomorrow decides to update everything, and the third day goes back to shallow cloning.
I think, in this particular case, a
Placing this in the manifest actually means we start imposing a specific workflow. With the above suggestion, the workflow for a user wanting to do shallow clones on everything (e.g. CI) could then be:
Edit: repo changed to remote |
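To make the proposal concrete, the settings might look like this on the command line; every option name here (`update.shallow` and its per-remote/per-project variants) is hypothetical, sketched from the shape of the proposal rather than from any existing west option:

```shell
# Hypothetical west options -- these do not exist; they illustrate the
# workspace < remote < project precedence proposed above.
west config update.shallow true                    # workspace-wide default
west config update.shallow.remote.ncs true         # per-remote override
west config update.shallow.project.zephyr false    # per-project override (strongest)
```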
The intention was rather to state the possibility of specifying such fields in the manifest file, not the obligation. The proposed
I envisioned the possibility to place
Also please note that in your approach one has to call first |
Yes, I understand. But with only a
By only having |
except for the fact that the first time a user updates the manifest repo using
That is:
will still fetch everything, just a bit later. So maybe the shallow init use case is better handled using git, that is:
cause that will then correspond to later manifest fetches using:
or what you need to update to. Note: if having a |
I find the idea of using the manifest to tell west about the various

[*] https://git-scm.com/docs/git-fetch-pack
Like @tejlmand, I think it's a bad idea to add user preferences in the manifest because it will pollute the local "git status". Very soon after someone commits their personal preferences and shares them (intentionally or not; many people use the evil "git commit -a") and then everyone has to either pollute their git status too or have very long arguments about default values that not everybody can ever agree on because different people and different bots have different use cases.
IMHO good tools and languages are better defined by how they stop users from shooting each other in the foot than by how many possibilities (and lack of test coverage) they have.
If you don't know at least "a bit" about git then you should not get close to a west manifest, you should not be setting up any automation, and you should probably not get close to shallow cloning either, because it's a hack. I've answered many more git questions than I asked in my life, yet this shallow cloning weirdness cost me days of troubleshooting and investigation: thesofproject/linux#2556 tl;dr: these are relatively advanced git optimizations and topics: their user interface can assume basic git knowledge.
Good news: in this particular case you can already shallow clone the manifest repo and save a lot of time without any new west feature. Speaking of automation, an alternative, 100% safe and conceptually simple optimization is a mirror/cache. git makes mirroring very easy. A bit off-topic sorry: how convenient is it to override |
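Concretely, shallow-cloning the manifest repository needs no new west feature; here is a throwaway local demonstration (made-up manifest repo) that only one commit gets transferred:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# A stand-in manifest repo with two commits.
git init -q manifest && cd manifest
git config user.email demo@example.com
git config user.name demo
echo 'manifest: {projects: []}' > west.yml
git add west.yml; git commit -qm rev1
echo '# updated' >> west.yml; git commit -qam rev2
cd ..

# Shallow-clone it the way you would clone zephyr before `west init -l`.
git clone -q --depth=1 "file://$tmp/manifest" manifest-shallow
DEPTH=$(git -C manifest-shallow rev-list --count HEAD)  # only the tip commit
test -f manifest-shallow/.git/shallow && SHALLOW=yes || SHALLOW=no
echo "commits transferred: $DEPTH, shallow clone: $SHALLOW"
# In a real workspace you would continue with:
#   west init -l manifest-shallow && west update
```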
Well, even less experienced
Shallow cloning is probably better kept to more experienced users, which is exactly why I showed that using plain |
I've posted a PR which allows us to fetch SHAs directly, to start progress on this: #475 |
@mbolivar-nordic just abandoned #475 and I'm not sure why. I think giving #475 and #344 a name should help clarify concepts; I suggest "narrow". I see roughly 4 classes of git clones in general use (with and without west)
Thanks to custom refspecs, intermediate --depth, different clone+fetch combinations, different git versions, etc., the reality is not just these 4 points but actually a spectrum. However I think these 4 points should capture most real-world use cases and that these are the concepts
The big difference |
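For concreteness, here is how I'd sketch those classes as raw git commands against a throwaway local repo (the FULL/NARROW/DEPTH1/DEFAULT names come from later in this thread; the refspecs are illustrative, not what west actually runs):

```shell
# Throwaway demo of the clone classes against a local "server" repo.
set -e
tmp=$(mktemp -d); cd "$tmp"

# Server: one branch with commits A -> B -> C; we pin B's SHA.
git init -q server && cd server
git config user.email demo@example.com
git config user.name demo
echo A > f; git add f; git commit -qm A
echo B > f; git commit -qam B
TARGET=$(git rev-parse HEAD)
echo C > f; git commit -qam C
# Server-side opt-in required by the NARROW and DEPTH1 classes.
git config uploadpack.allowAnySHA1InWant true
cd ..
URL="file://$tmp/server"

# DEFAULT would just be `git clone "$URL"`: all branches, all tags, checkout.

# FULL: all branches (and normally tags), all history.
git init -q full
git -C full fetch -q "$URL" 'refs/heads/*:refs/remotes/origin/*'
FULL_COUNT=$(git -C full rev-list --count --remotes)      # 3 commits

# NARROW: only the pinned SHA, but with its complete history.
git init -q narrow
git -C narrow fetch -q "$URL" "$TARGET"
NARROW_COUNT=$(git -C narrow rev-list --count FETCH_HEAD) # 2 commits (A, B)

# DEPTH1: only the pinned SHA, history truncated to one commit.
git init -q depth1
git -C depth1 fetch -q --depth=1 "$URL" "$TARGET"
DEPTH1_COUNT=$(git -C depth1 rev-list --count FETCH_HEAD) # 1 commit (B)

echo "full=$FULL_COUNT narrow=$NARROW_COUNT depth1=$DEPTH1_COUNT"
```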
Now reopened |
@marc-hb west actually does not use
I think we are headed towards a world with the following possibilities:
|
I was starting from plain, familiar git use cases.
I mentioned DEFAULT because it's the git default. I agree the other 3 seem enough for west; I actually suggested that in #475
Gotcha, I misunderstood you. I thought you were saying this was the west default.
Awesome, that makes more sense now. I was confused about why there seemed to be more possibilities here than there. I'm totally with you. Seems like we're on the same page then and the FULL/NARROW/DEPTH1 breakdown in #319 (comment) is what we'd like to have. Comments from anyone else following this thread? |
@czeslawmakarski I would like your help in testing the results of this optimization in your environment. Here are two scripts, fetch_full.sh and fetch_depth1.sh: https://gist.github.com/mbolivar-nordic/f610f1234c2cb8a6cb716c75e44b6c8f

Could you please run them in the environment you are creating your workspaces in, as follows:

$ ./fetch_full.sh 2>results_full.txt; ./fetch_depth1.sh 2>results_depth1.txt

Then share the results files.

In my environment (1 GbE network connection and an SSD), I get somewhat surprising results:
This is "only" a 37% improvement: significant to be sure, but not an earth-shattering improvement. I'm curious if the speedup in your environment is worth it to you or if setting up some caching as discussed elsewhere would be better for solving the performance problems you're having. |
I tried something slightly different: I hardcoded
With a fast, business network connection it made little difference. Most of the difference (10s) was when
I think this is a combination of:
Then I ran a few

- 2.5G grand total with full clones, including --tags
- 850M all .git/ folders (full clones)
- 400M zephyr/.git

The next biggest repos: |
@mbolivar Thanks for the scripts - I'll give them a go and shall come back with the results. |
Hi @mbolivar-nordic,
First, I've done the test on my local machine (ISP downstream about 300 Mbps, NVMe storage):
Then we've run the tests on our Jenkins environment and the results were different on different nodes - ranging from 16% to 58% - you can see the image included. AFAIK the nodes were executing only your bash scripts at the same time, so probably the reason could be differences in ISP bandwidth allocation to the corresponding nodes. Tagging @dawidprzybylo for visibility. |
Jenkins has a number of git optimizations: Shallow cloning and reference repositories won't work for west subprojects (only for git submodules) but if Zephyr - the largest repo by far - is used as the manifest then it would already save a lot. Not cloning from scratch every time and cleaning the repo instead should Just Work. That's the most basic and obvious git thing to do. |
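Jenkins "reference repositories" are plain git's `--reference`/alternates mechanism, and the same trick works for any repo a CI node clones repeatedly. A local sketch (made-up paths) showing that a clone made with `--reference` borrows its objects from the cache:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Upstream repo standing in for e.g. zephyr.
git init -q upstream && cd upstream
git config user.email demo@example.com
git config user.name demo
echo hello > f; git add f; git commit -qm initial
cd ..

# The cache: a bare mirror, refreshed out of band (cron, weekend job, ...).
git clone -q --mirror "file://$tmp/upstream" cache.git

# CI clones point at the cache and only fetch what the cache lacks.
git clone -q --reference "$tmp/cache.git" "file://$tmp/upstream" work

# The clone records the cache in its alternates file: objects are shared,
# not re-downloaded.
ALTERNATES=$(cat work/.git/objects/info/alternates)
echo "borrowing objects from: $ALTERNATES"
```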
OK, I did a bit more benchmarking. So far I concluded from this that, at least in my environment:
This time I compared:
I used this west branch during testing: https://github.com/mbolivar-nordic/west/tree/update-optimizations (commit 867b42). With this branch, you can:
WARNING: THE SCRIPT IN THE FOLLOWING LINK DELETES GIT REPOSITORIES WHILE IT RUNS. DO NOT RUN IT IN A WORKSPACE YOU CARE ABOUT.

I also wrote this benchmarking script: https://gist.github.com/mbolivar-nordic/4e50065a724ed579eb0b430f26473736

You can run it safely like this, after editing the EXISTING_WORKSPACE variable in the script to suit your environment:

mkdir west-update-testing
cd west-update-testing
git clone https://github.com/zephyrproject-rtos/zephyr # can use sdk-nrf if you want
west init -l zephyr # or nrf
./benchmark.sh

Results go to the shell running benchmark.sh. Logs go to benchmark.log (or LOG_FILE if set).

Here are box-and-whisker plots of my results on the same 1 GbE network + SSD environment from earlier: As you can see, the results are consistent between runs, and using a cache is the clear winner in terms of performance. Paradoxically,
I'm also curious if others (@marc-hb, @czeslawmakarski) can reproduce these results. I understand that maintaining a cache is a burden for CI environment maintainers, though, so I will prepare a PR with all of these options available once I write some more test cases. I still think that even a stale cache which is missing some git refs is likely to be much faster than either
Thanks for everyone's patience on this. I think we are closing in on a good solution. |
Final benchmark results after re-running with a few new options for an apples-to-apples comparison.
TL;DR: even with a fast network connection, caching is the clear performance win for updating a Zephyr workspace. Shallow updates are slower than using a cache, even if the cache is 'stale' and contains very out-of-date clones of the workspace's projects. Oddly, shallow updates are slower than just fetching the exact project revision at full depth (at least in this workspace + my environment).
Explanation of each box-and-whiskers plot (n=10 in each case):
In each case, zephyr is the manifest repository and commit 72595a334cdbcc964aeb9bdba37b84e21ac08b07 is used. I used my home's 1GbE<->fiber connection on an otherwise quiet network. (I used Zephyr both for wider applicability and since it's a larger workspace than NCS.)
I've posted a PR with the new west options used above: #496
Please review and of course feel free to run your own tests to determine what is best for your environment. |
I've been experimenting with the cached version also, and my results were similar to yours - much faster than the 'shallow' cloning. The only problem we can envision is that the old-cache version of the NCS (most probably the latest 1.x.0 release) would need to be included in the Docker image. |
Famous last words 😄 @carlescufi asked me why the results are so different between the above benchmarks and the rough script I posted about a month ago, which showed a ~37% improvement for NCS if using --depth=1. I re-ran the benchmarks with NCS.
TL;DR: caching is still the clear winner, but shallow updates do make a difference for our downstream Zephyr distro. However, even an old cache that lags behind by 2 releases is still 20% faster than shallow updates when combined with
Explanation of each box-and-whiskers plot (n=10 in each case):
In each case, sdk-nrf is the manifest repository and commit 61fa66c78 is used. As before, I used my home's 1GbE fiber connection on an otherwise quiet network. These results are consistent with my earlier NCS results, so NCS really is different. Interestingly, much of the performance difference just comes from zephyr itself:
|
@czeslawmakarski that is not 100% certain IMO. You could keep it out of the image but volume-mount it into the running container, since updating from a cache is a read-only operation from the cache's perspective. |
This assumes you've got a bare metal server or some other way to do persistent storage, of course. It also has the advantage that you can define a cron job to run on the weekend that keeps the cache fresh. |
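That weekend refresh can be a single cron entry; the cache directory layout below (/srv/git-cache holding bare --mirror clones) is hypothetical:

```shell
# Hypothetical crontab entry: every Sunday at 03:00, refresh each bare
# mirror in /srv/git-cache so the cache never lags by more than a week.
0 3 * * 0  for r in /srv/git-cache/*.git; do git -C "$r" remote update --prune; done
```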
Closed by #496. |
If you have any interest in cloning performance, then try cloning with
cc: @aborisovich, @wszypelt , @keqiaozhang, @greg-intel |
For verification purposes, it would be nice to init and fetch only the versions pinned in the manifest file.
So issuing:
or
would create a minimal version of the repository, saving a lot of time for big projects.