
Bandwidth quota of Git LFS Data #454

Open · xfq opened this issue Feb 11, 2025 · 10 comments
xfq (Member) commented Feb 11, 2025

This repo seems to have exceeded the bandwidth quota for Git LFS data:

[Screenshot: GitHub warning that the repository has exceeded its Git LFS bandwidth quota]

We need to find a solution. @deniak, do you have any suggestions?

deniak (Member) commented Feb 11, 2025

Thanks @xfq, I've contacted @himorin and asked him to delete the image source files that are on Git LFS. Hopefully, this should solve the issue.

himorin (Contributor) commented Feb 11, 2025

> Thanks @xfq, I've contacted @himorin and asked him to delete the image source files that are on Git LFS. Hopefully, this should solve the issue.

Sorry I could not reply to the thread yet, but I don't think just deleting the files from HEAD will work, since everything remains in git history.

Going through the whole git history to remove every entry would be quite hard and would take several working days, since there is a mix of LFS manipulation and normal edits, and I don't have any knowledge or experience of removing historical Git LFS entries.

Also, as I've already said several times, including the last time we discussed this, we would do better to identify exactly which operation is eating LFS bandwidth: a normal `git clone` or `git checkout` does not download the LFS data itself (only the pointer files within the git repository), so something specific must be consuming the allowed bandwidth. I'm not sure why we would remove LFS objects from HEAD without digging into the root cause... (I'm not objecting to the operation; I just can't believe removal alone will resolve this.)
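(To illustrate the point about pointers, a minimal sketch, assuming git-lfs is installed and smudging is skipped; the repository URL and file path here are illustrative, not taken from the thread:)

```sh
# Clone without downloading LFS objects; tracked files stay as pointers.
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/w3c/jlreq.git
cd jlreq

# An LFS-tracked file in the working tree is just a small text pointer:
#   version https://git-lfs.github.com/spec/v1
#   oid sha256:4d7a2146...
#   size 32472351
cat path/to/source-image.ai   # hypothetical path, for illustration

# The real objects (and the bandwidth they cost) are transferred only on:
git lfs pull
```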

r12a (Contributor) commented Feb 11, 2025

@himorin I think we also need to identify a new home for the data that you'll be deleting from LFS, so that we don't lose access to it.

deniak (Member) commented Feb 12, 2025

> @himorin I think we also need to identify a new home for the data that you'll be deleting from LFS, so that we don't lose access to it.

You could keep them in the repository, just without using git LFS.
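(A minimal sketch of that option, assuming the LFS-tracked files are the `.ai`/`.indd` image sources discussed in this thread; the patterns are illustrative:)

```sh
# Stop tracking the image sources with LFS; this edits .gitattributes.
git lfs untrack '*.ai' '*.indd'

# Re-stage everything so the working-tree contents replace the LFS pointers,
# then commit the now-ordinary files together with the updated .gitattributes.
git add --renormalize .
git commit -m "Move image source files out of Git LFS"
```

Note this only changes HEAD onward; as discussed below, the old LFS objects remain in history.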

xfq (Member, Author) commented Feb 12, 2025

> Also, as I've already said several times, including the last time we discussed this, we would do better to identify exactly which operation is eating LFS bandwidth: a normal `git clone` or `git checkout` does not download the LFS data itself (only the pointer files within the git repository), so something specific must be consuming the allowed bandwidth.

I'm not sure, but I suspect this is the reason:

> If Git Large File Storage (Git LFS) objects are included in source code archives for your repository, downloads of those archives will count towards bandwidth usage for the repository.

From https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-storage-and-bandwidth-usage#tracking-storage-and-bandwidth-use.
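(One hedged way to gauge what each archive or backup that bundles LFS objects could cost: fetch everything LFS has ever stored and measure it. Assumes a local clone with git-lfs installed.)

```sh
# Fetch every LFS object referenced anywhere in history...
git lfs fetch --all

# ...then see how much data a full LFS download represents.
du -sh .git/lfs/objects
```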

deniak (Member) commented Feb 12, 2025

I wonder if our git repository backup system is responsible for this. There's a chance it downloads archives of that repository on a regular basis, and we don't have any control over this.

In any case, I don't see any reason to keep these files in git LFS, so I suggest moving them out of the repository or, at the very least, deleting them from git LFS.

himorin (Contributor) commented Feb 12, 2025

Hrm... some thoughts:

  1. Turning LFS files into normal ones at HEAD only should be easy (just `git lfs untrack` and `git add`), but all the LFS entries will remain in history (e.g. per the GitHub LFS documentation). Also, removing and re-adding the LFS files will break history, including that of PRs. (We don't rely on commit hashes, just PR IDs, so changing history via a force push would not be so harmful.)
  2. I'm not sure whether something like `git lfs migrate export --include '*' --everything` would work or not (see the sketch after this comment).
  3. Looking at the old discussion, it seems we only discussed which approach to take for binary files, and never checked their sizes against GitHub's file size limits (50 MB warning / 100 MB hard limit). So there is no reason we can't turn all LFS files into normal git files.
  4. Source archives could be one source of bandwidth, but I am quite unsure why 75.32 GB has been used against 0.54 GB of actual total LFS file size. If the bandwidth calculation is correct, that is around 140 times the total file size, so it could not be a single archive download of just HEAD; there would have to be quite a few. If so, we should remove the whole history of LFS files from this repository.
  5. I believe forks (like mine) consume storage and bandwidth from the origin, so we may need to ask something of fork owners after we convert this repository?

I don't think we will want to check or use versions of the `.ai` or `.indd` source files older than HEAD in the future, since the JL-TF has now finished all work on the document and is freezing it (per the new jlreq-d; errata could still be fixed in the future, but I'm marking that as out of scope for now). So removing the whole history of LFS-tracked files via `git rm`, as @deniak suggested, and just adding the latest files to HEAD, could work.

How about this: let me try to find some time to see how `git lfs migrate export --include '*' --everything` behaves and check the result?
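(For reference, a hedged sketch of what that experiment might look like; `git lfs migrate export` rewrites all history, so this is something to try on a throwaway clone, not a verified procedure for this repository. The URL is illustrative.)

```sh
# Work on a disposable clone; the rewrite changes every commit hash.
git clone https://github.com/w3c/jlreq.git jlreq-test
cd jlreq-test

# The export needs the real file contents, so fetch all LFS objects first.
git lfs fetch --all

# Rewrite every local ref, converting LFS pointers back into plain git blobs.
git lfs migrate export --include='*' --everything

# Verify: nothing should be LFS-tracked any more.
git lfs ls-files

# Publishing the result would require force pushes (history changes, per point 1):
#   git push --force --all
#   git push --force --tags
```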

deniak (Member) commented Feb 12, 2025

> Hrm... some thoughts:
>
> 1. Turning LFS files into normal ones at HEAD only should be easy (just `git lfs untrack` and `git add`), but all the LFS entries will remain in history (e.g. per the GitHub LFS documentation). Also, removing and re-adding the LFS files will break history, including that of PRs. (We don't rely on commit hashes, just PR IDs, so changing history via a force push would not be so harmful.)

As far as I understand, these files are only there for the record. They are not the ones that are actually used in the spec.
Plus, I'm not sure how it would break history.

> 2. I'm not sure whether something like `git lfs migrate export --include '*' --everything` would work or not.
>
> 3. Looking at the old discussion, it seems we only discussed which approach to take for binary files, and never checked their sizes against GitHub's file size limits (50 MB warning / 100 MB hard limit). So there is no reason we can't turn all LFS files into normal git files.

AFAICS, the source images don't weigh more than a few MB each, so we are still well below the file size limit.

> 4. Source archives could be one source of bandwidth, but I am quite unsure why 75.32 GB has been used against 0.54 GB of actual total LFS file size. If the bandwidth calculation is correct, that is around 140 times the total file size, so it could not be a single archive download of just HEAD; there would have to be quite a few. If so, we should remove the whole history of LFS files from this repository.

The backup system runs at least daily, so that process alone would account for at least 15 GB/month (roughly 0.54 GB per archive × 30 days ≈ 16 GB). I mentioned the backup system, but there might be other downloads from other sources.

> 5. I believe forks (like mine) consume storage and bandwidth from the origin, so we may need to ask something of fork owners after we convert this repository?
>
> I don't think we will want to check or use versions of the `.ai` or `.indd` source files older than HEAD in the future, since the JL-TF has now finished all work on the document and is freezing it (per the new jlreq-d; errata could still be fixed in the future, but I'm marking that as out of scope for now). So removing the whole history of LFS-tracked files via `git rm`, as @deniak suggested, and just adding the latest files to HEAD, could work.
>
> How about this: let me try to find some time to see how `git lfs migrate export --include '*' --everything` behaves and check the result?

I'm far from being a git LFS guru, and it seems jlreq is the only repo relying on it, but it's enough to exceed the bandwidth quota. Another option would be to disable backups for that repo, but I don't think that's a good idea. My guess is that, given the size of that repo, it's better not to rely on git LFS at all.

xfq (Member, Author) commented Feb 28, 2025

> 1. Turning LFS files into normal ones at HEAD only should be easy (just `git lfs untrack` and `git add`), but all the LFS entries will remain in history (e.g. per the GitHub LFS documentation).

The documentation only mentions that this affects the storage quota, not the bandwidth quota:

> After you remove files from Git LFS, the Git LFS objects still exist on the remote storage and will continue to count toward your Git LFS storage quota.

We only exceeded the bandwidth quota, not the storage quota, so that should not be a problem. If we are really unsure, we can contact GitHub Support.

> Also, removing and re-adding the LFS files will break history, including that of PRs.

Why?

deniak (Member) commented Feb 28, 2025

I agree; I don't see why it would break the history. It's not as though these files are used directly in the spec itself.
Only the image source files are in git-lfs.

Note that even if the files are removed from git-lfs, we may need to contact support to reset the bandwidth usage.
