Bad robot #730

Closed
rmyorston opened this issue Feb 10, 2019 · 5 comments · Fixed by #778
Labels: Medium Priority (This ticket has a medium priority), type.feature (New feature)

Comments

@rmyorston

release-monitoring.org is monitoring one of my projects. It uses the custom backend to download the project web page once an hour.

  • Since the project is updated, on average, once a year this is far too frequent.
  • There's no need to download the whole web page just to know an update has happened.

If you're going to consume my bandwidth you should at least make some effort to be efficient. It's only polite.

@Zlopez
Contributor

Zlopez commented Feb 11, 2019

I'm not sure if we can do anything about this. Anitya couldn't possibly know when a new version will be released, and it's not good if we report a new version a few days after release.
How could we parse the webpage and search for regex, if we don't download it? Some backends use APIs or other methods for checking, but the custom backend must be usable for anything, so it just downloads the page and does a regex check.
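For readers unfamiliar with the custom backend, the "download the page and do a regex check" step boils down to something like the sketch below. It is a minimal illustration under assumptions: the tarball-name pattern and the `extract_versions` helper are hypothetical, not Anitya's actual code, and a real project would configure its own regex.

```python
import re
from typing import List

# Hypothetical pattern for release tarballs such as "project-1.30.1.tar.gz".
# Each real project would supply its own regex; this one is only an example.
VERSION_REGEX = r"project-([0-9][0-9.]*[0-9])\.tar\.gz"


def extract_versions(page_text: str, regex: str = VERSION_REGEX) -> List[str]:
    """Return the distinct versions captured by ``regex``, in the order they appear."""
    versions: List[str] = []
    for match in re.findall(regex, page_text):
        if match not in versions:  # the same file is often linked more than once
            versions.append(match)
    return versions


if __name__ == "__main__":
    page = """
    <a href="project-1.29.0.tar.gz">project-1.29.0.tar.gz</a>
    <a href="project-1.30.1.tar.gz">project-1.30.1.tar.gz</a>
    """
    print(extract_versions(page))  # ['1.29.0', '1.30.1']
```

The whole page has to be fetched before the regex can run, which is exactly the cost this issue is about.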

@Zlopez
Contributor

Zlopez commented Feb 11, 2019

If there is another way to check your project, and it could be applied to similar projects, it is possible to create a special backend that checks for new versions using a method other than a plain HTTPS GET.

@rmyorston
Author

Anitya couldn't possibly know when a new version will be released

No, even I don't know that. But the previous frequency of updates provides a clue.
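One way to act on the "previous frequency of updates" clue is to derive the polling interval from the gaps between past releases. The sketch below is purely illustrative: `suggest_check_interval`, the 1% fraction and the one-hour/one-day bounds are arbitrary assumptions, not Anitya behaviour.

```python
from datetime import datetime, timedelta
from typing import List


def suggest_check_interval(
    release_dates: List[datetime],
    fraction: float = 0.01,
    minimum: timedelta = timedelta(hours=1),
    maximum: timedelta = timedelta(days=1),
) -> timedelta:
    """Hypothetical heuristic: poll at a small fraction of the average gap between releases.

    With fewer than two known releases there is no gap to measure, so fall back to
    ``minimum``. The result is clamped to [minimum, maximum].
    """
    if len(release_dates) < 2:
        return minimum
    ordered = sorted(release_dates)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = sum(gaps, timedelta()) / len(gaps)
    return max(minimum, min(maximum, average_gap * fraction))


if __name__ == "__main__":
    yearly = [datetime(2016, 1, 1), datetime(2017, 1, 1), datetime(2018, 1, 1)]
    print(suggest_check_interval(yearly))  # 1 day, 0:00:00 (capped at the maximum)
```

For a project that releases about once a year, this would cut the load from 24 downloads a day (hourly polling) to one, while a notification would still arrive within a day of a release.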

it's not good if we report a new version a few days after release

For critical infrastructure, perhaps, but for many projects a slight delay in notification is neither here nor there.

How could we parse the webpage and search for regex, if we don't download it

You can't. But you only need to download it if it's changed. If-modified-since is your friend.
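The If-Modified-Since approach can be tried end to end with the Python standard library alone. In this sketch the toy server, the made-up `PAGE_LAST_MODIFIED` date and the `fetch_if_modified` and `start_demo_server` helpers are assumptions for illustration, not Anitya code: the client sends the header, and a server that honours it answers 304 Not Modified with no body.

```python
import threading
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

# Stand-in for the monitored page; the date and body are made up for the demo.
PAGE_LAST_MODIFIED = datetime(2019, 2, 1, 12, 0, tzinfo=timezone.utc)
PAGE_BODY = b'<a href="project-1.30.1.tar.gz">latest</a>'

# Ignore proxy environment variables so the local demo server is reached directly.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class PageHandler(BaseHTTPRequestHandler):
    """Toy server that answers 304 Not Modified when the client's copy is current."""

    def do_GET(self):
        since = self.headers.get("If-Modified-Since")
        if since is not None and parsedate_to_datetime(since) >= PAGE_LAST_MODIFIED:
            self.send_response(304)  # no body is sent with a 304
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Last-Modified", format_datetime(PAGE_LAST_MODIFIED, usegmt=True))
        self.send_header("Content-Length", str(len(PAGE_BODY)))
        self.end_headers()
        self.wfile.write(PAGE_BODY)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_if_modified(url: str, since: Optional[datetime] = None) -> Optional[bytes]:
    """Return the page body, or None if the server reports no change since ``since``.

    ``since`` must be a timezone-aware UTC datetime; HTTP dates are always in GMT.
    """
    request = urllib.request.Request(url)
    if since is not None:
        request.add_header("If-Modified-Since", format_datetime(since, usegmt=True))
    try:
        with OPENER.open(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 as an HTTPError
            return None
        raise


def start_demo_server() -> Tuple[HTTPServer, str]:
    """Start PageHandler on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"


if __name__ == "__main__":
    server, url = start_demo_server()
    print(fetch_if_modified(url))  # full body: no date to compare against
    print(fetch_if_modified(url, datetime(2019, 2, 10, tzinfo=timezone.utc)))  # None
    server.shutdown()
    server.server_close()
```

This only saves bandwidth when the server honours the header. A server that ignores If-Modified-Since simply returns 200 and the full page, so the check still works, just without the savings.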

@Zlopez
Contributor

Zlopez commented Feb 11, 2019

You can't. But you only need to download it if it's changed. If-modified-since is your friend.

I didn't know about this header field, thanks for letting me know. I will see how I can use it in Anitya when checking for new versions.

@Zlopez Zlopez added type.feature New feature Medium Priority This ticket has a medium priority and removed discussion labels Feb 11, 2019
@Zlopez Zlopez added this to the 0.16.0 milestone Feb 21, 2019
@Zlopez
Contributor

Zlopez commented Apr 30, 2019

Implementation details

There are a few things that must be implemented to make this work:

  1. We need to add If-Modified-Since to the request headers used for the check in https://github.com/release-monitoring/anitya/blob/master/anitya/lib/backends/__init__.py#L43 and https://github.com/release-monitoring/anitya/blob/master/anitya/lib/backends/__init__.py#L238
  2. The If-Modified-Since header needs to be filled with the time of the latest retrieved version before doing the actual check. We will need a method on the Project model that returns either that time or the latest version. See https://github.com/release-monitoring/anitya/blob/master/anitya/db/models.py#L194
  3. We need to handle an HTTP 304 (Not Modified) response correctly: it means no new version was found.
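The three steps above might fit together roughly as follows. This is a sketch under assumptions, not the change that eventually closed the issue: the `Project` and `Version` classes, the `latest_version_time` method and the `check_headers`/`new_versions` helpers are hypothetical stand-ins for the Anitya code linked above.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, List, Optional


@dataclass
class Version:
    name: str
    created_on: datetime  # assumed field: when this version was first retrieved, tz-aware UTC


@dataclass
class Project:
    name: str
    versions: List[Version] = field(default_factory=list)

    def latest_version_time(self) -> Optional[datetime]:
        """Step 2 (hypothetical method): time the newest known version was retrieved."""
        if not self.versions:
            return None
        return max(version.created_on for version in self.versions)


def check_headers(project: Project) -> Dict[str, str]:
    """Step 1: send If-Modified-Since only when there is a previous version to compare with."""
    headers: Dict[str, str] = {}
    last = project.latest_version_time()
    if last is not None:
        headers["If-Modified-Since"] = format_datetime(last, usegmt=True)
    return headers


def new_versions(project: Project, status: int, body: str, regex: str) -> List[str]:
    """Step 3: a 304 response means the page is unchanged, so no new version is found."""
    if status == 304:
        return []
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    known = {version.name for version in project.versions}
    found: List[str] = []
    for name in re.findall(regex, body):
        if name not in known and name not in found:
            found.append(name)
    return found


if __name__ == "__main__":
    project = Project("demo", [Version("1.30.1", datetime(2019, 2, 1, 12, 0, tzinfo=timezone.utc))])
    print(check_headers(project))  # {'If-Modified-Since': 'Fri, 01 Feb 2019 12:00:00 GMT'}
    print(new_versions(project, 304, "", r"demo-([0-9.]+)\.tar"))  # []
```

Step 2 as written keys the header on when the latest version was retrieved. A common alternative, which RFC 7232 suggests for caches, is to store the server's own Last-Modified value and send it back unchanged, which avoids depending on the checker's clock.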
