-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Github backend to use Github API v4 #582
Conversation
Codecov Report
@@ Coverage Diff @@
## master #582 +/- ##
==========================================
+ Coverage 89.65% 90.01% +0.35%
==========================================
Files 55 55
Lines 2591 2644 +53
Branches 332 341 +9
==========================================
+ Hits 2323 2380 +57
+ Misses 203 201 -2
+ Partials 65 63 -2
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a few things that can be refactored here, but overall it's looking like a big improvement 🍰 ✨
@@ -4,7 +4,7 @@ | |||
VAGRANTFILE_API_VERSION = "2" | |||
|
|||
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config| | |||
config.vm.box = "fedora/27-cloud-base" | |||
config.vm.box = "fedora/28-cloud-base" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's generally good practice to break this sort of stuff out into a separate commit. I don't see anyone wanting to revert this or anything, but it's a nice separation of concerns thing. Not a huge deal, don't feel like you need to fix it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will move it to separate commit.
anitya/lib/backends/__init__.py
Outdated
req.add_header('User-Agent', user_agent) | ||
req.add_header('From', from_email) | ||
req.add_header('User-Agent', headers.user_agent) | ||
req.add_header('From', headers.from_email) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
headers
is a dictionary, right? I don't think this works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. This means, that there was not test for this, I will add it.
anitya/lib/backends/github.py
Outdated
if url: | ||
(owner, repo) = url.split('/') | ||
|
||
if (not owner) or (not repo): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's exactly equivalent, but if not (owner and repo)
might read a little cleaner.
Also, I know this is just a refactor of what was there before, but it makes more sense to validate this before it goes in the database, rather than right here. It'd tell the user up-front they've put in incorrect data. Might be a nice refactor for a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be probably validated on frontend in javascript, but you also need to validate it before saving it to db (in case somebody is trying something nasty). This could be different for every backend, so it will be good to have validation method as part of backend and the db method could call it before inserting the data.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable, yes. It can be called during the pre-commit event.
anitya/lib/backends/github.py
Outdated
return versions | ||
|
||
@classmethod | ||
def prepare_query(cls, owner, repo, after=''): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no reason for this to be a class method. There's also no reason for the class inheritance structure in the backends generally, so while this matches the general style, it's not a good style. You can either make it a private instance method or just rip this out into a private module-level function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change it to private method. It doesn't need to be public.
anitya/lib/backends/github.py
Outdated
def prepare_query(cls, owner, repo, after=''): | ||
''' Method for preparing GraphQL query for specified repository | ||
|
||
:arg owner: owner of the repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the "Google" style of comments more pleasant to write, you might as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the style, that was there before, but I will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it was the old style and I was slowly converting it all as I touched things
anitya/lib/backends/github.py
Outdated
return query | ||
|
||
@classmethod | ||
def call_url_post(cls, url, json, insecure=False, use_token=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just factor this into the get_versions
call. I think it's reasonable for the token to be required, and we should never use insecure=True
with GitHub so it boils down to a single call to another function.
The get_headers
interface (while matching the existing style) is more complicated than just setting a default set of headers somewhere, then making a copy here and injecting the access token. I recommend doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand the second paragraph in your comment. You want me to copy the get_headers
method to this backend.
The get_headers
method is new method in BaseBackend
and it only fills basic header fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, what I'm saying is I don't like the use of inheritance in the backends. It's complicated and adds no value, but I never got a chance to refactor it. Rather than add to it, this could be implemented as:
In constants or BaseBackend's module or something:
REQUEST_HEADERS = <default_headers>
Then for your Github backend you just make a copy of REQUEST_HEADERS and add your auth token:
github_headers = REQUEST_HEADERS.copy()
github_headers['Authentication'] = ...
There's no need for a get_headers method that subclasses override. The whole call_url
function has no business being a class method at all. If should just be a module-level function that acts as a wrapper that handles the special FTP case and otherwise is the requests API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that the inheritance is intentional, otherwise BaseBackend
doesn't have much sense. I thought it is used as some kind of interface, that should be implemented by every backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inheritance is fine to define the interface, but the only interface we're really interested in is for a given project, new versions are returned. Everything else is an implementation detail of the particular backend and doesn't need to be part of the class.
anitya/lib/backends/github.py
Outdated
'%s: Server responded with status "%s": "%s"' % ( | ||
project.name, resp.status_code, resp.reason)) | ||
|
||
if 'errors' in json: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making a function that accepts the json and spits out the versions (or raises an exception). That'll make testing easier, and make this function a digestible length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will do this. It sounds good.
anitya/lib/backends/github.py
Outdated
# cursor = json['data']['repository']['refs']['edges']['cursor'] | ||
|
||
remaining = json['data']['rateLimit']['remaining'] | ||
reset_time = json['data']['rateLimit']['resetAt'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should handle the rate limiting gracefully. If we're getting rate limited, it should raise an exception (RateLimitError or similar) that contains the time the rate limit is over so that the caller can handle it by rescheduling for later (or exploding)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like elegant solution to handle the rate limit.
2bad5d5
to
d8cb24b
Compare
from anitya.lib.backends import BaseBackend, http_session, REQUEST_HEADERS | ||
from anitya.lib.exceptions import AnityaPluginException, RateLimitException | ||
from anitya.config import config | ||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historically we've followed PEP 8, which includes rules for import ordering: the first group is stdlib imports, the second group is third-party libs, and the final group is imports from the current package. Each group is separated by a blank line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small comments, but the only thing I think really needs to change is the decoding of FTP responses.
anitya/lib/backends/github.py
Outdated
|
||
try: | ||
headers = REQUEST_HEADERS.copy() | ||
token = config.get('GITHUB_ACCESS_TOKEN') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you set a default for the token, you don't need to use get
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right
anitya/lib/exceptions.py
Outdated
def __init__(self, reset_time): | ||
self._reset_time = reset_time | ||
|
||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend just calling self._reset_time
self.reset_time
and not making a property. Properties are handy if you need to do some sort of transformation or you want your internal representation to differ from the public API in some way, but here it's the same.
|
||
import unittest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the PEP 8 rules, unittest should stay up in the top group and six should move to this group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't know this
date: ['Thu, 23 Aug 2018 09:39:54 GMT'] | ||
keep-alive: ['timeout=5, max=100'] | ||
server: [Apache/2.2.22 (Fedora)] | ||
x-clacks-overhead: [GNU Terry Pratchett] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was surprised by this value when I fetched the URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you're not familiar with Terry Pratchett's Discworld books, one of his books includes a lot of amusing references to the Free Software movement and this HTTP header is in reference to one of his books.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know his work. I read almost all his books from Discworld, but I was surprised to see the header field, I never saw it before.
anitya/lib/backends/github.py
Outdated
|
||
_log.debug('Received %s tags for %s' % (total_count, project.name)) | ||
|
||
# TODO: Save cursor to project and use it to fetch only newer versions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend turning these TODOs into tickets and removing them from the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
anitya/lib/backends/__init__.py
Outdated
try: | ||
dir_listing = resp.text | ||
except AttributeError: | ||
dir_listing = resp.decode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just skimmed https://tools.ietf.org/html/rfc2640, but it sounds like there's no guarantees that the response will be UTF-8 encoded, and even if the spec said it MUST be, I'd be willing to bet there are some non-conforming servers out there. If it should only be ASCII/UTF-8 then this is fine, but this should handle an Unicode decoding exception and raise the appropriate Anitya exception explaining that the FTP service is misbehaving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add another try-except when trying to decode the binary string and if it's not UTF-8, I will just raise AnityaPluginException
.
Maybe there is some way, how to get the encodings of the binary string and then use the appropriate decode method. This will be nice way how to do it.
I will check this.
It will be much easier, if Requests
library could work with FTP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a requests-ftp extension, although I'm not sure if it works with current requests. It'd be nice to drop the FTP support, but I'm not sure if there's still a number of projects only available over FTP. I think just bailing is a reasonable approach. The specification indicates it should only ever be ASCII or UTF-8, but because FTP is so old I have no doubt there are servers out there not following the latest RFC about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at it, but it's only a template, not real library. It's only some basic functionality for future work.
anitya/lib/backends/__init__.py
Outdated
url, e.reason)) | ||
except UnicodeDecodeError: | ||
raise AnityaPluginException( | ||
'FTP response is not unicode: %s' % url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A slightly more accurate error message would be 'FTP response cannot be decoded with UTF-8' since Unicode defines the code points and UTF-8 is a particular way to map Unicode code points to a binary representation, but this is also fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds better than current error message.
anitya/lib/exceptions.py
Outdated
@@ -115,7 +115,6 @@ class RateLimitException(AnityaException): | |||
def __init__(self, reset_time): | |||
self._reset_time = reset_time | |||
|
|||
@property | |||
def reset_time(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, what I meant was just delete this method and change the name from self._reset_time
to self.reset_time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only want the time to be retrievable by the code which catches it.
Maybe bad habit from other object oriented languages.
I'm used to have getter and setter (when needed) for every attribute the object has.
But in this case it could be probably better to have direct access to reset_time
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't really call it a bad habit (this is the only way to do things in Java), but it's not "Pythonic". Python uses _
it indicate a variable, method, function, etc is private and might change at any time, but doesn't enforce it. If the interface you'd like to present to the public differs from the one you'd like to manage internally, it's a perfect place for the setter/getter pattern with properties. Otherwise, just exposing the variable itself is fine.
Update with latest master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✨ 🍰 ✨
Implementation of Github API v4 as requested in #567
This should also solve #581