Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Github backend to use Github API v4 #582

Merged
merged 8 commits into from
Aug 28, 2018

Conversation

Zlopez
Copy link
Contributor

@Zlopez Zlopez commented Aug 21, 2018

Implementation of Github API v4 as requested in #567
This should also solve #581

@codecov-io
Copy link

codecov-io commented Aug 21, 2018

Codecov Report

Merging #582 into master will increase coverage by 0.35%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #582      +/-   ##
==========================================
+ Coverage   89.65%   90.01%   +0.35%     
==========================================
  Files          55       55              
  Lines        2591     2644      +53     
  Branches      332      341       +9     
==========================================
+ Hits         2323     2380      +57     
+ Misses        203      201       -2     
+ Partials       65       63       -2
Impacted Files Coverage Δ
anitya/config.py 100% <ø> (ø) ⬆️
anitya/lib/backends/github.py 100% <100%> (+19.04%) ⬆️
anitya/lib/exceptions.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 64e4bde...97bbef1. Read the comment docs.

Copy link
Member

@jeremycline jeremycline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few things that can be refactored here, but overall it's looking like a big improvement 🍰 ✨

@@ -4,7 +4,7 @@
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
config.vm.box = "fedora/27-cloud-base"
config.vm.box = "fedora/28-cloud-base"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's generally good practice to break this sort of stuff out into a separate commit. I don't see anyone wanting to revert this or anything, but it's a nice separation of concerns thing. Not a huge deal, don't feel like you need to fix it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will move it to separate commit.

req.add_header('User-Agent', user_agent)
req.add_header('From', from_email)
req.add_header('User-Agent', headers.user_agent)
req.add_header('From', headers.from_email)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

headers is a dictionary, right? I don't think this works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. This means, that there was not test for this, I will add it.

if url:
(owner, repo) = url.split('/')

if (not owner) or (not repo):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's exactly equivalent, but if not (owner and repo) might read a little cleaner.

Also, I know this is just a refactor of what was there before, but it makes more sense to validate this before it goes in the database, rather than right here. It'd tell the user up-front they've put in incorrect data. Might be a nice refactor for a future PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should be probably validated on frontend in javascript, but you also need to validate it before saving it to db (in case somebody is trying something nasty). This could be different for every backend, so it will be good to have validation method as part of backend and the db method could call it before inserting the data.

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds reasonable, yes. It can be called during the pre-commit event.

return versions

@classmethod
def prepare_query(cls, owner, repo, after=''):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason for this to be a class method. There's also no reason for the class inheritance structure in the backends generally, so while this matches the general style, it's not a good style. You can either make it a private instance method or just rip this out into a private module-level function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change it to private method. It doesn't need to be public.

def prepare_query(cls, owner, repo, after=''):
''' Method for preparing GraphQL query for specified repository

:arg owner: owner of the repository
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the "Google" style of comments more pleasant to write, you might as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the style, that was there before, but I will change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it was the old style and I was slowly converting it all as I touched things

return query

@classmethod
def call_url_post(cls, url, json, insecure=False, use_token=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just factor this into the get_versions call. I think it's reasonable for the token to be required, and we should never use insecure=True with GitHub so it boils down to a single call to another function.

The get_headers interface (while matching the existing style) is more complicated than just setting a default set of headers somewhere, then making a copy here and injecting the access token. I recommend doing that.

Copy link
Contributor Author

@Zlopez Zlopez Aug 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand the second paragraph in your comment. You want me to copy the get_headers method to this backend.
The get_headers method is new method in BaseBackend and it only fills basic header fields.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, what I'm saying is I don't like the use of inheritance in the backends. It's complicated and adds no value, but I never got a chance to refactor it. Rather than add to it, this could be implemented as:

In constants or BaseBackend's module or something:

REQUEST_HEADERS = <default_headers>

Then for your Github backend you just make a copy of REQUEST_HEADERS and add your auth token:

github_headers = REQUEST_HEADERS.copy()
github_headers['Authentication'] = ...

There's no need for a get_headers method that subclasses override. The whole call_url function has no business being a class method at all. If should just be a module-level function that acts as a wrapper that handles the special FTP case and otherwise is the requests API.

Copy link
Contributor Author

@Zlopez Zlopez Aug 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that the inheritance is intentional, otherwise BaseBackend doesn't have much sense. I thought it is used as some kind of interface, that should be implemented by every backend.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inheritance is fine to define the interface, but the only interface we're really interested in is for a given project, new versions are returned. Everything else is an implementation detail of the particular backend and doesn't need to be part of the class.

'%s: Server responded with status "%s": "%s"' % (
project.name, resp.status_code, resp.reason))

if 'errors' in json:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making a function that accepts the json and spits out the versions (or raises an exception). That'll make testing easier, and make this function a digestible length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will do this. It sounds good.

# cursor = json['data']['repository']['refs']['edges']['cursor']

remaining = json['data']['rateLimit']['remaining']
reset_time = json['data']['rateLimit']['resetAt']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should handle the rate limiting gracefully. If we're getting rate limited, it should raise an exception (RateLimitError or similar) that contains the time the rate limit is over so that the caller can handle it by rescheduling for later (or exploding)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like elegant solution to handle the rate limit.

@Zlopez Zlopez force-pushed the github_API branch 3 times, most recently from 2bad5d5 to d8cb24b Compare August 24, 2018 11:54
from anitya.lib.backends import BaseBackend, http_session, REQUEST_HEADERS
from anitya.lib.exceptions import AnityaPluginException, RateLimitException
from anitya.config import config
import logging
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically we've followed PEP 8, which includes rules for import ordering: the first group is stdlib imports, the second group is third-party libs, and the final group is imports from the current package. Each group is separated by a blank line.

Copy link
Member

@jeremycline jeremycline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small comments, but the only thing I think really needs to change is the decoding of FTP responses.


try:
headers = REQUEST_HEADERS.copy()
token = config.get('GITHUB_ACCESS_TOKEN')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you set a default for the token, you don't need to use get.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right

def __init__(self, reset_time):
self._reset_time = reset_time

@property
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend just calling self._reset_time self.reset_time and not making a property. Properties are handy if you need to do some sort of transformation or you want your internal representation to differ from the public API in some way, but here it's the same.


import unittest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the PEP 8 rules, unittest should stay up in the top group and six should move to this group

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know this

date: ['Thu, 23 Aug 2018 09:39:54 GMT']
keep-alive: ['timeout=5, max=100']
server: [Apache/2.2.22 (Fedora)]
x-clacks-overhead: [GNU Terry Pratchett]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised by this value when I fetched the URL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you're not familiar with Terry Pratchett's Discworld books, one of his books includes a lot of amusing references to the Free Software movement and this HTTP header is in reference to one of his books.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know his work. I read almost all his books from Discworld, but I was surprised to see the header field, I never saw it before.


_log.debug('Received %s tags for %s' % (total_count, project.name))

# TODO: Save cursor to project and use it to fetch only newer versions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend turning these TODOs into tickets and removing them from the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

try:
dir_listing = resp.text
except AttributeError:
dir_listing = resp.decode('utf-8')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just skimmed https://tools.ietf.org/html/rfc2640, but it sounds like there's no guarantees that the response will be UTF-8 encoded, and even if the spec said it MUST be, I'd be willing to bet there are some non-conforming servers out there. If it should only be ASCII/UTF-8 then this is fine, but this should handle an Unicode decoding exception and raise the appropriate Anitya exception explaining that the FTP service is misbehaving.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add another try-except when trying to decode the binary string and if it's not UTF-8, I will just raise AnityaPluginException.
Maybe there is some way, how to get the encodings of the binary string and then use the appropriate decode method. This will be nice way how to do it.
I will check this.
It will be much easier, if Requests library could work with FTP.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a requests-ftp extension, although I'm not sure if it works with current requests. It'd be nice to drop the FTP support, but I'm not sure if there's still a number of projects only available over FTP. I think just bailing is a reasonable approach. The specification indicates it should only ever be ASCII or UTF-8, but because FTP is so old I have no doubt there are servers out there not following the latest RFC about that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at it, but it's only a template, not real library. It's only some basic functionality for future work.

url, e.reason))
except UnicodeDecodeError:
raise AnityaPluginException(
'FTP response is not unicode: %s' % url)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slightly more accurate error message would be 'FTP response cannot be decoded with UTF-8' since Unicode defines the code points and UTF-8 is a particular way to map Unicode code points to a binary representation, but this is also fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds better than current error message.

@@ -115,7 +115,6 @@ class RateLimitException(AnityaException):
def __init__(self, reset_time):
self._reset_time = reset_time

@property
def reset_time(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, what I meant was just delete this method and change the name from self._reset_time to self.reset_time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only want the time to be retrievable by the code which catches it.

Maybe bad habit from other object oriented languages.
I'm used to have getter and setter (when needed) for every attribute the object has.

But in this case it could be probably better to have direct access to reset_time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't really call it a bad habit (this is the only way to do things in Java), but it's not "Pythonic". Python uses _ it indicate a variable, method, function, etc is private and might change at any time, but doesn't enforce it. If the interface you'd like to present to the public differs from the one you'd like to manage internally, it's a perfect place for the setter/getter pattern with properties. Otherwise, just exposing the variable itself is fine.

@Zlopez
Copy link
Contributor Author

Zlopez commented Aug 28, 2018

Update with latest master git rebase master

Copy link
Member

@jeremycline jeremycline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✨ 🍰 ✨

:shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants