Add GDPR script #649
Conversation
Codecov Report
@@ Coverage Diff @@
## master #649 +/- ##
==========================================
+ Coverage 95.76% 95.81% +0.04%
==========================================
Files 56 57 +1
Lines 2835 2864 +29
Branches 387 391 +4
==========================================
+ Hits 2715 2744 +29
Misses 84 84
Partials 36 36
anitya/db/models.py
Outdated
@@ -948,6 +948,31 @@ def get_id(self):
        """
        return six.text_type(self.id)

    def __json__(self, detailed=False):
It looks like `detailed` does not do anything (and it is also undocumented).

I copied the method from another class and didn't notice this parameter. Will remove.
anitya/sar.py
Outdated
from anitya.config import config
from anitya import db

LOG = logging.getLogger('anitya')
In Python, naming a variable in all caps conventionally means that it is a constant. The logger isn't a constant, so I recommend naming it `_log` (since it is also meant to be private to this module).

Copied from the `anitya_cron.py` script. Will change.
anitya/sar.py
Outdated

    with open(FILENAME, 'w') as fd:
        LOG.info('Dump data to json {}'.format(users_list))
        json.dump(users_list, fd)
Note that the SAR data must be written to stdout as documented here:
https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/gdpr_sar.html#script

I thought the output went to the file named by the `sar_output_file` environment variable. If it should only print to stdout, it is much easier to test.
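One way the stdout requirement could look, as a sketch rather than the code that was merged. The helper name `dump_users` and the sample record are assumptions made for illustration.

```python
import json
import sys


def dump_users(users_list, stream=None):
    """Write the collected user records as JSON to a stream (stdout by default)."""
    if stream is None:
        # Looked up at call time so a test that captures stdout sees the output.
        stream = sys.stdout
    json.dump(users_list, stream)
    stream.write("\n")


if __name__ == "__main__":
    # Hypothetical record, not real data.
    dump_users([{"username": "user", "email": "user@example.com"}])
```

Because nothing is written to disk, a test only needs to capture stdout (or pass its own stream) and parse what was printed.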
anitya/tests/test_sar.py
Outdated
        sar.main()

        testdata = open(sar.FILENAME).read()
        print(testdata)
Stray print statement.

Will remove.
anitya/tests/test_sar.py
Outdated
@@ -0,0 +1,137 @@
# -*- coding: utf-8 -*-
#
# Copyright © 2014 Red Hat, Inc.
2018

Copied from another file and didn't notice. Will change.
anitya/sar.py
Outdated

    if SAR_EMAIL:
        _log.debug('Find users by e-mail {}'.format(SAR_EMAIL))
        users = users + session.query(db.User).filter(db.User.email == SAR_EMAIL).all()
You can write this as `db.User.query.filter(db.User.email == SAR_EMAIL).all()` or even `db.User.query.filter_by(email=SAR_EMAIL).all()`.

I didn't know you could write it like this. It is more readable; I will change it.
anitya/db/models.py
Outdated
@@ -954,6 +954,31 @@ def get_id(self):
        """
        return six.text_type(self.id)

    def __json__(self):
I know this is copied from another model, but I do not think the name of this method conveys what it does. It's not a "real" dunder method, nor is it actually returning JSON. `as_dict` or `to_dict` or something similar is clearer.

Yeah, you are right.
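A sketch of the rename, assuming a simplified stand-in for the model. The `User` class below is not the SQLAlchemy model from `anitya/db/models.py`, and its fields are chosen only for illustration.

```python
import json


class User:
    """Stand-in for the real model; only the fields needed for this sketch."""

    def __init__(self, id, username, email):
        self.id = id
        self.username = username
        self.email = email

    def to_dict(self):
        """Return a plain dict; turning it into JSON is the caller's job."""
        return {"id": str(self.id), "username": self.username, "email": self.email}


user = User(1, "user", "user@example.com")
# The method returns a dict, and serialization happens explicitly here.
serialized = json.dumps(user.to_dict(), sort_keys=True)
```

The name now says what comes back: a dictionary, which the caller may or may not serialize.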
anitya/tests/test_sar.py
Outdated

        out, err = self.capsys.readouterr()

        self.assertTrue('"username": "' + user.username + '"' in out)
Using a format string here would make it easier to read:
- self.assertTrue('"username": "' + user.username + '"' in out)
+ self.assertTrue('"username": "{}"'.format(user.username) in out)

I will change this.
@bowlofeggs does this look good to you now?
anitya/tests/test_sar.py
Outdated
        out, err = self.capsys.readouterr()
        print(out)

        self.assertTrue('' in out)
This will be `True` for any `str`:
>>> '' in 'hello'
True
I recommend asserting that the output is exactly the JSON string that is expected (would that be `[]`?)

Thanks for noticing. I will change this to work with JSON.
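The difference between the two checks can be sketched as follows. It assumes, as the reviewer guesses, that a run which finds no users prints an empty JSON list; the `out` value is a hypothetical captured output.

```python
import json

# Hypothetical captured stdout of a run that found no users.
out = "[]\n"

# Vacuous check: the empty string is a substring of every string,
# so this passes no matter what the script printed.
assert "" in out
assert "" in "hello"

# Stricter check: parse the output and compare it with the expected value.
parsed = json.loads(out)
assert parsed == []
```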
anitya/tests/test_sar.py
Outdated
        self.assertTrue('"username": "{}"'.format(user.username) in out)
        self.assertTrue('"email": "{}"'.format(user.email) in out)
        self.assertFalse('"username": "{}"'.format(user2.username) in out)
        self.assertFalse('"email": "{}"'.format(user2.email) in out)
Rather than spot-checking strings, why not load the output with `json.loads()` and assert that it is exactly the expected structure, including all keys?

I copied this from some other test, but you are right, it will be better to check against the expected JSON. I will change this.
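A sketch of such a test, under stated assumptions: `main` below is a stand-in for `sar.main()`, the record and its keys are hypothetical, and stdout is captured with `contextlib.redirect_stdout` from the standard library instead of the `capsys` fixture used in the PR.

```python
import contextlib
import io
import json
import unittest


def main():
    # Stand-in for sar.main(); prints one hypothetical record as JSON.
    print(json.dumps([{"username": "user", "email": "user@example.com"}]))


class SarOutputTests(unittest.TestCase):
    def test_exact_structure(self):
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            main()
        expected = [{"username": "user", "email": "user@example.com"}]
        # json.loads raises ValueError on invalid JSON, and assertEqual
        # fails on any missing, extra, or changed key.
        self.assertEqual(json.loads(buf.getvalue()), expected)
```

Comparing the parsed structure also covers the negative case: a second user leaking into the output makes the lists differ, so separate `assertFalse` checks are not needed.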
anitya/tests/test_sar.py
Outdated
        out, err = self.capsys.readouterr()

        self.assertTrue('"username": "{}"'.format(user.username) in out)
        self.assertTrue('"email": "{}"'.format(user.email) in out)
Similarly here, it would be good to assert that the output is valid JSON and has all the expected keys.
anitya/sar.py
Outdated
_log = logging.getLogger('anitya')

SAR_USERNAME = os.getenv('SAR_USERNAME')
SAR_EMAIL = os.getenv('SAR_EMAIL')
It would be easier to test this if these lines were part of `main`. Then you can use `mock` to set the environment variables and assert that these lines use `os.getenv()` correctly.

Yes, you are right. Thanks for noticing.
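A sketch of the idea, assuming a simplified `main` that only reads the two variables and returns them (the real function queries the database). Module-level constants are evaluated once at import time, whereas values read inside `main` are looked up on every call, so a test can control them.

```python
import os
from unittest import mock


def main():
    # Read at call time, not at import time, so tests can control the values.
    username = os.getenv("SAR_USERNAME")
    email = os.getenv("SAR_EMAIL")
    return username, email  # hypothetical return value, for illustration only


with mock.patch("os.getenv") as mock_getenv:
    # Look each requested name up in a small dict; unknown names give None.
    mock_getenv.side_effect = {"SAR_USERNAME": "user"}.get
    result = main()
    # Assert that main() asked for exactly the variables the SOP defines.
    mock_getenv.assert_any_call("SAR_USERNAME")
    mock_getenv.assert_any_call("SAR_EMAIL")
```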
@bowlofeggs I addressed your requested changes and squashed the commits.
Rebased onto the latest master.
anitya/tests/test_sar.py
Outdated
        }]

        with mock.patch('os.getenv') as mock_getenv:
            mock_getenv.side_effect = mock_getenv_email
Why not use `mock.patch.dict()` here instead? It's less code for you to maintain and is clearer for readers.

I didn't think about this. You are right, it will be clearer.
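A sketch of the `mock.patch.dict()` variant, again with a simplified stand-in for `main`. Patching `os.environ` directly removes the need for a hand-written `side_effect` helper such as `mock_getenv_email`.

```python
import os
from unittest import mock


def main():
    # Simplified stand-in for sar.main(): only reads the two variables.
    return os.getenv("SAR_USERNAME"), os.getenv("SAR_EMAIL")


# clear=True empties the environment inside the block, so a SAR_USERNAME
# set in the developer's shell cannot leak into the test. The original
# environment is restored when the block exits.
with mock.patch.dict(os.environ, {"SAR_EMAIL": "user@example.com"}, clear=True):
    result = main()
```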
anitya/tests/test_sar.py
Outdated
            }
        ]

        sar.SAR_USERNAME = 'user'
This line doesn't seem to be needed anymore.

I probably missed it.
I fixed the conflict and addressed the last suggestions.
Cool, let me know once tests pass.
Add a SAR (Subject Access Request) script for GDPR (General Data Protection Regulation)
that can be used to gather user data from the database.
The script currently takes data from the users and
social_auth_usersocialauth tables and saves it in JSON format.
Signed-off-by: Michal Konečný <[email protected]>
@bowlofeggs