Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GDPR script #649

Merged
merged 1 commit into from
Nov 29, 2018
Merged

Add GDPR script #649

merged 1 commit into from
Nov 29, 2018

Conversation

Zlopez
Copy link
Contributor

@Zlopez Zlopez commented Oct 29, 2018

Add SAR (Subject Access Requests) GDPR (General Data Protection Regulation)
script that could be used to gather user data from database.
This script currently takes data from users and
social_auth_usersocialauth table and saves them in json format.

Signed-off-by: Michal Konečný [email protected]

@codecov-io
Copy link

codecov-io commented Oct 29, 2018

Codecov Report

Merging #649 into master will increase coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #649      +/-   ##
==========================================
+ Coverage   95.76%   95.81%   +0.04%     
==========================================
  Files          56       57       +1     
  Lines        2835     2864      +29     
  Branches      387      391       +4     
==========================================
+ Hits         2715     2744      +29     
  Misses         84       84              
  Partials       36       36
Impacted Files Coverage Δ
anitya/db/models.py 100% <100%> (ø) ⬆️
anitya/sar.py 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f781f2...456af31. Read the comment docs.

@@ -948,6 +948,31 @@ def get_id(self):
"""
return six.text_type(self.id)

def __json__(self, detailed=False):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like detailed does not do anything (and it is also undocumented).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied the method from other class and didn't noticed this parameter. Will remove.

anitya/sar.py Outdated
from anitya.config import config
from anitya import db

LOG = logging.getLogger('anitya')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python, naming a variable in all caps conventionally means that it is a constant. The logger isn't a constant, so I recommend naming it _log (since it is also meant to be private to this module).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from anitya_cron.py script. Will change.

anitya/sar.py Outdated

with open(FILENAME, 'w') as fd:
LOG.info('Dump data to json {}'.format(users_list))
json.dump(users_list, fd)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the SAR data must be written to stdout as documented here:

https://fedora-infra-docs.readthedocs.io/en/latest/sysadmin-guide/sops/gdpr_sar.html#script

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the output is sar_output_file environment variable. If it should only print on stdout it is much easier to test.

sar.main()

testdata = open(sar.FILENAME).read()
print(testdata)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray print statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove.

@@ -0,0 +1,137 @@
# -*- coding: utf-8 -*-
#
# Copyright © 2014 Red Hat, Inc.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2018

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from another file. Didn't noticed. Will change.

anitya/sar.py Outdated

if SAR_EMAIL:
_log.debug('Find users by e-mail {}'.format(SAR_EMAIL))
users = users + session.query(db.User).filter(db.User.email == SAR_EMAIL).all()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write this as db.User.query..filter(db.User.email == SAR_EMAIL).all() or even db.User.query.filter_by(email=SAR_EMAIL).all().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know you can write it like this. It is more readable. I will change it.

@@ -954,6 +954,31 @@ def get_id(self):
"""
return six.text_type(self.id)

def __json__(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is copied from other model, but I do not think the name of this method conveys what it does. It's not a "real" dunder method, nor is it actually returning JSON. as_dict or to_dict or something is clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right.


out, err = self.capsys.readouterr()

self.assertTrue('"username": "' + user.username + '"' in out)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a format string here would make it easier to read:

Suggested change
self.assertTrue('"username": "' + user.username + '"' in out)
self.assertTrue('"username": "{}"'.format(user.username) in out)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change this.

@Zlopez
Copy link
Contributor Author

Zlopez commented Nov 22, 2018

@bowlofeggs is this looking good to you now?

out, err = self.capsys.readouterr()
print(out)

self.assertTrue('' in out)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be True for any str:

>>> '' in 'hello'
True

I recommend asserting that the output is exactly the JSON string that is expected (would that be []?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing. I will change this to work with json.

self.assertTrue('"username": "{}"'.format(user.username) in out)
self.assertTrue('"email": "{}"'.format(user.email) in out)
self.assertFalse('"username": "{}"'.format(user2.username) in out)
self.assertFalse('"email": "{}"'.format(user2.email) in out)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than spot checking strings, why not load the output with json.loads() and asserting that it is exactly the expected structure, including all keys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied this from some other test, but you are right, it will be better to check against expected JSON. I will change this.

out, err = self.capsys.readouterr()

self.assertTrue('"username": "{}"'.format(user.username) in out)
self.assertTrue('"email": "{}"'.format(user.email) in out)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here, it would be good to assert that the output is valid JSON and has all the expected keys.

anitya/sar.py Outdated
_log = logging.getLogger('anitya')

SAR_USERNAME = os.getenv('SAR_USERNAME')
SAR_EMAIL = os.getenv('SAR_EMAIL')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easier to test this if these lines were part of main. Then you can use mock to set the environment variables and assert that these lines use os.getenv() correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. Thanks for noticing.

@Zlopez
Copy link
Contributor Author

Zlopez commented Nov 27, 2018

@bowlofeggs I addressed your changes and did a squash.

@Zlopez
Copy link
Contributor Author

Zlopez commented Nov 27, 2018

Rebased with latest master

bowlofeggs
bowlofeggs previously approved these changes Nov 27, 2018
}]

with mock.patch('os.getenv') as mock_getenv:
mock_getenv.side_effect = mock_getenv_email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use mock.patch.dict() here instead? It's less code for you to maintain and is clearer for readers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't thought about this. You are right, it will be clearer.

}
]

sar.SAR_USERNAME = 'user'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line doesn't seem needed anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably missed it.

@Zlopez
Copy link
Contributor Author

Zlopez commented Nov 28, 2018

I fixed the conflict and addressed the last suggestions.
There are still failing tests, but these will be fixed by #667

@bowlofeggs
Copy link

Cool, let me know once tests pass.

Add SAR (Subject Access Requests) GDPR (General Data Protection Regulation)
script that could be used to gather user data from database.
This script currently takes data from users and
social_auth_usersocialauth table and saves them in json format.

Signed-off-by: Michal Konečný <[email protected]>
@Zlopez
Copy link
Contributor Author

Zlopez commented Nov 28, 2018

@bowlofeggs
Tests are working again. \o/

@mergify mergify bot merged commit ef735f7 into fedora-infra:master Nov 29, 2018
@Zlopez Zlopez deleted the gdpr branch February 22, 2019 08:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants