Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/custom xloader site url rebased #243

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
76 changes: 4 additions & 72 deletions .github/workflows/publish.yml
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ jobs:

validateVersion:
runs-on: ubuntu-latest
if: github.repository == 'ckan/ckanext-xloader'
steps:
- uses: actions/checkout@v4

Expand All @@ -52,79 +53,10 @@ jobs:
exit 1
fi

lint:
needs: validateVersion
if: github.repository == 'ckan/ckanext-xloader'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- uses: actions/setup-python@v5
with:
python-version: '3.10'

- name: Install requirements
run: pip install flake8 pycodestyle

- name: Check syntax
run: flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics --extend-exclude ckan

test:
needs: lint
strategy:
matrix:
include: #ckan-image see https://github.com/ckan/ckan-docker-base, ckan-version controls other image tags
- ckan-version: "2.11"
ckan-image: "2.11-py3.10"
- ckan-version: "2.10"
ckan-image: "2.10-py3.10"
#- ckan-version: "master" Publish does not care about master
# ckan-image: "master"
fail-fast: false

name: CKAN ${{ matrix.ckan-version }}
runs-on: ubuntu-latest
container:
image: ckan/ckan-dev:${{ matrix.ckan-image }}
options: --user root
services:
solr:
image: ckan/ckan-solr:${{ matrix.ckan-version }}-solr9
postgres:
image: ckan/ckan-postgres-dev:${{ matrix.ckan-version }}
env:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: postgres
ports:
- 5432:5432
options: --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5
redis:
image: redis:3
env:
CKAN_SQLALCHEMY_URL: postgresql://ckan_default:pass@postgres/ckan_test
CKAN_DATASTORE_WRITE_URL: postgresql://datastore_write:pass@postgres/datastore_test
CKAN_DATASTORE_READ_URL: postgresql://datastore_read:pass@postgres/datastore_test
CKAN_SOLR_URL: http://solr:8983/solr/ckan
CKAN_REDIS_URL: redis://redis:6379/1

steps:
- uses: actions/checkout@v4
- if: ${{ matrix.ckan-version == 2.9 }}
run: pip install "setuptools>=44.1.0,<71"
- name: Install requirements
run: |
pip install -r requirements.txt
pip install -r dev-requirements.txt
pip install -e .
pip install -U requests[security]
# Replace default path to CKAN core config file with the one on the container
sed -i -e 's/use = config:.*/use = config:\/srv\/app\/src\/ckan\/test-core.ini/' test.ini
- name: Setup extension (CKAN >= 2.9)
run: |
ckan -c test.ini db init
- name: Run tests
run: pytest --ckan-ini=test.ini --cov=ckanext.xloader --disable-warnings ckanext/xloader/tests
needs: validateVersion
name: Test
uses: ./.github/workflows/test.yml # Call the reusable workflow
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use test.yml so we don't need to duplicate how we test again.


publishSkipped:
if: github.repository != 'ckan/ckanext-xloader'
Expand Down
19 changes: 9 additions & 10 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ jobs:
experimental: true # master is unstable, good to know if we are compatible or not
fail-fast: false

name: CKAN ${{ matrix.ckan-version }}
name: ${{ matrix.experimental && '**Fail_Ignored** ' || '' }} CKAN ${{ matrix.ckan-version }}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate from another repo to give info that its 'ignoring' errors.

runs-on: ubuntu-latest
container:
image: ckan/ckan-dev:${{ matrix.ckan-image }}
Expand Down Expand Up @@ -65,12 +65,7 @@ jobs:
- uses: actions/checkout@v4
continue-on-error: ${{ matrix.experimental }}

- name: Pin setuptools for ckan 2.9 only
if: ${{ matrix.ckan-version == 2.9 }}
run: pip install "setuptools>=44.1.0,<71"
continue-on-error: ${{ matrix.experimental }}

- name: Install requirements
- name: ${{ matrix.experimental && '**Fail_Ignored** ' || '' }} Install requirements
continue-on-error: ${{ matrix.experimental }}
run: |
pip install -r requirements.txt
Expand All @@ -80,16 +75,20 @@ jobs:
# Replace default path to CKAN core config file with the one on the container
sed -i -e 's/use = config:.*/use = config:\/srv\/app\/src\/ckan\/test-core.ini/' test.ini

- name: Setup extension
- name: ${{ matrix.experimental && '**Fail_Ignored** ' || '' }} Setup extension
continue-on-error: ${{ matrix.experimental }}
run: |
ckan -c test.ini db init
ckan -c test.ini user add ckan_admin email=ckan_admin@localhost password="AbCdEf12345!@#%"
ckan -c test.ini sysadmin add ckan_admin
ckan config-tool test.ini "ckanext.xloader.api_token=$(ckan -c test.ini user token add ckan_admin xloader | tail -n 1 | tr -d '\t')"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configure proper xloader api token for test based on what would be going into the docker container.

ckan -c test.ini user list

- name: Run tests
- name: ${{ matrix.experimental && '**Fail_Ignored** ' || '' }} Run tests
continue-on-error: ${{ matrix.experimental }}
run: pytest --ckan-ini=test.ini --cov=ckanext.xloader --disable-warnings ckanext/xloader/tests --junit-xml=/tmp/artifacts/junit/results.xml

- name: Test Summary
- name: ${{ matrix.experimental && '**Fail_Ignored** ' || '' }} Test Summary
uses: test-summary/action@v2
continue-on-error: ${{ matrix.experimental }}
with:
Expand Down
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,7 @@ To install XLoader:
execute jobs against the server:

ckanext.xloader.api_token = <your-CKAN-generated-API-Token>
ckan config-tool test.ini "ckanext.xloader.api_token=$(ckan -c test.ini user token add ckan_admin xloader | tail -n 1 | tr -d '\t')"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: readme update on new ckan config options for custom xloader site url


6. If it is a production server, you'll want to store jobs info in a
more robust database than the default sqlite file. It can happily
Expand Down Expand Up @@ -220,8 +221,7 @@ ckanext.xloader.api_token = eyJ0eXAiOiJKV1QiLCJh.eyJqdGkiOiJ0M2VNUFlQWFg0VU.8QgV

Default value: none

Uses a specific API token for the xloader_submit action instead of the
apikey of the site_user. It's mandatory starting from CKAN 2.10. You can get one
It's mandatory starting from CKAN 2.10. You can get one
running the command `ckan user token add {USER_NAME} xloader -q`


Expand Down
3 changes: 2 additions & 1 deletion ckanext/xloader/action.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ def xloader_submit(context, data_dict):
:rtype: bool
'''
p.toolkit.check_access('xloader_submit', context, data_dict)
api_key = utils.get_xloader_user_apitoken()
custom_queue = data_dict.pop('queue', rq_jobs.DEFAULT_QUEUE_NAME)
schema = context.get('schema', ckanext.xloader.schema.xloader_submit_schema())
data_dict, errors = _validate(data_dict, schema, context)
Expand Down Expand Up @@ -147,7 +148,7 @@ def xloader_submit(context, data_dict):
qualified=True
)
data = {
'api_key': utils.get_xloader_user_apitoken(),
'api_key': api_key,
'job_type': 'xloader_to_datastore',
'result_url': callback_url,
'metadata': {
Expand Down
4 changes: 2 additions & 2 deletions ckanext/xloader/command.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
import ckan.plugins.toolkit as tk

from ckanext.xloader.jobs import xloader_data_into_datastore_
from ckanext.xloader.utils import XLoaderFormats
from ckanext.xloader.utils import XLoaderFormats, get_xloader_user_apitoken


class XloaderCmd:
Expand Down Expand Up @@ -117,7 +117,7 @@ def _submit_resource(self, resource, user, indent=0, sync=False, queue=None):
data_dict['ckan_url'] = tk.config.get('ckan.site_url')
input_dict = {
'metadata': data_dict,
'api_key': 'TODO'
'api_key': get_xloader_user_apitoken()
}
logger = logging.getLogger('ckanext.xloader.cli')
xloader_data_into_datastore_(input_dict, None, logger)
Expand Down
24 changes: 19 additions & 5 deletions ckanext/xloader/config_declaration.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,22 @@ version: 1
groups:
- annotation: ckanext-xloader settings
options:
- key: ckanext.xloader.site_url
example: http://ckan-dev:5000
default:
description: |
Provide an alternate site URL for the xloader_submit action.
This is useful, for example, when the site is running within a docker network.
Note: This setting will not alter path. i.e ckan.root_path
required: false
- key: ckanext.xloader.site_url_ignore_path_regex
example: "(/PathToS3HostOriginIWantToGoDirectTo|/anotherPath)"
default:
description: |
Provide the ability to ignore paths which can't be mapped to alternative site URL for resource access.
This is useful, for example, when the site is running within a docker network and the cdn front door has
Blob storage mapped to another path on the same domain.
required: false
- key: ckanext.xloader.jobs_db.uri
default: sqlite:////tmp/xloader_jobs.db
description: |
Expand All @@ -14,9 +30,9 @@ groups:
example: eyJ0eXAiOiJKV1QiLCJh.eyJqdGkiOiJ0M2VNUFlQWFg0VU.8QgV8em4RA
description: |
Uses a specific API token for the xloader_submit action instead of the
apikey of the site_user. Will be mandatory when dropping support for
CKAN 2.9.
required: false
apikey of the site_user.
default: 'NOT_SET'
required: true
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropped 2.9, set to mandatory and give it a default we can throw validation errors on. (i.e. stop the chicken and egg problem that you need a running ckan to create the api key to reference in the config)

- key: ckanext.xloader.formats
example: csv application/csv xls application/vnd.ms-excel
description: |
Expand Down Expand Up @@ -171,5 +187,3 @@ groups:
they will also display "complete", "active", "inactive", and "unknown".
type: bool
required: false


27 changes: 8 additions & 19 deletions ckanext/xloader/jobs.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,12 +23,10 @@

from . import db, loader
from .job_exceptions import JobError, HTTPError, DataTooBigError, FileCouldNotBeLoadedError
from .utils import datastore_resource_exists, set_resource_metadata
from .utils import datastore_resource_exists, set_resource_metadata, modify_input_url

try:
from ckan.lib.api_token import get_user_from_token
except ImportError:
get_user_from_token = None

from ckan.lib.api_token import get_user_from_token

log = logging.getLogger(__name__)

Expand Down Expand Up @@ -296,8 +294,9 @@ def _download_resource_data(resource, data, api_key, logger):
data['datastore_contains_all_records_of_source_file'] = False
which will be saved to the resource later on.
'''
# update base url (for possible local loopback)
url = modify_input_url(resource.get('url'))
# check scheme
url = resource.get('url')
url_parts = urlsplit(url)
scheme = url_parts.scheme
if scheme not in ('http', 'https', 'ftp'):
Expand Down Expand Up @@ -467,7 +466,7 @@ def callback_xloader_hook(result_url, api_key, job_dict):

try:
result = requests.post(
result_url,
modify_input_url(result_url), # modify with local config
data=json.dumps(job_dict, cls=DatetimeJsonEncoder),
verify=SSL_VERIFY,
headers=headers)
Expand Down Expand Up @@ -510,19 +509,9 @@ def update_resource(resource, patch_only=False):

def _get_user_from_key(api_key_or_token):
""" Gets the user using the API Token or API Key.

This method provides backwards compatibility for CKAN 2.9 that
supported both methods and previous CKAN versions supporting
only API Keys.
"""
user = None
if get_user_from_token:
user = get_user_from_token(api_key_or_token)
if not user:
user = model.Session.query(model.User).filter_by(
apikey=api_key_or_token
).first()
return user
return get_user_from_token(api_key_or_token)



def get_resource_and_dataset(resource_id, api_key):
Expand Down
16 changes: 1 addition & 15 deletions ckanext/xloader/plugin.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,13 +17,7 @@
except ImportError:
HAS_IPIPE_VALIDATION = False

try:
config_declarations = toolkit.blanket.config_declarations
except AttributeError:
# CKAN 2.9 does not have config_declarations.
# Remove when dropping support.
def config_declarations(cls):
return cls
config_declarations = toolkit.blanket.config_declarations

if toolkit.check_ckan_version(min_version='2.11'):
from ckanext.datastore.interfaces import IDataDictionaryForm
Expand Down Expand Up @@ -76,14 +70,6 @@ def configure(self, config_):
else:
self.ignore_hash = False

for config_option in ("ckan.site_url",):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ckan.site_url is mandatory in 2.10+ and defaults to 127.0.0.1:5000 if not set, we don't need to config validate this.

if not config_.get(config_option):
raise Exception(
"Config option `{0}` must be set to use ckanext-xloader.".format(
config_option
)
)

# IPipeValidation

def receive_validation_report(self, validation_report):
Expand Down
6 changes: 1 addition & 5 deletions ckanext/xloader/schema.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,7 @@
int_validator = get_validator('int_validator')
OneOf = get_validator('OneOf')
ignore_not_sysadmin = get_validator('ignore_not_sysadmin')

if p.toolkit.check_ckan_version('2.9'):
unicode_safe = get_validator('unicode_safe')
else:
unicode_safe = str
unicode_safe = get_validator('unicode_safe')


def xloader_submit_schema():
Expand Down
23 changes: 19 additions & 4 deletions ckanext/xloader/tests/test_action.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import pytest
from ckan.plugins import toolkit
try:
from unittest import mock
except ImportError:
Expand Down Expand Up @@ -117,10 +118,24 @@ def test_status(self):

assert status["status"] == "pending"

def test_xloader_user_api_token_defaults_to_site_user_apikey(self):
api_token = get_xloader_user_apitoken()
site_user = helpers.call_action("get_site_user")
assert api_token == site_user["apikey"]

def test_xloader_user_api_token_from_config(self):
sysadmin = factories.SysadminWithToken()
apikey = sysadmin["token"]
with mock.patch.dict(toolkit.config, {'ckanext.xloader.api_token': apikey}):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to mimic what would be in the config. I ran out of time to work out how to look up a key inside pytest or hope that the db setup at the start was not 'cleansed'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For dynamic token values this approach does make sense.

api_token = get_xloader_user_apitoken()
assert api_token == apikey

@pytest.mark.ckan_config("ckanext.xloader.api_token", "NOT_SET")
def test_xloader_user_api_token_from_config_should_throw_exceptio_when_not_set(self):

hasNotThrownException = True
try:
get_xloader_user_apitoken()
except Exception:
hasNotThrownException = False

assert not hasNotThrownException

@pytest.mark.ckan_config("ckanext.xloader.api_token", "random-api-token")
def test_xloader_user_api_token(self):
Expand Down
Loading
Loading