-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CI for Python client #1096
base: main
Are you sure you want to change the base?
Add CI for Python client #1096
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. Will appreciate a python person to take a look. cc @HonahX
.github/workflows/python-client.yml
Outdated
runs-on: ubuntu-latest | ||
strategy: | ||
matrix: | ||
python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://github.com/apache/polaris/blob/main/regtests/client/python/pyproject.toml#L33, the min supported python version is 3.8. Also, with the command later on where we will be installing newest poetry, it will actually needed at least python 3.9 instead.
.github/workflows/python-client.yml
Outdated
- name: Install Poetry | ||
if: steps.cache-poetry.outputs.cache-hit != 'true' | ||
run: | | ||
curl -sSL https://install.python-poetry.org | python3 - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may cause issue as this will install poetry 2.x version on python >=3.9 while the repo is still on 1.8.x version. There is a PR from me for upgrading to 2.x version in case if we want to proceed with this: #898
But even with this PR, it is still not a good idea to run latest version as the above command can install newer version of poetry which is not being tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I think we should fix the poetry version here to be the one in regtest/requirements.txt
.github/workflows/python-client.yml
Outdated
# stop the build if there are Python syntax errors or undefined names | ||
poetry run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics | ||
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide | ||
poetry run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this line will have exit-zero which is a free-pass regardless what people check-in (also the existed code doesn't follow the these rules as well). Should we consider fix existed code first then ensure no other warnings/errors are shows? As if we go with current code, this is will still be ignore and it is always pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.github/workflows/python-client.yml
Outdated
|
||
- name: Test with pytest | ||
working-directory: regtests/client/python | ||
run: poetry run pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not every test used by the current test cases are on pytest and running this command will throw error (due to one of the recent change for profile setting which we needed script dir to be set). But do we really needed to run this as this is already getting done via https://github.com/apache/polaris/blob/main/.github/workflows/regtest.yml#L55
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we really needed to run this as this is already getting done via
I think it's still worth running the client tests here because they help quickly isolate errors specific to the client. Additionally, the regtest workflow doesn’t cover all supported Python versions, and it's beneficial to test across different versions since users may run the CLI in various environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K, in that case, we will need to set a variable for SCRIPT_DIR to be able to run this one. Also, some of the test cases may needed spark as well which is not getting install from poetry as of now. I had a PR to switch from local installed spark to python but that was removed due to too many changes. So we may want to introduce that change again or setup spark as how the current test setup is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It passes locally without specifing anything specific for Spark 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check a bit later tonight for this. It is possible as not all tests are pytest (in this case, the spark related ones).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the SCRIPT_DIR I was referring to:
(polaris-venv) ➜ python git:(main) ✗ poetry run pytest
The "poetry.dev-dependencies" section is deprecated and will be removed in a future version. Use "poetry.group.dev.dependencies" instead.
========================================================================== test session starts ===========================================================================
platform darwin -- Python 3.13.0, pytest-8.3.5, pluggy-1.5.0
rootdir: /Users/yong/Desktop/GitHome/polaris/regtests/client/python
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 208 items / 1 error
================================================================================= ERRORS =================================================================================
_______________________________________________________________ ERROR collecting test/test_cli_parsing.py ________________________________________________________________
test/test_cli_parsing.py:26: in <module>
from cli.command import Command
cli/command/__init__.py:22: in <module>
from cli.constants import Commands, Arguments
cli/constants.py:241: in <module>
raise Exception("The SCRIPT_DIR environment variable is not set. Please set it to the Polaris's script directory.")
E Exception: The SCRIPT_DIR environment variable is not set. Please set it to the Polaris's script directory.
You can get around it with following:
SCRIPT_DIR="" poetry run pytest
Also, quick checks shows we do have spark deps but you are not hitting it as you are on the wrong test dir. We should run test on regtests
this will include a lot more testings (this will fail as it will need polaris server to be up running and not just mock tests which is the dir you were hitting earlier). Then here is an example script path where we are invoking spark: ./t_pyspark/src/iceberg_spark.py
uses: actions/cache@v4 | ||
with: | ||
path: ~/.cache/pypoetry | ||
key: ${{ runner.os }}-poetry-${{ hashFiles('regtests/client/python/poetry.lock') }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it was nicely noticed in the related PR here #1102 (comment)
This line will cause failure since poetry.lock
is in .gitignore
I'd duplicate my question here as well
If we had poetry.lock
we could even use this config cache: 'poetry' of actions/setup-python@v5 GitHub action (here's this docs section).
Without having poetry.lock
we can only cache based on pyptoject.toml
which doesn't cover transitive dependencies.
I've also thought about generating poetry.lock
on CI but for this we need to have poetry first and when we install poetry, it installs dependencies as well - so, it wouldn't work
What about removing poetry.lock
from .gitignore
and creating it and managing for the future changes? 🤔
Or is it good enough to create a hash key based on myproject.toml
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment for this in the other PR.
#1058