Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CI for Python client #1096

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

DaniilRoman
Copy link

Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Will appreciate a python person to take a look. cc @HonahX

runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
Copy link
Contributor

@MonkeyCanCode MonkeyCanCode Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://github.com/apache/polaris/blob/main/regtests/client/python/pyproject.toml#L33, the min supported python version is 3.8. Also, with the command later on where we will be installing newest poetry, it will actually needed at least python 3.9 instead.

- name: Install Poetry
if: steps.cache-poetry.outputs.cache-hit != 'true'
run: |
curl -sSL https://install.python-poetry.org | python3 -
Copy link
Contributor

@MonkeyCanCode MonkeyCanCode Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may cause issue as this will install poetry 2.x version on python >=3.9 while the repo is still on 1.8.x version. There is a PR from me for upgrading to 2.x version in case if we want to proceed with this: #898

But even with this PR, it is still not a good idea to run latest version as the above command can install newer version of poetry which is not being tested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I think we should fix the poetry version here to be the one in regtest/requirements.txt

# stop the build if there are Python syntax errors or undefined names
poetry run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
poetry run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this line will have exit-zero which is a free-pass regardless what people check-in (also the existed code doesn't follow the these rules as well). Should we consider fix existed code first then ensure no other warnings/errors are shows? As if we go with current code, this is will still be ignore and it is always pass.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit, I took it from regtests/client/python/.github/workflows/python.yml assuming that it was verified before

There are quite a few linter complaints, that's true
Screenshot 2025-03-04 at 21 42 14

But at the same time, we can start with smth small like this command
Screenshot 2025-03-04 at 21 43 37


- name: Test with pytest
working-directory: regtests/client/python
run: poetry run pytest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not every test used by the current test cases are on pytest and running this command will throw error (due to one of the recent change for profile setting which we needed script dir to be set). But do we really needed to run this as this is already getting done via https://github.com/apache/polaris/blob/main/.github/workflows/regtest.yml#L55

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But do we really needed to run this as this is already getting done via

I think it's still worth running the client tests here because they help quickly isolate errors specific to the client. Additionally, the regtest workflow doesn’t cover all supported Python versions, and it's beneficial to test across different versions since users may run the CLI in various environments.

Copy link
Contributor

@MonkeyCanCode MonkeyCanCode Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K, in that case, we will need to set a variable for SCRIPT_DIR to be able to run this one. Also, some of the test cases may needed spark as well which is not getting install from poetry as of now. I had a PR to switch from local installed spark to python but that was removed due to too many changes. So we may want to introduce that change again or setup spark as how the current test setup is doing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It passes locally without specifing anything specific for Spark 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check a bit later tonight for this. It is possible as not all tests are pytest (in this case, the spark related ones).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the SCRIPT_DIR I was referring to:

(polaris-venv) ➜  python git:(main) ✗ poetry run pytest
The "poetry.dev-dependencies" section is deprecated and will be removed in a future version. Use "poetry.group.dev.dependencies" instead.
========================================================================== test session starts ===========================================================================
platform darwin -- Python 3.13.0, pytest-8.3.5, pluggy-1.5.0
rootdir: /Users/yong/Desktop/GitHome/polaris/regtests/client/python
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 208 items / 1 error

================================================================================= ERRORS =================================================================================
_______________________________________________________________ ERROR collecting test/test_cli_parsing.py ________________________________________________________________
test/test_cli_parsing.py:26: in <module>
    from cli.command import Command
cli/command/__init__.py:22: in <module>
    from cli.constants import Commands, Arguments
cli/constants.py:241: in <module>
    raise Exception("The SCRIPT_DIR environment variable is not set. Please set it to the Polaris's script directory.")
E   Exception: The SCRIPT_DIR environment variable is not set. Please set it to the Polaris's script directory.

You can get around it with following:

 SCRIPT_DIR="" poetry run pytest

Also, quick checks shows we do have spark deps but you are not hitting it as you are on the wrong test dir. We should run test on regtests this will include a lot more testings (this will fail as it will need polaris server to be up running and not just mock tests which is the dir you were hitting earlier). Then here is an example script path where we are invoking spark: ./t_pyspark/src/iceberg_spark.py

MonkeyCanCode added a commit to MonkeyCanCode/polaris that referenced this pull request Mar 4, 2025
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry
key: ${{ runner.os }}-poetry-${{ hashFiles('regtests/client/python/poetry.lock') }}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it was nicely noticed in the related PR here #1102 (comment)
This line will cause failure since poetry.lock is in .gitignore

I'd duplicate my question here as well
If we had poetry.lock we could even use this config cache: 'poetry' of actions/setup-python@v5 GitHub action (here's this docs section).

Without having poetry.lock we can only cache based on pyptoject.toml which doesn't cover transitive dependencies.
I've also thought about generating poetry.lock on CI but for this we need to have poetry first and when we install poetry, it installs dependencies as well - so, it wouldn't work

What about removing poetry.lock from .gitignore and creating it and managing for the future changes? 🤔
Or is it good enough to create a hash key based on myproject.toml instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment for this in the other PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants