This repository was archived by the owner on Feb 22, 2022. It is now read-only.

[stable/airflow] airflow.extraPipPackages that require gcc #22677

Closed
mbaroody opened this issue Jun 4, 2020 · 7 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@mbaroody

mbaroody commented Jun 4, 2020

Describe the bug
In stable/airflow, if airflow.extraPipPackages lists a pip package that needs gcc to install, the chart installation fails. I realize this is a shortcoming of the official Airflow image, which removes gcc to stay "slim". Ideally we would change the default image to one that includes gcc, but I haven't been able to find an official Airflow image that has gcc installed.

Version of Helm and Kubernetes:

~ kubectl version

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.8", GitCommit:"ec6eb119b81be488b030e849b9e64fda4caaf33c", GitTreeState:"clean", BuildDate:"2020-03-12T21:00:06Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.8", GitCommit:"ec6eb119b81be488b030e849b9e64fda4caaf33c", GitTreeState:"clean", BuildDate:"2020-03-12T20:52:22Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
~ helm version

Client: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}

Which chart:
stable/airflow

What happened:
Logs of $AIRFLOW_WEB when trying to install the chart with the value airflow.extraPipPackages: [ "pyldap" ]:

unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1

What you expected to happen:
The chart installs successfully.

How to reproduce it (as minimally and precisely as possible):

helm install --set "airflow.extraPipPackages[0]=pyldap" stable/airflow

Anything else we need to know:

mbaroody changed the title from "stable/airflow airflow.extraPipPackages that require gcc" to "[stable/airflow] airflow.extraPipPackages that require gcc" on Jun 4, 2020
@alexbegg
Contributor

alexbegg commented Jun 5, 2020

I have this issue too, with one of the dependencies I am trying to install with installRequirements set to true (it is similar because both methods do a pip install).

While I don't yet have a solution, I do know that there is a discussion about adding a way to install additional dependencies in the official apache/airflow image: apache/airflow#8872

@mbaroody
Author

mbaroody commented Jun 5, 2020

Fairly trivial to build manually with:

./breeze build-image \
  --production-image \
  --extras "pyldap" \
  --install-airflow-version 1.10.10 \
  --python 3.6

I suppose. Too bad you can't add a hook in Helm to build the image for certain .Values.

@thesuperzapper
Contributor

Option 1: (probably won't work)
While it is not perfect, we could consider using conda, which often has prebuilt binaries in its repositories. However, this would also require an addition to the image (unless we can assume that the Pods have internet access at start time, in which case we could download conda).

However, I am not inclined to do this, as the main benefit of using the official images is the prebuilt Airflow, which is only really distributed in those images.

Option 2: (Good for non-prod)
Potentially, we could also specify a list of apt-get packages to install at Pod start. However, you have to realise that downloading and compiling a package may dramatically increase the start time of the pods.

Option 3: (recommended)
The best bet for production use is to take the official Docker image, and make your own custom one with something like:

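FROM apache/airflow:1.10.10-python3.6

# apt-get needs root; the official image runs as the non-root "airflow" user
USER root

# install binary dependencies (XXXX and YYYY are placeholders)
RUN apt-get update \
 && apt-get -y install \
    XXXX \
    YYYY \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

# switch back to the unprivileged user before installing Python packages
USER airflow

# install python dependencies (XXXXX is a placeholder)
RUN pip install XXXXX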

@mbaroody
Author

doing something very similar to option 3, @thesuperzapper.

@alexbegg
Contributor

I am in favor of option 3 too, and it is what the apache/airflow docker maintainers suggest. Basically if you need to add dependencies, build your own image and then use that image for the helm chart instead.

I also take back my last comment: I linked to a discussion about adding dependencies at build time, but Helm charts run already-built images, so that won't be of use to us.
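
As a rough sketch of the "build your own image, then use it in the chart" approach, the script below writes a values override and then builds, pushes, and installs. The registry path, tag, and file name are hypothetical, and the airflow.image.repository / airflow.image.tag keys are an assumption about the chart's values that should be checked against its values.yaml. By default the script only prints the docker and helm commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
set -eu

# Hypothetical registry path and tag; replace with your own.
IMAGE_REPO="registry.example.com/team/airflow"
IMAGE_TAG="1.10.10-python3.6-custom"
VALUES_FILE="custom-image-values.yaml"

# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Values override that points the chart at the custom image.
# ASSUMPTION: the chart reads airflow.image.repository and airflow.image.tag.
cat > "$VALUES_FILE" <<EOF
airflow:
  image:
    repository: ${IMAGE_REPO}
    tag: ${IMAGE_TAG}
EOF

# Build from the Dockerfile in the current directory, push, then install.
run docker build -t "${IMAGE_REPO}:${IMAGE_TAG}" .
run docker push "${IMAGE_REPO}:${IMAGE_TAG}"
run helm install -f "$VALUES_FILE" stable/airflow
```

With the packages baked into the image, airflow.extraPipPackages can stay empty for them, so nothing needs to compile at Pod start.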

@stale

stale bot commented Jul 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale bot added the lifecycle/stale label on Jul 11, 2020
@stale

stale bot commented Jul 25, 2020

This issue is being automatically closed due to inactivity.

stale bot closed this as completed on Jul 25, 2020