diff --git a/docs/apache-airflow/installation.rst b/docs/apache-airflow/installation.rst index eac6894a6f2d4..0184216435802 100644 --- a/docs/apache-airflow/installation.rst +++ b/docs/apache-airflow/installation.rst @@ -27,7 +27,7 @@ installation with other tools as well. .. note:: - Airflow is also distributed as a Docker image (OCI Image). For more information, see: :ref:`docker_image` + Airflow is also distributed as a Docker image (OCI Image). Consider using it to guarantee that software will always run the same no matter where it is deployed. For more information, see: :doc:`docker-stack:index`. Prerequisites ''''''''''''' diff --git a/docs/apache-airflow/production-deployment.rst b/docs/apache-airflow/production-deployment.rst index 0f4dfaa1233f5..ecc6077d81ea5 100644 --- a/docs/apache-airflow/production-deployment.rst +++ b/docs/apache-airflow/production-deployment.rst @@ -118,852 +118,7 @@ To mitigate these issues, make sure you have a :doc:`health check `_ is a bare image -that has a few external dependencies and extras installed.. - -The Apache Airflow image provided as convenience package is optimized for size, so -it provides just a bare minimal set of the extras and dependencies installed and in most cases -you want to either extend or customize the image. You can see all possible extras in -:doc:`extra-packages-ref`. The set of extras used in Airflow Production image are available in the -`Dockerfile `_. - -The production images are build in DockerHub from released version and release candidates. There -are also images published from branches but they are used mainly for development and testing purpose. -See `Airflow Git Branching `_ -for details. - - -Customizing or extending the Production Image ---------------------------------------------- - -Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the -way we do, you might want to know very quickly how you can extend or customize the existing image -for Apache Airflow. This chapter gives you a short answer to those questions. - -Airflow Summit 2020's `Production Docker Image `_ talk provides more -details about the context, architecture and customization/extension methods for the Production Image. - -Extending the image -................... - -Extending the image is easiest if you just need to add some dependencies that do not require -compiling. The compilation framework of Linux (so called ``build-essential``) is pretty big, and -for the production images, size is really important factor to optimize for, so our Production Image -does not contain ``build-essential``. If you need compiler like gcc or g++ or make/cmake etc. - those -are not found in the image and it is recommended that you follow the "customize" route instead. - -How to extend the image - it is something you are most likely familiar with - simply -build a new image using Dockerfile's ``FROM`` directive and add whatever you need. Then you can add your -Debian dependencies with ``apt`` or PyPI dependencies with ``pip install`` or any other stuff you need. - -You should be aware, about a few things: - -* The production image of airflow uses "airflow" user, so if you want to add some of the tools - as ``root`` user, you need to switch to it with ``USER`` directive of the Dockerfile. Also you - should remember about following the - `best practises of Dockerfiles `_ - to make sure your image is lean and small. - -.. 
code-block:: dockerfile - - FROM apache/airflow:2.0.1 - USER root - RUN apt-get update \ - && apt-get install -y --no-install-recommends \ - my-awesome-apt-dependency-to-add \ - && apt-get autoremove -yqq --purge \ - && apt-get clean \ - && rm -rf /var/lib/apt/lists/* - USER airflow - - -* PyPI dependencies in Apache Airflow are installed in the user library, of the "airflow" user, so - you need to install them with the ``--user`` flag and WITHOUT switching to airflow user. Note also - that using --no-cache-dir is a good idea that can help to make your image smaller. - -.. code-block:: dockerfile - - FROM apache/airflow:2.0.1 - RUN pip install --no-cache-dir --user my-awesome-pip-dependency-to-add - -* As of 2.0.1 image the ``--user`` flag is turned on by default by setting ``PIP_USER`` environment variable - to ``true``. This can be disabled by un-setting the variable or by setting it to ``false``. - - -* If your apt, or PyPI dependencies require some of the build-essentials, then your best choice is - to follow the "Customize the image" route. However it requires to checkout sources of Apache Airflow, - so you might still want to choose to add build essentials to your image, even if your image will - be significantly bigger. - -.. code-block:: dockerfile - - FROM apache/airflow:2.0.1 - USER root - RUN apt-get update \ - && apt-get install -y --no-install-recommends \ - build-essential my-awesome-apt-dependency-to-add \ - && apt-get autoremove -yqq --purge \ - && apt-get clean \ - && rm -rf /var/lib/apt/lists/* - USER airflow - RUN pip install --no-cache-dir --user my-awesome-pip-dependency-to-add - - -* You can also embed your dags in the image by simply adding them with COPY directive of Airflow. - The DAGs in production image are in /opt/airflow/dags folder. - -Customizing the image -..................... - -Customizing the image is an alternative way of adding your own dependencies to the image - better -suited to prepare optimized production images. - -The advantage of this method is that it produces optimized image even if you need some compile-time -dependencies that are not needed in the final image. You need to use Airflow Sources to build such images -from the `official distribution folder of Apache Airflow `_ for the -released versions, or checked out from the GitHub project if you happen to do it from git sources. - -The easiest way to build the image is to use ``breeze`` script, but you can also build such customized -image by running appropriately crafted docker build in which you specify all the ``build-args`` -that you need to add to customize it. You can read about all the args and ways you can build the image -in the `<#production-image-build-arguments>`_ chapter below. - -Here just a few examples are presented which should give you general understanding of what you can customize. - -This builds the production image in version 3.7 with additional airflow extras from 2.0.1 PyPI package and -additional apt dev and runtime dependencies. - -.. code-block:: bash - - docker build . 
\ - --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \ - --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \ - --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \ - --build-arg AIRFLOW_VERSION="2.0.1" \ - --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \ - --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \ - --build-arg AIRFLOW_SOURCES_FROM="empty" \ - --build-arg AIRFLOW_SOURCES_TO="/empty" \ - --build-arg ADDITIONAL_AIRFLOW_EXTRAS="jdbc" \ - --build-arg ADDITIONAL_PYTHON_DEPS="pandas" \ - --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++" \ - --build-arg ADDITIONAL_RUNTIME_APT_DEPS="default-jre-headless" \ - --tag my-image - - -the same image can be built using ``breeze`` (it supports auto-completion of the options): - -.. code-block:: bash - - ./breeze build-image \ - --production-image --python 3.7 --install-airflow-version=2.0.1 \ - --additional-extras=jdbc --additional-python-deps="pandas" \ - --additional-dev-apt-deps="gcc g++" --additional-runtime-apt-deps="default-jre-headless" - - -You can customize more aspects of the image - such as additional commands executed before apt dependencies -are installed, or adding extra sources to install your dependencies from. You can see all the arguments -described below but here is an example of rather complex command to customize the image -based on example in `this comment `_: - -.. code-block:: bash - - docker build . -f Dockerfile \ - --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \ - --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \ - --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \ - --build-arg AIRFLOW_VERSION="2.0.1" \ - --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \ - --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \ - --build-arg AIRFLOW_SOURCES_FROM="empty" \ - --build-arg AIRFLOW_SOURCES_TO="/empty" \ - --build-arg ADDITIONAL_AIRFLOW_EXTRAS="slack" \ - --build-arg ADDITIONAL_PYTHON_DEPS="apache-airflow-backport-providers-odbc \ - apache-airflow-backport-providers-odbc \ - azure-storage-blob \ - sshtunnel \ - google-api-python-client \ - oauth2client \ - beautifulsoup4 \ - dateparser \ - rocketchat_API \ - typeform" \ - --build-arg ADDITIONAL_DEV_APT_DEPS="msodbcsql17 unixodbc-dev g++" \ - --build-arg ADDITIONAL_DEV_APT_COMMAND="curl https://packages.microsoft.com/keys/microsoft.asc | \ - apt-key add --no-tty - && \ - curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list" \ - --build-arg ADDITIONAL_DEV_ENV_VARS="ACCEPT_EULA=Y" \ - --build-arg ADDITIONAL_RUNTIME_APT_COMMAND="curl https://packages.microsoft.com/keys/microsoft.asc | \ - apt-key add --no-tty - && \ - curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list" \ - --build-arg ADDITIONAL_RUNTIME_APT_DEPS="msodbcsql17 unixodbc git procps vim" \ - --build-arg ADDITIONAL_RUNTIME_ENV_VARS="ACCEPT_EULA=Y" \ - --tag my-image - -Customizing images in high security restricted environments -........................................................... - -You can also make sure your image is only build using local constraint file and locally downloaded -wheel files. This is often useful in Enterprise environments where the binary files are verified and -vetted by the security teams. - -This builds below builds the production image in version 3.7 with packages and constraints used from the local -``docker-context-files`` rather than installed from PyPI or GitHub. 
It also disables MySQL client -installation as it is using external installation method. - -Note that as a prerequisite - you need to have downloaded wheel files. In the example below we -first download such constraint file locally and then use ``pip download`` to get the .whl files needed -but in most likely scenario, those wheel files should be copied from an internal repository of such .whl -files. Note that ``AIRFLOW_VERSION_SPECIFICATION`` is only there for reference, the apache airflow .whl file -in the right version is part of the .whl files downloaded. - -Note that 'pip download' will only works on Linux host as some of the packages need to be compiled from -sources and you cannot install them providing ``--platform`` switch. They also need to be downloaded using -the same python version as the target image. - -The ``pip download`` might happen in a separate environment. The files can be committed to a separate -binary repository and vetted/verified by the security team and used subsequently to build images -of Airflow when needed on an air-gaped system. - -Preparing the constraint files and wheel files: - -.. code-block:: bash - - rm docker-context-files/*.whl docker-context-files/*.txt - - curl -Lo "docker-context-files/constraints-2-0.txt" \ - https://raw.githubusercontent.com/apache/airflow/constraints-2-0/constraints-3.7.txt - - pip download --dest docker-context-files \ - --constraint docker-context-files/constraints-2-0.txt \ - apache-airflow[async,aws,azure,celery,dask,elasticsearch,gcp,kubernetes,mysql,postgres,redis,slack,ssh,statsd,virtualenv]==2.0.1 - -Since apache-airflow .whl packages are treated differently by the docker image, you need to rename the -downloaded apache-airflow* files, for example: - -.. code-block:: bash - - pushd docker-context-files - for file in apache?airflow* - do - mv ${file} _${file} - done - popd - -Building the image: - -.. code-block:: bash - - ./breeze build-image \ - --production-image --python 3.7 --install-airflow-version=2.0.1 \ - --disable-mysql-client-installation --disable-pip-cache --install-from-local-files-when-building \ - --constraints-location="/docker-context-files/constraints-2-0.txt" - -or - -.. code-block:: bash - - docker build . \ - --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \ - --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \ - --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \ - --build-arg AIRFLOW_VERSION="2.0.1" \ - --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \ - --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \ - --build-arg AIRFLOW_SOURCES_FROM="empty" \ - --build-arg AIRFLOW_SOURCES_TO="/empty" \ - --build-arg INSTALL_MYSQL_CLIENT="false" \ - --build-arg AIRFLOW_PRE_CACHED_PIP_PACKAGES="false" \ - --build-arg INSTALL_FROM_DOCKER_CONTEXT_FILES="true" \ - --build-arg AIRFLOW_CONSTRAINTS_LOCATION="/docker-context-files/constraints-2-0.txt" - - -Customizing & extending the image together -.......................................... - -You can combine both - customizing & extending the image. You can build the image first using -``customize`` method (either with docker command or with ``breeze`` and then you can ``extend`` -the resulting image using ``FROM`` any dependencies you want. - -Customizing PYPI installation -............................. - -You can customize PYPI sources used during image build by adding a docker-context-files/.pypirc file -This .pypirc will never be committed to the repository and will not be present in the final production image. 
-It is added and used only in the build segment of the image so it is never copied to the final image. - -External sources for dependencies ---------------------------------- - -In corporate environments, there is often the need to build your Container images using -other than default sources of dependencies. The docker file uses standard sources (such as -Debian apt repositories or PyPI repository. However, in corporate environments, the dependencies -are often only possible to be installed from internal, vetted repositories that are reviewed and -approved by the internal security teams. In those cases, you might need to use those different -sources. - -This is rather easy if you extend the image - you simply write your extension commands -using the right sources - either by adding/replacing the sources in apt configuration or -specifying the source repository in pip install command. - -It's a bit more involved in the case of customizing the image. We do not have yet (but we are working -on it) a capability of changing the sources via build args. However, since the builds use -Dockerfile that is a source file, you can rather easily simply modify the file manually and -specify different sources to be used by either of the commands. - - -Comparing extending and customizing the image ---------------------------------------------- - -Here is the comparison of the two types of building images. - -+----------------------------------------------------+---------------------+-----------------------+ -| | Extending the image | Customizing the image | -+====================================================+=====================+=======================+ -| Produces optimized image | No | Yes | -+----------------------------------------------------+---------------------+-----------------------+ -| Use Airflow Dockerfile sources to build the image | No | Yes | -+----------------------------------------------------+---------------------+-----------------------+ -| Requires Airflow sources | No | Yes | -+----------------------------------------------------+---------------------+-----------------------+ -| You can build it with Breeze | No | Yes | -+----------------------------------------------------+---------------------+-----------------------+ -| Allows to use non-default sources for dependencies | Yes | No [1] | -+----------------------------------------------------+---------------------+-----------------------+ - -[1] When you combine customizing and extending the image, you can use external sources -in the "extend" part. There are plans to add functionality to add external sources -option to image customization. You can also modify Dockerfile manually if you want to -use non-default sources for dependencies. - -Using the production image --------------------------- - -The PROD image entrypoint works as follows: - -* In case the user is not "airflow" (with undefined user id) and the group id of the user is set to 0 (root), - then the user is dynamically added to /etc/passwd at entry using USER_NAME variable to define the user name. - This is in order to accommodate the - `OpenShift Guidelines `_ - -* The ``AIRFLOW_HOME`` is set by default to ``/opt/airflow/`` - this means that DAGs - are in default in the ``/opt/airflow/dags`` folder and logs are in the ``/opt/airflow/logs`` - -* The working directory is ``/opt/airflow`` by default. 
- -* If ``AIRFLOW__CORE__SQL_ALCHEMY_CONN`` variable is passed to the container and it is either mysql or postgres - SQL alchemy connection, then the connection is checked and the script waits until the database is reachable. - If ``AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD`` variable is passed to the container, it is evaluated as a - command to execute and result of this evaluation is used as ``AIRFLOW__CORE__SQL_ALCHEMY_CONN``. The - ``_CMD`` variable takes precedence over the ``AIRFLOW__CORE__SQL_ALCHEMY_CONN`` variable. - -* If no ``AIRFLOW__CORE__SQL_ALCHEMY_CONN`` variable is set then SQLite database is created in - ${AIRFLOW_HOME}/airflow.db and db reset is executed. - -* If first argument equals to "bash" - you are dropped to a bash shell or you can executes bash command - if you specify extra arguments. For example: - -.. code-block:: bash - - docker run -it apache/airflow:master-python3.6 bash -c "ls -la" - total 16 - drwxr-xr-x 4 airflow root 4096 Jun 5 18:12 . - drwxr-xr-x 1 root root 4096 Jun 5 18:12 .. - drwxr-xr-x 2 airflow root 4096 Jun 5 18:12 dags - drwxr-xr-x 2 airflow root 4096 Jun 5 18:12 logs - -* If first argument is equal to "python" - you are dropped in python shell or python commands are executed if - you pass extra parameters. For example: - -.. code-block:: bash - - > docker run -it apache/airflow:master-python3.6 python -c "print('test')" - test - -* If first argument equals to "airflow" - the rest of the arguments is treated as an airflow command - to execute. Example: - -.. code-block:: bash - - docker run -it apache/airflow:master-python3.6 airflow webserver - -* If there are any other arguments - they are simply passed to the "airflow" command - -.. code-block:: bash - - > docker run -it apache/airflow:master-python3.6 version - 2.1.0.dev0 - -* If ``AIRFLOW__CELERY__BROKER_URL`` variable is passed and airflow command with - scheduler, worker of flower command is used, then the script checks the broker connection - and waits until the Celery broker database is reachable. - If ``AIRFLOW__CELERY__BROKER_URL_CMD`` variable is passed to the container, it is evaluated as a - command to execute and result of this evaluation is used as ``AIRFLOW__CELERY__BROKER_URL``. The - ``_CMD`` variable takes precedence over the ``AIRFLOW__CELERY__BROKER_URL`` variable. - -Production image build arguments --------------------------------- - -The following build arguments (``--build-arg`` in docker build command) can be used for production images: - -+------------------------------------------+------------------------------------------+------------------------------------------+ -| Build argument | Default value | Description | -+==========================================+==========================================+==========================================+ -| ``PYTHON_BASE_IMAGE`` | ``python:3.6-slim-buster`` | Base python image. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``PYTHON_MAJOR_MINOR_VERSION`` | ``3.6`` | major/minor version of Python (should | -| | | match base image). | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_VERSION`` | ``2.0.1.dev0`` | version of Airflow. 
| -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_REPO`` | ``apache/airflow`` | the repository from which PIP | -| | | dependencies are pre-installed. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_BRANCH`` | ``master`` | the branch from which PIP dependencies | -| | | are pre-installed initially. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_CONSTRAINTS_LOCATION`` | | If not empty, it will override the | -| | | source of the constraints with the | -| | | specified URL or file. Note that the | -| | | file has to be in docker context so | -| | | it's best to place such file in | -| | | one of the folders included in | -| | | .dockerignore. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_CONSTRAINTS_REFERENCE`` | ``constraints-master`` | Reference (branch or tag) from GitHub | -| | | where constraints file is taken from | -| | | It can be ``constraints-master`` but | -| | | also can be ``constraints-1-10`` for | -| | | 1.10.* installation. In case of building | -| | | specific version you want to point it | -| | | to specific tag, for example | -| | | ``constraints-1.10.14``. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``INSTALL_PROVIDERS_FROM_SOURCES`` | ``false`` | If set to ``true`` and image is built | -| | | from sources, all provider packages are | -| | | installed from sources rather than from | -| | | packages. It has no effect when | -| | | installing from PyPI or GitHub repo. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_EXTRAS`` | (see Dockerfile) | Default extras with which airflow is | -| | | installed. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``INSTALL_FROM_PYPI`` | ``true`` | If set to true, Airflow is installed | -| | | from PyPI. if you want to install | -| | | Airflow from self-build package | -| | | you can set it to false, put package in | -| | | ``docker-context-files`` and set | -| | | ``INSTALL_FROM_DOCKER_CONTEXT_FILES`` to | -| | | ``true``. For this you have to also keep | -| | | ``AIRFLOW_PRE_CACHED_PIP_PACKAGES`` flag | -| | | set to ``false``. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_PRE_CACHED_PIP_PACKAGES`` | ``false`` | Allows to pre-cache airflow PIP packages | -| | | from the GitHub of Apache Airflow | -| | | This allows to optimize iterations for | -| | | Image builds and speeds up CI builds. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``INSTALL_FROM_DOCKER_CONTEXT_FILES`` | ``false`` | If set to true, Airflow, providers and | -| | | all dependencies are installed from | -| | | from locally built/downloaded | -| | | .whl and .tar.gz files placed in the | -| | | ``docker-context-files``. 
In certain | -| | | corporate environments, this is required | -| | | to install airflow from such pre-vetted | -| | | packages rather than from PyPI. For this | -| | | to work, also set ``INSTALL_FROM_PYPI``. | -| | | Note that packages starting with | -| | | ``apache?airflow`` glob are treated | -| | | differently than other packages. All | -| | | ``apache?airflow`` packages are | -| | | installed with dependencies limited by | -| | | airflow constraints. All other packages | -| | | are installed without dependencies | -| | | 'as-is'. If you wish to install airflow | -| | | via 'pip download' with all dependencies | -| | | downloaded, you have to rename the | -| | | apache airflow and provider packages to | -| | | not start with ``apache?airflow`` glob. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``UPGRADE_TO_NEWER_DEPENDENCIES`` | ``false`` | If set to true, the dependencies are | -| | | upgraded to newer versions matching | -| | | setup.py before installation. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``CONTINUE_ON_PIP_CHECK_FAILURE`` | ``false`` | By default the image build fails if pip | -| | | check fails for it. This is good for | -| | | interactive building but on CI the | -| | | image should be built regardless - we | -| | | have a separate step to verify image. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_AIRFLOW_EXTRAS`` | | Optional additional extras with which | -| | | airflow is installed. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_PYTHON_DEPS`` | | Optional python packages to extend | -| | | the image with some extra dependencies. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``DEV_APT_COMMAND`` | (see Dockerfile) | Dev apt command executed before dev deps | -| | | are installed in the Build image. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_DEV_APT_COMMAND`` | | Additional Dev apt command executed | -| | | before dev dep are installed | -| | | in the Build image. Should start with | -| | | ``&&``. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``DEV_APT_DEPS`` | (see Dockerfile) | Dev APT dependencies installed | -| | | in the Build image. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_DEV_APT_DEPS`` | | Additional apt dev dependencies | -| | | installed in the Build image. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_DEV_APT_ENV`` | | Additional env variables defined | -| | | when installing dev deps. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``RUNTIME_APT_COMMAND`` | (see Dockerfile) | Runtime apt command executed before deps | -| | | are installed in the Main image. 
| -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_RUNTIME_APT_COMMAND`` | | Additional Runtime apt command executed | -| | | before runtime dep are installed | -| | | in the Main image. Should start with | -| | | ``&&``. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``RUNTIME_APT_DEPS`` | (see Dockerfile) | Runtime APT dependencies installed | -| | | in the Main image. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_RUNTIME_APT_DEPS`` | | Additional apt runtime dependencies | -| | | installed in the Main image. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``ADDITIONAL_RUNTIME_APT_ENV`` | | Additional env variables defined | -| | | when installing runtime deps. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_HOME`` | ``/opt/airflow`` | Airflow’s HOME (that’s where logs and | -| | | SQLite databases are stored). | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_UID`` | ``50000`` | Airflow user UID. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_GID`` | ``50000`` | Airflow group GID. Note that most files | -| | | created on behalf of airflow user belong | -| | | to the ``root`` group (0) to keep | -| | | OpenShift Guidelines compatibility. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_USER_HOME_DIR`` | ``/home/airflow`` | Home directory of the Airflow user. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``CASS_DRIVER_BUILD_CONCURRENCY`` | ``8`` | Number of processors to use for | -| | | cassandra PIP install (speeds up | -| | | installing in case cassandra extra is | -| | | used). | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``INSTALL_MYSQL_CLIENT`` | ``true`` | Whether MySQL client should be installed | -| | | The mysql extra is removed from extras | -| | | if the client is not installed. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``AIRFLOW_PIP_VERSION`` | ``20.2.4`` | PIP version used. | -+------------------------------------------+------------------------------------------+------------------------------------------+ -| ``PIP_PROGRESS_BAR`` | ``on`` | Progress bar for PIP installation | -+------------------------------------------+------------------------------------------+------------------------------------------+ - -There are build arguments that determine the installation mechanism of Apache Airflow for the -production image. 
There are three types of build: - -* From local sources (by default for example when you use ``docker build .``) -* You can build the image from released PyPI airflow package (used to build the official Docker image) -* You can build the image from any version in GitHub repository(this is used mostly for system testing). - -+-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ -| Build argument | Default | What to specify | -+===================================+========================+===================================================================================+ -| ``AIRFLOW_INSTALLATION_METHOD`` | ``apache-airflow`` | Should point to the installation method of Apache Airflow. It can be | -| | | ``apache-airflow`` for installation from packages and URL to installation from | -| | | GitHub repository tag or branch or "." to install from sources. | -| | | Note that installing from local sources requires appropriate values of the | -| | | ``AIRFLOW_SOURCES_FROM`` and ``AIRFLOW_SOURCES_TO`` variables as described below. | -| | | Only used when ``INSTALL_FROM_PYPI`` is set to ``true``. | -+-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ -| ``AIRFLOW_VERSION_SPECIFICATION`` | | Optional - might be used for package installation of different Airflow version | -| | | for example"==2.0.1". For consistency, you should also set``AIRFLOW_VERSION`` | -| | | to the same value AIRFLOW_VERSION is resolved as label in the image created. | -+-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ -| ``AIRFLOW_CONSTRAINTS_REFERENCE`` | ``constraints-master`` | Reference (branch or tag) from GitHub where constraints file is taken from. | -| | | It can be ``constraints-master`` but also can be``constraints-1-10`` for | -| | | 1.10.* installations. In case of building specific version | -| | | you want to point it to specific tag, for example ``constraints-2.0.1`` | -+-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ -| ``AIRFLOW_WWW`` | ``www`` | In case of Airflow 2.0 it should be "www", in case of Airflow 1.10 | -| | | series it should be "www_rbac". | -+-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ -| ``AIRFLOW_SOURCES_FROM`` | ``empty`` | Sources of Airflow. Set it to "." when you install airflow from | -| | | local sources. | -+-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ -| ``AIRFLOW_SOURCES_TO`` | ``/empty`` | Target for Airflow sources. Set to "/opt/airflow" when | -| | | you want to install airflow from local sources. | -+-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ - -This builds production image in version 3.6 with default extras from the local sources (master version -of 2.0 currently): - -.. code-block:: bash - - docker build . - -This builds the production image in version 3.7 with default extras from 2.0.1 tag and -constraints taken from constraints-2-0 branch in GitHub. - -.. code-block:: bash - - docker build . 
\ - --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \ - --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \ - --build-arg AIRFLOW_INSTALLATION_METHOD="https://github.com/apache/airflow/archive/2.0.1.tar.gz#egg=apache-airflow" \ - --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \ - --build-arg AIRFLOW_BRANCH="v1-10-test" \ - --build-arg AIRFLOW_SOURCES_FROM="empty" \ - --build-arg AIRFLOW_SOURCES_TO="/empty" - -This builds the production image in version 3.7 with default extras from 2.0.1 PyPI package and -constraints taken from 2.0.1 tag in GitHub and pre-installed pip dependencies from the top -of v1-10-test branch. - -.. code-block:: bash - - docker build . \ - --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \ - --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \ - --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \ - --build-arg AIRFLOW_VERSION="2.0.1" \ - --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \ - --build-arg AIRFLOW_BRANCH="v1-10-test" \ - --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2.0.1" \ - --build-arg AIRFLOW_SOURCES_FROM="empty" \ - --build-arg AIRFLOW_SOURCES_TO="/empty" - -This builds the production image in version 3.7 with additional airflow extras from 2.0.1 PyPI package and -additional python dependencies and pre-installed pip dependencies from 2.0.1 tagged constraints. - -.. code-block:: bash - - docker build . \ - --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \ - --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \ - --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \ - --build-arg AIRFLOW_VERSION="2.0.1" \ - --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \ - --build-arg AIRFLOW_BRANCH="v1-10-test" \ - --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2.0.1" \ - --build-arg AIRFLOW_SOURCES_FROM="empty" \ - --build-arg AIRFLOW_SOURCES_TO="/empty" \ - --build-arg ADDITIONAL_AIRFLOW_EXTRAS="mssql,hdfs" \ - --build-arg ADDITIONAL_PYTHON_DEPS="sshtunnel oauth2client" - -This builds the production image in version 3.7 with additional airflow extras from 2.0.1 PyPI package and -additional apt dev and runtime dependencies. - -.. code-block:: bash - - docker build . \ - --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \ - --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \ - --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \ - --build-arg AIRFLOW_VERSION="2.0.1" \ - --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \ - --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \ - --build-arg AIRFLOW_SOURCES_FROM="empty" \ - --build-arg AIRFLOW_SOURCES_TO="/empty" \ - --build-arg ADDITIONAL_AIRFLOW_EXTRAS="jdbc" \ - --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++" \ - --build-arg ADDITIONAL_RUNTIME_APT_DEPS="default-jre-headless" - - -Actions executed at image start -------------------------------- - -If you are using the default entrypoint of the production image, -there are a few actions that are automatically performed when the container starts. -In some cases, you can pass environment variables to the image to trigger some of that behaviour. - -The variables that control the "execution" behaviour start with ``_AIRFLOW`` to distinguish them -from the variables used to build the image starting with ``AIRFLOW``. - -Creating system user -.................... - -Airflow image is Open-Shift compatible, which means that you can start it with random user ID and group id 0. -Airflow will automatically create such a user and make it's home directory point to ``/home/airflow``. 
-You can read more about it in the "Support arbitrary user ids" chapter in the -`Openshift best practices `_. - -Waits for Airflow DB connection -............................... - -In case Postgres or MySQL DB is used, the entrypoint will wait until the airflow DB connection becomes -available. This happens always when you use the default entrypoint. - -The script detects backend type depending on the URL schema and assigns default port numbers if not specified -in the URL. Then it loops until the connection to the host/port specified can be established -It tries ``CONNECTION_CHECK_MAX_COUNT`` times and sleeps ``CONNECTION_CHECK_SLEEP_TIME`` between checks -To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``. - -Supported schemes: - -* ``postgres://`` - default port 5432 -* ``mysql://`` - default port 3306 -* ``sqlite://`` - -In case of SQLite backend, there is no connection to establish and waiting is skipped. - -Upgrading Airflow DB -.................... - -If you set ``_AIRFLOW_DB_UPGRADE`` variable to a non-empty value, the entrypoint will run -the ``airflow db upgrade`` command right after verifying the connection. You can also use this -when you are running airflow with internal SQLite database (default) to upgrade the db and create -admin users at entrypoint, so that you can start the webserver immediately. Note - using SQLite is -intended only for testing purpose, never use SQLite in production as it has severe limitations when it -comes to concurrency. - - -Creating admin user -................... - -The entrypoint can also create webserver user automatically when you enter it. you need to set -``_AIRFLOW_WWW_USER_CREATE`` to a non-empty value in order to do that. This is not intended for -production, it is only useful if you would like to run a quick test with the production image. -You need to pass at least password to create such user via ``_AIRFLOW_WWW_USER_PASSWORD_CMD`` or -``_AIRFLOW_WWW_USER_PASSWORD_CMD`` similarly like for other ``*_CMD`` variables, the content of -the ``*_CMD`` will be evaluated as shell command and it's output will be set as password. - -User creation will fail if none of the ``PASSWORD`` variables are set - there is no default for -password for security reasons. 
- -+-----------+--------------------------+----------------------------------------------------------------------+ -| Parameter | Default | Environment variable | -+===========+==========================+======================================================================+ -| username | admin | ``_AIRFLOW_WWW_USER_USERNAME`` | -+-----------+--------------------------+----------------------------------------------------------------------+ -| password | | ``_AIRFLOW_WWW_USER_PASSWORD_CMD`` or ``_AIRFLOW_WWW_USER_PASSWORD`` | -+-----------+--------------------------+----------------------------------------------------------------------+ -| firstname | Airflow | ``_AIRFLOW_WWW_USER_FIRSTNAME`` | -+-----------+--------------------------+----------------------------------------------------------------------+ -| lastname | Admin | ``_AIRFLOW_WWW_USER_LASTNAME`` | -+-----------+--------------------------+----------------------------------------------------------------------+ -| email | airflowadmin@example.com | ``_AIRFLOW_WWW_USER_EMAIL`` | -+-----------+--------------------------+----------------------------------------------------------------------+ -| role | Admin | ``_AIRFLOW_WWW_USER_ROLE`` | -+-----------+--------------------------+----------------------------------------------------------------------+ - -In case the password is specified, the user will be attempted to be created, but the entrypoint will -not fail if the attempt fails (this accounts for the case that the user is already created). - -You can, for example start the webserver in the production image with initializing the internal SQLite -database and creating an ``admin/admin`` Admin user with the following command: - -.. code-block:: bash - - docker run -it -p 8080:8080 \ - --env "_AIRFLOW_DB_UPGRADE=true" \ - --env "_AIRFLOW_WWW_USER_CREATE=true" \ - --env "_AIRFLOW_WWW_USER_PASSWORD=admin" \ - apache/airflow:master-python3.8 webserver - - -.. code-block:: bash - - docker run -it -p 8080:8080 \ - --env "_AIRFLOW_DB_UPGRADE=true" \ - --env "_AIRFLOW_WWW_USER_CREATE=true" \ - --env "_AIRFLOW_WWW_USER_PASSWORD_CMD=echo admin" \ - apache/airflow:master-python3.8 webserver - -The commands above perform initialization of the SQLite database, create admin user with admin password -and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver. - - -Waits for celery broker connection -.................................. - -In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower`` -commands are used the entrypoint will wait until the celery broker DB connection is available. - -The script detects backend type depending on the URL schema and assigns default port numbers if not specified -in the URL. Then it loops until connection to the host/port specified can be established -It tries ``CONNECTION_CHECK_MAX_COUNT`` times and sleeps ``CONNECTION_CHECK_SLEEP_TIME`` between checks. -To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``. - -Supported schemes: - -* ``amqp(s)://`` (rabbitmq) - default port 5672 -* ``redis://`` - default port 6379 -* ``postgres://`` - default port 5432 -* ``mysql://`` - default port 3306 -* ``sqlite://`` - -In case of SQLite backend, there is no connection to establish and waiting is skipped. - - -Recipes -------- - -Users sometimes share interesting ways of using the Docker images. 
We encourage users to contribute these -recipes to the documentation in case they prove useful to other members of the community by -submitting a pull request. The sections below capture this knowledge. - -Google Cloud SDK installation -............................. - -Some operators, such as :class:`airflow.providers.google.cloud.operators.kubernetes_engine.GKEStartPodOperator`, -:class:`airflow.providers.google.cloud.operators.dataflow.DataflowStartSqlJobOperator`, require -the installation of `Google Cloud SDK `__ (includes ``gcloud``). -You can also run these commands with BashOperator. - -Create a new Dockerfile like the one shown below. - -.. exampleinclude:: /docker-images-recipes/gcloud.Dockerfile - :language: dockerfile - -Then build a new image. - -.. code-block:: bash - - docker build . \ - --build-arg BASE_AIRFLOW_IMAGE="apache/airflow:2.0.1" \ - -t my-airflow-image - - -Apache Hadoop Stack installation -................................ - -Airflow is often used to run tasks on Hadoop cluster. It required Java Runtime Environment (JRE) to run. -Below are the steps to take tools that are frequently used in Hadoop-world: - -- Java Runtime Environment (JRE) -- Apache Hadoop -- Apache Hive -- `Cloud Storage connector for Apache Hadoop `__ - - -Create a new Dockerfile like the one shown below. - -.. exampleinclude:: /docker-images-recipes/hadoop.Dockerfile - :language: dockerfile - -Then build a new image. - -.. code-block:: bash - - docker build . \ - --build-arg BASE_AIRFLOW_IMAGE="apache/airflow:2.0.1" \ - -t my-airflow-image - -More details about the images ------------------------------ - -You can read more details about the images - the context, their parameters and internal structure in the -`IMAGES.rst `_ document. +We provide :doc:`a Docker Image (OCI) for Apache Airflow ` for use in a containerized environment. Consider using it to guarantee that software will always run the same no matter where it’s deployed. .. _production-deployment:kerberos: diff --git a/docs/apache-airflow/start/docker.rst b/docs/apache-airflow/start/docker.rst index 641192870507c..20f5846e926c9 100644 --- a/docs/apache-airflow/start/docker.rst +++ b/docs/apache-airflow/start/docker.rst @@ -195,7 +195,7 @@ To stop and delete containers, delete volumes with database data and download im Notes ===== -By default, the Docker Compose file uses the latest Airflow image (`apache/airflow `__). If you need, you can :ref:`customize and extend it `. +By default, the Docker Compose file uses the latest Airflow image (`apache/airflow `__). If you need, you can :doc:`customize and extend it `. What's Next? ============ diff --git a/docs/conf.py b/docs/conf.py index 45b1f297c6888..2beb64e65fd1f 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -145,7 +145,7 @@ 'providers_packages_ref', ] ) -elif PACKAGE_NAME == "helm-chart": +elif PACKAGE_NAME in ("helm-chart", "docker-stack"): # No extra extensions pass else: diff --git a/docs/docker-stack/build-arg-ref.rst b/docs/docker-stack/build-arg-ref.rst new file mode 100644 index 0000000000000..bc02d35ceebcc --- /dev/null +++ b/docs/docker-stack/build-arg-ref.rst @@ -0,0 +1,212 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. 
You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Image build arguments reference +------------------------------- + +The following build arguments (``--build-arg`` in docker build command) can be used for production images: + ++------------------------------------------+------------------------------------------+------------------------------------------+ +| Build argument | Default value | Description | ++==========================================+==========================================+==========================================+ +| ``PYTHON_BASE_IMAGE`` | ``python:3.6-slim-buster`` | Base python image. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``PYTHON_MAJOR_MINOR_VERSION`` | ``3.6`` | major/minor version of Python (should | +| | | match base image). | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``AIRFLOW_VERSION`` | ``2.0.1.dev0`` | version of Airflow. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``AIRFLOW_REPO`` | ``apache/airflow`` | the repository from which PIP | +| | | dependencies are pre-installed. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``AIRFLOW_BRANCH`` | ``master`` | the branch from which PIP dependencies | +| | | are pre-installed initially. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``AIRFLOW_CONSTRAINTS_LOCATION`` | | If not empty, it will override the | +| | | source of the constraints with the | +| | | specified URL or file. Note that the | +| | | file has to be in docker context so | +| | | it's best to place such file in | +| | | one of the folders included in | +| | | ``.dockerignore`` file. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``AIRFLOW_CONSTRAINTS_REFERENCE`` | ``constraints-master`` | Reference (branch or tag) from GitHub | +| | | where constraints file is taken from | +| | | It can be ``constraints-master`` but | +| | | also can be ``constraints-1-10`` for | +| | | 1.10.* installation. In case of building | +| | | specific version you want to point it | +| | | to specific tag, for example | +| | | ``constraints-1.10.14``. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``INSTALL_PROVIDERS_FROM_SOURCES`` | ``false`` | If set to ``true`` and image is built | +| | | from sources, all provider packages are | +| | | installed from sources rather than from | +| | | packages. It has no effect when | +| | | installing from PyPI or GitHub repo. 
| ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``AIRFLOW_EXTRAS`` | (see Dockerfile) | Default extras with which airflow is | +| | | installed. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``INSTALL_FROM_PYPI`` | ``true`` | If set to true, Airflow is installed | +| | | from PyPI. if you want to install | +| | | Airflow from self-build package | +| | | you can set it to false, put package in | +| | | ``docker-context-files`` and set | +| | | ``INSTALL_FROM_DOCKER_CONTEXT_FILES`` to | +| | | ``true``. For this you have to also keep | +| | | ``AIRFLOW_PRE_CACHED_PIP_PACKAGES`` flag | +| | | set to ``false``. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``AIRFLOW_PRE_CACHED_PIP_PACKAGES`` | ``false`` | Allows to pre-cache airflow PIP packages | +| | | from the GitHub of Apache Airflow | +| | | This allows to optimize iterations for | +| | | Image builds and speeds up CI builds. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``INSTALL_FROM_DOCKER_CONTEXT_FILES`` | ``false`` | If set to true, Airflow, providers and | +| | | all dependencies are installed from | +| | | from locally built/downloaded | +| | | .whl and .tar.gz files placed in the | +| | | ``docker-context-files``. In certain | +| | | corporate environments, this is required | +| | | to install airflow from such pre-vetted | +| | | packages rather than from PyPI. For this | +| | | to work, also set ``INSTALL_FROM_PYPI``. | +| | | Note that packages starting with | +| | | ``apache?airflow`` glob are treated | +| | | differently than other packages. All | +| | | ``apache?airflow`` packages are | +| | | installed with dependencies limited by | +| | | airflow constraints. All other packages | +| | | are installed without dependencies | +| | | 'as-is'. If you wish to install airflow | +| | | via 'pip download' with all dependencies | +| | | downloaded, you have to rename the | +| | | apache airflow and provider packages to | +| | | not start with ``apache?airflow`` glob. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``UPGRADE_TO_NEWER_DEPENDENCIES`` | ``false`` | If set to true, the dependencies are | +| | | upgraded to newer versions matching | +| | | setup.py before installation. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``CONTINUE_ON_PIP_CHECK_FAILURE`` | ``false`` | By default the image build fails if pip | +| | | check fails for it. This is good for | +| | | interactive building but on CI the | +| | | image should be built regardless - we | +| | | have a separate step to verify image. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``ADDITIONAL_AIRFLOW_EXTRAS`` | | Optional additional extras with which | +| | | airflow is installed. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``ADDITIONAL_PYTHON_DEPS`` | | Optional python packages to extend | +| | | the image with some extra dependencies. 
| ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``DEV_APT_COMMAND`` | (see Dockerfile) | Dev apt command executed before dev deps | +| | | are installed in the Build image. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``ADDITIONAL_DEV_APT_COMMAND`` | | Additional Dev apt command executed | +| | | before dev dep are installed | +| | | in the Build image. Should start with | +| | | ``&&``. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``DEV_APT_DEPS`` | (see Dockerfile) | Dev APT dependencies installed | +| | | in the Build image. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``ADDITIONAL_DEV_APT_DEPS`` | | Additional apt dev dependencies | +| | | installed in the Build image. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``ADDITIONAL_DEV_APT_ENV`` | | Additional env variables defined | +| | | when installing dev deps. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``RUNTIME_APT_COMMAND`` | (see Dockerfile) | Runtime apt command executed before deps | +| | | are installed in the Main image. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``ADDITIONAL_RUNTIME_APT_COMMAND`` | | Additional Runtime apt command executed | +| | | before runtime dep are installed | +| | | in the Main image. Should start with | +| | | ``&&``. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``RUNTIME_APT_DEPS`` | (see Dockerfile) | Runtime APT dependencies installed | +| | | in the Main image. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``ADDITIONAL_RUNTIME_APT_DEPS`` | | Additional apt runtime dependencies | +| | | installed in the Main image. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``ADDITIONAL_RUNTIME_APT_ENV`` | | Additional env variables defined | +| | | when installing runtime deps. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``AIRFLOW_HOME`` | ``/opt/airflow`` | Airflow’s HOME (that’s where logs and | +| | | SQLite databases are stored). | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``AIRFLOW_UID`` | ``50000`` | Airflow user UID. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``AIRFLOW_GID`` | ``50000`` | Airflow group GID. Note that most files | +| | | created on behalf of airflow user belong | +| | | to the ``root`` group (0) to keep | +| | | OpenShift Guidelines compatibility. 
| ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``AIRFLOW_USER_HOME_DIR`` | ``/home/airflow`` | Home directory of the Airflow user. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``CASS_DRIVER_BUILD_CONCURRENCY`` | ``8`` | Number of processors to use for | +| | | cassandra PIP install (speeds up | +| | | installing in case cassandra extra is | +| | | used). | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``INSTALL_MYSQL_CLIENT`` | ``true`` | Whether MySQL client should be installed | +| | | The mysql extra is removed from extras | +| | | if the client is not installed. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``AIRFLOW_PIP_VERSION`` | ``20.2.4`` | PIP version used. | ++------------------------------------------+------------------------------------------+------------------------------------------+ +| ``PIP_PROGRESS_BAR`` | ``on`` | Progress bar for PIP installation | ++------------------------------------------+------------------------------------------+------------------------------------------+ + +There are build arguments that determine the installation mechanism of Apache Airflow for the +production image. There are three types of build: + +* From local sources (by default for example when you use ``docker build .``) +* You can build the image from released PyPI airflow package (used to build the official Docker image) +* You can build the image from any version in GitHub repository(this is used mostly for system testing). + ++-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ +| Build argument | Default | What to specify | ++===================================+========================+===================================================================================+ +| ``AIRFLOW_INSTALLATION_METHOD`` | ``apache-airflow`` | Should point to the installation method of Apache Airflow. It can be | +| | | ``apache-airflow`` for installation from packages and URL to installation from | +| | | GitHub repository tag or branch or "." to install from sources. | +| | | Note that installing from local sources requires appropriate values of the | +| | | ``AIRFLOW_SOURCES_FROM`` and ``AIRFLOW_SOURCES_TO`` variables as described below. | +| | | Only used when ``INSTALL_FROM_PYPI`` is set to ``true``. | ++-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ +| ``AIRFLOW_VERSION_SPECIFICATION`` | | Optional - might be used for package installation of different Airflow version | +| | | for example"==2.0.1". For consistency, you should also set``AIRFLOW_VERSION`` | +| | | to the same value AIRFLOW_VERSION is resolved as label in the image created. | ++-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ +| ``AIRFLOW_CONSTRAINTS_REFERENCE`` | ``constraints-master`` | Reference (branch or tag) from GitHub where constraints file is taken from. | +| | | It can be ``constraints-master`` but also can be``constraints-1-10`` for | +| | | 1.10.* installations. 
In case of building specific version | +| | | you want to point it to specific tag, for example ``constraints-2.0.1`` | ++-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ +| ``AIRFLOW_WWW`` | ``www`` | In case of Airflow 2.0 it should be "www", in case of Airflow 1.10 | +| | | series it should be "www_rbac". | ++-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ +| ``AIRFLOW_SOURCES_FROM`` | ``empty`` | Sources of Airflow. Set it to "." when you install airflow from | +| | | local sources. | ++-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ +| ``AIRFLOW_SOURCES_TO`` | ``/empty`` | Target for Airflow sources. Set to "/opt/airflow" when | +| | | you want to install airflow from local sources. | ++-----------------------------------+------------------------+-----------------------------------------------------------------------------------+ diff --git a/docs/docker-stack/build.rst b/docs/docker-stack/build.rst new file mode 100644 index 0000000000000..8613c195370e4 --- /dev/null +++ b/docs/docker-stack/build.rst @@ -0,0 +1,380 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Building the image +================== + +Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the +way we do, you might want to know very quickly how you can extend or customize the existing image +for Apache Airflow. This chapter gives you a short answer to those questions. + +Airflow Summit 2020's `Production Docker Image `_ talk provides more +details about the context, architecture and customization/extension methods for the Production Image. + +Extending the image +------------------- + +Extending the image is easiest if you just need to add some dependencies that do not require +compiling. The compilation framework of Linux (so called ``build-essential``) is pretty big, and +for the production images, size is really important factor to optimize for, so our Production Image +does not contain ``build-essential``. If you need compiler like gcc or g++ or make/cmake etc. - those +are not found in the image and it is recommended that you follow the "customize" route instead. + +How to extend the image - it is something you are most likely familiar with - simply +build a new image using Dockerfile's ``FROM`` directive and add whatever you need. Then you can add your +Debian dependencies with ``apt`` or PyPI dependencies with ``pip install`` or any other stuff you need. 
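+
+For example, assuming you have saved such a Dockerfile in the current directory (the snippets below
+show what it can contain), you can build and tag your extended image with a command like the
+following (the image name ``my-extended-airflow`` is just an illustration):
+
+.. code-block:: bash
+
+    docker build . --tag my-extended-airflow:latest
+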
+ +You should be aware, about a few things: + +* The production image of airflow uses "airflow" user, so if you want to add some of the tools + as ``root`` user, you need to switch to it with ``USER`` directive of the Dockerfile. Also you + should remember about following the + `best practises of Dockerfiles `_ + to make sure your image is lean and small. + + .. code-block:: dockerfile + + FROM apache/airflow:2.0.1 + USER root + RUN apt-get update \ + && apt-get install -y --no-install-recommends \ + my-awesome-apt-dependency-to-add \ + && apt-get autoremove -yqq --purge \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + USER airflow + + +* PyPI dependencies in Apache Airflow are installed in the user library, of the "airflow" user, so + you need to install them with the ``--user`` flag and WITHOUT switching to airflow user. Note also + that using --no-cache-dir is a good idea that can help to make your image smaller. + + .. code-block:: dockerfile + + FROM apache/airflow:2.0.1 + RUN pip install --no-cache-dir --user my-awesome-pip-dependency-to-add + +* As of 2.0.1 image the ``--user`` flag is turned on by default by setting ``PIP_USER`` environment variable + to ``true``. This can be disabled by un-setting the variable or by setting it to ``false``. + + +* If your apt, or PyPI dependencies require some of the build-essentials, then your best choice is + to follow the "Customize the image" route. However it requires to checkout sources of Apache Airflow, + so you might still want to choose to add build essentials to your image, even if your image will + be significantly bigger. + + .. code-block:: dockerfile + + FROM apache/airflow:2.0.1 + USER root + RUN apt-get update \ + && apt-get install -y --no-install-recommends \ + build-essential my-awesome-apt-dependency-to-add \ + && apt-get autoremove -yqq --purge \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + USER airflow + RUN pip install --no-cache-dir --user my-awesome-pip-dependency-to-add + +* You can also embed your dags in the image by simply adding them with COPY directive of Airflow. + The DAGs in production image are in ``/opt/airflow/dags`` folder. + +Customizing the image +--------------------- + +Customizing the image is an alternative way of adding your own dependencies to the image - better +suited to prepare optimized production images. + +The advantage of this method is that it produces optimized image even if you need some compile-time +dependencies that are not needed in the final image. You need to use Airflow Sources to build such images +from the `official distribution folder of Apache Airflow `_ for the +released versions, or checked out from the GitHub project if you happen to do it from git sources. + +The easiest way to build the image is to use ``breeze`` script, but you can also build such customized +image by running appropriately crafted docker build in which you specify all the ``build-args`` +that you need to add to customize it. You can read about all the args and ways you can build the image +in :doc:`build-arg-ref`. + +Here just a few examples are presented which should give you general understanding of what you can customize. + +This builds production image in version 3.6 with default extras from the local sources (master version +of 2.0 currently): + +.. code-block:: bash + + docker build . + +This builds the production image in version 3.7 with default extras from 2.0.1 tag and +constraints taken from constraints-2-0 branch in GitHub. + +.. code-block:: bash + + docker build . 
\ + --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \ + --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \ + --build-arg AIRFLOW_INSTALLATION_METHOD="https://github.com/apache/airflow/archive/2.0.1.tar.gz#egg=apache-airflow" \ + --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \ + --build-arg AIRFLOW_BRANCH="v1-10-test" \ + --build-arg AIRFLOW_SOURCES_FROM="empty" \ + --build-arg AIRFLOW_SOURCES_TO="/empty" + +This builds the production image in version 3.7 with default extras from 2.0.1 PyPI package and +constraints taken from 2.0.1 tag in GitHub and pre-installed pip dependencies from the top +of ``v1-10-test`` branch. + +.. code-block:: bash + + docker build . \ + --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \ + --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \ + --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \ + --build-arg AIRFLOW_VERSION="2.0.1" \ + --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \ + --build-arg AIRFLOW_BRANCH="v1-10-test" \ + --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2.0.1" \ + --build-arg AIRFLOW_SOURCES_FROM="empty" \ + --build-arg AIRFLOW_SOURCES_TO="/empty" + +This builds the production image in version 3.7 with additional airflow extras from 2.0.1 PyPI package and +additional python dependencies and pre-installed pip dependencies from 2.0.1 tagged constraints. + +.. code-block:: bash + + docker build . \ + --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \ + --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \ + --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \ + --build-arg AIRFLOW_VERSION="2.0.1" \ + --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \ + --build-arg AIRFLOW_BRANCH="v1-10-test" \ + --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2.0.1" \ + --build-arg AIRFLOW_SOURCES_FROM="empty" \ + --build-arg AIRFLOW_SOURCES_TO="/empty" \ + --build-arg ADDITIONAL_AIRFLOW_EXTRAS="mssql,hdfs" \ + --build-arg ADDITIONAL_PYTHON_DEPS="sshtunnel oauth2client" + +This builds the production image in version 3.7 with additional airflow extras from 2.0.1 PyPI package and +additional apt dev and runtime dependencies. + +.. code-block:: bash + + docker build . \ + --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \ + --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \ + --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \ + --build-arg AIRFLOW_VERSION="2.0.1" \ + --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \ + --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \ + --build-arg AIRFLOW_SOURCES_FROM="empty" \ + --build-arg AIRFLOW_SOURCES_TO="/empty" \ + --build-arg ADDITIONAL_AIRFLOW_EXTRAS="jdbc" \ + --build-arg ADDITIONAL_PYTHON_DEPS="pandas" \ + --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++" \ + --build-arg ADDITIONAL_RUNTIME_APT_DEPS="default-jre-headless" \ + --tag my-image + + +The same image can be built using ``breeze`` (it supports auto-completion of the options): + +.. code-block:: bash + + ./breeze build-image \ + --production-image --python 3.7 --install-airflow-version=2.0.1 \ + --additional-extras=jdbc --additional-python-deps="pandas" \ + --additional-dev-apt-deps="gcc g++" --additional-runtime-apt-deps="default-jre-headless" + + +You can customize more aspects of the image - such as additional commands executed before apt dependencies +are installed, or adding extra sources to install your dependencies from. You can see all the arguments +described below but here is an example of rather complex command to customize the image +based on example in `this comment `_: + +.. 
code-block:: bash + + docker build . -f Dockerfile \ + --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \ + --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \ + --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \ + --build-arg AIRFLOW_VERSION="2.0.1" \ + --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \ + --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \ + --build-arg AIRFLOW_SOURCES_FROM="empty" \ + --build-arg AIRFLOW_SOURCES_TO="/empty" \ + --build-arg ADDITIONAL_AIRFLOW_EXTRAS="slack" \ + --build-arg ADDITIONAL_PYTHON_DEPS="apache-airflow-backport-providers-odbc \ + apache-airflow-backport-providers-odbc \ + azure-storage-blob \ + sshtunnel \ + google-api-python-client \ + oauth2client \ + beautifulsoup4 \ + dateparser \ + rocketchat_API \ + typeform" \ + --build-arg ADDITIONAL_DEV_APT_DEPS="msodbcsql17 unixodbc-dev g++" \ + --build-arg ADDITIONAL_DEV_APT_COMMAND="curl https://packages.microsoft.com/keys/microsoft.asc | \ + apt-key add --no-tty - && \ + curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list" \ + --build-arg ADDITIONAL_DEV_ENV_VARS="ACCEPT_EULA=Y" \ + --build-arg ADDITIONAL_RUNTIME_APT_COMMAND="curl https://packages.microsoft.com/keys/microsoft.asc | \ + apt-key add --no-tty - && \ + curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list" \ + --build-arg ADDITIONAL_RUNTIME_APT_DEPS="msodbcsql17 unixodbc git procps vim" \ + --build-arg ADDITIONAL_RUNTIME_ENV_VARS="ACCEPT_EULA=Y" \ + --tag my-image + +Customizing images in high security restricted environments +........................................................... + +You can also make sure your image is only build using local constraint file and locally downloaded +wheel files. This is often useful in Enterprise environments where the binary files are verified and +vetted by the security teams. + +This builds below builds the production image in version 3.7 with packages and constraints used from the local +``docker-context-files`` rather than installed from PyPI or GitHub. It also disables MySQL client +installation as it is using external installation method. + +Note that as a prerequisite - you need to have downloaded wheel files. In the example below we +first download such constraint file locally and then use ``pip download`` to get the ``.whl`` files needed +but in most likely scenario, those wheel files should be copied from an internal repository of such .whl +files. Note that ``AIRFLOW_VERSION_SPECIFICATION`` is only there for reference, the apache airflow ``.whl`` file +in the right version is part of the ``.whl`` files downloaded. + +Note that 'pip download' will only works on Linux host as some of the packages need to be compiled from +sources and you cannot install them providing ``--platform`` switch. They also need to be downloaded using +the same python version as the target image. + +The ``pip download`` might happen in a separate environment. The files can be committed to a separate +binary repository and vetted/verified by the security team and used subsequently to build images +of Airflow when needed on an air-gaped system. + +Preparing the constraint files and wheel files: + +.. 
code-block:: bash + + rm docker-context-files/*.whl docker-context-files/*.txt + + curl -Lo "docker-context-files/constraints-2-0.txt" \ + https://raw.githubusercontent.com/apache/airflow/constraints-2-0/constraints-3.7.txt + + pip download --dest docker-context-files \ + --constraint docker-context-files/constraints-2-0.txt \ + apache-airflow[async,aws,azure,celery,dask,elasticsearch,gcp,kubernetes,mysql,postgres,redis,slack,ssh,statsd,virtualenv]==2.0.1 + +Since apache-airflow .whl packages are treated differently by the docker image, you need to rename the +downloaded apache-airflow* files, for example: + +.. code-block:: bash + + pushd docker-context-files + for file in apache?airflow* + do + mv ${file} _${file} + done + popd + +Building the image: + +.. code-block:: bash + + ./breeze build-image \ + --production-image --python 3.7 --install-airflow-version=2.0.1 \ + --disable-mysql-client-installation --disable-pip-cache --install-from-local-files-when-building \ + --constraints-location="/docker-context-files/constraints-2-0.txt" + +or + +.. code-block:: bash + + docker build . \ + --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \ + --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \ + --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \ + --build-arg AIRFLOW_VERSION="2.0.1" \ + --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \ + --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \ + --build-arg AIRFLOW_SOURCES_FROM="empty" \ + --build-arg AIRFLOW_SOURCES_TO="/empty" \ + --build-arg INSTALL_MYSQL_CLIENT="false" \ + --build-arg AIRFLOW_PRE_CACHED_PIP_PACKAGES="false" \ + --build-arg INSTALL_FROM_DOCKER_CONTEXT_FILES="true" \ + --build-arg AIRFLOW_CONSTRAINTS_LOCATION="/docker-context-files/constraints-2-0.txt" + + +Customizing & extending the image together +.......................................... + +You can combine both - customizing & extending the image. You can build the image first using +``customize`` method (either with docker command or with ``breeze`` and then you can ``extend`` +the resulting image using ``FROM`` any dependencies you want. + +Customizing PYPI installation +............................. + +You can customize PYPI sources used during image build by adding a ``docker-context-files``/``.pypirc`` file +This ``.pypirc`` will never be committed to the repository and will not be present in the final production image. +It is added and used only in the build segment of the image so it is never copied to the final image. + +External sources for dependencies +................................. + +In corporate environments, there is often the need to build your Container images using +other than default sources of dependencies. The docker file uses standard sources (such as +Debian apt repositories or PyPI repository. However, in corporate environments, the dependencies +are often only possible to be installed from internal, vetted repositories that are reviewed and +approved by the internal security teams. In those cases, you might need to use those different +sources. + +This is rather easy if you extend the image - you simply write your extension commands +using the right sources - either by adding/replacing the sources in apt configuration or +specifying the source repository in pip install command. + +It's a bit more involved in the case of customizing the image. We do not have yet (but we are working +on it) a capability of changing the sources via build args. 
However, since the builds use +Dockerfile that is a source file, you can rather easily simply modify the file manually and +specify different sources to be used by either of the commands. + + +Comparing extending and customizing the image +--------------------------------------------- + +Here is the comparison of the two types of building images. + ++----------------------------------------------------+---------------------+-----------------------+ +| | Extending the image | Customizing the image | ++====================================================+=====================+=======================+ +| Produces optimized image | No | Yes | ++----------------------------------------------------+---------------------+-----------------------+ +| Use Airflow Dockerfile sources to build the image | No | Yes | ++----------------------------------------------------+---------------------+-----------------------+ +| Requires Airflow sources | No | Yes | ++----------------------------------------------------+---------------------+-----------------------+ +| You can build it with Breeze | No | Yes | ++----------------------------------------------------+---------------------+-----------------------+ +| Allows to use non-default sources for dependencies | Yes | No [1] | ++----------------------------------------------------+---------------------+-----------------------+ + +[1] When you combine customizing and extending the image, you can use external sources +in the "extend" part. There are plans to add functionality to add external sources +option to image customization. You can also modify Dockerfile manually if you want to +use non-default sources for dependencies. + +More details about the images +----------------------------- + +You can read more details about the images - the context, their parameters and internal structure in the +`IMAGES.rst `_ document. diff --git a/docs/apache-airflow/docker-images-recipes/gcloud.Dockerfile b/docs/docker-stack/docker-images-recipes/gcloud.Dockerfile similarity index 100% rename from docs/apache-airflow/docker-images-recipes/gcloud.Dockerfile rename to docs/docker-stack/docker-images-recipes/gcloud.Dockerfile diff --git a/docs/apache-airflow/docker-images-recipes/hadoop.Dockerfile b/docs/docker-stack/docker-images-recipes/hadoop.Dockerfile similarity index 100% rename from docs/apache-airflow/docker-images-recipes/hadoop.Dockerfile rename to docs/docker-stack/docker-images-recipes/hadoop.Dockerfile diff --git a/docs/docker-stack/entrypoint.rst b/docs/docker-stack/entrypoint.rst new file mode 100644 index 0000000000000..a7889c429787b --- /dev/null +++ b/docs/docker-stack/entrypoint.rst @@ -0,0 +1,201 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. 
+
+Entrypoint
+==========
+
+If you are using the default entrypoint of the production image,
+there are a few actions that are automatically performed when the container starts.
+In some cases, you can pass environment variables to the image to trigger some of that behaviour.
+
+The variables that control the "execution" behaviour start with ``_AIRFLOW`` to distinguish them
+from the variables used to build the image, which start with ``AIRFLOW``.
+
+The image entrypoint works as follows:
+
+* In case the user is not "airflow" (with an undefined user id) and the group id of the user is set to ``0`` (root),
+  then the user is dynamically added to ``/etc/passwd`` at entry, using the ``USER_NAME`` variable to define the user name.
+  This is done to accommodate the
+  `OpenShift Guidelines `_
+
+* ``AIRFLOW_HOME`` is set by default to ``/opt/airflow/`` - this means that DAGs
+  are by default in the ``/opt/airflow/dags`` folder and logs are in the ``/opt/airflow/logs`` folder.
+
+* The working directory is ``/opt/airflow`` by default.
+
+* If the ``AIRFLOW__CORE__SQL_ALCHEMY_CONN`` variable is passed to the container and it is either a MySQL or Postgres
+  SQLAlchemy connection, then the connection is checked and the script waits until the database is reachable.
+  If the ``AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD`` variable is passed to the container, it is evaluated as a
+  command to execute and the result of this evaluation is used as ``AIRFLOW__CORE__SQL_ALCHEMY_CONN``. The
+  ``_CMD`` variable takes precedence over the ``AIRFLOW__CORE__SQL_ALCHEMY_CONN`` variable.
+
+* If no ``AIRFLOW__CORE__SQL_ALCHEMY_CONN`` variable is set, then a SQLite database is created in
+  ``${AIRFLOW_HOME}/airflow.db`` and a db reset is executed.
+
+* If the first argument equals "bash" - you are dropped into a bash shell, or a bash command is executed
+  if you specify extra arguments. For example:
+
+  .. code-block:: bash
+
+    docker run -it apache/airflow:master-python3.6 bash -c "ls -la"
+    total 16
+    drwxr-xr-x 4 airflow root 4096 Jun 5 18:12 .
+    drwxr-xr-x 1 root    root 4096 Jun 5 18:12 ..
+    drwxr-xr-x 2 airflow root 4096 Jun 5 18:12 dags
+    drwxr-xr-x 2 airflow root 4096 Jun 5 18:12 logs
+
+* If the first argument is equal to ``python`` - you are dropped into a python shell, or python commands
+  are executed if you pass extra parameters. For example:
+
+  .. code-block:: bash
+
+    > docker run -it apache/airflow:master-python3.6 python -c "print('test')"
+    test
+
+* If the first argument equals "airflow" - the rest of the arguments are treated as an airflow command
+  to execute. Example:
+
+  .. code-block:: bash
+
+     docker run -it apache/airflow:master-python3.6 airflow webserver
+
+* If there are any other arguments - they are simply passed to the "airflow" command:
+
+  .. code-block:: bash
+
+    > docker run -it apache/airflow:master-python3.6 version
+    2.1.0.dev0
+
+* If the ``AIRFLOW__CELERY__BROKER_URL`` variable is passed and an airflow command with the
+  scheduler, worker or flower command is used, then the script checks the broker connection
+  and waits until the Celery broker is reachable.
+  If the ``AIRFLOW__CELERY__BROKER_URL_CMD`` variable is passed to the container, it is evaluated as a
+  command to execute and the result of this evaluation is used as ``AIRFLOW__CELERY__BROKER_URL``. The
+  ``_CMD`` variable takes precedence over the ``AIRFLOW__CELERY__BROKER_URL`` variable.
+
+Creating system user
+--------------------
+
+The Airflow image is OpenShift compatible, which means that you can start it with a random user ID and group id ``0``.
+
+Airflow will automatically create such a user and make its home directory point to ``/home/airflow``.
+You can read more about it in the "Support arbitrary user ids" chapter in the
+`Openshift best practices `_.
+
+Waits for Airflow DB connection
+-------------------------------
+
+In case a Postgres or MySQL DB is used, the entrypoint will wait until the airflow DB connection becomes
+available. This always happens when you use the default entrypoint.
+
+The script detects the backend type depending on the URL schema and assigns default port numbers if not
+specified in the URL. Then it loops until a connection to the specified host/port can be established.
+It tries ``CONNECTION_CHECK_MAX_COUNT`` times and sleeps ``CONNECTION_CHECK_SLEEP_TIME`` between checks.
+To disable the check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
+
+Supported schemes:
+
+* ``postgres://`` - default port 5432
+* ``mysql://`` - default port 3306
+* ``sqlite://``
+
+In case of the SQLite backend, there is no connection to establish and waiting is skipped.
+
+Upgrading Airflow DB
+--------------------
+
+If you set the ``_AIRFLOW_DB_UPGRADE`` variable to a non-empty value, the entrypoint will run
+the ``airflow db upgrade`` command right after verifying the connection. You can also use this
+when you are running airflow with the internal SQLite database (the default) to upgrade the db and
+create an admin user at the entrypoint, so that you can start the webserver immediately. Note - using
+SQLite is intended only for testing purposes; never use SQLite in production, as it has severe
+limitations when it comes to concurrency.
+
+Creating admin user
+-------------------
+
+The entrypoint can also create a webserver user automatically when the container starts. You need to set
+``_AIRFLOW_WWW_USER_CREATE`` to a non-empty value in order to do that. This is not intended for
+production; it is only useful if you would like to run a quick test with the production image.
+You need to pass at least a password to create such a user, via ``_AIRFLOW_WWW_USER_PASSWORD`` or
+``_AIRFLOW_WWW_USER_PASSWORD_CMD``. As for the other ``*_CMD`` variables, the content of
+the ``*_CMD`` variable is evaluated as a shell command and its output is set as the password.
+
+User creation will fail if none of the ``PASSWORD`` variables are set - there is no default
+password, for security reasons.
+
++-----------+--------------------------+----------------------------------------------------------------------+
+| Parameter | Default                  | Environment variable                                                 |
++===========+==========================+======================================================================+
+| username  | admin                    | ``_AIRFLOW_WWW_USER_USERNAME``                                       |
++-----------+--------------------------+----------------------------------------------------------------------+
+| password  |                          | ``_AIRFLOW_WWW_USER_PASSWORD_CMD`` or ``_AIRFLOW_WWW_USER_PASSWORD`` |
++-----------+--------------------------+----------------------------------------------------------------------+
+| firstname | Airflow                  | ``_AIRFLOW_WWW_USER_FIRSTNAME``                                      |
++-----------+--------------------------+----------------------------------------------------------------------+
+| lastname  | Admin                    | ``_AIRFLOW_WWW_USER_LASTNAME``                                       |
++-----------+--------------------------+----------------------------------------------------------------------+
+| email     | airflowadmin@example.com | ``_AIRFLOW_WWW_USER_EMAIL``                                          |
++-----------+--------------------------+----------------------------------------------------------------------+
+| role      | Admin                    | ``_AIRFLOW_WWW_USER_ROLE``                                           |
++-----------+--------------------------+----------------------------------------------------------------------+
+
+In case the password is specified, the entrypoint will attempt to create the user, but it will not fail
+if the attempt is unsuccessful (this accounts for the case where the user has already been created).
+
+You can, for example, start the webserver in the production image, initializing the internal SQLite
+database and creating an ``admin/admin`` Admin user, with the following command:
+
+.. code-block:: bash
+
+    docker run -it -p 8080:8080 \
+      --env "_AIRFLOW_DB_UPGRADE=true" \
+      --env "_AIRFLOW_WWW_USER_CREATE=true" \
+      --env "_AIRFLOW_WWW_USER_PASSWORD=admin" \
+      apache/airflow:master-python3.8 webserver
+
+or
+
+.. code-block:: bash
+
+    docker run -it -p 8080:8080 \
+      --env "_AIRFLOW_DB_UPGRADE=true" \
+      --env "_AIRFLOW_WWW_USER_CREATE=true" \
+      --env "_AIRFLOW_WWW_USER_PASSWORD_CMD=echo admin" \
+      apache/airflow:master-python3.8 webserver
+
+The commands above initialize the SQLite database, create an ``admin`` user with the ``admin`` password
+and the ``Admin`` role. They also forward the local port ``8080`` to the webserver port and finally
+start the webserver.
+
+Waits for celery broker connection
+----------------------------------
+
+In case ``AIRFLOW__CELERY__BROKER_URL`` is set and one of the ``scheduler``, ``celery``, ``worker``, or
+``flower`` commands is used, the entrypoint will wait until the Celery broker connection is available.
+
+The script detects the backend type depending on the URL schema and assigns default port numbers if not
+specified in the URL. Then it loops until a connection to the specified host/port can be established.
+It tries ``CONNECTION_CHECK_MAX_COUNT`` times and sleeps ``CONNECTION_CHECK_SLEEP_TIME`` between checks.
+To disable the check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
+
+Supported schemes:
+
+* ``amqp(s)://`` (rabbitmq) - default port 5672
+* ``redis://`` - default port 6379
+* ``postgres://`` - default port 5432
+* ``mysql://`` - default port 3306
+* ``sqlite://``
+
+In case of the SQLite backend, there is no connection to establish and waiting is skipped.
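+
+For example, if your deployment tooling already guarantees that the database and the Celery broker are
+reachable, you can skip the waiting loop entirely by setting ``CONNECTION_CHECK_MAX_COUNT`` to ``0``
+(a minimal sketch; the image tag, the Redis URL and the ``celery worker`` command below are only
+illustrative and should be adjusted to your deployment):
+
+.. code-block:: bash
+
+    docker run -it \
+      --env "CONNECTION_CHECK_MAX_COUNT=0" \
+      --env "AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0" \
+      apache/airflow:master-python3.6 celery worker
+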
diff --git a/docs/docker-stack/img/docker-logo.png b/docs/docker-stack/img/docker-logo.png new file mode 100644 index 0000000000000..d83e54a7e9ce5 Binary files /dev/null and b/docs/docker-stack/img/docker-logo.png differ diff --git a/docs/docker-stack/index.rst b/docs/docker-stack/index.rst new file mode 100644 index 0000000000000..29a7daf1b4d69 --- /dev/null +++ b/docs/docker-stack/index.rst @@ -0,0 +1,54 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +.. image:: /img/docker-logo.png + :width: 100 + +Docker Image for Apache Airflow +=============================== + +.. toctree:: + :hidden: + + Home + build + entrypoint + recipes + +.. toctree:: + :hidden: + :caption: References + + build-arg-ref + +For the ease of deployment in production, the community releases a production-ready reference container +image. + +The docker image provided (as convenience binary package) in the +`apache/airflow DockerHub `_ is a bare image +that has a few external dependencies and extras installed.. + +The Apache Airflow image provided as convenience package is optimized for size, so +it provides just a bare minimal set of the extras and dependencies installed and in most cases +you want to either extend or customize the image. You can see all possible extras in +:doc:`extra-packages-ref`. The set of extras used in Airflow Production image are available in the +`Dockerfile `_. + +The production images are build in DockerHub from released version and release candidates. There +are also images published from branches but they are used mainly for development and testing purpose. +See `Airflow Git Branching `_ +for details. diff --git a/docs/docker-stack/recipes.rst b/docs/docker-stack/recipes.rst new file mode 100644 index 0000000000000..8b89a3ef1bae8 --- /dev/null +++ b/docs/docker-stack/recipes.rst @@ -0,0 +1,70 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Recipes +======= + +Users sometimes share interesting ways of using the Docker images. 
We encourage users to contribute these +recipes to the documentation in case they prove useful to other members of the community by +submitting a pull request. The sections below capture this knowledge. + +Google Cloud SDK installation +----------------------------- + +Some operators, such as :class:`~airflow.providers.google.cloud.operators.kubernetes_engine.GKEStartPodOperator`, +:class:`~airflow.providers.google.cloud.operators.dataflow.DataflowStartSqlJobOperator`, require +the installation of `Google Cloud SDK `__ (includes ``gcloud``). +You can also run these commands with BashOperator. + +Create a new Dockerfile like the one shown below. + +.. exampleinclude:: /docker-images-recipes/gcloud.Dockerfile + :language: dockerfile + +Then build a new image. + +.. code-block:: bash + + docker build . \ + --build-arg BASE_AIRFLOW_IMAGE="apache/airflow:2.0.1" \ + -t my-airflow-image + + +Apache Hadoop Stack installation +-------------------------------- + +Airflow is often used to run tasks on Hadoop cluster. It required Java Runtime Environment (JRE) to run. +Below are the steps to take tools that are frequently used in Hadoop-world: + +- Java Runtime Environment (JRE) +- Apache Hadoop +- Apache Hive +- `Cloud Storage connector for Apache Hadoop `__ + + +Create a new Dockerfile like the one shown below. + +.. exampleinclude:: /docker-images-recipes/hadoop.Dockerfile + :language: dockerfile + +Then build a new image. + +.. code-block:: bash + + docker build . \ + --build-arg BASE_AIRFLOW_IMAGE="apache/airflow:2.0.1" \ + -t my-airflow-image diff --git a/docs/exts/airflow_intersphinx.py b/docs/exts/airflow_intersphinx.py index 2f80fbb76b626..135b8d91a7f58 100644 --- a/docs/exts/airflow_intersphinx.py +++ b/docs/exts/airflow_intersphinx.py @@ -71,14 +71,15 @@ def _generate_provider_intersphinx_mapping(): f'/docs/apache-airflow/{current_version}/', (doc_inventory if os.path.exists(doc_inventory) else cache_inventory,), ) - - if os.environ.get('AIRFLOW_PACKAGE_NAME') != 'apache-airflow-providers': - doc_inventory = f'{DOCS_DIR}/_build/docs/apache-airflow-providers/objects.inv' - cache_inventory = f'{DOCS_DIR}/_inventory_cache/apache-airflow-providers/objects.inv' + for pkg_name in ['apache-airflow-providers', 'docker-stack']: + if os.environ.get('AIRFLOW_PACKAGE_NAME') == pkg_name: + continue + doc_inventory = f'{DOCS_DIR}/_build/docs/{pkg_name}/objects.inv' + cache_inventory = f'{DOCS_DIR}/_inventory_cache/{pkg_name}/objects.inv' airflow_mapping['apache-airflow-providers'] = ( # base URI - '/docs/apache-airflow-providers/', + f'/docs/{pkg_name}/', (doc_inventory if os.path.exists(doc_inventory) else cache_inventory,), ) diff --git a/docs/exts/docs_build/dev_index_template.html.jinja2 b/docs/exts/docs_build/dev_index_template.html.jinja2 index 8ab8183039fe5..e6d1d09c10aec 100644 --- a/docs/exts/docs_build/dev_index_template.html.jinja2 +++ b/docs/exts/docs_build/dev_index_template.html.jinja2 @@ -68,6 +68,17 @@ +
+            Docker - logo
+            Docker image
+            It provides an efficient, lightweight, self-contained environment and guarantees that software will always run the same no matter where it is deployed.
Helm Chart - logo @@ -78,7 +89,6 @@ It will help you set up your own Airflow on a cloud/on-prem k8s environment and leverage its scalable nature to support a large group of users. Thanks to Kubernetes, we are not tied to a specific cloud provider.

diff --git a/docs/exts/docs_build/docs_builder.py b/docs/exts/docs_build/docs_builder.py index 42e9ad92d642e..4035c97cf9d15 100644 --- a/docs/exts/docs_build/docs_builder.py +++ b/docs/exts/docs_build/docs_builder.py @@ -54,9 +54,9 @@ def _doctree_dir(self) -> str: @property def is_versioned(self): """Is current documentation package versioned?""" - # Disable versioning. This documentation does not apply to any issued product and we can update + # Disable versioning. This documentation does not apply to any released product and we can update # it as needed, i.e. with each new package of providers. - return self.package_name != 'apache-airflow-providers' + return self.package_name not in ('apache-airflow-providers', 'docker-stack') @property def _build_dir(self) -> str: @@ -241,4 +241,10 @@ def get_available_providers_packages(): def get_available_packages(): """Get list of all available packages to build.""" provider_package_names = get_available_providers_packages() - return ["apache-airflow", *provider_package_names, "apache-airflow-providers", "helm-chart"] + return [ + "apache-airflow", + *provider_package_names, + "apache-airflow-providers", + "helm-chart", + "docker-stack", + ] diff --git a/docs/exts/docs_build/fetch_inventories.py b/docs/exts/docs_build/fetch_inventories.py index e9da26442bd54..5af2f69bbb699 100644 --- a/docs/exts/docs_build/fetch_inventories.py +++ b/docs/exts/docs_build/fetch_inventories.py @@ -80,12 +80,13 @@ def fetch_inventories(): f'{CACHE_DIR}/apache-airflow/objects.inv', ) ) - to_download.append( - ( - S3_DOC_URL_NON_VERSIONED.format(package_name='apache-airflow-providers'), - f'{CACHE_DIR}/apache-airflow-providers/objects.inv', + for pkg_name in ['apache-airflow-providers', 'docker-stack']: + to_download.append( + ( + S3_DOC_URL_NON_VERSIONED.format(package_name=pkg_name), + f'{CACHE_DIR}/{pkg_name}/objects.inv', + ) ) - ) to_download.extend( ( f"{doc_url}/objects.inv",