Building a custom Docker image

This page shows how to build a custom Docker image for your script tasks.

You can bake all dependencies needed for your script tasks directly into Kestra's base image. Here is an example that installs Python dependencies:

dockerfile

FROM kestra/kestra:latest-full

USER root
RUN apt-get update -y && apt-get install pip -y

RUN pip install --no-cache-dir pandas requests boto3

Then, point to that Dockerfile in your docker-compose.yml file:

yaml

services:
  kestra:
    build:
      context: .
      dockerfile: Dockerfile
    image: kestra-python:latest

Once you start Kestra containers using docker compose up -d, you can create a flow that directly runs Python tasks with your custom dependencies using the PROCESS runner:

yaml

id: python_process
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    runner: PROCESS
    script: |
      import pandas as pd
      import requests
      import boto3
      print(f"Pandas version: {pd.__version__}")
      print(f"Requests version: {requests.__version__}")
      print(f"Boto3 version: {boto3.__version__}")
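
Before wiring the image into a flow, it can help to confirm that the dependencies really made it into the image. The following is a minimal sketch, not part of Kestra: the helper names `installed_versions` and `missing_packages` are hypothetical, and it relies only on the standard library's `importlib.metadata`. It could be run inside the container or pasted into the script property of a Python task.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def missing_packages(packages):
    """Return the names that are not installed in the current environment."""
    found = installed_versions(packages)
    return [name for name, version in found.items() if version is None]


if __name__ == "__main__":
    # The packages installed by the Dockerfile above.
    required = ["pandas", "requests", "boto3"]
    for name, version in installed_versions(required).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Unlike a plain import, this check reports every missing package in one pass instead of stopping at the first ImportError.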

Building a custom Docker image for your script tasks

Imagine you use the following flow:

yaml

id: zip_to_python
namespace: company.team

variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"

tasks:
  - id: get_zipfile
    type: io.kestra.plugin.core.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"

  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"

  - id: parquet_output
    type: io.kestra.plugin.scripts.python.Script
    warningOnStdErr: false
    runner: DOCKER
    docker:
      image: ghcr.io/kestra-io/pydata:latest
    env:
      FILE_ID: "{{ render(vars.file_id) }}"
    inputFiles: "{{ outputs.unzip.files }}"
    script: |
      import os
      import pandas as pd

      file_id = os.environ["FILE_ID"]
      file = f"{file_id}-divvy-tripdata.csv"

      df = pd.read_csv(file)
      df.to_parquet(f"{file_id}.parquet")
    outputFiles:
      - "*.parquet"
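
The file_id variable in this flow takes the execution start date, moves it back three months, and formats it as yyyyMM. As a rough illustration of that arithmetic, here is a Python sketch. It is an approximation of the templating expression, not Kestra's implementation; the function names `file_id` and `download_url` are hypothetical, and the sketch assumes the date filter subtracts whole calendar months.

```python
from datetime import datetime


def file_id(start: datetime, months_back: int = 3) -> str:
    """Shift `start` back by whole calendar months and format it as yyyyMM."""
    # Count months since year 0 so that crossing a year boundary
    # becomes plain integer arithmetic.
    total = start.year * 12 + (start.month - 1) - months_back
    year, month_index = divmod(total, 12)
    return f"{year:04d}{month_index + 1:02d}"


def download_url(start: datetime) -> str:
    """Build the archive URL used by the get_zipfile task for a given start date."""
    return f"https://divvy-tripdata.s3.amazonaws.com/{file_id(start)}-divvy-tripdata.zip"


if __name__ == "__main__":
    start = datetime(2024, 2, 15)
    print(file_id(start))       # 202311
    print(download_url(start))
```

An execution started in February 2024 therefore downloads the November 2023 archive, and the Python task later reads the matching CSV through the FILE_ID environment variable.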

The Python task requires pandas, a large library that is not included in the default Python image. In this case, you have the following options:

1) Install pandas in the beforeCommands property of the Python task.
2) Use one of our pre-built images that already include pandas, such as the ghcr.io/kestra-io/pydata:latest image.
3) Build your own custom Docker image that includes pandas.

1) Installing pandas in the beforeCommands property

yaml

id: install_pandas_at_runtime
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    runner: PROCESS
    beforeCommands:
      - pip install pyarrow pandas
    script: |
      import pandas as pd
      print(f"Pandas version: {pd.__version__}")

2) Using one of our pre-built images

yaml

id: use_prebuilt_image
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    runner: DOCKER
    docker:
      image: ghcr.io/kestra-io/pydata:latest
    script: |
      import pandas as pd
      print(f"Pandas version: {pd.__version__}")

3) Building a custom Docker image

If you want to build a custom Docker image for some of your scripts, first create a Dockerfile:

dockerfile

FROM python:3.11-slim
RUN pip install --upgrade pip
RUN pip install --no-cache-dir kestra requests pyarrow pandas amazon-ion

Then, build the image:

bash

docker build -t kestra-custom:latest .

Finally, use that image in your flow:

yaml

id: zip_to_python
namespace: company.team

variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"

tasks:
  - id: get_zipfile
    type: io.kestra.plugin.core.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"

  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"

  - id: parquet_output
    type: io.kestra.plugin.scripts.python.Script
    warningOnStdErr: false
    runner: DOCKER
    docker:
      image: kestra-custom:latest # ⚡️ Use your custom image here
      pullPolicy: NEVER # ⚡️ Use the local image instead of pulling it from Docker Hub
    env:
      FILE_ID: "{{ render(vars.file_id) }}"
    inputFiles: "{{ outputs.unzip.files }}"
    script: |
      import os
      import pandas as pd

      file_id = os.environ["FILE_ID"]
      file = f"{file_id}-divvy-tripdata.csv"

      df = pd.read_csv(file)
      df.to_parquet(f"{file_id}.parquet")
    outputFiles:
      - "*.parquet"

Note the pullPolicy: NEVER property: it makes Kestra use the local image instead of trying to pull it from Docker Hub.
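
Because the image is never pulled, the task can only start if the image already exists in the local image store of the Docker daemon that Kestra talks to. A quick pre-flight check might look like the sketch below. It is an illustration under assumptions: the helper name `image_exists_locally` is hypothetical, it shells out to the standard `docker image inspect` command (which exits with a non-zero status when the image is unknown), and it must be run on the same host as that Docker daemon.

```python
import subprocess


def image_exists_locally(image: str, run=subprocess.run) -> bool:
    """Return True if `docker image inspect` finds the image in the local store."""
    try:
        result = run(
            ["docker", "image", "inspect", image],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The docker CLI itself is not available on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    name = "kestra-custom:latest"
    print(f"{name} available locally: {image_exists_locally(name)}")
```

The `run` parameter exists only so the function can be exercised without Docker; in normal use the default `subprocess.run` is enough.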
