Building a custom Docker image
This page shows how to build a custom Docker image for your script tasks.
You can bake all dependencies needed for your script tasks directly into the Kestra's base image. Here is an example installing Python dependencies:
FROM kestra/kestra:latest-full
USER root
RUN apt-get update -y && apt-get install pip -y
RUN pip install --no-cache-dir pandas requests boto3
Then, point to that Dockerfile in your docker-compose.yml
file:
services:
kestra:
build:
context: .
dockerfile: Dockerfile
image: kestra-python:latest
Once you start Kestra containers using docker compose up -d
, you can create a flow that directly runs Python tasks with your custom dependencies using the PROCESS
runner:
id: python_process
namespace: company.team
tasks:
- id: custom_dependencies
type: io.kestra.plugin.scripts.python.Script
runner: PROCESS
script: |
import pandas as pd
import requests
import boto3
print(f"Pandas version: {pd.__version__}")
print(f"Requests version: {requests.__version__}")
print(f"Boto3 version: {boto3.__version__}")
Building a custom Docker image for your script tasks
Imagine you use the following flow:
id: zip_to_python
namespace: company.team
variables:
file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"
tasks:
- id: get_zipfile
type: io.kestra.plugin.core.http.Download
uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"
- id: unzip
type: io.kestra.plugin.compress.ArchiveDecompress
algorithm: ZIP
from: "{{ outputs.get_zipfile.uri }}"
- id: parquet_output
type: io.kestra.plugin.scripts.python.Script
warningOnStdErr: false
runner: DOCKER
docker:
image: ghcr.io/kestra-io/pydata:latest
env:
FILE_ID: "{{ render(vars.file_id) }}"
inputFiles: "{{ outputs.unzip.files }}"
script: |
import os
import pandas as pd
file_id = os.environ["FILE_ID"]
file = f"{file_id}-divvy-tripdata.csv"
df = pd.read_csv(file)
df.to_parquet(f"{file_id}.parquet")
outputFiles:
- "*.parquet"
The Python task requires pandas to be installed. Pandas is a large library and it's not included in the default python
image. In this case, you have the following options:
- Install pandas in the
beforeCommands
property of the Python task. - Use one of our pre-built images that already include pandas, such as the
ghcr.io/kestra-io/pydata:latest
image. - Build your own custom Docker image that includes pandas.
1) Installing pandas in the beforeCommands
property
id: install_pandas_at_runtime
namespace: company.team
tasks:
- id: custom_dependencies
type: io.kestra.plugin.scripts.python.Script
runner: PROCESS
beforeCommands:
- pip install pyarrow pandas
script: |
import pandas as pd
print(f"Pandas version: {pd.__version__}")
2) Using one of our pre-built images
id: use_prebuilt_image
namespace: company.team
tasks:
- id: custom_dependencies
type: io.kestra.plugin.scripts.python.Script
runner: DOCKER
docker:
image: ghcr.io/kestra-io/pydata:latest
script: |
import pandas as pd
print(f"Pandas version: {pd.__version__}")
3) Building a custom Docker image
If you want to build a custom Docker image for some of your scripts, first create a Dockerfile:
FROM python:3.11-slim
RUN pip install --upgrade pip
RUN pip install --no-cache-dir kestra requests pyarrow pandas amazon-ion
Then, build the image:
docker build -t kestra-custom:latest .
Finally, use that image in your flow:
id: zip_to_python
namespace: company.team
variables:
file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"
tasks:
- id: get_zipfile
type: io.kestra.plugin.core.http.Download
uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"
- id: unzip
type: io.kestra.plugin.compress.ArchiveDecompress
algorithm: ZIP
from: "{{ outputs.get_zipfile.uri }}"
- id: parquet_output
type: io.kestra.plugin.scripts.python.Script
warningOnStdErr: false
runner: DOCKER
docker:
image: kestra-custom:latest # ⚡️ Use your custom image here
pullPolicy: NEVER # ⚡️ Use the local image instead of pulling it from DockerHub
env:
FILE_ID: "{{ render(vars.file_id) }}"
inputFiles: "{{ outputs.unzip.files }}"
script: |
import os
import pandas as pd
file_id = os.environ["FILE_ID"]
file = f"{file_id}-divvy-tripdata.csv"
df = pd.read_csv(file)
df.to_parquet(f"{file_id}.parquet")
outputFiles:
- "*.parquet"
Note how we use the pullPolicy: NEVER
property to make sure that Kestra uses the local image instead of trying to pull it from DockerHub.
Was this page helpful?