Mastering Apache Airflow, Part 6: Secrets Backends, Connections, and Deployment Architectures
From Dev to Prod the Right Way: Securing Secrets, Managing Connections, and Architecting Reliable Deployments
Giving Context 🔒⚙️🌍
Hey everyone! How are you?
After diving deep into Airflow’s executors and authentication backends, it’s time we shift gears and focus on something a little less flashy—but absolutely essential for running Airflow reliably in production: secrets, connections, and deployment architecture.
This chapter is all about production hygiene. Not the glamorous stuff. But the kind of engineering work that, if done right, quietly keeps things running smoothly—and if done wrong, absolutely has the potential to wake you up at 3 a.m. when a critical DAG fails because a secret expired, a connection string was misconfigured, or your scheduler node crashed under load.
We’re going to unpack the “ops” side of Airflow—what it takes to make your orchestration system secure, reliable, and scalable in the real world. Specifically, we’ll cover:
Secrets Backends – How to keep credentials, tokens, and passwords out of your metadata DB and config files—and manage them the right way using tools like Vault, AWS Secrets Manager, or GCP Secret Manager.
Connections Management – How Airflow abstracts access to external systems (like databases and cloud APIs), and how to organize, inject, and validate those connections with confidence.
Deployment Architectures – The three most common ways to run Airflow in dev, staging, and production environments—including local setups, Docker Compose, Kubernetes, and managed offerings.
Each of these pieces plays a vital role in turning your Airflow project from a cool local prototype into a robust, team-friendly orchestration layer that can handle real-world complexity, safely and predictably.
Let’s start at the foundation: managing secrets safely🗝️
🔐 Secrets Backends: Keep Your Credentials Off Disk
When running Airflow in production, one of the first (and most important) rules of thumb is: don’t store secrets in plaintext. That means no database passwords in airflow.cfg, no API keys in your DAG files, and definitely no access tokens hardcoded into your environment.
Why? Because secrets are sensitive by their very nature—they grant access to your most critical systems. A leaked API key could mean someone pushes to your production Kafka topic, accesses your warehouse, or drains your billing account.
To address this, Airflow ships with native support for pluggable secrets backends: external systems built specifically to store and serve secrets in a secure, auditable, and encrypted manner.
✅ Supported Secrets Backends (Out of the Box)
Airflow supports several backends for secure secret management. You enable one of them via configuration, and Airflow automatically falls back to environment variables and the metadata DB when a secret isn’t found there:
Environment Variables – EnvironmentVariablesBackend (simple, but lacks lifecycle management)
HashiCorp Vault – Ideal for enterprise-grade secret lifecycle and access control
AWS Secrets Manager – Fully integrated with AWS IAM and KMS
Google Secret Manager – Integrated with GCP IAM and audit logging
You can configure them using the [secrets] section in your airflow.cfg or via environment variables.
Here’s an example using Vault:
[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"connections_path": "connections", "variables_path": "variables", "mount_point": "airflow", "url": "http://127.0.0.1:8200"}
Want fallback behavior (e.g., from Vault down to environment variables or the metadata DB)? You don’t chain backends in the config—Airflow accepts a single backend class in [secrets] and automatically falls back to environment variables and then to the metadata DB when that backend doesn’t hold a secret. For file-based secrets in development, point backend at airflow.secrets.local_filesystem.LocalFilesystemBackend instead.
Each source is queried in order until a matching secret is found.
🔒 Why Use a Secrets Backend?
Here’s what you gain by delegating secrets to a dedicated backend:
✅ Centralized management – Secrets are stored in one place and updated without touching Airflow deployments
✅ Rotation & expiration – You can rotate secrets regularly and enforce TTLs
✅ Access logging – Most secret managers provide audit trails of who accessed what, and when
✅ No plaintext exposure – Secrets aren’t saved to disk, config files, or the metadata DB
✅ Runtime injection – Secrets can be retrieved at task execution time with scoped access
In short, you can reduce risk while gaining greater control and more visibility.
🔁 Secrets Lookup Order
When Airflow retrieves a secret (such as a connection or variable), it follows a clear precedence order:
1. Configured secrets backend (Vault, AWS Secrets Manager, GCP Secret Manager, etc.)
2. Environment variables (AIRFLOW_CONN_* and AIRFLOW_VAR_*)
3. Metadata DB – connections and variables defined via the UI or CLI
This means a secret stored in Vault will override one with the same ID defined in the Airflow UI.
🧠 Naming Patterns and Best Practices
Airflow looks for secrets using specific naming conventions, depending on what it's loading:
Connections:
airflow/connections/<conn_id>
Variables:
airflow/variables/<var_key>
Examples:
airflow/connections/my_postgres
airflow/variables/slack_token
Stick to these patterns so your secrets backend can serve them properly.
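The payoff is that your DAG code never has to know where a secret physically lives. Here’s a minimal sketch, assuming the my_postgres connection and slack_token variable from the examples above actually exist in your configured backend:
from airflow.hooks.base import BaseHook
from airflow.models import Variable

# Both lookups go through the secrets search path:
# configured backend first, then environment variables, then the metadata DB.
pg_conn = BaseHook.get_connection("my_postgres")   # resolves airflow/connections/my_postgres
slack_token = Variable.get("slack_token")          # resolves airflow/variables/slack_token

print(pg_conn.host, pg_conn.login)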
💡 Pro Tip: Secrets Expire—Plan for It
Most secret managers support secret rotation or TTL-based expiration. That’s great, but here’s a catch: Airflow workers can cache secrets—especially if they’re running long-lived processes or persistent workers (e.g., Celery).
If you rotate a secret in Vault or AWS, make sure to reload your workers or set up a refresh mechanism. Otherwise, Airflow might still use the stale version.
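One pattern that softens the staleness problem is to resolve credentials inside the task callable instead of at DAG parse time, so each run asks the secrets backend for the current value. A rough sketch (the task name and the my_postgres connection are illustrative):
from airflow.decorators import task
from airflow.hooks.base import BaseHook

@task
def export_report():
    # Resolved when the task executes, not when the DAG file is parsed,
    # so a rotated credential is picked up on the next run.
    conn = BaseHook.get_connection("my_postgres")
    # ... connect using conn.host, conn.login, conn.password ...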
Proper secret management is one of those invisible wins: you don’t always see it working, but it’s silently keeping your pipelines secure and your credentials safe. And when a compliance audit or a new teammate asks, “Where do we store credentials?”—you’ll have a solid answer.
🛠️ Connection Management: Don’t Let Broken Links Sink Your DAGs
Airflow's ability to orchestrate workflows depends on how well it integrates with the outside world—databases, cloud services, message queues, APIs. And at the center of that communication is the humble yet powerful abstraction: the Connection object.
Connections define how Airflow talks to external systems. They include everything from hostnames and ports to authentication tokens and extra configuration fields.
🔌 How Are Connections Defined?
Airflow gives you multiple ways to define connections, depending on your workflow and environment:
1. UI (Admin → Connections)
Ideal for manual setups or early prototyping. Add credentials directly in the web interface.
2. CLI
Perfect for automation or version-controlled deployments:
airflow connections add 'my_db' \
    --conn-uri 'postgres://user:pass@host:5432/db'
3. Environment Variables
This is especially useful in containerized or serverless environments like Docker or Kubernetes:
export AIRFLOW_CONN_MY_DB=postgres://user:pass@host:5432/db
Airflow will automatically pick up any variable that starts with AIRFLOW_CONN_ and convert it to a Connection at runtime.
4. Secrets Backends
The most secure and scalable method. Define connections in HashiCorp Vault, AWS Secrets Manager, or GCP Secret Manager, and serve them to Airflow dynamically.
Use the naming convention:
airflow/connections/my_db
✅ Best Practices for Managing Connections
Following best practices for connection management ensures that your DAGs are portable, secure, and robust:
🧱 Use Namespaces
Stick to standard names like aws_default, google_cloud_default, and slack_default—Airflow providers often look for these by default.
🔐 Inject From Secrets
Avoid hardcoding passwords in code or config files. Use a secrets backend instead to rotate, encrypt, and audit access.
🧪 Validate Critical Connections
Use DAG-level or startup-time checks to verify that critical connections (like your warehouse or alerting service) are reachable. This can prevent painful runtime failures (see the sketch after this list).
📑 Document in the UI
The Connection UI supports descriptions—use them to clarify usage (e.g., “Used by all ingestion DAGs” or “Slack webhook for ops alerts”).
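To make the validation point concrete, here’s a minimal sketch of a fail-fast check task you could run at the start of a critical DAG (the connection IDs are placeholders—swap in your own):
from airflow.decorators import task
from airflow.hooks.base import BaseHook

@task
def check_critical_connections():
    # Fails the DAG run early if a required connection is missing or incomplete.
    for conn_id in ["my_warehouse", "slack_default"]:
        conn = BaseHook.get_connection(conn_id)  # raises if the connection doesn't exist
        if not conn.host:
            raise ValueError(f"Connection '{conn_id}' has no host configured")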
🔄 Dynamically Creating or Overriding Connections
Airflow also allows programmatic connection management. This is handy in environments where DAGs must adjust dynamically, for example in:
Multi-tenant architectures
Testing environments
Ephemeral deployments
Here’s how to create a connection at runtime:
from airflow.models import Connection
from airflow import settings

session = settings.Session()
conn = Connection(
    conn_id='slack_default',
    conn_type='http',
    host='https://hooks.slack.com',
    password='your-token-goes-here'
)
session.add(conn)
session.commit()
You can combine this with BaseHook.get_connection() to read existing connection details and patch or replace them before a task runs.
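If you’d rather update an existing connection than risk inserting a duplicate conn_id, an upsert-style helper works well. This is a rough sketch using plain SQLAlchemy against the metadata DB, not an official Airflow API:
from airflow import settings
from airflow.models import Connection

def upsert_connection(conn_id, conn_type, host, password=None):
    # Update the existing row if the conn_id is already registered, otherwise create it.
    session = settings.Session()
    conn = session.query(Connection).filter(Connection.conn_id == conn_id).one_or_none()
    if conn is None:
        conn = Connection(conn_id=conn_id, conn_type=conn_type, host=host, password=password)
        session.add(conn)
    else:
        conn.conn_type = conn_type
        conn.host = host
        conn.password = password
    session.commit()
    session.close()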
Pro Tip: Connection Caching and Refresh
If you're injecting connections dynamically or from a secrets backend, remember: Airflow may cache them in memory—especially with long-lived workers like Celery. If you update a connection mid-run, it may not take effect until the worker restarts.
🏗️ Deployment Architectures: From Laptop to Cloud
As you probably recall, over the past weeks, we’ve explored Airflow’s inner mechanics—from task execution and DAG orchestration to XComs, secrets, and plugins. Now it’s time to bring it all together.
Because no matter how well you design your DAGs, production success depends on your deployment architecture.
There’s no “one-size-fits-all” strategy here. Instead, we’ll walk through three canonical environments—Dev, Staging, and Production—each one representing a milestone in your team’s Airflow maturity.
🧪 Dev: The Solo Flight
SequentialExecutor + SQLite, or LocalExecutor + PostgreSQL (SQLite only works with the SequentialExecutor)
This is your playground: where new DAGs are born and break without consequence.
🔧 No Celery or Kubernetes
🖥️ Scheduler and webserver run on the same host
⚡ DAGs run immediately using local processes
📦 Simple to spin up:
docker-compose -f docker-compose.yaml up
This setup is great for:
Learning the Airflow interface
Validating DAG logic or hooks
Onboarding new team members
But it’s not built for scale—and definitely not for concurrency beyond a few DAGs.
🏗️ Staging: Simulate Scale Without K8s
CeleryExecutor + PostgreSQL + Redis
Here we step into distributed orchestration.
🧵 Workers are separate processes or nodes
💬 Message broker (Redis or RabbitMQ) coordinates task queues
🗃️ Metadata DB (PostgreSQL/MySQL) stores DAG state and execution history
🧰 Best run via Docker Compose or Helm for simplicity
This is ideal for:
Load testing and concurrency tuning
Multi-tenant DAG testing
Previewing production environments without full Kubernetes complexity
Staging is the bridge between development freedom and operational discipline.
☁️ Production: Cloud-Native & Scalable
KubernetesExecutor or CeleryExecutor on Kubernetes
This is Airflow in its final form: containerized, resilient, and cloud-ready.
Two common options:
KubernetesExecutor – ephemeral, container-based task execution
CeleryExecutor – persistent worker model with autoscaling
You can either:
Use a managed service like MWAA, Cloud Composer, or Astronomer
Roll your own using Helm and Terraform
✅ Key Components:
Executor – KubernetesExecutor / CeleryExecutor
Metadata DB – PostgreSQL / Cloud SQL / RDS
Broker – Redis / RabbitMQ
Secrets – Vault, AWS Secrets Manager, GCP Secret Manager
Logging – S3, GCS, or Elasticsearch
Monitoring – Prometheus, Grafana, StatsD
CI/CD – DAG validation + auto-deploy
Example: Helm Snippet for Production
executor: KubernetesExecutor
logs:
  persistence:
    enabled: false
Bonus: Add CI Checks for Your DAGs
A simple, dependency-light option is a DagBag import test that runs in your CI pipeline and fails the build on broken DAGs. This catches errors before they land in production.
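A minimal sketch with pytest (the file path and dags/ folder are assumptions—adjust them to your repo), run in CI with python -m pytest tests/:
# tests/test_dag_integrity.py
from airflow.models import DagBag

def test_no_import_errors():
    # Parse every DAG file the same way the scheduler would.
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    assert dag_bag.import_errors == {}, f"Broken DAGs found: {dag_bag.import_errors}"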
🧩 Wrapping Up: Airflow as a Scalable Platform
By now, you’ve probably realized: mastering Airflow isn’t just about DAG syntax—it’s about building a platform.
A production-ready Airflow setup is:
🔐 Secure – secrets are stored in encrypted, auditable systems
⚙️ Connected – integrations are validated, consistent, and documented
📈 Scalable – infrastructure adapts to load with CI/CD and observability in place
This final layer—deployment architecture—turns Airflow from a task scheduler into a reliable, enterprise-grade orchestrator.
🎉 Final Thoughts: End of the Series
This article marks the conclusion of our Mastering Apache Airflow series. From DAG internals and execution models to XComs, connections, secrets, and deployment—you now have a complete blueprint to run Airflow like a pro.
Whether you're just getting started or already scaling workflows across teams, I hope this series helped demystify the platform and unlock its full potential.
If you found this series valuable, share it with your team—or drop a comment to let me know what you’d like to explore next.
Until then:
Stay curious. Stay secure. And keep scheduling smart. ⏱️🚀