Snowflake

SnowflakeSource (src/daedalus/catalog/table.py) reads from Snowflake via query pushdown: the query runs in Snowflake, its result streams back as an Arrow table, and that table is registered as a local DuckDB view that the rest of the pipeline reads. Daedalus performs no Snowflake execution beyond the pushed query, in line with the roadmap's gated Snowflake scope.

An optional, gated backend

Snowflake support is optional. The connector is kept out of the core dependencies, and both snowflake-connector-python and cryptography are imported lazily inside SnowflakeSource, so import daedalus never requires either package.

Install on demand:

bash
uv sync --extra snowflake
# snowflake-connector-python>=4.6.0 + cryptography>=43.0.0

Key-pair (JWT) authentication

SnowflakeSource authenticates with key-pair JWT auth (authenticator="SNOWFLAKE_JWT"). You supply the PEM private key content (private_key_pem); at connect time it is loaded, converted to DER-encoded PKCS#8, and passed to the connector as private_key. No password is used.

The private key is a secret: supply it via ${ENV} and never inline it. to_dict() masks it (see the warning below).
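A minimal sketch of the PEM to DER conversion and the resulting connector arguments, assuming an unencrypted PEM key. The function names `pem_to_der_pkcs8` and `jwt_connect_kwargs` are hypothetical; the `cryptography` serialization calls and the connector's `authenticator` / `private_key` arguments are the standard ones for key-pair auth, but how SnowflakeSource wires them internally is not shown in this page.

```python
from typing import Optional


def pem_to_der_pkcs8(private_key_pem: str) -> bytes:
    """Convert a PEM private key to DER-encoded PKCS#8 bytes."""
    # Lazy import: cryptography is only needed when a connection is made.
    from cryptography.hazmat.primitives import serialization

    key = serialization.load_pem_private_key(
        private_key_pem.encode("utf-8"),
        password=None,  # assumption: the PEM is not passphrase-protected
    )
    return key.private_bytes(
        encoding=serialization.Encoding.DER,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    )


def jwt_connect_kwargs(
    account: str,
    user: str,
    warehouse: str,
    private_key_der: bytes,
    database: Optional[str] = None,
    schema: Optional[str] = None,
    role: Optional[str] = None,
) -> dict:
    """Build connector kwargs for key-pair JWT auth (no password key)."""
    kwargs = {
        "account": account,
        "user": user,
        "warehouse": warehouse,
        "authenticator": "SNOWFLAKE_JWT",
        "private_key": private_key_der,
    }
    # Optional settings are included only when they are set.
    for name, value in (("database", database), ("schema", schema), ("role", role)):
        if value:
            kwargs[name] = value
    return kwargs
```

The resulting dict would be passed as `snowflake.connector.connect(**kwargs)`.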

Pushdown only → Arrow → local DuckDB view

SnowflakeSource supports source-level query pushdown only; there is no table_name path. build_source requires a query and dispatches to SnowflakeSource when database_path is a snowflake://<account> URI, parsing the account out of that URI. The flow:

  1. Connect to Snowflake with the JWT key-pair kwargs (account / user / warehouse, plus optional database / schema / role).
  2. Run the pushed query; fetch the result as an Arrow table (fetch_arrow_all).
  3. Register that Arrow table as a local DuckDB view named snowflake_<digest>, the name that sql_expr() returns. Downstream operators read it like any other DuckDB relation.
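The three steps above can be sketched as follows. This is a hedged illustration rather than the actual SnowflakeSource code: `view_name_for` and `load_query_as_view` are hypothetical names, and the digest scheme (a truncated SHA-256 of the query text) is an assumption, since this page does not say how the digest is computed. `duck_con` is assumed to be an open DuckDB connection.

```python
import hashlib


def view_name_for(query: str) -> str:
    """Derive a stable view name from the query text."""
    # Assumption: the digest is a truncated SHA-256 of the query.
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()[:12]
    return f"snowflake_{digest}"


def load_query_as_view(duck_con, connect_kwargs: dict, query: str) -> str:
    """Run `query` in Snowflake and expose the result as a DuckDB view."""
    # Lazy import: the connector is only needed on this code path.
    import snowflake.connector

    # Step 1: connect with the JWT key-pair kwargs.
    sf_con = snowflake.connector.connect(**connect_kwargs)
    try:
        cur = sf_con.cursor()
        try:
            # Step 2: run the pushed query and fetch the result as Arrow.
            cur.execute(query)
            arrow_table = cur.fetch_arrow_all()
        finally:
            cur.close()
    finally:
        sf_con.close()

    if arrow_table is None:
        # The connector may return None for an empty result set.
        raise ValueError("query returned no rows; nothing to register")

    # Step 3: register the Arrow table as a local DuckDB view.
    name = view_name_for(query)
    duck_con.register(name, arrow_table)
    return name
```

After this call, `SELECT * FROM <returned name>` on the same DuckDB connection reads the Snowflake result without touching Snowflake again.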

Feature-view YAML example

yaml
# feature_views/user_snowflake.yaml
name: user_attributes
entities:
  - user
source:
  name: user_attributes_snowflake
  database_path: "snowflake://${SNOWFLAKE_ACCOUNT}"     # account in the URI
  query: "SELECT user_id, country, tier, event_timestamp FROM analytics.public.user_dim"
  snowflake_user: "${SNOWFLAKE_USER}"
  snowflake_warehouse: "${SNOWFLAKE_WAREHOUSE}"
  snowflake_private_key: "${SNOWFLAKE_PRIVATE_KEY}"     # PEM content (secret)
  snowflake_database: "${SNOWFLAKE_DATABASE}"           # optional
  snowflake_schema: "${SNOWFLAKE_SCHEMA}"               # optional
  snowflake_role: "${SNOWFLAKE_ROLE}"                   # optional
  timestamp_field: event_timestamp
features:
  - name: country
    dtype: VARCHAR
  - name: tier
    dtype: VARCHAR

build_source fails fast if any required field is missing. The account in the snowflake://<account> URI, snowflake_user, snowflake_warehouse, and snowflake_private_key are all required, and a query is mandatory because only pushdown is supported. snowflake_database, snowflake_schema, and snowflake_role are optional and are passed to the connector only when set.
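A rough sketch of that fail-fast check, assuming the source block has already been loaded into a plain dict with `${ENV}` references expanded. The function names and error messages are hypothetical and do not reproduce build_source itself; only the field names come from the YAML above.

```python
from urllib.parse import urlparse

REQUIRED_FIELDS = (
    "query",
    "snowflake_user",
    "snowflake_warehouse",
    "snowflake_private_key",
)


def parse_account(database_path: str) -> str:
    """Extract <account> from a snowflake://<account> URI."""
    parsed = urlparse(database_path)
    if parsed.scheme != "snowflake" or not parsed.netloc:
        raise ValueError(
            f"expected a snowflake://<account> URI, got {database_path!r}"
        )
    return parsed.netloc


def validate_snowflake_source(cfg: dict) -> str:
    """Return the account if the config is complete, else raise ValueError."""
    account = parse_account(cfg.get("database_path", ""))
    # Report every missing field at once rather than one per run.
    missing = [name for name in REQUIRED_FIELDS if not cfg.get(name)]
    if missing:
        raise ValueError(
            "missing required Snowflake source fields: " + ", ".join(missing)
        )
    return account
```

Collecting all missing fields into one error keeps the feedback loop short when a new environment is being configured.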

Secrets via ${ENV}, never inline

Always inject snowflake_private_key (the PEM key content) and the account in snowflake://${SNOWFLAKE_ACCOUNT} via ${ENV} references; never paste a literal key or account into the YAML. to_dict() masks snowflake_private_key: a value that is entirely a single ${VAR} reference is shown verbatim, and any literal key becomes ***. As a result, catalog show, service show, and lineage never leak it.
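The masking rule can be illustrated with a small sketch. The function name `mask_secret` and the exact pattern for a variable name are assumptions made for the example; only the behavior (a lone `${VAR}` reference passes through, anything else becomes `***`) comes from the description above.

```python
import re
from typing import Optional

# Assumption: a reference is "${" + an identifier-like name + "}".
_ENV_REF = re.compile(r"\$\{[A-Za-z_][A-Za-z0-9_]*\}")


def mask_secret(value: Optional[str]) -> Optional[str]:
    """Show a lone ${VAR} reference verbatim; mask everything else."""
    if value is None:
        return None
    # fullmatch: the whole value must be exactly one reference.
    if _ENV_REF.fullmatch(value):
        return value
    return "***"
```

A value that mixes literal text with a reference, such as `prefix-${VAR}`, is masked too, since part of it could be secret material.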

See also the Sources Overview, DuckLake, Parquet & Postgres, and Configuration.