
Getting Started

This page walks you from a fresh checkout to your first training run.

Prerequisites

  • Python 3.13+ — the project requires >=3.13 (pinned in .python-version at the repo root).
  • uv — the package and project manager used for every command below.

Install

Install the project and its dev dependencies (pytest, ruff):

bash
uv sync --dev

This installs the core engine. Both optional backends import lazily, so a plain import daedalus never requires them — install an extra only when you actually use that backend.
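As a rough illustration of the lazy-import pattern described above, the sketch below defers an optional dependency until the code path that needs it actually runs. The helper name `require_extra`, the function `open_snowflake_connection`, and the error message are hypothetical and are not taken from the Daedalus source; only `importlib.import_module` and `snowflake.connector.connect` are real APIs.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use.

    Hypothetical helper for illustration; not part of Daedalus.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Point the user at the extra that provides the missing module.
        raise ImportError(
            f"{module_name!r} is not installed; run `uv sync --extra {extra}`"
        ) from exc


def open_snowflake_connection(**kwargs):
    # The connector is imported inside the function, so importing the
    # enclosing package never requires the snowflake extra.
    connector = require_extra("snowflake.connector", "snowflake")
    return connector.connect(**kwargs)
```

With this shape, a missing extra surfaces as an actionable error only when the backend is first used, not at package import time.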

Optional extras

bash
# Unlocks catalog.table.SnowflakeSource:
#   key-pair JWT auth + Arrow query pushdown
uv sync --extra snowflake

bash
# Unlocks the MLflowTracker (mlflow-skinny) projection of training runs
uv sync --extra mlflow

  • snowflake: pulls in snowflake-connector-python + cryptography so a data source can be backed by Snowflake (snowflake://<account> with key-pair JWT auth). Without it, the Snowflake source stays unimportable but the rest of Daedalus works.
  • mlflow: pulls in mlflow-skinny so each TrainingRunRecord (feature service @ version, git SHA, config paths) can be projected to MLflow. Tracking is off by default (a no-op NullTracker); enable it by setting TrainingRunConfig.tracking.kind="mlflow".

Feature sources expect data under data/mewtant/ — typically a symlink to external storage. The directory holds one subdir per feature group: aesthetic/, feed/, generation/, likes/, properties/, siglip2_vectors/, tack/, and user/.

First, download the monthly feature parquets from S3 with daeda download (a YYYYMM range, comma-separated feature list, and an output directory):

bash
uv run daeda download \
    --features properties,feed,generation,siglip2_vectors \
    --start 202601 \
    --end 202605 \
    --output-dir /path/to/spacious_storage
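To make the YYYYMM range concrete, here is a minimal sketch of how a start/end pair might expand into individual months. It assumes both endpoints are inclusive, which the text above does not state explicitly, and `month_range` is a hypothetical name rather than a Daedalus function.

```python
def month_range(start: str, end: str) -> list[str]:
    """Expand a YYYYMM start/end pair into every month in between.

    Assumption: both endpoints are inclusive.
    """
    for value in (start, end):
        if len(value) != 6 or not value.isdigit():
            raise ValueError(f"expected YYYYMM, got {value!r}")
    sy, sm = int(start[:4]), int(start[4:])
    ey, em = int(end[:4]), int(end[4:])
    if not (1 <= sm <= 12 and 1 <= em <= 12):
        raise ValueError("month must be between 01 and 12")
    if (sy, sm) > (ey, em):
        raise ValueError("start must not be after end")

    months = []
    year, month = sy, sm
    while (year, month) <= (ey, em):
        months.append(f"{year:04d}{month:02d}")
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months
```

Under that assumption, the range in the command above would cover five months, 202601 through 202605.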

Preview before you pull

Add --dry-run to print the generated sync commands plus a per-feature coverage report (present / missing / unknown prefixes) without downloading anything. daeda download autodetects the transfer backend, preferring s5cmd and falling back to aws s3 sync; force one with --backend s5cmd|aws.
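The backend autodetection described above could look roughly like the following. This is a sketch under assumptions: `pick_backend` is a hypothetical name, and probing the PATH with `shutil.which` is one plausible way to detect the tools, not necessarily how daeda download does it.

```python
import shutil
from typing import Callable, Optional


def pick_backend(
    forced: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Choose a transfer backend, preferring s5cmd over the AWS CLI.

    `forced` mirrors the idea of a --backend override; `which` is
    injectable so the lookup can be replaced in tests.
    """
    if forced is not None:
        if forced not in ("s5cmd", "aws"):
            raise ValueError(f"unknown backend: {forced!r}")
        return forced
    if which("s5cmd"):
        return "s5cmd"
    if which("aws"):
        return "aws"
    raise RuntimeError("neither s5cmd nor aws was found on PATH")
```

Making the lookup injectable keeps the preference order testable on machines where neither tool is installed.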

Then symlink the downloaded directory into the project so the feature definitions can find it:

bash
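# Remove the existing link, if any (this errors harmlessly on a fresh checkout)
unlink data/mewtant
# Point data/mewtant at the downloaded directory
ln -sf /path/to/spacious_storage data/mewtant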
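After linking, it can be useful to confirm which feature-group subdirectories are actually visible through data/mewtant. The check below is a hypothetical helper, not a Daedalus command; the group names are the ones listed earlier on this page.

```python
from pathlib import Path

# Feature-group subdirectories named in this guide.
FEATURE_GROUPS = (
    "aesthetic", "feed", "generation", "likes",
    "properties", "siglip2_vectors", "tack", "user",
)


def missing_feature_groups(root, expected=FEATURE_GROUPS) -> list[str]:
    """Return the expected subdirectories that do not exist under root.

    Path.is_dir() follows symlinks, so this works when root is a link.
    """
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    print(missing_feature_groups("data/mewtant"))
```

If only a subset of features was downloaded, the remaining groups will be reported as missing, which is expected.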

First commands

Inspect the feature catalog

Start by browsing what features exist — no Python required:

bash
# List every feature view with a one-line summary
uv run daeda catalog list

# Print a view's metadata + schema as JSON
uv run daeda catalog show user_profile

Run a training pipeline

Training is a single canonical path (Core v1): daeda pipeline train <service> compiles the feature service into an operator DAG and runs the unified skeleton → enrich pipeline.

bash
# Full run over the month range configured in runtime.yaml
uv run daeda pipeline train dssm_ranking

Useful flags for iterating:

bash
# Narrow to a single day (YYYY-MM-DD)
uv run daeda pipeline train dssm_ranking --target-date 2026-05-15

# Run just one stage
uv run daeda pipeline train dssm_ranking --skeleton-only --target-date 2026-05-15
uv run daeda pipeline train dssm_ranking --enrich-only   --target-date 2026-05-15

The processed month range (feed_start / feed_end) and other knobs are config-driven — see Configuration. The available services are dssm_ranking (DSSM retrieval) and xgb_reranker (XGBoost reranker).

Enrich needs the Lance store

The enrich stage reads avg-pooled embeddings from a Lance store (data/store/v6/artwork_embedding by default). It is built automatically on first run from data/mewtant/siglip2_vectors, and must reside on a local ext4 filesystem — GPFS / network mounts are not supported (Lance atomic-rename limitation).
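One way to check the filesystem requirement before a run is to look up the store path in the Linux mount table. The sketch below is an assumption-laden illustration: `fstype_for` is a hypothetical helper, it expects text in the /proc/mounts format (device, mount point, type, ...), and it does not decode escaped characters in mount points.

```python
from typing import Optional


def fstype_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount that contains `path`.

    Picks the longest matching mount point; later entries win ties,
    since a later mount shadows an earlier one at the same location.
    """
    target = path.rstrip("/") + "/"
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if target.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

On Linux, this could be called as `fstype_for(os.path.realpath("data/store/v6/artwork_embedding"), open("/proc/mounts").read())` and the result compared against "ext4".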

Running tests

Run the full suite:

bash
uv run pytest

Tests are auto-marked by directory (see conftest.py). Run a subset by marker:

bash
uv run pytest -m unit              # fast unit tests
uv run pytest -m integration       # integration tests (slow)
uv run pytest -m cli               # CLI behavior tests
uv run pytest -m "unit and not slow"
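For readers unfamiliar with directory-based auto-marking, the following is a minimal sketch of how a conftest.py can do it with pytest's `pytest_collection_modifyitems` hook. The directory-to-marker mapping here is hypothetical; the project's real conftest.py defines the actual rules. It assumes pytest 7 or later, where collected items expose `item.path`.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from a directory name to a marker name.
DIR_MARKERS = {"unit": "unit", "integration": "integration", "cli": "cli"}


def marker_for(path: Path) -> Optional[str]:
    """Return the marker for the first mapped directory in the path."""
    for part in path.parent.parts:
        if part in DIR_MARKERS:
            return DIR_MARKERS[part]
    return None


def pytest_collection_modifyitems(config, items):
    # pytest calls this hook after collection; add_marker accepts a
    # marker name given as a string.
    for item in items:
        name = marker_for(Path(item.path))
        if name is not None:
            item.add_marker(name)
```

Markers applied this way are selectable with `-m`, the same as markers written by hand on each test.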

Or target a single file or test:

bash
uv run pytest tests/catalog/test_loader.py
uv run pytest tests/catalog/test_loader.py::test_load_feature_view_from_yaml

Lint and format with ruff (line length 88):

bash
uv run ruff check .
uv run ruff format .

Next steps