What is Daedalus?

Daedalus is the feature platform behind PixAI's recommendation system. It turns declarative YAML feature definitions into point-in-time-correct training datasets, and wraps the whole thing in a lean operational layer (orchestration, an HTTP control plane, agent interop, and a CLI) over a single shared engine.

The name is the project's north star: a small set of well-fitted parts — a catalog, an engine, a pipeline, and a thin platform skin — that compose into something that flies.

The big picture

Daedalus is built as four layers, each one feeding the next:

  1. Declarative YAML feature catalog — Feature views live as YAML in feature_views/*.yaml and are loaded by FeatureCatalog.from_yaml_dir(). Each view carries its schema, source metadata, and the canonical entity join keys (user → user_id, artwork → artwork_id). Adding a feature is a YAML edit, not a code change. See Feature Catalog.

  2. DuckDB / Polars rolling-aggregation engine — A point-in-time aggregation engine (aggregate_pit_table) with two interchangeable, bit-for-bit equivalent implementations: a DuckDB windowed list-aggregation path (default) and a Polars rolling().agg() path kept for A/B testing and fallback. You pick one via the compute.engine field in runtime.yaml. See Compute Engine.

  3. Operator-pipeline training (skeleton → enrich) — A feature service is compiled into an engine-tagged operator DAG and run as one compile → skeleton → enrich pipeline through the executor. The skeleton stage does feature joins + rolling aggregations and writes per-day parquets; the enrich stage appends avg-pooled artwork embeddings via Ray Data streaming actors backed by a Lance store. See Operator Pipeline.

  4. A lean platform layer — The same in-process training API is exposed through four surfaces, with no feature logic re-derived:

    • Dagster orchestration (daedalus.definitions) — each training dataset is a daily partitioned graph-backed asset.
    • A FastAPI JSON-RPC control plane (daeda serve-api).
    • An MCP server (daeda mcp) for external agents.
    • The daeda agent CLI (config, lineage, materialize-day).

    See Platform Surfaces.
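
To make "point-in-time correct" concrete, here is a minimal pure-Python sketch of a rolling count that only looks at events strictly before each label timestamp. It is not the code behind aggregate_pit_table. The function name rolling_count_pit, the half-open window [t - window, t), and the sample timestamps are all assumptions made for illustration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def rolling_count_pit(event_times, label_times, window):
    """Count events in the half-open window [t - window, t) for each label time t.

    Hypothetical helper: excluding events at or after t is what keeps
    future information out of the training row.
    """
    events = sorted(event_times)
    counts = []
    for t in label_times:
        lo = bisect_left(events, t - window)  # first event at or after t - window
        hi = bisect_left(events, t)           # first event at or after t (excluded)
        counts.append(hi - lo)
    return counts


# Illustrative data only: three events and two label timestamps.
events = [datetime(2024, 1, 1), datetime(2024, 1, 3), datetime(2024, 1, 5)]
labels = [datetime(2024, 1, 5), datetime(2024, 1, 6)]
counts = rolling_count_pit(events, labels, timedelta(days=3))
print(counts)  # [1, 2]
```

The label at 2024-01-05 does not count the event that happens at that same instant, while the label one day later does. A naive join on the calendar day would leak that event into the earlier row.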

Core v1: one canonical training path

Training intentionally has a single entry point, daeda pipeline train <service>, referred to throughout the docs as Core v1. It compiles the feature service into an operator DAG and runs the unified skeleton → enrich path through the executor. The standalone daeda skeleton and daeda enrich commands were removed in the Core-v1 lean cut (last dual-path tag: v0.7.1); the unified command's output is byte-identical to what that older pair produced.

```bash
# Compile a service to an editable operator DAG (inspect / hand-tune)
daeda pipeline compile dssm_ranking

# Run the full compile → skeleton → enrich pipeline (the canonical path)
daeda pipeline train dssm_ranking
```
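
As a rough mental model of what "compile to an engine-tagged operator DAG, then execute" could look like, the sketch below orders a handful of operators by their dependencies using the standard library. The Operator class, the operator names, and the engine tags are hypothetical and do not reflect Daedalus's internal types.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator node: a name, an engine tag, a stage, and upstream deps."""
    name: str
    engine: str
    stage: str
    depends_on: tuple = ()


def execution_order(ops):
    """Return operator names in an order that respects every dependency."""
    graph = {op.name: set(op.depends_on) for op in ops}
    return list(TopologicalSorter(graph).static_order())


# Illustrative DAG: three skeleton operators followed by one enrich operator.
ops = [
    Operator("pool_embeddings", "ray", "enrich", ("write_daily_parquet",)),
    Operator("write_daily_parquet", "duckdb", "skeleton", ("rolling_aggregate",)),
    Operator("rolling_aggregate", "duckdb", "skeleton", ("join_features",)),
    Operator("join_features", "duckdb", "skeleton"),
]

order = execution_order(ops)
by_name = {op.name: op for op in ops}
stages = [by_name[name].stage for name in order]
print(order)
print(stages)
```

Because the example DAG is a linear chain, the order is fully determined: every skeleton operator runs before the enrich operator, regardless of the order in which the operators were declared.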

Who consumes Daedalus

Daedalus produces the training datasets for PixAI's recommender models. Feature services are the unit of consumption — each one resolves a set of feature views into a typed training schema:

  • dssm_ranking — features for the DSSM retrieval tower.
  • xgb_reranker — features for the XGBoost reranker.

Additional consumers (Node2Vec graph embeddings, search) are planned and will be added as new feature services without changes to the engine.
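
The idea of a feature service resolving feature views into a typed training schema can be sketched as follows. This is an illustration under stated assumptions, not Daedalus's catalog API: the FeatureView and FeatureService classes, the resolve_schema function, the view and column names, the view__column naming scheme, and the string type for join keys are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureView:
    """Hypothetical feature view: an entity join key plus typed columns."""
    name: str
    join_key: str
    schema: dict  # column name -> type name


@dataclass(frozen=True)
class FeatureService:
    """Hypothetical feature service: a named selection of feature views."""
    name: str
    views: tuple


def resolve_schema(service, catalog):
    """Flatten the service's views into one column -> type mapping."""
    resolved = {}
    for view_name in service.views:
        if view_name not in catalog:
            raise KeyError(f"unknown feature view: {view_name}")
        view = catalog[view_name]
        # Assumption: join keys are strings and appear once in the output.
        resolved.setdefault(view.join_key, "string")
        for column, dtype in view.schema.items():
            resolved[f"{view.name}__{column}"] = dtype
    return resolved


# Illustrative catalog; real views would be loaded from feature_views/*.yaml.
catalog = {
    "user_stats": FeatureView("user_stats", "user_id", {"clicks_7d": "int64"}),
    "artwork_stats": FeatureView("artwork_stats", "artwork_id", {"likes_7d": "int64"}),
}
service = FeatureService("dssm_ranking", ("user_stats", "artwork_stats"))
schema = resolve_schema(service, catalog)
print(schema)
```

Under this model, adding a consumer means declaring another FeatureService over existing views. The resolution logic itself does not change, which mirrors the claim that new consumers need no engine changes.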

Declarative by design

New features and new consumers are almost always a YAML change. The engine, pipeline, and platform layers stay fixed — you describe what you want in feature_views/ and feature_services/, and Daedalus compiles and runs it.

Next steps

  • Getting Started — install Daedalus, link your data, and run your first training pipeline.
  • Architecture Overview — a deeper tour of the catalog, engine, pipeline, and platform layers.