What is Daedalus?
Daedalus is the feature platform behind PixAI's recommendation system. It turns declarative YAML feature definitions into point-in-time-correct training datasets, and wraps the whole thing in a lean operational layer (orchestration, an HTTP control plane, agent interop, and a CLI) over a single shared engine.
The name is the project's north star: a small set of well-fitted parts — a catalog, an engine, a pipeline, and a thin platform skin — that compose into something that flies.
The big picture
Daedalus is built as four layers, each one feeding the next:
- Declarative YAML feature catalog: Feature views live as YAML in feature_views/*.yaml and are loaded by FeatureCatalog.from_yaml_dir(). Each view carries its schema, source metadata, and the canonical entity join keys (user → user_id, artwork → artwork_id). Adding a feature is a YAML edit, not a code change. See Feature Catalog.
- DuckDB / Polars rolling-aggregation engine: A point-in-time aggregation engine (aggregate_pit_table) with two interchangeable, bit-for-bit equivalent implementations: a DuckDB windowed list-aggregation path (default) and a Polars rolling().agg() path kept for A/B testing and fallback. You pick one via the compute.engine field in runtime.yaml. See Compute Engine.
- Operator-pipeline training (skeleton → enrich): A feature service is compiled into an engine-tagged operator DAG and run as one compile → skeleton → enrich pipeline through the executor. The skeleton stage does feature joins + rolling aggregations and writes per-day parquets; the enrich stage appends avg-pooled artwork embeddings via Ray Data streaming actors backed by a Lance store. See Operator Pipeline.
- A lean platform layer: The same in-process training API is exposed through four surfaces, with no feature logic re-derived:
  - Dagster orchestration (daedalus.definitions): each training dataset is a daily partitioned graph-backed asset.
  - A FastAPI JSON-RPC control plane (daeda serve-api).
  - An MCP server (daeda mcp) for external agents.
  - The daeda agent CLI (config, lineage, materialize-day).
  See Platform Surfaces.
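To make "point-in-time-correct" concrete, here is a minimal pure-Python sketch of a rolling count. It is not the Daedalus engine: the function name pit_rolling_count, the sample rows, and the choice of a half-open window [ts - window, ts) that excludes events at the label timestamp itself are all assumptions made for illustration. The real aggregate_pit_table runs on DuckDB or Polars and may define window boundaries differently.

```python
from datetime import datetime, timedelta

def pit_rolling_count(labels, events, window):
    """For each (entity, ts) label row, count that entity's events
    in the half-open interval [ts - window, ts).

    Only events strictly earlier than the label timestamp are visible,
    so no information from the label's present or future leaks in.
    """
    rows = []
    for entity, ts in labels:
        count = sum(
            1
            for event_entity, event_ts in events
            if event_entity == entity and ts - window <= event_ts < ts
        )
        rows.append((entity, ts, count))
    return rows

# Hypothetical interaction events: (entity key, event timestamp).
events = [
    ("u1", datetime(2024, 1, 1)),  # older than the 7-day window
    ("u1", datetime(2024, 1, 3)),
    ("u1", datetime(2024, 1, 5)),
    ("u1", datetime(2024, 1, 9)),  # same instant as the label, excluded
    ("u2", datetime(2024, 1, 8)),
]

# Hypothetical label rows to attach features to.
labels = [("u1", datetime(2024, 1, 9)), ("u2", datetime(2024, 1, 9))]

rows = pit_rolling_count(labels, events, timedelta(days=7))
for entity, ts, count in rows:
    print(entity, ts.date(), count)
```

The quadratic scan keeps the semantics easy to read; a windowed SQL or dataframe implementation would compute the same counts far more efficiently.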
Core v1: one canonical training path
Training intentionally has a single entry point, daeda pipeline train <service>, referred to throughout the docs as Core v1. It compiles the feature service into an operator DAG and runs the unified skeleton → enrich path through the executor. The standalone daeda skeleton / daeda enrich commands were removed in the Core-v1 lean cut (last dual-path tag: v0.7.1); the unified path's output is byte-identical to what that older command pair produced.
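As a rough mental model of "compile to an operator DAG, then run it through the executor", the sketch below orders a tiny engine-tagged DAG with Python's standard-library graphlib. The operator names, the engine tags assigned to them, and the dictionary layout are hypothetical; Daedalus's compiled DAG format and executor are not shown in this page.

```python
from graphlib import TopologicalSorter

# Hypothetical compiled DAG: operator name -> (engine tag, upstream operators).
# The first three operators stand in for the skeleton stage,
# the last one for the enrich stage.
dag = {
    "join_features": ("duckdb", []),
    "rolling_aggregate": ("duckdb", ["join_features"]),
    "write_daily_parquet": ("duckdb", ["rolling_aggregate"]),
    "append_embeddings": ("ray", ["write_daily_parquet"]),
}

def execution_order(dag):
    """Return operator names in an order that respects every dependency."""
    sorter = TopologicalSorter(
        {op: set(upstream) for op, (_engine, upstream) in dag.items()}
    )
    return list(sorter.static_order())

order = execution_order(dag)
for op in order:
    engine, _upstream = dag[op]
    print(f"[{engine}] {op}")
```

Because this example DAG is a single chain, the order is fully determined; a DAG with independent branches would admit several valid orders.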
# Compile a service to an editable operator DAG (inspect / hand-tune)
daeda pipeline compile dssm_ranking
# Run the full compile → skeleton → enrich pipeline (the canonical path)
daeda pipeline train dssm_ranking
Who consumes Daedalus
Daedalus produces the training datasets for PixAI's recommender models. Feature services are the unit of consumption — each one resolves a set of feature views into a typed training schema:
- dssm_ranking: features for the DSSM retrieval tower.
- xgb_reranker: features for the XGBoost reranker.
Additional consumers (Node2Vec graph embeddings, search) are planned and will be added as new feature services without changes to the engine.
Declarative by design
New features and new consumers are almost always a YAML change. The engine, pipeline, and platform layers stay fixed — you describe what you want in feature_views/ and feature_services/, and Daedalus compiles and runs it.
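The sketch below illustrates the idea of a feature service resolving feature views into a typed training schema. It uses plain dictionaries in place of parsed YAML. The view names, column names, dtypes, the "views" and "schema" keys, and the resolve_schema helper are all hypothetical; only the service name dssm_ranking and the entity join keys (user → user_id, artwork → artwork_id) come from the text above. The actual YAML layout may differ.

```python
# Canonical entity join keys, as described in the catalog section.
ENTITY_JOIN_KEYS = {"user": "user_id", "artwork": "artwork_id"}

# Hypothetical stand-ins for parsed feature_views/*.yaml files.
feature_views = {
    "user_profile": {
        "entity": "user",
        "schema": {"account_age_days": "int64"},
    },
    "artwork_stats": {
        "entity": "artwork",
        "schema": {"likes_7d": "int64", "ctr_7d": "float64"},
    },
}

# Hypothetical stand-in for a parsed feature_services/ file.
feature_service = {
    "name": "dssm_ranking",
    "views": ["user_profile", "artwork_stats"],
}

def resolve_schema(service, views):
    """Merge the referenced views into one column -> dtype mapping."""
    schema = {}
    for view_name in service["views"]:
        view = views[view_name]
        # Each view contributes its entity join key once (dtype assumed).
        schema.setdefault(ENTITY_JOIN_KEYS[view["entity"]], "string")
        for column, dtype in view["schema"].items():
            if column in schema:
                raise ValueError(f"duplicate feature column: {column}")
            schema[column] = dtype
    return schema

schema = resolve_schema(feature_service, feature_views)
for column, dtype in schema.items():
    print(f"{column}: {dtype}")
```

Under this model, adding a feature means adding one entry to a view's schema, and adding a consumer means adding one service definition; the resolving code does not change.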
Next steps
- Getting Started — install Daedalus, link your data, and run your first training pipeline.
- Architecture Overview — a deeper tour of the catalog, engine, pipeline, and platform layers.