Overview
Daedalus is a feature platform for PixAI's recommendation system. Its job is to turn declarative feature definitions into point-in-time-correct training datasets, and to wrap that engine in a lean operational layer.
The system is built as four layers, each one feeding the next. Every layer is Arrow-native (see Arrow-Native Design) and the definitions are the single source of truth — adding a feature is a YAML edit, not a code change.
The four layers
┌───────────────────────────────────────────────────────────────────────┐
│ 1. DEFINITIONS feature_views/*.yaml (FeatureViewDef) │
│ (declarative YAML) feature services (column-level contracts) │
│ │
│ FeatureCatalog.from_yaml_dir() → entities, views, dtypes → Arrow │
└───────────────────────────────┬───────────────────────────────────────┘
│ catalog (entities + views + services)
▼
┌───────────────────────────────────────────────────────────────────────┐
│ 2. COMPUTE aggregate_pit_table(spine, specs) │
│ (rolling agg) Engine.DUCKDB │ Engine.POLARS │
│ windowed list-agg │ rolling().agg() │
│ — bit-for-bit equivalent — │
└───────────────────────────────┬───────────────────────────────────────┘
│ rolling point-in-time features
▼
┌───────────────────────────────────────────────────────────────────────┐
│ 3. TRAINING compile → skeleton → enrich │
│ (operator pipeline) FeatureServiceDef → OperatorPipeline (DAG) │
│ executor.run_pipeline: │
│ skeleton (SQL source layer) writes per-day │
│ enrich (Pythonic) appends embeddings (Ray) │
└───────────────────────────────┬───────────────────────────────────────┘
│ per-day dt=YYYY-MM-DD/*.parquet
▼
┌───────────────────────────────────────────────────────────────────────┐
│ 4. PLATFORM SURFACES one in-process API, no feature logic re-derived │
│ Dagster · FastAPI JSON-RPC · MCP server · daeda CLI │
└───────────────────────────────────────────────────────────────────────┘
1. Definitions — declarative YAML
Feature views live as YAML in feature_views/*.yaml and are loaded by FeatureCatalog.from_yaml_dir() (src/daedalus/catalog/registry.py). Each view carries its schema (DuckDB-like dtypes parsed into Arrow types), its source metadata (DataSourceDef), and the entities it is keyed on. A feature service layers a column-level contract on top of those views — the resolved input schema for one model.
See Feature Catalog.
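As a rough illustration of the dtype step, the sketch below maps DuckDB-like dtype strings to Arrow type names. It is not the catalog's actual parser: the function names, the set of supported dtypes, the `list<...>` shorthand, and the example columns are all assumptions, and it returns plain strings where a real implementation would presumably build Arrow type objects.

```python
# Hypothetical mapping from DuckDB-style scalar dtypes to Arrow type names.
# The real catalog may support a different or larger set.
_SCALARS = {
    "BOOLEAN": "bool",
    "INTEGER": "int32",
    "BIGINT": "int64",
    "FLOAT": "float32",
    "DOUBLE": "float64",
    "VARCHAR": "string",
}


def arrow_type_name(dtype: str) -> str:
    """Translate one DuckDB-like dtype string into an Arrow type name."""
    text = dtype.strip().upper()
    if text.endswith("[]"):
        # DuckDB spells list types as ELEMENT[]; recurse on the element type.
        return f"list<{arrow_type_name(text[:-2])}>"
    if text not in _SCALARS:
        raise ValueError(f"unsupported dtype: {dtype!r}")
    return _SCALARS[text]


def resolve_schema(columns: dict[str, str]) -> dict[str, str]:
    """Resolve a whole view schema, column by column."""
    return {name: arrow_type_name(dtype) for name, dtype in columns.items()}


# Example columns are invented for illustration.
view_schema = resolve_schema(
    {"user_id": "BIGINT", "ctr_7d": "DOUBLE", "liked_tags": "VARCHAR[]"}
)
```

Failing loudly on an unknown dtype keeps a typo in a YAML file from silently producing a wrong column type.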
2. Compute — rolling aggregation
aggregate_pit_table is a plain module-level function with two interchangeable, bit-for-bit equivalent implementations: a DuckDB windowed list-aggregation path and a Polars rolling().agg() path. You choose one explicitly via the Engine enum and the get_aggregate_pit_table() factory; no engine is ever selected implicitly by a default import.
See Compute Engine.
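The pattern described here (two implementations of one point-in-time aggregation, selected explicitly through an enum and a factory) can be sketched in pure Python. Nothing below is Daedalus code: `SketchEngine`, `get_rolling_sum`, and both implementations are hypothetical stand-ins for the DuckDB and Polars paths, and the window convention (events in `[ts - window, ts)`, so an event at the spine timestamp itself is excluded) is an assumption.

```python
from bisect import bisect_left
from enum import Enum
from itertools import accumulate


class SketchEngine(Enum):
    """Hypothetical stand-in for the Engine enum."""
    NAIVE = "naive"
    PREFIX_SUM = "prefix_sum"


def rolling_sum_naive(spine, events, window):
    """For each (entity, ts) spine row, sum values with ts - window <= t < ts."""
    return [
        sum(v for e, t, v in events if e == entity and ts - window <= t < ts)
        for entity, ts in spine
    ]


def rolling_sum_prefix(spine, events, window):
    """Same result via per-entity sorted timestamps and prefix sums."""
    by_entity = {}
    for e, t, v in sorted(events, key=lambda row: (row[0], row[1])):
        times, values = by_entity.setdefault(e, ([], []))
        times.append(t)
        values.append(v)
    out = []
    for entity, ts in spine:
        times, values = by_entity.get(entity, ([], []))
        sums = [0, *accumulate(values)]   # sums[i] = total of the first i values
        lo = bisect_left(times, ts - window)  # first event inside the window
        hi = bisect_left(times, ts)           # first event at or after ts
        out.append(sums[hi] - sums[lo])
    return out


def get_rolling_sum(engine: SketchEngine):
    """Explicit selection: the caller always names the engine."""
    if engine is SketchEngine.NAIVE:
        return rolling_sum_naive
    if engine is SketchEngine.PREFIX_SUM:
        return rolling_sum_prefix
    raise ValueError(f"unknown engine: {engine!r}")


# Events are (entity_id, event_ts, value); spine rows are (entity_id, ts).
events = [(1, 10, 2), (1, 20, 3), (1, 30, 5), (2, 15, 7)]
spine = [(1, 30), (1, 35), (2, 15), (2, 40)]
naive = get_rolling_sum(SketchEngine.NAIVE)(spine, events, 15)
fast = get_rolling_sum(SketchEngine.PREFIX_SUM)(spine, events, 15)
```

Excluding events at or after the spine timestamp is what keeps a label's own moment from leaking into its features. The sketch uses integer values so that the two paths agree exactly; with floats, differing summation orders would need more care to stay bit-for-bit equal.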
3. Training — the operator pipeline
A feature service is compiled into an engine-tagged operator DAG and run as one compile → skeleton → enrich pipeline through the executor (daeda pipeline train <service>, the sole training entry point in Core v1). The skeleton stage runs the SQL source layer (feature joins + rolling aggregations) and writes per-day parquets; the enrich stage appends avg-pooled artwork embeddings via Ray Data streaming actors backed by a Lance store.
See Operator Pipeline and the Training Pipeline overview.
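To make the shape of such a pipeline concrete, here is a minimal sketch of an engine-tagged operator DAG ordered with the standard library's `graphlib`, plus a helper that builds `dt=YYYY-MM-DD` partition directories. The `Operator` class, the operator names, the engine tags, and the output root are all illustrative assumptions, not the executor's real types or layout.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Operator:
    name: str
    engine: str                      # illustrative tag, e.g. "sql" or "python"
    depends_on: tuple[str, ...] = ()


def compile_service(service_name: str) -> list[Operator]:
    """Hypothetical compile step: returns a fixed three-operator DAG."""
    return [
        Operator("feature_joins", "sql"),
        Operator("rolling_aggs", "sql"),
        # The enrich stage runs only after both skeleton operators finish.
        Operator("append_embeddings", "python", ("feature_joins", "rolling_aggs")),
    ]


def execution_order(operators: list[Operator]) -> list[str]:
    """Order operators so that every dependency runs before its dependents."""
    graph = {op.name: set(op.depends_on) for op in operators}
    return list(TopologicalSorter(graph).static_order())


def partition_dirs(root: str, start: date, days: int) -> list[str]:
    """Per-day output directories in dt=YYYY-MM-DD form."""
    return [
        f"{root}/dt={(start + timedelta(days=i)).isoformat()}" for i in range(days)
    ]


order = execution_order(compile_service("example_service"))
dirs = partition_dirs("out/example_service", date(2024, 1, 30), 3)
```

Tagging each operator with its engine lets an executor route SQL operators and Python operators to different runtimes while still scheduling them from one graph.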
4. Platform surfaces
The same in-process training API is exposed through four surfaces, with no feature logic re-derived: Dagster orchestration (daedalus.definitions), a FastAPI JSON-RPC control plane (daeda serve-api), an MCP server (daeda mcp), and the daeda agent CLI.
See Platform Surfaces.
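The idea of several thin surfaces over one in-process function can be sketched with the standard library alone. This is an assumption-laden illustration: `train`, `handle_jsonrpc`, and `cli` are hypothetical names, the returned payload is invented, and the sketch uses `json` and `argparse` rather than FastAPI or the real daeda CLI.

```python
import argparse
import json


def train(service: str) -> dict:
    """Stand-in for the single in-process training API."""
    return {"service": service, "status": "submitted"}


def handle_jsonrpc(raw: str) -> str:
    """Minimal JSON-RPC 2.0 surface: decode, delegate, encode."""
    request = json.loads(raw)
    response = {"jsonrpc": "2.0", "id": request.get("id")}
    if request.get("method") != "train":
        # -32601 is the JSON-RPC 2.0 code for "Method not found".
        response["error"] = {"code": -32601, "message": "Method not found"}
    else:
        response["result"] = train(**request["params"])
    return json.dumps(response)


def cli(argv: list[str]) -> dict:
    """Minimal CLI surface: parse arguments, then delegate to the same function."""
    parser = argparse.ArgumentParser(prog="sketch")
    parser.add_argument("service")
    args = parser.parse_args(argv)
    return train(args.service)


rpc_request = json.dumps(
    {"jsonrpc": "2.0", "id": 1, "method": "train", "params": {"service": "example_service"}}
)
rpc_result = json.loads(handle_jsonrpc(rpc_request))["result"]
cli_result = cli(["example_service"])
```

Because each surface only translates its input format and delegates, both calls return the same result and no surface carries feature logic of its own.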
Canonical entity join keys
Entities are shared across every feature view and defined once in src/daedalus/catalog/model.py (registered in registry.py). The two canonical entities and their join keys are:
| Entity | Join key | Value type |
|---|---|---|
| user | user_id | int64 |
| artwork | artwork_id | int64 |
Reusing these canonical keys is a hard convention: a feature view declares which entities it is keyed on (for example entity: user or entity: [user, artwork]), and the catalog resolves the join keys from the shared ENTITIES map rather than letting each view invent its own.
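A minimal sketch of that resolution step, assuming the `entity` field is either a single name or a list of names. `Entity`, `SKETCH_ENTITIES`, and `resolve_join_keys` are hypothetical stand-ins that mirror the table above; they are not the definitions in model.py or registry.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    join_keys: list[str]


# Stand-in for the shared entity map, mirroring the two canonical entities.
SKETCH_ENTITIES = {
    "user": Entity("user", ["user_id"]),
    "artwork": Entity("artwork", ["artwork_id"]),
}


def resolve_join_keys(entity_field) -> list[str]:
    """Resolve a view's entity declaration to join keys via the shared map."""
    names = [entity_field] if isinstance(entity_field, str) else list(entity_field)
    keys: list[str] = []
    for name in names:
        if name not in SKETCH_ENTITIES:
            # A view may not invent its own entity; fail instead of guessing.
            raise KeyError(f"unknown entity {name!r}; known: {sorted(SKETCH_ENTITIES)}")
        keys.extend(SKETCH_ENTITIES[name].join_keys)
    return keys


single = resolve_join_keys("user")
composite = resolve_join_keys(["user", "artwork"])
```

In Daedalus itself the shared map is imported from the registry: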
from daedalus.catalog.registry import ENTITIES
ENTITIES["user"].join_keys # ["user_id"]
ENTITIES["artwork"].join_keys # ["artwork_id"]
Design north star
A small set of well-fitted parts (a catalog, an engine, a pipeline, and a thin platform skin) that compose cleanly. Dependencies run in one direction only: each layer may rely on the layers that feed it, never on the layers it feeds. Definitions never know about training, and compute never knows about the platform surfaces.