Roadmap

Daedalus is evolving from a feature-store-like service into a lean feature platform by stabilizing the semantic layer that every future adapter consumes. The goal is not a rewrite — it is to make the existing YAML catalog, SQL/Python pipeline code, DuckDB/Polars rolling aggregation, skeleton → enrich training flow, and CLI cohere behind a small, executable platform protocol.

This page summarizes ROADMAP.md; the repository file remains the source of truth. See also the Platform overview and the Architecture overview.

Executive framing

Feature Protocol is the roadmap term for the semantic layer: schema, join semantics, type and shape, version identity, lineage, materialization metadata, and serving eligibility. The invariant is one tested definition of feature semantics, many thin adapters. Established terms such as "data contract" are kept, not renamed, where they are already clear.

Guiding order:

1. Stabilize the Feature Protocol and facade.
2. Keep the CLI and existing training path useful.
3. Add materialization, training adapters, serving, orchestration, and UI only behind concrete workflow triggers.

Non-Negotiable Migration Rule — wrap, don't rewrite. The existing skeleton → enrich pipeline is large, tuned, and full of load-bearing behavior (DuckDB rolling aggregations and resource controls, Ray/Lance enrichment, resumable per-day sharded output, the model-type-curation-before-content-filter ordering, session/coverage filtering). The platform first wraps and documents this path through Feature Protocol metadata; only after parity tests prove equivalence can pieces move behind shared abstractions.

Roadmap Progress

As of 2026-06-27, v0.9.1. Legend: ✅ done · ◑ partial/gated · ⏳ deferred (gated, not started).

Phase	Status	Evidence / notes
1A.1 Definitions load + golden-day parity	✅	`feature_services/{dssm_ranking,xgb_reranker}.yaml` both load and resolve (second consumer proves the schema generalizes); golden-day + executor parity harnesses in `tests/pipelines/` and `tests/catalog/` (#53).
1A.2 PlatformRegistry / facade / CLI surfaces	✅	`PlatformRegistry`, `catalog/facade.py`, `daeda service list/show` + platform lineage (#53).
1A.3 TrainingRunRecord + dense-qid flag	✅	`src/daedalus/pipelines/records.py`; carried into Phase 3 (#53/#55).
1B Version Protocol (lock/check/diff)	✅	`catalog/versioning.py`, `registry.lock.json`, `daeda registry lock/check/diff` + CI check (#53).
2 Materialization + MetadataStore	✅	Service materialization to Parquet/DuckDB + `MetadataStore` records, `daeda materialize` (#54).
3 Training adapter over skeleton→enrich	✅	Service→`feature_refs`/`output_columns` bridge; `TrainingRunRecord` wired (#55).
4 DuckLake hardening + Snowflake	◑	Connection layer correct: `DuckLakeSink` follows the canonical DuckLake 1.0 ATTACH recipe and a gated `SnowflakeSource` (optional `snowflake` extra, key-pair JWT, Arrow query pushdown) landed (#84); earlier mock-data seeder + local E2E + config-only swap (#59). Remaining: wire DuckLake/Snowflake as an actual training source-of-record (skeleton/materialize still parquet-hardcoded), and a real Snowflake workflow caller before un-gating broadly.
5 Controlled serving (Quack + Online Store)	✅	Slice A materialize-to-online producer + `OnlineStore`/`RedisOnlineStore` (#56); Slice B Quack transport + custom authz callbacks (#57); Slice C DuckDB+Quack serving daemon + Arrow Flight removed + parity gate (#58). Quack stays DuckDB-beta-gated (internal-only, pinned DuckDB).
6 Optional UI	✅ gate reached	Delivered the orchestration + API + agent surfaces. The browser GUI was built (#76) then dropped from this repo (#80) → moving to a dedicated stack/repo consuming the JSON-RPC contract, to stay backend-lean.
Core v1 — pipeline operators (engineering "Phase 6")	✅	Unified skeleton→enrich onto an engine-tagged operator DAG, deleted the legacy path; `daeda pipeline train` is the sole training entry point. Validated byte-identical + temporally correct on a real production day (#64–#72).
Optional projections — Dagster	✅	Daily-partitioned graph-backed asset over the executor + schedule; `daedalus.definitions` (#73).
Optional projections — MLflow	✅	`daedalus.tracking`: `ExperimentTracker` protocol + no-op `NullTracker` + lazy `MLflowTracker` (optional `mlflow` extra = `mlflow-skinny`) + `get_tracker`; default-off projection of `TrainingRunRecord` (#83).

Platform surfaces shipped (the v0.9.x layer over the engine): Dagster runtime (daedalus.definitions), FastAPI JSON-RPC control plane (daeda serve-api), MCP server (daeda mcp), and agent CLI (daeda config/lineage/materialize-day). All of them reuse the same in-process API; no feature logic is re-derived.

Two distinct "Phase 6" labels

The roadmap Phase 6 is Optional UI. The engineering Phase 6 was the Core-v1 pipeline-operator unification (daeda pipeline train). Both appear in the table above; do not conflate them.

Why a platform

A loose feature store cannot answer the questions consumers force at pre-training time: which feature views does model version X consume, exactly which columns, with what preprocessing, and can we reproduce it. The platform makes four concerns explicit and versioned — which views/columns a model version consumes, column-level lineage from source to served feature, reproducible preprocessing (not just outputs), and version control over all of the above. A phase earns its place only if it moves one of those from tribal knowledge into a tested, loadable definition. Current consumers are DSSM-style retrieval and XGBoost/Phoenix reranking; planned consumers include Node2Vec tag models and a multimodal search engine.

Phase summaries

Phase 1A — Feature Protocol Spine

Library + CLI only, no network, no training rewrite. Split into three milestones:

1A.1 — Definitions load + parity baseline. FeatureColumnRef (view:column parsing), FeatureServiceDef (schema, join protocol, type/shape metadata, serving metadata, owner/tags), fail-fast service loaders, fixed-size ARRAY dtype support (FLOAT[1152]), a seeded real dssm_ranking service plus a second consumer, and the golden-day skeleton → enrich parity harness in two CI tiers (a hermetic unit fixture on every PR; a production-like Ray + Lance integration/slow fixture). This harness is what every later "parity" exit criterion runs against.
1A.2 — Surfaces over the protocol. PlatformRegistry composing existing catalogs, additive lineage builders, catalog/facade.py as the single read boundary, daeda service list/show and platform lineage. ModelDef only when a real consumer needs a documented model-to-service binding.
1A.3 — Pulled-forward standalone items (no version-lock dependency, run in parallel to 1A.2): minimal TrainingRunRecord shape and an optional dense-qid compatibility flag/stage. Pulling these forward reduces serial risk for a small team.
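The view:column reference format from milestone 1A.1 can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not the repository's FeatureColumnRef: the parse method name, the whitespace handling, and the example reference user_stats:clicks_7d are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureColumnRef:
    """Hypothetical stand-in for the roadmap's FeatureColumnRef."""

    view: str
    column: str

    @classmethod
    def parse(cls, ref: str) -> "FeatureColumnRef":
        # Split on the first ':' and trim surrounding whitespace.
        view, sep, column = (part.strip() for part in ref.partition(":"))
        # Fail fast on a missing separator, an empty side, or a second ':'.
        if not sep or not view or not column or ":" in column:
            raise ValueError(f"expected 'view:column', got {ref!r}")
        return cls(view=view, column=column)


if __name__ == "__main__":
    print(FeatureColumnRef.parse("user_stats:clicks_7d"))
```

Raising on malformed input, rather than returning a partial reference, mirrors the fail-fast loader behavior the milestone describes.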

Phase 1B — Version Protocol

Protect the stable protocol from accidental breaking changes with date-based version identifiers plus structural diffs, with no hashing. Each service carries version: <YYYY-MM-DD[.N]>; registry.lock.json snapshots each service's resolved schema as plain JSON; daeda registry lock/check/diff compares structurally over the canonical logical shape (so FLOAT[1152] and FLOAT[] + shape.dimension: 1152 produce no diff). Reproducibility lives in run/materialization records (git SHA, config path, embedding-store snapshot), not definition hashes. Depends only on milestone 1A.1.
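A rough sketch of what "structural diff over the canonical logical shape" could look like follows. It is not the code in catalog/versioning.py: the column-spec layout (a dict with dtype and an optional shape.dimension), the function names, and the diff result format are assumptions. The version check only validates the YYYY-MM-DD[.N] pattern, not calendar correctness.

```python
import re

_FIXED_ARRAY = re.compile(r"(?P<base>[A-Z0-9_]+)\[(?P<dim>\d+)\]")
_VERSION = re.compile(r"\d{4}-\d{2}-\d{2}(\.\d+)?")


def is_valid_version(version: str) -> bool:
    """Check the YYYY-MM-DD[.N] pattern (format only, not a real date check)."""
    return _VERSION.fullmatch(version) is not None


def canonical_column(spec: dict) -> dict:
    """Normalize a column spec (hypothetical layout) to one logical shape."""
    dtype = spec["dtype"]
    dimension = spec.get("shape", {}).get("dimension")
    match = _FIXED_ARRAY.fullmatch(dtype)
    if match:
        # FLOAT[1152] becomes FLOAT[] with dimension 1152.
        dtype = f"{match['base']}[]"
        dimension = int(match["dim"])
    return {"dtype": dtype, "dimension": dimension}


def diff_schema(locked: dict, current: dict) -> dict:
    """Compare two {column: spec} mappings after canonicalization."""
    old = {name: canonical_column(spec) for name, spec in locked.items()}
    new = {name: canonical_column(spec) for name, spec in current.items()}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    locked = {"embedding": {"dtype": "FLOAT[1152]"}}
    current = {"embedding": {"dtype": "FLOAT[]", "shape": {"dimension": 1152}}}
    print(diff_schema(locked, current))
```

Because both spellings normalize to the same dict before comparison, the two equivalent array declarations produce an empty diff, while a real dimension change shows up under "changed".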

Phase 2 — Materialization and Metadata

Materialize feature services to Parquet/DuckDB first (DuckLake only where a real caller exists) and write operational metadata to MetadataStore — a single SQLite file under the run root, WAL mode, append-only run/materialization records (run_id, status, attempt, timestamps, service + version, git SHA, source snapshot, row count, timestamp range, join-key null rates, output path/manifest). Definitions and qid_map never move into MetadataStore. An optional late-Phase-2 --online flag may project latest-per-entity rows into the online store if a consumer appears first.
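The append-only, WAL-mode SQLite store described above can be sketched with the standard library. This is an illustration, not the actual MetadataStore: the table name, the column subset, and the helper names are assumptions, and only a few of the listed record fields are modeled.

```python
import sqlite3
from pathlib import Path
from typing import Optional, Union

# Hypothetical subset of the record fields named in the roadmap.
_SCHEMA = """
CREATE TABLE IF NOT EXISTS materialization_records (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    attempt INTEGER NOT NULL,
    status TEXT NOT NULL,
    service TEXT NOT NULL,
    version TEXT NOT NULL,
    git_sha TEXT,
    row_count INTEGER,
    output_path TEXT,
    recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""
_OPTIONAL = {"git_sha": None, "row_count": None, "output_path": None}


def open_metadata_store(path: Union[str, Path]) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a writer appends (file-backed DBs only).
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(_SCHEMA)
    conn.commit()
    return conn


def append_record(conn: sqlite3.Connection, record: dict) -> None:
    # Append-only: a status change is a new row, never an UPDATE.
    with conn:
        conn.execute(
            "INSERT INTO materialization_records "
            "(run_id, attempt, status, service, version, git_sha, row_count, output_path) "
            "VALUES (:run_id, :attempt, :status, :service, :version, "
            ":git_sha, :row_count, :output_path)",
            {**_OPTIONAL, **record},
        )


def latest_status(conn: sqlite3.Connection, run_id: str) -> Optional[str]:
    row = conn.execute(
        "SELECT status FROM materialization_records "
        "WHERE run_id = ? ORDER BY id DESC LIMIT 1",
        (run_id,),
    ).fetchone()
    return row[0] if row else None
```

Reading the current state as "the newest row for a run" keeps history intact, which is the point of an append-only record log.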

Phase 3 — Training Adapter over `skeleton → enrich`

Connect the Feature Protocol to the working training path without reimplementation: bridge FeatureServiceDef to the existing feature_refs / output_columns, assert the already-emitted feed_session via the golden-day harness, and wire the carried-forward TrainingRunRecord to the full record set (now including the Phase 1B service version). This phase does not rewrite generate_training_dataset or change output defaults without a compatibility flag. MLflow and Dagster stay optional projections of these records, never sources of truth.

Phase 4 — DuckLake Hardening and Snowflake Trigger (gated)

Harden type-mapping parity and schema-evolution behavior for DuckLake service materializations with read/write smoke tests. Snowflake is a clean forward-looking addition: no Snowflake code existed when this phase was scoped, and the gated SnowflakeSource that has since landed (#84, see Roadmap Progress) stays gated until a real clean ADS/HDS source is needed by a shipped workflow. When gated in: SnowflakeSource as an optional extra, source-level query= pushdown only, streaming Arrow into local DuckDB; full op-level translation and a generic Snowflake engine are deferred. No Snowflake dependency may leak into core imports.

Phase 5 — Controlled Serving (Quack + Online Store, gated)

Expose approved feature-service data-plane access without making Quack a public arbitrary-SQL service. The dominant work item is not the thin serving/transport.py adapter but custom authorization callbacks against a beta protocol: deny DDL/DML and allowlist only approved relations. SQL macros come first, since Python UDF callbacks are not viable; a compiled extension is reserved for stateful or per-user ACL complexity. The remaining items are a QuackServingBackend behind the serving/transport.py interface, a dedicated DuckDB session that sees only approved relations, localhost + token auth, the materialize-to-online producer, an online/offline skew parity test, and a pinned DuckDB version while Quack is beta. Arrow Flight and daeda flight were removed in Slice C (2026-06).
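The online/offline skew parity test mentioned above can be sketched in plain Python. This is an assumption-laden illustration rather than the project's test: the row layout, the entity_id and event_ts key names, and the report format are hypothetical, and the online store is modeled as a simple dict instead of an OnlineStore backend.

```python
from typing import Any, Iterable, Mapping


def latest_per_entity(
    rows: Iterable[Mapping[str, Any]], key: str = "entity_id", ts: str = "event_ts"
) -> dict:
    """Keep the newest offline row for each entity."""
    latest: dict = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[ts] > current[ts]:
            latest[row[key]] = row
    return latest


def skew_report(
    offline_rows: Iterable[Mapping[str, Any]],
    online: Mapping[Any, Mapping[str, Any]],
    feature_names: Iterable[str],
) -> dict:
    """List entities that are missing online or whose values differ."""
    names = list(feature_names)
    expected = latest_per_entity(offline_rows)
    missing = sorted(entity for entity in expected if entity not in online)
    mismatched = sorted(
        entity
        for entity, row in expected.items()
        if entity in online
        and any(online[entity].get(name) != row[name] for name in names)
    )
    return {"missing": missing, "mismatched": mismatched}


if __name__ == "__main__":
    offline = [
        {"entity_id": "u1", "event_ts": 1, "clicks": 1},
        {"entity_id": "u1", "event_ts": 2, "clicks": 5},
    ]
    print(skew_report(offline, {"u1": {"clicks": 5}}, ["clicks"]))
```

A parity gate built this way would pass only when both lists in the report are empty.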

Phase 6 — Optional UI (gated)

Add browser or terminal browsing only when CLI/facade/API usage shows a real multi-user review or operations need. UI consumes facade/API output only — it does not parse YAML, read MetadataStore directly, or define a second metadata model.

Core v1 — pipeline operators (engineering "Phase 6")

The unification step of the Non-Negotiable Migration Rule: the skeleton → enrich flow runs as one engine-tagged operator DAG through the executor, the legacy path was deleted, and daeda pipeline train is the sole training entry point — validated byte-identical and temporally correct (no time-travel) on a real production day. Each operator declares its engine (sql / sql_arrow_udf / pythonic); sources self-optimize while Daedalus owns only post-source compute.
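The idea of an engine-tagged operator DAG can be sketched as follows. The three engine tags come from the text above; everything else (the Operator dataclass, the depends_on field, the operator names, and the execution_order helper) is hypothetical and does not reflect the executor's real interface.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Literal, Sequence

Engine = Literal["sql", "sql_arrow_udf", "pythonic"]


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator node: a name, its engine tag, and upstream names."""

    name: str
    engine: Engine
    depends_on: tuple[str, ...] = ()


def execution_order(operators: Sequence[Operator]) -> list[Operator]:
    """Return operators with every dependency ahead of its dependents."""
    by_name = {op.name: op for op in operators}
    graph = {op.name: set(op.depends_on) for op in operators}
    unknown = {dep for deps in graph.values() for dep in deps} - by_name.keys()
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return [by_name[name] for name in TopologicalSorter(graph).static_order()]


if __name__ == "__main__":
    dag = [
        Operator("enrich", "pythonic", ("rolling_agg",)),
        Operator("skeleton", "sql"),
        Operator("rolling_agg", "sql_arrow_udf", ("skeleton",)),
    ]
    for op in execution_order(dag):
        print(op.name, op.engine)
```

Keeping the engine as a declared tag on each node means a scheduler can dispatch by engine without inspecting operator internals.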

Optional projections — Dagster and MLflow

Dagster and MLflow are projections of Daedalus records, not definition owners.

Dagster (daedalus.definitions): each training dataset is a daily partitioned graph-backed asset whose op sub-graph is compile → skeleton → enrich, with feature views as external (lineage) assets and a schedule for daily production runs.
MLflow (daedalus.tracking): an ExperimentTracker protocol, a no-op NullTracker, and a lazy MLflowTracker behind the optional mlflow extra (mlflow-skinny, no pandas pull), wired default-off through TrainingRunConfig.tracking; logs feature_service@version, git SHA, config paths, and run-record ids.

Core imports stay clean without the optional extras; the local Dagster/MLflow smoke paths work only when extras are installed.
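The tracker arrangement (a protocol, a no-op default, and a lazily imported MLflow implementation) might look roughly like the sketch below. The class and function names match the text, but the log_run method, its signature, and the get_tracker arguments are assumptions rather than the daedalus.tracking API. The mlflow calls used (set_experiment, start_run, log_params) are standard MLflow tracking functions.

```python
from typing import Any, Mapping, Optional, Protocol


class ExperimentTracker(Protocol):
    def log_run(self, params: Mapping[str, Any]) -> None: ...


class NullTracker:
    """Default-off tracker: accepts records and does nothing."""

    def log_run(self, params: Mapping[str, Any]) -> None:
        return None


class MLflowTracker:
    def __init__(self, experiment: str) -> None:
        # Lazy import: mlflow is only needed once this class is constructed,
        # so importing this module never requires the optional extra.
        import mlflow

        self._mlflow = mlflow
        mlflow.set_experiment(experiment)

    def log_run(self, params: Mapping[str, Any]) -> None:
        with self._mlflow.start_run():
            self._mlflow.log_params(dict(params))


def get_tracker(
    backend: Optional[str] = None, experiment: str = "daedalus"
) -> ExperimentTracker:
    """Hypothetical factory: None selects the no-op tracker."""
    if backend is None:
        return NullTracker()
    if backend == "mlflow":
        return MLflowTracker(experiment)
    raise ValueError(f"unknown tracking backend: {backend!r}")


if __name__ == "__main__":
    get_tracker().log_run({"feature_service": "dssm_ranking@2026-06-27"})
```

Deferring the import to the constructor is what keeps core imports clean when the extra is absent.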

Cross-cutting rules

YAML + SQL + Python code remain the file-of-record for definitions; definitions are YAML + SQL, never TOML (TOML is reserved for future operator/local runtime config). Git remains the audit source for definition changes.
MetadataStore holds operational state only; definitions never move into it.
ray and the redis client remain core while workflows require them. The redis client backs the first OnlineStore backend; the dependency is on the Redis wire protocol, and the deployment today is AWS ElastiCache for Valkey. New stacks (api, dagster, mlflow, snowflake, UI) are optional extras until a phase has a real caller.
REST is out of scope — no phase delivers a REST/HTTP server; the facade is the read boundary a future renderer would sit behind.
A no-extras CI import test protects core from optional-dependency leakage.
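One way to express such an import-leak check is to import the core package in a fresh interpreter and report which optional top-level modules ended up loaded. This is a sketch under assumptions, not the project's CI test: the helper name and the probe script are hypothetical, and the module names passed in are up to the caller.

```python
import json
import subprocess
import sys
from typing import Iterable

# Runs in a child interpreter so modules already loaded here cannot mask a leak.
_PROBE = """
import importlib, json, sys
importlib.import_module(sys.argv[1])
roots = set(sys.argv[2:])
loaded = {name.split('.')[0] for name in sys.modules}
print(json.dumps(sorted(loaded & roots)))
"""


def leaked_optional_imports(core_module: str, optional_roots: Iterable[str]) -> list:
    """Return the optional top-level modules loaded by importing core_module."""
    result = subprocess.run(
        [sys.executable, "-c", _PROBE, core_module, *optional_roots],
        capture_output=True,
        text=True,
        check=True,  # a failing import raises CalledProcessError
    )
    # Use the last line in case the imported module prints during import.
    return json.loads(result.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    # Standard-library stand-in for a core package import.
    print(leaked_optional_imports("json", ["mlflow", "dagster", "snowflake"]))
```

In an environment without the extras, an eager optional import surfaces as a failed subprocess; with the extras installed, it surfaces as a non-empty list, so the same check is useful in both setups.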

Top risks and kill criteria

Headline risks: rewriting the training pipeline instead of wrapping it; starting version locks before the Type & Shape Protocol is stable; treating qid as platform identity; letting serving_allowed become a passive flag rather than an enforced chokepoint; Quack beta churn with no Flight fallback (an accepted trade-off, 2026-06); adding Snowflake before a real source exists; optional dependency leakage; and serial dependency on a single maintainer.

The effort pauses if, after Phase 1A, no second consumer's service definition loads cleanly (the protocol does not generalize); if the golden-day parity harness cannot pass against skeleton → enrich ("wrap, don't rewrite" is unachievable); if version checks produce constant false breakages and get bypassed; or if no model-development workflow can name a concrete benefit from column-level lineage or reproducible preprocessing after Phase 2.
