Execution & Tuning
daeda pipeline train <service> compiles a feature service and runs its operator DAG through the executor. This page covers the run flags, day partitioning and parallelism, the Ray envelope that bounds the pythonic enrichment operators, and the config files that drive a run.
```bash
daeda pipeline train dssm_ranking
```
For the one-pipeline mental model see Overview; for the operator-by-operator breakdown see Operators & Optimization.
daeda pipeline train flags
| Flag | Default | Description |
|---|---|---|
| SERVICE | (required) | Feature service name (e.g. dssm_ranking). |
| --feature-views-dir | feature_views | Directory of feature view YAML. |
| --feature-services-dir | feature_services | Directory of feature service YAML. |
| --runtime-config-path | config/training/runtime.yaml | Training runtime config. |
| --aggregation-config-path | config/training/aggregations.yaml | Aggregation + embedding-store config. |
| --output-root | (from config) | Override the output root. |
| --target-date | (all days) | Narrow a run to a single YYYY-MM-DD day. |
| --day-workers | (from config; default 1) | Number of target days processed concurrently. |
| --skeleton-only | off | Run only the source-side operators. |
| --enrich-only | off | Run only the enrichment operators. |
| --ray-num-cpus | 16 | Ray logical CPUs for the pythonic enrichment operators. |
--skeleton-only / --enrich-only run a sub-range of the DAG
These flags do not select "stages" — they select a contiguous sub-range of the one operator DAG:
- --skeleton-only runs only the source-side operators: every sql/sql_arrow_udf operator (Scan/Filter/Project/Join/PointInTimeJoin/RollingAggregate/Sink). It stops before the RayUdfTransform enrichment operators.
- --enrich-only runs only the enrichment operators, i.e. the pythonic RayUdfTransform(s), over an already-written source-side output.
A run with no flag executes the full DAG: source-side operators first, then the enrichment operators (only if the pipeline declares a RayUdfTransform). Both sub-ranges are resumable, so a failed run is safe to re-run.
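Because both sub-ranges are resumable, a wrapper can re-run a failed phase instead of starting over. A minimal shell sketch, assuming a budget of three attempts per phase; retry and train_day_in_two_phases are hypothetical helper names, and only the documented flags are passed to daeda:

```bash
# retry N CMD [ARGS...]: run CMD until it succeeds, at most N attempts.
# Safe here only because both sub-ranges are documented as resumable.
retry() {
  local attempts="$1"
  shift
  local i=1
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
  done
}

# Hypothetical helper: source-side sub-range first, then the enrichment
# sub-range, for one SERVICE and one YYYY-MM-DD day.
train_day_in_two_phases() {
  local service="$1" day="$2"
  retry 3 daeda pipeline train "$service" --skeleton-only --target-date "$day" &&
    retry 3 daeda pipeline train "$service" --enrich-only --target-date "$day"
}
```

Call it as train_day_in_two_phases dssm_ranking 2026-05-15. The && means the enrichment phase only starts once the source-side phase for that day has succeeded.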
```bash
# Full DAG, single day
daeda pipeline train dssm_ranking --target-date 2026-05-15

# Source-side operators only, then enrichment operators only
daeda pipeline train dssm_ranking --skeleton-only --target-date 2026-05-15
daeda pipeline train dssm_ranking --enrich-only --target-date 2026-05-15

# Override config + output locations, size the Ray envelope
daeda pipeline train dssm_ranking \
  --runtime-config-path config/training/runtime.yaml \
  --aggregation-config-path config/training/aggregations.yaml \
  --output-root /path/to/output \
  --ray-num-cpus 16
```
Day partitioning & parallelism
The pipeline iterates every day from feed_start through feed_end (inclusive), processing each as an independent unit and writing one dt=YYYY-MM-DD/part-N-0.parquet partition per day. --target-date narrows a run to a single day; the month range and chunk sizes are config-driven in config/training/runtime.yaml.
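The inclusive day range can also be enumerated in shell, for example to drive one --target-date run per day from an external scheduler. A minimal sketch, assuming GNU date; list_days and train_each_day are hypothetical helpers, and the pipeline already iterates the whole range itself when --target-date is omitted:

```bash
# Print every day in the inclusive range [START, END], one YYYY-MM-DD per line.
# Requires GNU date (-d). In a real run the bounds are feed_start / feed_end
# from config/training/runtime.yaml; here they are plain arguments.
list_days() {
  local d="$1" end_n
  end_n=$(date -u -d "$2" +%Y%m%d)
  # Compare as YYYYMMDD integers so the loop includes END itself.
  while [ "$(date -u -d "$d" +%Y%m%d)" -le "$end_n" ]; do
    printf '%s\n' "$d"
    d=$(date -u -d "$d + 1 day" +%F)
  done
}

# Hypothetical driver: one single-day invocation per day using only the
# documented --target-date flag; stops at the first failing day.
train_each_day() {
  local service="$1" day
  for day in $(list_days "$2" "$3"); do
    daeda pipeline train "$service" --target-date "$day" || return 1
  done
}
```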
--day-workers N (or day_workers in runtime.yaml) processes N target days concurrently. The default is 1; raise it only when the host has enough CPU, memory, and DuckDB spill bandwidth for N independent daily pipelines. The source-side operators are memory-bounded per day — each major operator is materialized as its own DuckDB COPY stage so memory is released between the spine, rolling, lookup, and assembly work.
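Choosing N amounts to dividing host resources by the per-day footprint. A hypothetical helper, assuming you have measured per-day CPU and memory usage yourself (this page does not give those numbers); it does not model DuckDB spill bandwidth, which still has to be checked separately:

```bash
# Hypothetical sizing helper: the largest N such that N independent daily
# pipelines fit the host's CPUs and memory, never below 1.
# Args: HOST_CPUS HOST_MEM_GIB PER_DAY_CPUS PER_DAY_MEM_GIB (integers).
suggest_day_workers() {
  local host_cpus="$1" host_mem_gib="$2" per_day_cpus="$3" per_day_mem_gib="$4"
  local by_cpu=$((host_cpus / per_day_cpus))
  local by_mem=$((host_mem_gib / per_day_mem_gib))
  local n=$((by_cpu < by_mem ? by_cpu : by_mem))   # the tighter limit wins
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}
```

For example, a 64-CPU, 256 GiB host with a measured 8 CPUs / 48 GiB per day gives 5 (memory-bound), which you would pass as --day-workers 5 or set as day_workers in runtime.yaml.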
The Ray envelope for pythonic operators
The RayUdfTransform enrichment operators run under the 16-CPU / 64 GiB RayEnvelope. A singleton local Ray cluster is sized to that budget, and the actor pool auto-sizes to ~80% of the worker-CPU budget so the remaining CPUs stay free for Ray Data read/write tasks.
| Resource | Default | Notes |
|---|---|---|
| Ray logical CPUs (--ray-num-cpus) | 16 | Total CPUs exposed to the local Ray node. |
| Ray memory budget | 64 GiB | Total, including object store + system reservation. |
| Object store | 20 GiB | Ray object store size. |
| Ray system reservation | 1 CPU / 4 GiB | Reserved for Ray system processes. |
| Worker CPU budget | ray-num-cpus − 1 | Actor pool auto-sizes to ~80% of this. |
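With the defaults in this table, 16 − 1 = 15 worker CPUs, ~80% of which is 12 actors, leaving about 3 CPUs for Ray Data read/write tasks; 64 − 20 − 4 = 40 GiB remain after the object store and system reservation. A sketch of that arithmetic, assuming the pool size is floored to a whole number of actors (the executor's actual rounding is not documented here):

```bash
# Envelope arithmetic. Args (all optional): RAY_CPUS TOTAL_GIB STORE_GIB SYS_GIB,
# defaulting to the table's 16 / 64 / 20 / 4.
# Assumption: the ~80% actor pool is floored and is at least 1 actor.
envelope() {
  local ray_cpus="${1:-16}" total_gib="${2:-64}" store_gib="${3:-20}" sys_gib="${4:-4}"
  local worker_cpus=$((ray_cpus - 1))        # 1 CPU reserved for Ray system processes
  local actors=$((worker_cpus * 80 / 100))   # actor pool ~80% of the worker budget
  if [ "$actors" -lt 1 ]; then actors=1; fi
  local remaining_gib=$((total_gib - store_gib - sys_gib))  # after store + reservation
  echo "worker_cpus=$worker_cpus actors=$actors free_cpus=$((worker_cpus - actors)) remaining_gib=$remaining_gib"
}
```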
Key behaviors:
- --preload-day-embeddings (default on): the day's needed FP16 embedding subset is loaded once on the driver and shared with actors via the object store. A single day touches only ~470K distinct artworks, so this subset stays under 1 GiB.
- Chronological sharded output: output is sharded parquet under dt=YYYY-MM-DD/ with Ray Data preserve_order on, so each shard covers a contiguous chronological event_timestamp range (no serial single-file merge).
- Day-atomic & resumable: days are processed sequentially for memory isolation; an already-enriched day is skipped, so a failed run resumes cleanly.
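The under-1 GiB figure for the preloaded subset is easy to sanity-check: FP16 stores 2 bytes per value, so ~470K vectors stay under 1 GiB for any embedding dimension up to 1142. A back-of-envelope sketch that ignores row ids and object-store overhead; the dimension is an input because this page does not state it:

```bash
# Size in MiB (floored) of ROWS FP16 vectors of dimension DIM: rows * dim * 2 bytes.
fp16_subset_mib() {
  local rows="$1" dim="$2"
  echo $((rows * dim * 2 / 1048576))
}

# Largest dimension for which ROWS FP16 vectors still fit in 1 GiB.
max_dim_under_1gib() {
  local rows="$1"
  echo $((1073741824 / (rows * 2)))
}
```

For instance, fp16_subset_mib 470000 1024 prints 917, and max_dim_under_1gib 470000 prints 1142.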
Read-only cgroup pods
Ray's enable_resource_isolation needs a writable cgroup v2. On read-only-cgroup devpods it is off by default — enforce the envelope with the external memory governor (bench/memcap.py) instead.
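To tell which case applies, a pod can check whether /sys/fs/cgroup is a cgroup v2 mount that is not read-only. A heuristic sketch, assuming the standard /sys/fs/cgroup mount point; it is not the check Ray itself performs, and a read-write mount can still lack the delegation Ray needs:

```bash
# Succeeds (exit 0) if the mounts table lists /sys/fs/cgroup as cgroup2 without
# the "ro" option. Reads /proc/self/mounts unless a fixture path is given.
cgroup2_writable() {
  awk '$2 == "/sys/fs/cgroup" && $3 == "cgroup2" {
         found = 1
         n = split($4, opts, ",")
         for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
       }
       END { exit (found && !ro) ? 0 : 1 }' "${1:-/proc/self/mounts}"
}
```

If it returns non-zero, leave enable_resource_isolation off and rely on bench/memcap.py.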
The Lance embedding pool is cumulative (currently pinned: data/store/v6/artwork_embedding, ~42.5M rows / 185 GB). image_embedding lives only on the enrichment side; the source-side operators never read it.
Config files
A run is driven by three layered config files (Pydantic Settings with YAML sources):
| File | Drives |
|---|---|
| config/base.yaml | Feature dimensions, feature lists, S3 paths. |
| config/training/runtime.yaml | Chunk sizes, feed_start / feed_end, feature_refs, output_columns, day_workers, compute.engine (the rolling_engine for RollingAggregate). |
| config/training/aggregations.yaml | aggregation_specs, enrichment_specs, embedding_stores. |
feature_refs selects which views the source-side Scan/Join operators load (all columns in each view's features are loaded, not just the referenced one). output_columns entries given as {name, default} dicts are filled with the specified literal (e.g. NULL) when absent — so the schema is consistent even for a --skeleton-only run that hasn't appended embeddings yet.
Next
- Operators & Optimization: the operator vocabulary and the end-to-end DAG.
- Overview: the one-pipeline model, compile, output layout.
- Operator Pipeline: compiler + executor + the RayEnvelope.