Date: 2026-04-27
Author: Trevor Hart
Status: Draft
Supersedes: family-perspective-prd-v2.md (2026-04-27), family-perspective-prd.md (v1, 2026-04-24)
Audience: CDI leadership, Subscriber Analytics Engineering, CDW team, Commerce
Foundation team, analytics stakeholders
Key changes from v2: Expanded scope from the base subscriber dimension to the full unified pipeline family (state transitions, growth movements, growth accounting, LMA, aggregate balance, metrics). Added assessment of the NRT timing mismatch for supplementary fields. Added state machine recommendation for movement classification. Resolved remaining question about ad_state/tv_state derivability.
Unchanged from v2. Disney streaming analytics operates across four services (Disney+, Hulu, ESPN+, Star+) but has no single table where an analyst can answer the question: "What does this subscriber look like across all their streaming services?"
| Perspective | Table | Grain | What it shows | What it doesn't show |
|---|---|---|---|---|
| Service Perspective | SERVICE_SUBSCRIBER_DIM_VW (83 consumers) |
subscriber x service x date | One row per subscriber per streaming service per day. Full attribute detail per service. | No cross-service view. Must self-join across 4 service rows. |
| Unified Perspective | UNIFIED_SUBSCRIBER_DIM_VW (46 consumers) |
subscriber x date (deduplicated) | One row per subscriber per day, cross-service. Picks a "primary" subscription. | Only includes subscribers with a Disney+ subscription. Primary selection loses non-primary detail. |
| Conformed Composite | (wide engagement table) | subscriber x date | Combines subscriber state with engagement metrics. | Too wide for subscription-focused analysis. Mixes concerns. |
| Unified v2 (in QA) | fact_unified_v2_subscribers_vw (48 cols) |
subscriber x date | All subscribers across services. Family-level flags. | Built on dim_subscribers_scd via 4-way self-join. 10 UAT bugs, 4 unresolved after 9 months. "Multiple Values" pattern. 4-day backfill. |
Three specific audiences are blocked. See v2 Section 1 for the full question mapping.
- ESPN Flagship launch — every hardcoded h/e/d/s pattern must be extended.
- Commerce-foundation-pro agreement model is production-ready — hourly refresh, all subscribers including unactivated.
- Unified_v2 UAT stuck for 9 months — root cause is the self-join architecture.
Unchanged from v2. Three independent pipelines from DATOS:
- commerce-foundation-pro (intended source of truth, hourly, all subscribers)
- subscriber-fuze (legacy parallel, daily, identity tables cover activated only)
- dim_subscribers_scd (current foundation, daily, all subscribers, full history)
See v2 Section 2 for full details including the identity_id coverage gap (70M+
unactivated, 15M actively paying) and the NULL is_disney flag finding.
The "Family Perspective" is not just a base subscriber dimension. The current unified pipeline is a 7-layer stack, and the Family Perspective must eventually provide equivalents for the full stack:
Layer 1: dim_unified_subscribers_scd (5.58B rows)
Foundation: subscriber x date SCD with previous_* columns
|
Layer 2: fact_unified_subscriber_state_transitions
Grain: subscriber x transition_date
Detection: SCD row boundaries (each new SCD row = a transition)
Output: 53 change-detection flags (current vs previous for each attribute)
|
Layer 3: fact_unified_subscriber_growth_movements (5.77B rows)
Grain: subscriber x transition_date
Classification: JOIN to dim_subscriber_transition_movement_mapping
Output: 46 binary movement flags + LMA exclusion flags
Movement categories: 60+ types across signups, conversions, cancels,
pauses, product switches, service changes, at-risk transitions
|
Layer 4: fact_unified_subscriber_dimension_movements (3.97B rows)
Grain: subscriber x date
Tracks: every dimensional attribute change with upgrade/downgrade flags
|
Layer 5: fact_unified_subscriber_growth_accounting (92.9M rows)
Grain: date x dimensions (NO subscriber_sk_id)
Aggregation: movements + LMA adjustments
|
Layer 6: fact_unified_subscriber_aggregate_balance (2.80B rows)
Grain: date x dimensions
Calculation: starting balance + movements = ending balance
|
Layer 7: fact_unified_subscriber_metrics (2.66B rows, 100+ KPIs)
Grain: date x dimensions
Output: final aggregated metrics for BI/Looker/executive reporting
07:00 UTC dim_unified_subscribers_daily_load (Layer 1)
08:00 UTC fact_unified_subscriber_facts_load (Layers 2-4)
parallel: dimension_movements_agg_load
parallel: growth_accounting_load (Layer 5)
parallel: aggregate_balance_load (Layer 6)
08:00 UTC fact_unified_subscriber_metrics_load (Layer 7)
09:00 UTC fact_unified_v2_subscribers_daily_load (the v2 base table)
Each unified layer has a service-level counterpart:
| Unified | Service-Level |
|---|---|
| dim_unified_subscribers_scd | dim_subscribers_scd (per streaming_service) |
| fact_unified_subscriber_growth_movements | fact_service_subscriber_growth_movements |
| fact_unified_subscriber_growth_movements_lma | fact_service_subscriber_growth_movements_lma |
| fact_unified_subscriber_growth_accounting | fact_service_subscriber_growth_accounting |
| fact_unified_subscriber_dimension_movements | fact_service_subscriber_dimension_movements |
| fact_unified_subscriber_aggregate_balance | fact_service_subscriber_aggregate_balance |
| fact_unified_subscriber_metrics | fact_service_subscriber_metrics |
R1-R7 unchanged from v2. Summary:
- R1: Cross-service subscriber state (paid/entitled/trial across all services)
- R2: Cross-service product attributes (preserve actual values, not "Multiple Values")
- R3: Service-level drill-down (filter, not helper tables)
- R4: Bundle and multi-subscription classification
- R5: All subscribers included (including 70M+ unactivated 3P partners)
- R6: Identity linkage (identity_id where available, agreement_id as surrogate)
- R7: Service extensibility (new services = new rows, not new columns)
"When this subscriber's state changed, what happened?"
- R8: Family-level state transitions — detect day-over-day changes in the subscriber's cross-service state. When a subscriber goes from "paid on Disney+ and Hulu" to "paid on Disney+ only", that's a family-level transition.
- R9: Growth movement classification — classify each transition with a movement category (signup, cancel, pause, upgrade, downgrade, service change, etc.) using mapping-table-driven logic, not hardcoded CASE WHEN blocks.
- R10: LMA (Last Mile Adjustments) — apply partnership-specific exclusion rules for financial reporting accuracy.
"What are the aggregate KPIs for the family perspective?"
- R11: Growth accounting — aggregated movement counts by date and dimensional attributes, with LMA adjustments applied.
- R12: Aggregate balance — running subscriber counts (starting + adds - stops = ending) at the family level.
- R13: Metric store — 100+ pre-aggregated KPIs for Looker and executive dashboards.
NR1-NR5 unchanged from v2. Added:
- Movement categories must be defined in mapping tables, not SQL CASE WHEN blocks.
- Adding a new movement category (e.g., for a new product type) should require only a DML INSERT to the mapping table, not a code change.
- Mapping tables must be version-controlled as seed DML in the repo, not managed as production-only data.
- Valid subscriber states and valid transitions between them should be defined as a formal state machine stored in a reference table.
- Any observed transition not in the state machine should be flagged as "unexpected" rather than silently falling into a catch-all "Other" bucket.
- The state machine table serves as both runtime logic and documentation.
Updated from v2 to address the NRT timing mismatch for supplementary fields.
DATOS raw events
|
commerce-foundation-pro (Databricks, hourly)
|
+-- agreement_product_dim (agreement x product SCD, Delta + Iceberg)
+-- agreement_dim (agreement SCD, carries identity_id)
|
| Iceberg auto-refresh to Snowflake
v
commerce-core-ds-pro (Snowflake)
|
+-- agreement_product_dim Iceberg table (auto-refreshed)
+-- agreement_dim Iceberg table (auto-refreshed)
|
+-- family_subscriber_summary_vw (NEW VIEW)
| Grain: subscriber_sk_id x business_date
| Source: agreement tables (primary) + dim_subscribers_scd (supplementary)
|
+-- Looker explores join to agreement_product_dim for service drill-down
Four fields (subscriber_ad_state, subscriber_tv_state, subscriber_home_country,
is_activated) are not on the agreement tables and currently require a join to
dim_subscribers_scd. However, the agreement tables refresh hourly while
dim_subscribers_scd is daily batch — creating a window where new subscribers appear
in the agreement data but have no SCD match.
Resolution: hybrid derivation with COALESCE
For each field, derive an NRT approximation from the agreement data and fall back to the SCD value when available:
| Field | NRT Approximation (from agreement data) | Batch Exact (from SCD) | How |
|---|---|---|---|
subscriber_ad_state |
Derive from offer.labels array (has_ads label) |
dim_subscribers_scd.subscriber_ad_state |
Requires upstream change: add offer_labels extraction to commerce-foundation-pro's agreement_product_read_optimized_stage. The ESPN Unlimited pipeline in the same repo already extracts offer.labels — same pattern. ~99% accuracy for post-unification subscriptions. |
subscriber_tv_state |
Derive from offer.labels array (live_tv, flagship labels) |
dim_subscribers_scd.subscriber_tv_state |
Same upstream change. ~90% accuracy — gap is legacy Hulu live TV subs pre-ACP, which is a shrinking population within the 2025+ MVP scope. |
subscriber_home_country |
Use storefront_country from agreement_product_dim |
dim_subscribers_scd.subscriber_home_country |
Already available on agreement tables. ~85-95% match vs 6-source priority resolution. |
is_activated |
identity_id IS NOT NULL on agreement_dim |
dim_subscribers_scd.is_activated |
Reasonable proxy for activation. |
-- Pattern for each field:
COALESCE(
scd.subscriber_ad_state, -- exact, available after daily batch
derived_ad_state_from_offer_labels -- approximate, available within 1 hour
) AS subscriber_ad_stateNew subscribers get the approximation immediately; it upgrades to the exact value within 24 hours. Analysts see complete data at all times.
Upstream prerequisite: Commerce-foundation-pro team adds offer.labels extraction
to the agreement_product_read_optimized_stage application config. This is a
single-line addition to the column map — the field exists in the DATOS payload and the
extraction pattern is already used in the ESPN Unlimited pipeline in the same repo.
agreement_product_dim (SCD with effective dates)
|
+-- Each new SCD row = a product-level state transition
| Previous state: prior SCD row's product_state
| Current state: current SCD row's product_state
|
+-- dim_product_state_movement_mapping (NEW MAPPING TABLE)
| Key: (prev_product_state, curr_product_state)
| Output: movement_category, movement_description
| ~70 rows covering all valid state pairs
| Version-controlled as seed DML in the repo
|
+-- family_state_machine (NEW REFERENCE TABLE)
| Defines: valid family-level states, valid transitions, labels
| States: combinations of per-product states across services
| Transitions: edges in the state graph
| Unexpected transitions flagged rather than silently bucketed
|
v
family_subscriber_transition_fact_vw (NEW VIEW)
Grain: subscriber_sk_id x transition_date
Movement classification via mapping table JOIN (not CASE WHEN)
|
v
family_subscriber_growth_movement_fact_vw (NEW VIEW)
Adds: LMA exclusion flags, movement direction
The current unified pipeline has 214 CASE WHEN branches spread across its views:
| Component | Branches | Purpose |
|---|---|---|
| State transitions (service) | 52 | Mechanical change detection |
| State transitions (unified) | 53 | Same + cross-service |
| Identity entitlement movements | 64 | State-pair to movement (hardcoded, has catch-all "Other") |
| DTCI terminal state mapping | 23 | State collapsing |
| LMA exclusions | ~10 | Partner-specific rules |
| LMA DML | 12 | Partnership minimum guarantees |
The identity entitlement movements view was built iteratively (7 commits, multiple
"fix dup issue" messages) — new state pairs added as they appeared in production data.
The movement mapping table (dim_subscriber_transition_movement_mapping) is the one
well-designed component: 46 binary flags keyed by (prev_state, curr_state, last_entitled_over_one_year_ago, has_most_recent_entitled_date_flag). However, it is
production-managed and not version-controlled — changes require DML against the
production table.
The recommended approach for Family Perspective:
-
Mapping table for movement classification —
dim_product_state_movement_mappingwith ~70 rows. Changes = INSERT/UPDATE on the table, not code changes. Version- controlled as seed DML. -
Formal state machine for family states — defines all valid (prev_state, curr_state) pairs with transition labels. Any observed transition not in the table is flagged as "unexpected" rather than falling into "Other". The state machine table is both runtime logic and documentation.
-
Event-sourced input — the agreement model is already SCD-based with effective dates. Each new SCD row IS a transition. No day-over-day snapshot diffing needed.
This pattern is established at scale: Stripe uses formal state machines for billing lifecycle; dbt Labs' subscriber metrics methodology recommends state-graph-driven transition classification.
These layers are pure aggregation — they GROUP BY date and dimensional attributes, computing sums of movement counts and running balances. The architecture is identical to the current unified pipeline; only the input changes (family movement facts instead of unified movement facts).
The LMA (Last Mile Adjustment) logic is the most complex component: 3,554 lines of partnership-specific exclusion rules. This must be reviewed separately to determine whether the same rules apply at the family level or need adaptation. LMA rules are currently hardcoded in SQL and maintained via a cloned production table.
| Layer | Currently reads from | Commerce-foundation-pro provides | Gap |
|---|---|---|---|
| L1: Base subscriber dim | dim_subscribers_scd (4-way self-join) | agreement_product_dim (product rows, GROUP BY) | Covered. Supplementary fields via COALESCE pattern. |
| L2: State transitions | SCD row boundaries | agreement SCD has effective dates | Covered. SCD rows define transitions natively. |
| L3: Growth movements | mapping table + transitions | Events have state_src/state_scd | Covered with new mapping table. 64-branch CASE WHEN replaced by ~70-row lookup. |
| L4: Dimension movements | SCD attribute diffs | agreement SCD has product/offer attributes | Partially covered. 40+ dimensional attributes tracked currently; agreement model has rich product metadata but different column names. Needs column mapping. |
| L5: Growth accounting | movements + LMA | Movements (from L3) + LMA rules | Gap: LMA rules. 3,554 lines of partnership-specific logic. Needs review for family-level applicability. |
| L6: Aggregate balance | growth accounting rollup | Pure aggregation | Covered once L5 is built. |
| L7: Metrics | balance + facts | Pure aggregation | Covered once L6 is built. |
LMA rules are partnership-specific exclusions applied to growth movements before they feed into growth accounting. Examples:
- 3PP aggregate partnership free trial caps
- Disney partnership minimum guarantee billing platform rules
- Date-gated partner-specific exclusions (Movistar, Vodafone, Docomo, Astro, Hotstar)
These rules are maintained as:
- A cloned production table (
dim_subscriber_transition_movement_mapping) - ~10 inline CASE WHEN branches in the unified transitions view
- 3,554 lines of dedicated LMA DML
For the Family Perspective, LMA rules need to be:
- Reviewed for family-level applicability (some may only apply at service level)
- Migrated to version-controlled mapping tables
- Applied to the family movement facts
This is the single largest complexity item beyond the base table and should be scoped as a separate workstream.
| Phase | Scope | Dependencies | Effort |
|---|---|---|---|
| Phase 1 | Base subscriber dimension (family_subscriber_summary_vw) |
Iceberg tables for agreement_product_dim + agreement_dim; offer_labels extraction; Alan repoints views | Medium (6-9 days for view, 8-12 for Alan's repoint) |
| Phase 2a | State transitions + growth movements | Phase 1; product state mapping table; state machine definition | Medium (mapping table design + view SQL) |
| Phase 2b | Dimension movements | Phase 1; column mapping from agreement model to current 40+ attributes | Medium |
| Phase 3a | Growth accounting (without LMA) | Phase 2a; aggregation views | Low (pure GROUP BY aggregation) |
| Phase 3b | LMA rules migration | Phase 3a; LMA rule review with partnership team | High (3,554 lines of partnership logic) |
| Phase 4 | Aggregate balance + metrics | Phase 3a/3b; running balance calculation | Medium |
| Phase 5 | Publish to subscriber360 + Looker integration | Phase 1 minimum; Phase 4 for full stack | Medium |
The base subscriber dimension covers the immediate analytics need and unblocks all three stakeholder audiences identified in the problem statement. Phases 2-5 build the full growth accounting and metrics stack but are not required for the initial launch.
| # | Question | Resolution |
|---|---|---|
| 1 | Can ad_state be derived from the agreement model? | Yes, with upstream change. offer.labels in DATOS contains has_ads. Commerce-foundation-pro needs to add label extraction to the agreement pipeline (single-line config change; pattern already exists in ESPN Unlimited pipeline). ~99% accuracy for post-unification subscriptions. |
| 2 | Can tv_state be derived from the agreement model? | Mostly yes. offer.labels contains live_tv and flagship. Same upstream change. ~90% accuracy — gap is legacy Hulu live TV pre-ACP (shrinking, within 2025+ scope). |
| 3 | NRT timing mismatch for supplementary fields? | Solved with COALESCE pattern. NRT approximations from agreement data; exact values fill in from dim_subscribers_scd within 24 hours. |
| 4 | What's the scope beyond the base table? | 7-layer pipeline. State transitions, growth movements, growth accounting (with LMA), aggregate balance, metrics. Phases 2-5 in the delivery plan. |
| 5 | Is there a plan to carry ad_state/tv_state/home_country/activation natively? | Architectural blockers for native computation. ad_state and tv_state require multi-subscription aggregation + SKU metadata + Hulu MVP data. home_country requires 6-source priority resolution. is_activated requires account-level activation dates from a separate DATOS domain. The COALESCE pattern is the right long-term approach — NRT approximation + batch exact. |
| # | Question | Who | Impact |
|---|---|---|---|
| 1 | Will commerce-foundation-pro add offer.labels extraction to the agreement pipeline? |
Commerce Foundation team | Prerequisite for NRT ad_state/tv_state derivation |
| 2 | How should the partner field be derived from agreement_source_partner in agreement_dim? |
Commerce Foundation team / CDW | Determines service letter derivation |
| 3 | Will Snowflake Iceberg tables be created for agreement_product_dim and agreement_dim? |
Commerce Foundation team | Prerequisite for Snowflake consumption |
| 4 | Do LMA rules apply at the family level or only service level? | Partnership team / CDW | Determines Phase 3b scope |
| 5 | What Looker dashboards would use the family perspective? | Subscriber AE (Kate C.), Bundle Analytics (Jeff L.) | Determines view vs materialized table |
| 6 | Are family-level growth movements in scope for MVP or Phase 2? | CDI leadership | Scopes initial deliverable |
| Document | Description |
|---|---|
| Family Perspective - Design Requirements | Field specs, scenarios, sample queries (2025) |
| Unified v2 Validation Tracker | UAT issues, project plan |
| Unified_v2 Test Results | QE test cases |
| Family Perspective Slides | Design overview presentation |
| subscriber360 Domain Narrative | MMM-151 workspace documentation (13 clusters, 514 views) |
| identity_id vs subscriber_sk_id Findings | Data investigation: 70M+ unactivated subscribers, subscription_id 1:1 mapping |
| Version | Date | Key changes |
|---|---|---|
| v1 | 2026-04-24 | Initial PRD. Foundation: subscriber-fuze identity_agreement tables. Scope: base table only. |
| v2 | 2026-04-27 | Corrected foundation to commerce-foundation-pro. Added identity_id coverage gap analysis. Scoped to 2025+ for MVP. |
| v3 | 2026-04-27 | Expanded to full 7-layer pipeline. Added state machine recommendation. Resolved NRT timing mismatch. Added phased delivery plan. |
| Artifact | Purpose | Status |
|---|---|---|
family-perspective-rebase-proposal.md |
v1 technical proposal (subscriber-fuze based) | Superseded by this PRD; design options still valid for reference |
family-perspective-rebase-visual.html |
v1 visual architecture diagrams | Needs update for commerce-foundation-pro + full pipeline architecture |
family-perspective-sql-implementation.md |
v1 SQL implementation | Needs update: source changes, mapping table pattern, COALESCE pattern |
| Component | Branches | Purpose |
|---|---|---|
| State transitions (service-level) | 52 | Mechanical change detection (prev vs current) |
| State transitions (unified) | 53 | Same + cross-service states |
| Identity entitlement movements | 64 | State-pair to movement mapping (hardcoded, catch-all "Other") |
| DTCI terminal state mapping | 23 | Terminal state collapsing |
| Unified transitions LMA exclusion | ~10 | Partner-specific exclusions |
| LMA DML | 12 | Partnership minimum guarantees |
| Total | ~214 |
| Table | Key | Output | Managed |
|---|---|---|---|
dim_subscriber_transition_movement_mapping |
(prev_state, curr_state, last_entitled_1y, has_mre_flag) | 46 binary movement flags | Production-cloned (not version-controlled) |
dim_subscriber_transition_breakout_mapping |
(is_country_change, is_product_switch, is_service_change, is_paid_to_unpaid) | movement_direction (in/out) | Production-cloned |
dim_subscription_subscriber_state_mapping |
(partner, subscription_state) | subscriber_state, stacking_priority, is_entitled, paid_state | Production-cloned |
| Component | Detail |
|---|---|
| Streaming stage | agreement_product_read_optimized_stage — 30-second trigger from S3 DATOS |
| SCD build | agreement_offer_phases_scd -> agreement_dim -> agreement_product_dim |
| Schedule | Hourly (0 * * * *) via commerce_foundation_agreement_product_dim_dag |
| Storage | Databricks Delta with Iceberg universal format enabled |
| Existing Snowflake consumer | cdp-pro: audience_unified_nrt_agreement_product_dim_iceberg |
| Time (UTC) | DAG | Layer |
|---|---|---|
| 07:00 | commerce_core_dim_unified_subscribers_daily_load |
L1: Base SCD |
| 08:00 | commerce_core_fact_unified_subscriber_facts_load |
L2-L4: Transitions, movements, dimension movements |
| 08:00 | commerce_core_fact_unified_subscriber_growth_accounting_load |
L5: Growth accounting |
| 08:00 | commerce_core_fact_unified_subscriber_aggregate_balance_load |
L6: Aggregate balance |
| 08:00 | commerce_core_fact_unified_subscriber_metrics_load |
L7: Metrics |
| 09:00 | commerce_core_fact_unified_v2_subscribers_daily_load |
v2 base table (the one in UAT) |
| Table | Row Count | Storage | Layer |
|---|---|---|---|
dim_unified_subscribers_scd |
5.58B | — | L1 |
fact_unified_subscriber_growth_movements |
5.77B | — | L3 |
fact_unified_subscriber_dimension_movements |
3.97B | — | L4 |
fact_unified_subscriber_aggregate_balance |
2.80B | — | L6 |
fact_unified_subscriber_metrics |
2.66B | — | L7 |
fact_unified_subscriber_growth_accounting |
92.9M | — | L5 |
fact_unified_v2_subscribers |
1.77B | — | v2 base |
fact_unified_v2_service_helper |
3.67B | — | v2 helper |
SUBSCRIPTION_HIST (native) |
416B | 241 TB | Legacy |
SUBSCRIPTION_HIST_EXT (Delta) |
7.4T | 736 TB | Legacy |
DIM_SUBSCRIBERS_SCD |
13.6B | 4.0 TB | Current foundation |