Effectivity Satellite from Intersection Table with Payload

Hi,

I recently started to learn AutomateDV. It works quite well, but I’m struggeling a little with specific link-satellites.

Consider a simple m:m relation coming from an intersection table:

create table "source".cse_customer_employee
(
	cse_rowid serial primary key,
	cse_created_at timestamp,
	usr_created_by int,
	cse_modified_at timestamp,
	usr_modified_by int,
	cse_deleted_at timestamp,
	usr_deleted_by int,
	-- payload
	cst_customer int,
	emp_employee int,
	cod_role int,
	cse_effective_from date,
	cse_effective_to date, 
	cse_primary boolean
);

In DataVault, I would model this as h_customer and h_employee. These hubs are connected via l_customer_employee with its effectivity satellite s_customer_entitiy.

  • there’s a technical key (cse_rowid) which I’m not going to use, since I think the “datavault-way” of doing it is to integrate using the business keys (see below)
  • business keys (PK) are hk_customer, hk_employee, cod_role and cse_effective_from
  • i want to include/track the payload attributes as well

First Question: Is it okay to put the payload (without the FKs) into s_customer_employee or should it go to a “normal” satellite instead? As far I know, a satellite hanging off of a link is considered an effectivity satellite, yet it’s still a satellite :wink:

In AutomateDV I picked the eff_sat macro. Basically, it works as expcected, but it won’t detect changes in payload (non-key) attributes.

{{ config(materialized='incremental')  }}

{%- set source_model = "v_cse_customer_employee" -%}
{%- set src_pk = "HK_CUSTOMER_EMPLOYEE" -%}
{%- set src_dfk = "HK_CUSTOMER" -%}
{%- set src_sfk = ["HK_EMPLOYEE", "COD_ROLE", "CSE_EFFECTIVE_FROM"] -%}
{%- set src_start_date = "START_DT" -%}
{%- set src_end_date = "END_DT" -%}
{%- set src_extra_columns = ["cse_rowid", "cse_created_at", "usr_created_by", "cse_modified_at", "usr_modified_by", "cse_deleted_at", "usr_deleted_by", "cse_effective_to", "cse_primary"] -%}

{%- set src_eff = "EFFECTIVE_FROM" -%}
{%- set src_ldts = "load_ts" -%}
{%- set src_source = "RECORD_SOURCE"  -%}

-- src_extra_columns added
{{ automate_dv.eff_sat(src_pk=src_pk, src_dfk=src_dfk, src_sfk=src_sfk,
                       src_start_date=src_start_date, 
                       src_end_date=src_end_date,
                       src_extra_columns=src_extra_columns,
                       src_eff=src_eff, src_ldts=src_ldts, 
                       src_source=src_source,
                       source_model=source_model) }}

Second question: Should I use the “normal” satellite template instead or should I take an all different approach? (I don’t think a hub “customer_employee_assigment is a good idea)

Since I included all the necessary attributes into src_dfk and src_sfk, changes in the link’s combination are detected and loaded. However, it won’t detect changes in any other fields, especially cse_effective_to.

Usually, the source system would update cse_effective_from and cse_effective_to (eg. from 2099 to something else) and set cse_modified* - so the change is detected and loaded.

However it can also just update cse_effective_to (and modified) or any other payload attribute (such as the cse_primary flag) - these changes are not detected.

As to my background: I have theoretical understanding of the DV2 modelling, since I attended that “Boot Camp” with Michael Olschimke, but that’s like 10 years since. I got back to their book (the one with the racing car) every here and then to fresh-up my knowledge. Just this week, I received Patrick Cuba’s book (data vault guru) from Amazon. I haven’t built a real Data Vault yet.

Hi @darkchanter !

Thanks for laying out the example so clearly. Intersection tables that carry both effectivity dates and relationship attributes always expose the strange corners of DV modelling, so you’re in familiar territory. Bit of a late reply on my part, but this is a good modelling question and still very common, so here’s how I’d think about it. Hopefully it is still of use to you!

From what I can see, you have (and correct me if I’m wrong):

  • H_CUSTOMER

  • H_EMPLOYEE

  • L_CUSTOMER_EMPLOYEE representing “customer–employee–role at a point in time”.

  • A relationship that needs both effective from/to and some payload fields

You’ve used eff_sat, but payload-only changes are not producing new rows. So to hopefully answer both of your questions!

1. The modelling itself is fine

If the attributes describe the relationship, they belong on a satellite attached to the link. Things like role, primary flag, and effective window all belong at that grain. So nothing wrong at the logical level.

To be clear, in my opinion this is absolutely a Link Satellite and it doesn’t need to be a normal Hub hanging Satellite just because it has a payload.

The issue is how AutomateDV handles effectivity.

2. Why eff_sat ignores payload changes

The eff_sat only treats the following as change drivers:

  • The link keys

  • The effectivity start and end dates

Any extra columns passed as src_extra_columns are not part of the hashdiff. They are carried through but do not trigger history.

So if only a payload field changes, the hashdiff is unchanged and AutomateDV will not create a new row. This is intentional. The eff_sat is designed to manage time windows, not payload history.

From this I see two obvious directions:

Option A: keep eff_sat for dates only and split the payload

Use:

  • One satellite for effectivity, loaded with eff_sat

  • A second satellite for payload, loaded with the standard sat macro

The payload satellite includes all descriptive fields in its hashdiff, so any change triggers a new row.

This keeps responsibilities clear.

Option B: skip eff_sat and use a normal SAT

Load a single SAT where the hashdiff includes both the dates and the payload. Any change then creates a new record.

You lose the specialised effectivity behaviour from eff_sat, but history becomes more predictable because it behaves like a standard SAT.

Where my head is at on this

Most of the time, Option A is the simplest and most predictable:

  • Let eff_sat manage the validity window

  • Use a separate SAT for the relationship attributes

  • Keep both satellites at the same grain under the same link

This avoids fighting the macro and keeps lineage and history clean.

Hope this provides some direction!

Joe

1 Like