Computation in DataVault

Hello,
I have stored my data in a DataVault and need to run complex computations that involve entities from multiple schemas of the foundation layer. The results need to be persisted in the DataVault. What is the right place for them? Should they be stored in the input or foundation layer? I guess the foundation layer makes no sense, because the result does not have the structure of a hub, satellite, etc.?
I did not find any literature on that.
Best regards,
Jules

Hi Jules,

Generally speaking, the outcomes of computations are considered Business Vault objects. I’m not sure what you mean by "foundation layer" (raw vault?) or why you’ve separated it into multiple schemata. However, generally speaking again, derived data (the outcome of calculations) is based on raw-vault data (and/or already created business-vault data).
Does that clarify things?
Regards,
Klaas


Hi Jules, perhaps some nomenclature can help resolve some issues.

Source data: (Multiple schema data, disparate data)

Stage: (load and stage tables) for landing data and implementing hard business rules (HBRs) or data type alignment.

Raw Data Vault: (data loaded from stage) contains hubs, links, and satellites. No calculations applied at this layer. Only HBRs.

Business Vault: (PIT and bridge tables, well-defined soft business rules (SBRs)). Optional.

Information layer: calculated data, SBRs, sources raw data vault, may source business vault.

Your computations persist in the information layer, which can be physicalized or virtualized.
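As a minimal sketch of a calculated information-layer object (all table and column names below are hypothetical, chosen for illustration only): two raw-vault satellites from different source schemas share a customer hub key, and a soft business rule derives a net value that is persisted in the information layer.

```python
# Raw-vault satellites from two source schemas, keyed by the customer
# hub hash key (simplified to plain dicts for the sketch).
sat_orders = {"hk1": {"order_total": 120.0}, "hk2": {"order_total": 80.0}}
sat_refunds = {"hk1": {"refund_total": 20.0}}

def net_revenue(hub_key: str) -> float:
    """Soft business rule: order total minus refunds per customer."""
    orders = sat_orders.get(hub_key, {}).get("order_total", 0.0)
    refunds = sat_refunds.get(hub_key, {}).get("refund_total", 0.0)
    return orders - refunds

# The information-layer object persists (or, if virtualized, exposes)
# the derived values, sourced from raw-vault data.
info_net_revenue = {hk: net_revenue(hk) for hk in sat_orders}
```

Whether `info_net_revenue` is physicalized as a table or virtualized as a view is an implementation choice; the rule logic is the same either way.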

Hope this provides some insight and clarity.

Cheers,

Z

  • RV + BV = DV
  • RV = Hubs, Links and Satellites
  • BV = Links and Satellites

All DV content is the captured output of calculations, business rules, etc.

  • RV stores the outcome of business rules from source systems (the software used to automate business processes)
  • BV stores the output of soft rules; these use RV tables as a source (and sometimes other BV artefacts). This is why BV never has a hub table: it merely extends RV with derived rules in your data warehouse/lakehouse (or whatever).
  • PITs and Bridges are NOT BV artefacts, because they are ephemeral: they are structures used to simplify and speed up extraction of DV content for your Information Mart / Presentation layer. PITs and Bridges can be built up and torn down at will; they reference the keys and dates from RV and BV.
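The ephemeral PIT idea above can be sketched as follows (names and dates are made up for illustration): for each hub key and snapshot date, the PIT records the latest load date of each satellite that is effective at that point in time.

```python
from datetime import date

# Satellite load dates per hub key (one RV and one BV satellite,
# both hanging off the same hub -- hypothetical names).
sat_loads = {
    "sat_customer_rv": {"hk1": [date(2024, 1, 1), date(2024, 2, 1)]},
    "sat_customer_bv": {"hk1": [date(2024, 1, 15)]},
}

def build_pit(hub_keys, snapshot_dates):
    """One PIT row per (hub key, snapshot date), carrying the
    effective load_date of every satellite at that snapshot."""
    pit = []
    for hk in hub_keys:
        for snap in snapshot_dates:
            row = {"hub_key": hk, "snapshot_date": snap}
            for sat, loads in sat_loads.items():
                eligible = [d for d in loads.get(hk, []) if d <= snap]
                row[sat] = max(eligible) if eligible else None
            pit.append(row)
    return pit

pit = build_pit(["hk1"], [date(2024, 1, 20), date(2024, 2, 10)])
```

Because the PIT only holds keys and dates, it can be dropped and rebuilt for any set of snapshot dates without touching RV or BV content.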

Applying calculations and storing them as BV artefacts is easy:

  • They can be written in any language you choose: Python, SQL, Rust, whatever. The same is true for loading RV from its sources.

  • You must have an applied date, so that your BV artefacts are tied to the upstream artefacts they are based on. Joining RV and BV around a hub or a link is then easy, because every record carries the date the rules were based on. It also means your DV is always bi-temporal: you have an applied date (extract date) and a load date (version date).

  • To load BV, you simply stage the output of running soft rules on RV and use the same loading patterns as for satellites and links; through naming standards you have then sparsely extended the RV, i.e. RV + BV = DV. Building ephemeral structures such as PITs and Bridges on top removes the need for users to apply creativity when querying your DV; you have solved the complexity for them.
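A hedged sketch of that loading pattern (all names are illustrative, not a fixed standard): stage the soft-rule output with its applied date, hash the descriptive attributes, and insert a new satellite row only when the hashdiff differs from the current row, stamping the load date as the version date.

```python
import hashlib
from datetime import date

bv_sat = []  # target BV satellite: one row per (hub_key, load_date)

def hashdiff(attrs: dict) -> str:
    """Hash of the descriptive attributes, for delta detection."""
    payload = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.md5(payload.encode()).hexdigest()

def load_bv_satellite(staged_rows, load_date):
    """Standard satellite pattern: insert only changed rows."""
    latest = {}  # current hashdiff per hub key
    for row in sorted(bv_sat, key=lambda r: r["load_date"]):
        latest[row["hub_key"]] = row["hashdiff"]
    for row in staged_rows:
        hd = hashdiff(row["attrs"])
        if latest.get(row["hub_key"]) != hd:
            bv_sat.append({
                "hub_key": row["hub_key"],
                "applied_date": row["applied_date"],  # ties BV to upstream RV
                "load_date": load_date,               # version date
                "hashdiff": hd,
                **row["attrs"],
            })

# Staged output of a soft rule run against RV (hypothetical attribute).
staged = [{"hub_key": "hk1", "applied_date": date(2024, 3, 1),
           "attrs": {"risk_score": 0.7}}]
load_bv_satellite(staged, load_date=date(2024, 3, 2))
load_bv_satellite(staged, load_date=date(2024, 3, 3))  # unchanged: no insert
```

Note the two dates on each row: the applied date says which upstream state the rule ran against, the load date says when this version landed, which is the bi-temporal property described above.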