Hi all,
A lot of data vault literature talks about taking the time to properly identify business keys.
But not too many examples are given, apart from VIN which is mentioned over and over again.
I get that government identifiers are useful and often represent a good business key, however most business concepts / hubs will not have a nice government identifier available and consistently populated.
Dan states the importance of business keys in his source system vault article: Source System #datavault and Business Keys - DataVaultAlliance . He says donât use surrogate keys as they are generated by the system. He then of course mentions VIN as an example. But then in a real life scenario he mentions he used contract number and tail number - but these are one off events, contracts arenât like customers or staff members, so arenât that helpful as an example. And how practical is it for an organisation to accurately capture something like contract_number in every source system that refers to it? Wouldnât it often be manually entered and hence not 100% reliable? (He also mentions that he had 125 source systems - does he mean systems or objects / tables - resulting in 75 data vault objects. I donât understand how DV can result in LESS objects than the source tables, unless there is significant preprocessing to denormalise the data)
Another example of a business key is often a customer name - but isnât that dependent on how it is entered into a system? Isnât it the case that some systems split out names into firstname and lastname and others donât? Wouldnât human entry of names into systems sometimes be different (e.g. John vs. Jon vs. Johnny), even if it is the same customer?
If the answer is âget the source to populate it properlyâ, what if the source is contact centre staff? How would you ensure they populate it properly and type in John instead of Jon when it is all manual entry?
And in terms of surrogates being lost during a system migration, wouldnât a system migration / replacement store the legacy systemâs ID as a field in the new system, allowing a same as link to be applied?
Most info on keys say to use a surrogate key if nothing else exists / as a last resort. I would be interested to know how often this last resort is used in practice?
And finally, would be great if people could give examples of good business keys for:
- Sale
- Address
- Asset
- Store
- Transaction
- Template
- Model
- Forecast
(assuming these are all âfirst classâ concepts and managed by an organisation in their own right). Or will the answer inevitably be - âdepends on the organisationâ? If this is the case, how does a business key stand the test of time? Yes a modeller might have done multiple iterations with the âbusinessâ (btw who / what is âthe businessâ? frontline sales? customer support? marketing analytics?), but 3 years down the track would the latest modeller and new staff members of the business agree with the original decision? (note: answer could be a massive yes, generally they do, just want to understand the thought process behind business key identification)
Thanks for your help