Ingesting JSON into the Raw Vault

Hi

Are there any patterns for ingesting JSON files into the raw vault? We have an upcoming use case where we will need to ingest medical questionnaires. We have developed our own framework for automating ingestion into DV, but so far all of our data sources have been structured data.

Thanks

I would say: identify the critical hub business keys as the business needs them, along with the core attributes of the hubs and links. Then:

  • Ingest the JSON as-is into a VARIANT (or similar) column. Given the potential complexity, it is better to flatten only the few core JSON fields at first; then, as business consumers start asking for more fields, it is easy to add BV views that flatten those JSON fields out of the raw SAT.
  • That process continues over time (new views for newly requested fields). Flattening JSON is fast in Snowflake and, more importantly, views are easy to update. Eventually everyone gains a good grasp of what a great materialization of the JSON data into hubs, SATs and links would look like.

So: ingest the JSON into a single VARIANT column → expand it with views → build views in the BV or Presentation (data products) layer → wait until consumers stabilize on their needs and have interacted with the views → then materialize the questionnaires into their own hubs, SATs, links, etc.
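To make the "flatten only a few core fields via a view" step concrete, here is a minimal Python sketch of what such a BV view does over the raw SAT. The field names (`patient_id`, `form`) and the row shape are hypothetical, purely for illustration; in Snowflake this would be a SQL view over the VARIANT column.

```python
import json

# Hypothetical raw SAT rows: the whole questionnaire JSON sits in a single
# "payload" field (the VARIANT column), alongside the usual DV columns.
raw_sat = [
    {"hash_key": "a1", "load_dts": "2024-01-01", "payload": json.dumps(
        {"patient_id": "P001", "form": "intake",
         "answers": {"q1": "yes", "q2": "no"}})},
]

def bv_view(rows):
    """Flatten only the few core fields consumers have asked for so far,
    the way a Business Vault view over the raw SAT would."""
    out = []
    for row in rows:
        doc = json.loads(row["payload"])
        out.append({
            "hash_key": row["hash_key"],
            "load_dts": row["load_dts"],
            "patient_id": doc.get("patient_id"),  # core field
            "form": doc.get("form"),              # core field
            # everything else stays in the raw payload until requested
        })
    return out
```

When a consumer asks for a new field, you extend the projection in the view; the raw payload underneath never changes, which is exactly why this iterate-then-materialize pattern is cheap.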

For example, it is very common and good practice to keep metadata about all DV actions in one single VARIANT column (source system, source table, ingestion process, app_id, bu_id, governance columns, etc.).
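A small sketch of what that single metadata column might carry, assuming illustrative key names and values (none of these are prescribed by the standard):

```python
import json

# Hypothetical DV process metadata kept together in one VARIANT-style column;
# the keys mirror the examples above (source system, ingestion process, ...).
dv_metadata = {
    "source_system": "questionnaire_app",
    "source_table": "responses",
    "ingestion_process": "load_raw_sat_v2",
    "app_id": "APP-42",
    "bu_id": "BU-7",
    "governance": {"pii": True, "retention_days": 3650},
}

# Serialized once, it rides along with the payload in its own column,
# so new metadata attributes never require a schema change.
metadata_col = json.dumps(dv_metadata)
```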


Hi Martin

Semi-structured data is always going to be the bane of any architecture, really. I think DV2.1 does more thinking in this space, but I’ll give my take from the more limited DV2.0 perspective.

As Emmanuel mentioned, all modern data platforms have options for storing semi-structured data directly in a column, so you can treat it as a payload column. If there are business keys in there, you can pull them out in staging and handle them separately. This is my least preferred method, mostly because these platforms are still trying to figure out how to handle these data types efficiently. Storing semi-structured data in an architected warehouse of any stripe feels like an anti-pattern.

The alternative that I do like, but which isn’t always applicable, is more of an organisational approach. Establish with your source what the schema of the incoming JSON is, and enforce this as a data contract: ensure that any data not matching the schema is rejected and reported as a source error.

If it matches the schema, then normalise and flatten the data on ingestion and treat the structured data like structured data.
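The validate-then-flatten gate above can be sketched in a few lines of Python. The contract shape, field names, and error reporting here are assumptions for illustration; a real implementation would likely use a schema language such as JSON Schema.

```python
import json

# Hypothetical data contract: required fields and their expected types.
CONTRACT = {"patient_id": str, "form": str, "answers": dict}

def validate(record: dict):
    """Return a list of contract violations (empty means the record passes)."""
    errors = []
    for field, expected_type in CONTRACT.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}")
    return errors

def ingest(raw_lines):
    """Split incoming JSON lines into accepted rows and rejected source errors."""
    accepted, rejected = [], []
    for line in raw_lines:
        record = json.loads(line)
        errors = validate(record)
        if errors:
            rejected.append({"record": record, "errors": errors})  # report back to source
        else:
            accepted.append(record)  # safe to flatten and load as structured data
    return accepted, rejected
```

The point of the gate is that everything downstream of `ingest` can be modelled as ordinary structured data, because anything off-contract has already been pushed back to the source as an error.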

If you can’t establish the shape of the data you’re being provided, then trying to model it in any way that favours optimization or efficiency is basically a non-starter.

Frankie
