Flattening JSON in Snowflake: Complete Step-by-Step Guide

Flattening JSON enables converting nested and complex hierarchical data into a simple relational structure of rows and columns. This allows easier analysis, querying, and extraction of deeply embedded values using SQL. This guide walks through all steps required to flatten a JSON file in Snowflake, from initial setup to generating the final flattened table.


The JSON File Used for Flattening

The JSON file contains multiple records, each representing a person with attributes such as Name, Gender, Date of Birth, Address, Skills, Work, and Mobile. The structure includes nested objects such as Address and arrays such as Skills.

{
 "Name": "Anil Kumar",
 "Gender": "Male",
 "DOB": "1996-10-01",
 "Skills": [
   "Pyspark",
   "Databricks",
   "Snowflake"
 ],
 "Address": {
   "House Number": "123",
   "City": "Hyderabad",
   "State": "Telangana"
 },
 "Work": 96757,
 "Mobile": 12367498
},
{
 "Name": "Priya Sharma",
 "Gender": "Female",
 "DOB": "1992-07-01",
 "Skills": [
   "AIDP",
   "Pega"
 ],
 "Address": {
   "House Number": "45-B",
   "City": "Bangalore",
   "State": "Karnataka"
 },
 "Work": 9154578,
 "Mobile": 25167905
}

[Image Description: Visual representation of the JSON file showing two person records with nested Address objects and Skills arrays.]


Why to Flatten JSON File?

Flattening JSON provides several advantages:

  1. It makes nested and complex data easier to read and process, especially in tools like SQL or Excel.
  2. It helps convert hierarchical (tree-like) JSON into a simple table format.
  3. It becomes easier to analyze, filter, and query values stored deep inside the JSON.

Prerequisites:

  1. Connect Amazon S3 with Snowflake. [https://techtown.in/snowflake/connect-aws-s3-to-snowflake/]
  2. Load the JSON file into an Amazon S3 bucket. [https://techtown.in/snowflake/load-data-from-aws-s3-to-snowflake/]

Step-by-Step Explanation

Below are all steps required to flatten JSON data in Snowflake, including schema creation, staging, loading, and flattening.


Step-01: Creating Required Schemas

CREATE OR REPLACE SCHEMA BOOTCAMP.external_stages;
CREATE OR REPLACE SCHEMA BOOTCAMP.STAGE_TBLS;
CREATE OR REPLACE SCHEMA BOOTCAMP.INTG_TBLS;

Explanation

  • external_stages – used for external staging areas (e.g., S3).
  • STAGE_TBLS – used for raw staging tables.
  • INTG_TBLS – used for integrated or final processed tables.

creation of three schemas designed for external stages, raw tables, and final integrated tables.


Step-02: Creating JSON File Format

CREATE OR REPLACE FILE FORMAT BOOTCAMP.file_formats.FILE_FORMAT_JSON
TYPE = JSON
STRIP_OUTER_ARRAY = TRUE;

Explanation

  • When JSON files contain arrays, STRIP_OUTER_ARRAY = TRUE ensures each element becomes an individual row.
  • This is essential when flattening arrays such as Skills.

panel describing file format details and purpose of STRIP_OUTER_ARRAY


Step-03: Creating the Stage Object

CREATE OR REPLACE STAGE BOOTCAMP.external_stages.STAGE_JSON
STORAGE_INTEGRATION = AWS_INT
URL = 's3://awstosnowflake1995/landing/';

Explanation

Purpose:
Creates or replaces a stage in Snowflake called STAGE_JSON.

Location:
Stored under the schema BOOTCAMP.external_stages.

STORAGE_INTEGRATION:
Uses a predefined integration AWS_INT that securely connects Snowflake to Amazon S3.

URL = ‘s3://awstosnowflake1995/landing/’;
Points to the S3 folder where the JSON files exist.

In short, this builds a secure connection that lets Snowflake load or unload data from S3.

Object creation description showing purpose, location, integration details, and S3 path.


Listing Files in the Stage

LIST @BOOTCAMP.external_stages.STAGE_JSON;

This command displays all files available in the connected S3 bucket.
The JSON file contacts_data.json appears in the listing.

File list showing JSON files stored in the landing folder of the S3 bucket.


Step-04: Creating Stage Table to Store Raw Data

CREATE OR REPLACE TABLE BOOTCAMP.STAGE_TBLS.contact_data_JSON_RAW
(raw_file variant);

Explanation

  • Creates a table named contact_data_JSON_RAW.
  • Contains a single column raw_file of type VARIANT.
  • VARIANT stores semi-structured data (JSON, Avro, XML).

This holds raw JSON exactly as loaded before parsing.

raw table creation and VARIANT column usage


Step-05: Copy Raw Data into the Stage Table

COPY INTO BOOTCAMP.STAGE_TBLS.contact_data_JSON_RAW
FROM @BOOTCAMP.external_stages.STAGE_JSON
file_format= BOOTCAMP.file_formats.FILE_FORMAT_JSON
FILES=('contacts_data.json');

Explanation

  • Loads the file contacts_data.json from S3 into the raw table.
  • Specifies the external stage as the source.
  • Uses JSON file format settings.
  • Imports only the specified file.

This loads raw JSON rows directly into Snowflake.

Panel explaining purpose, source, file format usage, and behavior of the COPY INTO process


View Raw Table Data

SELECT * FROM BOOTCAMP.STAGE_TBLS.contact_data_JSON_RAW;

Table displaying JSON rows stored inside the raw_file column


Step-06: Flatten JSON into the Final Table

CREATE OR REPLACE TABLE BOOTCAMP.INTG_TBLS.contact_data AS
SELECT
 raw_file:Name::string as Name,
 raw_file:Gender::string as Gender,
 raw_file:DOB::date as DOB,
 f1.value::string as Skill,
 raw_file:Address."House Number"::string as House_No,
 raw_file:Address.City::string as City,
 raw_file:Address.State::string as State,
 raw_file:Phone.Work::number as Work_Phone,
 raw_file:Phone.Mobile::number as Mobile_Phone
FROM BOOTCAMP.STAGE_TBLS.contact_data_JSON_RAW,
     table(flatten(raw_file:Skills)) f1;

Explanation

What it does:

  • Creates a new table contact_data inside INTG_TBLS.
  • Reads raw JSON and extracts fields such as Name, Gender, DOB, Address details, and Phone numbers.
  • Uses flatten(raw_file:Skills) to extract each skill as its own row.

How it works:

  • flatten expands each skill value for every user.
  • value::string retrieves the text representation of each skill.
  • All selected fields are assigned readable column names.

Section illustrating field extraction, flattening logic, and final table creation.


Step-07: Insert New Flattened Data into the Existing Table

INSERT INTO BOOTCAMP.INTG_TBLS.contact_data
SELECT
 raw_file:Phone.Mobile::number as Mobile_Phone
FROM BOOTCAMP.STAGE_TBLS.contact_data_JSON_RAW,
     table(flatten(raw_file:Skills)) f1;

Explanation

What it does:

  • Inserts parsed and flattened records into the existing contact_data table.

How it works:

  • Same logic as table creation, but used when appending new data.
  • Ensures each skill becomes a separate row.

Note: Only use this for incremental loads and ensure no duplicate source files are inserted.

Insert New Flattened Data into the Existing Table


Viewing Final Flattened Data

SELECT * from BOOTCAMP.INTG_TBLS.contact_data;

Table showing fully flattened data with Name, Gender, DOB, Skill, Address fields, and Phone numbers


Re-start from Beginning (Cleanup Commands)

DROP SCHEMA INTG_TBLS;
DROP SCHEMA STAGE_TBLS;
DROP STAGE BOOTCAMP.STAGE_TBLS.contact_data_JSON_RAW;

These commands remove all objects created during the flattening process.


Summary of Steps: Flattening JSON in Snowflake

  1. Create required schemas
    Logical organization for your objects.
  2. Create a JSON file format
    Use STRIP_OUTER_ARRAY = TRUE so each item in an array becomes a row.
  3. Create a stage object
    Link your S3 bucket to Snowflake.
  4. Create a raw stage table
    A table with a VARIANT column to store raw JSON data.
  5. Copy JSON data into the raw stage table
    Load your JSON from S3 into Snowflake using the correct file format.
  6. Flatten and parse JSON into the final table
    Use flatten(raw_file:Skills) so each skill appears as a separate row per person.
  7. INSERT INTO (Incremental loads)
    Append new data using INSERT statements while avoiding duplicate inserts.