BigQuery Data Types: STRUCT vs JSON Performance Insights

As data architectures evolve to handle increasingly complex and semi-structured data, choosing the right data type in BigQuery becomes critical. Two popular options for nested or hierarchical data are the STRUCT and JSON data types. While both serve similar purposes, they differ significantly in performance, flexibility, and use cases. 

In this post, we’ll explore the performance implications of using BigQuery’s JSON data type compared to STRUCT. We will provide a benchmark-style example to help guide your architectural decisions. 

🧱 STRUCT vs. JSON: The Basics 

  • STRUCT is a native BigQuery data type that allows you to define a fixed schema with nested fields. 
  • JSON is a flexible, schema-less data type that stores data as raw strings in JSON format. 

⚙️ Performance Implications of JSON 

1. Query Execution Time 

JSON fields require parsing at query time using functions like JSON_EXTRACT or JSON_VALUE. This adds overhead compared to STRUCT, where fields are accessed directly. 

2. Column Pruning Limitations 

BigQuery cannot prune individual fields inside a JSON column. This means even if you query one field, the entire JSON blob is read and parsed. 

3. Storage and Compression 

JSON is stored as UTF-8 strings, which are typically larger and less compressible than STRUCTs. This can increase both storage and query costs. 

4. Partitioning and Clustering 

You cannot partition or cluster on fields inside a JSON column, limiting your ability to optimize performance for large datasets. 

📊 Benchmark Example 

Here’s a simulated benchmark comparing query performance between JSON and STRUCT for a dataset with 10,000 rows: 

Query Type Data Type Query Example Avg Execution Time 
Simple filter STRUCT WHERE attributes.score > 50 1.2 seconds 
Simple filter JSON WHERE CAST(JSON_EXTRACT(data, ‘$.attributes.score’) AS FLOAT64) > 50 2.8 seconds 
Field projection STRUCT Select nested fields directly 1.0 seconds 
Field projection JSON Use JSON_EXTRACT for each field 2.5 seconds 

✅ When to Use JSON 

Despite the performance trade-offs, JSON is a strong choice when: 

  • The schema is dynamic or evolving
  • You need to ingest raw data quickly without transformation. 
  • You’re dealing with logs, events, or API payloads
  • You prioritize flexibility over performance

🧠 Best Practices 

  • Extract frequently used fields into native columns during ETL/ELT. 
  • Use materialized views to cache parsed JSON fields. 
  • Avoid deeply nested or large JSON blobs unless necessary. 
  • Consider a hybrid approach: use STRUCT for stable fields and JSON for dynamic ones. 

📝 Summary 

Criteria STRUCT JSON 
Schema Flexibility Low High 
Query Performance High Moderate to Low 
Storage Efficiency High Lower (due to string storage) 
Partitioning/Clustering Supported Not supported 
Use Case Fit Stable, known schema Dynamic, semi-structured data 

Choosing between JSON and STRUCT in BigQuery is a trade-off between performance and flexibility. For high-performance analytics on well-defined data, STRUCT is ideal. For ingesting and exploring dynamic or semi-structured data, JSON offers unmatched adaptability. 

Leave a comment