Imagine you're prepping for a study session, but your backpack is filled with a jumbled mess of books—math, science, history, and art. Every time you need one, you have to dig through it all. It's tedious and slow. Now, picture this: you organize the backpack by subject—math in one section, science in another, and so on. The next time you need a book, you can grab it right away—much quicker, right?
This simple idea applies to data systems, too. Just like sorting your backpack, V-Order organizes data efficiently, reducing the amount of data scanned during queries and speeding up read times.
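The backpack intuition maps directly onto how columnar engines skip data: Parquet readers keep per-row-group min/max statistics and can skip any row group whose range cannot contain the queried value. Sorted data makes those ranges narrow and non-overlapping, so far fewer row groups need to be read. Here is a toy Python simulation of that effect (the chunk size and stats layout are illustrative, not V-Order's actual on-disk format):

```python
import random

def build_chunks(values, chunk_size):
    """Split values into fixed-size chunks and record min/max per chunk,
    mimicking Parquet row groups and their column statistics."""
    chunks = []
    for i in range(0, len(values), chunk_size):
        chunk = values[i:i + chunk_size]
        chunks.append({"data": chunk, "min": min(chunk), "max": max(chunk)})
    return chunks

def chunks_scanned(chunks, target):
    """Count chunks whose [min, max] range could contain the target value."""
    return sum(1 for c in chunks if c["min"] <= target <= c["max"])

random.seed(0)
values = [random.randint(0, 9999) for _ in range(10_000)]

unsorted_chunks = build_chunks(values, 1000)
sorted_chunks = build_chunks(sorted(values), 1000)

# With unsorted data, nearly every chunk's range spans the target, so almost
# all chunks must be read; after sorting, the stats prune nearly all of them.
print("unsorted chunks scanned:", chunks_scanned(unsorted_chunks, 4242))
print("sorted chunks scanned:  ", chunks_scanned(sorted_chunks, 4242))
```

This is the same mechanism behind the "reduced data scanned" numbers reported later in this post, just on a toy scale.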
So, what exactly is V-Order? 🔍
V-Order is an open-source, Parquet-compliant write-time optimization technique designed to improve read performance and reduce storage needs in data-heavy environments.
How V-Order Optimizes Parquet Files 🛠️
Verti-Scan Technology: Enhances read performance in Verti-Scan-enabled engines such as Power BI and SQL.
Reduced Read Times: Cuts read times by up to 50% in Spark environments.
Efficient Storage: Uses sorting, row group distribution, and compression for better storage.
Resource Efficiency: Lowers network, disk, and CPU usage, reducing costs.
Full Compatibility: Works with Z-Order and time travel for optimized table partitions.
In today’s data-driven world, V-Order streamlines data operations, making them faster, more cost-efficient, and reliable.
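The storage benefit above comes partly from a simple property: sorted data groups identical values together, so the same compression codec shrinks it much further. A toy illustration with a generic codec (zlib here; V-Order's actual encoding is different, this only shows the sorting effect):

```python
import random
import zlib

random.seed(1)
# A low-cardinality column (e.g. country codes) with many repeated values.
column = [random.choice(["US", "IN", "DE", "FR", "GB"]) for _ in range(100_000)]

unsorted_bytes = ",".join(column).encode()
sorted_bytes = ",".join(sorted(column)).encode()

# Sorting creates long runs of identical values, so the identical codec
# compresses the sorted layout far more effectively.
print("unsorted compressed size:", len(zlib.compress(unsorted_bytes)))
print("sorted compressed size:  ", len(zlib.compress(sorted_bytes)))
```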
Before adopting any new optimization technique, it’s crucial to validate its effectiveness in real-world scenarios. In this section, we put V-Order to the test. Through a series of carefully designed experiments, we aim to assess how different factors—such as dataset size, structure, and Fabric configuration—affect its performance. The ultimate goal is clear: to determine where V-Order delivers a tangible benefit and where its impact might be less noticeable.
We ran a set of experiments using a sample employee dataset to showcase the performance and storage benefits of V-Order optimization. Here’s a breakdown of the data and setup:
Number of Records: ~900 million
Dataset Size: 39.69 GB
Schema (Datatype Distribution):
Total Columns: 80
Columns used for querying: personnelnumber (string), fulltimeweeklyworkhours (double), workcountrydesc (string), lastmodifieddatetime (timestamp)
Structure: The dataset includes duplicate records per personnelnumber, which helps us better observe how V-Order affects both query speed and storage efficiency.
Looking Ahead: Future posts will dive deeper into how V-Order performs on de-duplicated datasets, offering a direct comparison between the two scenarios.
Region: West Central US
Workspace Capacity (SKU): FT1
Semantic Model Storage Format: Small semantic model storage format
To ensure V-Order is enabled, we ran a few checks:
Spark Session:
%%spark
spark.conf.get('spark.sql.parquet.vorder.default')
Spark SQL:
%%sql
SHOW TBLPROPERTIES <table-name>
There are several ways to enable V-Order optimization, depending on how you are interacting with the data in your Fabric notebook. Below are three common methods:
Spark Session:
%%spark
spark.conf.set('spark.sql.parquet.vorder.default', "true")
Spark SQL:
%%sql
CREATE TABLE person (id INT, name STRING, age INT)
USING parquet
TBLPROPERTIES("delta.parquet.vorder.default" = "true");
Fabric Lakehouse:
Navigate to your Lakehouse
Right-click → Select Maintenance
Run Optimize command to maximize reading speeds using V-Order → Run Now
Once the table property is set to true, all INSERT, UPDATE, and MERGE commands will follow V-Order optimization.
If V-Order is enabled in the Spark Session, all write commands will follow optimization, even if the table property is set to false.
To optimize existing tables using Spark SQL, you can use the OPTIMIZE command:
%%sql
OPTIMIZE <table|fileOrFolderPath> VORDER;
OPTIMIZE <table|fileOrFolderPath> WHERE <predicate> VORDER;
We also tested the effectiveness of V-Order across a range of Spark pool configurations to ensure it works well under different setups:
Starter Pool: Size: Medium; Node Family: Memory optimized; Nodes: Autoscale (1-10)
Small Pool: Size: Small; Node Family: Memory optimized; Nodes: 3
Medium Pool: Size: Medium; Node Family: Memory optimized; Nodes: 3
Large Pool: Size: Large; Node Family: Memory optimized; Nodes: 3
To evaluate V-Order's impact on write performance, storage optimization, and read efficiency, we followed these steps:
Dataset Ingestion: We first ingested the dataset into the table without V-Order, then repeated the process with V-Order enabled.
Query Execution: After preparing both tables, we executed various queries—basic aggregations, window functions, predicate filters, and group-by operations—to assess read efficiency.
Testing Across Spark Configurations: All tests were conducted across four different Spark pool configurations, with results visualized below.
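The per-run measurement in the steps above amounts to timing each query against the V-Order and non-V-Order tables. A minimal sketch of such a harness (`timed_run` is illustrative; in the actual experiments the callable would be a `spark.sql(...)` invocation, stubbed here with a plain aggregation):

```python
import time

def timed_run(run_query, label):
    """Execute a query callable and report wall-clock time, as measured
    once per query and per Spark pool configuration."""
    start = time.perf_counter()
    result = run_query()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed

# Stand-in workload; a real run would time a Spark query instead.
result, seconds = timed_run(lambda: sum(range(1_000_000)), "basic aggregation")
```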
Below is a summarized view of how V-Order influenced write, read, and storage metrics across the various Spark pools.
Write Performance
Write Time (seconds) across Spark pools
Storage Optimization
Storage Size (GB) across Spark pools
Read Efficiency
Query Types: Dataset Operations Overview 🔍
Here’s a brief overview of the query types used to evaluate the performance of V-Order on the dataset:
Basic Aggregations: Operations like Count, Average, and Sum on the fulltimeweeklyworkhours column to test aggregation efficiency.
Window Functions with Predicate: Lag and Lead functions partitioned by personnelnumber and lastmodifieddatetime, filtered by a predicate on personnelnumber and workcountrydesc.
Window Functions without Predicate: Sum and Percentage calculations over a window partitioned by workcountrydesc.
Group By Aggregations: Sum of fulltimeweeklyworkhours grouped by personnelnumber and workcountrydesc.
Average Analytical Query: A complex query using CTEs, window functions, aggregations, and joins to test overall performance.
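To make the query shapes concrete, here is a plain-Python sketch of the group-by aggregation (item 4) on a tiny hypothetical sample; in the experiments this ran as Spark SQL against the full ~900-million-row table:

```python
from collections import defaultdict

# Hypothetical sample rows mirroring the columns used for querying.
rows = [
    {"personnelnumber": "E001", "workcountrydesc": "Germany", "fulltimeweeklyworkhours": 40.0},
    {"personnelnumber": "E001", "workcountrydesc": "Germany", "fulltimeweeklyworkhours": 40.0},
    {"personnelnumber": "E002", "workcountrydesc": "France", "fulltimeweeklyworkhours": 35.0},
]

# Equivalent of: SELECT SUM(fulltimeweeklyworkhours)
#                GROUP BY personnelnumber, workcountrydesc
totals = defaultdict(float)
for row in rows:
    key = (row["personnelnumber"], row["workcountrydesc"])
    totals[key] += row["fulltimeweeklyworkhours"]

print(dict(totals))
```

Note how the duplicate records per personnelnumber (a deliberate property of the dataset) accumulate under the same group key.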
Read Time (seconds) and Data Scanned (MB)
Read Time (Seconds) for Average Analytical Query
Data Scanned (MB) for Various Query Types
Automatic Write Optimization: V-Order triggers a Delta optimization job during Spark writes, indexing and compressing data, significantly reducing storage size by leveraging sorting and compression.
Reduced Data Scanned: Queries on V-Order-enabled tables scan less data, improving query speed and efficiency.
Targeted Query Efficiency: V-Order boosts performance (10-40%) when only specific columns are queried but offers limited benefits for full-table scans.
Faster Execution: Enabling V-Order reduces execution times by ~30%, enhancing processing speed across various data operations.
Optimized Data Read: Compression and sorting reduce data read volume by up to 50%, speeding up queries and lowering resource use.
Fewer Spark Tasks: V-Order reduces the number of Spark executor tasks by about 50%, improving resource allocation and parallelism.
Based on our testing results, here are some recommendations on when to enable V-Order, particularly within a traditional medallion architecture. The decision should depend on your workload characteristics and data access patterns at each layer.
Consider enabling V-Order if your workload involves:
Frequent ad-hoc queries
Analytics
Scenarios where users access specific subsets of columns
| Zone | Delta | Purpose | Enable V-Order | Rationale |
|------|-------|---------|----------------|-----------|
| Bronze | No | Raw data from source system with full change history | No | Bronze tables are primarily written once and rarely read. Since read optimization is minimal, V-Order offers little value here. |
| Silver | Yes | Clean data – apply business logic | Maybe | Silver tables may experience a balanced mix of reads and writes. Enable V-Order if the table is read frequently or used in downstream aggregations; disable it for write-heavy operations. |
| Gold | Yes | Dimension and fact models for reporting and querying | Yes | Gold tables are heavily read for analytics and reporting. V-Order significantly improves query performance in this zone. |
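The per-zone guidance above can be captured as a small decision helper (a sketch only; `enable_vorder` and its `read_heavy` flag are illustrative, not a Fabric API):

```python
def enable_vorder(zone, read_heavy=False):
    """Return whether to enable V-Order for a medallion-architecture zone.
    `read_heavy` captures the Silver-layer judgment call: frequent reads
    or use in downstream aggregations."""
    zone = zone.lower()
    if zone == "bronze":
        return False       # written once, rarely read: little benefit
    if zone == "silver":
        return read_heavy  # enable only when reads dominate writes
    if zone == "gold":
        return True        # heavily read for analytics and reporting
    raise ValueError(f"unknown zone: {zone}")

print(enable_vorder("gold"))                       # True
print(enable_vorder("silver", read_heavy=False))   # False
```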
The benefits of V-Order are most noticeable in read-heavy environments, where full-table scans are not common. In these cases, V-Order can improve query speed and reduce resource usage.
The experiment demonstrates that V-Order is a powerful optimization for improving query performance in Spark environments. On average, V-Order resulted in:
~30% reduction in execution time
~50% less data read
~50% fewer executor tasks
V-Order excels in read-heavy, analytical workloads, especially when queries focus on specific columns. While results may vary based on dataset characteristics, the overall performance boost is consistent across most analytical scenarios.
Its impact is most notable in the silver and gold layers of a medallion architecture, where data is cleaned, modelled, and frequently queried. While less effective for full-table scans or write-heavy operations, V-Order can drive significant performance improvements and optimize resource use in the right scenarios.
Further reading and resources on V-Order and related topics can be found in the appendix and provided links.
@kranthimeda @RajeshDadi @rbhallamudi @gachoure @Ayush_Tiwari