
Ayush_Tiwari

Unlock the Power of V-Order: Revolutionize Data Read Times and Storage Efficiency

V-Order: What It Is and How It Works 🚀

Imagine you're prepping for a study session, but your backpack is filled with a jumbled mess of books—math, science, history, and art. Every time you need one, you have to dig through it all. It's tedious and slow. Now, picture this: you organize the backpack by subject—math in one section, science in another, and so on. The next time you need a book, you can grab it right away—much quicker, right?

This simple idea applies to data systems, too. Just like sorting your backpack, V-Order organizes data efficiently, reducing the amount of data scanned during queries and speeding up read times.
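The backpack analogy maps directly onto how columnar readers skip data. Below is a minimal, self-contained sketch — plain Python, a toy model of the row-group min/max statistics Parquet readers use for data skipping, not V-Order itself — showing that sorting the data first lets far more row groups be skipped:

```python
import random

def make_row_groups(values, group_size):
    """Split values into 'row groups' and record min/max stats per group,
    the way Parquet file footers do."""
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    return [(min(g), max(g), g) for g in groups]

def count_groups_scanned(row_groups, target):
    """A reader can skip any group whose [min, max] range excludes the target."""
    return sum(1 for lo, hi, _ in row_groups if lo <= target <= hi)

random.seed(0)
data = [random.randint(0, 9999) for _ in range(100_000)]

# Unsorted: almost every group's min/max range straddles the target value.
unsorted_scanned = count_groups_scanned(make_row_groups(data, 1000), 42)
# Sorted: the target value is clustered into one or two groups.
sorted_scanned = count_groups_scanned(make_row_groups(sorted(data), 1000), 42)

print(unsorted_scanned, sorted_scanned)  # sorted layout scans far fewer groups
```

The same principle is why the "less data scanned" numbers later in this post show up for selective queries but not for full-table scans.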

So, what exactly is V-Order? 🔍

V-Order is an open-source, Parquet-compliant write-time optimization technique that improves read performance and reduces storage footprint in data-heavy environments.

How V-Order Optimizes Parquet Files 🛠

  • Verti-Scan Technology: Enhances performance for tools such as Power BI and SQL.

  • Reduced Read Times: Cuts read times by up to 50% in Spark environments.

  • Efficient Storage: Uses sorting, row group distribution, and compression for better storage.

  • Resource Efficiency: Lowers network, disk, and CPU usage, reducing costs.

  • Full Compatibility: Works with Z-Order and time travel for optimized table partitions.

In today’s data-driven world, V-Order streamlines data operations, making them faster, more cost-efficient, and reliable.
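A quick way to see why the sorting step shrinks storage: the same values compress much better once like values sit together. A toy illustration using Python's zlib on a low-cardinality column (V-Order's actual encodings and compression differ, but the effect is the same in kind):

```python
import random
import zlib

random.seed(1)
# A low-cardinality column (think country codes) stored in arbitrary vs sorted order.
codes = [random.choice(["US", "IN", "DE", "BR", "JP"]) for _ in range(50_000)]

unsorted_bytes = "".join(codes).encode()
sorted_bytes = "".join(sorted(codes)).encode()  # same values, long runs

unsorted_size = len(zlib.compress(unsorted_bytes))
sorted_size = len(zlib.compress(sorted_bytes))

print(unsorted_size, sorted_size)  # sorted layout compresses dramatically smaller
```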


Validate the Effect of V-Order on Datasets 📊

Before adopting any new optimization technique, it’s crucial to validate its effectiveness in real-world scenarios. In this section, we put V-Order to the test. Through a series of carefully designed experiments, we aim to assess how different factors—such as dataset size, structure, and Fabric configuration—affect its performance. The ultimate goal is clear: to determine where V-Order delivers a tangible benefit and where its impact might be less noticeable.

Test Benchmarks 🔬

We ran a set of experiments using a sample employee dataset to showcase the performance and storage benefits of V-Order optimization. Here’s a breakdown of the data and setup:

1. Dataset Size and Properties 📂

  • Number of Records: ~900 million

  • Dataset Size: 39.69 GB

  • Schema (Datatype Distribution):

    • Total Columns: 80

    • Columns used for querying: personnelnumber (string), fulltimeweeklyworkhours (double), workcountrydesc (string), lastmodifieddatetime (timestamp)
  • Structure: The dataset includes duplicate records per personnelnumber, which helps us better observe how V-Order affects both query speed and storage efficiency. 
    Looking Ahead: Future posts will dive deeper into how V-Order performs on de-duplicated datasets, offering a direct comparison between the two scenarios.

2. Fabric Region, Workspace Capacity, and Storage 🌎

  • Region: West Central US

  • Workspace Capacity (SKU): FT1

  • Semantic Model Storage Format: Small

3. Spark Configuration Settings ⚙️

To ensure V-Order is enabled, we ran a few checks:

Spark Session:

%%spark
spark.conf.get('spark.sql.parquet.vorder.default')

Spark SQL:

%%sql
SHOW TBLPROPERTIES <table-name>

Enabling V-Order Optimization 🚀

There are several ways to enable V-Order optimization, depending on how you are interacting with the data in your Fabric notebook. Below are three common methods:

Spark Session:

%%spark   
spark.conf.set('spark.sql.parquet.vorder.default', "true")

Spark SQL:

%%sql 
CREATE TABLE person (id INT, name STRING, age INT)
USING parquet TBLPROPERTIES("delta.parquet.vorder.default"="true");

Fabric Lakehouse:

  • Navigate to your Lakehouse

  • Right-click → Select Maintenance

  • Run Optimize command to maximize reading speeds using V-Order → Run Now

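A fourth option is to opt in per write operation. A sketch, assuming the write-level Parquet option mirrors the session config name shown above — verify the exact option name against the Fabric documentation for your runtime:

```python
# Per-write opt-in: the option name mirrors the session config
# 'spark.sql.parquet.vorder.default' (check your Fabric runtime's docs).
df.write \
  .format("delta") \
  .mode("append") \
  .option("parquet.vorder.default", "true") \
  .saveAsTable("person")
```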

Key Notes 🔑

  • Once the table property is set to true, all INSERT, UPDATE, and MERGE commands will follow V-Order optimization.

  • If V-Order is enabled in the Spark Session, all write commands will follow optimization, even if the table property is set to false.

  • To optimize existing tables using Spark SQL, you can use the OPTIMIZE command:

%%sql
OPTIMIZE <table|fileOrFolderPath> VORDER;
OPTIMIZE <table|fileOrFolderPath> WHERE <predicate> VORDER;

4. Spark Cluster Configuration 🖥

We also tested the effectiveness of V-Order across a range of Spark pool configurations to ensure it works well under different setups:

  • Starter Pool: Size: Medium; Node Family: Memory optimized; Nodes: Autoscale (1-10)

  • Small Pool: Size: Small; Node Family: Memory optimized; Nodes: 3

  • Medium Pool: Size: Medium; Node Family: Memory optimized; Nodes: 3

  • Large Pool: Size: Large; Node Family: Memory optimized; Nodes: 3


Test Methodology: Assessing Write Performance, Storage Optimization, and Read Efficiency 🧪

To evaluate V-Order's impact on write performance, storage optimization, and read efficiency, we followed these steps:

  1. Dataset Ingestion: We first ingested the dataset into the table without V-Order, then repeated the process with V-Order enabled.

  2. Query Execution: After preparing both tables, we executed various queries—basic aggregations, window functions, predicate filters, and group-by operations—to assess read efficiency.

  3. Testing Across Spark Configurations: All tests were conducted across four different Spark pool configurations, with results visualized below.
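The comparison harness behind step 2 can be as simple as timing each query against both tables and keeping the best of several runs. A sketch of that kind of helper — the lambda workload below is a stand-in; in the notebook, run_query would wrap a spark.sql(...).collect() call against each table:

```python
import time

def time_query(run_query, repeats=3):
    """Run a query several times and report the best wall-clock time,
    reducing noise when comparing the V-Order and non-V-Order tables."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload for illustration only.
best = time_query(lambda: sum(range(1_000_000)))
print(f"best of {3}: {best:.4f}s")
```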


Results 📈

Below is a summarized view of how V-Order influenced write, read, and storage metrics across the various Spark pools.
Write Performance

 

[Figure: Write Time (seconds) across Spark pools]

 

Storage Optimization

[Figure: Storage Size (GB) across Spark pools]

 

Read Efficiency

Query Types: Dataset Operations Overview 🔍

Here’s a brief overview of the query types used to evaluate the performance of V-Order on the dataset:

  1. Basic Aggregations: Operations like Count, Average, and Sum on the fulltimeweeklyworkhours column to test aggregation efficiency.

  2. Window Functions with Predicate: Lag and Lead functions partitioned by personnelnumber and lastmodifieddatetime, filtered by a predicate on personnelnumber and workcountrydesc.

  3. Window Functions without Predicate: Sum and Percentage calculations over a window partitioned by workcountrydesc.

  4. Group By Aggregations: Sum of fulltimeweeklyworkhours grouped by personnelnumber and workcountrydesc.

  5. Average Analytical Query: A complex query using CTEs, window functions, aggregations, and joins to test overall performance.
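To make the query shapes concrete, here is a stand-alone sketch of types 1, 2, and 4 using the same column names on an in-memory SQLite table — a stand-in for the Spark SQL we actually ran, useful only for seeing the query structure (window functions require SQLite ≥ 3.25):

```python
import sqlite3

# Tiny stand-in table with the same columns the benchmark queries touch.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE employee (
    personnelnumber TEXT, fulltimeweeklyworkhours REAL,
    workcountrydesc TEXT, lastmodifieddatetime TEXT)""")
con.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("p1", 40.0, "US", "2024-01-01"),
     ("p1", 38.0, "US", "2024-02-01"),
     ("p2", 35.0, "IN", "2024-01-15")])

# 1. Basic aggregation on fulltimeweeklyworkhours.
avg_hours = con.execute(
    "SELECT AVG(fulltimeweeklyworkhours) FROM employee").fetchone()[0]

# 2. Window function with a predicate: previous record per employee.
lagged = con.execute("""
    SELECT personnelnumber,
           LAG(fulltimeweeklyworkhours) OVER (
               PARTITION BY personnelnumber ORDER BY lastmodifieddatetime)
    FROM employee WHERE workcountrydesc = 'US'""").fetchall()

# 4. Group-by aggregation.
totals = dict(con.execute("""
    SELECT personnelnumber, SUM(fulltimeweeklyworkhours)
    FROM employee GROUP BY personnelnumber""").fetchall())

print(avg_hours, lagged, totals)
```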

Read Time (seconds) and Data Scanned (MB)

 

[Figure: Read Time (seconds) for Average Analytical Query]

 

[Figure: Data Scanned (MB) for various query types]


Key Observations: Results Overview

  1. Automatic Write Optimization: V-Order triggers a Delta optimization job during Spark writes, indexing and compressing data, significantly reducing storage size by leveraging sorting and compression.

  2. Reduced Data Scanned: Queries on V-Order-enabled tables scan less data, improving query speed and efficiency.

  3. Targeted Query Efficiency: V-Order boosts performance (10-40%) when only specific columns are queried but offers limited benefits for full-table scans.

  4. Faster Execution: Enabling V-Order reduces execution times by ~30%, enhancing processing speed across various data operations.

  5. Optimized Data Read: Compression and sorting reduce data read volume by up to 50%, speeding up queries and lowering resource use.

  6. Fewer Spark Tasks: V-Order reduces the number of Spark executor tasks by about 50%, improving resource allocation and parallelism.


When to Use V-Order? 🤔

Based on our testing results, here are some recommendations on when to enable V-Order, particularly within a traditional medallion architecture. The decision should depend on your workload characteristics and data access patterns at each layer.

Consider enabling V-Order if your workload involves:

  • Frequent ad-hoc queries

  • Analytics

  • Scenarios where users access specific subsets of columns

| Zone | Delta | Purpose | Enable V-Order | Rationale |
|------|-------|---------|----------------|-----------|
| Bronze | No | Raw data from source system with full change history | No | Bronze tables are primarily written once and rarely read. Since read optimization is minimal, V-Order offers little value here. |
| Silver | Yes | Clean data – apply business logic | Maybe | Silver tables may experience a balanced mix of reads and writes. Enable V-Order if the table is read frequently or used in downstream aggregations; disable it for write-heavy operations. |
| Gold | Yes | Dimension and fact models for reporting and querying | Yes | Gold tables are heavily read for analytics and reporting. V-Order significantly improves query performance in this zone. |


The benefits of V-Order are most noticeable in read-heavy environments, where full-table scans are not common. In these cases, V-Order can improve query speed and reduce resource usage.


Conclusion and Other Considerations

The experiment demonstrates that V-Order is a powerful optimization for improving query performance in Spark environments. On average, V-Order resulted in:

  • ~30% reduction in execution time

  • ~50% less data read

  • ~50% fewer executor tasks

V-Order excels in read-heavy, analytical workloads, especially when queries focus on specific columns. While results may vary based on dataset characteristics, the overall performance boost is consistent across most analytical scenarios.

Its impact is most notable in the silver and gold layers of a medallion architecture, where data is cleaned, modelled, and frequently queried. While less effective for full-table scans or write-heavy operations, V-Order can drive significant performance improvements and optimize resource use in the right scenarios.


Appendix or Other Links 📚

Further reading and resources on V-Order and related topics can be found in the appendix and provided links.

  1. Delta Lake table optimization and V-Order - Microsoft Fabric | Microsoft Learn
  2. Solved: V-Order & Z-Order - Microsoft Fabric Community
  3. Fabric Espresso – Episodes about Performance Optimization & Compute Management in Microsoft Fabric |...

Contributors

@kranthimeda @RajeshDadi @rbhallamudi @gachoure @Ayush_Tiwari 

Comments

Great article, team @Ayush_Tiwari, @kranthimeda, @rbhallamudi, @gachoure, @RajeshDadi!

With Continuous Improvement (CI) as a core focus, these configurations—carefully identified based on data use cases from both read-optimized and write-optimized perspectives, as well as a balanced approach—are instrumental in reducing costs, improving refresh SLA performance, and ensuring optimized, maintainable code.

While these changes may seem minor in isolation, they reflect the deep level of analysis and rigor the team applied through POCs to realize tangible gains and successfully productionize them as part of One Global HCM.

A big thank you to the team for publishing this article, enabling broader organizational adoption and contributing to elevated coding standards and efficiency across the board.