
Ilgar_Zarbali

Data Engineering with Microsoft Fabric: Efficiently Loading Data into a Lakehouse

Introduction to Lakehouses
The Lakehouse in Microsoft Fabric serves as a central repository for your files and tables. When we refer to "tables," we mean data stored in the Delta Parquet format (Parquet files managed by a Delta Lake transaction log). We begin by signing in to Fabric at fabric.microsoft.com and navigating to the Synapse Data Engineering workspace, which provides tools such as Lakehouses, notebooks, Spark job definitions, and data pipelines.

Microsoft Fabric

Setting Up Your Workspace
If you've used Power BI, the concept of workspaces will be familiar. In this guide, we're working in a private workspace where all artifacts—including the Lakehouse we create—will be stored. To enable the full functionality of Fabric, we've assigned a Fabric trial capacity to this workspace, which provides 60 days of access to premium features. To start your trial, click the account manager icon and follow the steps to activate it.

Trial Capacity

Creating a Lakehouse
To create a Lakehouse, navigate to the Data Engineering homepage and select Lakehouse. Assign a name to your Lakehouse and click Create. The Lakehouse will appear in your chosen workspace. This also automatically generates a SQL endpoint, allowing external tools like SQL Server Management Studio or Azure Data Explorer to query the data. You can find this endpoint in the Lakehouse settings.

Lakehouse

Creating Lakehouse

Organizing and Loading Data
1. Files vs. Tables:
- Files folder: Stores raw files like CSV, JSON, or Excel files. These files remain as-is and cannot be queried directly using SQL.
- Tables folder: Stores data in Delta Parquet format, making it accessible for querying and analysis.


Files-SubFolder
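As a minimal sketch of this distinction, here is how each section is typically addressed from a Fabric Spark notebook attached to the Lakehouse (the `Sales` folder and `items` table names follow this guide's example; substitute your own):

```python
# Raw files live under the Files section and are addressed by path;
# Delta tables live under the Tables section and are addressed by name.
raw_file = "Files/Sales/items.csv"  # raw CSV: readable by Spark, not by the SQL endpoint
delta_table = "items"               # Delta table: queryable by both SQL and Spark

# In a Fabric notebook the `spark` session is predefined, so either
# form of the data can be read directly:
# df_raw = spark.read.option("header", True).csv(raw_file)
# df_tbl = spark.read.table(delta_table)
print(raw_file, delta_table)
```

The Spark calls are shown commented because `spark` only exists inside a Fabric (or other Spark) notebook session; the path and table-name conventions are the point of the sketch.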

2. Uploading Files:
- Create a subfolder (e.g., Sales) in the Files folder.
- Upload files by selecting Upload, then choose a file or folder from your local machine. For example, we uploaded a file named `items.csv` containing 400,000 rows.

Upload

3. Loading Files into Tables:
- To query data with SQL, convert files from the Files folder into tables using the Load to Table option, ensuring column names in your file do not contain spaces (replace them with underscores if necessary).
- After successfully loading, the file becomes a Delta Parquet table, ready for SQL queries and other operations.

Load to Table
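Because Load to Table rejects column names containing spaces, it can help to sanitize headers before uploading. A small sketch (the column names below are hypothetical examples, not from `items.csv`):

```python
import re

def sanitize_columns(columns):
    """Replace whitespace in column names with underscores,
    as Load to Table requires names without spaces."""
    return [re.sub(r"\s+", "_", name.strip()) for name in columns]

print(sanitize_columns(["Item ID", "Order Date", "Unit Price"]))
# → ['Item_ID', 'Order_Date', 'Unit_Price']
```

You could apply this to a CSV's header row before saving and uploading the file, so the load succeeds on the first try.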

4. Using Shortcuts:
- Instead of copying data, you can create a shortcut to files stored in other locations like Azure Data Lake Storage or Amazon S3. This approach avoids duplicating data while making it accessible in your Lakehouse.
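One convenient consequence: a shortcut appears under Files (or Tables) like any local folder, so code that reads uploaded data reads shortcut data unchanged. The shortcut name `external_sales` and file below are hypothetical:

```python
# A shortcut to e.g. ADLS Gen2 or Amazon S3 surfaces as an ordinary folder
# in the Lakehouse, so the read path looks identical to local data:
shortcut_file = "Files/external_sales/orders.csv"
# df = spark.read.option("header", True).csv(shortcut_file)  # in a Fabric notebook
print(shortcut_file)
```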


Querying Data in Delta Parquet Format
Once your data is in the Tables folder, you can query it using SQL or other compute engines such as Power BI, Excel, or Spark notebooks. The Delta Parquet format ensures broad engine compatibility and provides features like a transaction log and version history.
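As a sketch of what such a query looks like from a Fabric notebook (the `items` table name is this guide's example), Spark SQL can both query the table and inspect Delta's version history:

```python
# Queries against a Delta table in the Tables section, by table name.
count_query = "SELECT COUNT(*) AS row_count FROM items"
history_query = "DESCRIBE HISTORY items"  # Delta's transaction log / version history

# In a Fabric notebook, where the `spark` session is predefined:
# spark.sql(count_query).show()
# spark.sql(history_query).show()
print(count_query)
```

The same `SELECT` would also work through the SQL endpoint from external tools, since both engines read the same Delta table.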


Next Steps
This guide has covered creating a Lakehouse, uploading files, and converting them into queryable tables. In part two, we’ll explore querying the data using SQL and other tools, along with advanced data engineering techniques.

By following this guide, you’ll gain a comprehensive understanding of managing and querying data in Microsoft Fabric Lakehouses.
