Autor: Miguel Peredo Zurcher
(Update 30.January.2022, Hybrid tables now work with Snowflake) (see here)
As an introduction, let's agree that we want dashboards with response times of less than one second. Nobody enjoys (or uses) dashboards that take two or more seconds to load.
How do you achieve fast-loading dashboards in BI?
BI tools have two access modes: Load Mode and Live or Direct Query Mode. In Load Mode, the data is copied (loaded, imported) from the source to the in-memory layer of the BI tool, which for specific volumes provides a sub-second response time. In Direct Query mode, the response time of the dashboard depends on the response time of the source where the tool is connected, which usually is not a sub-second time. There are other aspects to consider to Load Mode (import) or Direct Query mode (live); check your BI tool documentation.
Most cloud Dataware house-oriented products have an incredible response time, they can process vast amounts of data, and the response time will be just a couple of seconds, but not below one second.
Because of this, you must first load the data to the BI tool (load mode), which is usually an in-memory layer that will respond in less than one second for most cases. And then, query the data from this layer.
So far, so good; we can load our data to the in-memory layer of the BI tool, and our sub-second response time challenge is solved, but can we really load all our data to the in-memory layer?
The answer is a qualified yes. Loading the data to BI tool in-memory engine had some challenges:
- In-memory is expensive
- The complexity of doing incremental loads
- Stale dashboards accentuated by streaming and frequent updates into the source data
These challenges have had workarounds, such as Power BI aggregates, allowing you to load aggregated data into the in-memory capacity. But as you can see, there is still the need to load from the data warehouse to the in-memory layer.
I must mention that several vendors are working on putting an ultra-fast layer on top of their data warehouse solution, for example: What is BigQuery BI Engine? | Google Cloud, Using the Search Optimization Service — Snowflake Documentation, etc.
Making an extra load impacts the freshness of the data in your dashboard, meaning the data that has been written (maybe streamed) to the data warehouse is not visible on the dashboard until you do the next load.
Power BI has now a great feature (in public preview) that allows you to tackle the challenges mentioned before; here is the official announcement: Announcing Public Preview of Hybrid Tables in Power BI Premium | Microsoft Power BI Blog | Microsoft Power BI
What can you do with Power BI Hybrid Tables?
It allows you to decide which parts of a table (usually a fact table, like sales for the last ten years) to load in the in-memory layer and which parts you keep in the source but still accessible in the dashboards with different response times.
For example, your Sales table could look like this:
In part A, you have the data has been inserted into the source after the last load to B.
In part B, you have sales data from the previous two years, which resides in memory. You load this data in a scheduled way. (For example, every 8 hours).
In part C, you have historical data.
The objective is to make all the data available for the BI dashboards.
The access modes for different parts of the Sales table are:
For A - Direct Query Mode. Response time is acceptable when querying from the source since it's a small amount of data.
For B - Load Mode. The performance of querying from memory will be top of the line. Most dashboard queries will hit only the last two years of data (as agreed with the business).
For C - Direct Query Mode. For queries that span more than the last two years, the performance will depend on the source. But again, this is what the business requested.
All this is transparent to the report creators; for them is just another table, but behind the scenes, it is a union of in-memory and direct queries, automatically managed by Power BI.
If that layout does not fit your scenario, you have the freedom to choose what goes to the in-memory (load mode) and what stays in the source.
Do you want to try it out?
Remember that you need Power BI Premium (can be PPU or Capacity)
And I suggest following the instructions from the blog mentioned above: Announcing Public Preview of Hybrid Tables in Power BI Premium | Microsoft Power BI Blog | Microsoft Power BI
When experimenting, I first tried Snowflake, but it did not work. I suppose because the feature is still in preview. I posted the issue in the Power BI community:
Hybrid Tables, Snowflake and Query Folding - Microsoft Power BI Community
(Update 30.January.2022, Hybrid tables now work with Snowflake)
Then I tried Synapse, and it worked.
After publishing to the Power BI premium and doing one refresh that created the partitions, I used Tabular Model Editor to see the partitions and check for each partition if those are in Load Mode or Direct Query Mode. The Tabular Model Editor also allowed me to change the settings for the partitions, but as mentioned in the Blog from Power BI, you can also make the changes programmatically.
Some screenshots: