Microsoft's Power BI and DirectQuery: Making data analysis and visualisation easier

Mixing datasets to generate complex insights can help even the smallest organisation maximise business value from its data.

istock-679353026.jpg

Mixing datasets to generate complex insights can help organisations get the most business value from data.

Getty Images/iStockphoto

Microsoft's Power BI analysis tool might not get the publicity of the rest of the Power Platform, but as the company continues to invest in its business application and automation platform it continues to receive regular updates with new features. The latest batch makes it a lot easier to customise models and to work with more data sources, and even to use Power BI analytics in third-party analysis and visualisation tools. 

One of the more important features in Power BI is DirectQuery, which provides a mechanism for connecting to external data sources without importing the data into your own local analysis store. While data imports will always be more efficient than working with remote data sources, you may be in a position where you can't bring data into your Power BI instance. That might be for regulatory or data ownership reasons, or where data is changing rapidly and there's no scope for imports, or if you simply don't have the storage capacity to replicate data. 

powerbi-directquery-models.jpg

An Import model can integrate data from any number of supported data source types (top). In DirectQuery mode (above), models don't import data but consist of metadata defining the model structure. When the model is queried, native queries retrieve data from the underlying data source.

Images: Microsoft

Using DirectQuery 

DirectQuery connections behave differently from standard imports. You build queries much the same way as you would for any other connection, selecting sources and tables. Where things differ is that no data is imported and, when you create or update a visualisation, Power BI queries the remote data source as part of the report-building process. That can take some time, especially if you're working with very large data sets that might not support in-memory queries or other Power BI optimisations. 

Not every supported connector offers DirectQuery, although you'll find that most of those that do are other business analysis tools or are large-scale warehouses. This makes sense: you don't want to have spent all that time building a massive Teradata warehouse or setting up SQL Server Analysis Services to then replicate all that work inside Power BI. By building your models using familiar tools and then using Power BI as a visualisation and reporting layer, you're making best use of available resources and skill sets, with analysis happening at scale on servers, and visualisation and exploration on PCs. 

By using DirectQuery, you can start to build around those existing models, combining them with additional data to build custom analysis quickly. Maybe you have a sales data warehouse for a big store and want to tie that to data from your ERP system, without overloading either system with complex queries or filling reporting servers with terabytes of expensive data. Microsoft is promising more DirectQuery connectors in future releases, focusing on systems like these that are designed to work interactively. 

Building reports from live data with DirectQuery 

As models from existing analytics services are often built around their own measures, they're already using aggregated data. DirectQuery can work with these measures rather than the underlying data, so you're not duplicating effort, adding un-needed complexity to your queries. You can now work with big data analysis tools, like Apache Spark, letting specialised tooling manage unstructured data and taking advantage of their specialised features from inside Power BI without having to design your own complex queries. 

Once you've built a visualisation or a report that's based on a DirectQuery connection, the visualisation will initially be static, based on the last query made against the source. If you refresh the view, it will reload. If you're using them as tiles in a dashboard, you'll need to set up a refresh schedule -- usually hourly. When you open a report, you'll see the last refresh state, and if you want up-to-date data outside of a report update schedule, you'll have to manually refresh the data. 

SEE: Windows 10 Start menu hacks (TechRepublic Premium)

There's another good reason for using DirectQuery: if your data needs to be secure, then applying an access-rights management layer to the source data means that only approved data is delivered to Power BI. The DirectQuery connection passes credentials from Power BI to the source, allowing protected data to be released to authorised users. Compartmentalising queries in this way ensures that dashboards for CEOs and CFOs can be tested by development teams without compromising sensitive data. 

It's important to note that DirectQuery does limit some Power BI features, both in building models and running reports. They're not significant issues, but if you rely on calculated tables, for example, then you might prefer to use a traditional connection. You're prevented from using some of Power BI's machine-learning-based features, like Quick Insights and Q&A, as they rely on having local datasets for speed, or for building ML models. 

Building chained analytics with DirectQuery and Power BI 

powerbi-directquery-chaining.jpg

Extending beyond a chain length of three models is not supported and results in errors. 

Image: Microsoft

Where things get interesting is using DirectQuery with Power BI datasets and Azure Analysis Services. Recently released in preview, this lets you combine data from your own models with other data, from existing DirectQuery connections or from other data sources. Perhaps you have a rough dataset in Excel and want to see how it affects predictions or other analyses. You can quickly set up a live connection to your existing Power BI models and start to make changes. 

This approach gives you the option of chaining models and data sets, starting with, say, an Azure Analysis Services model that brings in multiple sources in an Azure Data Lake. With a DirectQuery connection in Power BI, you can mix that with other sources to build a new model. Perhaps you're using IoT data from a windfarm to show power output in different wind speeds and mixing that with weather predictions to produce a Power BI model that predicts the performance of the windfarm. Chaining that model into another that predicts pricing allows you to build a system that suggests the timing and quantity of output required to get the best price. 

SEE: 60 Excel tips every user should master

You can currently chain up to three models -- any more will give you an error. You also need to be aware that data from one model can flow into another, possibly allowing unintended information leakage -- especially if one of the interim datasets is a standard query. 

It's not hard to see how important these new features are. Microsoft is bringing analysis features that used to require significant amounts of development work into its desktop business intelligence tooling. Mixing datasets to generate complex insights can help even the smallest organisation get the most business value from its data, going from big data in Azure to targeted dashboards on executive desktops in a few queries. The resulting information will help those executives make better decisions, in a much more-timely fashion. In an uncertain world, that's something well worth building. 

Also see