Microsoft's new Azure data services explained

Featured Content

This article is courtesy of TechRepublic Premium. For more content like this, as well as a full library of ebooks and whitepapers, sign up for Premium today. Read more about it here.

Join Today

Azure SQL Data Warehouse, Data Lake and Elastic Database Pool give SQL developers the tools to create a scalable data warehouse, Hadoop-oriented exabyte-scale storage, and an elastic resource pool for multiple databases.

Now that the Azure SQL Database is more compatible with the latest on-premises version of SQL Server, it's easier for Microsoft to offer more traditional database workloads as services. The latest is setting up a data warehouse, which is already a good match for a cloud service because of the number of servers and amount of storage involved.

You can already set up a data warehouse on Azure using IaaS, and Microsoft even provides PowerShell scripts to simplify the setup. However, there are still plenty of steps you have to go through, including configuring storage pools and filegroups, and the assumption is that you'll already have done the design work on your own data warehouse that you're moving to Azure.

Azure SQL Data Warehouse

azure-sql-dw.jpg
Azure SQL Data Warehouse is a dynamic enterprise-class cloud-based data warehouse that will work with Microsoft's existing data tools.
Image: Microsoft

The new Azure SQL Data Warehouse (Azure SQL DW) service that's currently in beta is a platform service that you can set up quickly: it only takes a few minutes to configure, although you'll have to allow time to import any data not already stored on Azure (like in an Azure blob) — you can bring in data from SQL and NoSQL databases and Hadoop, as well as connecting to SAP, Dynamics and other server applications. And like other Azure data services, you can use the data as a source for Azure Machine Learning, or analyse it in Power BI.

Enjoying this article?

Download this article and thousands of whitepapers and ebooks from our Premium library. Enjoy expert IT analyst briefings and access to the top IT professionals, all in an ad-free experience.

Join Premium Today

Microsoft is claiming several advantages over the Redshift data warehouse service on AWS. For one thing, you can run it on your own servers, if you have already deployed a private cloud using the Windows Azure Pack in order to keep regulated data under your control. You can also scale the compute and storage you use up and down independently, in Data Warehouse Units, and you can pause Azure SQL DW if you don't need to run any analysis or predictions for a while.

You work with Azure SQL DW using the usual portal, which is well laid out, easy to understand and fast to work with once you have all the connection information you need to hand. You can see the pricing in advance as you drag the performance slider to add or reduce the amount of compute you need, and the list of databases shows you the price tiers for your storage as well. Monitoring tools in the portal show you everything from how many queries are running to how many datasets you have connected (and whether any of them are producing errors), how many pipelines you have set up and how many linked services and gateways you've connected to aggregate data. The drag-and-drop visual interface for connecting tables and blobs, creating data slices and setting up pipelines is easy to work with, although it assumes you already know the principles of running a data warehouse.

Azure SQL DW makes it faster and cheaper to set up a traditional data warehouse, which will be useful for businesses lacking the resources to do that on-premises. But the real advantage will be if you can connect it up to other cloud data stores to aggregate information, and then use it with cloud analysis tools. You might want to use machine learning to create an API that sends predictions to mobile apps for your sales staff, for example, or employ the Azure Apps tools to make a Logic App that plugs into other services (like sending text messages to store managers if the predictions suggest you're going to see more shoppers in store than usual). Those are far easier to work with in the cloud than on-premises.

Azure Data Lake

azure-data-lake.jpg
Azure Data Lake is a hyper-scale data store for big-data analytic workloads.
Image: Microsoft

If you're putting very large amounts of information into Azure, whether it's for a data warehouse, or machine learning or stream processing information from IoT sensors, the new exabyte-scale Data Lake may be the place to put it. This is a distributed file store using the Hadoop File System (HDFS) that will store any format or size of file, up to multiple petabytes. It's replicated across three storage locations inside an Azure region, and is designed for both large data files and small files that are updated very frequently (readings from IoT sensors or website logs, for example).

Azure Data Lake storage connects directly to the Azure HDInsight analytics service and Microsoft's newly-acquired Revolution-R Enterprise statistics platform. However, the Hadoop compatibility means you can use it with other Hadoop systems like Hortonworks and Cloudera, or specific Hadoop tools like Spark and Storm (many of which are already available on Azure), or with NoSQL databases like columnar and key-value stores.

Data Lake is still in private preview, but it will make Azure suitable for extremely large data-processing and analytics systems where you want to collect your data now and decide how you're going to use it later.

Azure SQL Database Elastic Pool

azure-sql-elastic-pool.jpg
A pool of elastic databases can scale and share resources when demands are unpredictable.
Image: Microsoft

If you need large amounts of storage for multiple databases where demand varies unpredictably, the beta of Azure's new SQL Database Elastic Pool service lets you group up to a hundred databases into a pool that you can manage and query together (using T-SQL), but where you can dynamically allocate storage and resources (allocated as Database Throughput Units, or DTUs) between the different databases as required. It's designed for developers building their own cloud services to offer to multiple customers and departments who want to pay for and manage a single storage resource.

You can set the throughput range (from 200 to 1,200 DTUs for all the databases in your pool) as well as the minimum and maximum throughput for each individual database (up to 100 DTUs per database if you're using S3 instances) via the usual sliders and controls in the Azure portal, and as usual you see an estimated cost as you configure the service. Alternatively, you can use PowerShell and REST APIs to create and configure the pool, and move databases in and out of the pool using these tools or T-SQL as well as the portal interface. You can see how much of your DTU and storage quota is in use and look at the query load and the resource utilization in the portal, or set alerts for notification.

With elastic pools come elastic queries, and again it's the usual straightforward Azure experience for creating and running jobs — pasting in SQL scripts, for example — so you can make changes or run admin tasks against multiple databases with automatic retries in case of failure.

SQL Database Elastic Pool brings the shared, elastic database resource concept to developers and admins who are familiar with SQL and SQL Server; you can use familiar development tools and techniques like ADO.NET Linq, entity framework and T-SQL, or your usual business intelligence tools for reports and analysis. And you can use those techniques for building a cloud service where you won't have time to manage thousands of Azure SQL Database instances individually. With other elastic database services you'd usually have to switch to NoSQL technologies instead.

Another difference is that other elastic database services offer you a resource that scales up and down on demand for a single database; if you want to extend that out to multiple databases you'd have to handle the management between them yourself. Azure SQL Database Elastic Pool lets you set the limits and then handles the management for you. Many cloud database services are for customers who want to consume a cloud service; Azure SQL Database Elastic Pool also lets you build cloud services, but the simplicity of a PaaS service makes it a quite different approach.

Join Premium Today