Big Data

SQL Server 2019 CTP 2.1: A first look

The next release of Microsoft's relational database is getting closer. We took it for a spin.

Microsoft's flagship database is an important tool, with on-premises and in-cloud versions offering powerful storage and analytic tools. It's also become an important application for data scientists, providing a framework for building and testing machine-learning models. There's a lot in SQL Server, and a new release can show where Microsoft thinks your data needs will go over the next few years.

The latest CTP for SQL Server 2019, version 2.1, is now available to help you evaluate and test the next release outside your production environments. Like its predecessor it comes in Windows and Linux versions, although there's now added support for containers and Kubernetes. Adding container support, using Docker and the Linux version of SQL Server, is an interesting option as it will allow you to build SQL Server into massive Kubernetes-based analytic engines that work across Azure-hosted data lakes using Apache Spark.

Setting up

sql-server-2019install.png

Install options for SQL Server 2019 CTP 2.1.

Image: Simon Bisson/TechRepublic

The current preview installer gives you the option of a basic installation, quick and fast, or a more detailed customized install. The first option takes less disk space, as it's the files needed to run that basic install, while a custom install brings down the entire SQL Server 2019 install media. For most basic developer tasks a basic installation is enough, although we'd recommend a custom installation as part of a full evaluation. You can also download installation media if you're planning on installing on more than one machine to evaluate SQL Server's cluster features.

SEE: Network security policy template (Tech Pro Research)

Once you've downloaded the appropriate media, the familiar SQL Server installer walks you through the process of choosing options, checking for prerequisites and any configuration changes you need to make. It's a straightforward process, and once you've configured your choice of options installation is quick and easy. We were able to get a standalone test system up and running in under 20 minutes.

Machine learning

Machine learning is an important part of SQL Server 2019, and it now includes integrated tools for building and testing machine-learning models. You can install it with support for the popular R and Python languages, so your data science team can work inside the database, preparing and testing models before training them on your data. Microsoft is using its own R Open distribution and the Anaconda data science Python environment, which includes additional numerical analysis libraries, among them the popular NumPy.

There's also the option of installing SQL Server 2019 as a standalone machine-learning development environment. Local instances of SQL Server on developer workstations will be able to use familiar R and Python tooling to work directly with training data sets, without affecting production systems or using server resources.

Really BIG data

Working with data at scale has long been an issue, with very few database engines designed to work as part of a distributed architecture. With SQL Server 2019 you can now build what Microsoft is calling Big Data Clusters, using a mix of SQL Server and Apache Spark containers on top of Kubernetes using SQL Server's existing PolyBase features. With public clouds supporting native Kubernetes you'll be able to deploy Big Data Clusters on Azure, on AWS, and on GCP, as well as on your own infrastructure. Integration with the Azure Data Studio tools makes it easier to build, run, and share complex queries.

Microsoft's focus on data science scenarios fits well with the company's intelligent cloud/intelligent edge strategy. Data is essential to building machine-learning tools, and by running R and Python code inside your database you're able to deliver complex queries from the SQL Server command line, using familiar tooling to build and test code before deploying and running it. Microsoft is delivering sample code through GitHub, showing how to mix relational data with big data. It's also sharing sample architectures that show you how to use this as a basis for building machine learning systems on top of other open-source technologies like Kafka.

Under the hood

There's a lot of change under the surface too, with improvements to the SQL Server database engine. One that might appear trivial is support for UTF-8. Earlier versions that needed to store non-English character sets had to use UTF-16, which meant that Unicode string data would take 22 bytes per character. Switching to UTF-8 still supports most Unicode, but cuts storage requirements to 12 bytes per character. If you're storing a lot of string data you'll now need much less disk space — and you can use the familiar CHAR datatype rather than NCHAR.

SEE: Windows spotlight: 30 tips and tricks for power users (Tech Pro Research)

Other new features, like static data masking, focus on securing and sanitizing data so that it can be used without affecting regulatory compliance. Applying static data masking to columns in a database export allows developers to work with real-world data while preventing sensitive information from leaking. There's no way to retrieve the original data, as it's a one-way process. Earlier versions of SQL Server introduced dynamic data masking, which only works with the original database. By exporting with static masking there's little or no risk of developers accidentally unmasking or affecting live data, while letting them produce code that can be put into production without any changes.

At a database level, when you're building indexes you can now stop and start. If a disk is filling, you can pause an index operation, add more storage to the volume, and then resume from where you left off. You don't need to start again from scratch, saving time and compute. There's also the option to restart after failures, again saving time once you've corrected the error that caused an index operation to crash.

A cross-platform database

SQL Server is no longer only a Windows tool, and its Linux edition is picking up plenty of new features with this release. Perhaps the most important update is support for SQL Server Replication, which allows you to build distributed SQL databases more effectively, especially in conjunction with the Linux release of the Distributed Transaction Coordinator. As more and more modern applications are built on Kubernetes, tools like these simplify scaling.

Tooling remains important, and there's a new release of the SQL Server Management Studio. Database admins will find it supports the new security features, with a familiar look and feel. However it's probably time to start looking at the new Azure Data Studio, which works across on-premises and cloud instances, with development and management tools as well as monitoring dashboards. Data scientists can use its built-in notebook tools to experiment with queries, much like the open-source Jupyter Notebooks. There's also the option to use Azure Data Studio to manage and operate new SQL Server scenarios, like Big Data Clusters.

With SQL Server 2019 Microsoft is proving that even though relational databases have been around a long time, there's still room for improvement and for innovation. By building a database engine that works like every SQL Server has worked in the past, and at the same time supporting working with machine learning and massive-scale big data, it's delivering a tool that's ready to upgrade what you have and to support you as you work with your data securely, on-premises and in public clouds. All you need to do is download it and see what it can do for you.

Also see

About Simon Bisson

Born on the Channel Island of Jersey, Simon moved to the UK to attend the University of Bath where he studied electrical and electronic engineering. Since then a varied career has included being part of the team building the world's first solid state...

Editor's Picks

Free Newsletters, In your Inbox