Microsoft's Azure data transfer tools are getting smarter. A lot smarter.
Microsoft's family of Azure Data Box devices is an important part of its cloud migration pitch. Getting data from your data center to the cloud can be an issue, and bandwidth is only part of it. At heart it's a question of whether you're sending batches of data or delivering a constant stream of information.
For most of the Data Box family, instead of pushing data over (relatively) slow network connections, Microsoft sends you disks. You load them with data and send them back. As the saying goes, "Never underestimate the bandwidth of a pickup truck full of backup tapes," and with modern high-density hard drives that's a lot of data on the move: Data Box Disk is for less than 40TB of data, Data Box is for up to 100TB, and Data Box Heavy is a 1PB device that can ship more than 500TB of data at a time.
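To put the pickup-truck argument in numbers, here's a minimal Python sketch that picks a Data Box device using the capacity figures quoted above and estimates how long the same data would take over a network link. The names `pick_device` and `network_transfer_days` are illustrative, not part of any Azure SDK, and the 500TB ceiling for Data Box Heavy is a conservative reading of "more than 500TB".

```python
# Capacity ceilings in TB, taken from the figures in the text. Microsoft's
# published usable capacities may differ; treat these as approximate.
DEVICE_LIMITS_TB = [
    ("Data Box Disk", 40),
    ("Data Box", 100),
    ("Data Box Heavy", 500),  # text says "more than 500TB"; 500 is conservative
]


def pick_device(data_tb: float) -> str:
    """Return the smallest device whose ceiling covers data_tb in one shipment."""
    for name, limit_tb in DEVICE_LIMITS_TB:
        if data_tb <= limit_tb:
            return name
    return "multiple Data Box Heavy shipments"


def network_transfer_days(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to push data_tb (decimal TB) over a link_gbps link.

    efficiency is the fraction of line rate actually achieved (1.0 = ideal).
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    for tb in (30, 80, 400):
        days = network_transfer_days(tb, link_gbps=1.0, efficiency=0.8)
        print(f"{tb} TB: ship on {pick_device(tb)}, or ~{days:.1f} days at 1 Gbps")
```

At a sustained, ideal 1Gbps, 100TB works out to 800,000 seconds, a little over nine days, before allowing for protocol overhead and contention (the `efficiency` parameter).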
Upload to the cloud by disk
The first three Data Box products are like Amazon's AWS Snowball: tools for a one-off upload to the cloud. They're what you use if you're moving all your current data to Azure as part of a cloud migration. Once that data's there, you can switch your entire data processing service from on-premises servers and storage to Azure, using VPNs to synchronize any data that changed while the Data Box was making its way to one of Microsoft's data centers.
Shipping data is only part of the story. Not all cloud migrations are one-shot data uploads: many require permanent connections between on-premises data sources and cloud applications. Data Box's offline tools get your existing data into Azure; you then need a way of connecting live, online sources to your Azure applications.
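The catch-up step after the Data Box arrives boils down to finding what changed since the copy was taken. This minimal sketch assumes file modification times are reliable and that you recorded when the copy finished; `changed_since` is a hypothetical helper, and in practice a tool such as AzCopy's `sync` command handles this kind of comparison for you.

```python
import os


def changed_since(root: str, cutoff: float) -> list:
    """Return paths (relative to root) of files modified after `cutoff`.

    `cutoff` is a POSIX timestamp, e.g. the moment the Data Box copy finished.
    These are the files that still need to go over the network link.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.getmtime(full) > cutoff:
                changed.append(os.path.relpath(full, root))
    return sorted(changed)
```

It only finds new and modified files; anything deleted in the meantime needs a separate comparison against the destination.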
Continuous connections to Azure for your data
That's where the rest of the Azure Data Box tooling comes into play: Data Box Gateway and Data Box Edge. The first of the two is a relatively simple tool: a virtual device running on a hypervisor (Hyper-V or VMware) in your data center, working as a storage gateway between your network and Azure Storage. It presents standard SMB or NFS shares and ships data as it arrives, storing it as Azure block blobs or page blobs, or in Azure Files. You can choose how to tier data once it reaches Azure, so it can be moved automatically into low-cost, long-term archive storage.
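Tiering rules like this are usually age-based. The sketch below assumes hypothetical thresholds of 30 days to Cool and 180 days to Archive, using the Hot, Cool and Archive names of Azure Blob Storage access tiers. On Azure itself you'd typically express the same rule declaratively as a Blob Storage lifecycle management policy rather than in code; this only illustrates the logic.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; tune them to your own access patterns.
COOL_AFTER = timedelta(days=30)
ARCHIVE_AFTER = timedelta(days=180)


def choose_tier(last_modified: datetime, now: datetime) -> str:
    """Map a blob's age to an Azure Blob Storage access tier name."""
    age = now - last_modified
    if age >= ARCHIVE_AFTER:
        return "Archive"
    if age >= COOL_AFTER:
        return "Cool"
    return "Hot"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for days in (1, 45, 365):
        print(days, "days old ->", choose_tier(now - timedelta(days=days), now))
```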
Data Box Gateway makes uploading to Azure look like a file share. More complex options come with Data Box Edge, which adds tooling to preprocess your data as part of the upload process, using elements of Azure Cognitive Services. Delivered as a 1U rack-mounted appliance, Data Box Edge plugs into your network, acting as an extension of your Azure services inside your data center. All you need to do is provide power and networking; the software is kept up to date by Microsoft and you manage your services from the Azure Portal.
Setting up Data Box Edge
Getting a Data Box Edge connected is relatively simple. Microsoft provides a list of ports and domains that need to be opened in any firewall, along with suggested minimum bandwidth for connections to Azure. One thing to note: you can't use Data Box Edge with a pay-as-you-go Azure subscription. Instead, you pay a separate per-unit subscription fee, currently about £520 a month (roughly $700). That isn't cheap, but if it means you don't have to deploy an entire rack of servers to a site, it still saves on power and management overheads.
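Before installing the device, it's worth confirming that the firewall actually allows outbound connections to the required endpoints. This is a minimal TCP reachability check; the host names in `ENDPOINTS` are hypothetical placeholders, so substitute the ports and domains from Microsoft's published list for your deployment. A successful TCP connection shows the port is open, not that the service behind it will accept the device.

```python
import socket

# Hypothetical placeholders: replace with the hosts and ports from
# Microsoft's published firewall requirements for your device.
ENDPOINTS = [
    ("mystorageaccount.blob.core.windows.net", 443),
    ("example-endpoint.invalid", 443),
]


def check_endpoints(endpoints, timeout: float = 3.0) -> dict:
    """Try a TCP connection to each (host, port); return {(host, port): bool}."""
    results = {}
    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:  # covers DNS failures, refusals and timeouts
            results[(host, port)] = False
    return results


if __name__ == "__main__":
    for (host, port), ok in check_endpoints(ENDPOINTS).items():
        print(f"{host}:{port} {'reachable' if ok else 'blocked or unreachable'}")
```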
If you're using Data Box Edge only to upload data to an Azure store, then it offers the same features as Data Box Gateway, and you'd be better off using the virtual appliance. But if you want to take advantage of its AI capabilities, then it's a powerful tool, and one that can change the way you work with data.
For example, it can preprocess data locally to flag the items that merit additional processing with cloud resources. Preprocessing on Data Box Edge reduces bandwidth requirements, and lets you use it as an edge computing device where tools like Azure Stack would take up too much space. It's easy to imagine a Data Box Edge in the back office of a supermarket, delivering sales analytics as part of its regular upload to cloud-hosted ERP tools, or one filtering signals from IoT devices so that only relevant data is sent on for additional processing.
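The IoT filtering scenario can be sketched as a simple rule: forward out-of-range readings individually and send only a per-sensor average for everything else. The `Reading` type, the `filter_for_upload` function and the thresholds are all hypothetical; on a real device logic like this would run inside an edge module, and it's shown here in plain Python only to illustrate how preprocessing cuts upload volume.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    sensor_id: str
    value: float


def filter_for_upload(readings, low: float, high: float):
    """Split readings into (anomalies, summary).

    Readings outside [low, high] are forwarded individually; in-range ones
    are collapsed into a single average per sensor.
    """
    anomalies = [r for r in readings if not (low <= r.value <= high)]
    in_range = {}
    for r in readings:
        if low <= r.value <= high:
            in_range.setdefault(r.sensor_id, []).append(r.value)
    summary = {sensor: mean(values) for sensor, values in in_range.items()}
    return anomalies, summary
```

However many in-range readings arrive between uploads, the summary stays at one entry per sensor, which is where the bandwidth saving comes from.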
An Azure extension in your network
It's perhaps best to think of Data Box Edge as an extension of Azure into your data center. It's managed with the same tools you use for your Azure applications and services, with settings pushed down from the cloud to your device: for example, the share names used for local access to its upload service. The same tools manage containers running on the device, link Edge compute to your local network, and provision users.
Much of Data Box Edge administration is handled in the Azure Portal or through the Azure CLI, but for some basic admin tasks there's a local web-based UI. With a similar look and feel to the Azure Portal, the local UI runs in a browser and gives you tools for device configuration and maintenance. It's where you change device passwords, control connectivity settings, and turn the device on and off. The local UI also shows hardware status information, which you can use when reporting issues to your support contacts.
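Pushing settings down from the cloud follows a familiar desired-state pattern: the cloud holds the configuration you want, and the device works out what to change locally. The sketch below is a generic illustration of that pattern, similar in spirit to the desired and reported properties of IoT Hub device twins; it isn't a description of Data Box Edge internals, and the setting names in the example are hypothetical.

```python
def reconcile(desired: dict, actual: dict) -> dict:
    """Work out what a device must change so its config matches `desired`.

    Returns {"set": {key: new_value}, "remove": [keys no longer wanted]}.
    """
    to_set = {k: v for k, v in desired.items() if k not in actual or actual[k] != v}
    to_remove = sorted(k for k in actual if k not in desired)
    return {"set": to_set, "remove": to_remove}


if __name__ == "__main__":
    # Hypothetical settings: cloud-side desired state vs. what the device has now.
    desired = {"share_names": ["uploads", "logs"], "compute_enabled": True}
    actual = {"share_names": ["uploads"], "legacy_share": "old"}
    print(reconcile(desired, actual))
```

In this pattern the device applies the diff and then reports its new state back, so the cloud side can see whether the change actually took effect.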
Running code on Data Box Edge
If you want to run your own code on Data Box Edge, start by configuring its edge compute feature in the Azure Portal. It uses IoT Hub resources, so you can quickly migrate existing IoT Edge apps to Data Box Edge. Code is deployed as IoT Edge modules and can be given access to device storage shares. That simplifies development, as you can use the same tools you already use for IoT Edge applications without learning new skills. More complex applications can be delivered as Docker containers and downloaded to your device. Much of your development will be serverless, using events and other triggers to launch code. A trigger can be as simple as a file being written to a share, which launches a data processing task whose result is then transferred to Azure.
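A file-write trigger can be approximated with a polling loop. This minimal sketch uses local directories to stand in for a Data Box Edge share and an upload share; `process_new_files`, `watch` and the `.out` naming are hypothetical, and a real IoT Edge module would react to events rather than poll.

```python
import os
import time


def process_new_files(share_dir: str, outbox_dir: str, seen: set, transform) -> list:
    """Run `transform` on files that appeared in share_dir since the last call.

    Results go to outbox_dir, standing in for the share that gets uploaded
    to Azure. `seen` records file names already handled between polls.
    """
    produced = []
    for name in sorted(os.listdir(share_dir)):
        src = os.path.join(share_dir, name)
        if name in seen or not os.path.isfile(src):
            continue
        with open(src, "rb") as f:
            result = transform(f.read())
        dst = os.path.join(outbox_dir, name + ".out")
        with open(dst, "wb") as f:
            f.write(result)
        seen.add(name)
        produced.append(dst)
    return produced


def watch(share_dir: str, outbox_dir: str, transform, interval: float = 5.0) -> None:
    """Poll forever, processing each new file once."""
    seen = set()
    while True:
        process_new_files(share_dir, outbox_dir, seen, transform)
        time.sleep(interval)
```

Polling is simple but has a known gap: a file still being written can be picked up half-finished, so production code usually waits for the writer to close the file or for its size to stop changing.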
Combining the Azure connectivity of Data Box Gateway with the compute capabilities of IoT Edge makes a lot of sense. IoT's programming model is one that supports a lot more than IoT devices, and you can use it alongside Azure services like the Cognitive Services machine learning APIs (and the new Cognitive Services container deployments) to build complex applications that solve significant business problems. All you need is space for a 1U appliance and sufficient bandwidth.