Here's how new machine learning software can beef up cloud-based databases

A Purdue data team has developed a way for organizations to improve the performance and efficiency of their cloud-hosted databases.

As the coronavirus pandemic has brought the workforce online, organizations are struggling to manage dynamic remote workloads. On Thursday, a team of data scientists led by a Purdue University professor, Somali Chaterji, introduced a solution called OPTIMUSCLOUD.

This new software, which runs alongside a database server, uses machine learning to make the selection of virtual machines and database management system options more efficient. The system is designed to help organizations get the most out of cloud-based databases.

Chaterji directs the Innovatory for Cells and Neural Machines and teaches agricultural and biological engineering. Her software system can be used in "rightsizing resources to benefit both the cloud vendors who do not have to aggressively over-provision their cloud-hosted servers for fail-safe operations and the clients," who will reap the savings, according to the press release.

"It also may help researchers who are crunching their research data on remote data centers, compounded by the remote working conditions during the pandemic, where throughput is the priority," Chaterji stated. "This technology originated from a desire to increase the throughput of data pipelines to crunch microbiome or metagenomics data."

OPTIMUSCLOUD works with Amazon's AWS, Google Cloud, and Microsoft Azure, and could work with other providers down the line. It pairs AWS cloud computing with the NoSQL technologies Apache Cassandra and Redis, the release states.

Chaterji says that her team's product can take on "long-running, dynamic workloads, whether it be workloads from the ubiquitous sensor networks in connected farms or high-performance computing workloads from scientific applications or the current COVID-19 simulations from different parts of the world in a rush to find the cure against the virus."

OPTIMUSCLOUD has other applications: It could improve safety for self-driving vehicles. It can also be used in healthcare and IoT infrastructures in farms and factories, according to the release.

"Also, in these strange times when both traditionally compute-intensive laboratories such as ours and wet labs are relying on compute storage, such as to run simulations on the spread of COVID-19, throughput of these cloud-hosted VMs is critical and even a slight improvement in utilization can result in huge gains," Chaterji said in the press release. "Even the best data centers [today] run at lower than 50% utilization and so the costs that are passed down to end-users are hugely inflated."

OPTIMUSCLOUD, on the other hand, can sift through hundreds of options and select the best fit according to cost. "When it comes to cloud databases and computations," Chaterji said in the press release, "you don't want to buy the whole car when you only need a tire, especially now when every lab needs a tire to cruise."
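To make the idea of sifting through options and picking the best fit by cost more concrete, here is a minimal Python sketch. It is not OPTIMUSCLOUD's actual method, which the release does not detail. The VM names, prices, cache sizes, and the `toy_predicted_throughput` function are all hypothetical stand-ins; in a real system, a trained machine learning model would presumably supply the throughput predictions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Optional


@dataclass(frozen=True)
class Config:
    """One candidate: a VM type plus one database setting (both hypothetical)."""
    vm_type: str
    cache_mb: int


# Hypothetical hourly prices in dollars; not real cloud pricing.
VM_PRICES = {"vm-small": 0.10, "vm-medium": 0.20, "vm-large": 0.40}
# Hypothetical database cache sizes to try with each VM type.
CACHE_SIZES_MB = [256, 512, 1024]


def toy_predicted_throughput(cfg: Config) -> float:
    """Stand-in for a learned performance model (operations per second).

    The numbers are invented purely for illustration.
    """
    base = {"vm-small": 1000.0, "vm-medium": 2500.0, "vm-large": 4000.0}[cfg.vm_type]
    return base * (1 + cfg.cache_mb / 2048)


def pick_config(
    min_ops_per_sec: float,
    predict: Callable[[Config], float] = toy_predicted_throughput,
) -> Optional[Config]:
    """Return the candidate with the lowest cost per operation that still
    meets the throughput target, or None if no candidate qualifies."""
    best: Optional[Config] = None
    best_cost = float("inf")
    for vm_type, cache_mb in product(VM_PRICES, CACHE_SIZES_MB):
        cfg = Config(vm_type, cache_mb)
        ops = predict(cfg)
        if ops < min_ops_per_sec:
            continue  # too slow for this workload
        # Dollars per million operations at the predicted throughput.
        cost_per_million = VM_PRICES[vm_type] / (ops * 3600) * 1_000_000
        if cost_per_million < best_cost:
            best, best_cost = cfg, cost_per_million
    return best


if __name__ == "__main__":
    print(pick_config(min_ops_per_sec=1000))
    print(pick_config(min_ops_per_sec=4000))
```

With these made-up numbers, the biggest VM is not the cheapest per operation for a light workload, which is the "don't buy the whole car" point in miniature. An exhaustive loop like this only works for a handful of candidates; searching hundreds of options across changing workloads would likely need a smarter search strategy.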


A Purdue team created a technology called OPTIMUSCLOUD, which is designed to help achieve cost and performance efficiency for cloud-hosted databases.

Image: Purdue University