Intel, Georgia Tech, and MIT code similarity project could address tech shortages

The machine inferred code similarity system has identified similar code up to 40 times more accurately than other existing systems, according to Intel.

In the era of digital transformation, more companies are looking to leverage automation to streamline their business models and enhance efficiencies. At the same time, many companies are struggling to onboard the talent to fulfill their operational objectives. The tech talent shortage has been widely discussed over the past few years.

In 2017, it was estimated that as many as 1 million developer positions would be left unfilled by 2020, according to Gartner. At the time, more than 80% of representatives on the TechRepublic CIO Jury reported difficulty finding the tech talent their organizations needed. The coronavirus pandemic has also highlighted the risks of scarce programming talent, notably a shortage of COBOL programmers to support older mainframe systems.

To help, a consortium of organizations including Intel is working to develop a system that determines functional similarities between snippets of code.

On Wednesday, Intel released details about the programming project, a partnership with the Massachusetts Institute of Technology (MIT) and the Georgia Institute of Technology (Georgia Tech). The machine inferred code similarity (MISIM) system has been engineered to study the overall structure of code as well as analyze the "syntactic differences of other code with similar behavior" to, in essence, "learn" the code's intent.

In general, machine programming (MP) efforts focus on improving development productivity through automated tools, according to Intel. The company believes that code similarity is crucial to a host of MP tools.

"Intel's ultimate goal for machine programming is to democratize the creation of software. When fully realized, MP will enable everyone to create software by expressing their intention in whatever fashion that's best for them, whether that's code, natural language, or something else. That's an audacious goal, and while there's much more work to be done, MISIM is a solid step toward it," said Justin Gottschlich, principal scientist and director of Intel's machine programming research, in a press release.

Today, building these code similarity systems poses a number of challenges, and accuracy remains a "relatively unsolved problem," per Intel. Such a system aims to determine whether two snippets of code have similar characteristics or pursue similar goals. This is "a daunting task when having only source code to learn from," as Intel points out.

When analyzing a pair of code snippets, MISIM is able to accurately calculate computational similarities, per Intel. MISIM's context-aware semantic structure (CASS) is the differentiating factor between this code-similarity system and others. Instead of trying to discern how a snippet of code does something, MISIM's CASS allows the system to more aptly discern what this code is intended to do.
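To make the distinction between how code works and what it does more concrete, here is a minimal sketch in Python. It is not MISIM or CASS, and it is not drawn from Intel's materials. The two snippets, the helper names, and the use of the standard library's difflib as a stand-in for a purely text-based comparison are all assumptions made for illustration. The two snippets compute the same result in different ways, so a text comparison sees them as different even though they behave identically on the sample inputs.

```python
import difflib

# Two hypothetical snippets that compute a sum of squares in different ways.
SNIPPET_A = """
def sum_squares(values):
    total = 0
    for v in values:
        total += v * v
    return total
"""

SNIPPET_B = """
def sum_squares(values):
    return sum(v ** 2 for v in values)
"""


def textual_similarity(a, b):
    """Return a ratio in [0, 1] based only on matching characters in the text."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def load_function(source, name):
    """Execute a snippet in a fresh namespace and return the named function."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


def behave_alike(f, g, inputs):
    """Check that two functions return equal results on every sample input."""
    return all(f(x) == g(x) for x in inputs)


if __name__ == "__main__":
    f = load_function(SNIPPET_A, "sum_squares")
    g = load_function(SNIPPET_B, "sum_squares")
    samples = [[], [1, 2, 3], [-4, 0, 5]]
    print("textual similarity:", round(textual_similarity(SNIPPET_A, SNIPPET_B), 2))
    print("same behavior on samples:", behave_alike(f, g, samples))
```

Matching results on a handful of inputs does not prove two snippets are equivalent. The sketch only shows why surface-level text comparison is a weak signal of intent, which is the gap a system like the one Intel describes is meant to close.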

Within the structure, neural networks assign "similarity scores to pieces of code based on the jobs they are designed to carry out." MISIM identified "similar pieces of code up to 40x more accurately than prior state-of-the-art systems," according to Intel.
