Meet the Microsoft AI project that will transform corporate data into knowledge

We take a look inside Microsoft's new collaborative development process for SharePoint and machine learning-based knowledge management systems.


Project Cortex uses AI to organise content, delivering topic cards, topic pages and knowledge centres in Office, Outlook and Teams.

Image: Microsoft

Microsoft's history with knowledge management goes back a long way, from pre-SharePoint tooling with Site Server, through its abortive Knowledge Network platform, to today's mix of Bing and the Microsoft Graph for Microsoft 365 subscribers. Now the company is trying again, adding machine learning to the mix to help organisations understand what they know and, more importantly, who knows it.

This time there's a lot more training data, a deeper understanding of the knowledge graph that underpins most businesses, and above all, the hyperscale compute of the modern cloud. 


Microsoft's Project Cortex is an ambitious set of tools built around Microsoft 365, intended to automate the complex process of building and deploying knowledge management systems. Using a mix of search and machine learning techniques, it extracts information from stored documents, from Teams and Yammer conversations, and from relationships in the Microsoft Graph. After all, goes the thinking in Redmond, there's a lot of information in your data, so how can you get the most from it? 

Originally announced at Ignite 2019, the project is being broken up into smaller sets of tools, each focused on part of the problem. The first, SharePoint Syntex, is now shipping, with more tools arriving in 2021. That's a typical Microsoft way of working: ship a tool that quickly delivers value, while continuing to work on the more complicated part of the problem.

How did we get here? Normally much of Microsoft's product development process is carried out internally, with program managers providing a link between developers and end users. That has not been the case with Project Cortex. Key pieces of its development have been carried out in partnership not only with customers, but also with some of Microsoft's business partners, especially companies that have developed significant expertise in Microsoft 365 and that will be the main route through which Project Cortex is deployed.


SharePoint Syntex, the first shipping product to emerge from Project Cortex, uses AI and ML to deliver enterprise content services.

Image: Microsoft

Collaboration with partners 

We spoke to Chris O'Brien, products and services director at consultancy Content+Cloud, about how that partnership worked, and how it helped to create the Project Cortex tools that Microsoft is shipping. It's been a long road to a shipping Syntex, he told TechRepublic: "Microsoft started to talk about their ideas probably two or two-and-a-half years ago, in terms of this thing that they were building that would let them take knowledge management to the next level by using AI to do a lot of the work that humans would have done otherwise." 

As O'Brien notes, businesses aren't homogeneous -- they're made up of different groups that have different knowledge bases. "They have these communities of practice with ranges of knowledge, but how do you tap into what the 'golden knowledge' is for each one of those? How do you capture that information, ingest it, create it and present it back to the business, and really make the golden content come to the top? Often that's been a labour-intensive process with lots of people involved in gathering up that information and having an information manager for that particular part of the business." 

Project Cortex is perhaps best thought of as that information manager, or at least as a set of tools to help information managers do their job more effectively. Using machine learning (ML), it extracts and summarises content from content stores, helping to categorise and index files.

Taking a customer-led approach to building knowledge management tools is a sensible choice. We could spend years defining common metadata models like Dublin Core, or we could ask people what they want to see and know. The latter option is often simpler and easier to implement, and in many cases it is easier to generalise. Project Cortex's use of ML fills in many of the gaps, automatically customising itself based on the data. Relationships in your company won't be the same as in any other, nor will the documents you create. So, by building a set of basic models that work across many different organisations and then adapting them to each one's data, Microsoft can ship a system with the flexibility you require.

Working with customers 

Working with real-world customers as part of the development process is par for the course for Redmond. It's long had programmes like TAP (the Technology Adoption Program) and RDP (the Rapid Deployment Program) to quickly get feedback on new applications and new ways of working. Building Project Cortex on top of real-world customer data is an extension of this process, one that takes advantage of Cortex being a non-destructive tool that extracts and manages data without altering it. O'Brien feels that Microsoft's new multi-disciplinary approach to development -- "We were being put in front of different teams of engineers and different groups around Microsoft" -- is important, and something that "probably wouldn't happen two years ago". 

The key difference here is that the customer preview was happening at the same time as development -- something that normally doesn't happen until much later in a product's development lifecycle. This approach allowed Microsoft to understand how to split out Cortex functions into SharePoint Syntex, as O'Brien says: "I think we helped them shape things that got separated out into SharePoint Syntex, and we were in pivotal discussions around exactly how the AI would work and how it would surface in the user experiences." 


You probably wouldn't take this approach with a tool that changed the data it processed. Cortex, instead, is all about providing information support to workers, making it easier to find what they need, when they need it. By making development an interactive process, Microsoft was able to tune the algorithms that surface the information customers need. As O'Brien notes, this required more than documents: "Who could we start a conversation with, who could we use to assemble our bid team on this work, or who could we try and take to a pitch? And that's where Microsoft started to use some of these ideas, to bring in some of the aspects like people- and expertise-finding, and making sure that I can get expert answers." 

Surfacing information is the key feature here: using the Microsoft Graph to generate summary content, and using topic cards to highlight links when key terms appear in documents, in chat, or in SharePoint content on an intranet. These cards show who is the right person to contact about this information -- not just who created the documents, but also who owns the content and who knows the most about it.

Preserving SharePoint's future 

With SharePoint now part of Microsoft 365, serving as the foundation of key tools like OneDrive and Teams and powering new applications like Lists, Microsoft is in an interesting position to bring together many of the sources of enterprise content. Because SharePoint hosts everything from unstructured documents to semi-structured lists, from formal emails to quick chats, as well as corporate video, Project Cortex and Syntex are key to getting control over your content -- especially as new ML-based transcription tools can unlock meaning from audio and video.

Rapid change in working methods, with the shift to remote and hybrid models, is likely to speed up both adoption and development, changing the relationships between Microsoft, its partners, and its customers. O'Brien is looking forward to those changes: "We're all in a different world now, and the old ways that people would consume and disseminate information have changed. And I think that's the underlying movement that's driving some of this demand." 
