London UK, August 8th 2020: Microsoft flagship store in Regent St, on Oxford Circus. The main entrance with logo. During lockdown, Covid-19, Coronavirus.
Image: Alex Yeung/Adobe Stock

Despite a global rush toward enterprise digital transformation, the document remains at the heart of most businesses, and unfortunately, managing documents is still a distinctly manual process. It’s more or less part of the day-to-day workflow of every knowledge worker.

Documents have structure, but their flexibility makes the business processes around them hard to automate. Moving data from multiple line-of-business applications into a document is a matter of cut-and-paste, from screen to document and often back again once a document is received.

Launched at Ignite in October 2022, Microsoft Syntex is here to solve some of these tediously manual issues, adding document processing tools to SharePoint. The solution uses machine learning to help construct and parse documents, turning a manual process into one where humans guide and check software, and where legal, regulatory and contractual requirements are still met. In this in-depth look at Syntex, learn more about content AI and some of the current use cases for this release.

What is content AI?

The idea of content AI is an intriguing one, as it’s about finding ways to work with the many different types of documents we generate and the often-unstructured information they contain. Systems like Syntex need to infer the context of that data, then process, extract and summarize it so it can be delivered in structured formats. They also need to be able to go the other way, using documents to produce templates that can automate document creation.

At the core of this process is a set of machine learning models built for content understanding. Designed to work against SharePoint document libraries, the pre-built models are similar to those used by Azure’s Cognitive Services and are complemented by custom models that can be trained on your own content.

Custom models in Microsoft Syntex

It’s the custom models that are most interesting, as they frame how Syntex works with unstructured content, supporting document processing and creation. Custom models are trained with a teaching method: you label existing documents as part of the training process to identify their key content elements. Working with unstructured information depends on showing the model the phrases and patterns that identify the important elements of a document.

The unstructured documents we work with are often formal letters and contracts, and while these will differ structurally, they will share specific phrases and structures that carry specific business meanings.

That last point is the most important, as it gives you the context for extracting data in a usable format, like a date or a price. By teaching a model that a set of words has a specific meaning and is associated with data that needs to be used, one can start to build a model that identifies variations of the key phrases and content formats.
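As a rough illustration of the idea, and not a description of how Syntex works internally, the Python sketch below ties a few anchor phrases to a price pattern and only accepts a value that appears shortly after one of those phrases. The phrase list, the pattern and the `extract_price` helper are all assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical anchor phrases that signal "the value we want follows".
PRICE_PHRASES = ["total price", "contract value", "amount due"]

# Optional currency symbol, digits with optional thousands separators,
# and an optional two-digit decimal part.
PRICE_PATTERN = re.compile(r"[$£€]?\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?")


def extract_price(text: str, window: int = 40) -> Optional[str]:
    """Return the first price-like value found shortly after an anchor phrase."""
    for phrase in PRICE_PHRASES:
        found = re.search(re.escape(phrase), text, flags=re.IGNORECASE)
        if not found:
            continue
        # Search only a short window after the phrase so that unrelated
        # numbers elsewhere in the document are ignored.
        after = text[found.end(): found.end() + window]
        match = PRICE_PATTERN.search(after)
        if match:
            return match.group().strip()
    return None
```

The phrase supplies the context and the pattern supplies the format, which is why a reference number earlier in the same sentence is not mistaken for the price.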

Document understanding as a service

If you’ve been tracking Microsoft’s Cognitive Services AI platform, you’ll have come across its document understanding service. This was the predecessor of Syntex’s unstructured document processing model. For document understanding to work, you will need a selection of documents in a SharePoint content center to train your model, creating two types of tools: classifiers and extractors.

Classifiers are used to identify documents that are loaded into a library. As an example, you could create one that would find all the Request For Proposal documents loaded into a library. Extractors then identify key data in the document, providing that data to external applications. For example, an extractor can pull the name of the client requesting an RFP from a document in the library so that their contact details can be added to your CRM system as a sales prospect.

Microsoft Syntex use cases

Microsoft Syntex can be used to organize, label and better understand your documents in a variety of ways. Here are some of the most common business use cases of Syntex:

Adding machine learning to your documents in SharePoint

Creating a classifier

Creating a classifier sets up a new SharePoint content type that will be attached to all identified documents. Training is fairly straightforward, starting with a selection of appropriate documents that can be used to build the model. The training process asks you to label the documents with the type you’re trying to identify; you need at least five positive sample documents and one negative.

You then need to provide an explanation for why those documents should be identified as a specific type, listing the key phrases and words used. The system will then use this as part of the model, checking to see if it matches all your sample documents correctly. If it fails, simply add more detail and run the training process again.

The training process is relatively quick. Once complete, you can test the model on other documents to see how effective your classifier is. As with most machine learning systems, don’t expect it to be right the first time; it may take several passes through this loop to get a model that works for your documents.
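The label, explain and re-check loop described above can be mimicked with a simple phrase-based check. This is only a sketch under stated assumptions: Syntex trains a machine learning model, whereas the hypothetical `matches_explanation` and `evaluate` helpers here just test for listed phrases. The sample documents are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    text: str
    is_positive: bool  # True if the document is of the type being identified


def matches_explanation(text: str, phrases: List[str]) -> bool:
    """A document matches when it contains any phrase from the explanation."""
    lowered = text.lower()
    return any(p.lower() in lowered for p in phrases)


def evaluate(samples: List[Sample], phrases: List[str]) -> List[Sample]:
    """Return the labeled samples that the explanation gets wrong."""
    positives = sum(1 for s in samples if s.is_positive)
    negatives = len(samples) - positives
    # Mirrors the minimum sample counts mentioned in the text.
    if positives < 5 or negatives < 1:
        raise ValueError("need at least five positive samples and one negative")
    return [s for s in samples if matches_explanation(s.text, phrases) != s.is_positive]


# Invented sample set: five RFP-style documents and one invoice.
SAMPLES = [
    Sample("Request for Proposal: network upgrade", True),
    Sample("Request for Proposal: catering services", True),
    Sample("Request for proposal, office cleaning", True),
    Sample("REQUEST FOR PROPOSAL for payroll software", True),
    Sample("RFP response requested by Friday", True),
    Sample("Invoice for services rendered", False),
]
```

Calling `evaluate(SAMPLES, ["request for proposal"])` returns the one document that only uses the abbreviation; adding `"rfp"` to the phrase list and running the check again is the equivalent of adding more detail to the explanation and retraining.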

Creating an extractor

The process of building an extractor is similar, requiring you to label the data you want to extract from your files. Here, you need to highlight the information on your pages with a separate model for each specific piece of information you want.

Extractors can be refined if you need more complex rules in your model or need to avoid duplicates. In-use extractors add data to SharePoint document library columns where they can be passed to other applications using SharePoint APIs.
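Once extracted values are sitting in library columns, another application could read them through the SharePoint REST API. The sketch below only builds the request and trims the response; it does not send anything, and authentication is omitted. The site URL, library title and the `ClientName` column used in the usage note are hypothetical, as are the helper names.

```python
from typing import Dict, List
from urllib.parse import quote


def build_items_request(site_url: str, library_title: str, columns: List[str]) -> Dict[str, object]:
    """Describe a GET request for selected columns of a library's items."""
    # Single quotes inside an OData string literal are escaped by doubling them.
    safe_title = quote(library_title.replace("'", "''"))
    url = (
        f"{site_url.rstrip('/')}/_api/web/lists/getbytitle('{safe_title}')/items"
        f"?$select={','.join(columns)}"
    )
    # Ask for plain JSON without OData metadata.
    headers = {"Accept": "application/json;odata=nometadata"}
    return {"method": "GET", "url": url, "headers": headers}


def rows_from_response(payload: dict, columns: List[str]) -> List[dict]:
    """Keep only the requested columns from each item in a JSON response."""
    return [{c: item.get(c) for c in columns} for item in payload.get("value", [])]
```

For example, `build_items_request("https://contoso.sharepoint.com/sites/sales", "RFP Documents", ["Title", "ClientName"])` describes a request for two columns, and `rows_from_response` reduces the returned items to just those fields before they are handed to a CRM import.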

Applying content templates

Microsoft provides tools to simplify building explanations for both classifiers and extractors, using templates that already encompass many common content formats used in documents. For example, instead of adding all the different date variations that appear in your documents, you can pick the appropriate date template and add it to your explanation.

There’s a long list of templates that includes important financial data types as well as ways to extract international phone numbers and email addresses. For email addresses, you can configure the template to recognize whether an address belongs to a sender or a receiver.
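To see why a ready-made template saves effort, consider what a date template has to cover. The sketch below is an assumption-laden stand-in, not Syntex’s template: it lists a handful of written date forms, treats slash-separated dates as day-first, and relies on English month names in the default locale.

```python
import re
from datetime import date, datetime
from typing import List

# Each entry pairs a regex for one written form with the strptime format
# that parses it. The set of forms is illustrative, not exhaustive.
DATE_TEMPLATE = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),            # 2022-10-12
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%d/%m/%Y"),        # 12/10/2022, assumed day-first
    (re.compile(r"\b\d{1,2} [A-Z][a-z]+ \d{4}\b"), "%d %B %Y"),    # 12 October 2022
    (re.compile(r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b"), "%B %d, %Y"),  # October 12, 2022
]


def find_dates(text: str) -> List[date]:
    """Return every date recognized by the template, in chronological order."""
    found = []
    for pattern, fmt in DATE_TEMPLATE:
        for match in pattern.finditer(text):
            try:
                found.append(datetime.strptime(match.group(), fmt).date())
            except ValueError:
                # Looks like a date but is not valid, such as month 13.
                continue
    return sorted(found)
```

Without a template, each of these variations would have to be added to an explanation by hand.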

Generating documents using Syntex and Power Platform

Syntex’s document models can be used to generate documents as well, using familiar data sources to populate what Microsoft is calling a “modern template.” Existing Word documents can be used to generate a template by uploading them to SharePoint and using Syntex’s Template Studio to add fields to the base document structure. These fields can then be linked to a SharePoint list or library, allowing you to automate document creation.

This isn’t a tool for flexible document creation; it’s more of a modern low-code alternative to mail merge. A modern template managed by Syntex can be the output of a Power Automate flow, taking data from across various business systems and using it to generate common documents, like contracts, invoices and invitations, where the format doesn’t change but the content does.

Once a template has been created and published, Syntex provides a framework for adding new values from a SharePoint list as well as automatically filling in some values where possible.

Alternatively, you can use a Power Automate action to automate the process. For example, you can automatically create a new contract when a new sale is recorded in Dynamics 365 or generate an employee handbook when a new name is added to Azure Active Directory.
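The mail merge comparison can be made concrete with a few lines of Python. This is a minimal sketch of the fixed-format, variable-content idea using the standard library’s `string.Template`; it is not how Template Studio or Power Automate are implemented, and the field names and rows are hypothetical stand-ins for columns in a SharePoint list.

```python
from string import Template
from typing import Dict, List

# The format is fixed; only the $fields change from document to document.
CONTRACT_TEMPLATE = Template(
    "SERVICE CONTRACT\n"
    "Client: $client_name\n"
    "Start date: $start_date\n"
    "Fee: $fee\n"
)


def generate_documents(template: Template, rows: List[Dict[str, str]]) -> List[str]:
    """Fill the template once per row; a row missing a field raises KeyError."""
    return [template.substitute(row) for row in rows]
```

Each row produces one document, so a new entry in the data source is all it takes to produce a new contract. Failing loudly on a missing field is a deliberate choice here, since a contract with a blank client name is worse than no contract.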

The future of Syntex and SharePoint

It’s clear Microsoft has big plans for automating documents in Syntex, though the current preview is still rather limited compared to other machine learning-powered document processing services.

Building on top of SharePoint makes a lot of sense, though. It’s an important enterprise content management system, so making it the home for Syntex’s document processing and creation hub fits well with its existing role and sets an interesting direction for SharePoint’s future.