For search engines and enterprise writing assistance, the top contender is OpenAI, which yesterday announced the latest version of its language model, GPT-4.
GPT-4 is now available on ChatGPT Plus and as an API, for which developers can join a waitlist. The release throws a new weapon into the AI war, in which organizations jostle to provide the best, most flexible writing AI.
- GPT-4 improves reasoning capabilities
- How OpenAI works toward more ‘factual responses’
- What does GPT-4 mean for business leaders?
- Is GPT-4 a great leap forward or an old idea in new clothes?
GPT-4 improves reasoning capabilities
OpenAI demonstrated the new natural language model with a challenge: “Explain the plot of Cinderella in a sentence where each word has to begin with the next letter in the alphabet from A to Z, without repeating any letters.” It’s a neat riddle to show the AI can perform some reasoning along with producing straightforward text, but what does it do in the office?
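The riddle's constraint is mechanical enough to check in code. The following is a minimal sketch, not anything OpenAI published: the function names and the `require_full` flag are hypothetical, and the sample sentence is made up for illustration rather than taken from GPT-4's answer.

```python
import re
import string


def first_letters(sentence: str) -> list:
    """Return the lowercase first letter of each word, ignoring punctuation."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
    return [word[0].lower() for word in words]


def follows_alphabet(sentence: str, require_full: bool = True) -> bool:
    """Check that word N starts with letter N of the alphabet.

    With require_full=True the sentence must use all 26 letters, A to Z.
    With require_full=False a shorter run starting at A is accepted.
    """
    letters = first_letters(sentence)
    if not letters or len(letters) > 26:
        return False
    if require_full and len(letters) != 26:
        return False
    expected = list(string.ascii_lowercase[: len(letters)])
    return letters == expected


# Made-up fragment used only to exercise the checker.
fragment = "A brave Cinderella danced elegantly."
print(follows_alphabet(fragment, require_full=False))
print(follows_alphabet(fragment))
```

A checker like this only confirms the letter pattern; whether the sentence actually summarizes Cinderella still takes a human reader.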
Creative and technical tasks are both on the table for GPT-4, OpenAI said in its announcement. The new model is able to match an individual person’s writing style as well as take directions about voice and tone. Some of GPT-4’s answers in the demonstration are simpler, taking Occam’s Razor to scheduling problems. The new nuance comes in part from its training on a custom-built Microsoft Azure AI supercomputer.
Specifically, GPT-4 is a large multimodal deep learning model, meaning it accepts both image and text inputs and produces text outputs. Note the distinction between the iterations of OpenAI’s product: ChatGPT is the popular chatbot built on GPT-3.5. GPT-3.5, in turn, was trained about a year ago as a first “test run” of the system used to train GPT-4.
The exact difference between the capabilities of GPT-3.5 and GPT-4 can be hard to measure. By OpenAI’s own admission, the difference in casual conversation can be “subtle.” OpenAI tracked GPT-4’s progress by putting both it and GPT-3.5 through a variety of academic tests, such as the AP exams taken at the end of high school courses and the Uniform Bar Exam, and GPT-4 generally scored higher. More details are available in OpenAI’s complete technical report.
How OpenAI works toward more ‘factual responses’
One common criticism of natural language AI models like this one is that their output sounds like human speech but is not grounded in facts; the models don’t check their own accuracy. OpenAI seems aware of this, noting that GPT-4 is “40% more likely to produce factual responses than GPT-3.5 on our internal evaluations.”
GPT-4 is also 82% less likely to “respond to requests for disallowed content.” Disallowed content includes hate speech, obscenity, threats of harm or other non-workplace-appropriate conversation topics the model might have picked up from around the internet text on which it was trained. “High risk government decision-making” and law enforcement decisions are also officially disallowed.
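Both figures read most naturally as relative changes against GPT-3.5, not as absolute rates. The sketch below shows what that would mean in practice. The baseline rates are hypothetical, chosen only to make the arithmetic visible; OpenAI's quoted statements do not give them, and `apply_relative_change` is an illustrative helper.

```python
def apply_relative_change(baseline: float, relative_change: float) -> float:
    """Scale a baseline rate by a relative change (+0.40 means 40% higher)."""
    return baseline * (1.0 + relative_change)


# HYPOTHETICAL baselines for GPT-3.5, not figures from OpenAI.
HYPOTHETICAL_FACTUAL_RATE = 0.50      # share of responses judged factual
HYPOTHETICAL_DISALLOWED_RATE = 0.10   # share of disallowed requests answered

# "40% more likely" and "82% less likely", applied as relative changes.
factual_after = apply_relative_change(HYPOTHETICAL_FACTUAL_RATE, 0.40)
disallowed_after = apply_relative_change(HYPOTHETICAL_DISALLOWED_RATE, -0.82)

print(f"factual: {HYPOTHETICAL_FACTUAL_RATE:.1%} -> {factual_after:.1%}")
print(f"disallowed: {HYPOTHETICAL_DISALLOWED_RATE:.1%} -> {disallowed_after:.1%}")
```

Under these assumed baselines, a relative improvement still leaves a meaningful share of non-factual or disallowed responses, which is why the absolute rates matter as much as the headline percentages.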
To reduce such output, OpenAI drew on feedback submitted by ChatGPT users and also employed AI experts in the fields of safety and security. However, OpenAI notes that the model still has known limitations, including “social biases, hallucinations, and adversarial prompts.”
Interestingly, some of that process involved the AI itself.
“We used GPT-4 to help create training data for model fine-tuning and iterate on classifiers across training, evaluations and monitoring,” OpenAI wrote.
OpenAI is also open-sourcing OpenAI Evals, its framework for evaluating AI model performance, so that anyone can examine its criteria and report problems.
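For readers unfamiliar with evaluation frameworks, the basic idea can be sketched in a few lines: run a model over prompts with known ideal answers and score the matches. This is a generic illustration and does not use or reproduce the OpenAI Evals API; `toy_model`, `SAMPLES` and `exact_match_accuracy` are hypothetical names, and the canned answers stand in for a real model call.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (prompt, ideal answer)


def exact_match_accuracy(model: Callable[[str], str], samples: List[Sample]) -> float:
    """Return the fraction of samples where the model's answer matches the ideal.

    Matching is case-insensitive and ignores surrounding whitespace.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    correct = sum(
        model(prompt).strip().lower() == ideal.strip().lower()
        for prompt, ideal in samples
    )
    return correct / len(samples)


def toy_model(prompt: str) -> str:
    """Stand-in for a real model: returns canned answers for known prompts."""
    canned = {"2 + 2 =": "4", "Capital of France?": "Paris"}
    return canned.get(prompt, "I don't know")


SAMPLES: List[Sample] = [
    ("2 + 2 =", "4"),
    ("Capital of France?", "paris"),
    ("Largest planet?", "Jupiter"),
]

score = exact_match_accuracy(toy_model, SAMPLES)
print(f"accuracy: {score:.2f}")
```

Exact matching is the simplest possible scoring rule; real evaluations of open-ended text typically need looser or model-graded criteria.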
What does GPT-4 mean for business leaders?
Business leaders may want to decide which natural language AI services to allow or encourage their employees to use, and to what extent. So far, OpenAI has solicited feedback from a variety of companies that use its products, such as the language learning app Duolingo, the visual accessibility app Be My Eyes, and the wealth management firm Morgan Stanley.
Microsoft has been running GPT-4 behind the scenes of the Bing search engine for about five weeks.
Is GPT-4 a great leap forward, or an old idea in new clothes?
Google and Microsoft are going head-to-head when it comes to adding AI to search capabilities. Google’s Bard AI will be made available to a test group soon, while Microsoft began building OpenAI’s technology into Bing in February. Anthropic, a generative AI company backed by Google dollars, has also entered the fray with a natural language model called Claude.
With Google leading the pack but also feeling the pressure, it’s hard to say where the AI trend will go next. Does it behoove your business to jump on board or wait? Maybe it depends on what the AI assistant could be used for. Where could it save time, or where could it get in the way? After all, some of these features start to sound like simply a more flexible, more resource-intensive version of Microsoft’s lost and much-memed Clippy.