Between Trainspotting-style adverts encouraging people back into the office and the news that Google won’t bring staff back into the office before July 2021, it’s hard to be sure what the future of work will look like. But for the foreseeable future, it’s certain to include a lot more video meetings.

The new Together mode that Microsoft added to Teams this summer is an attempt to make those video meetings less tiring and more productive, using some simple tricks that factor in the way the human brain works (and some things it’s not good at) and the way we react to other people.

The real problem with video conferencing isn’t the colleague who doesn’t realise they’re on mute, or the person eating noisily, or any of the other bugbears. Even if the technology works perfectly, you still don’t feel as if you’re in the same room as other people. That might make you slightly anxious, make meetings more tiring, or make it harder to concentrate in long meetings; it can also lead to more misunderstandings or less polite behaviour.

What you’re missing is the completely subconscious web of interpersonal cues that you don’t even know you’re looking for, Microsoft researcher Jaron Lanier explains to TechRepublic. Put those back and you get social, spatial and interpersonal awareness that makes people more comfortable.

Through the looking glass

Jaron Lanier, VR pioneer and Interdisciplinary Scientist at Microsoft Research.
Image: Wikipedia

“Your brain wants to know where other people are: so subconsciously it’s scanning around and keeping track of what everybody’s intent seems to be — what their state of attention is; if they’re trying to get your attention, if they’re reacting to you or to someone else and so on,” explains Lanier, a virtual-reality pioneer whose title at Microsoft Research is ‘Interdisciplinary Scientist’.

“The human brain has specialised areas for keeping track of where stuff is in the environment and where you are in the environment — but in particular, where people are in the environment. We evolved to be very good at tracking other humans and assessing very quickly what’s going on with them. It was a crucial survival skill to know if they were hostile.”

Teachers and presenters will be very aware of this, Lanier suggests: “There’s an amazing sensation of being able to keep track of a hundred people in front of you at the same time and know which ones aren’t paying attention. When people are paying attention to each other, you can tell that they’re paying attention to each other.”

The grid layout of most video-conferencing apps makes that impossible, no matter how many little squares there are on-screen. And because the camera is up, down, at the side or anywhere but behind the screen you’re looking at, the brain also can’t work out what other people are paying attention to by noticing where they’re looking.

“When we speak to one another, we’re not just exchanging words, we’re exchanging glances and gestures and subtle changes of head position and subtle eye movement, even changes in skin tone; these are all things that we know measurably are part of communication, although they’re usually subconscious. In order for those things to work, you have to understand your spatial relationship to other people, or you won’t know who they’re reacting to.”

Unlike the gaze correction coming to Teams for Surface Pro X users, which tries to make it look like you’re gazing into your camera even when you’re looking down at your keyboard while typing into a chat window, Together mode doesn’t change your appearance in real time. But it makes it look as if everyone is in the same place, fools your brain into thinking that it knows when people are looking at you, and also makes you want to fit into the environment.

“What you want is a design that makes it hard for the brain to notice that the angles are wrong, but at the same time gives the brain the consistency of physical or virtual space in which to scan people, and have perception,” Lanier explains.

Together mode uses a surprisingly simple bit of what Lanier calls ‘scientific trickery’. Teams cuts you out of the video stream the same way it does to apply a virtual background; but when it drops the cutouts into the group background it also flips them, so you’re seeing yourself and everyone else as if they were in a mirror.

“When you have that geometry, it turns out the brain is poor at estimating where somebody is looking because there are two angles: there’s the initial angle and then the bounce, and because the brain didn’t evolve in an environment filled with mirrors it’s just not good at that.”

As a result, Lanier says, your brain doesn’t notice as much when people aren’t looking at you, so you don’t feel ignored in the same way. “By creating the virtual mirror, we retain the spatial awareness the brain needs for social contact, but we remove the specific person-to-person vector, where the brain can detect errors easily. We’re keeping the part the brain needs, but then throwing out the part that we can’t do with software alone.”
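The cut-out, flip and composite steps described above can be sketched in a few lines of Python. This is only an illustration, not how Teams implements Together mode: the function names, the tiny grayscale "images" and the hand-written person mask are all assumptions, and a real system would get the mask from a segmentation model.

```python
from typing import List, Sequence

Image = List[List[int]]   # one grayscale value stands in for an RGB pixel
Mask = List[List[bool]]   # True where the pixel belongs to the person


def flip_horizontal(rows: Sequence[Sequence]) -> list:
    """Mirror every row left to right, as a mirror would."""
    return [list(reversed(row)) for row in rows]


def composite_mirrored(background: Image, frame: Image, mask: Mask,
                       top: int, left: int) -> Image:
    """Paste the mirrored person cutout onto a copy of the shared background.

    Hypothetical helper: `mask` is supplied by the caller here, whereas a
    real product would compute it with a segmentation model.
    """
    out = [row[:] for row in background]      # leave the original untouched
    flipped_frame = flip_horizontal(frame)
    flipped_mask = flip_horizontal(mask)      # the mask must flip with the frame
    for y, (pixel_row, mask_row) in enumerate(zip(flipped_frame, flipped_mask)):
        for x, (pixel, is_person) in enumerate(zip(pixel_row, mask_row)):
            ty, tx = top + y, left + x
            inside = 0 <= ty < len(out) and 0 <= tx < len(out[ty])
            if is_person and inside:
                out[ty][tx] = pixel           # only person pixels overwrite the scene
    return out


# Toy data: a 2x4 empty scene and a 2x2 webcam frame.
background = [[0] * 4 for _ in range(2)]
frame = [[1, 2], [3, 4]]
mask = [[True, False], [True, True]]
scene = composite_mirrored(background, frame, mask, top=0, left=1)
```

Flipping the mask together with the frame is the important detail: if only the pixels were mirrored, the cutout outline would no longer line up with the person.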

Solving both those problems is why what looks like a cheesy effect in a screenshot is a profoundly different experience when you actually use Together mode.

“You’re no longer in boxes, you’re no longer separated by a barrier. If I point at someone, you can tell who I’m pointing at. It creates a different atmosphere: it creates a sense of a shared place, it creates a sense of shared goals, and a shared stake,” says Lanier.

The virtual environment is far from perfect, but good enough that it changes behaviour, Lanier says: “People notice how they appear in the room, and they start to subconsciously perform in such a way that their responses and their cues are correct and honest for the other people around them. People being aware of how they appear to others in depth strengthens this web of social spatial interpersonal awareness. It’s good cognitively, it’s good emotionally, and it’s good practically.”

People trying Together mode often test whether they can high-five each other, throw around an invisible ball, or point the webcam at their dog so it shows up in the group. “This was a near-universal phenomenon, that people became playful on the first encounter,” Lanier notes, and while he was initially worried that this was bad for the productivity the system was designed to improve, he quickly changed his mind.

“The science is very solid that play is not some sort of arbitrary flaw in human nature, but rather is a strategy honed by evolution by which people get to know one another, get to know their environment, assess the situation, develop comfort with it and develop patterns for cooperation.”

If Together mode is good enough to make people feel playful, that means it’s working.

That feeling of playfulness often leads people to ask for a richer set of backgrounds or the ability to design their own. Lanier is cautious about this because getting backgrounds that work means “walking a tightrope and balancing a lot of different issues to get the cognitive and social perception effects”.

The lecture theatre setup in Together mode works partly because the rows are staggered: change that and it’s less successful — not just because one person is obscuring someone behind them, but because you lose the diagonals that allow open perception between people.
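As a rough illustration of the staggered rows, the sketch below offsets every second row by half a seat so that each person sits between, rather than directly behind, the two people in front. The function name, parameters and unit spacing are assumptions made for this example; the article does not describe the actual layout code.

```python
from typing import List, Tuple

Seat = Tuple[float, float]  # (x, y) position of a seat centre


def staggered_seats(rows: int, seats_per_row: int,
                    seat_width: float = 1.0,
                    row_depth: float = 1.0) -> List[List[Seat]]:
    """Return seat centres, shifting odd-numbered rows by half a seat width."""
    layout = []
    for r in range(rows):
        # Even rows start at x = 0; odd rows are shifted sideways.
        offset = seat_width / 2 if r % 2 else 0.0
        layout.append([(c * seat_width + offset, r * row_depth)
                       for c in range(seats_per_row)])
    return layout


layout = staggered_seats(rows=2, seats_per_row=3)
front_row, back_row = layout
```

With these defaults, every seat in the back row sits half a seat to the side of the one in front, which gives the diagonal sight lines the paragraph describes.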

Cool graphic designs aren’t enough; they have to support the moment-to-moment interactions between people, says Lanier. “You want to avoid positioning between people that would excite the fight-or-flight response; you want to allow people to fine-tune their non-verbal communications to be authentic and correct: all these interactive things that just absolutely have to take precedence over pure static graphic design interpretation.”

If Together mode takes off, people will experiment with different designs and arrangements of people: that will allow rules to emerge for what works and what doesn’t. Some of the ideas take advantage of the way technology creates what Lanier calls ‘a new theatre of experience’. Together mode is already being used for virtual audiences at sporting events. It might also work for crossovers with something like Minecraft.

“You could build up a Roman Colosseum of blocks of Minecraft and have the Minecraft avatars sit there. Then, as long as you have some indication of where somebody else is seated in front of their webcam, you can create a unified audience between the two.”

Truth and responsibility in technology

Too many video meetings really do make you exhausted.
Image: Microsoft

Together mode deliberately doesn’t resize your video to make you fit in the background better or try to unify lighting — or clean up any wrinkles or bags under your eyes. “There’s a danger of being overbearing,” Lanier warns. “I’m critical of many of the social media designs, because I think they make people unnecessarily paranoid and angry and create a kind of a dystopian effect. There’s a balancing act where you need to create technologies that make sense to people, but stop right at the line where you’d be starting to manipulate those people.

“A lot of computer interactions don’t give people the opportunity to have a choice of how they exist within the interaction,” Lanier points out. By not adjusting how your video scales to compensate for how far you’re sitting from the webcam or how wide-angle its lens is, Together mode gives people another way to make an effort to join in and be present, and most people will shift their position or their device to fit in.

“That creates an opportunity for people to cooperate and create social trust that would be absent if we intervene technologically to force them all to be the same size. The mere possibility that you can reach over and tap somebody on the shoulder or find a way to invade their space, but you choose not to, creates an opportunity for signalling social respect and cooperation that isn’t present in a normal grid mode. What I hope is going on is that that slight amount of shared responsibility for the scene does shift the mentality of it, so that people are aware that everybody does really have responsibility to one another, if there’s to be a conversation – because unless everyone takes responsibility for keeping any conversation, in whatever setting, sensible and civil there will not be a conversation.”

Similarly, Lanier hopes that video conferencing systems don’t normalise virtual plastic surgery. “There are algorithms that make people ‘look better’, that can adjust people’s skin tone, even adjust their faces.” Done badly, that might be as disturbing as the Cats movie, but it also imposes expectations of how you are ‘supposed’ to look in a professional conversation: “You’ve given up some control and some power at the point where everybody says ‘okay, somebody behind the curtain can decide how I look’.”

Lanier connects that to what he calls ‘the rising anxiety about deep fakes’ (noting that Microsoft has introduced a detection tool to spot them) and the responsibility of building the ‘theatre of experience’ that technology creates for people. “If you have really vivid technologies that can kind of engross you, you can really profoundly affect people. We don’t have a corresponding structure for how we can act reasonably and ethically, and with fealty to the truth, with our computer; so for the moment, we have to treat it a little bit as an intuitive art — that’s the responsibility of individual designers.”

“I think we’re on this edge where we absolutely have to come up with the culture and the ethics, and the structure and the institutional support, for deep truth rather than deep lies,” says Lanier.

He uses the example of Microsoft Flight Simulator using real-time data that let people fly into hurricanes. “If it’s done with integrity — meaning that it can never be perfect but the people doing it are trying their best to make it as accurate as possible — it’s almost like a deep truth mechanism. It’s saying ‘we’re going to give people access to levels of truth about their world that were previously obscure’. There’s a balance: the clouds are as accurate as they can be at that moment and yet you can’t fly down and snoop in somebody’s window. It doesn’t disadvantage others more than it advantages people. A lot of the people who created the internet were hoping that that would be exactly the effect that would come about — that ordinary people would have access to more truth and more of a sense of being connected and responsible and part of the world, rather than being sort of remote and insignificant and ignorant.”

Designed for this moment

This is your brain in Together mode.
Image: Microsoft

While Together mode isn’t perfect, it’s good enough that Microsoft wanted to release something that was built in just a couple of months because “it’s designed for this moment,” as Lanier puts it.

“It relies on assumptions that only apply in the pandemic. It assumes that each person is in front of one webcam in a different physical location. It currently does not assume that a conference room exists. It currently does not assume that you have whiteboards in your environment that you want to share. It currently doesn’t assume a whole lot of things that were kind of normal behaviours before the pandemic. This current design is being released during the pandemic to make the pandemic a little less miserable.”

Under normal circumstances, Together mode might have stayed in the lab for a couple of years, but initial results in testing were positive enough to make it worth releasing sooner rather than later.

“When we test this, we see results that are reminiscent of the results that we’ve seen in the past only with elaborate and expensive volumetric cameras and displays,” Lanier says.

Measurements of brain activity show it’s helpful. “We see an indication that people who are using this are more relaxed and also more attentive, whereas people who use conventional video tend to take on certain stress levels and become less attentive in the course of a meeting, and particularly in the course of multiple meetings in a day.”

What users say about how they feel matches that: “They’re more relaxed, more attentive in general, and have a better sense of well-being after doing a lot of meetings.”

The way people behave in meetings changes too, Lanier says. “Do people tend to keep their cameras on more when they’re in this? They do. Do they tend to look at others more than themselves? They do, which is amazing. Do they tend to spend less time negotiating who’s talking? They do. Do they remember what was said better? They do. Do they remember who was present in a large meeting better after a few days? They do. We’re seeing measurable improvements in meeting efficacy.”

Lanier notes that this kind of waking activity is easier to measure, and to draw reasonable initial conclusions from, than more complex interactions such as the effect of light colour on sleep habits (where the conclusions changed after the technology industry had already altered operating systems and monitors).

There’s more research to be done on how well Together mode works for neurodivergent users and those with disabilities. “We’re trying very hard to start with diverse populations whenever we do user studies, because there’s been a tragic problem of people inadvertently biassing things like AI algorithms or user interface designs by having an inadequately diverse population. But that’s not good enough because we also need to have specialised tests for different populations to understand what’s really going on. I have active projects working on these types of approaches for the blind and deaf, we have active projects for people with attention deficit disorders and so on. I view this as absolutely crucial.”

That’s another reason to prefer making fewer changes to the way people are presented, Lanier notes.

“It’s incorrect to assume that everybody wants to be staring into each other’s eyes, and that’s not only very dependent on individual cognition and the particular relationships with people. It’s also very culturally dependent. I’m quite concerned that people, if we start gaze correcting too much, will actually prefer some cultures over others without fully realising it, and will prefer people with certain cognitive styles over others without realising it, and will probably start to prefer people in certain socio-economic classes over others in a given society without realising it,” he says.

“One of the small positive effects of this horrible pandemic is that people have been doing so much video communication that they’ve finally just given up on trying to wear nice clothes or whatever. There’s a new acceptance that you don’t have to put a lot of effort into presenting yourself to talk to others and I think that that’s for the good, I think that makes the whole system more inclusive and less scary and less elitist, so I hope that that shift will continue after the pandemic.”