What If Real-Time Really Was Real

Sam McLean
Virtual Reality Pop
8 min read · Jul 18, 2018


Latency usually means something negative. When someone refers to latency, they mean ‘lag’, ‘delay’, ‘wait time’, or just ‘slow’. All bad.

Real-time, on the other hand, has always been a positive — as in good responsiveness. We want our apps to respond instantly. We do not want to wait for our bank ATM. We want the brakes in our car to respond in milliseconds. Real-time response is the key ingredient for any technology that has to interact with the real world.

Imaging and depth measurement are critical parts of next-generation real-time systems. What are the objects in front of me (whether ‘me’ is a car, a robot or a human), how far away are they, and how fast are they moving?

However, for the most part, imaging and depth measurement systems have been characterized by latency, not real-time responsiveness. Achieving real-time imaging and depth has always come with a sacrifice — reducing the field of view to a narrow window, radically reducing the resolution, simplifying the system to a single sensor pointing in one direction at a time, or spinning sensors around on gimbals.

This in turn has fundamentally limited the functionality of the systems. Robots that can only see in one direction, cars with massive computers and sensors that only work in fair weather, drones that can see, but not measure reliably…

But what if we remove that limitation?

Essentially the question we’re asking at Suometry is: What happens when you have real-time stereoscopic 360 degree capture? That is, when you get rid of the latency, the overhead, from the camera in applications that demand millisecond response.

Let’s consider the latency continuum and the applications that open up as we move into the real-time zone. All of the killer apps concerned with this zone have one thing in common: immediate interaction and response.

The Latency Continuum by Suometry

Telepresence

Considered a silver bullet of sorts when discussing the use cases of cameras and immediate interaction, telepresence is now a viable solution that allows a true sense of ‘being there’. For lack of a better term, it is akin to digital teleportation.

Given the immersive qualities of stereoscopic 360 degree video and the advancements in real-time volumetric capture and mixed reality headsets, it’s closer than ever to true functionality. Its official definition states:

The use of virtual reality technology, especially for remote control of machinery or for apparent participation in distant events.

  • a sensation of being elsewhere, created by the use of virtual reality technology.

Fundamentally, real-time stereoscopic vision is essential for delivering this latent technology in its full form: the remote conference meeting where two people in totally different locations stand in the same room, or on the same street, or in the same lab.

But how does it work, and what is needed? Well, first of all, the one thing that can’t get in the way is the camera.

Telepresence involves an interconnected pipeline of processes that all require real-time responsiveness, from the broadband speed to the processing speed of the headsets or screens to the RTC protocols being used. It needs to operate as fast as human conversation demands.
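To make that budget concrete, here is a minimal sketch that sums per-stage latencies against a rough conversational target. The stage names and millisecond figures are illustrative placeholders, not measurements from any real pipeline.

```python
# Illustrative latency budget for a telepresence pipeline.
# Stage names and millisecond figures are hypothetical placeholders,
# not measurements from the article or from any real system.

CONVERSATIONAL_BUDGET_MS = 200  # rough one-way target for natural conversation

stage_latency_ms = {
    "capture_and_stitch": 20,
    "depth_estimation": 30,
    "encode": 15,
    "network_transit": 60,
    "decode": 15,
    "render_to_headset": 20,
}

total = sum(stage_latency_ms.values())
print(f"total pipeline latency: {total} ms (budget {CONVERSATIONAL_BUDGET_MS} ms)")

if total > CONVERSATIONAL_BUDGET_MS:
    worst = max(stage_latency_ms, key=stage_latency_ms.get)
    print(f"over budget; largest contributor is '{worst}' at {stage_latency_ms[worst]} ms")
else:
    print("within budget for conversational telepresence")
```

The point of the exercise is that every stage, including the camera, spends from the same budget; if capture alone eats most of it, nothing downstream can win it back.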

The real-time camera can contribute in two ways.

  1. Firstly, by capturing the environment that the subjects’ re-projections inhabit in real-time stereoscopic depth. That is, the space in which the conference or meeting will physically take place. Having this information available as a 3D depth map, retrievable in milliseconds, allows a semantic understanding of the space and its objects (see the sketch after this list).
  2. Secondly, by capturing the subjects themselves (the remote people attending the meeting) as stereoscopic point clouds for projecting back into the space.
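As a rough illustration of the first contribution, the sketch below back-projects an equirectangular depth map into a 3D point cloud. The angular convention (columns as longitude, rows as latitude) and the toy values are assumptions made for the example, not a description of Suometry’s actual pipeline.

```python
import numpy as np

def equirect_depth_to_points(depth: np.ndarray) -> np.ndarray:
    """Back-project an equirectangular depth map (H x W, metres) into an
    (H*W, 3) point cloud centred on the camera.

    Assumes columns span longitude [-pi, pi) and rows span latitude
    [+pi/2, -pi/2] -- a common equirectangular convention, chosen here
    purely for illustration.
    """
    h, w = depth.shape
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi   # azimuth per column
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi   # elevation per row
    lon, lat = np.meshgrid(lon, lat)

    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy usage: a 4 x 8 depth map where every direction is 2 m away.
points = equirect_depth_to_points(np.full((4, 8), 2.0))
print(points.shape)  # (32, 3)
```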

Ultimately, telepresence will test the entire ecosystem of real-time capture capability: capturing and reconstructing, avatar generation and re-projection, bandwidth and delivery.

Robotic/Machine Vision

Navigation, object recognition and learning are essential for machine vision. Stereoscopic cameras, mostly with a limited field of view, have been used to implement robotic detection systems so far, but a full stereoscopic 360 real-time capture solution would help solve some of the persistent challenges facing the robotics market.

Collision avoidance is still in its infancy. Its fundamental requirement is real-time response. Currently, a combination of lag in embedded computer vision, limited-FOV cameras and sluggish directional movement on the hardware side has slowed progress. Omni-directional movement is where the industry is heading, allowing robotic systems to maneuver smoothly whilst adapting to cluttered and dynamic environments.

It’s common sense that if robots are to respond properly to situations in which humans are potentially at risk, millisecond response has to be present. Stereoscopic 360 vision is essential for enabling this in real time, and latency cannot get in the way. The same principle applies to next-generation ADAS systems.
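To make that concrete, here is a toy sketch of a surround collision check: given a ring of depth samples covering the full 360 degrees, it flags the angular sectors whose nearest obstacle breaches a safety distance. The sector count and threshold are arbitrary assumptions, not values from any shipping system.

```python
import numpy as np

def unsafe_sectors(depth_ring: np.ndarray, n_sectors: int = 8,
                   safety_m: float = 0.5) -> list[int]:
    """Given a 1-D ring of depth samples covering the full 360 degrees,
    return the indices of angular sectors whose nearest obstacle is
    closer than `safety_m` metres. Purely illustrative."""
    sectors = np.array_split(depth_ring, n_sectors)
    return [i for i, sector in enumerate(sectors) if sector.min() < safety_m]

# Toy usage: one depth sample per degree, with an obstacle 0.3 m away
# at roughly 85-95 degrees.
ring = np.full(360, 5.0)
ring[85:95] = 0.3
print(unsafe_sectors(ring))  # [1, 2] -> the sectors covering ~45-135 degrees
```

With only a forward-facing camera, that obstacle off to the side would never appear in the depth data in the first place.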

To hammer this point home, let’s compare the depth map from a standard front-facing stereo camera to that of a full equirectangular point cloud.

Limited FOV depth mapping
Equirectangular stereoscopic 360 depth map

It’s easy to see the vast improvement in awareness in all directions. A crucial aspect here is the ability to deliver wide, encompassing stereo ‘sweet spots’.
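For context on where those depth maps come from, a conventional front-facing stereo rig recovers depth from pixel disparity via the textbook relation depth = focal length × baseline / disparity; the 360 case applies the same idea across the full surround. The sketch below uses placeholder numbers purely for illustration.

```python
def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Textbook pinhole stereo relation: depth = f * B / d.
    The numbers used below are placeholders, not figures from the article."""
    return focal_px * baseline_m / disparity_px

# Toy usage: 700 px focal length, 12 cm baseline, 20 px disparity -> 4.2 m.
print(disparity_to_depth(20.0, 700.0, 0.12))
```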

Real-time remote control is another crucial and emerging application. The concept is simple: the user takes the perspective of the machine and assumes control of its movement through instant feedback. The important thing for the user, beyond millisecond response, is the ability to gauge the depth of the environment they’re viewing — made possible through real-time stereoscopic 360 capture.

James Cameron wants his robot back [source]

The potential of real-time abilities has been the subject of many research programs and papers for years, and is now even spawning companies dedicated to solving the problem. Realtime Robotics state in their mission that machines ‘remain limited by their inability to instantly generate collision-free motion paths in response to fast-changing situations’ — a limitation that could eventually become moot, but only if real-time 360 degree imaging is available.

AR Environment Capture

Recently, at AWE 2018, the main talking point was the construction of the AR cloud: a shared map of the world, needed if widespread augmented reality applications are to meet their potential.

The augmented content embedded inside the virtual map needs to be persistent and accessible cross-platform, and there has to be a built-in semantic understanding for the objects to react to the environment naturally and convincingly. This will allow for a ubiquitous architecture that can be built upon over time.

The AR Cloud — building a virtual map of the world by real-time depth mapping [6d.ai][source]

The objects themselves need to be retrieved and interacted with on a millisecond basis. In order to have the objects navigate the environment, the ‘mesh’ of the space needs to be understood in real-time.

The intuitive way to do this is to use a stereoscopic 360 degree camera capable of delivering instantaneous point cloud data: in effect, a real-time database of the space. The need for an ecosystem is apparent.
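One way to picture that real-time database is a simple voxel hash over incoming point cloud data, so that the points near any query location can be fetched by touching only a handful of buckets. This is a minimal sketch under that assumption, not a description of any particular AR cloud implementation.

```python
from collections import defaultdict
from typing import Iterable

Point = tuple[float, float, float]

class VoxelIndex:
    """Toy spatial hash: buckets 3D points into fixed-size voxels so that
    neighbourhood queries touch only a few buckets. Illustrative only."""

    def __init__(self, voxel_size_m: float = 0.25):
        self.size = voxel_size_m
        self.cells: dict[tuple[int, int, int], list[Point]] = defaultdict(list)

    def _key(self, p: Point) -> tuple[int, int, int]:
        return tuple(int(c // self.size) for c in p)

    def insert(self, points: Iterable[Point]) -> None:
        for p in points:
            self.cells[self._key(p)].append(p)

    def query(self, centre: Point) -> list[Point]:
        """Return all points in the voxel containing `centre` and its
        26 neighbours -- a crude stand-in for a radius query."""
        cx, cy, cz = self._key(centre)
        out: list[Point] = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    out.extend(self.cells.get((cx + dx, cy + dy, cz + dz), []))
        return out

# Toy usage: index a few points, then ask what sits near the origin.
idx = VoxelIndex()
idx.insert([(0.1, 0.0, 0.2), (0.3, 0.1, 0.1), (5.0, 5.0, 5.0)])
print(idx.query((0.0, 0.0, 0.0)))  # the two nearby points
```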

The hurdles of processing the data and connecting the environmental understanding with the objects themselves, as well as occlusion meshing and cross-platform accessibility, are fundamental challenges that the AR industry is tackling head on.

Ultimately, the starting point calls for a camera solution capable of capturing 3D spaces in real-time, as opposed to simply relying on the limited capabilities of smartphone cameras. From there the depth information would act as a foundation for both software and coordinate-mapping technologies to build on.

True First Person POV Experience

Having a fully realized virtual reality pipeline in place (room-scale movement, six degrees of freedom, spatial audio, exceptional resolution, ‘low’ latency) allows users to experience immersion. It’s a tool for transmitting experience.

But what if we removed the latency completely?

The key traits of an immersive experience include intimacy, presence and interaction — all of which are only possible with stereoscopic capture.

Intel’s True VR program is one of the most ambitious immersive sports plays, providing the viewer with the ability to take any perspective on the field. It has been labelled the bumblebee approach, allowing viewpoints literally from the POV of the quarterback. It’s an intriguing proposition. Imagine this in real time: it could fundamentally disrupt the way people experience live sports.

Similarly, Valve experimented with a VR spectator mode in the eSports realm; 4.5 million DOTA 2 spectators were given the ability to jump into VR and experience the game as if they were on the battlefield. This was in 2016, and it was touted as the beginning of real-time social VR.

Once the challenges of streaming, encoding and wrangling the vast amounts of data needed for real-time, high-quality VR pipelines are solved, the way in which we experience all forms of streaming entertainment will change. Strange Days, a largely overlooked nineties cyberpunk film, explored the concept of using VR to look out of the eyes of a stranger — someday this might become a normal way of passing time.

What Does It All Mean?

Stereoscopic 360 degree vision, whilst being a fundamental starting point for many of the applications explored above, is only one piece of a bigger picture.

Real-time response is essential. I don’t know about you, but I wouldn’t want to be the person killed by a robot or a driverless vehicle simply because of latency. Latency is bad.

It’s clear that instant communication, interaction and collaboration are crucial aspects of modern technology. In order to foster an age of innovation across a variety of sectors, the cogs in the pipeline all need to be responsive.

The latency of the camera, the computer vision, cannot get in the way.
