Teleoperation & Real-Time Stereoscopic 360° Vision

Sam McLean
Virtual Reality Pop
8 min read · Oct 18, 2018


Operating a robot in a remote desert at 110 degrees. Parking a truck in a crowded lot from 1,000 miles away. Repairing a server in a data center across the globe. Teleoperation should mean we can remotely see through the eyes and manipulate the limbs of our robots without limitation. However, until vision technology catches up with the complexity and speed of the challenge, it’s still just a pretty concept.

Model H of Telexistence®

The Power Of Teleoperation

Teleoperation is the remote electronic control of a machine. It can include vehicles, rovers and manufacturing instruments, but it’s seeing increasing traction for deployment in the burgeoning markets of robotics and IoT innovation. With the inclusion of FPV (first person view) capability, it’s expected to play a profound role in the future of technological automation and control.

The concept behind FPV capability is straightforward. The operator straps into a virtual reality headset and can assume control of a machine, seeing out of the ‘eyes’ and manipulating the ‘limbs’. The advantages have implications in numerous verticals, be it military training and deployment, industrial mining and inspection or repetitive labour performance. In lieu of sending humans into hostile, inaccessible or dangerous environments we can now send robots, while the operator remains at a safe distance. Fundamentally, FPV capability allows the deployment of a machine in circumstances outside of the range of human line-of-sight.

With these improvements, teleoperation stands to save countless operating dollars and prevent the loss of human life in hazardous working conditions.

Teleoperation in Robotics [source]

The Ingredients of Successful Teleoperation

An operator disarming an explosive needs to see the environment the robot inhabits, judge the distance between the robotic limbs and the explosive, and maintain real-time awareness in a time-critical situation. They need the machine to react as they do.

Sight, depth and real-time response are fundamental. We judge depth, catch events in our peripheral vision as they happen and react to fast-changing, dynamic situations; teleoperation can be no different. Delivering these capabilities will move the market from an interesting concept to a fully realized advancement.

The ideal model would mate current systems with instruments capable of delivering these attributes. But teleoperation is still hampered by missing capabilities, and until these limitations are solved the market will remain suppressed, capable of only a portion of its true potential.

Teleoperation in the military [source]

The Current Limitations of Teleoperation

1. Limited FOV (Field of View)

Many current teleoperated robot models, such as the recently announced Model H by Telexistence, implement stereoscopic computer vision with a front-facing two-camera arrangement. Functionality is limited to that field of view, which makes object persistence and spatial awareness problematic.

Depending on the environment, teleoperation demands a semantic understanding of the surroundings. Only a 360-degree, depth-based view gives the user a sense of peripheral vision and the ability to react to situations on the fly.

Drones are an obvious case study given that they need a wide field of view. Inspection and scouting, especially considering the safety risks of industrial environments, are huge verticals for drone deployment. Mine shafts and nuclear power stations are two obvious examples.

A 2017 paper on teleoperated drones works around the lack of native 360-degree capture with a system that ‘uses simultaneous localization and mapping (SLAM) based reconstruction with an omnidirectional camera’.

This method builds a point-cloud of the environment to aid in navigation, but the paper expresses concern over the user not having the right spatial awareness from the computer vision and concedes that ‘real world factors like reconstruction noise and latency will likely influence results, and not having a preview of the course beforehand can change the operator’s behaviour’.
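To make the idea concrete: omnidirectional SLAM treats each pixel of the 360-degree image as a ray on the unit sphere, and landmarks triangulated along those rays become the point-cloud. Below is a minimal sketch of that pixel-to-ray mapping, assuming an equirectangular image layout (an assumption for illustration, not a detail taken from the paper).

```python
import numpy as np

def equirect_pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit ray direction.

    Longitude spans [-pi, pi] across the image width;
    latitude spans [pi/2, -pi/2] down the image height.
    """
    lon = (u / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - v / height) * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.array([x, y, z])

# A landmark observed at pixel (2048, 512) of a 4096x2048 frame and
# triangulated at a depth of 3.5 m becomes one point in the SLAM map:
ray = equirect_pixel_to_ray(2048, 512, 4096, 2048)
point = 3.5 * ray  # camera-frame point (camera at the origin)
print(point)
```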

2. Latency & Real-Time Response

The academic literature suggests that traditional latency problems still persist in teleoperation pipelines.

A 2016 paper on stereoscopic teleoperation of a humanoid robot, exploring panoramic view reconstruction, highlights this latency challenge. It suggests that the ‘problem is compounded since a remote user must first send a movement command to the robot via a network, the robot must then execute the movement, and the resulting stereoscopic camera images must then be captured and sent back to the user before being rendered on the immersive display’.

The resulting delay ‘can be well over 500 milliseconds, unacceptable for real-time viewing…even in an optimized situation where all machines are on the same local network’. And this is for a stereoscopic camera pair with a field of view of less than 180-degrees.
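The 500-millisecond figure is easy to believe once the round trip is itemized. Below is a back-of-the-envelope budget; the individual stage timings are illustrative assumptions, not measurements from the paper.

```python
# Illustrative round-trip latency budget for a teleoperation loop.
# Every stage timing here is an assumption, chosen only to show
# how quickly the individual delays add up past the 500 ms mark.
stages_ms = {
    "command uplink (network)":       40,
    "robot motion execution":        150,
    "stereo capture + encode":       120,
    "video downlink (network)":       80,
    "decode + render in the HMD":    120,
}

for stage, ms in stages_ms.items():
    print(f"{stage:30s} {ms:4d} ms")
print(f"{'total round trip':30s} {sum(stages_ms.values()):4d} ms")  # 510 ms
```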

However, this method attempts to account for the lack of a real-time stereoscopic 360-degree camera; ‘much like stereoscopic 3D videos, saved reconstructions can also provide training data for future teleoperators and robot/machine learning algorithms…and can be navigated independently of time delay’. Although this gives the operator a navigable view of the environment, it is a recorded one, making it impossible to react to fast-changing and dynamic environments.

The paper also notes a common shortfall: ‘360-degree cameras are not always present on humanoid robots’.

3. Depth Perception

Arming robots with depth sensors alongside standard high-definition monoscopic cameras is an approach coming out of Brown University’s research lab.

The system ‘uses the robot’s sensors to create a point-cloud model of the robot itself and its surroundings, which is transmitted to a remote computer connected to the HTC Vive’. The user then explores this virtual space and has access to the feeds from the arm-mounted cameras — both standard HD video feeds, but there is no 360-degree view present.

Brown University — Teleoperation
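Streaming a live point-cloud to a remote machine is bandwidth-hungry, so a pipeline like this would typically decimate the cloud before transmission. Here is a minimal voxel-grid down-sampling sketch; the decimation step and voxel size are assumptions for illustration, not details of Brown’s system.

```python
import numpy as np

def voxel_downsample(points, voxel_size=0.05):
    """Keep one representative point (the centroid) per occupied voxel.

    points: (N, 3) array of XYZ samples from the robot's sensors.
    """
    keys = np.floor(points / voxel_size).astype(np.int64)
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against NumPy version shape quirks
    sums = np.zeros((len(uniq), 3))
    counts = np.zeros(len(uniq))
    np.add.at(sums, inverse, points)   # accumulate points per voxel
    np.add.at(counts, inverse, 1)
    return sums / counts[:, None]

cloud = np.random.rand(100_000, 3) * 5.0       # fake 5 m scene
small = voxel_downsample(cloud, voxel_size=0.10)
print(len(cloud), "->", len(small), "points to transmit")
```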

Similarly, in an application where VR was used to teleoperate a manufacturing robot, the lack of stereoscopic depth was addressed by relying on human inference; ‘instead of extracting 2D information from each camera and building out a full 3D model that can be re-displayed…the human brain does the rest, filling in the 3D information’. The accompanying video makes it clear that depth perception is non-existent and the user is left to judge the distance of objects manually.

Human Inference to estimate the visual depth

4. Processing Power: To 360 or Not To 360

All of the above limitations have one thing in common: processing time and computational power. A real-time response can only be achieved with an efficient rendering pipeline.

Reducing latency comes down to bit-rate, encoding and stream size. The problem is exacerbated by 360-degree video: the considerable increase in pixels requires far more processing than a standard visual frame, and adding stereoscopic depth demands yet more rendering for image reconstruction.

‘Currently, we are working to improve the wireless network experience in terms of achieving a reasonable latency while maintaining a high-resolution 360-degree video for VR’, the conclusion of a 2018 paper on teleoperation for virtual tours states.

The paper’s figures indicate the use of HTTP servers to feed the video into the headset, which for 360-degree video can be expected to introduce a delay of 5 to 20 seconds on average.

Pipeline for Teleoperation using a 360 Monoscopic Camera [source]

An example of real-time stereoscopic vision being implemented is the Robominer®, a teleoperated robot for mine safety and exploration. Developed by Enaex, the robot is designed for excavation and labour, and is ‘armed with the ability to capture mining environments in real time through the use of 3D vision’. But once again, this is with a limited field of view.

Robominer by Enaex

Consequently, the usage of 360-degree cameras in teleoperation remains limited. There is always a trade-off. For a better field of view, there is a lack of real-time response and depth. For real-time response with better depth perception, there is a narrower field of view. The teleoperation market seems stuck between these two options.

So What Is The Solution?

Although there is still no all-in-one solution, as smarter machines hit the market the demand for improved sight is only going to increase. Real-time depth-enabled stereoscopic 360-degree vision is essential. It must have embedded capability and be light computationally.

On the processing side, methods such as tiled streaming and FOVAS are attempting to bridge the gap between real-time response and stereoscopic vision. A 30-frame-per-second 4K video can consume anywhere from 1 to 10 GB per minute while recording, and the 360-degree factor compounds this: each doubling of resolution quadruples the pixel count, so the 4K equivalent of a 10 GB file in 2K isn’t 20 GB, it’s 40.
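The scaling is simple to verify with raw, pre-compression figures; real encoders change the constants but not the proportions. The bytes-per-pixel and frame-rate values below are assumptions made for the arithmetic.

```python
# Raw (uncompressed) throughput scaling for 360-degree video.
# 3 bytes/pixel (24-bit RGB) and 30 fps are assumptions for the arithmetic.
BYTES_PER_PIXEL, FPS = 3, 30

def raw_rate_gb_per_min(width, height, stereo=False):
    pixels = width * height * (2 if stereo else 1)
    return pixels * BYTES_PER_PIXEL * FPS * 60 / 1e9

for label, (w, h) in {"2K": (2048, 1024), "4K": (4096, 2048)}.items():
    mono_rate = raw_rate_gb_per_min(w, h)
    stereo_rate = raw_rate_gb_per_min(w, h, stereo=True)
    print(f"{label}: {mono_rate:6.1f} GB/min mono, {stereo_rate:6.1f} GB/min stereo")
# 4K has 4x the pixels of 2K, so its raw rate is 4x -- and stereo doubles it again.
```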

On the vision side, for industrial labour and warehouse applications, MIT’s CSAIL lab has been developing Dense Object Nets (DON) to teach teleoperated machines how to ‘see’: the robot learns to look at objects as a series of points making up a coordinate system, then ‘stitches’ these points together into an object’s 3D shape. But this takes away the human component.
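The essence of a dense-descriptor approach is that every pixel gets a vector, and corresponding points on an object map to nearby vectors across views. The toy nearest-descriptor lookup below illustrates that matching step only; it assumes the descriptor maps already exist and is not MIT’s implementation.

```python
import numpy as np

def find_correspondence(desc_a, desc_b, u, v):
    """Find the pixel in image B matching pixel (u, v) of image A.

    desc_a, desc_b: (H, W, D) dense descriptor maps, one D-vector per pixel.
    Returns the (u, v) in B whose descriptor is nearest in L2 distance.
    """
    target = desc_a[v, u]                          # descriptor of the query pixel
    dists = np.linalg.norm(desc_b - target, axis=2)
    vb, ub = np.unravel_index(np.argmin(dists), dists.shape)
    return ub, vb

# Toy 64x64 descriptor maps with 3-D descriptors:
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64, 3))
B = rng.standard_normal((64, 64, 3))
print(find_correspondence(A, B, u=10, v=20))
```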

Furthermore, merging multiple stereoscopic camera pairs is a consideration, and a cornerstone of cinematic virtual-reality cameras, with the Google Jump ‘capable of processing hours of footage each day’. But that rendering and reconstruction time makes instant viewing virtually impossible.

One approach would be to combine a stereoscopic camera pair with a strategically placed array of depth sensors on the machine. Although the operator’s field of view would be limited, the depth information would, in essence, fill in the rest of the peripheral vision. With the price of 180-degree stereoscopic cameras expected to fall sharply, a solution like this could be deployed on standard inspection rovers.
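In such a hybrid rig, each peripheral depth frame can be back-projected into 3D to pad out the scene around the stereo pair. Below is a standard pinhole back-projection sketch; the sensor resolution and intrinsics are made-up values for illustration.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (metres) into camera-frame XYZ points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Hypothetical 320x240 peripheral depth sensor looking at a flat wall 2 m away:
depth = np.full((240, 320), 2.0)
pts = depth_to_points(depth, fx=300.0, fy=300.0, cx=160.0, cy=120.0)
print(pts.shape)  # (76800, 3) points to fill in the periphery
```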

But a full top-of-the-market teleoperation pipeline, one that arms a humanoid robot with full surrogacy, demands a system that delivers the full 360-degree field of view while inferring stereoscopic depth in real time. Something that ties together the reactions of the human operator and the machine without compromise.

An obvious candidate would be a camera-system that merges these capabilities, has embedded support and can operate on a mobile GPU.

The Suometry OmniPolar™ Camera

The Path Forward

Once true teleoperation is possible, we could see a shift in the way labour and work are performed. Notably, work-from-home policies and practices could reach their apogee, with an employee or operator strapping into a VR headset at home and operating a drone in their pyjamas.

With the constant developments in the remote operation of rovers in space, notably those on the Moon and Mars, and given the intent to eventually send humans to the red planet, we could see a new ideology emerge. With teleoperation, it could be better, faster and cheaper to send humanoid machines than to risk human life.

It’s only a matter of time until it becomes a truly realized application and an integral part of manufacturing, industrial, medical and technology pipelines moving forward. Eventually, we may be able to ‘be there’, without being there.

Interact Centaur Rover [ESA]
