Object Recognition in Augmented Reality

Anastasiia Bobeshko
Virtual Reality Pop
Apr 7, 2017

When you think of augmented reality, one of the key elements to consider is object recognition technology, also known as object detection. The term refers to the ability to identify the form and shape of objects captured by the device’s camera, as well as their position in space. Augmented reality enhances the view of the real world with computer-generated (CG) overlays such as graphics, text, video, or sound, and across all AR applications, object recognition is particularly important. Most of these apps are marker-based, which means they use special images, pictures, or objects to trigger pre-defined 3D visualizations, animations, videos, or soundtracks. In other words, they use object detection and tracking to determine what relevant information should be added to the real world.

In my opinion, electronics is one of the areas where augmented reality and object recognition technology could most benefit the tech community. AR could be useful across all processes, from prototyping and construction to maintenance and operations. Just imagine: you aim your smartphone’s camera at a complex electronic device (let’s say a reactor control unit) and, lo and behold, the screen shows the position of every button along with detailed information about what each one controls. It may not seem that amazing, but believe me, it is.

I wanted to know which tool is best for AR tracking and object recognition, but I did not find many comprehensive opinions. So I decided to conduct my own research and write an article that answers the question. Today, I explore the two most popular approaches to recognizing the control buttons of an electronic device, along with their state and their position relative to other objects, compare them, and recommend which one to choose. As the test device, I used a tester of linear integrated circuits that I found at a flea market.

Vuforia

Let’s start with the obvious choice for novice and expert developers alike. Vuforia is a standalone library that allows applications to recognize images, boxes, cylinders, text, and arbitrary objects in the environment. It is extremely fast, robust, and easy to use. Moreover, it is tightly integrated with Unity, a leading development engine, since Unity and Vuforia have a solid partnership and a shared R&D lab. Together, they provide powerful tools for creating impressive augmented reality experiences.

To understand why Vuforia is one of the first tools to consider for object recognition, let me briefly explain how it works. Typically, Vuforia matches images captured by the camera against a pre-defined reference image. Both images are just byte arrays, so searching for similar elements by comparing raw pixels would be cumbersome and would break down as soon as the viewing angle or lighting changes. One of Vuforia’s most prominent advantages is that it instead analyzes both images for specific feature points.

Feature points are the unique elements of an image. They are typically high-contrast spots, curves, or edges that do not change significantly when you look at them from different angles.
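To make “high-contrast spot” concrete, here is a minimal sketch of the Harris corner detector, one classic way to score such points. It is not Vuforia’s detector, which is proprietary. The sketch assumes grayscale images stored as lists of rows, and the function names are hypothetical. The detector builds a small gradient matrix around each pixel. If brightness changes strongly in two directions at once, as at a corner, the score is high. If it changes in only one direction, as along a straight edge, the score stays low, and flat areas score zero.

```python
# Toy Harris corner detector: illustrates what a "high-contrast,
# viewpoint-stable" feature point looks like numerically. NOT Vuforia's
# (proprietary) detector. Images are lists of rows of grayscale values;
# all names here are hypothetical.

def harris_response(img, k=0.04):
    """Harris response R = det(M) - k * trace(M)**2 for every pixel."""
    h, w = len(img), len(img[0])
    # Central-difference brightness gradients; border pixels stay at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Structure tensor M summed over a 3x3 window.
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            trace = sxx + syy
            resp[y][x] = (sxx * syy - sxy * sxy) - k * trace * trace
    return resp


def feature_points(img, rel_threshold=0.5, k=0.04):
    """(x, y) pixels whose response is a local maximum above
    rel_threshold * (strongest response in the image)."""
    resp = harris_response(img, k)
    strongest = max(max(row) for row in resp)
    if strongest <= 0:
        return []  # flat or edge-only image: no corner-like points
    points = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            r = resp[y][x]
            neighbours = (resp[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            if r > rel_threshold * strongest and all(r >= n for n in neighbours):
                points.append((x, y))
    return points
```

Consider a bright 4×4 square (pixels 4 to 7 on both axes) on a 12×12 dark background. For that image, `feature_points` returns only the square’s four corners, `[(4, 4), (7, 4), (4, 7), (7, 7)]`. Pixels along the straight edges score lower than the corners, and a uniform image yields no points at all.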

Vuforia processes a reference image only once to find these feature points. If the image does not have enough feature points, it will likely not be detected well. A good reference image should therefore contain plenty of feature points, which serve as anchors for recognition. Vuforia remembers the relative positions of all the feature points and combines them into something resembling a constellation. It then runs the same feature extraction algorithm on every camera frame and tries to match the two sets of feature points. If the majority of the reference feature points are found in a camera frame, the image is recognized as the marker. By comparing the relative positions of the reference “constellation” with the recognized one, Vuforia determines the marker’s orientation in the physical world.
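The “constellation” matching step can be illustrated with a small sketch. This is not Vuforia’s actual matcher. The sketch makes three assumptions:

- Point correspondences are already known. A real system would establish them by comparing feature descriptors.
- The marker moves only by a 2D similarity transform (rotation, scale, and shift). A real tracker estimates a full homography or 3D pose.
- A marker counts as recognized when more than half of the reference points agree with the best transform.

Each pair of correspondences is enough to define a candidate transform exactly. The sketch tries every pair, a brute-force cousin of RANSAC, and keeps the candidate with the most inliers. Points are treated as complex numbers, so the rotation angle falls out as the phase of the multiplier. The function name is hypothetical.

```python
import cmath
import math
from itertools import combinations


def match_constellation(ref, found, tol=1.0, min_ratio=0.5):
    """Compare a reference constellation with points found in a frame.

    ref[i] and found[i] are assumed to be the same feature (correspondences
    already established). Returns (recognized, angle_deg, scale, inliers).
    """
    ref_z = [complex(x, y) for x, y in ref]
    cam_z = [complex(x, y) for x, y in found]
    best = (0, 1 + 0j)  # (inlier count, multiplier a)
    for i, j in combinations(range(len(ref_z)), 2):
        dz = ref_z[j] - ref_z[i]
        if abs(dz) < 1e-9:
            continue  # coincident reference points define no transform
        # Two correspondences fix w = a*z + b exactly: |a| is the scale,
        # arg(a) the rotation, and b the translation.
        a = (cam_z[j] - cam_z[i]) / dz
        b = cam_z[i] - a * ref_z[i]
        inliers = sum(1 for z, w in zip(ref_z, cam_z) if abs(a * z + b - w) <= tol)
        if inliers > best[0]:
            best = (inliers, a)
    inliers, a = best
    recognized = inliers > min_ratio * len(ref_z)  # "the majority" of points
    return recognized, math.degrees(cmath.phase(a)), abs(a), inliers
```

Suppose six reference points are rotated by 30°, scaled by 1.5, and shifted, and one of them is displaced further, as if a knob had been turned. The function still reports a recognized marker, an orientation of 30°, a scale of 1.5, and five inliers. If enough points move, the inlier count drops below half and recognition fails.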

Vuforia can also recognize 3D objects. To be recognized, real objects must be opaque and solid and contain no (or very few) moving parts. Moreover, their coloring should be high-contrast (black and white works best), or at least include contrast-based features. Objects that have holes or are easily deformed will not be recognized by Vuforia.

Objects are scanned with Vuforia’s Object Scanner app. Scanning a volumetric object works on a principle similar to scanning a flat one. To ensure reliable recognition, you should scan the object from all sides. Vuforia Object Scanner searches for high-contrast feature points and, if it finds enough of them, the object will be recognized quickly and reliably.

Vuforia Advantages

1. Apart from being easy to learn and even easier to use, Vuforia is well integrated with Unity.

2. It recognizes flat, convex, and volumetric objects equally well.

3. Markers that trigger augmented reality content can be simple as well as complex.

4. Vuforia simplifies and speeds up the development process.

Disadvantages of Vuforia

However, Vuforia is not all roses: it has many limitations. Most of them concern either the flat images used as markers or the recognition of physical objects.

For flat markers:

1. To be recognized, flat markers need high-contrast coloring, preferably with many small details that serve as unique anchor points.

2. For reliable recognition, markers should be matte to avoid glare.

3. Markers that contain striped elements, or consist entirely of stripes, will not be recognized in any case, because repeating patterns produce feature points that look alike.

4. If another object covers part of the marker, the marker might not be recognized.

For physical objects:

1. The chosen object has to be convex and preferably of uniform shape (without prominent protrusions); otherwise, forget about scanning it with Vuforia.

2. The object should have plenty of high-contrast points. In other words, it should be colorful, with many small details.

3. At most two objects can be recognized simultaneously.

Another flaw of Vuforia is that it does not provide access to the feature points it detects on a reference image. Let’s say we want to detect the front panel of our electronic device, which has many knobs and levers on it. The tester of linear integrated circuits (the one from the flea market) has plenty of them, so it makes a good example.

When scanning the front panel of our test object, Vuforia places feature points not only on the panel but also on the levers and knobs. So if the position of any of these knobs or levers changes, the object will quite likely stop being recognized, because the relative positions of the feature points in the “constellation” will change.

Moreover, objects are recognized well only within a limited distance. Let’s assume that a typical smartphone camera has a 45-degree field of view (FOV). In the best case, a marker or an object will be recognized if it occupies 1/8 of the camera’s FOV. In the worst case (poor lighting, a failed object scan, very few feature points), the object has to cover at least 1/3 of the FOV.
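The sketch below converts the 1/8 and 1/3 figures into a maximum working distance. It rests on my own assumptions, not on Vuforia specifications:

- “Occupying 1/8 of the FOV” means the object’s angular width equals 1/8 of the 45° horizontal FOV.
- The object is centered and facing the camera.
- The front panel is 20 cm wide. This width is hypothetical.

The object then subtends an angle θ = FOV × fraction, and the distance follows from basic trigonometry: d = (w / 2) / tan(θ / 2).

```python
import math


def max_recognition_distance(object_width_m, fov_deg=45.0, fraction=1 / 8):
    """Farthest distance (metres) at which an object of the given width
    still spans `fraction` of the camera's horizontal field of view.

    Assumption: "occupies 1/8 of the FOV" is read as an angular width of
    fov_deg * fraction, with the object centred and facing the camera.
    """
    subtended = math.radians(fov_deg * fraction)  # angle the object must cover
    return (object_width_m / 2) / math.tan(subtended / 2)


panel_width = 0.20  # hypothetical 20 cm front panel
best = max_recognition_distance(panel_width, fraction=1 / 8)
worst = max_recognition_distance(panel_width, fraction=1 / 3)
print(f"best case: {best:.2f} m, worst case: {worst:.2f} m")
# prints "best case: 2.04 m, worst case: 0.76 m"
```

Under these assumptions, going from the best case to the worst case shrinks the usable range by almost a factor of three. A smaller object shrinks it proportionally.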

Furthermore, when two objects must be recognized and the larger one partially covers the smaller one, the range of usable camera positions is limited.

OpenCV

Open Source Computer Vision, usually shortened to OpenCV, is an open-source library of programming functions aimed mainly at real-time computer vision and image processing. The library is cross-platform and free to use under a BSD license. Basically, OpenCV is a set of filters and operations that can be applied to 2D images.

Its most prominent features include edge and brightness-transition detection, circle and line detection, smoothing, blurring, perspective recovery, feature point extraction (SIFT, SURF, ORB, etc.), and face detection.

OpenCV has a C# wrapper, EmguCV, that can be built for PC, iOS, or Android. Moreover, there is already a fully developed third-party OpenCV plug-in for Unity. This plug-in has several out-of-the-box tracking capabilities, such as marker-based AR, facial recognition, hand position tracking, and color-based multi-object tracking.

Below is the sequence of steps I used to recognize a particular knob on our electronic device. The parentheses after each step name the tool that can help with it.

1) Capture an image by pointing the camera at the chosen knob of the tester of linear integrated circuits (OpenCV).

2) Convert it to grayscale and adjust brightness (custom code).

3) Cut out the quadrilateral containing the knob and correct its perspective (OpenCV WarpPerspective).

4) Calculate brightness transitions along the Y axis (OpenCV Sobel).

5) Calculate brightness transitions along the X axis (OpenCV Sobel).

6) Detect edges (OpenCV Canny).

7) Create an edge direction map (this takes a few lines of custom code).

8) Detect circles (OpenCV HoughCircles).

9) Determine the pointer’s orientation (custom function).
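Steps 2, 4, 5, 7, and 9 are where custom code or simple filters come in, so a miniature version helps show what each produces. The sketch below is pure Python and assumes images stored as lists of rows. In a real project, you would call OpenCV’s `cvtColor` and `Sobel` (and, for example, `phase` for the direction map) or their EmguCV equivalents. Canny and HoughCircles (steps 6 and 8) are left to OpenCV. The step-9 function is a hypothetical approach, not the author’s custom function, which is not described. It assumes a dark pointer on a light knob, samples a ring around the circle center that HoughCircles would return, and picks the darkest sample.

```python
import math

# Hypothetical pure-Python versions of steps 2, 4-5, 7 and 9.
# Images are lists of rows of numbers.


def to_gray(rgb_rows):
    """Step 2a: BT.601 luma, the weights OpenCV's RGB-to-gray conversion uses."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb_rows]


def stretch_brightness(gray):
    """Step 2b: linear contrast stretch so values span 0..255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        return [[0.0 for _ in row] for row in gray]
    return [[(v - lo) * 255.0 / (hi - lo) for v in row] for row in gray]


SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))  # brightness change along X
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))  # brightness change along Y (down)


def sobel(img, kernel):
    """Steps 4-5: apply a 3x3 Sobel kernel; border pixels stay at zero."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = float(sum(kernel[j][i] * img[y + j - 1][x + i - 1]
                                  for j in range(3) for i in range(3)))
    return out


def direction_map(gx, gy, min_magnitude=1.0):
    """Step 7: gradient direction in degrees [0, 360) in image axes (y down),
    or None where the brightness barely changes."""
    return [[math.degrees(math.atan2(dy, dx)) % 360.0
             if math.hypot(dx, dy) >= min_magnitude else None
             for dx, dy in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


def pointer_angle(gray, cx, cy, radius, samples=72):
    """Step 9 (hypothetical): sample a ring around the knob centre and return
    the angle of the darkest sample, in degrees counter-clockwise from 3 o'clock."""
    darkest_value, darkest_angle = None, None
    for k in range(samples):
        angle = 360.0 * k / samples
        x = int(round(cx + radius * math.cos(math.radians(angle))))
        y = int(round(cy - radius * math.sin(math.radians(angle))))  # y grows downward
        if 0 <= y < len(gray) and 0 <= x < len(gray[0]):
            if darkest_value is None or gray[y][x] < darkest_value:
                darkest_value, darkest_angle = gray[y][x], angle
    return darkest_angle
```

As an example, take a white 21×21 “dial” with a dark pointer running from the center (10, 10) straight up. For that dial, `pointer_angle(dial, 10, 10, 6)` returns `90.0`. Sampling a ring rather than the whole disc keeps the search cheap, and it ignores the knob’s center, where the pointer and the shaft overlap.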

OpenCV Advantages

1. OpenCV is highly customizable. If any function needs to be enhanced, adjusted, or written from scratch, a few lines of code are enough and, voilà, the tool is ready to do what you need.

2. It offers plenty of algorithms, several of which are used in the example above.

3. Developers can craft a unique custom algorithm to detect things that have very few or no feature points at all.

Disadvantages of OpenCV

1. The majority of the algorithms are very complex. They require solid knowledge of math (derivatives, vector algebra, applied math) and of data types.

2. OpenCV’s built-in filters do not cover every development need, so you often have to write custom filters, which generally takes a lot of time.

3. OpenCV is not a one-size-fits-all solution. For almost every single object, you have to write a unique recognition algorithm, which can take days or even weeks of development.

4. OpenCV’s performance drops as image size grows. To keep performance solid, a programmer has to consistently follow the principles of C# optimization and avoid insecure code.

Conclusion

In fact, it is quite difficult to recognize an electronic device satisfactorily using Vuforia or OpenCV alone.

Vuforia is fast and easy to use, but its functionality is limited. Because it relies on the number of feature points found in the image, objects must be high-contrast to be recognized. If an object does not have enough feature points, they have to be drawn on it somehow. Moreover, Vuforia cannot recognize more than two physical objects simultaneously. Also, markers must meet Vuforia’s specific size requirements.

OpenCV can do many of the things AR apps need, but any task that goes beyond simple image recognition requires writing additional code. Unlike Vuforia, OpenCV can work with very large, very small, and even damaged images. However, its biggest drawback is the enormous development time, which can be a problem for fixed-price projects. OpenCV is also quite complex, especially for inexperienced developers, since it requires a lot of learning and a broad additional knowledge stack.

To achieve the goal we set at the beginning (or any other goal of your choice), I suggest combining Vuforia with OpenCV. Vuforia is a perfect solution for detecting and recognizing physical objects and their elements. Once the object is recognized, I propose extracting the relevant part of the image and passing it to OpenCV for further processing.
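Once Vuforia has found the target, the hand-off comes down to cutting the target’s quadrilateral out of the camera frame and flattening it. That is the perspective correction behind OpenCV’s `getPerspectiveTransform` and `warpPerspective`. The sketch assumes you already have the target’s four corners projected into image pixel coordinates. How to obtain them from Vuforia’s pose depends on the SDK version and is not shown. It is a pure-Python illustration of what those two OpenCV calls compute, with hypothetical helper names, not a replacement for them.

```python
# Pure-Python sketch of what cv2.getPerspectiveTransform / warpPerspective
# compute. Helper names are hypothetical. `corners` are assumed to be the
# target's four corners already projected into image pixel coordinates,
# ordered top-left, top-right, bottom-right, bottom-left.


def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate quadrilateral")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def perspective_transform(src, dst):
    """3x3 homography H (with H[2][2] = 1) mapping 4 src points onto 4 dst points."""
    a, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve(a, rhs)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def map_point(h, x, y):
    """Apply homography h to (x, y), including the perspective divide."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def warp_quad(img, corners, out_w, out_h):
    """Cut the quadrilateral `corners` out of img as an out_w x out_h rectangle.
    Each output pixel is mapped back into the source (inverse mapping) and
    sampled with nearest-neighbour interpolation; outside pixels become 0."""
    rect = [(0, 0), (out_w - 1, 0), (out_w - 1, out_h - 1), (0, out_h - 1)]
    back = perspective_transform(rect, corners)
    out = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            x, y = map_point(back, u, v)
            xi, yi = int(round(x)), int(round(y))
            inside = 0 <= yi < len(img) and 0 <= xi < len(img[0])
            row.append(img[yi][xi] if inside else 0)
        out.append(row)
    return out
```

Mapping each output pixel backward, rather than pushing source pixels forward, guarantees that every pixel of the flattened patch gets a value. `warpPerspective` makes the same choice, though it defaults to bilinear rather than nearest-neighbor interpolation.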

As an alternative for hardcore fans, everything can be done with OpenCV alone. This allows unlimited customization of the algorithms; however, development time increases significantly, and the complexity of the task grows accordingly.

This article was originally published on GeeksWithBlogs.

A technology addict and marketing executive who writes about business, emerging tech, crypto, blockchain, XR, and gaming.