Object Detection the Deep Learning Way


About 540 million years ago, only a few species of lazy ocean animals existed. They couldn't see, so they floated around waiting for food to jump into their mouths. Then primitive eyes formed and, all of a sudden, thousands of species evolved to see a huge buffet of blind food. The formation of eyes had a dramatic effect on the evolution of life.

By many estimates, around half of the human cortex is involved in processing vision. Someone could spend days describing an object to you, and you still might not recognize it the first time you saw it. If, however, you were handed an object, told what it was, and allowed to look at it for a few seconds, its appearance would be permanently burned into your memory. Each time you saw it in the future, you would have no problem recognizing it, even if it were upside down, a different color, or mostly hidden behind some other object. This ability we humans have for object detection is learned from examples, not from lengthy procedural descriptions.

Humans can not only detect whether an object is in view; they also know where it is, how it is oriented, and where its specific landmarks are. For example, when looking at another person's face, you know exactly where their eyes, nose, and ears are located. This is an amazing ability that computers are just starting to get the hang of.

Computers and Learning

Up until 2012, most computers lacked the processing power, and researchers lacked the vast amounts of example data, needed to learn what objects look like. Instead, they relied on procedural algorithms that described expected shapes and colors. Unfortunately, this approach doesn't work well, because most objects can take on a vast number of shapes depending on their 3D orientation and state. For example, even if you could perfectly describe a ripe apple with geometry, a computer given a picture of a partially eaten apple wouldn't know it's an apple. Think about how difficult it would be to describe objects that aren't rigid, like cats and dogs.

Deep learning solves this problem in a way loosely similar to how the human brain works: train the computer on lots and lots of object variations, and build up a large network of neurons that remembers what the object looks like. Once the network is trained correctly, it can recognize an object even if it doesn't look exactly like any of the training images.

Furthermore, deep learning has advanced beyond simply telling us whether an object is in an image; it can recognize multiple objects, where each one is located, the specific landmarks on each object, and so on. This makes it ideal for many image-based applications. In autonomous driving, for example, the vehicle needs to recognize roads, lane lines, other vehicles, road signs, potholes and the like. Deep learning can be used to recognize patterns in just about any type of data: signal analysis, speech recognition and translation are just a few examples. It also has countless applications in the military and medical fields.

Since neural network processing involves enormous numbers of simple calculations that can run side by side, developers have turned to the GPU, with its massively parallel architecture of hundreds of cores working together. Abaco products such as the GVC1000 can help your project process images more like a human does.


Michael Both

As a software architect, Michael’s career has focused on innovations in visualization and communication software. At Abaco, he’s been responsible for the development of DataView, EventView, AXISFlow, and AXISView. As a hobby, he started developing software for Atari back in the 80s. When the iPhone came out, he created the official Rubik’s Cube app, which five years later was featured in an Apple TV commercial (https://www.youtube.com/watch?v=ajj2-KYQ0R0). And yes, he can solve it in under 30 seconds. Bernie refers to him as Rubik’s Mike.
