CNNs revolutionized computer vision and deep learning. But they struggle when you have limited data. Fortunately, Sonasoft has the solution.
Convolutional Neural Networks (CNNs) revolutionized how we approach computer vision. They have also been fundamental to the growth of deep learning and of artificial intelligence in general. But as we saw previously, CNNs have some significant weaknesses. For instance, they don’t cope well with noisy data and they need huge training data sets. Recently, Sonasoft’s data science and engineering teams made a huge leap forward in solving these issues. Read on to learn more.
We looked at CNNs in our previous blog, but here is a quick refresher. A convolutional neural network is a complex algorithmic structure that aims to mimic parts of the human brain. CNNs are made up of many interconnected layers of artificial neurons. Specifically, CNNs make use of what’s often called a perceptron. A perceptron is a simplified model of the neurons in human and animal brains. It takes one or more input signals, combines them (typically as a weighted sum), and then activates if the result exceeds a trigger threshold.
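To make the refresher concrete, here is a minimal sketch of a single perceptron in Python. It assumes the inputs are combined as a weighted sum and that the trigger is a fixed threshold. The function name, weights, and threshold are illustrative choices, not part of any Sonasoft product.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0


# Illustrative values (assumptions): with these settings the unit only
# activates when both inputs are on, so it behaves like a logical AND.
AND_WEIGHTS = [1.0, 1.0]
AND_THRESHOLD = 1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], AND_WEIGHTS, AND_THRESHOLD))
```

In a real network the weights are not set by hand like this. They are learned from the training data.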
In a CNN, huge numbers of these perceptrons are joined together in layers. These layers either identify features in the input (convolutional layers) or they simplify the result (pooling layers). CNNs are particularly useful in image processing because they can learn to identify a huge number of features in parallel. They learn these features from large sets of labeled training data. For instance, if you want to train a CNN to identify dogs, you would provide it with thousands of labeled images showing dogs.
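The two layer types can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the image is a tiny grid of numbers, the filter is hand-picked rather than learned, and the helper names convolve2d and max_pool2d are our own, not a library API.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the
    element-wise products. This is the cross-correlation form of the
    operation, which is what CNN layers commonly compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size block."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# A 5x5 "image": dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked filter (an assumption) that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

features = convolve2d(image, kernel)  # 4x4 feature map, strongest at the edge
pooled = max_pool2d(features)         # 2x2 summary of the feature map

print(features)
print(pooled)
```

The convolutional step highlights where the feature occurs, and the pooling step shrinks the result while keeping the strongest responses.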
The limitations of CNNs
As we saw in the last blog, CNNs have three fundamental limitations. Firstly, they need to be trained with large amounts of data. Without sufficient training data, they perform poorly. Secondly, they react badly to noisy data or data that doesn’t align with the training data. This is a real problem for real-life applications. Thirdly, they don’t work well when you are trying to identify sparse features or outliers. Let’s look at these in a bit more detail. For simplicity, we will focus on examples in computer vision, but the same problems crop up in any application of CNNs.
CNNs are often used in computer vision to identify items in a photo. For instance, a CNN in a self-driving car needs to be able to identify pedestrians, other vehicles, road markings, street signs, etc. So, you need to train the CNN by showing it millions of labeled images of the objects you are interested in. This is much like how you teach a child to identify items in the world around it. The problem is, CNNs are not as good at extrapolation as we are. If you show a human a picture of a face in profile, they will recognize that it is a face. But a CNN can generally only recognize this if you included faces in profile in its training data. This means you need a huge volume of training data in order for a CNN to function well. Sadly, in the real world, you often only have limited training data.
CNNs learn to identify features in data by applying filters. These help the CNN to perform tasks like identifying groups of pixels that are related to each other or spotting the edges of an object. It is tempting for us to assume that CNNs are doing this in the same way we do. After all, their design is based on how the human brain works, right? Well, no actually! While the structures in a CNN are based on neurons and synapses, they lack the sophistication of our brains. This means they are very easy to fool with noisy data, as we saw before. So, you can easily break a neural network by adding a little noise to an image!
CNNs cope really badly with noisy data
Adding a little white noise completely breaks this CNN
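As a rough illustration of how a filter can be thrown off by noise, the sketch below applies a simple difference filter to a one-dimensional signal. Note the assumptions: this is a single hand-set filter with a fixed threshold, not a trained CNN, and the "noise" is a fixed alternating pattern chosen so the result is reproducible. The function name edge_positions is hypothetical.

```python
def edge_positions(signal, threshold=0.5):
    """Apply the difference filter [-1, 1] along the signal and return the
    positions where the size of the response exceeds the threshold."""
    responses = [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]
    return [i for i, r in enumerate(responses) if abs(r) > threshold]


# Clean signal with one real edge, between positions 2 and 3.
clean = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

# Deterministic stand-in for noise (an assumption): alternate +0.3 / -0.3.
noise = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3]
noisy = [c + n for c, n in zip(clean, noise)]

print(edge_positions(clean))  # only the real edge is reported
print(edge_positions(noisy))  # spurious edges appear and the real one is missed
```

Real CNNs learn many filters and are far more capable than this toy, but the same effect applies in spirit: small changes to the input can shift filter responses across a decision boundary.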
The last problem for CNNs is actually related to the need for lots of training data. Namely, it is really hard to train a CNN to look for unusual objects in the data. Likewise, if there are gaps in the training data, the CNN will struggle. This is a real issue for many real-world problems. Effectively, this means CNNs don’t perform well in cases where there is sparse data.
The limitations don’t just apply to image recognition problems. They are actually indicative of a fundamental flaw. CNNs are designed to identify patterns. Having identified a pattern, they assume it can be used again and again. But even we can be fooled into seeing things that aren’t there or failing to spot things that are. In short, these limitations prevent CNNs from being used in many day-to-day AI problems.
3D graphics as a solution?
In 3D graphics, each object is modeled as a set of related subobjects. For instance, a face consists of two eyes, a nose, a mouth, eyebrows, etc. All of these features are always in the same position relative to each other. The eyes must be next to each other and above the nose. To display a face in profile, you take those objects and apply a transformation to them. Theoretically, it is possible to train a CNN to do the same thing in reverse. Effectively, this is what we do when we see an object from a different angle.
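The idea of applying a transformation to a set of related subobjects can be sketched as follows. The landmark coordinates are made up for illustration, and rotate_y is our own helper. The sketch turns a front-facing face through 90 degrees about the vertical axis and checks that the features keep the same positions relative to each other.

```python
import math


def rotate_y(point, degrees):
    """Rotate a 3D point (x, y, z) about the vertical y axis."""
    x, y, z = point
    a = math.radians(degrees)
    return (x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a))


# Hypothetical landmark coordinates for a front-facing face.
face = {
    "left_eye": (-1.0, 1.0, 0.0),
    "right_eye": (1.0, 1.0, 0.0),
    "nose": (0.0, 0.0, 0.5),
    "mouth": (0.0, -1.0, 0.0),
}

# Turn the whole face 90 degrees to get a profile view.
profile = {name: rotate_y(p, 90) for name, p in face.items()}

# The transformation moves every point, but distances between features
# are preserved, and the eyes stay above the nose.
eye_gap_before = math.dist(face["left_eye"], face["right_eye"])
eye_gap_after = math.dist(profile["left_eye"], profile["right_eye"])

print(eye_gap_before, eye_gap_after)
print(profile["left_eye"][1] > profile["nose"][1])
```

Doing this "in reverse", as the paragraph above suggests, would mean inferring the pose and the subobjects from the image rather than generating the image from them.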
The Sonasoft solution
Sonasoft is an AI-first company and we are always looking to push the boundaries of AI platforms. Our core AI engine, SAIBRE, makes it extremely easy to create and embed AI solutions for any business problem. SAIBRE creates AI-powered bots to solve a range of different problems. These include forecasting, anomaly detection, classification, and knowledge discovery. The resulting bots can then be embedded into any of your business processes.
We challenged our engineers and data scientists to solve the limitations of CNNs. They rose to the challenge and came up with a patent-pending solution. The invention goes by the snappy title “Convolutional Hierarchical Temporal Memory Modules”. As with many AI solutions, our team took its inspiration from nature.
“By studying how intelligence works in biological systems, we can continue to improve and provide more robust artificial intelligence.” Max Lee, Sonasoft Head of Engineering
The solution provides a reliable, noise-tolerant system that works well when you have sparse data, noisy inputs, or only limited training data. Moreover, because the models are trained with smaller datasets, the resulting models are smaller. This means they can run on cheaper and simpler hardware. This is a big win for companies needing to embed models within their existing systems. To learn how we can help you adopt an AI-first approach in your business, please reach out to us below.