September 18, 2020

Using anomaly detection to find bad AI bots

Paul Clauson

Sonasoft NuGene creates powerful bots to automate your business processes. But how do we ensure the bots perform perfectly? The answer is anomaly detection.

Sonasoft NuGene is an AI bot factory that creates self-contained intelligent bots. The system learns from your raw data to create bots capable of forecasting, knowledge discovery, anomaly detection, and classification. These bots can then be deployed to transform your business processes, improving efficiency, cutting costs, and boosting performance. But how can you be sure these bots are working properly? Well, we don’t just use artificial intelligence to create our bots. We also use AI to monitor them!

How NuGene creates bots

NuGene is one of the most advanced AI platforms on the market. At its heart, NuGene is powered by a unique causal inference engine. The engine uses unsupervised learning to extract actionable insights from your raw data. It then generates hypotheses and tests these for causation. That might sound a bit sci-fi and hard to understand, so let’s look at what this really means. To start with, you need to understand how other AI platforms work.

The classic AI platform

Most AI platforms take data that has been pre-processed by data scientists. This process, known as data engineering, tries to rationalize the data and simplify it. The data scientist starts with an idea of what the result should look like, and uses this to refine the data. They then feed this data into the platform and tell it what they are looking for. For instance, are you trying to establish how the weather impacts your sales? The AI platform then takes that data and creates a number of different machine learning models looking at the various features that the data scientist has highlighted in the data. The platform then takes the most accurate of these models and provides it as code that can be embedded into an application.

NuGene’s causal inference engine

As we already said, NuGene is different. Firstly, NuGene wants to see your raw data. The more data you can feed it, the better the results. When NuGene first gets your data, it sets about trying to understand what is going on. It does this by looking for patterns and correlations. This is exactly what a human would do when analyzing a new dataset. But of course, correlations are often spurious. Tyler Vigen has a brilliant website giving some amazing examples. For instance, there is a strong correlation between the divorce rate in Maine and the per-capita consumption of margarine.

Spurious correlations are pretty common. NuGene isn't fooled though.

Now, if you were dumb, you might assume this correlation is significant. Indeed, you might assume there’s some form of cause and effect. The problem is, machines are really dumb. So, a traditional AI platform might well see this correlation and assume it shows causation. NuGene is much more intelligent though. It analyzes all your data further to establish which correlations matter and which are spurious. Only then does it start the process of trying to build a model that leverages this causal relationship.
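To see how easily a spurious correlation can arise, here is a minimal Python sketch. It is not NuGene's causal inference engine, and the two series are made-up numbers rather than the real Maine or margarine figures. It generates two independent series that both happen to decline over time, then compares the correlation of the raw values with the correlation of their year-on-year changes. Looking at changes is just one simple sanity check for a shared trend. It does not prove or disprove causation.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def diffs(xs):
    """Year-on-year changes, which removes a shared linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]

rng = random.Random(0)  # fixed seed so the run is repeatable
years = range(40)

# Hypothetical data: both series fall over time but are generated independently.
series_a = [5.0 - 0.05 * t + rng.gauss(0, 0.1) for t in years]
series_b = [8.0 - 0.10 * t + rng.gauss(0, 0.2) for t in years]

raw_corr = pearson(series_a, series_b)
diff_corr = pearson(diffs(series_a), diffs(series_b))

print(f"correlation of raw series:     {raw_corr:.2f}")
print(f"correlation of yearly changes: {diff_corr:.2f}")
```

The raw series correlate strongly simply because both trend downward. Once the trend is removed, the relationship is typically much weaker, which is a hint that the original correlation said nothing about cause and effect.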

NuGene bots in production

NuGene’s bots are able to solve a huge variety of business problems. The majority of the bots fall into one of four categories. The bots can be deployed in Sonasoft’s own cloud, in your cloud/server, or even in stand-alone edge devices.


Forecasting

Forecasting is perhaps the archetypal AI application. It allows you to predict the future based on past behavior and current conditions. This can be used to drive a huge range of business applications. For instance, you can forecast electricity demand, allowing you to avoid expensive peaks. Or you can accurately predict stock demand, allowing you to optimize just-in-time production. You can even use forecasting to predict which will be the most popular product next season. Forecasting bots are trained on your historical data, along with other data that may influence the outcomes, such as weather, fashion trends, news events, or macro-economic conditions.

Knowledge discovery

Most AI applications require structured data to create their models. However, there are many areas of business where you don’t have access to any structured data. For instance, legal cases typically depend on written documents or emails. Likewise, patent searches depend on finding prior art in the literature. Knowledge discovery bots are designed to process this unstructured natural language and extract semantic knowledge. They start by using natural language processing to convert the text into a structured graph. They then use their understanding of language to parse the meaning. This is then compared to other data they have in order to extract the underlying knowledge in the data. This approach can be used to speed up patent searches. Or you can apply it to provide a contextual help and knowledge base function responding to user queries.


Classification

One of the most powerful AI applications is image recognition. This is crucial for things like self-driving cars. But it is also really useful for industrial applications, such as automatically recognizing tooling on a production line or identifying and sorting trash. Early image recognition systems gave us computers that could (usually) tell a dog from a cat. Over the years, these systems have become more and more powerful. Underlying this is a process known as classification. When the machine sees a new image, it starts by segmenting it into regions. These regions are then classified using an AI classification model. Finally, another AI model can reassemble the image semantically, understanding how each part relates to the rest.

Anomaly detection

One of the hardest things for humans is correctly identifying anomalies. We are programmed by nature to spot patterns, so we assume every result that doesn’t fit a pattern is an anomaly. However, detecting real anomalies is far harder. Fortunately, machine learning models can be trained to become really good at this. Two key applications here are fraud detection and identifying network intrusions. In fraud detection, you are looking for any anomalous transactions that don’t fit the buyer’s established pattern of behavior. Network intrusions can be spotted by identifying sudden changes in a logged-in user’s behavior, such as downloading lots of data or accessing files they have never accessed before.
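As a rough illustration of the fraud example, the sketch below flags transactions that sit far outside a buyer's established pattern. It uses a simple statistical rule (the modified z-score, based on the median and the median absolute deviation) rather than a trained machine learning model, and it is not how NuGene's bots work internally. The function name, the threshold of 3.5, and the transaction amounts are all assumptions made for the example.

```python
from statistics import median

def flag_anomalies(history, new_values, threshold=3.5):
    """Return the new values that lie far from the historical pattern.

    Uses the modified z-score: 0.6745 * |x - median| / MAD,
    where MAD is the median absolute deviation of the history.
    """
    med = median(history)
    mad = median(abs(x - med) for x in history)
    if mad == 0:
        # History is essentially constant, so any different value stands out.
        return [x for x in new_values if x != med]
    return [x for x in new_values if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical purchase amounts for one buyer.
history = [42.0, 38.5, 45.0, 40.0, 39.9, 44.1, 41.3, 37.8, 43.2, 40.6]
incoming = [41.0, 39.5, 980.0, 44.0]

print(flag_anomalies(history, incoming))  # [980.0]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value in the history has far less influence on them.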

Anomaly detection allows you to spot unexpected results in complex data

The impact of bad bots or dodgy data

Companies that adopt AI often become highly reliant on the system. As a result, it’s vitally important to monitor the health and performance of the AI bots. Let’s have a quick look at what might go wrong.

What can go wrong with bots?

Loss of data feed. This means the bot no longer has new data to work with. This has an immediate impact on almost all bots. More subtle is when just some of the data is lost. Some bots may receive feeds from multiple sources, so when one source fails, the bot still appears to work. However, its performance is degraded.

Unexpected errors in the data. Sometimes data sources may be glitchy. They may accidentally send the wrong data or repeat a block of data. As far as the bot is concerned, everything is OK. However, the output is now incorrect or out of date.

Drift in the underlying model. Data evolves over time. A freshly trained bot is accurate. However, after a few weeks, this accuracy may start to decline. At this stage, you need to retrain your bot using the latest data.
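Each of the three failure modes above can be caught with a very simple check. The Python sketch below shows one hypothetical check per problem. The function names, the 15-minute gap, the block size, and the 1.5x tolerance are illustrative assumptions, not part of NuGene.

```python
from datetime import datetime, timedelta

def feed_is_stale(last_seen, now, max_gap=timedelta(minutes=15)):
    """Loss of data feed: True if no record has arrived within max_gap."""
    return now - last_seen > max_gap

def has_repeated_block(values, block_size=5):
    """Glitchy data: True if any run of block_size values is immediately repeated."""
    for i in range(len(values) - 2 * block_size + 1):
        first = values[i:i + block_size]
        second = values[i + block_size:i + 2 * block_size]
        if first == second:
            return True
    return False

def has_drifted(baseline_errors, recent_errors, tolerance=1.5):
    """Model drift: True if recent mean absolute error exceeds tolerance x baseline."""
    baseline = sum(abs(e) for e in baseline_errors) / len(baseline_errors)
    recent = sum(abs(e) for e in recent_errors) / len(recent_errors)
    return recent > tolerance * baseline

# Example usage with made-up values.
print(feed_is_stale(datetime(2020, 9, 18, 12, 0), datetime(2020, 9, 18, 12, 30)))
print(has_repeated_block([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6]))
print(has_drifted([0.1, -0.2, 0.1], [0.5, -0.6, 0.4]))
```

Fixed rules like these are easy to reason about, but they need tuning per feed. A legitimately constant signal, for instance, would trip the repeated-block check. That is part of the appeal of a learned anomaly detector, which can adapt to what is normal for each bot.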

Using anomaly detection to spot problems

Here at Sonasoft, we have created a system that uses AI to identify problems in NuGene’s bots. More specifically, we apply anomaly detection to solve this problem. This allows us to identify problems and either rectify them or flag them to our users.

Identify bad data. Bad data often shows up as an unexpected series of data points. Anomaly detection quickly identifies this and flags it to the user. You will then be able to assess the data and resolve any issues.

Spot if the model is giving unexpected results. Anomaly detection makes it easy to see when your model is giving unexpected results. This then allows you to check what the cause is. You may need to retrain the model, or it may be a sign that there is a problem with the data feed.
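One common way to spot unexpected model results is to watch the residuals, meaning the gap between what the model predicted and what actually happened. The sketch below is a simplified stand-in for that idea, not Sonasoft's implementation. The function name, the window of 20 points, and the multiplier of 4 are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev

def monitor(predictions, actuals, window=20, k=4.0):
    """Return the indices of steps whose residual is unusual.

    A residual is flagged when it lies more than k standard deviations
    from the mean of the previous `window` unflagged residuals.
    """
    recent = deque(maxlen=window)
    flagged = []
    for i, (p, a) in enumerate(zip(predictions, actuals)):
        residual = a - p
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(residual - mu) > k * sigma:
                flagged.append(i)
                continue  # keep the outlier out of the reference window
        recent.append(residual)
    return flagged

# Hypothetical bot output: predictions are steady, actuals wobble by +/-1,
# and one reading at step 25 is wildly off.
predictions = [100.0] * 30
actuals = [100.0 + (1 if i % 2 == 0 else -1) for i in range(30)]
actuals[25] = 130.0

print(monitor(predictions, actuals))  # [25]
```

A single flagged step usually points to a glitch in the data feed. A long run of flagged steps suggests the model itself no longer fits the data and needs retraining.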


AI bots are becoming increasingly vital in delivering efficiency and cost savings. However, they can suffer from unexpected problems. Fortunately, AI can provide a solution to this problem. In the near future, we will deploy a new feature in NuGene’s Bot Manager that uses anomaly detection to identify and classify faults in your bots. This will allow you to be confident that your bots are performing as expected.
