We firmly believe that in the future, the tech industry will start relying on anomaly detection. Therefore, we want to be at the forefront of this monitoring revolution. Here we’ll explain why this is the way forward.
Remove manually created rules
Many monitoring setups already search for anomalies; they just search for predefined ones. When you create a monitoring rule such as “page me when 10% of page requests fail”, you have defined a 10% failure rate as abnormal and not acceptable for your service.
It is nearly impossible to identify every anomaly that could occur in your system, let alone to define each one correctly. For example, you could have a process that performs hundreds of actions per hour during the day, where a 10% hourly failure rate is a sign of a severe issue. During the night, however, the same process normally performs only 1-2 actions per hour. If one of those fails, it is possibly a connection issue that needs no manual intervention, yet it counts as a 50-100% failure rate. If your monitoring rule only says 10% failure, you’ll end up getting useless alerts during the night that require no intervention. This is a reasonably common occurrence. Non-actionable alerts are one of the leading causes of people ignoring alerts; this contributed to the downfall of Knight Capital Group, which lost $440 million after its IT department reportedly ignored hundreds of alerts because they always got those alerts.
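The day and night example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular product: the function names, the 2% baseline failure rate, and the 0.001 probability cut-off are all hypothetical values chosen for the example. It contrasts the fixed 10% rule with a check that asks how surprising the failure count is given the number of actions in that hour.

```python
from math import comb


def fixed_rule_alerts(total: int, failures: int, max_rate: float = 0.10) -> bool:
    """Static rule: alert whenever the failure rate reaches max_rate."""
    if total == 0:
        return False
    return failures / total >= max_rate


def binomial_tail(total: int, failures: int, base_rate: float) -> float:
    """Probability of at least `failures` failures in `total` actions,
    assuming each action fails independently with probability `base_rate`.
    Computed directly, which is fine for small hourly volumes."""
    return sum(
        comb(total, k) * base_rate**k * (1 - base_rate) ** (total - k)
        for k in range(failures, total + 1)
    )


def volume_aware_alerts(
    total: int,
    failures: int,
    base_rate: float = 0.02,        # assumed "normal" failure rate
    max_probability: float = 0.001,  # assumed cut-off for "too unlikely"
) -> bool:
    """Alert only when this many failures would be very unlikely by chance."""
    if total == 0:
        return False
    return binomial_tail(total, failures, base_rate) < max_probability


# Daytime hour: 300 actions, 45 failed. Night hour: 2 actions, 1 failed.
print(fixed_rule_alerts(300, 45), fixed_rule_alerts(2, 1))      # True True
print(volume_aware_alerts(300, 45), volume_aware_alerts(2, 1))  # True False
```

With these assumed numbers, the fixed rule pages someone for the single failed action at night, while the volume-aware check stays quiet at night and still fires for the daytime failure spike.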
Detect hard to spot issues
Some issues are so severe that they result in a major outage; however, the signs of impending doom are often so small that they’re hard to spot. Often, the sign of impending doom isn’t a failure but abnormal activity within the application. The underlying issue could force users to do something repeatedly, prevent them from doing one specific thing, or simply slow the system down.
For example, we once found that the warning signal that a system was about to collapse because of session storage issues was that users needed to log in repeatedly. The session storage was evicting their login sessions. However, the system wasn’t generating any errors. It was just an annoyance for users, who had to log in every 15 minutes instead of once a day. The anomaly was that the number of user logins was abnormally high. Left unattended, the system collapsed because the session storage ultimately failed, and the system could not run without sessions.
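A login spike like this produces no errors, but it stands out when the current count is compared with recent history. The sketch below is one common way to do that, using a robust score based on the median and the median absolute deviation. It is not necessarily how the incident above was detected, and the function name, the sample login counts, and the 3.5 threshold are assumptions made for illustration.

```python
from statistics import median


def is_abnormal(history: list[float], current: float, threshold: float = 3.5) -> bool:
    """Flag `current` if it sits far from the median of `history`.

    Uses the modified z-score: 0.6745 * (x - median) / MAD, where MAD is
    the median absolute deviation. The 3.5 threshold is an assumed default.
    """
    centre = median(history)
    mad = median(abs(value - centre) for value in history)
    if mad == 0:
        # History is flat, so any different value counts as unusual.
        return current != centre
    score = 0.6745 * (current - centre) / mad
    return abs(score) > threshold


# Hypothetical hourly login counts for the same hour on previous days.
usual_logins = [120, 115, 130, 125, 118, 122, 127]

print(is_abnormal(usual_logins, 126))  # False: an ordinary hour
print(is_abnormal(usual_logins, 600))  # True: users are logging in repeatedly
```

The median-based score is used here because a few unusual days in the history distort it less than a mean and standard deviation would.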
Remove manual monitoring
For many businesses, the thing they care about is a transaction of some sort. For an electric mobility service provider, that is the number of charging sessions. For an online store, that is the number and value of the orders. For a price comparison site, that is the number of referrals made. However, there is often no way to automatically monitor these transactions. For many businesses, issues with these transactions are only noticed manually, often days after the issue starts to affect them.
Businesses often resort to putting a dashboard up in the office and telling employees to monitor it. This introduces problems. First, if people have to look at a dashboard manually, it is prone to the basic human error of forgetting to do so. Second, if they check it regularly at the start and there is never anything to do, they tend to check less and less. People can only remain at full alert for a certain period of time; after a while, they become complacent. Third, they look only to see whether the business is still transacting: they check that there have been orders, not whether the number of orders is usual or fits the usual pattern. This sort of issue is normally discovered days, weeks, or months later, when someone does an analysis and finds that a percentage of sales was lost, or that individual items aren’t selling at all or as well as they did previously.
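The difference between “are we still transacting?” and “does this fit the usual pattern?” can be shown with a small Python sketch. Everything here is hypothetical: the function names, the sample order counts, and the tolerance of three standard deviations are assumptions for illustration, and a real system would likely need a more careful model of seasonality.

```python
from statistics import mean, stdev


def still_transacting(current_orders: int) -> bool:
    """The dashboard check: have there been any orders at all?"""
    return current_orders > 0


def fits_usual_pattern(
    history: list[int], current_orders: int, tolerance: float = 3.0
) -> bool:
    """Check whether the order count is within `tolerance` standard
    deviations of the mean for the same weekday and hour."""
    if len(history) < 2:
        return True  # not enough data to judge either way
    centre = mean(history)
    spread = stdev(history)
    return abs(current_orders - centre) <= tolerance * spread


# Hypothetical order counts for Mondays between 14:00 and 15:00.
usual_orders = {("Mon", 14): [40, 44, 38, 42, 41, 39]}
history = usual_orders[("Mon", 14)]

print(still_transacting(12))            # True: orders are coming in
print(fits_usual_pattern(history, 12))  # False: far fewer than usual
print(fits_usual_pattern(history, 41))  # True: a normal Monday afternoon
```

In this example, a person glancing at the dashboard would see 12 orders and move on, while the pattern check notices that the hour normally brings in around 40.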
What We Do
Now for the important bit, for us at least: the sales pitch on what we do! We’re currently targeting eCommerce systems and focusing our anomaly detection on two areas. The main one is order monitoring, since how much a company is doing in sales is its most important metric. We’re also adding anomaly detection to the uptime and performance monitoring of the website, so we will also detect anomalies in page load times. We think this is useful because a page that is suddenly slower than normal is usually a symptom of a more significant issue.
So head on over to our main site to see what we can do for you at www.ootliers.com