by Marco Supino and Ori Lahav
Yeah — I know, monitoring is a “must have” tool for every web application/operation functionality. If you have clients or partners that are dependant on your system, you don’t want to hurt their business (or your business) and react in time to issues. At Outbrain, we acknowledge that it is a tech system we are running on and tech systems are bound to fail. All you need is to catch the failure soon enough, understand the reason, react and fix. On DevOps terminology it is called TTD (time to detect) and TTR (time to recover). To accomplish that, you need a good system that will tell the story and wake you up if something is wrong long before it effects the business.
This is the main reason why we invested a lot in a highly capable monitoring system. With it, we are doing Continuous Deployment and a superb monitoring system is integral part of the Immune System that allows us to react fast to flaws in the continuous stream of system changes.