22 June 2018
Any web application is constantly changing, both internally (new features and technologies) and externally (the size and activity of its audience). The quality of the application depends critically on correct and timely diagnosis. Because web applications change so quickly, diagnostics has to move to a new level: it must be continuous. It is important not just to know as much as possible about the system, but also to learn about changes as quickly as possible. This is the task of monitoring.
A monitoring system has three main components: status monitoring, collection of historical data (trends), and monitoring of business metrics.
The task of status monitoring is to continuously check that every component of the system is operating correctly.
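As a rough illustration of what "checking every component" means in practice, here is a minimal sketch of a check runner. The function name `run_checks` and the example check names are hypothetical; a real platform such as Zabbix does this with its own agents and configuration, not with code like this.

```python
def run_checks(checks):
    """Run every status check and report 'ok' or 'fail' per component.

    `checks` maps a component name to a zero-argument callable that
    returns True when the component is healthy (an assumed convention).
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception:
            # A check that crashes tells us nothing good about the
            # component, so it is reported as a failure too.
            results[name] = "fail"
    return results


def _unreachable():
    raise ConnectionError("connection refused")


if __name__ == "__main__":
    # Stand-in checks; real ones would probe a port, a URL, a disk, etc.
    print(run_checks({
        "web_server": lambda: True,
        "database": lambda: False,
        "cache": _unreachable,
    }))
```

Treating an exception as a failure is a deliberate choice here: a monitoring loop that dies on the first broken check would hide every other problem.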
The most popular solutions:
To monitor the status of our clients’ servers, we at ASAPLabs use Zabbix, which describes itself as "the ultimate Enterprise-class monitoring platform". It is highly customizable and copes well with numerous components across different types of projects running simultaneously.
The main rule when configuring monitoring is to check as many system indicators as possible. The more we know, the better.
The main task of status monitoring is to report problems. In practice, this usually means an email or an SMS message. Roughly 90% of the effectiveness of monitoring depends on configuring notifications correctly.
First of all, it is very useful to have a dashboard with the most important metrics. Most often these come from the nodes that are directly responsible for generating a response to the user’s request:
The setting of notifications usually follows these principles:
Selecting the parameters. Not every parameter needs a notification. Some are fundamental (for example, the availability of a web server); others are auxiliary (for example, the number of open file descriptors).
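One way to picture this selection is to tag every check as fundamental or auxiliary and route failures accordingly. The sketch below is only an assumption about how such a split could look: the names `Severity`, `Check`, and `route_notifications`, and the idea of sending auxiliary failures to a periodic digest rather than an immediate alert, are illustrative and not part of any particular tool.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    FUNDAMENTAL = "fundamental"  # e.g. availability of the web server
    AUXILIARY = "auxiliary"      # e.g. number of open file descriptors


@dataclass
class Check:
    name: str
    severity: Severity
    healthy: bool


def route_notifications(checks):
    """Split failed checks into (urgent, digest) lists of check names.

    Assumption: fundamental failures are sent immediately (email/SMS),
    auxiliary failures are only collected for a periodic report.
    """
    urgent = [c.name for c in checks
              if not c.healthy and c.severity is Severity.FUNDAMENTAL]
    digest = [c.name for c in checks
              if not c.healthy and c.severity is Severity.AUXILIARY]
    return urgent, digest


if __name__ == "__main__":
    urgent, digest = route_notifications([
        Check("web_server_available", Severity.FUNDAMENTAL, healthy=False),
        Check("open_file_descriptors", Severity.AUXILIARY, healthy=False),
        Check("database_available", Severity.FUNDAMENTAL, healthy=True),
    ])
    print("notify now:", urgent)
    print("daily digest:", digest)
```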
Setting up notifications is not a one-time job. It has to be revisited continuously, because priorities change and new metrics appear. Observe the following rules:
For our own servers, we went further and added a second system that monitors Zabbix itself. It ensures that all notifications work properly: if the primary notification system becomes unstable or fails to send a warning, the secondary system alerts us immediately.
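A common way to build this kind of "monitor for the monitor" is a heartbeat: the primary system reports in regularly, and the secondary one raises an alarm when the reports stop. The sketch below illustrates only that idea. The class name `HeartbeatWatchdog`, its methods, and the 60-second window are assumptions for the example, not a description of our actual setup.

```python
import time


class HeartbeatWatchdog:
    """Secondary monitor that notices when the primary stops reporting in."""

    def __init__(self, max_silence_seconds, clock=time.monotonic):
        self.max_silence_seconds = max_silence_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._last_beat = clock()

    def beat(self):
        """Called by the primary monitoring system after each successful cycle."""
        self._last_beat = self._clock()

    def primary_is_silent(self):
        """True when no heartbeat arrived within the allowed window."""
        return self._clock() - self._last_beat > self.max_silence_seconds


if __name__ == "__main__":
    watchdog = HeartbeatWatchdog(max_silence_seconds=60)
    watchdog.beat()
    if watchdog.primary_is_silent():
        print("Primary monitoring is silent, alert through a separate channel")
    else:
        print("Primary monitoring is alive")
```

The important design point is that the secondary alert must travel through a channel that does not depend on the primary system; otherwise both fail together.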
Knowing the current status of the system is not enough to make predictions. Clearly, it is better to prevent a problem than to react to it. This requires systems that collect and store historical data on how the indicators change. Such systems work in the same way as status monitoring, but they usually collect many more indicators and store the entire history of their changes.
The most popular solutions are:
Analyzing historical data makes it possible to predict when scaling will be needed. In addition to the usual metrics, such as CPU utilization and the amount of available memory, higher-level indicators should be included here:
Trend collection systems also let you configure thresholds and notifications. Set these thresholds slightly lower than the ones in the status monitoring system, so that you receive advance notice of possible future problems.
We constantly track the performance of our servers, build graphs, and analyze them to reveal trends. Our monitoring algorithm lets us see growing trends live, anticipate where they lead, and react quickly. If we see a clear upcoming threat to server stability, we take immediate action to prevent or withstand it.
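To make "advance notice" more concrete, here is a minimal sketch that fits a straight line to recent samples and estimates how long it will take a metric to reach its threshold. Everything in it is an assumption for illustration: the function name `hours_until_threshold`, the hourly sampling, and the choice of a simple least-squares line. Real trend systems offer their own forecasting functions.

```python
def hours_until_threshold(samples, threshold):
    """Estimate hours from the last sample until `threshold` is reached.

    `samples` is a list of (hour, value) pairs. Returns None when there
    is too little data or the metric is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp, no trend to fit
    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None  # flat or falling metric never reaches the threshold
    intercept = mean_v - slope * mean_t
    crossing_hour = (threshold - intercept) / slope
    last_hour = samples[-1][0]
    return max(0.0, crossing_hour - last_hour)


if __name__ == "__main__":
    # Hypothetical disk usage in percent, sampled once per hour.
    disk_usage = [(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0)]
    print(hours_until_threshold(disk_usage, threshold=80.0))
```

With a status threshold of, say, 90%, running the same estimate against a lower warning threshold such as 80% is what gives the head start described above.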
The normal operation of every component does not always mean that the application itself works properly. Problems such as broken registration or an incorrect link in an email will not show up in the monitoring systems described above. Many problems are also temporary or limited in scope, for example, an unavailable social authorization service or slow page loads for users from a particular region.
That is why you need to monitor business metrics. Many analytics systems, such as Google Analytics, allow you to conduct a detailed analysis of historical data. However, such tools are inconvenient for detecting deviations in real time.
There are tools that collect simple statistics and display them live, such as ioTrack. Integration is as simple as adding a counter to certain events.
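The "counter on certain events" idea can be sketched in a few lines. This is not the API of ioTrack or of any other product. The class `EventCounter`, its methods, and the one-minute buckets are hypothetical and only show the shape of such an integration.

```python
from collections import Counter


class EventCounter:
    """Counts named events in fixed time buckets (one minute by default)."""

    def __init__(self, bucket_seconds=60):
        self.bucket_seconds = bucket_seconds
        self._counts = Counter()

    def _bucket(self, timestamp):
        return int(timestamp // self.bucket_seconds)

    def track(self, event, timestamp):
        """Record one occurrence of `event` at `timestamp` (seconds)."""
        self._counts[(event, self._bucket(timestamp))] += 1

    def count(self, event, timestamp):
        """Occurrences of `event` in the bucket that contains `timestamp`."""
        return self._counts[(event, self._bucket(timestamp))]


if __name__ == "__main__":
    import time

    counter = EventCounter()
    now = time.time()
    counter.track("registration", now)  # one line at the point of the event
    print(counter.count("registration", now))
```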
Common examples of business-level metrics that can be tracked are:
Collecting such data makes it possible to detect deviations not only in the operation of the system but also in its environment. For example, spam attacks often change several business metrics dramatically, even though they may not affect the system load.
It is very convenient to display a few key values on dashboards on separate monitors in the office. This not only keeps everyone aware of problems, but also provides “live” information about application performance.
Remember, the task of monitoring is to provide information about failures in the operation of the server. It is not a one-time setup: monitoring must change together with the application itself.
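A simple way to spot such deviations is to compare the current value of a business metric with its recent history. The sketch below flags a value that lies more than a few standard deviations from the historical mean. The function name `is_anomalous`, the tolerance of 3, and the sample numbers are assumptions for illustration; production systems usually also account for daily and weekly seasonality.

```python
from statistics import mean, pstdev


def is_anomalous(history, current, tolerance=3.0):
    """True when `current` deviates strongly from the values in `history`.

    `history` holds past values of one metric (e.g. registrations per
    hour); `tolerance` is the allowed distance in standard deviations.
    """
    if len(history) < 2:
        return False  # not enough data to judge
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # A perfectly constant history: any change is a deviation.
        return current != baseline
    return abs(current - baseline) > tolerance * spread


if __name__ == "__main__":
    # Hypothetical registrations per hour over the last five hours.
    registrations = [100, 104, 96, 102, 98]
    print(is_anomalous(registrations, 103))  # ordinary fluctuation
    print(is_anomalous(registrations, 400))  # e.g. a spam registration wave
```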