Abstract:
From a simple e-commerce website to a highly volatile stock trading application,
system availability is crucial. If the application ecosystem is unavailable to
users, it not only erodes their trust but can also lead to lost profits and even
lawsuits against the company, especially in the stock market sector.
Software and system engineers put enormous effort into keeping the system
available to users every second of the day.
When a system must serve millions of users real-time data about every stock on a
few exchanges, and the systems themselves grow more complex, it is inefficient for
a small team of system engineers to monitor simple statistics by hand.
This research focuses on monitoring those conditions and ensuring that every
service is up, running, and healthy, using deep learning to detect abnormal
system behavior relative to historical data.
The historical data includes application and infrastructure metrics such as
CPU usage, memory (RAM) usage, and I/O throughput on disks and networks.
Usually, these metrics depend on how the system is used. For example, an
e-commerce website may have peak times when users visit the site and search for
products, while in a stock trading application ecosystem the market schedule and
the number of trades performed affect the CPU or memory usage of the
client-facing applications and relevant microservices. In this research, we also
focus on systems whose metrics exhibit time-series patterns, in order to
increase the accuracy of the predictions.
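
To make the idea of comparing a metric against its time-dependent history concrete, the sketch below flags a reading that deviates strongly from what was seen at the same hour of day in the past. It is only an illustration: it uses a simple per-hour mean and standard deviation as a stand-in for the deep learning model this research proposes, and the function names, the threshold of 3 standard deviations, and the sample CPU values are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_hourly_baseline(history):
    """Group historical (hour_of_day, value) samples by hour and
    return {hour: (mean, standard deviation)}."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    # stdev needs at least two samples, so hours with fewer are skipped.
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_anomalous(baseline, hour, value, threshold=3.0):
    """Flag a reading that deviates from the same hour's historical
    mean by more than `threshold` standard deviations."""
    if hour not in baseline:
        return False  # no history for this hour, so nothing to compare against
    mu, sigma = baseline[hour]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical CPU usage (%) history: quiet at 03:00, busy at 10:00.
history = [(3, 10.0), (3, 12.0), (3, 11.0), (10, 70.0), (10, 75.0), (10, 72.0)]
baseline = build_hourly_baseline(history)
```

With this made-up history, a CPU reading of 70% is treated as normal at 10:00 but flagged at 03:00, which is the kind of usage-dependent behavior a fixed threshold on the raw metric would miss.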