Part 1: An introduction to ELK: The good, the bad and the ugly
If you are a SaaS or software-driven business, you are probably familiar with Elasticsearch as a stand-alone product. In this blog, we will provide an overview of the Elasticsearch, Logstash, and Kibana (ELK) stack, a grouping of technologies from Elastic that, when integrated, creates a very powerful open-source log management solution for the "hot" front end of time-series data (i.e. the most recent 30 days or so).
In Part 2 of this blog series, we are also excited to introduce Chaos Sumo to this log management and analytics picture — an equally powerful log and event data service that tackles the storage, management and analytics of historical data (i.e. days, weeks, months and years!).
Want to know how you can use them together to cut costs and more easily manage and extract value out of your log data, short-term and over multiple years? Read on!
What is the ELK Stack?
The ELK stack consists of Elasticsearch, Logstash, and Kibana. Although the three are built to integrate well together, each is a separate open-source project driven by Elastic, a company that began as an enterprise search vendor and, on the strength of Elasticsearch's wide adoption for analytics, has grown into a full-service analytics software company. Here's an overview of each of the components:
Elasticsearch – Data Search
Elasticsearch is a search engine at its heart, with a myriad of use cases borne of its flexibility and ease of use. Built on Apache Lucene, Elasticsearch eases both the operational challenges (such as scalability and reliability) and the application-level needs (like free text search and autocomplete) that end users face.
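To make the free text search point concrete, here is a minimal sketch of a query in Elasticsearch's JSON query DSL (the index this would be sent to and the `message` field name are hypothetical, chosen for illustration):

```json
{
  "query": {
    "match": { "message": "connection timeout" }
  }
}
```

Sent to an index's `_search` endpoint, a `match` query like this performs analyzed full-text matching against the field, rather than an exact string comparison.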
Logstash – Data Routing
Collecting data is where it all starts. Logstash is capable of collecting machine data from many sources, but where it really shines is its wide ecosystem of open-source plugins for enriching data in flight. For example, collecting web server logs is useful on its own, but parsing the User-Agent string with the useragent filter to extract browser and device statistics makes those logs far more valuable.
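As a sketch, a minimal Logstash pipeline applying the useragent filter might look like the following. The port, the `agent` source field, and the Elasticsearch address are illustrative assumptions, not values from this article:

```
input {
  beats {
    port => 5044            # illustrative: receive events from a Beats shipper
  }
}

filter {
  useragent {
    source => "agent"       # assumed field holding the raw User-Agent string
    target => "ua"          # parsed browser/OS/device fields land under "ua"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```

With this in place, each web server event arrives in Elasticsearch with structured fields (browser name, OS, device) alongside the original log line, ready for aggregation in Kibana.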
Kibana – Data Visualization
Kibana is a browser-based visualization tool for Elasticsearch. It enables users to easily consume aggregated data that can be difficult to process – making logs, metrics, and unstructured data searchable and more usable.
The ELK Stack in a SaaS Business
Log management is a structured set of policies, procedures and tools used to administer and manage the ingestion, storage, analysis and archiving of growing volumes of log and event data generated from a SaaS application.
Per Wikipedia, a log file is a file that records either events that occur in an operating system or other software application, or messages between different users of communication software. Logging is the act of keeping a log. In the simplest case, messages are written to a single log file. Logs are, at their core, time-series data: a system or application log is automatically generated and time-stamped, and serves as a system of record for events relating to a particular system or service. Every SaaS application or IT system (i.e. compute resource) produces log files.
There are a multitude of answers to critical SaaS business questions living in the massive amounts of log and event data that your applications and systems are generating — answers to questions such as:
- How many account signups this week?
- What is the effectiveness of our ad campaign?
- What is the best time to perform system maintenance?
- Why is my database performance slow?
- How are our top customers using the service?
Additionally, effective log management is essential to both security and compliance. Monitoring, documenting and analyzing system events is a crucial component of security intelligence, and regulations such as HIPAA and the Sarbanes-Oxley Act impose specific mandates relating to audit logs and compliance.
In a SaaS business, data constantly flows into your systems, with some businesses generating many hundreds of GBs of log and event data daily. That data piles up quickly. As it grows, you have to expand your clusters to keep analytics performance from degrading and insights from slowing to a crawl. The big challenge is maintaining long-term, valuable business insights over big data sets as your data grows, without breaking the bank.
The Challenges of ELK
The ELK Stack is a wonderful solution for managing hot log and event data. However, as a set of open-source tools straight out of GitHub, it presents several challenges when run in production, including:
- Planning your deployment (years 1-3)
- The amount of log data your systems will generate
- Procuring infrastructure or picking a cloud or hosting vendor
- Resources required to customize ELK for log management
- Maintaining, upgrading and patching the system
- Operating the system on an ongoing basis
In addition, there’s no management functionality and no ability to centralize or organize log data for easy access and retrieval. ELK doesn’t support relational analytics, so there’s no way to perform joins and correlations, or a deeper analysis across your log data. Elasticsearch, by the nature of its design and architecture, is expensive to scale, and data indexed through Lucene can expand up to 5x in size. So as log data piles up, the cost of a hot ELK stack ingesting 200 GB/day can easily reach many thousands of dollars per month. Your only options for managing historic log data are to prune it, archive it, or delete it.
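The storage arithmetic behind that cost is easy to sketch. Taking the figures above (200 GB/day of raw logs, up to 5x index expansion) plus two hypothetical assumptions not stated in the text, a 30-day hot retention window and one replica per shard:

```python
# Back-of-envelope hot-storage estimate for an ELK cluster.
# raw volume and expansion factor come from the article;
# retention and replica count are illustrative assumptions.
raw_gb_per_day = 200
index_expansion = 5      # worst-case Lucene index growth cited above
retention_days = 30      # assumed hot retention window
replicas = 1             # assumed: one replica copy of each shard

indexed_gb_per_day = raw_gb_per_day * index_expansion
hot_storage_gb = indexed_gb_per_day * retention_days * (1 + replicas)

print(f"{indexed_gb_per_day} GB/day indexed, ~{hot_storage_gb / 1000:.0f} TB of hot storage")
# prints: 1000 GB/day indexed, ~60 TB of hot storage
```

Tens of terabytes of fast, replicated storage (plus the nodes to serve queries over it) is where the monthly bill climbs into the thousands of dollars, which is why retention windows get trimmed.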
So, how can you tackle some of the challenges described above? Stay tuned for Part 2 in this blog series where we will talk about how to achieve both manageable, efficient real-time insights using ELK and cost-effective, historical log data visibility with Chaos Sumo and S3 to better understand business trends and insights.