This ebook is designed to help developers, DevOps engineers, and operations teams that run and manage applications on top of AWS to effectively analyze their log data to get visibility into application layers, operating system layers, and different AWS services. This booklet is a step-by-step guide to retrieving log data from all cloud layers and then visualizing and correlating these events to give a clear picture of one’s entire AWS infrastructure.
Why should you look at your logs?
Cloud applications are inherently more distributed and built out of a series of components that need to operate together to deliver a service to the end user successfully. Analyzing logs becomes imperative in cloud environments because the practice allows relevant teams to see how all of the building blocks of a cloud application are orchestrated independently and in correlation with the rest of the components.
Why ELK (Elasticsearch, Logstash, and Kibana)?
ELK is the most common log analytics platform in the world. It is used by companies including Netflix, LinkedIn, Facebook, Google, Microsoft, and Cisco. ELK is an open source stack of three tools (Elasticsearch, Logstash, and Kibana) that parse, index, and visualize log data (and, yes, it’s free).
So instead of going through the challenging task of building a production-ready ELK stack internally, users can sign up for Logz.io and start working in a matter of minutes. In addition, Logz.io’s ELK as a service includes alerts, multi-user and role-based access, and unlimited scalability. On top of providing an enterprise-grade ELK platform as a service, Logz.io employs unique machine-learning algorithms to automatically surface critical log events before they impact operations, giving users unprecedented operational visibility into their systems.
Analyzing Application Logs
Why should I analyze application logs?
Application logs are fundamental to any troubleshooting process. This has always been true — even for mainframe applications and those that are not cloud-based. With the pace at which instances are spawned and decommissioned, the only way to troubleshoot an issue is to first aggregate all of the application logs from all of the layers of an application. This enables you to follow transactions across all layers within an application’s code.
How do I ship application logs?
There are dozens of ways to ship application logs. The best method to use depends on the type of application, the format of the logs, and the operating system. For example, Java applications running on Linux servers can use Logstash or logstash-forwarder (a lightweight version that includes encryption) or ship logs directly from the application layer using a log4j appender over HTTP/HTTPS. You can read more in our essay on The 6 Must-Dos in Modern Log Management.
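As a minimal sketch of what an HTTP(S) appender does before posting to a log endpoint, the snippet below structures an application log event as JSON (the application name and field names are illustrative, not a required schema):

```python
import json
from datetime import datetime, timezone

def build_log_event(level, message, app="my-java-app"):
    """Structure a log event as JSON, the way an HTTP(S) appender
    would before POSTing it to a log endpoint. The field names here
    are illustrative; "@timestamp" matches the Logstash convention."""
    return json.dumps({
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        "application": app,
    })

event = build_log_event("ERROR", "payment service timed out")
print(event)
```

Shipping the event then amounts to sending this JSON over HTTP/HTTPS to the collector; the structuring step is what lets Elasticsearch index each field separately.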
Analyzing Infrastructure Logs
What are infrastructure logs?
We consider everything that is not the proprietary application code itself to be an infrastructure log. These include system logs, database logs, web server logs, network device logs, security device logs, and countless others.
Why should I analyze infrastructure logs?
Infrastructure logs can shed light on problems in the code that is running or supporting your application. Performance issues can be caused by overutilized or broken databases or web servers, so it is crucial to analyze those log files especially when correlated with the application logs. While troubleshooting performance issues, we’ve seen many cases in which the root cause was a Linux kernel issue. Overlooking such low-level logs can make forensics processes long and fruitless. Read more about why it’s important to ship OS logs in our essay on Lessons Learned from Elasticsearch Cluster Disconnects.
How do I ship infrastructure logs?
Shipping infrastructure logs is usually done with open-source agents such as rsyslog, Logstash, logstash-forwarder, or NXLog that read the relevant operating system files such as access logs, kern.log, and database events. You can read about more methods to ship logs here.
Monitoring System Performance with ELK
One of the challenges organizations face when troubleshooting performance issues is that they look at one dashboard that shows performance metrics and at another to troubleshoot issues and analyze logs. In many cases, it’s possible to use a single dashboard that shows both the performance metrics and the visualized log data generated by all of the components of your system. Performance issues are often related to events in application stacks that are recorded in log files, so collecting system performance metrics and shipping them as log entries enables quick correlations between performance issues and their respective events in the logs.
How do I ship performance metrics?
To use ELK to monitor your platform’s performance, run probes on each host to collect system performance metrics. Software service operations teams can then visualize the data with the Kibana part of ELK and use the resulting charts to present their results.
For example, we encapsulated Collectl in a Docker container to have a Docker image that covered all of our data collecting and shipping needs. Read more and get a download on our site: How to Use ELK to Monitor Platform Performance.
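To illustrate the general idea of shipping metrics as log entries (the host name and field names below are illustrative, and os.getloadavg is POSIX-only), a probe can sample a system metric and emit it as a JSON log line that ELK can index alongside application logs:

```python
import json
import os
import time

def load_metric_event(host="web-01"):
    """Sample the 1/5/15-minute load averages and emit them as a
    JSON log line so the metric can be indexed like any other log
    entry. os.getloadavg() is POSIX-only."""
    one, five, fifteen = os.getloadavg()
    return json.dumps({
        "timestamp": int(time.time()),
        "host": host,
        "load_1m": one,
        "load_5m": five,
        "load_15m": fifteen,
    })

print(load_metric_event())
```

A cron job or daemon emitting such lines at a fixed interval gives Kibana a time series it can chart next to the log events from the same host.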
Monitoring ELB Logs
What are ELB log files?
ELB is Amazon Web Services’ EC2 load balancer. ELB logs are a record of all of the traffic running through the ELB. This data includes where the ELB was accessed from, which internal machines were accessed, the identity of the requester (e.g., the operating system and browser), and additional metrics such as processing time and traffic volume.
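A classic ELB access log line is a space-separated record with quoted request and user-agent strings. As a sketch (the sample line below is illustrative), the fields can be split into a dictionary ready for indexing:

```python
import shlex

# Field order of the classic ELB access log format.
ELB_FIELDS = [
    "timestamp", "elb", "client_port", "backend_port",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code",
    "backend_status_code", "received_bytes", "sent_bytes",
    "request", "user_agent", "ssl_cipher", "ssl_protocol",
]

def parse_elb_line(line):
    """Split an ELB access log line into named fields; shlex keeps
    the quoted request and user-agent strings as single tokens."""
    return dict(zip(ELB_FIELDS, shlex.split(line)))

line = ('2015-05-13T23:39:43.945958Z my-loadbalancer '
        '192.168.131.39:2817 10.0.0.1:80 0.000073 0.001048 0.000057 '
        '200 200 0 29 "GET http://www.example.com:80/ HTTP/1.1" '
        '"curl/7.38.0" - -')
entry = parse_elb_line(line)
print(entry["elb_status_code"], entry["client_port"])
```

Once each field has a name, it becomes straightforward to chart status codes or processing times per backend in Kibana.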
How can I use ELB log files?
There are many uses for ELB logs, but the main reasons are to check the operational health of the ELB and its efficient operation. In the context of operational health, you might want to determine whether your traffic is being equally distributed amongst all internal servers. For operational efficiency, you might want to identify the volume of access that you are getting from different locations in the world. You can visit the ELK Labs and search for “ELB” to find different visualizations, dashboards, and alerts.
Monitoring CloudTrail Logs
What are CloudTrail log files?
CloudTrail is a logging mechanism of Amazon Web Services’ EC2 that records all of the changes made in an environment. This is a very powerful and robust tool that gives a different set of events for each EC2 object that can be leveraged according to the desired use. EC2 log events include, among other things, access to the EC2 account and changes to security groups as well as activation and termination of machines and services.
How can I use CloudTrail log files?
CloudTrail logs are very powerful and have many uses. One of the main uses revolves around auditing and security. For example, we monitor access and receive internal alerts on suspicious activity in our environment. Two important things to remember: Keep track of any changes being made to security groups and VPC access levels, and monitor your machines and services to ensure that they are being used properly by the proper people. You can visit the ELK Labs and search for “CloudTrail” to find different visualizations, dashboards, and alerts.
How can I ship CloudTrail log files?
CloudTrail logs are easy to configure because they ship to S3 buckets. As opposed to some EC2 services, CloudTrail logs can be collected from all regions and availability zones into a single S3 bucket. Once the files are in the S3 bucket, you can configure read-only access to that bucket here.
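The files CloudTrail delivers to S3 are JSON documents with a "Records" array. As a sketch of the auditing use case described above (the record values here are made up), the fields a security review usually cares about can be pulled out like this:

```python
import json

# A trimmed CloudTrail record of the kind found in the JSON files
# CloudTrail delivers to S3. The values are made up for illustration.
raw = json.dumps({
    "Records": [{
        "eventTime": "2016-03-01T12:00:00Z",
        "eventSource": "ec2.amazonaws.com",
        "eventName": "AuthorizeSecurityGroupIngress",
        "awsRegion": "us-east-1",
        "sourceIPAddress": "203.0.113.10",
        "userIdentity": {"type": "IAMUser", "userName": "alice"},
    }]
})

def security_events(payload):
    """Extract who did what, when, and from where out of a
    CloudTrail delivery file."""
    out = []
    for rec in json.loads(payload)["Records"]:
        out.append({
            "when": rec["eventTime"],
            "what": rec["eventName"],
            "who": rec.get("userIdentity", {}).get("userName", "unknown"),
            "from_ip": rec.get("sourceIPAddress"),
        })
    return out

print(security_events(raw))
```

Indexing these flattened fields makes it easy to alert on security-group changes or on activity from unexpected source IPs.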
AWS VPC Flow Logs
What are VPC Flow Logs?
VPC flow logs provide the ability to log all of the traffic that happens within an AWS VPC (Virtual Private Cloud). The captured data includes details about allowed and denied traffic (based on security group and network ACL rules) as well as source and destination IP addresses, ports, IANA protocol numbers, packet and byte counts, time intervals during which flows were observed, and actions (ACCEPT or REJECT).
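A default-format flow log record is a single space-separated line carrying exactly the fields listed above. As a sketch (the sample record below is illustrative), mapping it onto field names makes the action and ports queryable:

```python
# Field order of the default (version 2) VPC flow log format.
FLOW_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

def parse_flow_record(record):
    """Map a default-format VPC flow log record onto its field names."""
    return dict(zip(FLOW_FIELDS, record.split()))

record = ("2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
flow = parse_flow_record(record)
print(flow["action"], flow["dstport"])
```

This record, for example, shows an accepted SSH connection (destination port 22, protocol 6, i.e., TCP) — exactly the kind of field combination the alerting use cases below rely on.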
How can I use the VPC logs?
VPC flow logs can be turned on for a specific VPC, VPC subnet, or an Elastic Network Interface (ENI). The most common uses relate to the operability of the VPC. You can visualize rejection rates to identify configuration issues or system misuses, correlate increases in traffic flow with load in other parts of systems, and verify that only specific sets of servers belonging to the VPC are being accessed. You can also make sure the right ports are being accessed from the right servers and receive alerts whenever certain ports are being accessed. You can visit ELK Labs and search for “VPC” to find different visualizations, dashboards, and alerts.
How can I ship VPC logs?
Once enabled, VPC flow logs are stored in CloudWatch Logs, and you can extract them to a third-party log analytics service via several methods. The two most common methods are to direct them to a Kinesis stream or to dump them to S3 using a Lambda function. At Logz.io, we recommend using a third-party open-source tool to dump CloudWatch logs to S3. You can read more about the different methods here.
Monitoring CloudFront Logs
How can I use CloudFront logs?
CloudFront logs are used mainly for analysis and verification of the operational efficiency of the CDN. You can see error rates through the CDN, where the CDN is being accessed from, and what percentage of traffic is being served by the CDN. These logs, though very verbose, can reveal a lot about the responsiveness of your website as customers navigate it. You can visit ELK Labs at https://app.logz.io/#/labs and search for “CloudFront” to find different visualizations, dashboards, and alerts.
How can I ship CloudFront logs?
Once enabled, CloudFront will write data to your S3 bucket every hour or so. You can then pull the CloudFront logs to Logz.io by pointing to the relevant S3 bucket. Go here for additional assistance and to see examples of how to configure access.
S3 Access Logs
What are S3 access logs?
S3 access logs record an event for every access of an S3 bucket. The data includes the identities of the entities accessing the bucket, the identities of the buckets and their owners, and metrics on access time and turnaround time as well as the response codes that are returned.
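An S3 server access log line mixes unquoted fields with a bracketed timestamp and a quoted request string. As a sketch (the sample line and object key below are made up), a regular expression can name the leading fields:

```python
import re

# Matches the leading fields of an S3 server access log line.
S3_LOG = re.compile(
    r'(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<remote_ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+) "(?P<request_uri>[^"]*)" '
    r'(?P<status>\S+) (?P<error_code>\S+) (?P<bytes_sent>\S+)'
)

line = ('79a59df900b949e5 my-bucket [06/Feb/2016:00:00:38 +0000] '
        '192.0.2.3 79a59df900b949e5 3E57427F3EXAMPLE '
        'REST.GET.OBJECT photos/cat.jpg '
        '"GET /my-bucket/photos/cat.jpg HTTP/1.1" 200 - 5242880')
m = S3_LOG.match(line)
print(m.group("operation"), m.group("status"))
```

With the requester, remote IP, operation, and status broken out as separate fields, alerting on access from unexpected addresses or on spikes in error responses becomes a simple Kibana query.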
How can I use S3 Access logs?
Monitoring S3 access logs is a key part of securing AWS environments. You can determine from where and how buckets are being accessed and receive alerts on unauthorized access of your buckets. You can also leverage the information to receive performance metrics and analyses on such access to ensure that overall application response times are being properly monitored.
How can I ship S3 access logs?
Once enabled, S3 access logs are written to an S3 bucket of your choice. You can then pull the S3 access logs to Logz.io by pointing to the relevant S3 bucket. Go here for additional assistance and to see examples of configuring access.
Summary
ELK is a very powerful platform and can provide tremendous value when you invest the effort to generate a holistic view of your environment. When running on AWS, the majority of infrastructure logs can be added with a single click of a button to Logz.io’s ELK Cloud platform. In a matter of minutes, you’ll be able to leverage the auto-generated dashboards and alerts.
There are many uses for AWS logs that range from performing audits to maintaining security — and all of these uses can be supported with S3 access and CloudTrail logs and then monitored with CloudFront and VPC flow logs. Make sure to check out ELK Labs, the marketplace for auto-generated dashboards and alerts.