For any modern data-driven company, having smooth data integration pipelines is crucial. These pipelines pull data from various sources, transform it, and load it into destination systems for analytics and reporting. When running properly, they provide timely and trustworthy information. However, without vigilance, the varying data volumes, characteristics, and application behavior can cause data pipelines to become inefficient and problematic. Performance can slow down or pipelines can become unreliable. Undetected errors result in bad data and impact downstream analysis. That’s why robust monitoring and troubleshooting for data pipelines is crucial across the following four areas:
- Reliability
- Performance
- Throughput
- Resource utilization
Together, these four aspects of monitoring provide end-to-end visibility and control over a data pipeline and its operations.
Today we are pleased to announce a new class of Amazon CloudWatch metrics reported with your pipelines built on top of AWS Glue for Apache Spark jobs. The new metrics provide aggregate and fine-grained insights into the health and operations of your job runs and the data being processed. In addition to providing insightful dashboards, the metrics provide classification of errors, which helps with root cause analysis of performance bottlenecks and error diagnosis. With this analysis, you can evaluate and apply the recommended fixes and best practices for architecting your jobs and pipelines. As a result, you gain the benefit of higher availability, better performance, and lower cost for your AWS Glue for Apache Spark workload.
This post demonstrates how the new enhanced metrics help you monitor and debug AWS Glue jobs.
Enable the new metrics
The new metrics can be configured through the job parameter enable-observability-metrics.
The new metrics are enabled by default on the AWS Glue Studio console. To configure the metrics on the AWS Glue Studio console, complete the following steps:
- On the AWS Glue console, choose ETL jobs in the navigation pane.
- Under Your jobs, choose your job.
- On the Job details tab, expand Advanced properties.
- Under Job observability metrics, select Enable the creation of additional observability CloudWatch metrics when this job runs.
To enable the new metrics in the AWS Glue CreateJob and StartJobRun APIs, set the following parameters in the DefaultArguments property (for StartJobRun, the Arguments property):
- Key – --enable-observability-metrics
- Value – true
To enable the new metrics in the AWS Command Line Interface (AWS CLI), set the same job parameters in the --default-arguments argument.
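As a minimal sketch, the same parameter could be set from Python. The helper name with_observability_metrics and the job name, role, and script location below are hypothetical placeholders; the boto3 call is shown only as a comment and assumes boto3 is installed and credentials are configured.

```python
def with_observability_metrics(default_arguments=None):
    """Return a copy of the job arguments with observability metrics enabled."""
    args = dict(default_arguments or {})
    # Key and value exactly as listed in the steps above.
    args["--enable-observability-metrics"] = "true"
    return args


# Hypothetical job definition; replace every placeholder with your own values.
job_definition = {
    "Name": "my-glue-job",
    "Role": "MyGlueServiceRole",
    "Command": {
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/job.py",
    },
    "DefaultArguments": with_observability_metrics({"--job-language": "python"}),
}

# With boto3 available, the job could then be created with:
#   import boto3
#   boto3.client("glue").create_job(**job_definition)
print(job_definition["DefaultArguments"])
```

The helper copies the incoming dictionary instead of mutating it, so existing default arguments are preserved and the original mapping is left untouched.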
Use case
A typical workload for AWS Glue for Apache Spark jobs is to load data from a relational database to a data lake with SQL-based transformations. The following is a visual representation of an example job where the number of workers is 10.
When the example job ran, the workerUtilization metrics showed the following trend.
Note that workerUtilization showed values between 0.20 (20%) and 0.40 (40%) for the entire duration. This typically happens when the job capacity is over-provisioned and many Spark executors are idle, resulting in unnecessary cost. To improve resource utilization efficiency, it’s a good idea to enable AWS Glue Auto Scaling. The following screenshot shows the same workerUtilization metrics graph when AWS Glue Auto Scaling is enabled for the same job.
workerUtilization showed 1.0 in the beginning because of AWS Glue Auto Scaling, and it trended between 0.75 (75%) and 1.0 (100%) based on the workload requirements.
Query and visualize metrics in CloudWatch
Complete the following steps to query and visualize metrics on the CloudWatch console:
- On the CloudWatch console, choose All metrics in the navigation pane.
- Under Custom namespaces, choose Glue.
- Choose Observability Metrics (or Observability Metrics Per Source, or Observability Metrics Per Sink).
- Search for and select the specific metric name, job name, job run ID, and observability group.
- On the Graphed metrics tab, configure your preferred statistic, period, and so on.
Query metrics using the AWS CLI
Complete the following steps for querying using the AWS CLI (for this example, we query the worker utilization metric):
- Create a metric definition JSON file (provide your AWS Glue job name and job run ID).
- Run the get-metric-data command.
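The two steps above could be sketched as follows. This is an illustration under stated assumptions, not the exact file from the original post: the metric name glue.driver.workerUtilization, the dimension names and values, the period, and the statistic are assumptions that should be checked against what the CloudWatch console shows for your job. The function name and placeholder IDs are hypothetical.

```python
import json


def worker_utilization_queries(job_name, job_run_id, period=60, stat="Average"):
    """Build a metric-data-queries document for the worker utilization metric."""
    return [
        {
            "Id": "workerUtilization",
            "MetricStat": {
                "Metric": {
                    "Namespace": "Glue",
                    # Assumed metric name and dimensions; verify on the console.
                    "MetricName": "glue.driver.workerUtilization",
                    "Dimensions": [
                        {"Name": "JobName", "Value": job_name},
                        {"Name": "JobRunId", "Value": job_run_id},
                        {"Name": "Type", "Value": "gauge"},
                        {"Name": "ObservabilityGroup", "Value": "resource_utilization"},
                    ],
                },
                "Period": period,
                "Stat": stat,
            },
        }
    ]


# Placeholder identifiers; substitute your own job name and job run ID.
queries = worker_utilization_queries("<your-glue-job-name>", "<your-job-run-id>")
document = json.dumps(queries, indent=2)
print(document)

# Save the printed JSON as, for example, worker_utilization_query.json, then run:
#   aws cloudwatch get-metric-data \
#       --metric-data-queries file://worker_utilization_query.json \
#       --start-time <start-time> --end-time <end-time>
```

Generating the file from a function keeps the job name and run ID in one place, which helps when you compare several runs of the same job.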
Create a CloudWatch alarm
You can create static threshold-based alarms for the different metrics. For instructions, refer to Create a CloudWatch alarm based on a static threshold.
For example, for skewness, you can set an alarm for skewness.stage with a threshold of 1.0, and skewness.job with a threshold of 0.5. These thresholds are only a recommendation; you can adjust them based on your specific use case (for example, some jobs are expected to be skewed, and that is not an issue worth an alarm). Our recommendation is to evaluate the metric values of your job runs for some time before deciding which values are anomalous and configuring the alarm thresholds.
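The recommended thresholds could be expressed as alarm definitions along these lines. This is a hedged sketch: the glue.driver. metric name prefix, the dimension names and values, the statistic, and the period are assumptions to verify against your own metrics, and skewness_alarm is a hypothetical helper. The boto3 call appears only as a comment.

```python
# Thresholds recommended in the text above.
RECOMMENDED_THRESHOLDS = {"skewness.stage": 1.0, "skewness.job": 0.5}


def skewness_alarm(job_name, job_run_id, metric_suffix, threshold):
    """Build keyword arguments for a static-threshold CloudWatch alarm."""
    return {
        "AlarmName": f"{job_name}-{metric_suffix}",
        "Namespace": "Glue",
        # Assumed metric naming and dimensions; confirm on the CloudWatch console.
        "MetricName": f"glue.driver.{metric_suffix}",
        "Dimensions": [
            {"Name": "JobName", "Value": job_name},
            {"Name": "JobRunId", "Value": job_run_id},
            {"Name": "Type", "Value": "gauge"},
            {"Name": "ObservabilityGroup", "Value": "job_performance"},
        ],
        "Statistic": "Maximum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Placeholder identifiers; substitute your own job name and job run ID.
alarms = [
    skewness_alarm("<your-glue-job-name>", "<your-job-run-id>", suffix, value)
    for suffix, value in RECOMMENDED_THRESHOLDS.items()
]

# With boto3 installed and credentials configured, each alarm could be created with:
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   for alarm in alarms:
#       cloudwatch.put_metric_alarm(**alarm)
for alarm in alarms:
    print(alarm["AlarmName"], alarm["Threshold"])
```

Keeping the thresholds in a single dictionary makes it straightforward to tune them after you have observed the metric values of several job runs.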
Other enhanced metrics
For a full list of other enhanced metrics available with AWS Glue jobs, refer to Monitoring with AWS Glue Observability metrics. These metrics allow you to capture the operational insights of your jobs, such as resource utilization (memory and disk), normalized error classes such as compilation and syntax, user or service errors, and throughput for each source or sink (records, files, partitions, and bytes read or written).
Job observability dashboards
You can further simplify observability for your AWS Glue jobs using dashboards for the insight metrics that enable real-time monitoring using Amazon Managed Grafana, and enable visualization and analysis of trends with Amazon QuickSight.
Conclusion
This post demonstrated how the new enhanced CloudWatch metrics help you monitor and debug AWS Glue jobs. With these enhanced metrics, you can more easily identify and troubleshoot issues in real time. This results in AWS Glue jobs that experience higher uptime, faster processing, and reduced expenditures. The end benefit for you is more effective and optimized AWS Glue for Apache Spark workloads. The metrics are available in all AWS Glue supported Regions. Check it out!
About the Authors
Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling with his new road bike.
Shenoda Guirguis is a Senior Software Development Engineer on the AWS Glue team. His passion is in building scalable and distributed Data Infrastructure/Processing Systems. When he gets a chance, Shenoda enjoys reading and playing soccer.
Sean Ma is a Principal Product Manager on the AWS Glue team. He has an 18+ year track record of innovating and delivering enterprise products that unlock the power of data for users. Outside of work, Sean enjoys scuba diving and college football.
Mohit Saxena is a Senior Software Development Manager on the AWS Glue team. His team focuses on building distributed systems to enable customers with interactive and simple-to-use interfaces to efficiently manage and transform petabytes of data seamlessly across data lakes on Amazon S3, databases, and data warehouses on cloud.