Many customers are interested in boosting the productivity of their software development lifecycle by using generative AI. Recently, AWS announced the general availability of Amazon CodeWhisperer, an AI coding companion that uses foundational models under the hood to improve software developer productivity. With Amazon CodeWhisperer, you can quickly accept the top suggestion, view more suggestions, or continue writing your own code. The integration of CodeWhisperer with AWS Glue Studio notebooks reduces the overall time spent writing data integration and extract, transform, and load (ETL) logic. It also helps beginner-level programmers write their first lines of code. AWS Glue Studio notebooks allow you to author data integration jobs with a web-based serverless notebook interface.
In this post, we discuss real-world use cases for CodeWhisperer powered by AWS Glue Studio notebooks.
Solution overview
For this post, you use the CSV eSports Earnings dataset, available to download via Kaggle. The data is scraped from eSportsEarnings.com, which provides information on the earnings of eSports players and teams. The objective is to perform transformations using an AWS Glue Studio notebook with CodeWhisperer recommendations and then write the data back to Amazon Simple Storage Service (Amazon S3) in Parquet file format as well as to Amazon Redshift.
Prerequisites
Our solution has the following prerequisites:
- Set up AWS Glue Studio.
- Configure an AWS Identity and Access Management (IAM) role to interact with CodeWhisperer. Attach the following policy to the IAM role that is attached to the AWS Glue Studio notebook:
- Download the CSV eSports Earnings dataset and upload the CSV file highest_earning_players.csv to the S3 folder you will be using in this use case.
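The policy body referenced in the second prerequisite is not reproduced in this copy. As a minimal sketch, assuming the role only needs the codewhisperer:GenerateRecommendations action (the statement ID is an illustrative label, not a required value), a policy document could be generated as follows. Check it against the current AWS Glue documentation before attaching it.

```python
import json

# Assumed policy: grants only the CodeWhisperer recommendation action.
# The Sid below is an illustrative label chosen for this sketch.
codewhisperer_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CodeWhispererPermissions",
            "Effect": "Allow",
            "Action": ["codewhisperer:GenerateRecommendations"],
            "Resource": "*",
        }
    ],
}

# Print JSON that can be pasted into the IAM policy editor.
print(json.dumps(codewhisperer_policy, indent=2))
```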
Create an AWS Glue Studio notebook
Let's get started. Create a new AWS Glue Studio notebook job by completing the following steps:
- On the AWS Glue console, choose Notebooks under ETL jobs in the navigation pane.
- Select Jupyter Notebook and choose Create.
- For Job name, enter CodeWhisperer-s3toJDBC.
A new notebook is created with the sample cells shown in the following screenshot.
We use only the second cell for now, so you can remove all the other cells.
- In the second cell, update the interactive session configuration by setting the following:
- Worker type to G.1X
- Number of workers to 3
- AWS Glue version to 4.0
- Also import the DynamicFrame module and the current_timestamp function as follows:
After you make these changes, the notebook should look like the following screenshot.
Now, let's ensure CodeWhisperer is working as intended. At the bottom right, you will find the CodeWhisperer option beside the Glue PySpark status, as shown in the following screenshot.
You can choose CodeWhisperer to view the options for using Auto-Suggestions.
Develop your code using CodeWhisperer in an AWS Glue Studio notebook
In this section, we show how to develop an AWS Glue notebook job with Amazon S3 as the data source and JDBC data sources as the target. For our use case, we need to ensure Auto-Suggestions are enabled. Get your recommendations from CodeWhisperer by completing the following steps:
- Write a comment in natural language (in English) to read the CSV files from your S3 bucket:
After you enter the preceding comment and press Enter, the CodeWhisperer button at the bottom of the page shows that it is working on the recommendation. The recommendation appears on the next line, and you accept the code by pressing Tab. You can learn more in User actions.
After you enter the preceding comment, CodeWhisperer generates a code snippet that is similar to the following:
Note that you need to update the paths to match the S3 bucket you're using instead of the CodeWhisperer-generated bucket.
In the preceding code snippet, CodeWhisperer used Spark DataFrames to read the CSV files.
- You can now try some rephrasing to get a suggestion that uses DynamicFrame functions:
Now CodeWhisperer generates a code snippet that is close to the following:
Rephrasing the comment shows that, after some modifications to what we wrote, we got the correct recommendation from CodeWhisperer.
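The generated snippet itself is not reproduced here. As a rough sketch of what a DynamicFrame-based read of headered CSV files can look like, assuming glue_context is the GlueContext object created by the notebook session and that the bucket path is a placeholder, the call could be arranged like this:

```python
def csv_read_kwargs(s3_path):
    """Build keyword arguments for reading CSV files that have a header row."""
    return {
        "connection_type": "s3",
        "connection_options": {"paths": [s3_path]},
        "format": "csv",
        "format_options": {"withHeader": True},
    }


def read_players(glue_context, s3_path):
    """Read the CSV files under s3_path into an AWS Glue DynamicFrame.

    glue_context is assumed to be the GlueContext of the running session.
    """
    return glue_context.create_dynamic_frame.from_options(**csv_read_kwargs(s3_path))


# Hypothetical bucket and prefix; replace with your own location.
read_kwargs = csv_read_kwargs("s3://your-bucket/esports/")
print(read_kwargs)
```

Keeping the arguments in a small helper makes it easy to compare them with whatever CodeWhisperer suggests before running the read.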
- Next, use CodeWhisperer to print the schema of the preceding AWS Glue DynamicFrame by using the following comment:
CodeWhisperer generates a code snippet that is close to the following:
We get the following output.
Now we use CodeWhisperer to create some transformation functions that manipulate the AWS Glue DynamicFrame read earlier. We start by entering code in a new cell.
- First, test whether CodeWhisperer can use the correct AWS Glue context functions, such as ResolveChoice:
CodeWhisperer recommended a code snippet similar to the following:
The preceding code snippet doesn't accurately represent the comment that we entered.
- You can paraphrase and simplify the request by providing the following three comments. Each one makes a different request, and we use the withColumn Spark DataFrame method, which is used for casting column types:
CodeWhisperer picks up the preceding comments and recommends the following code snippets in sequence:
The following output confirms that the PlayerId column is changed from string to integer.
- Apply the same process to the resulting AWS Glue DynamicFrame for the TotalUSDPrize column, casting it from string to long with the withColumn Spark DataFrame method, by entering the following comments:
The recommended code snippet is similar to the following:
The output schema of the preceding code snippet is as follows.
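The two casting steps above follow the same pattern: convert the DynamicFrame to a Spark DataFrame, cast with withColumn, and convert back. The following is a hedged sketch of that pattern, not the code CodeWhisperer produced; the helper names and the frame name "casted" are made up for illustration.

```python
def cast_columns(df, casts):
    """Return df with each column in casts cast to the given Spark SQL type name."""
    for column, type_name in casts.items():
        df = df.withColumn(column, df[column].cast(type_name))
    return df


def cast_dynamic_frame(dyf, glue_context, casts, name="casted"):
    """Convert a DynamicFrame to a DataFrame, cast columns, and convert back."""
    # Imported here because awsglue is only available inside an AWS Glue session.
    from awsglue.dynamicframe import DynamicFrame

    return DynamicFrame.fromDF(cast_columns(dyf.toDF(), casts), glue_context, name)


# Casts described in the text: PlayerId to integer, TotalUSDPrize to long.
CASTS = {"PlayerId": "integer", "TotalUSDPrize": "long"}
```

In a notebook cell this would be called as cast_dynamic_frame(dyf, glueContext, CASTS), followed by printSchema() on the result to confirm the new types.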
Next, we try to get a code recommendation that computes the average prize for each player according to their country code.
- To do so, start by getting the count of players per country:
The recommended code snippet is similar to the following:
We get the following output.
- Join the main DataFrame with the country code count DataFrame and then add a new column calculating the average highest prize for each player according to their country code:
The recommended code snippet is similar to the following:
The schema output now confirms that both DataFrames were correctly joined and the Count column was added to the main DataFrame.
- Get a code recommendation to calculate the average TotalUSDPrize for each country code and add it to a new column:
The recommended code snippet is similar to the following:
The output of the preceding code should look like the following.
- Join the country_code_sum DataFrame with the main DataFrame from earlier and get the average of the prizes per player per country:
The recommended code snippet is similar to the following:
- The last part of the transformation phase is to sort the data by the highest average prize per player per country:
The recommended code snippet is similar to the following:
The first five rows will be similar to the following.
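To make the arithmetic behind these steps concrete, here is a plain-Python sketch on a few made-up rows. It reads the steps as: count players per country, sum TotalUSDPrize per country, attach both to each row, divide, and sort. That reading, the sample values, and the AveragePrizePerCountry column name are assumptions; in the notebook the same logic would be expressed with Spark groupBy, join, and orderBy calls.

```python
from collections import defaultdict

# Made-up rows for illustration only; not taken from the real dataset.
players = [
    {"PlayerId": 1, "CountryCode": "dk", "TotalUSDPrize": 300},
    {"PlayerId": 2, "CountryCode": "dk", "TotalUSDPrize": 100},
    {"PlayerId": 3, "CountryCode": "fi", "TotalUSDPrize": 500},
]

counts = defaultdict(int)  # number of players per country code
sums = defaultdict(int)    # total prize money per country code
for row in players:
    counts[row["CountryCode"]] += 1
    sums[row["CountryCode"]] += row["TotalUSDPrize"]

# "Join" the aggregates back onto each row and derive the average.
enriched = [
    {
        **row,
        "Count": counts[row["CountryCode"]],
        "AveragePrizePerCountry": sums[row["CountryCode"]] / counts[row["CountryCode"]],
    }
    for row in players
]

# Sort so the highest average comes first, then show the first five rows.
enriched.sort(key=lambda r: r["AveragePrizePerCountry"], reverse=True)
for row in enriched[:5]:
    print(row)
```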
For the last step, we write the DynamicFrame to Amazon S3 and to Amazon Redshift.
- Write the DynamicFrame to Amazon S3 with the following code:
The CodeWhisperer recommendation is similar to the following code snippet:
We need to correct the generated code snippet because it doesn't contain partition keys. Because partitionkeys is empty, we can ask for another code suggestion that sets partitionkey and then writes to the target Amazon S3 location. Also, according to the latest updates on writing DynamicFrames to Amazon S3 using glueparquet, format = "glueparquet" is no longer used. Instead, you need to use the parquet type with useGlueParquetWriter enabled.
After the updates, our code looks similar to the following:
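The corrected snippet is not reproduced here, so the following is only a sketch of a write that applies both fixes described above: a non-empty partition key list and the parquet format with useGlueParquetWriter enabled. The output path and the choice of CountryCode as the partition key are assumptions for illustration.

```python
def parquet_write_kwargs(frame, s3_path, partition_keys):
    """Build keyword arguments for a partitioned Parquet write to Amazon S3."""
    return {
        "frame": frame,
        "connection_type": "s3",
        "format": "parquet",
        "connection_options": {
            "path": s3_path,
            "partitionKeys": list(partition_keys),
        },
        # Replaces the older format="glueparquet" setting.
        "format_options": {"useGlueParquetWriter": True},
    }


def write_parquet(glue_context, frame, s3_path, partition_keys):
    """Write the DynamicFrame using the GlueContext of the running session."""
    return glue_context.write_dynamic_frame.from_options(
        **parquet_write_kwargs(frame, s3_path, partition_keys)
    )


# Hypothetical output location and partition column; frame is omitted here.
write_kwargs = parquet_write_kwargs(None, "s3://your-bucket/output/", ["CountryCode"])
print(write_kwargs["connection_options"])
```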
Another option here is to write the files to Amazon Redshift using a JDBC connection.
- First, enter the following comment to check whether CodeWhisperer understands a one-sentence request and uses the right functions:
The output for the comment is similar to the following code snippet:
As we can see, CodeWhisperer correctly interpreted the comment by selecting only the specified columns to write to Amazon Redshift.
- Now, use CodeWhisperer to write the DynamicFrame to Amazon Redshift. We use the preactions parameter to run a SQL query to select only certain columns to be written to Amazon Redshift:
The CodeWhisperer recommendation is similar to the following code snippet:
After checking the preceding code snippet, you can see that there is a misplaced format, which you can remove. You can also add the iam_role as an input in connection_options. Notice as well that CodeWhisperer automatically assumed the Redshift URL has the same name as the S3 folder that we used. Therefore, you need to change the URL and the S3 temp directory bucket to reflect your own parameters, and remove the password parameter. The final code snippet should be similar to the following:
The following is the complete set of code and comment snippets:
Conclusion
In this post, we demonstrated a real-world use case showing how AWS Glue Studio notebook integration with CodeWhisperer helps you build data integration jobs faster. You can start using the AWS Glue Studio notebook with CodeWhisperer to accelerate building your data integration jobs.
To learn more about using AWS Glue Studio notebooks and CodeWhisperer, check out the following video.
About the authors
Ishan Gaur works as a Sr. Big Data Cloud Engineer (ETL) specialized in AWS Glue. He's passionate about helping customers build scalable distributed ETL workloads and analytics pipelines on AWS.
Omar Elkharbotly is a Glue SME who works as a Big Data Cloud Support Engineer 2 (DIST). He's dedicated to assisting customers in resolving issues related to their ETL workloads and creating scalable data processing and analytics pipelines on AWS.