Saturday, December 9, 2023

How Wallapop improved performance of analytics workloads with Amazon Redshift Serverless and data sharing

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data at petabyte scale, using standard SQL and your existing business intelligence (BI) tools. Today, tens of thousands of customers run business-critical workloads on Amazon Redshift.

Amazon Redshift Serverless makes it easy to run and scale analytics workloads without having to manage any data warehouse infrastructure.

Redshift Serverless automatically provisions and intelligently scales data warehouse capacity to deliver fast performance for even the most demanding and unpredictable workloads, and you pay only for what you use.

This is ideal when compute needs are difficult to predict, as with variable workloads, periodic workloads with idle time, and steady-state workloads with spikes. As your demand evolves with new workloads and more concurrent users, Redshift Serverless automatically provisions the right compute resources, and your data warehouse scales seamlessly.

Amazon Redshift data sharing allows you to securely share live, transactionally consistent data in one Redshift data warehouse with another Redshift data warehouse (provisioned or serverless) across accounts and Regions without needing to copy, replicate, or move data from one data warehouse to another.

Amazon Redshift data sharing enables you to evolve your Amazon Redshift deployment architectures into a hub-and-spoke or data mesh model to better meet performance SLAs, provide workload isolation, perform cross-group analytics, and onboard new use cases, all without the complexity of data movement and data copies.

In this post, we show how Wallapop adopted Redshift Serverless and data sharing to modernize their data warehouse architecture.

Wallapop’s initial data architecture platform

Wallapop is a Spanish ecommerce marketplace company focused on second-hand items, founded in 2013. Every day, they receive around 300,000 new items from customers to be added to their catalog. The marketplace can be accessed via mobile app or website.

The average monthly traffic is around 15 million active users. Since its creation in 2013, it has reached more than 40 million downloads, and more than 700 million products have been listed.

Amazon Redshift plays a central role in their data platform on AWS for ingestion, ETL (extract, transform, and load), machine learning (ML), and consumption workloads that run their insight consumption to drive decision-making.

The initial architecture consisted of one main Redshift provisioned cluster that handled all the workloads, as illustrated in the following diagram. Their cluster was deployed with eight ra3.4xlarge nodes and concurrency scaling enabled.

Wallapop had three main areas to improve in their initial data architecture platform:

  • Workload isolation challenges with growing data volumes and new workloads running in parallel
  • Administrative burden on data engineering teams to manage the concurrent workloads, especially at peak times
  • Cost-performance ratio while scaling during peak periods

The areas of improvement mainly focused on the performance of data consumption workloads, including the BI and analytics consumption tool, where high query concurrency was impacting the final analytics preparation and its insights consumption.

Solution overview

To improve their data platform architecture, Wallapop designed and built a new distributed approach with Amazon Redshift with the support of AWS.

The size of their provisioned cluster didn’t change. What changed was that concurrency scaling usage was reduced to 1 hour, which is within the Free Tier usage for every 24 hours of using the main cluster. The following diagram illustrates the target architecture.

Solution details

The new data platform architecture combines Redshift Serverless and provisioned data warehouses with Amazon Redshift data sharing, helping Wallapop improve their overall Amazon Redshift experience with better ease of use, better performance, and optimized costs.

Redshift Serverless measures data warehouse capacity in Redshift Processing Units (RPUs). RPUs are resources used to handle workloads. You can adjust the base capacity setting from 8 RPUs to 512 RPUs in units of 8 (8, 16, 24, and so on).
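The base capacity rule described above (8 to 512 RPUs, in steps of 8) can be sketched as a small check. This is an illustrative Python helper, not part of any AWS SDK; the function and constant names are assumptions made for this example.

```python
# Limits taken from the text above: base capacity from 8 to 512 RPUs in units of 8.
MIN_BASE_RPU = 8
MAX_BASE_RPU = 512
RPU_STEP = 8


def is_valid_base_capacity(rpus: int) -> bool:
    """Return True if rpus is a multiple of 8 between 8 and 512 inclusive."""
    return MIN_BASE_RPU <= rpus <= MAX_BASE_RPU and rpus % RPU_STEP == 0


def nearest_valid_base_capacity(rpus: int) -> int:
    """Round a requested capacity up to the next multiple of 8, clamped to the allowed range."""
    rounded_up = -(-rpus // RPU_STEP) * RPU_STEP  # ceiling division, then scale back
    return max(MIN_BASE_RPU, min(MAX_BASE_RPU, rounded_up))


if __name__ == "__main__":
    for requested in (32, 64, 12, 520):
        print(requested, is_valid_base_capacity(requested), nearest_valid_base_capacity(requested))
```

Both configurations that Wallapop tested, 32 and 64 base RPU, pass this check.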

The new architecture uses a Redshift provisioned cluster with RA3 nodes to run their constant and write workloads (data ingestion and transformation jobs). For cost-efficiency, Wallapop also uses Redshift reserved instances to optimize costs for these known, predictable, and steady workloads. This cluster acts as the producer cluster in their distributed architecture using data sharing, meaning the data is ingested into the storage layer of Amazon Redshift, Redshift Managed Storage (RMS).

For the consumption part of the data platform architecture, the data is shared with different Redshift Serverless endpoints to meet the needs of different consumption workloads.

Data sharing provides workload isolation. With this architecture, Wallapop achieves better workload isolation and ensures that only the right data is shared with the different consumption applications. Additionally, this approach avoids data duplication on the consumer side, which optimizes costs and allows better governance processes, because they only have to manage a single version of the data warehouse data instead of different copies or versions of it.
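To make the producer and consumer roles concrete, the following is a minimal Python sketch that assembles the kind of Redshift SQL statements typically used to set up a datashare. The datashare name, schema name, database name, and namespace identifiers are hypothetical placeholders; the post does not describe the statements Wallapop actually ran.

```python
from typing import List


def producer_statements(share: str, schema: str, consumer_namespace: str) -> List[str]:
    """SQL to run on the producer (provisioned) cluster to publish one schema."""
    return [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD ALL TABLES IN SCHEMA {schema};",
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_namespace}';",
    ]


def consumer_statements(local_db: str, share: str, producer_namespace: str) -> List[str]:
    """SQL to run on the consumer (serverless) endpoint to expose the shared data."""
    return [
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} OF NAMESPACE '{producer_namespace}';",
    ]


if __name__ == "__main__":
    # All identifiers below are illustrative placeholders, not values from the post.
    for sql in producer_statements("bi_share", "analytics", "<consumer-namespace-id>"):
        print(sql)
    for sql in consumer_statements("bi_shared_db", "bi_share", "<producer-namespace-id>"):
        print(sql)
```

Because only the schemas added to the datashare are visible to the consumer, each consumption application can be granted exactly the data it needs without copying it.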

Redshift Serverless is used as the consumer part of the data platform architecture to handle both predictable and unpredictable, non-steady, and often demanding analytics workloads, such as their CI/CD jobs and the BI and analytics consumption workloads coming from their data visualization application. Redshift Serverless also helps them achieve better workload isolation thanks to its managed auto scaling feature, which keeps performance consistently good for these unpredictable workloads, even at peak times. It also provides a better user experience for the Wallapop data platform team, thanks to the autonomics capabilities that Redshift Serverless provides.

The new solution combining Redshift Serverless and data sharing allowed Wallapop to achieve better performance, cost, and ease of use.

Eduard Lopez, Wallapop Data Engineering Manager, shared the improved experience of analytics users: “Our analyst users are telling us that now ‘Looker flies.’ Insights consumption went up as a result of it without increasing costs.”

Evaluation of outcome

Wallapop started this re-architecture effort by first testing the isolation of their BI consumption workload with Amazon Redshift data sharing and Redshift Serverless with the support of AWS. The workload was tested using different base RPU configurations to measure the base capacity and resources in Redshift Serverless. Base RPU values for Redshift Serverless range from 8 to 512. Wallapop tested their BI workload with two configurations, 32 base RPU and 64 base RPU, after enabling data sharing from their Redshift provisioned cluster to ensure the serverless endpoints had access to the necessary datasets.

Based on the results measured 1 week before testing, the main area for improvement was the queries that took longer than 10 seconds to complete (52%), represented by the yellow, orange, and red areas of the following chart, as well as the long-running queries represented by the red area (over 600 seconds, 9%).

The first test of this workload with Redshift Serverless using a 64 base RPU configuration immediately showed performance improvements: the queries running longer than 10 seconds were reduced by 38%, and the long-running queries (over 120 seconds) were almost completely eliminated.
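Figures such as "52% of queries took longer than 10 seconds" are shares of queries whose runtime exceeds a threshold. The following minimal Python sketch shows one way to compute such shares from a list of query runtimes. The sample durations are invented for illustration and are not Wallapop’s measurements, and `share_over` is a hypothetical helper name.

```python
from typing import Sequence


def share_over(durations_s: Sequence[float], threshold_s: float) -> float:
    """Return the percentage of queries whose runtime exceeds threshold_s seconds."""
    if not durations_s:
        return 0.0
    slow = sum(1 for d in durations_s if d > threshold_s)
    return 100.0 * slow / len(durations_s)


if __name__ == "__main__":
    # Made-up runtimes in seconds, standing in for one week of BI queries.
    sample = [2, 4, 8, 9, 12, 45, 130, 700, 3, 15]
    print(f"over 10 s:  {share_over(sample, 10):.1f}%")
    print(f"over 600 s: {share_over(sample, 600):.1f}%")
```

Running the same calculation on the runtimes recorded before and after a change gives a like-for-like comparison of the slow-query share.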

Javier Carbajo, Wallapop Data Engineer, says, “Providing a service without downtime or loading times that are too long was one of our main requirements since we couldn’t have analysts or stakeholders without being able to consult the data.”

Following the first set of results, Wallapop also tested a Redshift Serverless configuration using 32 base RPU to compare the results and select the configuration that would offer them the best price-performance for this workload. With this configuration, the results were similar to the previous test run on Redshift Serverless with 64 base RPU (still showing significant performance improvement over the original results). Based on the tests, the 32 base RPU configuration was selected for the new architecture.
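The price-performance reasoning can be sketched with simple arithmetic. Redshift Serverless charges for compute in RPU-hours, so if two configurations give similar runtimes, the smaller one costs less. The sketch below assumes that compute stays at the base capacity for the whole busy period (it ignores automatic scaling), and the price per RPU-hour is a hypothetical placeholder, not an actual AWS rate.

```python
def estimated_compute_cost(base_rpus: int, busy_seconds: float, price_per_rpu_hour: float) -> float:
    """Estimate compute cost as RPUs x hours x price, assuming no scaling above base capacity."""
    return base_rpus * (busy_seconds / 3600) * price_per_rpu_hour


if __name__ == "__main__":
    PRICE = 0.5  # hypothetical price per RPU-hour, for illustration only

    cost_64 = estimated_compute_cost(64, 3600, PRICE)  # 64 RPU busy for one hour
    cost_32 = estimated_compute_cost(32, 3600, PRICE)  # 32 RPU, similar runtime
    print(f"64 base RPU: {cost_64:.2f}")
    print(f"32 base RPU: {cost_32:.2f}")

    # Break-even: 32 RPU only costs as much as 64 RPU if the work takes twice as long.
    print(f"32 base RPU, double runtime: {estimated_compute_cost(32, 7200, PRICE):.2f}")
```

Under these assumptions, the smaller configuration is cheaper as long as it does not make the workload take twice as long, which is consistent with choosing it when the measured results are similar.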

Gergely Kajtár, Wallapop Data Engineer, says, “We noticed a significant increase in the daily workflows’ stability after the change to the new Redshift architecture.”

Following this first workload, Wallapop has continued expanding their Amazon Redshift distributed architecture with CI/CD workloads running on a separate Redshift Serverless endpoint using data sharing with their Redshift provisioned (RA3) cluster.

“With the new Redshift architecture, we’ve noticed remarkable improvements both in speed and stability. That has translated into a twofold increase in analytical queries, not only by analysts and data scientists but from other roles as well, such as marketing, engineering, C-level, and so on. That proves that investing in a scalable architecture like Redshift Serverless has a direct consequence on accelerating the adoption of data as a decision-making driver in the organization.”

– Nicolás Herrero, Wallapop Director of Data & Analytics.


Conclusion

In this post, we showed you how this platform can help Wallapop scale in the future by adding new consumers when new needs or applications require access to data.

If you’re new to Amazon Redshift, you can explore demos, other customer stories, and the latest features at Amazon Redshift. If you’re already using Amazon Redshift, reach out to your AWS account team for support, and learn more about what’s new with Amazon Redshift.

About the Authors

Eduard Lopez is the Data Engineer Manager at Wallapop. He is a software engineer with over 6 years of experience in data engineering, machine learning, and data science.

Daniel Martinez is a Solutions Architect in Iberia Digital Native Businesses (DNB), part of the worldwide commercial sales organization (WWCS) at AWS.

Jordi Montoliu is a Sr. Redshift Specialist in EMEA, part of the worldwide specialist organization (WWSO) at AWS.

Ziad Wali is an Acceleration Lab Solutions Architect at Amazon Web Services. He has over 10 years of experience in databases and data warehousing, where he enjoys building reliable, scalable, and efficient solutions. Outside of work, he enjoys sports and spending time in nature.

Semir Naffati is a Sr. Redshift Specialist Solutions Architect in EMEA, part of the worldwide specialist organization (WWSO) at AWS.
