AWS Database Blog
How TalentNeuron optimized data operations, cut costs, and modernized with Amazon Aurora I/O-Optimized
This is a guest post by Yaya Diawara, Team Lead DBA, from TalentNeuron, in partnership with AWS.
For years, TalentNeuron, a leader in talent intelligence and workforce planning, has been empowering organizations with data-driven insights by collecting and processing vast amounts of job board data. However, as their data operations grew, so did the complexity and cost of their legacy architecture. Recognizing the need for a more efficient and cost-effective solution, TalentNeuron partnered with AWS to modernize their data platform.
In this post, we share three key benefits that TalentNeuron realized by using Amazon Aurora I/O-Optimized (I/O stands for input/output) as part of their new data platform: a 29% reduction in monthly database costs, improved data validation performance, and accelerated innovation through modernization.
The data processing challenge
TalentNeuron’s data architecture follows a medallion pattern. A medallion data architecture is a multi-layered approach to organizing data in a data lake, where raw data flows through bronze (raw), silver (cleaned and validated), and gold (business-ready) layers, with each layer progressively refining the data quality and structure for different use cases. TalentNeuron’s data architecture is illustrated in the following figure showing distinct layers for data ingestion, storage, and consumption.
The bronze layer stores the initial raw data collected from various data sources. Data in bronze represents the unprocessed, raw form of the data collected from web pages. These are stored as documents in Amazon Simple Storage Service (Amazon S3) with an archiving mechanism for historical versions.
The silver layer, stored in an Amazon Aurora MySQL database, is where the system extracts and validates the data. TalentNeuron has a number of data validation services that partially process and validate the data. TalentNeuron extracts attributes from bronze in a structured and organized manner, and normalizes key attributes for consistency using both third-party and in-house services.
The gold layer stores the final, refined data, also in an Aurora MySQL database. Downstream systems consume data from the gold layer. This is also where the system performs deduplication based on specific business rules, data curation, and enrichment.
Though functional, this setup faced several challenges:
- High costs – Storing silver-tier data in Aurora MySQL-Compatible was I/O-intensive, driving monthly costs up due to heavy read/write operations from normalization.
- Validation bottlenecks – Tight integration between the layers with a rigid data contract created operational complexity and redundant data handling. A critical service for validating harvested data was tightly coupled to these databases, making it difficult to modernize or replace.
- Barriers to innovation – Internal users relied on the database administrator (DBA) team to craft SQL queries, slowing down access to insights and pulling DBAs away from strategic tasks. The DBAs had to focus their time and effort on managing the data pipeline. This hindered TalentNeuron’s ability to democratize data access and innovate on behalf of their customers.
TalentNeuron needed a new approach to reduce costs, simplify the data architecture, and find new business value in their data assets.
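The bronze, silver, and gold flow described earlier in this section can be sketched in a few lines of Python. This is a minimal illustration only: the record fields (`title`, `company`, `location`), the validation rule, and the deduplication key are hypothetical, because the post does not describe TalentNeuron's actual schema or business rules. In-memory lists stand in for Amazon S3 and the Aurora MySQL databases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobPosting:
    """Hypothetical silver/gold record; the real schema is not given in the post."""
    title: str
    company: str
    location: str


def to_silver(raw_documents: list[dict]) -> list[JobPosting]:
    """Bronze -> silver: extract attributes, validate them, and normalize them."""
    silver = []
    for doc in raw_documents:
        title = (doc.get("title") or "").strip()
        company = (doc.get("company") or "").strip()
        location = (doc.get("location") or "").strip()
        # Assumed validation rule: a posting needs a title and a company.
        if not title or not company:
            continue
        # Assumed normalization: lowercase so equivalent values compare equal.
        silver.append(JobPosting(title.lower(), company.lower(), location.lower()))
    return silver


def to_gold(silver: list[JobPosting]) -> list[JobPosting]:
    """Silver -> gold: deduplicate on all fields, keeping first-seen order."""
    return list(dict.fromkeys(silver))


if __name__ == "__main__":
    bronze = [
        {"title": " Data Engineer ", "company": "Acme", "location": "Austin"},
        {"title": "data engineer", "company": "ACME", "location": "austin"},
        {"title": "", "company": "Acme", "location": "Austin"},  # fails validation
    ]
    print(to_gold(to_silver(bronze)))
```

Each stage reads everything the previous stage wrote, which is why the normalization and validation work at the silver layer generates so many database reads and writes.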
Solution overview
To start, TalentNeuron adopted Amazon Aurora I/O-Optimized to begin their modernization efforts. Unlike the Standard Aurora configuration, which charges separately for I/O operations, Aurora I/O-Optimized offers predictable pricing by bundling I/O costs into the instance and storage prices. This is ideal for I/O-heavy workloads like TalentNeuron’s data normalization processes.
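To make the pricing trade-off concrete, the following sketch compares the two billing models for a single month. Everything numeric here is an assumption for illustration: the rates in `Pricing` are placeholders rather than quoted AWS prices (check the Amazon Aurora pricing page for your Region), and the usage figures are invented, not TalentNeuron's. The shape of the comparison is the point: Standard bills per I/O request, and I/O-Optimized drops that charge in exchange for higher instance and storage rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Placeholder rates in USD; assumptions, not quoted AWS prices."""
    storage_per_gb_standard: float = 0.10
    io_per_million_standard: float = 0.20
    storage_per_gb_io_optimized: float = 0.225
    instance_multiplier_io_optimized: float = 1.30


@dataclass(frozen=True)
class MonthlyUsage:
    """Hypothetical monthly usage for one Aurora cluster."""
    instance_cost_standard: float  # instance spend at Standard rates, USD
    storage_gb: float
    io_requests_millions: float


def standard_cost(usage: MonthlyUsage, pricing: Pricing) -> float:
    """Standard: instances + storage + a separate per-request I/O charge."""
    return (
        usage.instance_cost_standard
        + usage.storage_gb * pricing.storage_per_gb_standard
        + usage.io_requests_millions * pricing.io_per_million_standard
    )


def io_optimized_cost(usage: MonthlyUsage, pricing: Pricing) -> float:
    """I/O-Optimized: higher instance and storage rates, no I/O charge."""
    return (
        usage.instance_cost_standard * pricing.instance_multiplier_io_optimized
        + usage.storage_gb * pricing.storage_per_gb_io_optimized
    )


def savings_fraction(usage: MonthlyUsage, pricing: Pricing) -> float:
    """Fraction saved by switching; negative means Standard is cheaper."""
    baseline = standard_cost(usage, pricing)
    return (baseline - io_optimized_cost(usage, pricing)) / baseline


if __name__ == "__main__":
    pricing = Pricing()
    io_heavy = MonthlyUsage(1000.0, 2000.0, 5000.0)
    io_light = MonthlyUsage(1000.0, 2000.0, 100.0)
    print(f"I/O-heavy savings: {savings_fraction(io_heavy, pricing):.1%}")
    print(f"I/O-light savings: {savings_fraction(io_light, pricing):.1%}")
```

Under these assumed rates the I/O-heavy workload comes out cheaper on I/O-Optimized and the I/O-light workload does not, so the switch pays off only when I/O charges make up a large share of the bill.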
The following diagram illustrates the solution architecture.
Business benefits
Reduced costs
TalentNeuron switched their silver-tier Aurora MySQL database to Aurora I/O-Optimized, resulting in a pure database cost reduction of 29.1%. They achieved this drop in monthly spend without sacrificing performance. TalentNeuron also saw a 40% improvement in VolumeReadIOPs and VolumeWriteIOPs in their silver tier after adopting Aurora I/O-Optimized. “Amazon Aurora I/O-Optimized gave us the cost efficiency and performance we needed to rethink our data strategy,” says Scott Roan, Senior Vice President, Technology, at TalentNeuron. “It’s been a critical step in optimizing our operations.”
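For readers who want to try the same switch, the storage configuration is a cluster-level setting that can be changed through the Amazon RDS API. The following is a minimal sketch using the boto3 `modify_db_cluster` call with `StorageType="aurora-iopt1"`. The cluster identifier and Region are hypothetical, and the post does not say how TalentNeuron performed the change, so treat this as one possible approach, not their procedure.

```python
def build_storage_switch_request(cluster_id: str, apply_immediately: bool = True) -> dict:
    """Build the parameters for switching a cluster to Aurora I/O-Optimized.

    "aurora-iopt1" selects I/O-Optimized storage; "aurora" selects Standard.
    """
    return {
        "DBClusterIdentifier": cluster_id,
        "StorageType": "aurora-iopt1",
        "ApplyImmediately": apply_immediately,
    }


def switch_to_io_optimized(cluster_id: str, region: str) -> dict:
    """Send the request. Requires boto3 and AWS credentials with RDS permissions."""
    import boto3  # imported lazily so the request builder works without boto3

    rds = boto3.client("rds", region_name=region)
    return rds.modify_db_cluster(**build_storage_switch_request(cluster_id))


if __name__ == "__main__":
    # Hypothetical identifier; printing the request avoids touching a real cluster.
    print(build_storage_switch_request("silver-tier-cluster"))
```

Building the request separately from sending it makes it possible to review or log the exact parameters before anything is applied to a production cluster.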
Improved data validation
Another direct benefit of the enhanced I/O capabilities was faster data validation workflows. In a typical medallion architecture, data flows through the bronze, silver, and gold stages and passes through multiple validation and transformation checks. This often requires frequent reads and writes to the database.
In addition to the data processing steps in the medallion architecture described earlier, Aurora I/O-Optimized accelerated these data validation steps along with the data movement processes in the pipeline. Specifically, TalentNeuron saw a 28% reduction in data validation time, from 72 to 51 minutes, at the silver layer where these validations occur. This meets the internal service level agreement (SLA) to process and validate in under an hour, something TalentNeuron could not meet before adopting Aurora I/O-Optimized. This was important for TalentNeuron because when TalentNeuron’s customers search for relevant positions in the job market, the database needs to perform complex queries and retrieve datasets quickly.
Accelerated innovation
Beyond Aurora, TalentNeuron is starting to shift toward a data lake architecture. TalentNeuron team members are exploring the use of the Apache Iceberg open table format (OTF) for experimental extract, transform, and load (ETL) workflows. Now that the DBAs spend less time managing the data pipeline, they can spend more time exploring new use cases like ELT with Amazon EMR orchestrated by AWS Step Functions.
TalentNeuron has yet to fully realize the benefits of the shift to Amazon EMR and Amazon S3, but this is a first step toward decoupling their original design. Internal teams are already using Amazon EMR notebooks to query the Amazon S3 data lake directly, reducing dependency on DBAs and accelerating time-to-insight. With data now stored in Iceberg OTF and metadata managed using the AWS Glue Data Catalog, internal teams at TalentNeuron benefit from enhanced cataloging and querying, which directly enables advanced analytics and data as a service (DaaS) offerings. TalentNeuron reduced the turnaround time for these tasks from days to hours, which is an important step toward the business goal of democratizing data access.
Another benefit yet to be realized is additional revenue streams. As TalentNeuron moves forward in their data modernization journey, they are exploring new ways to provide DaaS to external stakeholders as well.
Looking ahead
TalentNeuron’s data modernization journey with AWS is just beginning. With a modernized data platform in place, they now have the flexibility to explore additional AWS services as they continue their journey to become a data-first organization. For example, Amazon Athena could enable one-time querying of the data lake, and Amazon DataZone and AWS Data Exchange offer opportunities to package and share data as DaaS products.
“By optimizing our foundation with Aurora I/O-Optimized and AWS, we’re not just saving money—we’re building a platform for growth,” says Roan. “This is about empowering our teams and our customers with data like never before.”
Conclusion
TalentNeuron’s data re-architecture and data-driven mindset showcase how AWS can turn data challenges into opportunities. By adopting Aurora I/O-Optimized, they reduced costs, simplified their architecture, and set the stage for a future where data is more accessible and actionable. As they continue to innovate with AWS, TalentNeuron is proving that strategic modernization can deliver both immediate wins and a roadmap for long-term success.
Transform your data operations today. Explore the benefits of Amazon Aurora I/O-Optimized and see how it may improve your workloads.