ESG-X

AI Workload Migration to AWS for ESG-X

Migrated ESG-X's OCR, translation, and embedding pipeline from GCP/Gemini to an AWS-native worker architecture powered by Amazon Bedrock Nova.


Overview

ESG-X helps companies process sustainability and ESG documents into structured, searchable knowledge. Data Max migrated a key AI processing workload from the existing GCP-based setup to AWS, replacing Gemini-dependent OCR and translation flows with Amazon Bedrock Nova and deploying the processing pipeline on ECS Fargate. The new service was integrated into the existing ESG-X application, allowing documents to continue flowing through the current product while shifting the AI workload execution to AWS.

Challenge

The existing document processing pipeline depended on GCP/Gemini-based AI services. ESG-X needed a migration path that reduced dependency on GCP while preserving the core application flow and avoiding disruption to users. The workload also required reliable handling of ESG documents, including OCR for PDFs, translation into English, chunking, embedding generation, artifact storage, and ingestion back into the ESG-X application.

Approach

We built an AWS-side processing gateway that receives document jobs from the existing ESG-X application, queues them asynchronously, and starts isolated Fargate workers for document processing. The worker downloads source documents, extracts text, applies Bedrock Nova OCR when needed, translates non-English chunks with Bedrock Nova, generates embeddings, stores pipeline artifacts in S3, and sends the processed records back into the ESG-X ingest service using secure cross-cloud authentication.
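The worker flow described above can be sketched as a small orchestration function. This is a minimal illustration and not the ESG-X implementation: the `Record` type, the function names, the paragraph-based chunking rule, and the `max_chars` limit are all assumptions, and the OCR, language detection, translation, and embedding steps are passed in as plain callables so the sketch stays independent of any particular Bedrock client code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    """One processed chunk, ready to send to an ingest service (hypothetical shape)."""
    chunk_id: int
    text: str
    source_language: str
    embedding: List[float]


def chunk_text(text: str, max_chars: int = 1000) -> List[str]:
    """Pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def process_document(
    extracted_text: str,
    ocr: Callable[[], str],
    detect_language: Callable[[str], str],
    translate: Callable[[str, str], str],
    embed: Callable[[str], List[float]],
    max_chars: int = 1000,
) -> List[Record]:
    """Run the extract -> OCR fallback -> chunk -> translate -> embed steps."""
    # Fall back to OCR only when direct text extraction produced nothing usable.
    text = extracted_text if extracted_text.strip() else ocr()
    records: List[Record] = []
    for i, chunk in enumerate(chunk_text(text, max_chars)):
        lang = detect_language(chunk)
        # Only non-English chunks are sent for translation.
        english = chunk if lang == "en" else translate(chunk, lang)
        records.append(Record(i, english, lang, embed(english)))
    return records
```

Passing the model-backed steps in as callables keeps the control flow testable with stubs, which is one way to support regression fixtures without calling a live model.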

Solution

Data Max designed and deployed a Terraform-managed AWS architecture for the migrated AI workload. API Gateway and SQS provide a reliable ingestion layer, Lambda orchestrates ECS Fargate tasks, and Fargate runs the document processing engine. Amazon Bedrock Nova powers OCR and translation, while S3 stores run artifacts such as timing reports, quality metrics, extracted text, and embedding summaries. The pipeline includes CloudWatch logging, IAM-scoped permissions, environment-specific infrastructure, and regression fixtures for validating OCR, translation, and embedding output quality.
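To illustrate the orchestration step, the following is a hedged sketch of a Lambda handler that reads SQS messages and starts one Fargate task per document job through the boto3 ECS `run_task` call. The message fields (`job_id`, `document_url`), the environment variable names, and the config keys are hypothetical and do not come from the ESG-X codebase. Returning `batchItemFailures` only has an effect when `ReportBatchItemFailures` is enabled on the SQS event source mapping.

```python
import json
import os


def build_run_task_request(message_body: str, config: dict) -> dict:
    """Translate one SQS message body into keyword arguments for ecs.run_task."""
    job = json.loads(message_body)
    return {
        "cluster": config["cluster"],
        "taskDefinition": config["task_definition"],
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": config["subnets"],
                "securityGroups": config["security_groups"],
                "assignPublicIp": "DISABLED",
            }
        },
        # Job details reach the worker container as environment overrides.
        "overrides": {
            "containerOverrides": [
                {
                    "name": config["container_name"],
                    "environment": [
                        {"name": "JOB_ID", "value": str(job["job_id"])},
                        {"name": "DOCUMENT_URL", "value": job["document_url"]},
                    ],
                }
            ]
        },
    }


def load_config() -> dict:
    """Read deployment settings from (hypothetical) environment variables."""
    return {
        "cluster": os.environ["ECS_CLUSTER"],
        "task_definition": os.environ["ECS_TASK_DEFINITION"],
        "container_name": os.environ["ECS_CONTAINER_NAME"],
        "subnets": os.environ["SUBNET_IDS"].split(","),
        "security_groups": os.environ["SECURITY_GROUP_IDS"].split(","),
    }


def handler(event, context, ecs_client=None, config=None):
    """SQS-triggered entry point: start one Fargate task per message."""
    if ecs_client is None:
        import boto3  # imported lazily so the module loads without boto3 in tests
        ecs_client = boto3.client("ecs")
    config = config or load_config()
    failures = []
    for record in event["Records"]:
        try:
            response = ecs_client.run_task(
                **build_run_task_request(record["body"], config)
            )
            if response.get("failures"):
                raise RuntimeError(f"run_task failed: {response['failures']}")
        except Exception:
            # Report only this message as failed so SQS can redeliver it.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Keeping the request construction in a pure function separates the job-to-task mapping from the AWS call, so it can be checked with a fake client.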

Results

AI workload migrated: OCR and translation moved from Gemini-based processing to Amazon Bedrock Nova
Application integration: AWS processing integrated with the existing ESG-X application and ingest flow
Scalable processing: Asynchronous API Gateway → SQS → Lambda → ECS Fargate architecture for document jobs
Quality validation: Golden test fixtures validate OCR text, translated output, and embedding similarity across ESG document samples
Operational visibility: Pipeline artifacts, timing metrics, cost estimates, quality signals, and logs captured for debugging and monitoring
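As an illustration of the embedding part of the quality validation, a golden-fixture check can compare freshly generated embeddings against stored reference vectors using cosine similarity. This is a sketch under assumptions: the fixture layout (a mapping from document ID to vector), the function names, and the 0.95 threshold are hypothetical and are not taken from the ESG-X test suite.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_against_golden(
    produced: Dict[str, List[float]],
    golden: Dict[str, List[float]],
    min_similarity: float = 0.95,  # hypothetical threshold
) -> List[str]:
    """Return the IDs of golden fixtures that are missing or drifted below the threshold."""
    failures: List[str] = []
    for doc_id, gold_vec in golden.items():
        vec = produced.get(doc_id)
        if vec is None or cosine_similarity(vec, gold_vec) < min_similarity:
            failures.append(doc_id)
    return failures
```

An empty list means every fixture passed. A non-empty list names the documents to inspect after a model or prompt change.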
