AI Workload Migration to AWS for ESG-X
Migrated ESG-X's OCR, translation, and embedding pipeline from GCP/Gemini to an AWS-native worker architecture powered by Amazon Bedrock Nova.
Overview
ESG-X helps companies process sustainability and ESG documents into structured, searchable knowledge. Data Max migrated a key AI processing workload from the existing GCP-based setup to AWS, replacing Gemini-dependent OCR and translation flows with Amazon Bedrock Nova and deploying the processing pipeline on ECS Fargate. The new service was integrated into the existing ESG-X application, allowing documents to continue flowing through the current product while shifting the AI workload execution to AWS.
Challenge
The existing document processing pipeline depended on GCP/Gemini-based AI services. ESG-X needed a migration path that reduced dependency on GCP while preserving the core application flow and avoiding disruption to users. The workload also required reliable handling of ESG documents, including OCR for PDFs, translation into English, chunking, embedding generation, artifact storage, and ingestion back into the ESG-X application.
Approach
We built an AWS-side processing gateway that receives document jobs from the existing ESG-X application, queues them asynchronously, and starts isolated Fargate workers for document processing. The worker downloads source documents, extracts text, applies Bedrock Nova OCR when needed, translates non-English chunks with Bedrock Nova, generates embeddings, stores pipeline artifacts in S3, and sends the processed records back into the ESG-X ingest service using secure cross-cloud authentication.
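The worker's per-document flow can be sketched roughly as follows. This is a minimal illustration, not the production worker: the function names, the `Chunk` type, the fixed-size chunking strategy, and the injected `ocr`, `detect_language`, `translate`, and `embed` callables are all assumptions made for this example. In a real worker those callables would wrap the Bedrock Nova and embedding calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Chunk:
    text: str                      # English text after optional translation
    language: str                  # language detected before translation
    embedding: Optional[List[float]] = None


def chunk_text(text: str, max_chars: int = 1000) -> List[str]:
    """Split text into fixed-size character windows (placeholder strategy)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def process_document(
    extracted_text: str,
    ocr: Callable[[], str],
    detect_language: Callable[[str], str],
    translate: Callable[[str], str],
    embed: Callable[[str], List[float]],
    max_chars: int = 1000,
) -> List[Chunk]:
    """Run the OCR-if-needed, chunk, translate, embed steps for one document."""
    # Fall back to OCR only when direct text extraction produced nothing.
    text = extracted_text if extracted_text.strip() else ocr()

    chunks: List[Chunk] = []
    for piece in chunk_text(text, max_chars):
        lang = detect_language(piece)
        # Only non-English chunks are sent for translation.
        english = piece if lang == "en" else translate(piece)
        chunks.append(Chunk(text=english, language=lang, embedding=embed(english)))
    return chunks
```

Passing the model calls in as callables keeps the flow testable with stubs, which fits the regression-fixture approach used to validate OCR, translation, and embedding output.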
Solution
Data Max designed and deployed a Terraform-managed AWS architecture for the migrated AI workload. API Gateway and SQS provide a reliable ingestion layer, Lambda orchestrates ECS Fargate tasks, and Fargate runs the document processing engine. Amazon Bedrock Nova powers OCR and translation, while S3 stores run artifacts such as timing reports, quality metrics, extracted text, and embedding summaries. The pipeline includes CloudWatch logging, IAM-scoped permissions, environment-specific infrastructure, and regression fixtures for validating OCR, translation, and embedding output quality.
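To illustrate the orchestration step, the sketch below shows how a Lambda function triggered by SQS might turn each queued job into the parameters for an ECS `run_task` call on Fargate. It is a hedged example under stated assumptions: the job fields `job_id` and `source_url`, the `config` keys, and the environment variable names are hypothetical and not taken from the ESG-X deployment. The functions only build the request dictionaries; a real handler would pass each one to boto3 as `ecs_client.run_task(**request)`.

```python
import json
from typing import Any, Dict, List


def build_run_task_request(job: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for ECS run_task from one document job."""
    return {
        "cluster": config["cluster"],
        "taskDefinition": config["task_definition"],
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": config["subnets"],
                "securityGroups": config["security_groups"],
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {
                    "name": config["container_name"],
                    # Hypothetical variables the worker would read at startup.
                    "environment": [
                        {"name": "JOB_ID", "value": str(job["job_id"])},
                        {"name": "SOURCE_URL", "value": str(job["source_url"])},
                    ],
                }
            ]
        },
    }


def requests_from_sqs_event(event: Dict[str, Any], config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Translate an SQS-triggered Lambda event into one request per message."""
    return [
        build_run_task_request(json.loads(record["body"]), config)
        for record in event.get("Records", [])
    ]
```

Starting one task per message keeps each document isolated in its own Fargate worker, at the cost of task start-up time for every job.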
Results