Peoplesoft (India)
From Expensive Infrastructure to Serverless Efficiency
How we migrated an HR SaaS platform from a self-hosted EMR cluster to an AWS serverless architecture, eliminating infrastructure management and significantly reducing costs.
The Cost of Legacy Infrastructure: EMR Cluster Overhead
Peoplesoft, an HR solutions provider in India, faced a growing problem: their data processing infrastructure was expensive and complex, and it consumed valuable IT resources.
The Infrastructure Reality
Like many data-intensive companies, Peoplesoft ran a self-hosted EMR (Elastic MapReduce) cluster to process HR data for reporting and analytics. Beneath the surface, problems festered:
- Fixed High Costs: EMR cluster ran 24/7, even during low-usage periods
- Management Burden: IT team spent hours maintaining, patching, and troubleshooting clusters
- Manual ETL Jobs: Data engineers manually managed ETL job schedules and failures
- Schema Management Pain: Every schema change required manual ETL code updates
The Decision Point
"Are we in the business of managing infrastructure, or delivering HR solutions?"
The answer led to the decision to modernize.
Serverless Architecture: AWS Glue, Athena, and Data Lakes
We migrated Peoplesoft's entire data processing infrastructure to a serverless AWS architecture, removing cluster maintenance and manual ETL management from the team's workload.
The Serverless Promise Delivered
"Serverless" doesn't mean "no servers"; it means no server management. AWS runs the infrastructure, and you focus on business logic. Result: the same data processing capabilities with minimal infrastructure overhead.
AWS Glue ETL
Replacing Manual EMR Jobs
Amazon Athena
Serverless Query Engine
S3 Data Lake
Central Storage
- Raw, processed, and curated zones
- Unlimited scalability (PB-scale)
- Pennies per GB/month
Elasticsearch
Real-time Analytics & Search
- Employee search across millions
- Kibana dashboards
- Anomaly detection
Phased Migration with Zero Disruption
The migration was completed in 13 weeks through a systematic, risk-managed approach:
Key Phases:
Assessment & Design
Analyzed current EMR workloads, identified Glue equivalents, designed the S3 data lake, and completed cost modeling and risk assessment
Proof of Concept
Migrated representative ETL jobs to validate performance, data quality, and cost projections
Data Lake Setup
Created production-ready S3 data lake with proper structure, security, and Glue Data Catalog
ETL Migration
Batch migration of all ETL jobs from EMR to Glue with parallel validation and performance tuning
Query Migration
Migrated queries to Athena, updated BI tools, and trained the team on Athena SQL
EMR Decommission
Final validation, parallel monitoring period, EMR shutdown and post-migration optimization
Results: Roughly 80% cost reduction in the illustrative cost model ($7,800/month → $1,525/month), with improved scalability and serverless operations
Dramatic Cost Reduction + Zero Infrastructure Management
The migration delivered immediate and ongoing benefits for Peoplesoft.
| Metric | Before (EMR) | After (Serverless) |
|---|---|---|
| Infrastructure Costs | High fixed (24/7) | Pay-as-you-go |
| ETL Management | Manual job mgmt | Fully automated |
| Operational Overhead | High (cluster mgmt) | Minimal (AWS managed) |
| Scalability | Fixed capacity | Automatic, on-demand scaling |
| Schema Changes | Manual code updates | Automatic detection |
ROI Example (Hypothetical)
EMR (Before) - Monthly
- 3-node cluster, 24/7: $3,000
- Database for queries: $800
- Mgmt overhead (40 hrs): $4,000
- Total: $7,800/month
Serverless (After) - Monthly
- Glue jobs (20 hrs): $500
- S3 + Athena: $125
- Elasticsearch: $400
- Mgmt overhead (5 hrs): $500
- Total: $1,525/month
Savings: $6,275/month ($75K+ annually)
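The arithmetic behind these totals can be checked with a few lines of Python. This is only a worked sum of the hypothetical line items listed above; the dictionary and variable names are illustrative.

```python
# Monthly line items copied from the hypothetical ROI example (USD).
emr_monthly = {
    "3-node cluster, 24/7": 3000,
    "Database for queries": 800,
    "Mgmt overhead (40 hrs)": 4000,
}
serverless_monthly = {
    "Glue jobs (20 hrs)": 500,
    "S3 + Athena": 125,
    "Elasticsearch": 400,
    "Mgmt overhead (5 hrs)": 500,
}

before = sum(emr_monthly.values())
after = sum(serverless_monthly.values())
monthly_savings = before - after
annual_savings = monthly_savings * 12
reduction_pct = 100 * monthly_savings / before

print(f"Before:  ${before:,}/month")
print(f"After:   ${after:,}/month")
print(f"Savings: ${monthly_savings:,}/month (${annual_savings:,}/year)")
print(f"Reduction: {reduction_pct:.1f}%")
```

Note that management time accounts for more than half of the "before" total in this example, so the result is sensitive to the assumed hourly rate ($100/hour in both columns).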
Key Insights from Serverless Migration
Serverless ≠ Always Cheaper (But Usually Is)
Serverless wins for variable workloads such as nightly ETL and on-demand queries. For constant 24/7 processing, reserved capacity may be cheaper. Peoplesoft's mix of nightly ETL and on-demand queries was a good fit for serverless.
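One way to make this trade-off concrete is a break-even calculation. The sketch below reuses two figures from the hypothetical ROI example ($3,000/month for the always-on cluster, $500 for 20 Glue job-hours) and assumes serverless cost scales linearly with job-hours. It ignores management overhead, storage, and reserved pricing, so treat the output as illustrative only.

```python
# Figures taken from the hypothetical ROI example; the linear model is an assumption.
FIXED_CLUSTER_COST = 3000.0   # USD/month, always-on cluster
GLUE_COST = 500.0             # USD/month for the example workload
GLUE_HOURS = 20.0             # job-hours/month in the example

hourly_rate = GLUE_COST / GLUE_HOURS          # implied USD per job-hour


def serverless_cost(job_hours: float) -> float:
    """Pay-per-use cost under the linear assumption."""
    return job_hours * hourly_rate


def cheaper_option(job_hours: float) -> str:
    """Return which option costs less for a given monthly workload."""
    if serverless_cost(job_hours) < FIXED_CLUSTER_COST:
        return "serverless"
    return "always-on"


break_even_hours = FIXED_CLUSTER_COST / hourly_rate

print(f"Implied rate: ${hourly_rate:.2f}/job-hour")
print(f"Break-even:   {break_even_hours:.0f} job-hours/month")
for hours in (20, 100, 400):
    print(hours, "job-hours ->", cheaper_option(hours))
```

Under these assumptions the nightly workload in the example sits well below the break-even point, while a workload that runs most of the month would not.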
Schema Evolution Is a Game-Changer
Before: Schema changes required manual ETL code updates (days of work). After: Glue detects schema changes automatically, typically through crawlers that update the Glue Data Catalog (minutes to adapt). When new HR data fields appear in the source, Glue picks them up without code changes.
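To illustrate what "detecting a schema change" involves, here is a minimal sketch that compares two schema snapshots. It is not Glue's implementation; the function name, column names, and types are hypothetical.

```python
def diff_schema(old: dict, new: dict) -> dict:
    """Compare two {column: type} snapshots and report what changed."""
    added = {col: typ for col, typ in new.items() if col not in old}
    removed = {col: typ for col, typ in old.items() if col not in new}
    retyped = {
        col: (old[col], new[col])
        for col in old.keys() & new.keys()
        if old[col] != new[col]
    }
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical HR table before and after a new field is added at the source.
old_schema = {"employee_id": "bigint", "name": "string", "salary": "int"}
new_schema = {
    "employee_id": "bigint",
    "name": "string",
    "salary": "double",          # type widened at the source
    "remote_work_days": "int",   # new field
}

changes = diff_schema(old_schema, new_schema)
print(changes)
```

Added columns are usually safe to absorb automatically; removed or retyped columns are the cases worth reviewing before downstream reports rely on them.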
Partitioning Makes or Breaks Athena Performance
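Poor partitioning forces full table scans, which makes queries slow and expensive because Athena bills by the amount of data scanned. Good partitioning lets a query read only the relevant data, so it runs faster and costs less. Partition by the columns you filter on most often (date, region, and so on).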
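The effect of partition pruning can be sketched with a small model. The code below builds Hive-style `key=value` prefixes, a layout Athena can map to partitions, and totals how much data a query would read with and without partition filters. The bucket, table name, and partition sizes are hypothetical.

```python
from datetime import date


def partition_prefix(table_root: str, day: date, region: str) -> str:
    """Build a Hive-style key=value prefix for one partition."""
    return f"{table_root}/dt={day.isoformat()}/region={region}/"


# Hypothetical partition sizes in MB, keyed by (dt, region).
PARTITION_MB = {
    ("2024-01-01", "north"): 120,
    ("2024-01-01", "south"): 80,
    ("2024-01-02", "north"): 130,
    ("2024-01-02", "south"): 70,
}


def mb_scanned(dt=None, region=None) -> int:
    """Sum the size of partitions a query must read; None means no filter."""
    return sum(
        size
        for (p_dt, p_region), size in PARTITION_MB.items()
        if (dt is None or p_dt == dt) and (region is None or p_region == region)
    )


prefix = partition_prefix(
    "s3://example-bucket/curated/attendance", date(2024, 1, 2), "north"
)
full_scan = mb_scanned()                                 # no partition filter
pruned_scan = mb_scanned(dt="2024-01-02", region="north")  # both columns filtered

print(prefix)
print("Full scan MB:", full_scan)
print("Pruned scan MB:", pruned_scan)
```

The same query filtered on both partition columns reads one partition instead of four, which is where the speed and cost difference comes from.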
Migration Needs Parallel Run Period
Run EMR and Glue side by side for 1-2 weeks to catch edge cases. Peoplesoft found 2 edge cases during the parallel run and fixed both before cutover. The parallel run period is your safety net.
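A parallel run is only useful if the two outputs are actually compared. The sketch below shows one simple way to reconcile them, assuming each pipeline's output can be loaded as a list of rows with a unique key. The function name, the `employee_id` key, and the sample rows are hypothetical and are not the edge cases found in this project.

```python
def reconcile(emr_rows: list, glue_rows: list, key: str = "employee_id") -> dict:
    """Compare two pipeline outputs row by row, matching on a unique key."""
    emr_by_key = {row[key]: row for row in emr_rows}
    glue_by_key = {row[key]: row for row in glue_rows}

    shared = emr_by_key.keys() & glue_by_key.keys()
    return {
        "missing_in_glue": sorted(emr_by_key.keys() - glue_by_key.keys()),
        "extra_in_glue": sorted(glue_by_key.keys() - emr_by_key.keys()),
        "mismatched": sorted(k for k in shared if emr_by_key[k] != glue_by_key[k]),
    }


# Hypothetical outputs from the same nightly job on both platforms.
emr_output = [
    {"employee_id": 1, "leave_balance": 12.0},
    {"employee_id": 2, "leave_balance": 7.5},
    {"employee_id": 3, "leave_balance": 0.0},
]
glue_output = [
    {"employee_id": 1, "leave_balance": 12.0},
    {"employee_id": 2, "leave_balance": 8.0},   # value differs
    {"employee_id": 4, "leave_balance": 3.0},   # row only in the new pipeline
]

report = reconcile(emr_output, glue_output)
print(report)
```

An empty report across every job for the full parallel period is a reasonable cutover criterion; any non-empty entry points to a row worth investigating.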