Data Engineer · San Francisco, CA

Hema Harsha Vardhan Peela

Senior Data Engineer with 4+ years building the infrastructure behind enterprise analytics, from pipelines processing 10M+ events per month to RAG-based AI systems at scale.

4+ Yrs Experience
50TB+ Data Managed
10M+ Events / Month
1,200+ Hours Saved / yr
engineer.js
const engineer = {
  name: "Hema Harsha Vardhan Peela",
  role: "Senior Data Engineer",
  location: "San Francisco, CA",
  experience: "4+ years",
  stack: [
    "Python", "Spark", "Kafka",
    "AWS", "Azure", "GCP",
    "TensorFlow", "RAG"
  ],
  domains: ["FinTech", "Healthcare", "SaaS"],
  currentAt: "Wells Fargo",
  open: true
}

// 1,200+ hours saved · zero SLA violations
Currently at
Wells Fargo
Data Engineer — AI/ML Focus · Jun 2025 – Present
AWS · Spark · RAG · Redshift · Airflow
01 — ABOUT

The engineer
behind the data.

Building infrastructure that turns raw data into decisions. Four years across three cloud platforms, two continents, and one consistent goal: reliability at scale.

I'm a Data Engineer with a Master's in Computer Science from Texas Tech (GPA 3.7) and hands-on experience at Wells Fargo, Cognizant, and Cipla. I specialize in the infrastructure layer — the systems that make analytics possible before anyone can ask a question.

At Wells Fargo, I designed star and snowflake schemas serving 12+ business domains, operationalized 12+ production deep learning models, and built RAG pipelines enabling natural-language querying over structured and unstructured enterprise data.

At Cognizant, I managed 50+ TB of enterprise data across Hadoop, Kafka, and Azure while mentoring 5+ junior engineers and maintaining zero SLA violations across every critical workflow. I believe good data engineering is invisible: it just works.

Expertise
ETL/ELT · Real-Time Pipelines
Data Warehousing · MLOps · RAG
Cloud
AWS · Azure · GCP
Multi-cloud, multi-region
Industries
Financial Services · Healthcare
Technology · SaaS
Education
M.S. Computer Science
Texas Tech University · 3.7 GPA
Status
Open to senior roles
02 — EXPERIENCE

Where I've
built things.

Jun 2025 – Present
● CURRENT
Data Engineer — AI/ML Focus
Wells Fargo · San Francisco, CA
Designed scalable data architectures, including star & snowflake schemas on AWS (S3, Glue, Redshift) and Snowflake, enabling enterprise analytics across 12+ business domains.
Engineered and operationalized 12+ production deep learning models (CNN NLP classifiers, sequence models) in TensorFlow & PyTorch, serving 8 clients.
Built end-to-end ML & RAG pipelines with CI/CD, Airflow orchestration, and vector retrieval — enabling intelligent querying across structured & unstructured data for 5 projects.
Architected batch & near real-time pipelines processing 10M+ monthly events via AWS Glue, Spark, and Python, integrating RESTful APIs across S3 and Power Automate.
Eliminated 15+ manual processes via PySpark automation, saving 1,200+ hours/year and improving downstream data reliability.
Optimized SQL queries, reducing runtime from 40 min to under 10 min and improving pipeline efficiency.
Built dashboards for 25+ stakeholders; implemented data governance with IAM, KMS & Lake Formation.
Nov 2024 – May 2025
Data Engineering Intern
YourBook Team · Lubbock, TX
Built and maintained 10+ SQL-based KPI reports and executive dashboards for leadership business monitoring.
Identified and logged 50+ data issues across large datasets; supported validation and quality checks in reporting workflows.
Standardized reporting logic and data definitions to support consistent, organization-wide reporting.
Jan 2021 – Jun 2023
Data Engineer
Cognizant · Bengaluru, India
Orchestrated ETL/ELT pipelines using Hadoop, Hive & Kafka to manage 50+ TB of enterprise data across 10+ business divisions.
Resolved 30+ critical pipeline failures/month using Talend, Informatica IICS & IBM DataStage.
Administered Azure Data Services (ADF, Synapse, Databricks) for real-time and batch analytics. Contributed IaC deployments via Terraform.
Achieved zero SLA violations across critical workflows. Mentored 5+ junior engineers, cutting onboarding to under 2 weeks.
Nov 2019 – Dec 2020
Junior Data Engineer
Cipla · India
Consolidated clinical & operational data from 3+ hospital systems, supporting analysis of 150,000+ patient records via SQL and AWS Glue.
Reduced manual ETL effort by ~60 hrs/month. Built Power BI dashboards reviewed weekly by 20+ clinical stakeholders.
03 — SKILLS

Technology
stack.

Languages
Python · SQL · PySpark
Data Engineering
Apache Spark · Kafka · Hadoop · Hive · Airflow · ETL / ELT · Data Modeling · OLAP Systems · REST APIs
Cloud & Infra
AWS · Azure · GCP · S3 · Glue · Redshift · Lambda · EMR · IAM · KMS · ADF · Synapse · Databricks · BigQuery · Terraform
ML & AI
TensorFlow · PyTorch · RAG · Vector Search · NLP · MLOps · CI/CD for ML · Model Deployment
ETL Tools
Talend · Informatica IICS · IBM DataStage · Snowflake · Power Automate
Visualization
Tableau · Power BI · Amazon QuickSight · Knowledge Graphs
Governance
Data Quality · Lake Formation · Access Control · Encryption (KMS) · Pipeline Optimization
04 — PROJECTS

Featured
work.

PROJECT_001 HEALTHCARE · AI
Generative AI-Powered Analytics Platform
GCP · BigQuery · RAG · Vector Search · Tableau

Built an end-to-end data pipeline on GCP using BigQuery and Python to ingest, transform, and model 2M+ healthcare records. Integrated Generative AI with Retrieval-Augmented Generation (RAG) — enabling natural-language queries over structured clinical data and surfacing patterns invisible to manual analysis. Results delivered through interactive Tableau dashboards for clinical decision-makers.

GCP · BigQuery · Python · RAG · Vector Search · SQL · Embeddings · Tableau
2M+ Records
RAG Architecture
GCP Platform
Auto Insights
05 — EDUCATION

Academic
background.

Master of Science — Computer Science
Texas Tech University · Lubbock, TX, USA
Aug 2023 – May 2025 · Full-Time · Lubbock, Texas
GPA 3.7 / 4.0
06 — CONTACT

Let's work
together.

Open to senior data engineering, AI/ML engineering, and technical lead roles. Reach out directly.