
Aravind - Data Engineer
[email protected]
Location: Dallas, Texas, USA
Relocation: Yes
Visa:
Databricks Certified Data Engineer Associate with 8 years of experience in Python-based ETL development, data manipulation, and pipeline orchestration across AWS and Databricks ecosystems. I specialize in building scalable, reusable, high-performance data frameworks, applying design patterns such as Factory alongside Lambda and Kappa architectures to process batch and streaming data efficiently. Skilled in algorithmic optimization, schema validation, and data quality enforcement, I've engineered solutions that reduce runtime, improve reliability, and deliver ML-ready outputs in the healthcare and financial domains. My experience spans AWS Glue, EMR, Redshift, and S3; real-time data integration with Kafka, Kinesis, and Airflow; and data governance through CloudTrail auditing and Glue Catalog lineage. I bring a Python-driven mindset to data engineering, combining automation, clean code design, and performance tuning to build production-grade solutions that scale.
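The Factory pattern mentioned above can be sketched as follows. This is an illustrative example only; all class and function names here are hypothetical, not the author's actual framework.

```python
# Minimal sketch of a Factory pattern for selecting an ETL pipeline variant
# (batch vs. streaming). Placeholder logic stands in for real Glue/Kafka code.

from abc import ABC, abstractmethod


class Pipeline(ABC):
    """Common interface every pipeline variant implements."""

    @abstractmethod
    def run(self, records: list[dict]) -> list[dict]:
        ...


class BatchPipeline(Pipeline):
    def run(self, records: list[dict]) -> list[dict]:
        # Batch mode: drop records explicitly flagged invalid.
        return [r for r in records if r.get("valid", True)]


class StreamingPipeline(Pipeline):
    def run(self, records: list[dict]) -> list[dict]:
        # Streaming mode: a real version would consume from Kafka/Kinesis;
        # here we just tag each record to show the separate code path.
        return [{**r, "mode": "stream"} for r in records]


def pipeline_factory(kind: str) -> Pipeline:
    """Return the pipeline implementation for the requested mode."""
    registry = {"batch": BatchPipeline, "stream": StreamingPipeline}
    try:
        return registry[kind]()
    except KeyError:
        raise ValueError(f"unknown pipeline kind: {kind}")


result = pipeline_factory("batch").run([{"id": 1}, {"id": 2, "valid": False}])
print(result)  # [{'id': 1}]
```

Registering variants in a dict keeps the factory open for extension: adding a new ingestion mode means adding one entry, not editing call sites.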
SKILLS


Big Data & Processing: PySpark, Apache Spark, Databricks, Hadoop
Cloud & Data Lakes: AWS (Glue, Redshift, S3, Step Functions, Kinesis, Lambda), Azure (Data Factory, Synapse)
Streaming & Orchestration: Kafka, Apache NiFi, Airflow, AWS Step Functions
ETL & Databases: SQL, Snowflake, Oracle, SQL Server
Programming & Automation: Python, Shell Scripting, Terraform
Analytics & Data Visualization: SageMaker, Tableau, Power BI, Amazon QuickSight, Alteryx
DevOps & Agile: Git, Azure DevOps, JIRA, Agile/Scrum, CI/CD
CERTIFICATIONS


Databricks Certified Data Engineer Associate
Data Analysis with Python
Databricks Fundamentals
Database and SQL for Data Science
Python for Data Science & AI
KNIME Certifications


EXPERIENCE

DATA ENGINEER
Centene Corp, St. Louis, MO | July 2024 - Present
Developed and automated ETL pipelines using SSIS, Talend, and Python, integrating data from Oracle, MongoDB, and SQL Server into Amazon S3 and Redshift for centralized storage, validation, and analytics.
Migrated legacy workflows to AWS Glue and scheduled batch jobs using AWS Systems Manager, improving pipeline scalability and reducing manual deployment overhead.
Built automated test frameworks in Python and SQL to validate ETL pipelines on AWS Glue and Redshift, integrating schema checks, reconciliation logic, and compliance rules to ensure reliable production data quality.
Ingested and validated MMIS-based Medicaid claims data into AWS Glue and Databricks, applying schema checks and business rules to maintain data quality and regulatory compliance.
Engineered Python-based ETL frameworks using AWS Glue and PySpark to perform large-scale data manipulation, schema validation, and deduplication for claims datasets.
Applied optimization algorithms for joins, partitioning, and incremental loads, improving job performance and runtime efficiency.
Leveraged Databricks Auto Loader to ingest and process smaller clinical/EHR files into Delta Lake, enabling faster turnaround for new analytical requirements and improving pipeline scalability.
Applied Python-driven design patterns to develop ingestion workflows integrating Kafka, Kinesis, and Glue.
Automated orchestration and data routing for real-time and batch layers, improving reliability and reusability.
Optimized AWS infrastructure for performance and cost by tuning Glue job memory, S3 storage tiers, and cluster sizing on EMR.
Configured and optimized Spark clusters on AWS EMR to process large-scale healthcare and claims datasets, enabling distributed computation, faster transformations, and reduced Glue job runtimes.
Engineered large-scale claims and TPL pipelines in Databricks with PySpark, optimizing transformations and notebook execution to reduce runtime and deliver AI/ML-ready datasets into SageMaker and downstream analytics platforms.
Designed CI/CD pipelines using AWS CodePipeline and CodeBuild for automated ETL job deployment, improving release efficiency and reducing human error.
Applied S3 lifecycle policies to manage historical datasets and optimize storage.
Designed and optimized PL/SQL procedures, triggers, and SQL queries to integrate healthcare claims data into AWS pipelines, improving performance, partitioning strategies, and downstream analytics efficiency.
Implemented AWS CloudTrail auditing across data pipelines to track API activity, ensuring compliance with healthcare regulations and enabling proactive security monitoring.
Performed deep-dive analyses on healthcare claims datasets in Redshift and Databricks, uncovering utilization trends and anomaly patterns that guided cost-containment initiatives.
Designed KPI-driven Power BI and QuickSight dashboards by transforming Redshift datasets, delivering insights that highlighted trends, risks, and cost-saving opportunities for business teams.
Developed reusable Python modules for ETL scheduling, exception handling, and audit logging, following object-oriented design principles to enhance maintainability and ensure SLA compliance.
Developed Python monitoring scripts for AWS CloudWatch and Azure Log Analytics to proactively detect job failures.
Built and scaled enterprise-grade Databricks pipelines on AWS for TPL datasets, leveraging PySpark with Glue Data Catalog integration to enforce schema consistency, enable incremental loading, and reduce pipeline failures.
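The schema-validation and deduplication steps described above can be sketched as follows. The production version ran as PySpark on AWS Glue; this simplified pure-Python version shows the same logic on a small sample, and the field names are hypothetical.

```python
# Simplified sketch: validate claim records against a required schema,
# then deduplicate on claim_id, keeping the first occurrence.

REQUIRED_FIELDS = {"claim_id": str, "member_id": str, "amount": float}


def validate_schema(record: dict) -> bool:
    """True if the record has every required field with the right type."""
    return all(
        field in record and isinstance(record[field], ftype)
        for field, ftype in REQUIRED_FIELDS.items()
    )


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each claim_id, preserving input order."""
    seen, out = set(), []
    for r in records:
        if r["claim_id"] not in seen:
            seen.add(r["claim_id"])
            out.append(r)
    return out


claims = [
    {"claim_id": "C1", "member_id": "M1", "amount": 120.0},
    {"claim_id": "C1", "member_id": "M1", "amount": 120.0},  # duplicate
    {"claim_id": "C2", "member_id": "M2", "amount": "bad"},  # wrong type
]

clean = deduplicate([c for c in claims if validate_schema(c)])
print(clean)  # one valid, deduplicated record remains
```

In PySpark the equivalent would typically be a schema-enforced read followed by `dropDuplicates(["claim_id"])`, pushed down to the cluster rather than looped in Python.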

DATA ENGINEER
Deloitte USI, India | January 2021 - August 2023
Designed and built SQL-based ETL workflows to move and transform large datasets from Oracle, MySQL, and SQL Server into HDFS and AWS-based environments.
Maintained legacy SSIS pipelines and transitioned workloads to AWS Glue and Lambda.
Set up real-time data pipelines using Apache Kafka to process and deliver streaming data to downstream systems for faster insights and actions.
Developed Python ETL frameworks to migrate high-volume SQL workloads to AWS, implementing transformation logic, data manipulation routines, and partitioning strategies that improved scalability and performance.
Leveraged Apache Hive on AWS and HDFS to perform batch transformations and analytical queries on large-scale retail and financial datasets, improving data accessibility and pipeline throughput.
Integrated Apache Iceberg tables within the AWS Glue Data Catalog for schema evolution and transactional consistency across the lakehouse.
Collaborated with data architects to design and implement dimensional data models and star schemas in Databricks, optimizing query performance and improving analytics accessibility for retail and finance stakeholders.
Developed and tuned Oracle PL/SQL ETL workflows to support migration into AWS-based data lakes.
Managed S3-based data lakes and integrated batch and streaming sources for unified analytics across Amazon Redshift, QuickSight, and Tableau.
Implemented backup and disaster recovery strategies, including automated snapshots, multi-AZ backups, and failover testing.
Automated data ingestion into Amazon RDS and DynamoDB to maintain up-to-date datasets.
Troubleshot pipeline and database issues using CloudWatch, CloudTrail, and system logs.
Implemented CI/CD automation using GitHub Actions, AWS CodePipeline to streamline deployment.
Applied algorithmic optimization and Python transformation logic within Databricks to design scalable dimensional data models, leveraging caching and vectorized operations to improve query performance for analytics teams.
Orchestrated and scheduled ETL workflows with Apache Airflow, ensuring reliable and timely data delivery.
Built interactive dashboards in Amazon QuickSight and Tableau Server, enabling stakeholders to explore key metrics from S3 and Redshift.
Leveraged Amazon Athena for ad-hoc analysis on raw and curated datasets stored in S3, reducing turnaround time for exploratory data requests.
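The core idea behind the Airflow orchestration described above is dependency-ordered task execution. The sketch below shows that ordering with the standard-library `graphlib`; task names are illustrative, and a real deployment would define these as Airflow operators in a DAG file instead.

```python
# Minimal sketch of DAG-style scheduling: given each task's upstream
# dependencies, compute a valid execution order.

from graphlib import TopologicalSorter


def run_order(dag: dict[str, set[str]]) -> list[str]:
    """Return a valid execution order for tasks given their upstream deps."""
    return list(TopologicalSorter(dag).static_order())


# extract must finish before transform, which must finish before load,
# which must finish before report.
etl_dag = {
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

print(run_order(etl_dag))  # ['extract', 'transform', 'load', 'report']
```

Airflow adds retries, scheduling intervals, and backfills on top of this ordering, but the dependency graph is the same concept.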
ETL DEVELOPER
Genius Consultants, India | July 2017 - December 2020
Designed and implemented ETL pipelines to extract, transform, and load data from Oracle, SQL Server, flat files, and XML into target databases.
Built SSIS packages to automate data ingestion, cleansing, and transformation workflows based on business rules.
Created stored procedures, views, and functions to support data transformations and streamline the loading process.
Optimized SQL queries and ETL workflows (indexes, partitioning) to reduce load times and improve efficiency.
Developed Python-based synthetic data generators to simulate claims datasets for testing and data-quality validation.
Built data validation scripts in SQL to compare source and target datasets, ensuring accuracy and consistency in ETL jobs.
Partnered with business users to translate requirements into ETL logic, ensuring accurate and business-aligned data delivery.
Worked on early Spark Streaming implementations and later refactored pipelines to Structured Streaming, aligning with Spark's modern architecture and best practices.
Developed complex T-SQL stored procedures, window functions, and CTEs to support data transformations and reporting needs.
Monitored daily batch jobs and validated staging tables to ensure accurate data flow and catch discrepancies early.
Investigated ETL failures through log analysis and implemented fixes to ensure timely and accurate data delivery.
Scheduled and monitored ETL jobs using SQL Server Agent.
Automated ETL error handling and alerting with Shell scripts, improving SLA adherence.
Documented ETL workflows, data mappings, and transformation logic to support knowledge transfer and maintenance.
Developed reconciliation processes to ensure data integrity between staging and production systems.
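Two of the bullets above, the synthetic-claims generator and the staging-to-production reconciliation, can be sketched together. Field names, the seed, and the helper names are hypothetical; a real reconciliation would compare SQL result sets, not in-memory lists.

```python
# Illustrative sketch: deterministic synthetic claim records for ETL testing,
# plus a reconciliation check for rows present in staging but not production.

import random


def generate_claims(n: int, seed: int = 42) -> list[dict]:
    """Produce n deterministic synthetic claim records."""
    rng = random.Random(seed)
    return [
        {"claim_id": f"C{i:04d}", "amount": round(rng.uniform(10, 500), 2)}
        for i in range(n)
    ]


def reconcile(staging: list[dict], production: list[dict]) -> set[str]:
    """Return claim_ids present in staging but missing from production."""
    return {r["claim_id"] for r in staging} - {r["claim_id"] for r in production}


staging = generate_claims(5)
production = staging[:-1]  # simulate one record lost during the load
print(reconcile(staging, production))  # {'C0004'}
```

Seeding the generator makes test data reproducible, so a reconciliation failure points at the pipeline rather than at random input drift.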
EDUCATION


MASTER OF SCIENCE IN BUSINESS ANALYTICS AND ARTIFICIAL INTELLIGENCE MAY 2025
The University of Texas at Dallas

BACHELOR OF SCIENCE IN COMPUTER SCIENCE MAY 2017
Adikavi Nannaya University
