Brahma Vamsi Madasu - Senior Data Engineer
[email protected]
Location: , , USA
Relocation: Yes
Visa:
Professional summary:
- Senior Data Engineer with 11+ years of IT experience, specializing in Azure Cloud Services, Big Data technologies, data modelling, ETL/ELT development, validation, deployment, monitoring, visualization, and requirement gathering across the healthcare and finance domains.
- Demonstrated expertise across the Azure Cloud Platform: Azure Data Factory (ADF), Azure Databricks (PySpark, Spark SQL), Azure Synapse Analytics, Azure Data Lake Storage Gen2, Blob Storage, Azure SQL Database, Event Hubs, Logic Apps, Functions, Purview, Key Vault, DevOps, Repos, Monitor, Cosmos DB, AKS, and HDInsight.
- Skilled in ETL/ELT pipeline design and orchestration using Azure Data Factory and Databricks, implementing incremental loads, CDC, surrogate key strategies, and complex dataflows to process terabytes of structured and semi-structured data.
- Hands-on expertise in migrating legacy ETL workloads (SSIS, Informatica, DataStage, Teradata utilities, Oracle PL/SQL) to Azure Synapse Analytics, Databricks, and Snowflake, improving performance by 30-40% and reducing operational costs.
- Architected and deployed Delta Lake Medallion Architecture (Bronze/Silver/Gold layers) with schema enforcement, ACID transactions, and lineage tracking to manage healthcare PHI and financial datasets.
- Strong background in healthcare data engineering: integrated Electronic Health Records (Epic, Cerner), payer claims (EDI 837/835), HL7/FHIR feeds, IoT patient monitoring devices, lab systems, and CRM data into enterprise data lakes and warehouses.
- Designed healthcare-specific warehouse models (facts: claims, encounters, pharmacy, authorizations; dimensions: member, provider, diagnosis, plan) to support payer analytics, utilization management, CMS/HEDIS reporting, and quality improvement initiatives.
- Finance sector expertise: developed data warehouse schemas for transactions, accounts, payments, products, customers, and channels; supported fraud detection, AML, compliance, and regulatory reporting.
- Implemented real-time streaming frameworks using Kafka, Event Hubs, Spark Streaming, and Azure Stream Analytics to process HL7 ADT events, IoT data, and financial transactions with low latency for alerts and fraud/risk scoring.
- Built and optimized PySpark/Spark SQL transformations for cleansing, deduplication, and aggregation across SQL Server, Oracle, PostgreSQL, Teradata, MongoDB, and MySQL.
- Developed Snowflake data marts using surrogate keys, clustering keys, materialized views, Slowly Changing Dimensions (SCD), and zero-copy cloning, achieving a 25% improvement in query performance.
- Implemented DBT models for reusable healthcare transformations, automated testing, documentation, and HIPAA compliance audits.
- Designed Power BI and Tableau dashboards for executives and clinicians, visualizing readmission rates, LOS, chronic disease stratification, provider performance, fraud detection, portfolio risk, and cost/utilization trends.
- Configured Azure Monitor, Purview, and Logic Apps to provide real-time pipeline monitoring, metadata management, lineage tracking, and automated alerting, cutting incident response times by 30%.
- Designed and implemented MLOps pipelines in Databricks with MLflow and Azure ML, deploying predictive models for readmission risk, chronic disease progression, and fraud detection in production.
- Led data governance and compliance programs: applied Azure RBAC, Key Vault, tokenization, and encryption at rest/in transit to secure PHI/PII and achieve HIPAA/HITECH, PCI-DSS, and SOX compliance.
- Automated infrastructure provisioning with Terraform and Azure DevOps Pipelines, deploying ADF, Databricks, Synapse, Event Hubs, Snowflake, and secure VNets across dev, QA, and prod.
- Migrated SSIS, Informatica, and DataStage ETL workflows into ADF pipelines and Databricks notebooks, modernizing them with reusable templates and error handling.
- Extensive Big Data background: Hadoop (HDFS, YARN, MapReduce), Hive, Pig, Sqoop, Oozie, NiFi, Spark (RDDs, DataFrames, Streaming), HBase, MongoDB, and AWS EMR/Redshift.
- Developed Spark Streaming applications using Scala, Kafka, and HBase for real-time analytics and Lambda architecture pipelines.
- Designed Hive schemas with partitioning, bucketing, ORC/Parquet formats, and Snappy compression; optimized HiveQL queries for performance and scalability.
- Skilled in query optimization and Spark tuning, using broadcast joins, caching, and partitioning to cut processing times significantly.
- Designed and modelled normalized, denormalized, relational, dimensional, Star, and Snowflake schemas using Erwin and ER/Studio, supporting both OLTP and OLAP applications.
- Proficient in Python, PySpark, SQL (T-SQL, PL/SQL, HiveQL), Scala, and Java, with strong shell scripting (Linux/Unix, PowerShell) for automation, data manipulation, and deployment.
- Delivered data validation, profiling, and reconciliation frameworks to ensure high data quality across healthcare and finance reporting pipelines.
- Experienced with CI/CD automation and DevOps: Git, GitHub, GitLab, Jenkins, GitHub Actions, Azure DevOps YAML pipelines, Docker, Kubernetes, Helm, and Terraform.
- Experienced in Agile/Scrum methodology, participating in sprint planning, daily stand-ups, backlog grooming, and retrospectives, and mentoring junior engineers in Azure and Big Data best practices.
Education:
- Master's in Data Science, University of New Haven, CT, Jan 2012 to Dec 2013
- Bachelor's in Electronics and Communication Engineering, Sathyabama University, Jun 2007 to Jun 2011

Certifications:
- Microsoft Certified: Azure Data Engineer Associate (DP-203)
- Microsoft Certified: Fabric Data Engineer Associate (DP-700)
- Microsoft Azure Fundamentals (AZ-900)
- Databricks Certified Data Engineer Professional

Technical skills:
Azure Services: Azure Data Factory (ADF), Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage Gen2 (ADLS), Azure Blob Storage, Azure SQL Database, Azure Event Hubs, Azure Functions, Azure Logic Apps, Azure Purview, Azure Key Vault, Azure DevOps, Azure Repos, Azure Monitor, Azure Cosmos DB, Azure Kubernetes Service (AKS), Azure Stream Analytics, Azure Virtual Machines, Azure Active Directory, Azure HDInsight.
AWS Services: Amazon S3, AWS Glue, Amazon Redshift, Amazon EMR, AWS Lambda, AWS Step Functions, Amazon Kinesis, Amazon Athena, AWS Lake Formation, Amazon CloudWatch, Amazon EventBridge, AWS Secrets Manager.
Big Data Technologies: Hadoop (HDFS, MapReduce, YARN), Apache Spark (PySpark, Spark SQL, Spark Streaming), Hive, Pig, Sqoop, Kafka, Oozie, Airflow, Delta Lake, DBT, Snowflake, Informatica, DataStage, Talend, Apache NiFi, Cloudera, Hortonworks, ZooKeeper, Elasticsearch, Kibana.
Languages: SQL, T-SQL, PL/SQL, PySpark, Python (Pandas, NumPy), Scala, Java, HiveQL, APIs (REST/SOAP).
Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP.
Operating Systems: Windows (10/7/XP/2000/NT/98/95), UNIX, Linux, Ubuntu.
File Formats: ORC, Avro, CSV, JSON, TXT, XML, Excel.
Build Automation Tools: Maven, SBT.
Version Control & CI/CD Tools: Git, GitHub, Bitbucket, GitLab, Jenkins, GitHub Actions, Azure DevOps Pipelines (YAML), Terraform, Docker, Kubernetes, Helm, RBAC/IAM Policies, Bicep, CloudFormation.
IDE & Build/Design Tools: Eclipse, Visual Studio, SSIS, SSAS, SSRS, SSMS.
Visualization Tools: Power BI, Tableau, Microsoft Fabric, QlikView, Alteryx.
Databases: Azure Synapse, Snowflake, SQL Server, Oracle, PostgreSQL, MySQL, MongoDB, Teradata, DynamoDB, Cassandra.

Professional experience:

Client: Evolent, TX. Sep 2022 - Present
Role: Senior Data Engineer
Responsibilities:
- Designed and implemented enterprise-scale healthcare data pipelines using Azure Data Factory (ADF) and Databricks to ingest and transform EHR (Epic, Cerner), payer claims (EDI 837/835), HL7/FHIR feeds, IoT medical devices, and lab systems, ensuring seamless integration across multiple data sources.
- Migrated large-scale on-premises Oracle and Teradata ETL processes to Azure Synapse Analytics and Snowflake, reducing processing times by 40% and eliminating legacy system dependencies.
- Architected and deployed Delta Lake Medallion Architecture (Bronze/Silver/Gold layers) in ADLS Gen2, supporting ACID compliance, schema enforcement, and lineage tracking for PHI across healthcare domains such as claims, encounters, member, provider, and pharmacy data.
- Built real-time data streaming frameworks with Kafka, Azure Event Hubs, and Spark Streaming to process HL7 ADT messages and IoT-generated patient monitoring data, providing clinicians and care managers with near real-time alerts.
- Developed advanced PySpark and Spark SQL transformations to clean, normalize, and aggregate healthcare datasets from SQL Server, PostgreSQL, Oracle, MongoDB, and MySQL, enabling unified patient and claims analytics.
- Designed healthcare-specific data warehouse models (facts: claims, encounters, pharmacy, authorizations; dimensions: member, provider, diagnosis, procedure, plan) to support payer and provider analytics, risk adjustment, utilization management, and quality reporting (HEDIS, CMS Star Ratings).
- Optimized Snowflake and Synapse workloads by implementing clustering keys, materialized views, partitioning strategies, and zero-copy cloning, improving query performance by 25% and reducing storage costs.
- Developed incremental load strategies in ADF using surrogate keys, lookup activities, and CDC techniques to process terabytes of claims and patient data with minimal downtime.
- Automated data ingestion and scheduling using Airflow DAGs, ADF pipelines, and Logic Apps, ensuring timely delivery of payer and provider reports across multiple healthcare systems.
- Implemented DBT for healthcare data transformations, enabling reusable data models, automated data quality checks, metadata documentation, and faster compliance audits.
- Designed and deployed Power BI dashboards and Tableau reports for clinical and financial stakeholders, providing interactive visualizations for readmission rates, length of stay (LOS), chronic disease risk stratification, provider performance, and cost/utilization trends.
- Configured Azure Monitor and Logic Apps to create real-time monitoring and alerting for healthcare data pipelines, reducing incident response times by 35%.
- Led the design and implementation of MLOps pipelines in Databricks using MLflow and Azure ML, deploying predictive models for patient readmission risk, chronic disease progression, and care gap detection.
- Partnered with clinicians, payer IT teams, and compliance officers to integrate Enterprise Risk Management (ERM) strategies, enhancing HIPAA/HITECH compliance, audit readiness, and regulatory reporting accuracy.
- Implemented robust data governance frameworks using Azure Purview, Azure RBAC, Azure Key Vault, and Active Directory groups to ensure secure, role-based access to PHI across dev, QA, and production environments.
- Automated infrastructure provisioning using Terraform to deploy HIPAA-compliant Azure resources, including ADF, Databricks, Synapse, Event Hubs, and secure VNets, reducing manual configuration errors.
- Migrated legacy SSIS and DataStage ETL workflows into ADF and Databricks notebooks, optimizing performance and modernizing integration with cloud-native services.
- Created Python and Scala scripts in Databricks for validation, cleansing, deduplication, and transformation of healthcare data, improving data quality across downstream analytics systems.
- Designed centralized access governance models to enable secure cross-team collaboration for payer and provider data analysis while meeting compliance requirements.
- Applied data masking, tokenization, and encryption at rest/in transit for PHI using Key Vault, ensuring secure workflows for healthcare data processing.
- Built batch and streaming data solutions combining Kafka, Spark, and Event Hubs with Azure Analysis Services for real-time insights into claims adjudication, provider performance, and patient monitoring.
- Configured Databricks Unity Catalog and Purview integration to streamline healthcare data cataloguing, metadata management, and lineage tracking.
- Implemented Agile development practices, participating in sprint planning, backlog grooming, and daily stand-ups while mentoring offshore and junior engineers on Azure data engineering and healthcare compliance best practices.
- Conducted code reviews, performance tuning, and pipeline optimization, ensuring all healthcare data engineering solutions adhered to compliance, scalability, and operational SLAs.
- Collaborated with business stakeholders to design executive dashboards for cost-of-care, value-based care programs, and population health management initiatives, directly impacting organizational strategy.
- Led cross-functional initiatives with clinical operations, IT, and compliance teams to align healthcare data engineering solutions with enterprise goals, reducing manual reporting efforts by 60%.
Environment: Azure Data Factory, Azure Databricks, PySpark, Spark SQL, Delta Lake, ADLS Gen2, Azure Synapse, Snowflake, Azure SQL DB, Event Hubs, Kafka, Spark Streaming, HDInsight, PostgreSQL, Oracle, Teradata, MongoDB, DBT, Airflow, Terraform, Azure DevOps, GitHub, Jenkins, Power BI, Tableau, Azure Purview, Azure Key Vault, HIPAA, HITECH, HL7, FHIR, EDI 837/835, IoT healthcare devices.

Client: Discover, IL. July 2019 to Aug 2022
Role: Azure Data Engineer
Responsibilities:
- Designed and implemented Azure Data Factory (ADF) pipelines to ingest, transform, and load transactional banking, customer, and credit data from diverse on-premises systems (SQL Server, Oracle, PostgreSQL, Teradata) into Azure Synapse Analytics and ADLS Gen2, ensuring high availability and data quality.
- Migrated large-scale on-premises ETL workloads (SSIS, Informatica, Teradata utilities) into Azure Databricks and Synapse, reducing batch processing times by 35% and optimizing operational costs.
- Developed end-to-end ETL processes with PySpark in Databricks to process terabytes of financial data, including customer transactions, deposits, credit card usage, loan servicing, and payments, ensuring accuracy for reporting and regulatory compliance.
- Built data lakehouse solutions on ADLS Gen2 and Delta Lake, supporting both batch and real-time ingestion pipelines for transactional and customer data.
- Designed data warehouse schemas (fact tables: transactions, accounts, payments; dimension tables: customer, branch, product, channel) to support financial reporting, customer insights, and risk management analytics.
- Implemented real-time streaming solutions using Kafka, Event Hubs, and Spark Streaming to capture banking transactions and payment events, supporting fraud detection, AML (Anti-Money Laundering), and risk scoring with minimal latency.
- Developed Snowflake data marts with surrogate keys, Slowly Changing Dimensions (SCD), clustering keys, and materialized views to optimize compliance reporting and reconciliation workflows.
- Implemented incremental replication strategies in ADF with CDC (Change Data Capture), ensuring accurate daily loads of high-volume transactional data.
- Automated data ingestion and orchestration using Airflow DAGs and ADF triggers, ensuring SLA compliance for risk and compliance reporting deadlines.
- Created SQL stored procedures, views, and functions for reconciliation of financial datasets across multiple systems, enabling internal audit and external regulatory reporting.
- Partnered with compliance and risk teams to align pipelines with PCI-DSS, SOX, and Basel III standards, ensuring end-to-end governance and audit readiness.
- Secured sensitive PCI/PII data by configuring Azure RBAC, Azure Key Vault, and encryption at rest/in transit, reducing security risks and supporting regulatory compliance.
- Designed Power BI dashboards to visualize KPIs such as transaction volumes, fraud detection metrics, customer churn, loan delinquency, and portfolio performance, enabling executives to make data-driven decisions.
- Implemented CI/CD pipelines with Azure DevOps and GitHub Actions for automated deployment of ETL pipelines, reducing manual errors and accelerating release cycles.
- Optimized Databricks Spark jobs using partitioning, caching, and broadcast joins to improve runtime performance for large transaction datasets.
- Automated infrastructure provisioning using Terraform, creating repeatable deployments of Databricks clusters, Synapse pools, and secure VNet configurations across dev, QA, and prod.
- Migrated legacy SSIS ETL packages into ADF, standardizing pipelines and integrating with Azure monitoring for better reliability.
- Collaborated with QA and finance teams to validate ETL outputs against reconciliation rules, balance checks, and compliance policies, ensuring accuracy in financial reporting.
- Worked in Agile/Scrum environments, participating in sprint planning, backlog grooming, and daily stand-ups, while mentoring junior engineers on Azure and big data best practices.

Environment: Azure Data Factory, Azure Databricks, PySpark, Spark SQL, Delta Lake, ADLS Gen2, Azure Synapse, Snowflake, SQL Server, Oracle, PostgreSQL, Teradata, Kafka, Event Hubs, Spark Streaming, SSIS, Informatica, Airflow, Terraform, Azure Key Vault, Azure DevOps, GitHub, Jenkins, Power BI, Tableau, PCI-DSS, SOX, Basel III compliance.

Client: AXA Insurance, CT. May 2016 to June 2019
Role: Big Data Developer
Responsibilities:
- Developed NiFi workflows to automate secure data movement between heterogeneous Hadoop systems, ensuring consistent ingestion and transformation pipelines.
- Configured, deployed, and maintained multi-node Hadoop/YARN clusters and Kinesis Streams for real-time financial and operational data processing.
- Built Spark applications (RDDs, DataFrames, Spark SQL) over Cloudera Hadoop for analytics on structured and unstructured datasets, improving batch ETL efficiency compared to MapReduce.
- Developed Scala and Java MapReduce programs to process large volumes of log, transaction, and customer data, implementing aggregation, filtering, and anomaly detection logic.
- Designed Hive schemas with partitioning, bucketing, and Snappy compression, and converted data into Parquet format for optimized querying and storage.
- Imported large datasets from DB2 and other RDBMS into Hive using Sqoop; created automated Sqoop jobs for incremental data ingestion into Hadoop clusters.
- Implemented Apache Pig scripts for data cleansing, transformation, and enrichment; optimized Pig UDFs in Java/Scala to handle complex ETL requirements.
- Coordinated ETL pipelines across Hive, Pig, and Spark jobs using Oozie workflows and coordinators, ensuring timely availability of curated datasets for downstream analytics.
- Designed and executed Spark Streaming applications with Lambda Architecture, processing real-time data feeds from DB2 into HBase tables via Kinesis Streams.
- Integrated MongoDB with Spark for interactive queries and processed semi-structured JSON datasets for analytical workloads.
- Built HBase tables for fast lookups and random access to large healthcare/financial datasets, supporting operational dashboards and APIs.
- Used Spark SQL and HiveQL for interactive ad-hoc queries, supporting business analysts with reporting and predictive modeling datasets.
- Performed data extraction, profiling, and transformation across diverse sources (RDBMS, APIs, log files) into the Hadoop data lake, enabling advanced analytics.
- Wrote Pig and Hive scripts to clean, enrich, and partition daily transaction and clickstream data, supporting customer segmentation and churn analysis.
- Developed Spark jobs for in-memory computation, enhancing performance for iterative machine learning algorithms and complex transformations.
- Worked with AWS Cloud services (EC2, S3, EBS, RDS, VPC) to integrate Hadoop/Spark workloads with cloud-native storage and compute.
- Analyzed SQL scripts and legacy ETL logic and re-engineered them into Spark pipelines, reducing batch runtime by up to 40%.
- Migrated existing Hive/SQL queries into Spark SQL and Scala-based transformations, improving query performance and scalability.
- Developed custom Spark UDFs and aggregation functions to extend query capabilities for financial and customer behavior datasets.
- Responsible for Kinesis-Spark-HBase integration pipelines delivering real-time curated datasets to customer-facing APIs.
- Built data ingestion pipelines with Flume and Kafka for streaming log and transactional data, later transformed into Hive tables for reporting.
- Provided support for BI teams, designing Hive external tables, creating data marts, and validating ETL outputs for reporting.
- Mentored junior developers in Hadoop/Spark best practices and contributed to Agile sprint planning and daily stand-ups.

Environment: Hadoop, HDFS, Hive, Pig, Sqoop, Spark (RDD, SQL, Streaming, DataFrames), Scala, Java, Python, Shell Scripting, MapReduce, HBase, Kafka, Flume, Oozie, AWS (EC2, S3, RDS, VPC), MongoDB, Cloudera Manager, Hortonworks, YARN, Ubuntu/Linux, Agile.

Client: Hilmar Company, CA. Jan 2014 to Apr 2016
Role: SQL Developer
Responsibilities:
- Designed, developed, and optimized complex SQL queries, stored procedures, triggers, views, and user-defined functions in SQL Server, Oracle PL/SQL, and MySQL to support enterprise reporting and operational analytics.
- Built and maintained SSIS ETL workflows to extract, transform, and load financial, operational, and healthcare data from multiple OLTP systems into staging and reporting databases, ensuring accuracy, lineage, and completeness.
- Modelled and implemented relational, dimensional, star, and snowflake schemas, supporting both OLTP applications and OLAP/BI use cases for executives and operational teams.
- Performed query optimization and indexing strategies (clustered/non-clustered indexes, execution plan analysis) that reduced execution times for mission-critical financial reports by up to 50%.
- Developed parameterized stored procedures for incremental loads, reconciliation, audit logging, and data quality checks, automating recurring ETL workflows and reducing manual intervention.
- Migrated legacy MS Access and flat-file pipelines into SQL Server and Oracle, eliminating manual reporting inefficiencies and standardizing ETL processes.
- Built error handling and logging frameworks in SSIS and PL/SQL to provide robust monitoring and recovery for daily ETL jobs.
- Developed automation scripts in T-SQL and PL/SQL for recurring data loads, reconciliation checks, and financial audits, ensuring compliance with internal and regulatory policies.
- Supported financial and healthcare units by delivering ad-hoc SQL queries and reports for compliance audits, operational KPIs, and strategic planning.
- Integrated SQL Server with SSRS and Tableau dashboards to deliver interactive reports on transactions, claims, and operational metrics for executives.
- Provided database administration support (user access, backup/restore, indexing, performance monitoring) across development, QA, and production environments.
- Participated in the full SDLC, including requirements gathering, database design, ETL development, testing, deployment, and ongoing production support.
- Collaborated in Agile teams, participating in sprint ceremonies and mentoring junior developers on SQL optimization and ETL best practices.

Environment: SQL Server, T-SQL, SSIS, SSRS, Oracle PL/SQL, MySQL, ETL, Stored Procedures, Functions, Views, Indexing, Query Optimization, Data Warehousing, Star/Snowflake Schemas, Performance Tuning, Tableau, Agile.