Brahma Vamsi Madasu - Senior Data Engineer
[email protected]
Location: , , USA
Relocation: Yes
Visa:
Professional summary:
- Senior Data Engineer with 11+ years of IT experience, specializing in Azure Cloud Services, Big Data technologies, data modelling, ETL/ELT development, validation, deployment, monitoring, visualization, and requirement gathering across the healthcare and finance domains.
- Demonstrated expertise across the Azure Cloud Platform: Azure Data Factory (ADF), Azure Databricks (PySpark, Spark SQL), Azure Synapse Analytics, Azure Data Lake Storage Gen2, Blob Storage, Azure SQL Database, Event Hubs, Logic Apps, Functions, Purview, Key Vault, DevOps, Repos, Monitor, Cosmos DB, AKS, and HDInsight.
- Skilled in ETL/ELT pipeline design and orchestration using Azure Data Factory and Databricks, implementing incremental loads, CDC, surrogate key strategies, and complex dataflows to process terabytes of structured and semi-structured data.
- Hands-on expertise in migrating legacy ETL workloads (SSIS, Informatica, DataStage, Teradata utilities, Oracle PL/SQL) to Azure Synapse Analytics, Databricks, and Snowflake, improving performance by 30-40% and reducing operational costs.
- Architected and deployed Delta Lake Medallion Architecture (Bronze/Silver/Gold layers) with schema enforcement, ACID transactions, and lineage tracking to manage healthcare PHI and financial datasets.
- Strong background in healthcare data engineering: integrated Electronic Health Records (Epic, Cerner), payer claims (EDI 837/835), HL7/FHIR feeds, IoT patient monitoring devices, lab systems, and CRM data into enterprise data lakes and warehouses.
- Designed healthcare-specific warehouse models (facts: claims, encounters, pharmacy, authorizations; dimensions: member, provider, diagnosis, plan) to support payer analytics, utilization management, CMS/HEDIS reporting, and quality improvement initiatives.
- Finance sector expertise: developed data warehouse schemas for transactions, accounts, payments, products, customers, and channels; supported fraud detection, AML, compliance, and regulatory reporting.
- Implemented real-time streaming frameworks using Kafka, Event Hubs, Spark Streaming, and Azure Stream Analytics to process HL7 ADT events, IoT data, and financial transactions with low latency for alerts and fraud/risk scoring.
- Built and optimized PySpark/Spark SQL transformations for cleansing, deduplication, and aggregation across SQL Server, Oracle, PostgreSQL, Teradata, MongoDB, and MySQL.
- Developed Snowflake data marts using surrogate keys, clustering keys, materialized views, Slowly Changing Dimensions (SCD), and zero-copy cloning, achieving a 25% improvement in query performance.
- Implemented DBT models for reusable healthcare transformations, automated testing, documentation, and HIPAA compliance audits.
- Designed Power BI and Tableau dashboards for executives and clinicians, visualizing readmission rates, LOS, chronic disease stratification, provider performance, fraud detection, portfolio risk, and cost/utilization trends.
- Configured Azure Monitor, Purview, and Logic Apps to provide real-time pipeline monitoring, metadata management, lineage tracking, and automated alerting, cutting incident response times by 30%.
- Designed and implemented MLOps pipelines in Databricks with MLflow and Azure ML, deploying predictive models for readmission risk, chronic disease progression, and fraud detection in production.
- Led data governance and compliance programs: applied Azure RBAC, Key Vault, tokenization, and encryption at rest/in transit to secure PHI/PII and achieve HIPAA/HITECH, PCI-DSS, and SOX compliance.
- Automated infrastructure provisioning with Terraform and Azure DevOps Pipelines, deploying ADF, Databricks, Synapse, Event Hubs, Snowflake, and secure VNets across dev, QA, and prod.
- Migrated SSIS, Informatica, and DataStage ETL workflows into ADF pipelines and Databricks notebooks, modernizing them with reusable templates and error handling.
- Extensive Big Data background: Hadoop (HDFS, YARN, MapReduce), Hive, Pig, Sqoop, Oozie, NiFi, Spark (RDDs, DataFrames, Streaming), HBase, MongoDB, and AWS EMR/Redshift.
- Developed Spark Streaming applications using Scala, Kafka, and HBase for real-time analytics and Lambda architecture pipelines.
- Designed Hive schemas with partitioning, bucketing, ORC/Parquet formats, and Snappy compression; optimized HiveQL queries for performance and scalability.
- Skilled in query optimization and Spark tuning, using broadcast joins, caching, and partitioning to cut processing times significantly.
- Designed and modelled normalized, denormalized, relational, dimensional, Star, and Snowflake schemas using Erwin and ER/Studio, supporting both OLTP and OLAP applications.
- Proficient in Python, PySpark, SQL (T-SQL, PL/SQL, HiveQL), Scala, and Java, with strong shell scripting (Linux/Unix, PowerShell) for automation, data manipulation, and deployment.
- Delivered data validation, profiling, and reconciliation frameworks to ensure high data quality across healthcare and finance reporting pipelines.
- Experienced with CI/CD automation and DevOps: Git, GitHub, GitLab, Jenkins, GitHub Actions, Azure DevOps YAML pipelines, Docker, Kubernetes, Helm, and Terraform.
- Experienced in Agile/Scrum methodology, participating in sprint planning, daily stand-ups, backlog grooming, and retrospectives, and mentoring junior engineers in Azure and Big Data best practices.
Education:
- Master's in Data Science, University of New Haven, CT, Jan 2012 to Dec 2013
- Bachelor's in Electronics and Communication Engineering, Sathyabama University, Jun 2007 to Jun 2011

Certifications:
- Microsoft Certified: Azure Data Engineer Associate (DP-203)
- Microsoft Certified: Fabric Data Engineer Associate (DP-700)
- Microsoft Azure Fundamentals (AZ-900)
- Databricks Certified Data Engineer Professional

Technical skills:
Azure Services: Azure Data Factory (ADF), Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage Gen2 (ADLS), Azure Blob Storage, Azure SQL Database, Azure Event Hubs, Azure Functions, Azure Logic Apps, Azure Purview, Azure Key Vault, Azure DevOps, Azure Repos, Azure Monitor, Azure Cosmos DB, Azure Kubernetes Service (AKS), Azure Stream Analytics, Azure Virtual Machines, Azure Active Directory, Azure HDInsight.
AWS Services: Amazon S3, AWS Glue, Amazon Redshift, Amazon EMR, AWS Lambda, AWS Step Functions, Amazon Kinesis, Amazon Athena, AWS Lake Formation, Amazon CloudWatch, Amazon EventBridge, AWS Secrets Manager.
Big Data Technologies: Hadoop (HDFS, MapReduce, YARN), Apache Spark (PySpark, Spark SQL, Spark Streaming), Hive, Pig, Sqoop, Kafka, Oozie, Airflow, Delta Lake, DBT, Snowflake, Informatica, DataStage, Talend, Apache NiFi, Cloudera, Hortonworks, ZooKeeper, Elasticsearch, Kibana.
Languages: SQL, T-SQL, PL/SQL, PySpark, Python (Pandas, NumPy), Scala, Java, HiveQL, APIs (REST/SOAP).
Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP.
Operating Systems: Windows (10/7/XP/2000/NT/98/95), UNIX, Linux, Ubuntu.
File Formats: ORC, Avro, CSV, JSON, TXT, XML, Excel.
Build Automation Tools: Maven, SBT.
Version Control & CI/CD Tools: Git, GitHub, Bitbucket, GitLab, Jenkins, GitHub Actions, Azure DevOps Pipelines (YAML), Terraform, Docker, Kubernetes, Helm, RBAC/IAM Policies, Bicep, CloudFormation.
IDE & Build/Design Tools: Eclipse, Visual Studio, SSIS, SSAS, SSRS, SSMS.
Visualization Tools: Power BI, Tableau, Microsoft Fabric, QlikView, Alteryx.
Databases: Azure Synapse, Snowflake, SQL Server, Oracle, PostgreSQL, MySQL, MongoDB, Teradata, DynamoDB, Cassandra.

Professional experience:

Client: Evolent, TX. Sep 2022 - Present
Role: Senior Data Engineer
Responsibilities:
- Designed and implemented enterprise-scale healthcare data pipelines using Azure Data Factory (ADF) and Databricks to ingest and transform EHR (Epic, Cerner), payer claims (EDI 837/835), HL7/FHIR feeds, IoT medical devices, and lab systems, ensuring seamless integration across multiple data sources.
- Migrated large-scale on-premises Oracle and Teradata ETL processes to Azure Synapse Analytics and Snowflake, reducing processing times by 40% and eliminating legacy system dependencies.
- Architected and deployed Delta Lake Medallion Architecture (Bronze/Silver/Gold layers) in ADLS Gen2, supporting ACID compliance, schema enforcement, and lineage tracking for PHI across healthcare domains such as claims, encounters, member, provider, and pharmacy data.
- Built real-time data streaming frameworks with Kafka, Azure Event Hubs, and Spark Streaming to process HL7 ADT messages and IoT-generated patient monitoring data, providing clinicians and care managers with near real-time alerts.
- Developed advanced PySpark and Spark SQL transformations to clean, normalize, and aggregate healthcare datasets from SQL Server, PostgreSQL, Oracle, MongoDB, and MySQL, enabling unified patient and claims analytics.
- Designed healthcare-specific data warehouse models (facts: claims, encounters, pharmacy, authorizations; dimensions: member, provider, diagnosis, procedure, plan) to support payer and provider analytics, risk adjustment, utilization management, and quality reporting (HEDIS, CMS Star Ratings).
- Optimized Snowflake and Synapse workloads by implementing clustering keys, materialized views, partitioning strategies, and zero-copy cloning, improving query performance by 25% and reducing storage costs.
- Developed incremental load strategies in ADF using surrogate keys, lookup activities, and CDC techniques to process terabytes of claims and patient data with minimal downtime.
- Automated data ingestion and scheduling using Airflow DAGs, ADF pipelines, and Logic Apps, ensuring timely delivery of payer and provider reports across multiple healthcare systems.
- Implemented DBT for healthcare data transformations, enabling reusable data models, automated data quality checks, metadata documentation, and faster compliance audits.
- Designed and deployed Power BI dashboards and Tableau reports for clinical and financial stakeholders, providing interactive visualizations for readmission rates, length of stay (LOS), chronic disease risk stratification, provider performance, and cost/utilization trends.
- Configured Azure Monitor and Logic Apps to create real-time monitoring and alerting for healthcare data pipelines, reducing incident response times by 35%.
- Led the design and implementation of MLOps pipelines in Databricks using MLflow and Azure ML, deploying predictive models for patient readmission risk, chronic disease progression, and care gap detection.
- Partnered with clinicians, payer IT teams, and compliance officers to integrate Enterprise Risk Management (ERM) strategies, enhancing HIPAA/HITECH compliance, audit readiness, and regulatory reporting accuracy.
- Implemented robust data governance frameworks using Azure Purview, Azure RBAC, Azure Key Vault, and Active Directory groups to ensure secure, role-based access to PHI across dev, QA, and production environments.
- Automated infrastructure provisioning using Terraform to deploy HIPAA-compliant Azure resources, including ADF, Databricks, Synapse, Event Hubs, and secure VNets, reducing manual configuration errors.
- Migrated legacy SSIS and DataStage ETL workflows into ADF and Databricks notebooks, optimizing performance and modernizing integration with cloud-native services.
- Created Python and Scala scripts in Databricks for validation, cleansing, deduplication, and transformation of healthcare data, improving data quality across downstream analytics systems.
- Designed centralized access governance models to enable secure cross-team collaboration for payer and provider data analysis while meeting compliance requirements.
- Applied data masking, tokenization, and encryption at rest/in transit for PHI using Key Vault, ensuring secure workflows for healthcare data processing.
- Built batch and streaming data solutions combining Kafka, Spark, and Event Hubs with Azure Analysis Services for real-time insights into claims adjudication, provider performance, and patient monitoring.
- Configured Databricks Unity Catalog and Purview integration to streamline healthcare data cataloguing, metadata management, and lineage tracking.
- Implemented Agile development practices, participating in sprint planning, backlog grooming, and daily stand-ups while mentoring offshore and junior engineers on Azure data engineering and healthcare compliance best practices.
- Conducted code reviews, performance tuning, and pipeline optimization, ensuring all healthcare data engineering solutions adhered to compliance, scalability, and operational SLAs.
- Collaborated with business stakeholders to design executive dashboards for cost-of-care, value-based care programs, and population health management initiatives, directly impacting organizational strategy.
- Led cross-functional initiatives with clinical operations, IT, and compliance teams to align healthcare data engineering solutions with enterprise goals, reducing manual reporting efforts by 60%.
Environment: Azure Data Factory, Azure Databricks, PySpark, Spark SQL, Delta Lake, ADLS Gen2, Azure Synapse, Snowflake, Azure SQL DB, Event Hubs, Kafka, Spark Streaming, HDInsight, PostgreSQL, Oracle, Teradata, MongoDB, DBT, Airflow, Terraform, Azure DevOps, GitHub, Jenkins, Power BI, Tableau, Azure Purview, Azure Key Vault, HIPAA, HITECH, HL7, FHIR, EDI 837/835, IoT healthcare devices.

Client: Discover, IL. July 2019 to Aug 2022
Role: Azure Data Engineer
Responsibilities:
- Designed and implemented Azure Data Factory (ADF) pipelines to ingest, transform, and load transactional banking, customer, and credit data from diverse on-premises systems (SQL Server, Oracle, PostgreSQL, Teradata) into Azure Synapse Analytics and ADLS Gen2, ensuring high availability and data quality.
- Migrated large-scale on-premises ETL workloads (SSIS, Informatica, Teradata utilities) into Azure Databricks and Synapse, reducing batch processing times by 35% and optimizing operational costs.
- Developed end-to-end ETL processes with PySpark in Databricks to process terabytes of financial data, including customer transactions, deposits, credit card usage, loan servicing, and payments, ensuring accuracy for reporting and regulatory compliance.
- Built data lakehouse solutions on ADLS Gen2 and Delta Lake, supporting both batch and real-time ingestion pipelines for transactional and customer data.
- Designed data warehouse schemas (fact tables: transactions, accounts, payments; dimension tables: customer, branch, product, channel) to support financial reporting, customer insights, and risk management analytics.
- Implemented real-time streaming solutions using Kafka, Event Hubs, and Spark Streaming to capture banking transactions and payment events, supporting fraud detection, AML (Anti-Money Laundering), and risk scoring with minimal latency.
- Developed Snowflake data marts with surrogate keys, Slowly Changing Dimensions (SCD), clustering keys, and materialized views to optimize compliance reporting and reconciliation workflows.
- Implemented incremental replication strategies in ADF with CDC (Change Data Capture), ensuring accurate daily loads of high-volume transactional data.
- Automated data ingestion and orchestration using Airflow DAGs and ADF triggers, ensuring SLA compliance for risk and compliance reporting deadlines.
- Created SQL stored procedures, views, and functions for reconciliation of financial datasets across multiple systems, enabling internal audit and external regulatory reporting.
- Partnered with compliance and risk teams to align pipelines with PCI-DSS, SOX, and Basel III standards, ensuring end-to-end governance and audit readiness.
- Secured sensitive PCI/PII data by configuring Azure RBAC, Azure Key Vault, and encryption at rest/in transit, reducing security risks and supporting regulatory compliance.
- Designed Power BI dashboards to visualize KPIs such as transaction volumes, fraud detection metrics, customer churn, loan delinquency, and portfolio performance, enabling executives to make data-driven decisions.
- Implemented CI/CD pipelines with Azure DevOps and GitHub Actions for automated deployment of ETL pipelines, reducing manual errors and accelerating release cycles.
- Optimized Databricks Spark jobs using partitioning, caching, and broadcast joins to improve runtime performance for large transaction datasets.
- Automated infrastructure provisioning using Terraform, creating repeatable deployments of Databricks clusters, Synapse pools, and secure VNet configurations across dev, QA, and prod.
- Migrated legacy SSIS ETL packages into ADF, standardizing pipelines and integrating with Azure monitoring for better reliability.
- Collaborated with QA and finance teams to validate ETL outputs against reconciliation rules, balance checks, and compliance policies, ensuring accuracy in financial reporting.
- Worked in Agile/Scrum environments, participating in sprint planning, backlog grooming, and daily stand-ups, while mentoring junior engineers on Azure and big data best practices.

Environment: Azure Data Factory, Azure Databricks, PySpark, Spark SQL, Delta Lake, ADLS Gen2, Azure Synapse, Snowflake, SQL Server, Oracle, PostgreSQL, Teradata, Kafka, Event Hubs, Spark Streaming, SSIS, Informatica, Airflow, Terraform, Azure Key Vault, Azure DevOps, GitHub, Jenkins, Power BI, Tableau, PCI-DSS, SOX, Basel III compliance.

Client: AXA Insurance, CT. May 2016 to June 2019
Role: Big Data Developer
Responsibilities:
- Developed NiFi workflows to automate secure data movement between heterogeneous Hadoop systems, ensuring consistent ingestion and transformation pipelines.
- Configured, deployed, and maintained multi-node Hadoop/YARN clusters and Kinesis Streams for real-time financial and operational data processing.
- Built Spark applications (RDDs, DataFrames, Spark SQL) over Cloudera Hadoop for analytics on structured and unstructured datasets, improving batch ETL efficiency compared to MapReduce.
- Developed Scala and Java MapReduce programs to process large volumes of log, transaction, and customer data, implementing aggregation, filtering, and anomaly detection logic.
- Designed Hive schemas with partitioning, bucketing, and Snappy compression, and converted data into Parquet format for optimized querying and storage.
- Imported large datasets from DB2 and other RDBMS into Hive using Sqoop; created automated Sqoop jobs for incremental data ingestion into Hadoop clusters.
- Implemented Apache Pig scripts for data cleansing, transformation, and enrichment; optimized Pig UDFs in Java/Scala to handle complex ETL requirements.
- Coordinated ETL pipelines across Hive, Pig, and Spark jobs using Oozie workflows and coordinators, ensuring timely availability of curated datasets for downstream analytics.
- Designed and executed Spark Streaming applications with Lambda Architecture, processing real-time data feeds from DB2 into HBase tables via Kinesis Streams.
- Integrated MongoDB with Spark for interactive queries and processed semi-structured JSON datasets for analytical workloads.
- Built HBase tables for fast lookups and random access to large healthcare/financial datasets, supporting operational dashboards and APIs.
- Used Spark SQL and HiveQL for interactive ad-hoc queries, supporting business analysts with reporting and predictive modeling datasets.
- Performed data extraction, profiling, and transformation across diverse sources (RDBMS, APIs, log files) into the Hadoop data lake, enabling advanced analytics.
- Wrote Pig and Hive scripts to clean, enrich, and partition daily transaction and clickstream data, supporting customer segmentation and churn analysis.
- Developed Spark jobs for in-memory computation, enhancing performance for iterative machine learning algorithms and complex transformations.
- Worked with AWS Cloud services (EC2, S3, EBS, RDS, VPC) to integrate Hadoop/Spark workloads with cloud-native storage and compute.
- Analyzed SQL scripts and legacy ETL logic and re-engineered them into Spark pipelines, reducing batch runtime by up to 40%.
- Migrated existing Hive/SQL queries into Spark SQL and Scala-based transformations, improving query performance and scalability.
- Developed custom Spark UDFs and aggregation functions to extend query capabilities for financial and customer behavior datasets.
- Responsible for Kinesis-Spark-HBase integration pipelines delivering real-time curated datasets to customer-facing APIs.
- Built data ingestion pipelines with Flume and Kafka for streaming log and transactional data, later transformed into Hive tables for reporting.
- Provided support for BI teams, designing Hive external tables, creating data marts, and validating ETL outputs for reporting.
- Mentored junior developers in Hadoop/Spark best practices and contributed to Agile sprint planning and daily stand-ups.

Environment: Hadoop, HDFS, Hive, Pig, Sqoop, Spark (RDD, SQL, Streaming, DataFrames), Scala, Java, Python, Shell Scripting, MapReduce, HBase, Kafka, Flume, Oozie, AWS (EC2, S3, RDS, VPC), MongoDB, Cloudera Manager, Hortonworks, YARN, Ubuntu/Linux, Agile.

Client: Hilmar Company, CA. Jan 2014 to Apr 2016
Role: SQL Developer
Responsibilities:
- Designed, developed, and optimized complex SQL queries, stored procedures, triggers, views, and user-defined functions in SQL Server, Oracle PL/SQL, and MySQL to support enterprise reporting and operational analytics.
- Built and maintained SSIS ETL workflows to extract, transform, and load financial, operational, and healthcare data from multiple OLTP systems into staging and reporting databases, ensuring accuracy, lineage, and completeness.
- Modelled and implemented relational, dimensional, star, and snowflake schemas, supporting both OLTP applications and OLAP/BI use cases for executives and operational teams.
- Performed query optimization and indexing strategies (clustered/non-clustered indexes, execution plan analysis) that reduced execution times for mission-critical financial reports by up to 50%.
- Developed parameterized stored procedures for incremental loads, reconciliation, audit logging, and data quality checks, automating recurring ETL workflows and reducing manual intervention.
- Migrated legacy MS Access and flat-file pipelines into SQL Server and Oracle, eliminating manual reporting inefficiencies and standardizing ETL processes.
- Built error handling and logging frameworks in SSIS and PL/SQL to provide robust monitoring and recovery for daily ETL jobs.
- Developed automation scripts in T-SQL and PL/SQL for recurring data loads, reconciliation checks, and financial audits, ensuring compliance with internal and regulatory policies.
- Supported financial and healthcare units by delivering ad-hoc SQL queries and reports for compliance audits, operational KPIs, and strategic planning.
- Integrated SQL Server with SSRS and Tableau dashboards to deliver interactive reports on transactions, claims, and operational metrics for executives.
- Provided database administration support (user access, backup/restore, indexing, performance monitoring) across development, QA, and production environments.
- Participated in the full SDLC, including requirements gathering, database design, ETL development, testing, deployment, and ongoing production support.
- Collaborated in Agile teams, participating in sprint ceremonies and mentoring junior developers on SQL optimization and ETL best practices.

Environment: SQL Server, T-SQL, SSIS, SSRS, Oracle PL/SQL, MySQL, ETL, Stored Procedures, Functions, Views, Indexing, Query Optimization, Data Warehousing, Star/Snowflake Schemas, Performance Tuning, Tableau, Agile.