| Tejasri Elimineti - Senior Data Engineer |
| [email protected] |
| Location: Moorestown, New Jersey, USA |
| Relocation: Yes |
| Visa: GC |
| Resume file: Tejasri - Senior Data Engineer_1773065609438.docx |
|
Tejasri Elimineti
[email protected]; (952)-356-8965 LinkedIn: linkedin.com/in/elimineti-tejasri SUMMARY: Senior Data Engineer with 12 years of experience delivering large-scale data engineering and cloud solutions across healthcare, retail, finance, manufacturing, and e-commerce. Experienced in leading end-to-end projects from requirements gathering to production deployment, ensuring scalability, reliability, and business impact. Hands-on expertise in cloud platforms AWS (Glue, S3, Redshift, EMR, Lambda, Kinesis, Step Functions), and Azure (Data Factory, Databricks, Synapse, Data Lake, Event Hub). Skilled at migrating on-premises systems to cloud-native platforms while optimizing cost, security, and performance. Strong background in big data frameworks and distributed processing with Spark (PySpark, Spark SQL), Hadoop (HDFS, Hive, MapReduce, Sqoop), and Kafka. Experienced in building both real-time streaming and batch ETL/ELT pipelines to process high-volume structured and unstructured data. Deep experience with ETL and data integration tools including Informatica PowerCenter, AWS Glue, Azure Data Factory, Talend, and SSIS. Proficient in designing reusable frameworks, applying partitioning strategies, and performance-tuning pipelines for faster execution and reliability. Proficient in SQL, PL/SQL, Python, and Shell scripting, with the ability to write complex queries, optimize stored procedures, and automate workflows. Experienced in data modeling (star and snowflake schemas), building fact/dimension tables, and enabling BI teams with clean, analytics-ready datasets. Skilled in designing and implementing data lakes, data warehouses, and Delta Lake architectures to support enterprise analytics, BI dashboards, and regulatory reporting. Experience includes building scalable solutions for claims, retail transactions, e-commerce clickstream, and IoT data. 
- Exposure to Generative AI and LLM integration, including building data foundation layers and pipelines to support AI-driven analytics using AWS Bedrock, OpenAI APIs, and SageMaker Feature Store.
- Strong understanding of data governance, lineage, and compliance frameworks, including HIPAA, SOX, and GDPR. Experienced in implementing encryption, fine-grained IAM roles, and metadata standards to ensure secure, governed access to sensitive data.
- Effective in Agile/Scrum environments, collaborating closely with cross-functional teams of business analysts, BI developers, architects, and data scientists. Proven ability to translate business needs into scalable technical solutions.
- Adept at workflow automation and orchestration, using tools such as Airflow, Control-M, AWS Step Functions, and Databricks Jobs to reduce manual intervention and improve system reliability.
- Modernized legacy data systems into cloud-native architectures, improving performance, data accessibility, and analytics capabilities.

TECHNICAL SKILLS:
Cloud Platforms: AWS (S3, Glue, Redshift, EMR, Lambda, Kinesis, Step Functions, Athena, CloudWatch), Azure (Data Factory, Databricks, Synapse Analytics, Data Lake Storage, Event Hub, Stream Analytics, Purview)
Big Data & Distributed Processing: Apache Spark (PySpark, Spark SQL), Hadoop (HDFS, Hive, MapReduce, Sqoop), Kafka
ETL / Data Integration: Informatica PowerCenter (8.x/9.x), AWS Glue, Azure Data Factory, Talend, SSIS
Databases & Data Warehousing: Oracle (10g/11g), SQL Server, Teradata, Snowflake, Amazon Redshift, Azure Synapse
Machine Learning & AI: AWS SageMaker, SageMaker Feature Store, feature engineering pipelines, Generative AI integration, AWS Bedrock, OpenAI APIs, LLM-powered analytics workflows
Programming & Scripting: Python, SQL, PL/SQL, Shell Scripting (UNIX/Linux), Scala (basic)
Data Modeling & BI Tools: Star/Snowflake schema design, Power BI, Tableau, SSRS, Excel (advanced)
Workflow Orchestration: Apache Airflow, Control-M, Step Functions, Databricks Jobs
Version Control & DevOps: Git, GitHub, Jenkins, CI/CD pipelines, Agile/Scrum methodology
Other: Data Quality & Governance (metadata management, reconciliation, lineage), Performance Tuning (ETL, SQL, Spark), Security & Compliance (HIPAA, SOX, GDPR)

WORK EXPERIENCE:

Vanguard Group, Malvern, PA | Jan 2025 - Present
Sr. Data Engineer
Project Description: The project's objective is to build an enterprise-grade, AI-driven data platform that centralizes structured and semi-structured data across multiple domains - investments, customer analytics, and risk management - to enable self-service analytics, machine learning workloads, and generative AI insights. The platform integrates diverse data sources using automated ingestion frameworks and enforces governance, lineage, and security through AWS Lake Formation and IAM policies.
Key Responsibilities:
- Designed and implemented scalable ETL/ELT pipelines using AWS Glue, PySpark, and AWS Lambda, transforming raw data into curated datasets following the Medallion (Bronze-Silver-Gold) architecture to support advanced analytics and machine learning workloads.
- Built real-time streaming pipelines using Amazon Kinesis Data Streams and SQS, enabling low-latency ingestion of transactional and behavioral data for downstream analytics and predictive models.
- Developed and optimized Amazon Redshift and Snowflake data warehouses by tuning sort keys, clustering, and distribution styles, improving query performance and accelerating access to analytical and training datasets.
- Developed feature engineering pipelines integrated with SageMaker Feature Store, enabling reusable, standardized machine learning features across multiple predictive models.
- Contributed to GenAI proof-of-concept initiatives by designing a data foundation layer integrating AWS Bedrock and OpenAI APIs to enable LLM-based insights on investment and risk datasets.
- Leveraged AWS Lake Formation to implement centralized data governance, metadata cataloging, and fine-grained access control for enterprise analytics datasets.
- Implemented data quality validation and monitoring frameworks using Great Expectations and custom Python validation scripts to ensure data accuracy, completeness, and schema consistency across pipelines.
- Implemented pipeline monitoring and observability using CloudWatch, Step Functions, and Lambda triggers, tracking pipeline health, schema drift, and data freshness.
- Automated infrastructure provisioning and CI/CD deployments for data pipelines using Terraform and GitHub Actions, ensuring consistent, scalable, and reproducible environments across development, staging, and production.
- Collaborated with data scientists, ML engineers, and analytics teams to design AI-ready datasets, build curated data marts, and enable reproducible datasets for experimentation and model training.
Environment: AWS (S3, Glue, Lambda, Redshift, Lake Formation, Kinesis, SQS, Step Functions, Athena, CloudWatch, SageMaker), Python, PySpark, SQL, Terraform, GitHub Actions, Snowflake, Tableau, QuickSight, Great Expectations, Jira, Agile-Scrum.

Tapestry Inc., New York City, NY | Sep 2023 - Oct 2024
Senior Data Engineer
Project: Cloud Data Lake & Analytics Modernization
Tapestry, a global luxury fashion house, initiated a cloud-first data modernization program to unify data from retail stores, e-commerce platforms, supply chain systems, and customer loyalty applications. The goal was to build a centralized AWS data lake and analytics ecosystem to support enterprise-wide reporting, financial planning, and customer insights. As a Cloud Data Engineer, I was responsible for designing scalable data pipelines, optimizing data storage, and enabling secure, governed access to enterprise data.
Key Responsibilities:
- Designed and built ETL/ELT pipelines using AWS Glue, Lambda, and PySpark to ingest and process data from ERP, POS, Salesforce, and third-party retail systems into Amazon S3 and Amazon Redshift.
- Developed PySpark transformations on AWS EMR and Glue for cleansing, enrichment, and aggregation of large retail transaction datasets.
- Implemented Delta Lake architecture and partitioning strategies on Amazon S3 to optimize storage efficiency, improve query performance, and support incremental data processing.
- Integrated real-time streaming pipelines using Apache Kafka and AWS Glue Streaming to process e-commerce clickstream events and inventory data feeds.
- Designed and optimized Amazon Redshift data warehouse models, including fact and dimension tables, to support enterprise BI dashboards and ad-hoc analytical queries.
- Automated workflow orchestration and monitoring using AWS Step Functions, Apache Airflow, and Amazon CloudWatch, while managing pipeline code and Airflow DAGs through Git for version control and collaborative development.
- Established data quality and governance frameworks, including reconciliation reports, metadata standards, and security policies, to ensure data accuracy, consistency, and reliability.
- Worked closely with BI and finance teams to enable self-service analytics using Tableau and Power BI connected to Amazon Redshift and Amazon Athena.
- Collaborated with cross-functional teams on cloud cost optimization, leveraging lifecycle policies, compression strategies, and Redshift workload management to improve performance and reduce storage costs.
- Ensured SOX and GDPR compliance by implementing encryption with AWS KMS, fine-grained IAM access controls, and centralized audit logging.
Environment: AWS Glue, Amazon S3, Amazon Redshift, EMR (PySpark), Apache Kafka, Lambda, Step Functions, Apache Airflow, Athena, Tableau, Power BI, SQL, Python, Git, Agile.
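The partitioning strategies mentioned above (Hive-style year=/month=/day= prefixes on S3 to enable partition pruning and incremental loads) can be sketched in plain Python. This is a minimal illustration only; the function names and the `s3://` prefix are hypothetical, not drawn from any employer's codebase:

```python
from datetime import date

def partition_key(prefix: str, d: date) -> str:
    """Build a Hive-style partition path (year=/month=/day=) under an S3 prefix."""
    return f"{prefix}/year={d.year}/month={d.month:02d}/day={d.day:02d}/"

def prune(keys: list[str], since: date) -> list[str]:
    """Keep only partitions on or after `since` - the basis of an incremental load."""
    def to_date(key: str) -> date:
        # Parse key=value path segments back into a date.
        parts = dict(p.split("=") for p in key.strip("/").split("/") if "=" in p)
        return date(int(parts["year"]), int(parts["month"]), int(parts["day"]))
    return [k for k in keys if to_date(k) >= since]
```

Query engines such as Athena and Spark use the same key=value convention to skip partitions that fall outside a filter, which is what makes daily incremental processing cheap.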
Health Care Service Corporation, Richardson, TX | Nov 2021 - May 2023
Senior Data Engineer
Project: Enterprise Data Platform & Cloud Modernization
HCSC embarked on a large-scale cloud migration initiative to modernize its data ecosystem and enable advanced analytics across healthcare claims, provider, and member data. The goal was to move from legacy on-premises systems to a cloud-native data lake and warehouse on Azure, ensuring scalability, cost efficiency, and support for enterprise reporting. As an Azure Data Engineer, I was responsible for designing and implementing robust data pipelines, ensuring data quality, and supporting compliance requirements across the enterprise.
Key Responsibilities:
- Migrated legacy ETL workflows from Informatica PowerCenter to Azure, redesigning pipelines using Azure Data Factory, Azure Functions, and PySpark on Azure Databricks to ingest and transform data from Oracle, SQL Server, APIs, and flat files into Azure Data Lake Storage Gen2 and Azure Synapse Analytics.
- Built a cloud-native data lake architecture on Azure Data Lake Storage Gen2, implementing partitioning strategies, file compaction, and lifecycle policies to optimize storage efficiency and reduce operational costs.
- Converted Informatica mappings, transformations, and workflows into Azure Databricks PySpark jobs, enabling scalable distributed data processing for large healthcare datasets.
- Integrated real-time streaming pipelines using Azure Event Hub and Azure Stream Analytics, processing healthcare claims and provider data feeds in real time for downstream analytics.
- Developed PySpark-based transformation frameworks on Azure Databricks, replacing legacy Informatica batch jobs for large-scale healthcare data cleansing, enrichment, and standardization.
- Designed and implemented Azure Synapse Analytics data warehouse schemas (star and snowflake models) to support enterprise reporting and analytical dashboards.
- Automated ETL orchestration using Azure Data Factory pipelines, Azure Functions triggers, and Apache Airflow, managing dependencies between Databricks pipelines and downstream Synapse loads.
- Managed pipeline source code, Databricks notebooks, and Airflow DAGs using Git, enabling version control, CI/CD integration, and collaborative development across the data engineering team.
- Implemented data quality and reconciliation frameworks during migration to validate outputs between the Informatica and Azure pipelines, ensuring data accuracy and consistency.
- Optimized Azure Databricks workloads, Synapse queries, and ADF pipelines, improving pipeline throughput and reducing processing time compared to the legacy Informatica workloads.
- Ensured HIPAA-compliant data security by implementing encryption with Azure Key Vault, role-based access control (RBAC) through Azure Active Directory, and centralized logging using Azure Monitor and Log Analytics.
Environment: Azure Data Factory, Azure Data Lake Storage Gen2, Azure Databricks (PySpark), Azure Synapse Analytics, Azure Event Hub, Azure Stream Analytics, Azure Functions, Apache Airflow, Azure Monitor, Azure Key Vault, Python, SQL, Informatica PowerCenter, Oracle, SQL Server, Git, Agile.

Kroger, Blue Ash, OH | Oct 2018 - Jun 2021
Data Engineer
Project: Enterprise Data Lake & Cloud Migration
The initiative focused on consolidating structured and unstructured data from point-of-sale (POS), supply chain, e-commerce, and customer loyalty systems into a centralized Azure Data Lake. As an Azure Data Engineer, I was responsible for designing and implementing scalable data pipelines, integrating enterprise data sources, and ensuring high performance and data quality across the platform.
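Migration reconciliation frameworks of the kind described above reduce to comparing row counts and content fingerprints between source and target tables. A minimal, order-independent sketch in plain Python; the function names are illustrative only:

```python
import hashlib

def fingerprint(rows: list[tuple]) -> tuple[int, int]:
    """Order-independent table fingerprint: (row count, XOR of per-row hashes)."""
    acc = 0
    for row in rows:
        digest = hashlib.sha256("|".join(map(str, row)).encode("utf-8")).hexdigest()
        acc ^= int(digest[:16], 16)  # fold each row hash into one 64-bit value
    return len(rows), acc

def reconcile(source: list[tuple], target: list[tuple]) -> bool:
    """Pass when counts and content hashes match, regardless of row order."""
    return fingerprint(source) == fingerprint(target)
```

Because duplicate rows cancel under XOR, the row count is kept alongside the hash; in practice the same comparison is usually pushed down to the databases as aggregate queries rather than pulling rows client-side.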
Key Responsibilities:
- Designed and implemented ETL/ELT pipelines using Azure Data Factory (ADF) to ingest data from on-prem Oracle/SQL Server systems, APIs, and flat files into Azure Data Lake Storage (ADLS) and Azure SQL Data Warehouse (Synapse Analytics).
- Developed and optimized PySpark/Spark transformations in Azure Databricks for cleansing, aggregating, and standardizing large-scale datasets.
- Built real-time ingestion pipelines using Azure Event Hub and Stream Analytics to process e-commerce clickstream and POS data.
- Implemented data partitioning and Delta Lake architecture to support efficient querying and incremental processing.
- Designed data models and star schemas for downstream BI solutions in Power BI and SSRS.
- Applied data quality checks, exception handling, and reconciliation frameworks to ensure trusted analytics outputs.
- Automated job scheduling and orchestration through ADF pipelines and Databricks notebooks, reducing manual intervention.
- Monitored and optimized pipeline performance, improving processing times for critical retail datasets.
- Collaborated with cross-functional teams, including business analysts, architects, and data scientists, to deliver end-to-end analytics solutions.
- Supported data governance initiatives, including metadata management and implementing Azure Purview for data lineage and compliance.
Environment: Azure Data Factory, Azure Databricks (PySpark, Spark SQL), Azure Data Lake Storage, Azure Synapse Analytics, Event Hub, Stream Analytics, Power BI, SQL Server, Oracle, Python, Git, Agile.

SoCalGas, Los Angeles, CA | Mar 2016 - Sep 2018
Data Engineer
Project Description: Worked on enterprise data modernization initiatives to support SoCalGas's operations, billing, and customer service functions. The focus was on building scalable ETL pipelines to process large volumes of customer usage, billing, and meter data, integrating disparate systems into a centralized data warehouse.
The project involved migrating legacy data pipelines to modern big data frameworks, enabling advanced analytics for regulatory compliance, energy consumption forecasting, and operational efficiency.
Key Responsibilities:
- Designed and developed ETL workflows using Informatica, SQL, and Python to extract, transform, and load customer and operational data from multiple source systems into the enterprise data warehouse.
- Worked with Oracle and SQL Server databases for relational storage and optimized queries to improve data processing performance.
- Implemented data quality checks, validations, and reconciliation processes to ensure accurate reporting for billing and compliance.
- Built and optimized batch pipelines for processing meter data, billing transactions, and asset management information.
- Collaborated with business analysts, operations teams, and compliance officers to translate energy usage and billing requirements into technical data models.
- Migrated portions of legacy ETL jobs to Hadoop/Spark-based pipelines to handle larger data volumes more efficiently.
- Developed ad-hoc and scheduled reports using Power BI and Tableau for customer analytics, usage forecasting, and regulatory reporting.
- Ensured compliance with energy regulatory standards by maintaining audit-ready data pipelines and producing consistent reporting outputs.
- Performed performance tuning and query optimization to reduce ETL load times and improve reporting SLAs.
- Supported data governance and metadata management efforts, ensuring lineage and traceability of critical energy data.
Environment: Oracle 11g/12c, SQL Server, Hadoop, Spark, Informatica PowerCenter, Python, UNIX Shell Scripting, Power BI, Tableau, Control-M, Git, Teradata, Agile/Scrum.

Experis, India | Aug 2013 - Dec 2015
Data Consultant
Key Responsibilities:
- Designed and optimized complex SQL/PL-SQL scripts for data validation, analysis, and integration from online quoting systems.
- Developed and maintained stored procedures, triggers, indexes, and views in Oracle and SQL Server to implement business rules and enable reporting.
- Built Hive bucketed and partitioned tables to optimize query performance and support large-scale distributed data processing.
- Wrote MapReduce programs and HiveQL scripts to extract, transform, and load (ETL) data into the Hadoop Distributed File System (HDFS).
- Used Sqoop for high-performance bulk data transfer between Oracle and Hive for downstream analytics.
- Assisted in configuring and maintaining Hadoop ecosystem components such as Hive, HBase, and Sqoop.
- Created and enhanced ETL workflows in Informatica PowerCenter, implementing complex transformations to meet business requirements.
- Leveraged mapplets and reusable transformations in Informatica to improve standardization and reusability across ETL jobs.
- Configured parameterized mappings and sessions for dynamic job execution and runtime flexibility.
- Monitored and troubleshot ETL workflows using Workflow Manager and Workflow Monitor.
- Defined and enforced metadata, data warehouse standards, and naming conventions to ensure consistency and maintainability.
- Tuned ETL job performance by resolving target bottlenecks, optimizing queries, and applying pipeline partitioning in Informatica.
Environment: Informatica PowerCenter 8.6/9.x, Apache Hadoop, Hive, HBase, MapReduce, Sqoop, Oracle 10g/11g, SQL Server, PL/SQL, UNIX, Shell Scripting, Python (basic scripting), Toad, Windows NT.

Oasis Infotech, India | Apr 2011 - Jun 2013
ETL Developer
Project: Laboratory Information Management System
Key Responsibilities:
- Designed, developed, and maintained ETL workflows using Informatica PowerCenter 8.6/9.0, extracting and transforming data from Oracle databases and flat files into the enterprise data warehouse.
- Wrote and optimized complex SQL/PL-SQL queries, stored procedures, and triggers to enable reporting and analytics.
- Automated data ingestion and validation processes with UNIX shell scripts and scheduled jobs via cron.
- Performed data validation, reconciliation, and quality checks, ensuring accuracy between source and target systems.
- Supported business decision-making by preparing data summaries, performance reports, and ad-hoc analysis using SQL and Excel.
- Worked on database performance tuning and indexing strategies to optimize ETL and reporting workloads.
- Partnered with business analysts to translate requirements into scalable ETL and reporting solutions.
- Investigated and resolved data inconsistencies by tracing upstream feeds and implementing corrections in ETL workflows.
- Assisted with data migration, backup, and recovery activities during database upgrades and platform transitions.
Environment: Oracle 10g/11g, PL/SQL, Informatica PowerCenter 8.6/9.0, UNIX/Linux, Shell Scripting, SQL*Plus, Toad, MS Excel, Windows Server.

Certification: AWS Certified Solutions Architect - Associate

EDUCATION:
IIMT College of Engineering, Greater Noida, India | Jun 2007 - May 2011
Bachelor of Technology in Computer Science
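Star-schema modeling, which recurs throughout the experience above, centers on assigning surrogate keys to dimension members and swapping them into fact rows at load time. A minimal illustration in plain Python; the `build_dim`/`build_fact` names and the customer example are hypothetical, not from any employer's code:

```python
def build_dim(records: list[dict], natural_key: str) -> dict:
    """Assign a surrogate key to each distinct dimension member (Type 1 overwrite)."""
    dim: dict = {}
    for rec in records:
        nk = rec[natural_key]
        if nk not in dim:
            # Surrogate keys are assigned in arrival order, starting at 1.
            dim[nk] = {"sk": len(dim) + 1, **rec}
    return dim

def build_fact(events: list[dict], dim: dict, natural_key: str) -> list[dict]:
    """Swap the natural key for the dimension's surrogate key when loading facts."""
    return [{**{k: v for k, v in e.items() if k != natural_key},
             "sk": dim[e[natural_key]]["sk"]} for e in events]
```

In a warehouse this lookup runs as a join against the dimension table during the ETL load; keeping facts keyed by small integer surrogates rather than wide natural keys is what makes the fact table compact and the joins fast.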