| Prem Sagar - Data Engineer |
| [email protected] |
| Location: USA |
| Relocation: Open to relocate |
| Visa: H1B |
| Resume file: Prem Sagar_( Data Engineer )_1759782710520.docx |
|
PREM SAGAR MAVURAPU
Email: [email protected] | Contact: 8589007909

PROFESSIONAL SUMMARY

- Data Engineer with 8+ years of experience designing and delivering scalable, cloud-native data solutions across AWS and Azure ecosystems.
- Demonstrated excellence in building high-performance ETL pipelines that process multi-terabyte datasets daily with high scalability, resilience, and efficiency.
- Specialized in end-to-end data engineering, from ingestion and transformation to storage, orchestration, and analytics, supporting data-driven decision-making at scale.
- Deep proficiency in AWS (S3, Glue, Lambda, DynamoDB, Aurora, CloudWatch, SQS) and Azure (Blob Storage, Data Factory, Azure Functions, Cosmos DB, Synapse, Monitor, Event Hubs).
- Extensive hands-on experience with Snowflake, including performance tuning, data sharing, secure data access, and role-based access controls for enterprise-grade data governance.
- Engineered robust ETL workflows using Azure Data Factory and AWS Glue, employing advanced transformations including Derived Columns, Conditional Splits, Merge Joins, Lookups, and dynamic SQL scripting for data normalization and enrichment.
- Built real-time, event-driven data ingestion frameworks using AWS Lambda and Azure Functions, integrating seamlessly with messaging systems such as Amazon SQS and Azure Event Hubs to support low-latency data processing.
- Designed optimized star and snowflake schemas for analytics platforms such as AWS Redshift, Aurora, and Azure Synapse, enabling BI teams to deliver performant insights through dimensional modeling and advanced query tuning.
- Containerized data workloads using Docker and orchestrated distributed processing environments on Kubernetes (EKS, AKS) for high availability, workload balancing, and simplified deployment.
- Developed and maintained reusable, modular infrastructure templates using Terraform, automating the provisioning and scaling of multi-cloud environments and reducing manual configuration drift.
- Streamlined development workflows using Azure DevOps, GitHub Actions, and GitLab CI, enabling continuous integration and delivery of ETL pipelines, infrastructure updates, and code changes with full testing coverage.
- Created enterprise-grade Power BI dashboards and paginated reports with row-level security, real-time auto-refresh, and KPI visualizations; managed Power BI Service workspaces for governed self-service analytics.
- Expert in SQL Server (T-SQL, SSIS, SSAS Tabular Models, Triggers, Views) and NoSQL platforms such as DynamoDB and Cosmos DB, optimizing query performance and implementing TTL-based storage strategies for transient datasets.
- Authored and maintained data dictionaries, lineage documents, and ETL blueprints, ensuring compliance with organizational data standards and improving cross-team transparency.
- Deployed end-to-end monitoring and alerting frameworks using AWS CloudWatch and Azure Monitor, enhancing operational visibility, pipeline uptime, and proactive incident resolution.
- Agile delivery mindset with a strong focus on DevOps, automation, and continuous improvement in data pipeline reliability and performance.
- Strong collaboration and communication skills, with a proven ability to work cross-functionally, translate complex technical solutions into actionable business outcomes, and deliver end-to-end data engineering solutions aligned with business objectives.
TECHNICAL SKILLS:

Cloud: AWS (S3, Glue, Lambda, DynamoDB, Redshift, Aurora, CloudWatch, SQS), Azure (Data Factory, Blob Storage, Synapse, Cosmos DB, Functions, Event Hubs, Monitor)
Languages: SQL, Python, T-SQL, PowerShell, Bash
Orchestration & IaC: Kubernetes (EKS/AKS), Docker, Terraform, Azure DevOps, GitHub Actions, GitLab CI
BI Tools: Power BI (Desktop & Service), SSAS, SSIS
Monitoring: AWS CloudWatch, Azure Monitor, Application Insights
Databases & Tools: Azure SQL Database, Azure Cosmos DB, PostgreSQL, Azure Cache, Microsoft SQL Server, Azure Data Factory pipelines, AWS RDS, SQL Server Management Studio, Snowflake (UDFs, Snowpipe, Snowsight)

PROFESSIONAL EXPERIENCE:

Company: Petco Animal Supplies, Inc., San Diego, CA | Oct 2022 - Present
Role: Data Engineer
Description: Petco builds, operates, and maintains the largest pet retail chain, delivering essential pet services. Customer data resides both in the cloud and on premises.
Responsibilities:
- Engineered scalable data pipelines in Azure Data Factory (ADF) using Linked Services, Datasets, and Pipelines to efficiently extract, transform, and load (ETL) data from diverse sources such as Azure SQL DB, Azure Blob Storage, and Azure Synapse Analytics.
- Ingested and integrated complex datasets from enterprise systems such as Oracle BICC, Oracle Data Integrator (ODI), and Oracle Fusion Cloud, ensuring seamless cross-platform data movement and compatibility.
- Elevated data quality and governance by implementing robust data validation, cleansing, and transformation workflows using ADF and Azure Databricks.
- Architected streaming data pipelines with Apache Spark Structured Streaming, Apache Kafka, and Azure Event Hubs, enabling near real-time data ingestion with consistency, reliability, and fault tolerance through Delta Lake integration.
- Designed and executed distributed data processing tasks using Apache Spark, PySpark, SQL, T-SQL, and Python, optimizing performance for high-volume workloads.
- Spearheaded data architecture strategies encompassing relational, NoSQL, and Big Data technologies, aligning design with scalability, compliance, and performance requirements.
- Developed and deployed incremental load strategies from Azure SQL Database to Azure Synapse using ADF for performance-optimized data warehousing solutions.
- Provisioned and configured Azure Databricks clusters for both batch processing and continuous streaming workloads, ensuring efficient resource allocation and job execution.
- Built Spark-based data transformation pipelines in PySpark and Spark SQL to generate critical customer insights and behavioral analytics.
- Leveraged Azure DevOps CI/CD pipelines to automate deployment of Python-based data applications, improving release cycle times and operational reliability.
- Authored dynamic, business-driven SQL reporting scripts tailored to executive requirements using Snowflake and Azure SQL.
- Ensured secure and accurate historical data access by utilizing Snowflake's Time Travel feature for disaster recovery and rollback scenarios.
- Designed and implemented Snowpipe-based integrations for intraday pricing data ingestion, ensuring low-latency, continuous availability of financial data.
- Maintained comprehensive version control practices with GitLab and tracked sprint progress and backlog grooming in JIRA for Agile development alignment.
- Collaborated cross-functionally with business and analytics teams to deliver actionable Power BI dashboards, enabling real-time data storytelling and self-service BI capabilities.
Environment: Azure Data Factory, Azure Synapse Analytics, Azure SQL Database, Azure Data Lake Storage Gen2, Azure Databricks, Power BI, Snowflake (Snowpipe, UDF, Time Travel), Apache Spark, Kafka, Event Hubs, GitLab, VS Code, Kubernetes, SQL Server, Python, T-SQL, Git, JIRA

Company: Safeway, Pleasanton, CA | Dec 2021 - Sep 2022
Role: Data Engineer
Description: Safeway operates one of the largest supermarket chains in the U.S., delivering grocery, pharmacy, and retail services. Customer data is managed across cloud and on-premises systems to support operations and personalized services.
Responsibilities:
- Translated complex business requirements into robust, scalable data engineering frameworks, enabling high-performance analytics and informed decision-making across departments.
- Designed and implemented automated ETL pipelines using Apache Airflow, Apache Spark, and AWS Glue, ensuring seamless integration of heterogeneous data sources into enterprise-grade data lakes and warehouses.
- Engineered high-volume, low-latency real-time data streaming architectures using Apache Kafka, AWS Kinesis, and Azure Event Hubs, supporting mission-critical analytics and operational intelligence.
- Devised and deployed cloud-native data storage architectures leveraging AWS S3, Azure Blob Storage, HDFS, and Snowflake to manage structured, semi-structured, and unstructured datasets efficiently.
- Applied advanced data wrangling and feature engineering techniques using Python (pandas, NumPy, SciPy, scikit-learn), SQL, and Spark, enhancing downstream machine learning model performance.
- Built and maintained scalable data models for AI/ML initiatives using libraries such as TensorFlow, Keras, and PyTorch, driving innovation in predictive and prescriptive analytics.
- Automated and enforced data quality checks using Great Expectations, dbt, and custom SQL validations, ensuring consistency and reliability throughout the data lifecycle.
- Delivered optimized, query-efficient data lakes and data warehouses using Snowflake, Delta Lake, BigQuery, and Amazon Redshift, meeting the demands of modern analytical workloads.
- Constructed visually rich, interactive dashboards with Tableau, Power BI, and Looker, transforming raw data into actionable insights for stakeholders across business units.
- Designed and managed relational and NoSQL databases such as PostgreSQL, MySQL, MongoDB, and Cassandra, ensuring high availability and scalability for critical data applications.
- Developed performant SQL queries, stored procedures, and UDFs in PostgreSQL, SQL Server, and MySQL, streamlining complex data transformation and aggregation tasks.
- Implemented end-to-end CI/CD pipelines for data systems using GitHub Actions, Jenkins, and Azure DevOps, reducing deployment times and minimizing production incidents.
- Applied Infrastructure-as-Code (IaC) practices with Terraform and AWS CloudFormation to automate provisioning of data infrastructure across multi-cloud environments.
- Conducted root cause analysis on data integrity issues and deployed anomaly detection frameworks, improving system reliability and trustworthiness.
- Drove compliance and data governance initiatives, ensuring adherence to regulatory standards such as GDPR, HIPAA, and CCPA through metadata management and access controls.
- Collaborated cross-functionally with data scientists, business analysts, and software engineering teams to align data strategy with organizational goals and foster a data-driven culture.
- Optimized data pipeline execution and storage efficiency through advanced tuning of Databricks, Spark, and Hadoop, significantly improving processing speed and reducing cost.
Environment: Python, Apache Airflow, Apache Spark, AWS (Glue, S3, Kinesis, Redshift, SageMaker), Azure (Blob Storage, Event Hubs, Azure ML), GCP Vertex AI, Snowflake, Delta Lake, BigQuery, HDFS, Hive, MongoDB, Cassandra, PostgreSQL, SQL Server, MySQL, Tableau, Power BI, Looker, Databricks, Terraform, AWS CloudFormation, GitHub Actions, Jenkins, Azure DevOps, Great Expectations, dbt, MATLAB, SSIS/SSRS/SSAS, SAS, Jupyter Notebook

Company: LINOVA INFO PRIVATE LIMITED, Chennai, India | Jun 2019 - Jul 2021
Role: Data Engineer
Client: MoneyGram Inc.
Project: MoneyGram Retail Analytics and Insights
Description: MoneyGram International Inc. is a global leader in cross-border peer-to-peer payments and money transfers. Operating in over 200 countries and territories, MoneyGram provides financial inclusion services through digital platforms and retail agent networks. This project centralized transactional and operational data from various regions to enable actionable insights, improve compliance monitoring, and support strategic business decisions through advanced analytics and reporting.
Responsibilities:
- Designed and maintained a centralized analytics data warehouse on Amazon RDS for SQL Server, enabling consolidated, high-performance querying for business intelligence use cases.
- Collaborated closely with cross-functional teams and business stakeholders to elicit requirements, perform data analysis, and design custom data solutions and analytical reports aligned with business objectives.
- Engineered data ingestion pipelines by extracting dimension data from SAP BW using AWS Glue Jobs and Python scripts, loading cleansed data into Amazon RDS.
- Implemented a robust incremental load strategy for dimension tables using AWS Lambda and Glue Workflows, ensuring timely, efficient, and automated data refresh cycles.
- Developed end-to-end ETL pipelines in AWS Glue to parse and transform nested JSON files residing in Amazon S3, converting semi-structured data into relational formats for analytics.
- Orchestrated complex ETL workflows and job triggers using Glue Workflows, automating pipeline dependencies and error handling for improved reliability and maintainability.
- Provisioned and optimized Amazon EMR clusters leveraging Apache Spark for distributed data transformation tasks, supporting massive-scale POS and transactional data processing.
- Created and fine-tuned PySpark and Scala-based EMR notebooks to ingest, preprocess, and load high-volume retail sales data from S3 to RDS, improving downstream reporting and forecasting capabilities.
- Managed the environment promotion lifecycle by migrating data pipelines and configurations from development to production using AWS CodePipeline and CloudFormation templates, ensuring infrastructure consistency and traceability.
- Authored efficient PySpark, Scala, and SQL scripts in EMR notebooks for data modeling, aggregations, and generating output tables consumed by reporting platforms.
- Implemented modular ETL orchestration using AWS Step Functions, enabling dynamic execution of parent-child EMR notebook jobs for scalable data workflows.
- Designed complex stored procedures, views, and SQL queries in Amazon RDS, enabling advanced analytical dashboards and business-critical KPIs.

Environment: AWS Glue, AWS Lambda, Amazon RDS (SQL Server), Amazon S3, AWS EMR (PySpark/Scala), AWS CodePipeline, AWS CloudFormation, AWS Step Functions, SAP BW, Python, Scala, SQL

Company: TECHMATRICS SOLUTIONS PRIVATE LIMITED, Hyderabad, India | Jun 2017 - May 2019
Role: Junior Developer
Responsibilities:
- Actively participated in Agile/Scrum daily stand-ups, sprint planning, and retrospectives to align with project timelines, track progress, and coordinate development efforts and system enhancements.
- Engaged in collaborative sessions with cross-functional teams and clients to capture both technical and business requirements, ensuring clear alignment and accurate implementation of project objectives.
- Executed ETL transformations in SAS, applying advanced logic on columns, rows, and queries to cleanse, split, merge, pivot/unpivot, and transpose large datasets for downstream analysis.
- Handled large-scale data processing through data quality assurance, data profiling, metadata management, and data organization to support robust reporting and analytics frameworks.
- Extracted and loaded structured data from flat files and Excel sheets into SQL Server databases using bulk insert techniques and SSIS (SQL Server Integration Services), enabling high-throughput data ingestion.
- Built and optimized SQL scripts, indexes, and complex queries for data manipulation and reporting, leveraging relational models and performance-tuning best practices.
- Developed advanced SQL queries incorporating multi-table joins, aggregate functions, subqueries, and user-defined functions (UDFs) to support data extraction and analytical workflows.
- Orchestrated ETL pipelines using Apache Airflow to automate ingestion, transformation, and loading of large-scale XML and JSON datasets, processing over 40,000 records efficiently into Databricks.
- Implemented data modeling workflows on Databricks using PySpark, DataFrames, and SQL APIs to support business intelligence and analytics needs with scalable, performant pipelines.
- Conducted comprehensive gap analysis and implemented quality control (QC) checks against target databases to validate data accuracy, providing final sign-off upon successful test case execution.
- Utilized JIRA for tracking development tasks, bug reporting, and sprint progress, ensuring timely updates and transparent collaboration with team members.
- Maintained clean, modular, and reusable code within Visual Studio Code, adhering to version control best practices and development standards.

Environment: SQL Server, Python, SAS, XML, JSON, PySpark, Apache Airflow, Visual Studio Code, JIRA, Microsoft Excel