| Sai Ahalya - Data Engineer |
| [email protected] |
| Location: Cleveland, Ohio, USA |
| Relocation: Yes |
| Visa: H4EAD |
| Resume file: SaiAhalya_AzureDataEngineer_1773669180185.docx |
|
Summary of Skills:
- Overall 5+ years of experience spanning a wide range of skills in Big Data, Azure, SQL, and Python technologies.
- Strong knowledge of Azure services including Azure Data Factory, Azure Databricks, Azure Storage Accounts, and Azure Synapse Analytics.
- Built and maintained data pipelines in Azure Data Factory for data integration and orchestration.
- Automated batch jobs and configured event-based email notifications for monitoring and alerting.
- Developed Spark jobs using PySpark and Spark SQL for large-scale data processing, and built PySpark applications in Azure Databricks environments.
- Created transformations using Spark SQL and DataFrames to process structured and semi-structured data.
- Strong proficiency in Python, Object-Oriented Design (OOD), and SQL-based databases.
- Developed stored procedures, triggers, and views to maintain data consistency and improve reliability.
- Worked across all phases of the Software Development Life Cycle (SDLC) in both Agile and Waterfall environments.
- Hands-on experience with Azure Synapse Analytics and data warehouse solutions.
- Created Snowpipes for automated data ingestion into Snowflake.
- Processed multiple file formats including XML, JSON, CSV, Avro, Parquet, ORC, and Delta.
- Hands-on experience using Sqoop for large-scale data ingestion.
- Designed and implemented Big Data solutions using Hadoop technologies such as Hive, HDFS, HBase, Sqoop, Spark, and Oozie.
- Quickly adapts to new tools and technologies in evolving data environments.
- Prepared high-level and low-level technical documentation to support project delivery and knowledge transfer.
Education:
- Master's, Computer Engineering, University of Houston - Clear Lake, Houston, Texas (CGPA: 3.4/4), 2021-2022
- Bachelor's, ECE, Lakireddy Bali Reddy College of Engineering, 2015-2019

Certification:
- Microsoft Azure Fundamentals (AZ-900)

Technical Skills:
- Cloud Technologies: Azure (Data Factory, Databricks, Azure SQL, Azure Storage Accounts, Key Vault, GitHub, Logic Apps, Event Hubs), SSIS
- Big Data Technologies: PySpark framework
- Languages: Python, SQL
- Database Technologies: SQL Server, Azure SQL, Snowflake
- Version Control: Git, Team Foundation Server (TFS)
- Other Tools: Power BI (basic), Jira

Professional Experience:

Client: Yankee Candle (South Deerfield, MA)
Role: Data Engineer
Duration: June 2025 - Present
Responsibilities:
- Performed source data analysis and reviewed design patterns to develop migration pipelines from source systems to Azure Data Lake, improving data availability for downstream analytics by 35%.
- Participated in designing Azure Data Lake structures based on source system design patterns and business requirements.
- Worked on migrating infrastructure and legacy applications from on-premises SQL Server to Azure Cloud.
- Built ETL workflows using SSIS and T-SQL to integrate legacy systems with cloud-based platforms.
- Prepared and contributed to high-level design (HLD) documents and presentations for business stakeholders.
- Configured Virtual Machines (VMs) in source environments to establish secure data connections with on-premises servers.
- Built services using Azure Functions and deployed them on Azure Service Fabric.
- Migrated SQL Server databases to Azure SQL Database using the SQL Azure Migration Wizard, and deployed applications to Azure Cloud.
- Collaborated with business users on migration activities and documented the end-to-end process and data flows across applications.
- Created pipelines and performed orchestration in Azure Data Factory based on Business Requirement Documents (BRDs) and process flows.
- Deployed development pipelines and configured triggers for scheduled runs in Azure Data Factory, increasing the on-time pipeline execution rate by 40%.
- Created personal access tokens in Azure Databricks to authenticate connections with REST APIs and reporting tools.
- Configured web applications hosted on Azure Cloud using Azure one-click publish.
- Implemented Azure DevOps CI/CD pipelines for automated code deployment to Azure PaaS infrastructure.
- Set up hybrid connections between Azure Web Apps (PaaS) and on-premises SQL databases.
- Built Data Sync jobs on the source side to expose data via REST APIs and synchronize data from SQL Server 2012 databases to Azure SQL.
- Collaborated with cross-functional teams to understand data requirements and deliver solutions supporting analytics, reporting, and operational needs.
- Designed reactive event streams and data models for Azure-based cloud services.

Environment: Azure Data Factory, Azure Databricks, PySpark, Spark SQL, Azure Data Lake Storage (Gen2), Azure SQL Database, SQL Server, SSIS, T-SQL, Azure Functions, Azure Service Fabric, Azure DevOps (CI/CD), REST APIs, Azure Virtual Machines, Hybrid Connectivity (On-prem to Azure), Event-Driven Architecture

Client: Humana (Irving, TX)
Role: Data Engineer
Duration: June 2022 - March 2025
Responsibilities:
- Enhanced existing APIs using PySpark to improve performance and functionality, reducing average processing time by 30%.
- Participated in code reviews and mentored team members.
- Responsible for data migration from various source systems to Azure Data Lake.
- Developed multiple data migration pipelines in Azure Data Factory using different activity types.
- Migrated data from FTP servers and mainframe systems into Azure Data Lake.
- Managed data across multiple zones (raw, curated, and consumption) to support transformation and analytics use cases.
- Designed various ingestion and processing patterns based on project use cases and architectural requirements, reducing new-pipeline development effort by 35%.
- Wrote PySpark code to create DataFrames and perform transformations on source data.
- Implemented ETL error-handling logic in Azure production environments to improve reliability and support.
- Converted on-premises stored procedures into equivalent PySpark DataFrame transformations.
- Built complex data ingestion and processing frameworks using Azure Databricks, Python, and PySpark.

Environment: Azure Databricks, PySpark, Spark SQL, Python, Azure Data Factory, Azure Data Lake Storage, FTP, Mainframe Systems, ETL Frameworks, Production Support, Data Migration Pipelines, Error Handling & Monitoring

Client: S&P Global (India)
Role: Data Engineer
Duration: Jan 2020 - July 2021
Responsibilities:
- Involved in multiple Software Development Life Cycle (SDLC) phases including analysis, design, coding, and implementation of scalable solutions based on business requirements, helping reduce post-release defects by 35%.
- Optimized data architecture and SQL queries to scale across terabytes of data for visualization and application consumption.
- Worked collaboratively with technology leads, architects, and business partners to define objectives and analytics processing requirements for large-scale datasets.
- Identified Enterprise Data Lake data sources needed to support analytics and modeling.
- Migrated big data infrastructure from on-premises environments (SQL Server, SFTP, SAP) to cloud-based architectures including Azure Data Lake, Azure Synapse, S3, Azure Data Factory, and Azure Databricks.
- Wrote Python shell scripts, PySpark code, and SQL for daily ETL jobs.
- Developed PySpark scripts using DataFrames to automate validation, logging, and alerting for Spark applications orchestrated by Azure Data Factory, reducing manual validation effort by 40%.
- Partnered with technology teams to ensure optimal use of AWS Cloud Platform services where applicable.
- Developed Azure Logic Apps to receive notifications and alerts about pipeline runs.
- Wrote Bash scripts to decrypt encrypted files as part of data processing workflows.
- Worked with Azure DevOps for version control and documentation of data flows.
- Optimized and monitored performance of Spark applications in pre-production environments and implemented fixes before promoting workloads to production.
- Participated in ad hoc standups and architecture meetings to set daily priorities and track progress in a highly Agile environment.
- Developed technical documentation and training materials to support knowledge transfer and onboarding.

Environment: Azure Databricks, PySpark, Spark SQL, Python, SQL, Azure Data Factory, Azure Data Lake, Azure Synapse, Hadoop (Hive, HDFS, HBase, Sqoop), Azure Logic Apps, Bash Scripting, Azure DevOps, Agile SDLC