Srinivas Yerragolla - Data Engineer
[email protected]
Location: Austin, Texas, USA
Relocation: Open to remote and hybrid roles in Austin, Texas
Visa: H-1B
Mobile: 737-304-0779

Professional Summary:
Information Technology professional with over 10 years of experience in data systems development and business systems, specializing in data engineering and data analysis. Extensive expertise in designing and developing data integration and migration solutions with the Azure data engineering stack and in managing ETL development. Executed data transformations in ADLS using the medallion architecture, streamlining workflows, evaluating data quality, and supporting effective decision-making in the Azure cloud environment. Strong technical skills in OLTP and OLAP design, working with a range of big data formats and tools such as Apache Spark, Hive, and Databricks. Led the creation of robust data pipelines, implemented serverless microservices with Azure Functions, developed PySpark, Spark SQL, and KSQL scripts for batch and stream processing, and deployed and managed Kafka clusters on Confluent Cloud. Solid understanding of big data infrastructure, including HDFS and the MapReduce framework.

Technical Skills:
ETL Tools: Azure Data Factory, Databricks, PySpark, Azure SQL
RDBMS: Microsoft SQL Server 2016 and 2019, MySQL
Cloud Platform: MS Azure, Azure Functions, Azure Blob Storage, ADLS Gen1 and Gen2, Confluent Cloud
Data Modeling Tools: ER/Studio Data Architect
Architecture: Medallion architecture

Certifications:
Azure Data Engineer Associate (DP-203) - BD5959BEBFEF2CCF
Power BI Data Analyst Associate (PL-300) - AEDE351190E300B3

Professional Experience

Data Engineer | Apple, Austin, USA | July 2024 to Present
Responsibilities:
At Apple, creating big data datasets involves designing and developing robust data pipelines with PySpark, SQL, and Python and integrating them seamlessly into the company's data ecosystem. The role centers on building and maintaining scalable, efficient Airflow pipelines that orchestrate complex workflows, manage dependencies, and automate data processing tasks so that data is processed, validated, and updated accurately as business requirements evolve. PySpark handles distributed data processing, with a focus on optimizing performance for large-scale transformations and aggregations; SQL is used extensively for data modeling, querying, and managing relational data structures, while Python scripts support data cleansing, feature engineering, and custom ETL processes.
The pipelines are designed to be resilient, monitorable, and scalable, accommodating growing data volumes and complex data relationships. Continuous integration and delivery (CI/CD) automates pipeline deployment so that changes are tested and released quickly without disrupting ongoing data operations. Close collaboration with data scientists, analysts, and business stakeholders keeps the data architecture and pipelines flexible and aligned with the needs of different business units. The role also involves monitoring and optimizing pipelines for performance, cost efficiency, and data quality using techniques such as partitioning, caching, and adaptive query execution, and staying current with advances in big data technologies to keep the company at the forefront of innovation in data management and analytics.
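A minimal sketch of the kind of Airflow-orchestrated PySpark aggregation described above, assuming Airflow 2.4+; the DAG id, dataset paths, and column names are hypothetical placeholders, not the actual Apple pipelines.

# Illustrative sketch only: a daily Airflow DAG that runs one PySpark aggregation step.
# The DAG id, storage paths, and column names are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def aggregate_daily_events(ds, **_):
    """Aggregate one day of raw events into a summary dataset with PySpark."""
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily_event_counts").getOrCreate()
    events = spark.read.parquet("/data/raw_events/")   # hypothetical source path
    (events
        .filter(F.col("event_date") == ds)             # ds is the Airflow run date
        .groupBy("event_type")
        .agg(F.count("*").alias("event_count"))
        .write.mode("overwrite")
        .parquet(f"/data/daily_event_counts/{ds}/"))   # hypothetical target path
    spark.stop()


with DAG(
    dag_id="daily_event_counts",
    start_date=datetime(2024, 7, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="aggregate_daily_events",
        python_callable=aggregate_daily_events,
    )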
Senior Data Engineer | State Street Corporation | May 2018 to June 2024
Responsibilities:
Designed, implemented, and maintained data pipelines using Azure Data Factory and Azure Databricks for seamless data movement and transformation.
Used Azure Data Factory visual tools to orchestrate data flows and maintain data lineage and metadata across enterprise data warehouses and data lakes.
Developed scalable and reusable activities within Azure Data Factory, leveraging Azure services for transformation, data movement, and custom activity execution.
Integrated data from multiple RESTful APIs into a centralized Azure SQL database using Azure Data Factory and Azure Functions.
Configured and managed Azure Data Factory environments, including integration runtimes, linked services, triggers, and pipeline monitoring.
Implemented a Kafka-based messaging system to handle real-time data ingestion, achieving sub-second latency for event processing across multiple microservices.
Configured and managed Kafka clusters, ensuring high availability and fault tolerance by setting up multi-node clusters and implementing appropriate replication and partitioning strategies.
Developed custom Kafka producers and consumers in Java to handle high-throughput data streams, ensuring efficient and reliable message delivery.
Leveraged KSQL to create real-time stream processing applications, transforming raw Kafka topics into meaningful, queryable streams and tables.
Designed and executed complex KSQL queries to filter, aggregate, and join data streams, enabling real-time analytics and insights.
Optimized KSQL queries for performance and scalability, ensuring efficient processing of large volumes of streaming data with minimal latency.
Built and deployed stream processing pipelines using Apache Kafka Streams, enabling real-time processing and transformation of data streams.
Integrated stream processing applications with various data sources and sinks, such as databases, REST APIs, and message queues, to facilitate seamless data flow and real-time analytics.
Monitored and maintained stream processing applications to ensure high performance, reliability, and fault tolerance, leveraging metrics and logging frameworks for visibility and troubleshooting.
Deployed and managed Kafka clusters on Confluent Cloud, leveraging its fully managed service to reduce operational overhead and ensure high availability.
Utilized Confluent Cloud's advanced features, such as Schema Registry and ksqlDB, to streamline data integration and processing workflows.
Implemented secure data pipelines on Confluent Cloud by configuring authentication and authorization mechanisms, ensuring compliance with data privacy regulations.
Automated data pipeline workflows for data movement and transformation into Azure Data Lake Storage using Azure Data Factory.
Developed data cleaning and validation scripts on Azure Databricks using PySpark and Spark SQL.
Ingested both batch and streaming data into the medallion architecture (a streaming ingestion sketch follows this section).
Played a key role in building the Databricks Lakehouse platform and medallion architecture and established data governance using Unity Catalog.
Utilized PySpark DataFrame and SQL APIs to perform analytical queries, resulting in actionable business insights and improved data processing efficiency.
Implemented partitioning strategies and employed Spark SQL within PySpark for optimized shuffle operations and complex analytics at scale.
Created Databricks notebooks with Delta format tables, implemented the lakehouse architecture, and managed scalable data storage solutions in Azure.
Engineered ETL processes, implemented CI/CD pipelines, and developed custom UDFs and reusable Azure Functions for data processing tasks.
Designed scalable data solutions using Azure Data Explorer, optimized storage and retrieval performance, and utilized Snowflake for data modeling and orchestration.
Conducted performance tuning of SQL queries, developed stored procedures, and implemented data archival solutions for cost-effective storage.
Implemented Proof of Concept (POC) projects to validate requirements and benchmark ETL loads in development databases.
Environment: Apache Spark, PySpark, Spark SQL, Python, SQL, Kafka, KSQL, Confluent Cloud, Azure Data Factory, Azure SQL Database, Azure Storage Accounts, APIs, Azure Functions, Azure Blob Storage, ADLS Gen1 and Gen2, Azure Databricks, Azure Dedicated SQL pools.
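A minimal sketch of streaming ingestion into a bronze Delta table, in the spirit of the Kafka-to-Databricks medallion work listed above. The broker address, topic, and storage paths are hypothetical placeholders, and the Confluent authentication, Schema Registry, KSQL, and Unity Catalog pieces are omitted.

# Illustrative sketch only: ingest a Kafka topic into a partitioned bronze Delta table
# with PySpark Structured Streaming. Broker, topic, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_ingest").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "pkc-xxxxx.confluent.cloud:9092")  # placeholder broker
       .option("subscribe", "orders")                                        # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Keep the raw payload plus ingestion metadata in the bronze layer; parsing and
# data quality checks happen downstream in the silver layer.
bronze = (raw.select(
              F.col("key").cast("string"),
              F.col("value").cast("string").alias("payload"),
              F.col("topic"),
              F.col("timestamp").alias("event_ts"))
          .withColumn("ingest_date", F.to_date("event_ts")))

(bronze.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/bronze/orders/_checkpoint")  # placeholder path
       .partitionBy("ingest_date")
       .outputMode("append")
       .start("/mnt/bronze/orders"))                                    # placeholder path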
Data Engineer | Vodafone, Pune, India | Jan 2016 to May 2018
Responsibilities:
Developed data models and relational database systems as a Data Engineer.
Conducted data mapping activities for a data warehouse, ensuring data integrity and consistency.
Produced and optimized PL/SQL statements and stored procedures in DB2 for efficient data extraction and manipulation.
Designed and implemented star and snowflake schema-based dimensional models to support the data warehouse architecture.
Utilized forward engineering techniques to create Physical Data Models with DDL tailored to the requirements of the Logical Data Models.
Provided comprehensive source-to-target mappings to ETL teams for initial, full, and incremental data loads into target data marts.
Collaborated closely with ETL SSIS developers to explain complex data transformations and logic.
Developed normalized Logical and Physical database models for OLTP systems in insurance applications.
Created dimensional models for reporting systems by identifying essential dimensions and facts using tools such as ER/Studio Data Architect.
Environment: SQL Server, SQL Database, Azure Analysis Services, T-SQL, PL/SQL.

ETL Developer | CompuCom Private Limited, Pune, India | Feb 2013 to Jan 2016
Responsibilities:
Developed data models using ER/Studio and deployed them to the Enterprise Data Warehouse.
Designed data marts with star and snowflake schemas for dimensional data modeling.
Created third-normal-form target data models and mapped them to logical models.
Implemented data cleansing rules and resolved anomalies in legacy applications.
Utilized PL/SQL, shell scripts, and T-SQL for data operations and database interfaces.
Developed SSIS and SSAS applications for data extraction, loading, and analysis.
Designed Tableau visualizations, dashboards, and reports to support business decisions.
Environment: ER/Studio, OLAP, PL/SQL, SQL

SQL Developer | Informatics India, Hyderabad, India | Sept 2012 to Jan 2013
Responsibilities:
Performed complex data analysis and profiling using SQL across various systems.
Optimized data collection procedures and generated regular reports (weekly, monthly, quarterly).
Developed SQL queries and reports for inventory control and sales profitability.
Utilized Microsoft Excel for pivot tables, pivot reporting, and VLOOKUP functions.
Applied SPSS for statistical tracking and analysis on large datasets.
Designed and implemented PL/SQL queries for testing, validation, and data management tasks.
Environment: SQL, PL/SQL

Education:
Master of Computer Applications, JNTU Hyderabad