Surya Mithra Reddy Ram - Senior Data Engineer
[email protected]
Location: Richmond, Texas, USA
Relocation:
Visa: H1B
Surya Mithra Reddy Ram
Senior AWS Data Engineer
Contact: (832)-786-8694 | Email: [email protected] | LinkedIn

PROFESSIONAL SUMMARY
- Accomplished Senior Data Engineer with over 10 years of expertise in designing and managing data pipelines across diverse cloud platforms, including GCP, AWS, and Snowflake.
- Proficient in programming languages such as Python, PySpark, SQL, and Scala, enabling complex data transformations and real-time data processing.
- Extensive experience with data processing frameworks and tools such as Spark, Dataproc, Google Cloud Dataflow, AWS EMR, and AWS Lambda for creating scalable, efficient data pipelines.
- Skilled in data warehousing techniques such as partitioning, clustering, and denormalization to optimize storage and query performance.
- Experienced with data visualization tools such as Power BI and Advanced Excel, delivering insights and effective storytelling through interactive dashboards and reports.
- Proficient in ETL tools such as Informatica PowerCenter, Talend, AWS Glue, and SAP BODS to streamline data integration and transformation.
- Expertise in workflow orchestration tools such as Apache Airflow and Google Cloud Composer to automate and monitor data pipelines for peak performance.
- Applied machine learning techniques to build classification and recommendation models in Python, supporting real-time, data-driven decision-making.
- Extensive experience with monitoring tools such as AWS CloudWatch, AWS CloudTrail, and Google Stackdriver, ensuring real-time logging and pipeline reliability.
- Hands-on experience with version control systems, including Git, for effective code management, collaboration, and CI/CD integration.
- Skilled in building CI/CD pipelines with tools such as Jenkins, Terraform, and Google Cloud Build for seamless deployment of production-ready data pipelines.
- Proficient in real-time data ingestion and processing using tools such as Google Pub/Sub, SLT, and HDFS, ensuring timely data availability for analytics and reporting.
- Familiar with regulatory compliance standards such as PHI, HL7, FHIR, and HIPAA, ensuring secure and compliant data handling.
- Adept at Agile methodologies, collaborating with cross-functional teams and using tools like Jira for task management and sprint planning to deliver high-quality data solutions.

TECHNICAL SKILLS
Cloud Platforms: GCP, AWS, Azure, Snowflake
Programming Languages: Python, PySpark, Scala, SQL, R
Data Processing: Spark, Dataproc, Google Cloud Dataflow, EMR, Lambda, Databricks, Hadoop, Hive, MapReduce, Talend, AWS Glue, BODS
Data Warehouses: Google BigQuery, Cloud Spanner, Redshift, Teradata, Oracle, SAP HANA, SAP ASE, MySQL, Amazon RDS
ETL and Tools: Informatica PowerCenter, Pentaho, SSIS, SSRS, BODS, SLT Replication
Data Visualization: Power BI, Advanced Excel
Workflow Orchestration: Apache Airflow, Google Cloud Composer, AWS Step Functions
Version Control: GitHub
CI/CD Tools: Jenkins, Cloud Build, Terraform

CERTIFICATIONS
- Certified Google Professional Data Engineer
- Certified AWS Solutions Architect Associate

EDUCATION
Bachelor of Technology in Computer Science and Engineering, JNTU Anantapur, India

PROFESSIONAL EXPERIENCE

Client: Tennessee Farmers Insurance Company, Nashville, TN | Mar 2024 - Present
Role: Senior Data Engineer
Responsibilities:
- Executed seamless data migration from Oracle and SAP ASE databases to BigQuery by staging data in Google Cloud Storage (GCS) via BODS, aligning with ECC structure requirements (see the sketch after this role).
- Improved batch and real-time ingestion performance in BigQuery by standardizing S/4 HANA and ECC structures with advanced normalization techniques.
- Engineered and managed data ingestion pipelines using BODS and SLT, deployed via Terraform, to efficiently move data from Oracle, SAP ASE, and SAP HANA into Google Cloud.
- Automated pipelines for optimized real-time and batch data integration using Cloud Dataflow, Dataproc, and Apache Airflow.
- Leveraged GKE to containerize Python-based data jobs, orchestrating workflows with Cloud Composer for automated batch processing.
- Designed ELT pipelines to replicate real-time data from SAP HANA into BigQuery via SLT Replication, employing advanced partitioning and clustering for query optimization.
- Applied Spark and Python to clean and transform historical and real-time datasets, ensuring high accuracy and readiness for analytics.
- Automated workflows using Airflow and shell scripting to significantly reduce processing times for daily data integrations.
- Built CI/CD pipelines with Cloud Build to streamline deployment processes for data pipelines and infrastructure.
- Wrote custom stored procedures and transformations in Python, Spark, and Scala to handle complex data cleaning and migration needs.
- Enhanced query performance in BigQuery with advanced partitioning, clustering, and denormalization techniques, complemented by automation through shell scripts.
- Created automated DAGs and integrated Dataproc for large-scale batch processing, ensuring efficient real-time data handling.
- Monitored pipelines in BigQuery, Dataproc, and Airflow environments using Cloud Monitoring, ensuring stable operations and issue resolution.
- Managed version-controlled data pipelines and collaborated across Agile teams using GitHub and Jira to ensure sprint-based delivery of projects.
Tech Stack: BODS, SLT Replication, Terraform, Python, SQL, PySpark, Cloud Dataflow, Dataproc, Apache Airflow, Google Cloud Storage, BigQuery, Cloud Build, GitHub, Custom Stored Procedures, Python Scripts, SQL Queries, Cloud Monitoring, Jira, Git Version Control, Oracle, SAP ASE, SAP HANA, Agile
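Illustrative sketch (hypothetical, not code from this engagement): a minimal Cloud Composer / Airflow task showing the general pattern of loading GCS-staged extracts into a date-partitioned, clustered BigQuery table, as referenced in this role. The DAG name, bucket, dataset, table, and field names are assumed placeholders.

```python
# Hypothetical sketch: load GCS-staged extracts into a partitioned,
# clustered BigQuery table. All names below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="gcs_to_bigquery_daily_load",  # placeholder DAG name
    schedule_interval="@daily",
    start_date=datetime(2024, 3, 1),
    catchup=False,
) as dag:
    load_staged_extract = GCSToBigQueryOperator(
        task_id="load_staged_extract_to_bq",
        bucket="example-staging-bucket",                   # placeholder GCS bucket
        source_objects=["ecc/orders/{{ ds }}/*.parquet"],  # daily staged files
        destination_project_dataset_table="example-project.analytics.orders",
        source_format="PARQUET",
        write_disposition="WRITE_APPEND",
        time_partitioning={"type": "DAY", "field": "order_date"},  # daily partitions
        cluster_fields=["customer_id"],                            # cluster for query pruning
        autodetect=True,
    )
```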
Client: Starbucks, Seattle, WA | Aug 2021 - Feb 2024
Role: GCP Data Engineer
Responsibilities:
- Built scalable ELT pipelines to ingest CRM data into Snowflake using FiveTran, transforming OLTP data to OLAP format for BigQuery storage and analytics.
- Captured real-time IoT data via Google Pub/Sub and automated pipelines for timely ingestion and processing with Airflow orchestration.
- Designed staging processes in GCS for real-time data, applying scalable transformations using Dataproc and Python.
- Developed Dataflow pipelines to process real-time data into BigQuery and Spanner, employing advanced partitioning and clustering to enhance analytics capabilities (see the sketch after this role).
- Orchestrated ETL pipelines across platforms using Google Cloud Composer, integrating data with Python and Spark transformations.
- Applied machine learning models like classification and recommendation systems to real-time datasets using Python and Scala for advanced predictive analytics.
- Automated ETL pipelines with Apache Airflow, enhanced scheduling with cron jobs, and monitored workflows with comprehensive logging mechanisms.
- Created Python DAGs in Airflow to orchestrate end-to-end data pipelines and integrated real-time datasets via Dataproc.
- Streamlined CI/CD processes using Jenkins, ensuring automated deployment of data pipelines to production.
- Version-controlled Python and Scala scripts in Git, enabling smooth collaboration and clean versioning within Agile teams.
Tech Stack: Snowflake, Google Pub/Sub, Google Cloud Storage, Google Cloud Composer, Dataflow, Dataproc, BigQuery, SQL, Python, Machine Learning, Apache Airflow, Cron Jobs, Python DAGs, Git, Jenkins, CI/CD Pipelines, Agile Methodology.
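Illustrative sketch (hypothetical, not code from this engagement): a minimal Apache Beam / Dataflow streaming pipeline showing the general Pub/Sub-to-BigQuery pattern referenced in this role. The project, subscription, and table names are assumed placeholders, and Dataflow runner and project options are omitted.

```python
# Hypothetical sketch: stream Pub/Sub events into BigQuery with Apache Beam.
# Subscription and table names are placeholders; runner/project options omitted.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(message: bytes) -> dict:
    """Decode a JSON Pub/Sub payload into a BigQuery row (dict)."""
    return json.loads(message.decode("utf-8"))


def run() -> None:
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/example-project/subscriptions/iot-events"
            )
            | "ParseJson" >> beam.Map(parse_event)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table="example-project:analytics.iot_events",  # existing table
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()
```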
Client: CVS, Woonsocket, RI | Jun 2020 - Aug 2021
Role: AWS Data Engineer
Responsibilities:
- Migrated healthcare data from Teradata and Oracle to AWS Redshift using Talend and AWS Glue while ensuring compliance with HL7 and FHIR standards.
- Designed data mirroring techniques, staging PHI-compliant healthcare data securely in Amazon S3.
- Developed Python and Spark scripts on AWS EMR and Lambda to transform healthcare data for analytics in AWS Redshift.
- Created and maintained table schemas in Redshift, ensuring seamless integration with downstream systems.
- Captured and processed real-time healthcare data via APIs, storing it in efficient formats for scalability and analytics.
- Extracted Salesforce CRM data with AWS AppFlow, transforming and loading it into Redshift to support business needs.
- Built ELT pipelines with audit tracking to monitor and log data migrations and transformations for compliance purposes.
- Automated batch processing with AWS Step Functions, Glue, EMR, and Lambda to handle large volumes of healthcare data.
- Monitored pipelines using AWS CloudWatch and CloudTrail, ensuring real-time tracking of healthcare data workflows.
- Managed metadata using Amazon RDS for tracking schema versions, data sources, and transformation logic.
Tech Stack: Talend, AWS Glue, AWS Redshift, Teradata, Oracle, HL7 & FHIR Standards, Amazon S3, PHI Compliance, Python, Apache Spark, AWS EMR, AWS Lambda, AWS AppFlow, Salesforce CRM, AWS Step Functions, AWS CloudWatch, AWS CloudTrail, Amazon RDS, Git, Jira, Agile Methodology.

Client: Nextera, Juno Beach, FL | Sep 2018 - May 2020
Role: Data Engineer
Responsibilities:
- Built a framework to accommodate full and incremental loads.
- Worked on data analysis, validations, and audit framework implementation.
- Collected client requirements for specific sources and designed solutions to meet them.
- Developed and implemented solutions using Spark and the Python API to load data from source AWS S3 locations.
- Created dimension and fact tables and loaded the transformed data into Redshift tables (see the sketch after this role).
- Performed end-to-end unit testing, documented the results for each requirement story, and reviewed them with the test lead before moving to production.
- Worked within an Agile methodology to meet deadlines for full ELT cycle requirements.
- Worked closely with business users and interacted with ETL developers, project managers, and members of the QA teams.
- Created various KPIs using calculated key figures and parameters.
- Created automation scripts in Python to eliminate manual processes and save time and effort.
- Improved performance and processing of existing data flows.
- Responsible for the documentation, design, development, and architecture of visualization reports.
- Handled the installation, configuration, and support of multi-node AWS EMR setups.
Tech Stack: AWS S3, AWS Redshift, Apache Spark, Agile Methodology, Fact and Dimension Tables, Python Automation, KPIs, AWS EMR.
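Illustrative sketch (hypothetical, not code from these engagements): a minimal PySpark job of the kind run on EMR, reading raw extracts from S3, deriving a simple dimension, and writing curated Parquet back to S3 for a downstream Redshift COPY. Bucket paths and column names are assumed placeholders.

```python
# Hypothetical sketch: PySpark job (e.g., on EMR) that curates raw S3 extracts
# for a downstream Redshift COPY. Paths and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("s3_to_redshift_staging").getOrCreate()

# Raw incremental extract staged in S3 (placeholder path).
raw = spark.read.parquet("s3://example-raw-bucket/sales/incremental/")

# Basic cleansing: de-duplicate on the business key and derive a date column.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_ts"))
)

# Simple dimension derivation: one row per customer.
dim_customer = (
    cleaned.select("customer_id", "customer_name", "region")
           .dropDuplicates(["customer_id"])
)

# Write curated outputs back to S3; Redshift COPY loads these downstream.
cleaned.write.mode("overwrite").parquet("s3://example-curated-bucket/fact_orders/")
dim_customer.write.mode("overwrite").parquet("s3://example-curated-bucket/dim_customer/")
```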
Client: Indiumsoft, Hyderabad, India | Aug 2016 - Sep 2018
Role: Big Data Engineer
Responsibilities:
- Built and maintained ETL pipelines using Informatica PowerCenter to integrate sales data into Teradata while applying SCD Type 1 and Type 2 for handling historical data.
- Designed and optimized Informatica mappings for large-scale data loads using FastLoad and MultiLoad, with staging tables in Oracle DB to streamline transformations.
- Developed star schema-based data marts in Teradata, including fact and dimension tables, to support advanced reporting and analytics.
- Created ETL workflows to load data into data marts, enabling advanced business insights and KPI reporting.
- Integrated KPI logic into ETL workflows to ensure accurate business metric generation.
- Conducted extensive testing, including unit, integration, and user acceptance testing, to ensure data reliability.
- Documented ETL processes, transformations, and mappings, creating user manuals and data dictionaries for maintenance and user reference.
Tech Stack: Informatica PowerCenter, Teradata, Oracle DB, FastLoad, MultiLoad, Star Schema

Client: Spurtree Technologies Inc, Bangalore, India | Jun 2013 - Aug 2016
Role: ETL Developer
Responsibilities:
- Designed and implemented customized database systems to meet client requirements.
- Authored comprehensive design specifications and documentation for database projects.
- Developed ETL pipelines using Pentaho for seamless integration of data from various sources.
- Troubleshot and optimized MapReduce jobs to resolve failures and improve performance.
- Facilitated data import/export across multiple systems and the Hadoop Distributed File System (HDFS).
- Built scalable and distributed data solutions utilizing Hadoop, Hive, MapReduce, and Spark.
- Transformed structured and semi-structured data using tools like Hive and Spark.
- Created detailed user documentation for Hadoop ecosystems and processes.
- Executed Hive queries to perform in-depth data analysis and validation (see the sketch below).
Tech Stack: MapReduce, Pig, Hive, Hadoop, Cloudera, HBase, Sqoop
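Illustrative sketch (hypothetical, not code from these engagements): a minimal Spark-with-Hive-support job in the spirit of the Hive analysis and validation work above, reconciling row counts and producing a simple KPI-style aggregate. Database, table, and column names are assumed placeholders.

```python
# Hypothetical sketch: Spark with Hive support for data validation and a simple
# KPI-style aggregation. Database, table, and column names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive_sales_validation")
    .enableHiveSupport()
    .getOrCreate()
)

# Row-count reconciliation between a staging table and the curated table.
staging_count = spark.sql("SELECT COUNT(*) AS cnt FROM staging_db.sales_raw").first()["cnt"]
curated_count = spark.sql("SELECT COUNT(*) AS cnt FROM sales_db.sales_curated").first()["cnt"]
print(f"staging={staging_count}, curated={curated_count}, diff={staging_count - curated_count}")

# Aggregate used for reporting checks: total sales by region.
spark.sql(
    """
    SELECT region, SUM(net_amount) AS total_sales
    FROM sales_db.sales_curated
    GROUP BY region
    ORDER BY total_sales DESC
    """
).show(truncate=False)
```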