
Data Engineer | 7+ years | OPT | Apex, NC | Open to relocate
[email protected]
Location: Apex, North Carolina, USA
Relocation: Yes
Visa: OPT
Professional Summary
Results-oriented Data Engineer with 7+ years of experience building and optimizing data infrastructure. Skilled in cloud platforms and big data technologies, with a focus on improving data processing, storage, and retrieval. Strong team collaborator with a track record of delivering scalable solutions.

Technical Skills
Cloud Platforms: Microsoft Azure (Data Factory, Synapse Analytics, Data Lake Storage, Databricks, Event Hubs, Stream Analytics), Amazon Web Services (AWS Glue, EMR, Lambda, S3, EC2, Athena, Redshift)
ETL Tools: Azure Data Factory, AWS Glue, Apache Airflow, Apache NiFi, SSIS, Databricks, Snowflake
Scripting Languages: Python (Pandas, NumPy, Matplotlib), SQL, PL/SQL, Scala, Shell Scripting
Databases: Oracle, SQL Server, MySQL, PostgreSQL, DB2, MongoDB, DynamoDB, Cassandra, HBase
DevOps & CI/CD Tools: Git, Jenkins, Azure DevOps, Docker, Terraform, ARM Templates
Logging: AWS CloudWatch, Splunk (basic), Unix/Linux Administration
Methodologies: Agile (Scrum), SDLC, Release Management
Data Visualization Tools: Power BI, Tableau

Professional Experience
Role: Data Engineer
Client: Discover | Jan 2024 - Present
Developed end-to-end Spark applications to perform data cleansing, validation, transformation, and summarization on user behavioral data.
Utilized Spark, Scala, and Python for querying and preparing data from big data sources.
Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Involved in converting Hive/SQL queries into Spark transformations using Spark with Scala.
Implemented a prototype for real-time streaming of data using Spark Streaming with Kafka (see the first sketch after this section).
Skilled in SQL and PostgreSQL for querying and optimizing relational databases.
Worked on various performance optimizations like using distributed cache for small datasets, partitioning, bucketing in Hive, and Map-Side joins.
Utilized Python in AWS Lambda functions to build an event-driven, serverless architecture (see the Lambda sketch after this section).
Engineered robust data processing pipelines using AWS Glue, orchestrating ETL tasks for structured and unstructured data and improving processing efficiency by more than 50%.
Used Amazon EMR for large-scale big data processing, ensuring scalability and performance optimization for diverse data workloads.
Built data workflows with Apache Airflow on AWS, automating and scheduling tasks across the data ecosystem.
Configured and managed EC2 instances on AWS to deploy and scale Spark applications.
Used Amazon Athena for interactive, ad hoc querying of large datasets stored in S3, enabling rapid and flexible data analysis.
Employed CloudWatch to monitor and optimize the performance of Spark applications and data processing pipelines on AWS infrastructure.
Employed advanced MapReduce techniques like partitioning, bucketing, and Map-Side joins to optimize data processing in Hadoop and PySpark environments.
Developed serverless functions and automated workflows using AWS Step Functions.
Integrated NoSQL databases like Cassandra and DynamoDB with Spark applications, enhancing storage efficiency.
Leveraged data modeling principles in Hadoop and PySpark ecosystems to optimize data structures and algorithms, improving performance and scalability for diverse workloads.
Implemented a robust CI/CD pipeline using Git, ensuring seamless integration of code changes and automated deployment of Spark applications for efficient data processing tasks.
Integrated Tableau dashboards with AWS data services, enabling real-time monitoring and analysis of big data workflows.
Environment: Spark, Scala, Python, Hadoop, Spark Streaming, Kafka, Spark SQL, PySpark, Hive, AWS Glue, Amazon EMR, Apache Airflow, AWS Lambda, Amazon Redshift, Amazon Athena, Tableau, Agile Methodologies.
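
Sketch of the Spark Streaming prototype referenced above, written as a minimal PySpark Structured Streaming job. The broker address, topic name, event schema, and S3 paths are illustrative placeholders, not details from the engagement:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructType, TimestampType

    spark = SparkSession.builder.appName("behavioral-stream").getOrCreate()

    # Placeholder schema for the user-behavior events.
    schema = (StructType()
              .add("user_id", StringType())
              .add("event", StringType())
              .add("ts", TimestampType()))

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
              .option("subscribe", "user-behavior")              # placeholder topic
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*")
              .filter(col("user_id").isNotNull()))               # basic validation

    # Land cleansed records as Parquet; paths are placeholders.
    query = (events.writeStream
             .format("parquet")
             .option("path", "s3a://example-bucket/clean/")
             .option("checkpointLocation", "s3a://example-bucket/checkpoints/")
             .start())
    query.awaitTermination()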
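
Sketch of the event-driven Lambda pattern referenced above: a Python handler that reacts to an S3 object-created event by starting a Glue job. The job name and argument key are hypothetical:

    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        # Each record describes one S3 object-created notification.
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            # Start the (hypothetical) Glue job with the new object as its input.
            glue.start_job_run(
                JobName="behavioral-etl",  # placeholder job name
                Arguments={"--input_path": f"s3://{bucket}/{key}"},
            )
        return {"status": "ok"}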
Role: Data Engineer
Client: Byju's, Hyderabad, India | Dec 2020 - Aug 2022
Implemented robust data pipelines using Azure Data Factory and Databricks for diverse data sources.
Designed scalable Azure Data Lake architectures for structured and unstructured datasets.
Performed migration of legacy systems to Azure Synapse Analytics, enhancing performance by 40%.
Developed real-time data processing solutions using Azure Event Hubs and Stream Analytics.
Optimized AWS Redshift configuration, reducing query execution time by 24%.
Developed interactive Power BI dashboards for Sales and Production teams, providing real-time visibility into key performance metrics.
Designed and implemented Month-to-Date (MTD) reports, enabling leadership to monitor progress against monthly targets.
Created detailed on-hand inventory reports in Power BI, optimizing stock monitoring and facilitating inventory management decisions.
Collaborated with cross-functional teams to gather requirements, build data models, and deliver impactful visual analytics solutions.
Utilized DAX expressions and Power Query to transform raw data into insightful, user-friendly reports.
Automated CI/CD workflows with Terraform, Jenkins, and Azure DevOps for efficient deployments, and integrated SSIS packages for seamless ETL processes.
Utilized Power BI to create interactive dashboards connected to Azure SQL and Data Lake, leveraging SQL Server for optimized backend querying.
Used Apache Spark and Databricks notebooks to process and transform high-volume datasets efficiently (see the sketch after this section).
Designed and implemented schema models in Synapse and SQL Server for optimized query performance and storage efficiency.
Leveraged Airflow for task orchestration and scheduling of ETL workflows in Azure environments.
Designed a scalable data lake on Azure Data Lake Storage to integrate multi-source data.
Migrated on-premises data warehouse systems to Azure Synapse Analytics, achieving cost savings and performance enhancements.
Leveraged ARM templates across several Azure projects to promote code reusability.
Environment: Azure Data Factory, Azure Synapse Analytics, Azure Blob Storage, Azure Data Lake, Power BI, Terraform, Apache Spark, Databricks, Airflow, Jenkins
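
Sketch of the Databricks notebook pattern referenced above: PySpark reading raw files from Azure Data Lake Storage, applying basic transformations, and writing a partitioned Delta table. The storage account, containers, and column names are placeholders:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date

    spark = SparkSession.builder.getOrCreate()  # supplied by the Databricks runtime

    # Read raw CSVs from a placeholder ADLS Gen2 container.
    raw = (spark.read
           .option("header", "true")
           .csv("abfss://raw@examplelake.dfs.core.windows.net/sales/"))

    # Basic cleansing: dedupe on a hypothetical key, type the date, drop bad rows.
    clean = (raw.dropDuplicates(["order_id"])
                .withColumn("order_date", to_date(col("order_date")))
                .filter(col("amount").isNotNull()))

    # Write a date-partitioned Delta table to a curated zone (placeholder path).
    (clean.write
          .format("delta")
          .mode("overwrite")
          .partitionBy("order_date")
          .save("abfss://curated@examplelake.dfs.core.windows.net/sales/"))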
Role: Hadoop Developer
Client: Allstate India Pvt. Ltd, India | Aug 2018 - Dec 2020
Experienced in data ingestion for large datasets, employing techniques like partitioning, Spark in-memory processing, broadcast variables, and effective, efficient joins and transformations.
Experienced in automating data processing tasks using Linux command-line tools and shell scripting.
Utilized Sqoop to load data from DB2 to HBase, optimizing querying speed and performance.
Designed, built, and supported Hadoop and RDBMS data integration projects, integrating traditional and non-traditional source systems with RDBMS and NoSQL data storage for comprehensive data access and analysis.
Implemented the KafkaUtils module in PySpark to create an input stream that pulls messages directly from Kafka brokers (see the Kafka sketch after this section).
Implemented Hive table partitioning and parallel script execution to minimize runtime (partitioning sketched after this section).
Developed end-to-end data pipeline orchestration using Apache NiFi.
Created Python code for task dependencies, SLA monitoring, and time sensors for each job, managing workflows and automating processes with Oozie.
Implemented business logic by writing UDFs in Spark and configuring cron jobs.
Provided design recommendations and resolved technical problems.
Involved in performance tuning and troubleshooting of Hive and Spark jobs.
Developed HCatalog streaming code to continuously stream JSON data into Hive (EDW).
Environment: Hadoop, HDFS, Hive, Spark, HBase, Python, Sqoop, Oozie, Linux, DB2.
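
Sketch of the KafkaUtils direct stream referenced above, using the legacy pyspark.streaming.kafka module that shipped with Spark in this era. Broker and topic names are placeholders:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # legacy module, Spark 2.x era

    sc = SparkContext(appName="kafka-direct")
    ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

    # Direct stream: pulls messages straight from the brokers, no receiver.
    stream = KafkaUtils.createDirectStream(
        ssc,
        topics=["claims-events"],                             # placeholder topic
        kafkaParams={"metadata.broker.list": "broker:9092"},  # placeholder broker
    )

    # Records arrive as (key, value) pairs; count per batch as a smoke test.
    stream.map(lambda kv: kv[1]).count().pprint()

    ssc.start()
    ssc.awaitTermination()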
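
Sketch of the Hive partitioning optimization referenced above, expressed through Spark's Hive support. Table and column names are hypothetical:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical partitioned table; queries filtering on claim_date only
    # scan the matching partition directories.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS claims_partitioned (
            claim_id STRING,
            amount   DOUBLE
        )
        PARTITIONED BY (claim_date STRING)
        STORED AS ORC
    """)

    # Dynamic-partition insert from a hypothetical staging table.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE claims_partitioned PARTITION (claim_date)
        SELECT claim_id, amount, claim_date FROM claims_raw
    """)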

Education
University of New Haven, Connecticut | Master of Science in Data Science


Certification
Microsoft Certified: Fabric Data Engineer Associate (in progress)