Location: Jacksonville, Florida, USA
Relocation: yes
Visa: H1B
Pratyusha Tummala Resume
Senior Data Engineer
M: 770-628-5675  Email: [email protected]
https://www.linkedin.com/in/pratyusha-t-619b8a149/
Overview:

7+ years of experience developing applications using Apache Spark, Scala, Python, Java,
REST, SpringBoot, Dell Boomi, Kafka, MongoDB, DB2, PostgreSQL, SQL Server, Hive, Amazon
RedShift, Amazon S3, and Databricks.
Developed and optimized data pipelines using Databricks, leveraging Apache Spark for large-
scale data processing, resulting in improved performance and streamlined data workflows
across cloud-based environments.
Experience with Hadoop ecosystem tools including Hive, HDFS, MapReduce, Sqoop, and Spark.
Experience in importing and exporting data between relational databases and HDFS using Sqoop.
Knowledge of Hadoop components such as Job Tracker, Task Tracker, Name Node, Data Node,
and the MapReduce programming paradigm.
Experience in designing and developing ETL jobs for both batch and real-time data processing in
Spark using Scala and Python to transfer data between databases.
Utilized strong SQL skills to write, debug, and optimize queries and ETL jobs, reducing execution
time and resource utilization.
Experience in publishing messages to and consuming messages from Kafka through SpringBoot
microservices and Spark.
Experience in building Spark applications for data processing and analysis using Spark RDDs,
DataFrames, Datasets, and Spark SQL (a minimal sketch appears at the end of this overview).
Experience working with Text, CSV, Excel, and JSON file formats in the Hadoop ecosystem.
Implemented Avro-based serialization and deserialization processes, ensuring efficient data
exchange and seamless integration between systems in a distributed environment.
Experience in designing and developing Java REST APIs using the SpringBoot framework,
integrating secured protocols for enterprise needs and exposing them to end users.
Working knowledge of connecting user interfaces to backend systems using web services
such as REST.

Delivered end-to-end solutions by managing the complete software development lifecycle, from
design and documentation to implementation, testing, and deployment.
Experience with tools like Git and SVN for version control and source code merging.
Experience in migrating applications from one environment to another using Jenkins and
deploying applications to OpenShift and Hadoop.
Working experience in designing and developing interactive Tableau workbooks and dashboards
for executive decision making
Intellectually curious and solutions-oriented, with the ability to analyze workflows and
processes and develop innovative approaches to complex data engineering challenges.
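The Spark batch processing experience summarized above can be illustrated with a minimal, hypothetical sketch in Scala. The input path, column names, and output location are placeholders invented for the example and do not come from any specific project.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    // Minimal batch ETL sketch: read CSV, clean and aggregate with the DataFrame API,
    // then write the result as partitioned Parquet. Paths and columns are hypothetical.
    object ClaimsBatchEtl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("claims-batch-etl")
          .getOrCreate()

        // Read raw CSV with a header row; a production job would declare an explicit schema.
        val raw = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/data/raw/claims/*.csv")

        // Basic cleanup and a simple aggregation, analogous to a Hive/SQL query
        // rewritten as DataFrame transformations.
        val daily = raw
          .filter(col("claim_amount").isNotNull)
          .withColumn("claim_date", to_date(col("claim_timestamp")))
          .groupBy("claim_date", "plan_code")
          .agg(sum("claim_amount").as("total_amount"), count(lit(1)).as("claim_count"))

        // Write partitioned Parquet for downstream consumption.
        daily.write
          .mode("overwrite")
          .partitionBy("claim_date")
          .parquet("/data/curated/claims_daily")

        spark.stop()
      }
    }

In practice a job like this would be parameterized and triggered by a scheduler such as Control-M, but the read-transform-write shape stays the same.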

Education:
Master's in Computer Science, May 2017
Marshall University, Huntington, WV

Computer Science and Engineering, June 2015
Osmania University, Hyderabad, India

Technical Details:
Technologies known : Sqoop, Scala, Spark, Python, PySpark, Java, Kafka, Databricks, Dell Boomi, HTML5, CSS3
Databases : SQL Server, DB2, Oracle, MongoDB, PostgreSQL, RedShift, Hive
IDEs : Visual Studio, Eclipse, Notepad++, Atom
Software/Tools : Tableau, Weka, GIT, Jenkins, SVN, Rational Rose
Methodologies : Agile, Waterfall

Professional Experience

Florida Blue, FL    June 2022 - Present
Senior Data Engineer

Skills: Spark, Scala, PySpark, Python, Sqoop, SQL Server, DB2, Hive, HBase, MongoDB, Oracle, Java,
Databricks, PostgreSQL, Amazon RedShift, Amazon S3, Stonebranch Scheduler, Control-M Scheduler

Collaborated with cross-functional teams, including architects, scrum masters, and directors to
analyze, design, develop, and maintain features enhancing data accessibility on the Enterprise
Data Platform.
Participated in daily standups, SCRUM sessions, Sprint Backlog sessions, Grooming, and Sprint
Retrospective sessions to resolve dependencies and keep the Product Owner and Scrum Master
updated on user story status.
Developed automated shell scripts in a Linux environment using Bash to trigger Spark jobs via
the Control-M scheduler.
Responsible for writing and optimizing Hive queries to analyze data and load it efficiently to
meet business requirements.
Architected and built large-scale data pipelines using Apache Spark, PySpark, Scala and Python
for data processing and achieved 60% faster processing of high-volume datasets.
Transformed Hive/SQL queries into Spark transformations, utilizing Spark DataFrames, RDDs,
and Datasets for enhanced performance and scalability
Migrated data from Oracle and MySQL into HDFS using Sqoop and imported flat files in various
formats into HDFS.
Performed CRUD operations on databases including SQL Server, Oracle, DB2, Hive, HBase,
RedShift, MongoDB, and PostgreSQL.
Architected distributed data storage solutions using RedShift, IBM DB2, MongoDB, and PostgreSQL
for high-volume transactional systems.
Used best practices like vacuuming, analyzing, and compression to boost Redshift performance.
Led the team in developing real-time data ingestion applications using Kafka, reducing data
latency by 65% and event processing delays by 50% while ensuring high availability of data.
Built a real-time data consumer from Kafka using Avro data format, leveraging the SpringBoot
framework and deploying it on OpenShift for scalable and efficient data processing
Published and consumed Kafka messages via Spark for real-time data streaming and processing
(a minimal streaming sketch follows at the end of this section).
Implemented a reconciliation framework between source and destination databases to ensure
the data delivered to end users is accurate.
Improved the performance of data processing jobs by 50% by tuning Spark and Scala code base.

Created and scheduled Control-M jobs to run multiple Hive and Spark jobs that trigger
independently based on time and data availability.
Wrote REST services to POST and GET data from MongoDB and PostgreSQL databases using
SpringBoot.
Built and maintained schemas for Avro and Parquet to ensure compatibility and data integrity
across cross-functional teams.
Used Postman, SOAP UI, and JMeter to unit test the REST APIs and confirm that all
functionality worked as expected.
Developed Database, Flat File, and JSON profiles, mappings, and processes in Dell Boomi,
using connector and logic shapes between application profiles.
Maintained high data quality standards throughout the entire data processing lifecycle, from
ingestion to consumption.
Optimized data workflows by adopting Parquet and ORC formats, reducing storage costs and
improving query performance in large-scale data environments.
Parsed Avro schema to deserialize data and implemented a solution to read JSON messages,
ensuring efficient data processing and integration across multiple systems.
Automated deployment workflows, ensuring smooth integration of data solutions into CI/CD
pipelines for efficient deployment and continuous delivery
Led and executed the migration of data infrastructure from HDP to CDP, ensuring smooth
transition and minimal downtime
Performed rigorous unit testing and participated in peer code reviews to ensure high-quality,
well-tuned, and maintainable code for optimal performance.
Mentored junior engineers in distributed computing and big data architecture best practices
by providing feedback, sharing knowledge, and offering guidance to encourage growth within
the team.
Engaged in technology exploration, experimenting with and adopting new trends to enhance
platform capabilities.
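The Kafka and Spark streaming work described in this role can be sketched as a minimal Spark Structured Streaming consumer in Scala. The broker address, topic name, message schema, and output paths below are assumed for illustration only; running it also requires the spark-sql-kafka connector on the classpath.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    // Minimal sketch: consume JSON messages from a Kafka topic and append them
    // to Parquet with checkpointing. All names and paths are placeholders.
    object MemberEventsStream {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("member-events-stream")
          .getOrCreate()

        // Assumed shape of the incoming JSON messages.
        val eventSchema = new StructType()
          .add("member_id", StringType)
          .add("event_type", StringType)
          .add("event_ts", TimestampType)

        // Subscribe to the topic; the Kafka value column arrives as bytes.
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "member-events")
          .option("startingOffsets", "latest")
          .load()

        // Deserialize the JSON payload into typed columns.
        val events = raw
          .select(from_json(col("value").cast("string"), eventSchema).as("event"))
          .select("event.*")

        // Continuously append parsed events to Parquet.
        val query = events.writeStream
          .format("parquet")
          .option("path", "/data/streams/member_events")
          .option("checkpointLocation", "/checkpoints/member_events")
          .outputMode("append")
          .start()

        query.awaitTermination()
      }
    }

An Avro-based variant would swap from_json for from_avro (with a schema registry or an embedded schema), but the subscribe-parse-write structure is unchanged.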

Wells Fargo, NC    May 2021 - June 2022
Senior Data Engineer
Skills: Spark, Scala, SQL Server, Hive, MongoDB, Oracle, Autosys, Kafka

Developed large-scale data ingestion and transformation pipelines using Spark, Kafka, and Scala
to process over 5TB of data daily from SQL Server and Oracle into MongoDB.
Reduced the data processing time from 6 hours to 30 minutes by optimizing Spark jobs.
Designed and implemented database schema for a NoSQL solution in MongoDB, handling
unstructured and semi-structured data with a focus on performance and scalability.
Queried DataFrames using explode and explode_outer to flatten nested JSON (see the sketch at the end of this section).
Implemented basic data warehousing solutions with Hive and HDFS for efficient data storage
and retrieval.
Automated workflows using Autosys JIL commands to manage complex batch processing jobs
and scheduled tasks.
Worked with cross-functional teams to ensure data integrity and alignment with business
requirements, improving data availability and reliability.
Attended daily scrum calls, sprint planning, and retrospective calls to discuss status updates
and any impediments encountered while working on tasks.
Collaborated with DevOps team to automate deployment workflows and ensure seamless
integration of data solutions into CI/CD pipelines.
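The explode / explode_outer usage mentioned above can be shown with a short Scala example; the sample records and field names are made up for illustration.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    // Minimal sketch of flattening nested JSON, contrasting explode with
    // explode_outer. The sample documents and fields are hypothetical.
    object NestedJsonFlatten {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("nested-json-flatten")
          .getOrCreate()
        import spark.implicits._

        // Two sample documents: one with nested accounts, one with none.
        val json = Seq(
          """{"customer_id":"C1","accounts":[{"acct_no":"A1","balance":100.0},{"acct_no":"A2","balance":250.0}]}""",
          """{"customer_id":"C2"}"""
        ).toDS()

        val df = spark.read.json(json)

        // explode drops rows whose accounts array is null ...
        val inner = df.select(col("customer_id"), explode(col("accounts")).as("acct"))
          .select(col("customer_id"), col("acct.acct_no"), col("acct.balance"))

        // ... while explode_outer keeps them, emitting nulls for the struct fields.
        val outer = df.select(col("customer_id"), explode_outer(col("accounts")).as("acct"))
          .select(col("customer_id"), col("acct.acct_no"), col("acct.balance"))

        inner.show()  // rows for C1 only
        outer.show()  // rows for C1 plus a C2 row with nulls

        spark.stop()
      }
    }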

Florida Blue, FL    Oct 2017 - April 2021
Associate Data Engineer
Skills: Spark, Scala, PySpark, Python, Sqoop, SQL Server, DB2, Hive, HBase, MongoDB, Oracle, Java,
PostgreSQL, Amazon RedShift, Amazon S3, Control-M Scheduler

Participated in the full lifecycle of application feature development, including analysis, design,
development, and ongoing maintenance.
Actively engaged in daily standups, SCRUM meetings, Sprint Backlog planning, Grooming
sessions, and Sprint Retrospectives to resolve dependencies and update the Product Owner and
Scrum Master on user story progress.
Wrote automated shell scripts in a Linux environment using Bash to trigger Spark jobs through
the Control-M scheduler.
Created Hive queries for data analysis and data loading, ensuring alignment with business
needs and requirements.
Designed and implemented large-scale data pipelines using Apache Spark, PySpark, Scala, and
Python, resulting in a 40% improvement in data processing speed for high-volume datasets.

Converted Hive/SQL queries into Spark transformations using Spark DataFrames, RDDs, and
Datasets.
Migrated data from Oracle and MySQL into HDFS using Sqoop and imported flat files in various
formats into HDFS.
Performed CRUD operations on databases including SQL Server, Oracle, DB2, Hive, HBase,
MongoDB, and PostgreSQL.
Architected distributed data storage solutions using IBM DB2, MongoDB, and PostgreSQL for
high-volume transactional systems.
Led the development of real-time data ingestion applications with Kafka, achieving a 50%
reduction in data latency and a 30% improvement in event processing by ensuring high
availability
Developed a real-time Kafka consumer for Avro-formatted data using the SpringBoot framework,
deployed on OpenShift.
Published messages to and consumed messages from Kafka using Spark.
Implemented a reconciliation framework between source and destination databases to ensure
the data delivered to end users is accurate (a minimal sketch follows at the end of this section).
Ensured high data quality standards throughout the entire lifecycle of data processing and
management.
Improved the performance of data processing jobs by 30% by tuning Spark and Scala code base.
Streamlined data workflows by integrating Parquet and ORC formats, reducing storage costs and
improving query speed in big data environments.
Created and scheduled Control-M jobs to run multiple Hive and Spark jobs that trigger
independently based on time and data availability.
Wrote REST services to POST and GET data from MongoDB and PostgreSQL databases using
SpringBoot.
Used Postman, SOAP UI, and JMeter to unit test the REST APIs and confirm that all
functionality worked as expected.
Developed Database, Flat File, and JSON profiles, mappings, and processes in Dell Boomi,
using connector and logic shapes between application profiles.
Automated deployment workflows and ensured seamless integration of data solutions into
CI/CD pipelines.
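The reconciliation framework mentioned above can be sketched in Scala using Spark. The table names and key column below are hypothetical; a fuller framework would add column-level hash comparisons and report results to monitoring rather than printing them.

    import org.apache.spark.sql.SparkSession

    // Minimal sketch of a source-vs-destination reconciliation check:
    // compare row counts and find keys present on only one side.
    // Table names and the key column are placeholders.
    object ReconcileTables {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("reconcile-tables")
          .enableHiveSupport()
          .getOrCreate()

        val keyCol = "record_id"
        val source = spark.table("staging.claims_source")
        val target = spark.table("curated.claims_target")

        // 1. Row-count comparison.
        val srcCount = source.count()
        val tgtCount = target.count()
        println(s"source=$srcCount target=$tgtCount diff=${srcCount - tgtCount}")

        // 2. Keys missing on either side (set difference in both directions).
        val missingInTarget = source.select(keyCol).except(target.select(keyCol))
        val extraInTarget   = target.select(keyCol).except(source.select(keyCol))

        println(s"keys missing in target: ${missingInTarget.count()}")
        println(s"unexpected keys in target: ${extraInTarget.count()}")

        spark.stop()
      }
    }

When counts match but content may still differ, the same pattern extends to hashing concatenated columns per key (for example with sha2) and comparing the hashes on both sides.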