Hotlist for Data Engineer - Senior Data Engineer
Email: [email protected]
Location: Jacksonville, Florida, USA
Relocation: Yes
Visa: H1B
Pratyusha Tummala Resume
Senior Data Engineer
Email: [email protected] | M: 770-628-5675
LinkedIn: https://www.linkedin.com/in/pratyusha-t-619b8a149/

Overview:
- 7+ years of experience developing applications using Apache Spark, Scala, Python, Java, REST, Spring Boot, Dell Boomi, Kafka, MongoDB, DB2, PostgreSQL, SQL Server, Hive, Amazon Redshift, Amazon S3, and Databricks.
- Developed and optimized data pipelines using Databricks, leveraging Apache Spark for large-scale data processing, resulting in improved performance and streamlined data workflows across cloud-based environments.
- Experience with Hadoop ecosystem tools including Hive, HDFS, MapReduce, Sqoop, and Spark.
- Experience importing and exporting data between relational databases and HDFS using Sqoop.
- Knowledge of Hadoop components such as Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Experience designing and developing ETL jobs for both batch and real-time data processing in Spark, using Scala and Python to transfer data between databases.
- Strong SQL skills used to write, debug, and optimize queries and ETL jobs, reducing execution time and resource utilization.
- Experience publishing and subscribing to Kafka messages through Spring Boot microservices and Spark.
- Experience building Spark applications for data processing and analysis using RDDs, DataFrames, Datasets, and Spark SQL.
- Experience working with Text, CSV, Excel, and JSON file formats in the Hadoop ecosystem.
- Implemented Avro-based serialization and deserialization, ensuring efficient data exchange and seamless integration between systems in a distributed environment.
- Experience designing and developing Java REST APIs with the Spring Boot framework, integrating secured protocols for enterprise needs and exposing them to end users.
- Working knowledge of connecting user interfaces to backends through web services such as REST.
- Delivered end-to-end solutions by managing the complete software development lifecycle, from design and documentation to implementation, testing, and deployment.
- Experience with Git and SVN for version control and source code merging.
- Experience migrating applications between environments using Jenkins and deploying them to OpenShift and Hadoop.
- Working experience designing and developing interactive Tableau workbooks and dashboards for executive decision making.
- Intellectually curious and solutions-oriented, with the ability to analyze workflows and processes and develop innovative approaches to complex data engineering challenges.
Education:
Master's in Computer Science, Marshall University, Huntington, WV - May 2017
Computer Science and Engineering, Osmania University, Hyderabad, India - June 2015

Technical Details:
Technologies: Sqoop, Scala, Spark, Python, PySpark, Java, Kafka, Databricks, Dell Boomi, HTML5, CSS3
Databases: SQL Server, DB2, Oracle, MongoDB, PostgreSQL, Redshift, Hive
IDEs: Visual Studio, Eclipse, Notepad++, Atom
Software/Tools: Tableau, Weka, Git, Jenkins, SVN, Rational Rose
Methodologies: Agile, Waterfall

Professional Experience

Florida Blue, FL - June 2022 to Present
Senior Data Engineer
Skills: Spark, Scala, PySpark, Python, Sqoop, SQL Server, DB2, Hive, HBase, MongoDB, Oracle, Java, Databricks, PostgreSQL, Amazon Redshift, Amazon S3, Stonebranch Scheduler, Control-M Scheduler
- Collaborated with cross-functional teams, including architects, scrum masters, and directors, to analyze, design, develop, and maintain features enhancing data accessibility on the Enterprise Data Platform.
- Participated in daily standups, Scrum sessions, Sprint Backlog sessions, Grooming, and Sprint Retrospective sessions to resolve dependencies and update the Product Owner and Scrum Master on user story status.
- Developed automated shell scripts in a Linux environment using Bash to trigger Spark jobs via the Control-M scheduler.
- Wrote and optimized Hive queries to analyze data and load it efficiently to meet business requirements.
- Architected and built large-scale data pipelines using Apache Spark, PySpark, Scala, and Python, achieving 60% faster processing of high-volume datasets.
- Transformed Hive/SQL queries into Spark transformations, utilizing Spark DataFrames, RDDs, and Datasets for enhanced performance and scalability (see the sketch after this role).
- Migrated data from Oracle and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
- Performed CRUD operations on databases including SQL Server, Oracle, DB2, Hive, HBase, Redshift, MongoDB, and PostgreSQL.
- Architected distributed data storage solutions using Redshift, IBM DB2, MongoDB, and PostgreSQL for high-volume transactional systems; used best practices such as vacuuming, analyzing, and compression to boost Redshift performance.
- Led the team in developing real-time data ingestion applications using Kafka, reducing data latency by 65% and event processing delays by 50% while ensuring high availability of data.
- Built a real-time Kafka consumer for Avro-formatted data using the Spring Boot framework and deployed it on OpenShift for scalable and efficient data processing.
- Published and consumed Kafka messages via Spark for real-time data streaming and processing.
- Implemented a reconciliation framework between source and destination databases to ensure that the data provided to end users is accurate.
- Improved the performance of data processing jobs by 50% by tuning the Spark and Scala code base.
- Created and scheduled Control-M jobs to run multiple Hive and Spark jobs that trigger independently based on time and data availability.
- Wrote REST services to POST and GET data from MongoDB and PostgreSQL databases using Spring Boot.
- Built and maintained schemas for Avro and Parquet to ensure compatibility and data integrity across cross-functional teams.
- Used Postman, SOAP UI, and JMeter to unit test the REST APIs and confirm that all functionality works as expected.
- Developed Database, Flat File, and JSON profiles, Boomi mappings, and processes using different connector and logic shapes between application profiles in Dell Boomi.
- Maintained high data quality standards throughout the entire data processing lifecycle, from ingestion to consumption.
- Optimized data workflows by adopting Parquet and ORC formats, reducing storage costs and improving query performance in large-scale data environments.
- Parsed Avro schemas to deserialize data and implemented a solution to read JSON messages, ensuring efficient data processing and integration across multiple systems.
- Automated deployment workflows, ensuring smooth integration of data solutions into CI/CD pipelines for efficient deployment and continuous delivery.
- Led and executed the migration of data infrastructure from HDP to CDP, ensuring a smooth transition and minimal downtime.
- Performed rigorous unit testing and participated in peer code reviews to ensure high-quality, well-tuned, and maintainable code.
- Mentored junior engineers in best practices for distributed computing and big data architecture by providing feedback, sharing knowledge, offering guidance, and encouraging growth within the team.
- Engaged in technology exploration, experimenting with and adopting new trends to enhance platform capabilities.
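Illustrative sketch (not part of the original resume): a minimal Scala example of the Hive-to-Spark conversion described in this role, written against the Spark SQL API. The table and column names (claims, claim_status, member_id, paid_amount, claims_summary) are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ClaimsAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToSparkExample")
          .enableHiveSupport()
          .getOrCreate()

        // Equivalent of the Hive query:
        //   SELECT member_id, SUM(paid_amount) AS total_paid
        //   FROM claims WHERE claim_status = 'PAID'
        //   GROUP BY member_id
        val claims = spark.table("claims")                    // hypothetical Hive table

        val totals = claims
          .filter(col("claim_status") === "PAID")             // WHERE clause as a DataFrame filter
          .groupBy("member_id")
          .agg(sum("paid_amount").as("total_paid"))           // aggregate per member

        totals.write.mode("overwrite").saveAsTable("claims_summary")  // hypothetical target table

        spark.stop()
      }
    }

Expressing the query as DataFrame transformations lets Spark's Catalyst optimizer plan the filter and aggregation across the cluster, which is the usual motivation for migrating Hive SQL into Spark jobs.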
Wells Fargo, NC - May 2021 to June 2022
Senior Data Engineer
Skills: Spark, Scala, SQL Server, Hive, MongoDB, Oracle, Autosys, Kafka
- Developed large-scale data ingestion and transformation pipelines using Spark, Kafka, and Scala to process over 5 TB of data daily from SQL Server and Oracle into MongoDB.
- Reduced data processing time from 6 hours to 30 minutes by optimizing Spark jobs.
- Designed and implemented the database schema for a NoSQL solution in MongoDB, handling unstructured and semi-structured data with a focus on performance and scalability.
- Queried DataFrames using explode and explode_outer to flatten nested JSON (see the sketch after this role).
- Implemented data warehousing solutions with Hive and HDFS for efficient data storage and retrieval.
- Automated workflows using Autosys JIL commands to manage complex batch processing jobs and scheduled tasks.
- Worked with cross-functional teams to ensure data integrity and alignment with business requirements, improving data availability and reliability.
- Attended daily scrum calls, sprint planning, and retrospectives to report status and raise impediments.
- Collaborated with the DevOps team to automate deployment workflows and ensure seamless integration of data solutions into CI/CD pipelines.
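Illustrative sketch: a minimal Scala example of flattening nested JSON with explode_outer, as mentioned in this role. The input path and field names (customer_id, accounts, account_id, balance) are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, explode_outer}

    object FlattenNestedJson {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ExplodeOuterExample").getOrCreate()

        // Hypothetical nested JSON: each record carries an optional array of accounts.
        val raw = spark.read.json("hdfs:///data/customers.json")   // hypothetical path

        // explode_outer keeps customers whose accounts array is null or empty,
        // whereas explode would drop those rows entirely.
        val flattened = raw
          .select(col("customer_id"), explode_outer(col("accounts")).as("account"))
          .select(col("customer_id"), col("account.account_id"), col("account.balance"))

        flattened.show(truncate = false)
        spark.stop()
      }
    }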
Florida Blue, FL - Oct 2017 to April 2021
Associate Data Engineer
Skills: Spark, Scala, PySpark, Python, Sqoop, SQL Server, DB2, Hive, HBase, MongoDB, Oracle, Java, PostgreSQL, Amazon Redshift, Amazon S3, Control-M Scheduler
- Participated in the full lifecycle of application feature development, including analysis, design, development, and ongoing maintenance.
- Actively engaged in daily standups, Scrum meetings, Sprint Backlog planning, Grooming sessions, and Sprint Retrospectives to resolve dependencies and update the Product Owner and Scrum Master on user story progress.
- Wrote automated shell scripts in a Linux environment using Bash to trigger Spark jobs through the Control-M scheduler.
- Created Hive queries for data analysis and data loading, ensuring alignment with business needs and requirements.
- Designed and implemented large-scale data pipelines using Apache Spark, PySpark, Scala, and Python, resulting in a 40% improvement in data processing speed for high-volume datasets.
- Converted Hive/SQL queries into Spark transformations using Spark DataFrames, RDDs, and Datasets.
- Migrated data from Oracle and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
- Performed CRUD operations on databases including SQL Server, Oracle, DB2, Hive, HBase, MongoDB, and PostgreSQL.
- Architected distributed data storage solutions using IBM DB2, MongoDB, and PostgreSQL for high-volume transactional systems.
- Led the development of real-time data ingestion applications with Kafka, achieving a 50% reduction in data latency and a 30% improvement in event processing by ensuring high availability.
- Developed a real-time Kafka consumer for Avro-formatted data using the Spring Boot framework on OpenShift.
- Published and consumed Kafka messages to and from Kafka using Spark (see the sketch at the end of this role).
- Implemented a reconciliation framework between source and destination databases to ensure that the data provided to end users is accurate.
- Ensured high data quality standards throughout the entire lifecycle of data processing and management.
- Improved the performance of data processing jobs by 30% by tuning the Spark and Scala code base.
- Streamlined data workflows by integrating Parquet and ORC formats, reducing storage costs and improving query speed in big data environments.
- Created and scheduled Control-M jobs to run multiple Hive and Spark jobs that trigger independently based on time and data availability.
- Wrote REST services to POST and GET data from MongoDB and PostgreSQL databases using Spring Boot.
- Used Postman, SOAP UI, and JMeter to unit test the REST APIs and confirm that all functionality works as expected.
- Developed Database, Flat File, and JSON profiles, Boomi mappings, and processes using different connector and logic shapes between application profiles in Dell Boomi.
- Automated deployment workflows and ensured seamless integration of data solutions into CI/CD pipelines.
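Illustrative sketch: a minimal Scala example of consuming Kafka messages with Spark Structured Streaming, along the lines of the Kafka work described above. The broker address, topic name, and JSON payload schema are hypothetical, and the job assumes the spark-sql-kafka connector is on the classpath; the Avro variant mentioned in the resume would use the spark-avro package's from_avro function instead of from_json.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

    object KafkaEventConsumer {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("KafkaSparkConsumer").getOrCreate()

        // Hypothetical schema for the JSON payload carried in each Kafka message.
        val eventSchema = new StructType()
          .add("event_id", StringType)
          .add("member_id", StringType)
          .add("event_time", TimestampType)

        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   // hypothetical broker
          .option("subscribe", "member-events")                // hypothetical topic
          .load()
          .select(from_json(col("value").cast("string"), eventSchema).as("event"))
          .select("event.*")

        // Write the parsed stream out; a real job would target Hive, MongoDB, etc.
        val query = events.writeStream
          .format("console")
          .option("truncate", "false")
          .start()

        query.awaitTermination()
      }
    }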