
Rajdeep - Big Data Engineer
[email protected]
Location: Frisco, Texas, USA
Relocation:
Visa: H1
PROFESSIONAL SUMMARY:

Around 10 years of experience in the IT industry, including 8 years of development experience with Big Data and Hadoop ecosystem tools
Experience in designing and developing applications using core Big Data technologies: MapReduce, Hive, Spark and Spark SQL
Quick to learn and adapt to new technologies
Experience with configuration and development on multiple Hadoop distributions, Cloudera and Hortonworks (on-premise), as well as the Amazon AWS cloud platform
Experience with cloud-based services such as AWS EMR, EC2, S3, Redshift and Athena for working in a distributed data model
Experience in configuring and maintaining long-running Amazon EMR clusters, both manually and through CloudFormation scripts on Amazon AWS
Experience in implementing simple and generic custom Hive UDFs, and in developing multiple Hive views for accessing underlying table data (illustrated in the sketch following this summary)
Experience in performance tuning of MapReduce and Spark jobs and of complex Hive queries
Experience in importing and exporting data between HDFS and relational database systems using Sqoop
Good working knowledge of Oozie, Airflow, Zookeeper, Kafka, Spark Structured Streaming and PostgreSQL
Strong understanding of Object-Oriented Programming concepts and their implementation
Authored AWS Glue scripts to transfer data and used AWS Glue to run ETL jobs and aggregations in PySpark
Strong understanding of and working experience with ingesting data into and accessing data from a Cassandra cluster using Java and Spark APIs
Strong knowledge of Spark with Cassandra and Hadoop
Experience working on Agile-based applications across the design, development, testing and deployment phases
Ability to work effectively with associates at all levels within the organization
Drive a microservices style of architecture when applicable and apply Domain-Driven Design concepts to model microservices
Strong background in mathematics with very good analytical and problem-solving skills
Capable of processing large sets of structured, semi-structured and unstructured data and of supporting systems and application architecture
Strong knowledge of real-time and near-real-time applications using Apache NiFi and Amazon Kinesis
Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review
Familiarity with a broad mix of technologies, including microservices and PaaS
Knowledge of migrating to microservices-based cloud solutions to build cloud-native applications
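
For illustration, the sketch below shows the flavor of the Hive UDF and view work summarized above, using a Spark SQL UDF registered from PySpark rather than a Java-based Hive UDF; the table and column names (customer_events, event_ts) are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

# Hive support lets Spark SQL query the same metastore tables that Hive UDFs and views target
spark = (SparkSession.builder
         .appName("hive-udf-sketch")
         .enableHiveSupport()
         .getOrCreate())

# A simple UDF: bucket event hours into a coarse time-of-day label
def time_of_day(hour):
    if hour is None:
        return "unknown"
    return "business_hours" if 9 <= hour <= 17 else "off_hours"

spark.udf.register("time_of_day", time_of_day, StringType())

# Apply the UDF through Spark SQL over a Hive table (names are illustrative)
daily = spark.sql("""
    SELECT event_date,
           time_of_day(hour(event_ts)) AS day_part,
           count(*)                    AS events
    FROM   customer_events
    GROUP  BY event_date, time_of_day(hour(event_ts))
""")

# Persist the summary as a table for downstream consumers, similar to the role a Hive view plays
daily.write.mode("overwrite").saveAsTable("customer_events_daily_summary")
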
TECHNICAL SKILLS:
Hadoop/Big Data: Hadoop (YARN), HDFS, MapReduce, Spark, Hive, Sqoop, Flume, Zookeeper, Oozie, Tez
Programming Languages: SQL, Python
Distributed Computing Environments: Amazon EMR (Elastic MapReduce), Hortonworks (Ambari), Cloudera (CM)
Data Flow Tools: Apache NiFi, Kafka, Amazon Kinesis
Relational Databases: Oracle, MySQL, SQL Server 2005/2008
NoSQL Databases: HBase, Cassandra
Cloud Environments: AWS (EC2, EMR, S3, Kinesis, DynamoDB), Azure, Docker containers (Kubernetes), GCP (Google Cloud Platform)
Data File Types: JSON, CSV, Parquet, Avro, TextFile, ORC
Version Control Tools: Git, Subversion
Methodologies: Agile/Scrum, Rational Unified Process and Waterfall
Operating Systems: macOS, Windows, Unix and Solaris

Equality Health, Arizona
Role: Data Engineer Mar 2024 - Present
Responsibilities:

Run ETL pipelines across AWS services: S3, DynamoDB, Athena, Redshift and Glue.
Automate and schedule AWS Glue jobs using Apache Airflow (see the DAG sketch following this section).
Developed and implemented an ETL pipeline to fetch files from an SFTP server into the AWS environment.
Implemented an AWS Glue job to process the data and create data extracts.
Created a system to place the extracts on the SFTP server for stakeholders.
Extract, transform and load data in file formats such as Excel, CSV and TXT.
Manage code deployment processes through Azure DevOps and establish robust CI/CD pipelines.
Manage end-to-end data responsibilities including data modeling, ad-hoc analysis, and implementation of algorithms for extracting quality and membership metrics in Power BI.
Integrate PostgreSQL with Airflow for managing workflows and triggering jobs dynamically.
Perform data analysis and validate data for US-based healthcare/insurance companies to deliver actionable insights and improve operational efficiency.
Create, develop and maintain SQL processes for optimal data warehousing performance.
Monitor system health and logs and respond to any warning or failure conditions.
Write Spark applications in Python to interact with the PostgreSQL database using Spark JDBC.
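
A minimal sketch of the Airflow-scheduled AWS Glue automation described above; the DAG id, schedule and Glue job name are hypothetical, and boto3's start_job_run is called directly to keep the example self-contained rather than using the Amazon provider's Glue operator.

from datetime import datetime, timedelta

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def run_glue_job(**context):
    # start_job_run is asynchronous; production code would poll get_job_run for status
    glue = boto3.client("glue", region_name="us-east-1")
    response = glue.start_job_run(JobName="sftp_extract_to_s3")  # hypothetical Glue job name
    print(f"Started Glue job run {response['JobRunId']}")


with DAG(
    dag_id="daily_member_extracts",          # hypothetical DAG name
    start_date=datetime(2024, 3, 1),
    schedule_interval="0 6 * * *",           # every day at 06:00 UTC
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=10)},
) as dag:
    PythonOperator(
        task_id="run_glue_extract_job",
        python_callable=run_glue_job,
    )
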

Nike, Inc., Oregon
Role: Data Engineer Jan 2021 - Feb 2024
Responsibilities:

Worked on the Airflow version upgrade from 1.1.10 to 2.2.5
Successfully developed Nike Global Sales pipelines and onboarded stakeholders to consume the data in support of multiple use cases across Nike.
Extracted product data feeds from Kafka into the Spark processing system and stored the order details in Hive databases.
Architected robust and scalable data integration pipelines using PySpark, enhanced data processing with EMR and EC2 instances, and orchestrated the pipelines using Apache Airflow.
Executed end-to-end ETL pipelines encompassing data modeling, ad-hoc analysis, and the design and implementation of algorithms to extract sales and revenue metrics within the AWS environment.
Designed and developed Tableau dashboards to generate sales and revenue reports, providing actionable insights for strategic decision making, and orchestrated this process using Airflow.
Elevated data integrity and quality assurance through advanced test design and documentation, enabling self-service customer support.
Created Delta tables in Databricks to enhance data reliability and streamline data processing workflows, and used Delta Lake features for ACID transactions and time travel (see the sketch following this section).
Developed materialized views in Databricks for stakeholders, addressing specific analytical needs and improving query performance.
Resolved complex data integration challenges through optimized ETL strategies and drove data-driven opportunities across cross-functional teams.
Created an e-mail notification service that runs upon job completion, using Pandas to deliver the data in CSV format to the stakeholders who requested it.
Managed code deployment processes through GitHub and Jenkins, establishing robust CI/CD pipelines
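
A minimal sketch of the Databricks Delta table and time-travel usage described above; the database, table and S3 path names are hypothetical.

from pyspark.sql import SparkSession, functions as F

# On Databricks a SparkSession is provided; building one here keeps the sketch self-contained
spark = SparkSession.builder.appName("delta-sales-sketch").getOrCreate()

# Illustrative daily sales load appended to a partitioned, managed Delta table
daily_sales = (spark.read.parquet("s3://example-bucket/raw/global_sales/")   # hypothetical path
               .withColumn("load_date", F.current_date()))

(daily_sales.write
    .format("delta")
    .mode("append")
    .partitionBy("load_date")
    .saveAsTable("sales.global_sales_delta"))

# Delta Lake time travel: query an earlier version of the table for reconciliation
previous_snapshot = spark.sql(
    "SELECT load_date, count(*) AS row_count "
    "FROM sales.global_sales_delta VERSION AS OF 0 GROUP BY load_date"
)
previous_snapshot.show()
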

Carelon Global Solutions
Role: Sr. Data Engineer May 2019 - Dec 2019
Responsibilities:

Worked with AWS services including EC2, EMR, S3, DynamoDB, Athena, Redshift and Glue
In-depth understanding of the MapReduce framework and the Spark execution model
Processed large sets of structured, semi-structured and unstructured data, implementing and leveraging open-source technologies
Migrated servers, databases and applications from on-premise environments to AWS
Proficient in using different columnar file formats (ORC, Parquet)
Implemented cost- and time-efficient solutions in AWS for end clients.
Monitored MapReduce, Tez, Sqoop and Spark jobs through UI interfaces (NameNode and Resource Manager UIs, etc.) in AWS
Implemented partitioning and bucketing concepts in Hive for query optimization and designed both managed and external tables in Hive to optimize performance
Exported the final output data to the NoSQL database Cassandra using Spark connectors.
Developed ETL parsing and analytics data pipelines using Python/Spark to build a structured data model
Implemented Spark using Python and the Spark SQL API for faster testing and processing of data
Developed Spark scripts in Python on AWS EMR for data aggregation, validation and ad-hoc querying.
Extracted the needed data from the server into S3 and bulk loaded the cleaned data into Cassandra using Spark (see the sketch following this section)
Rigorously used PySpark (RDDs, DataFrames, Spark SQL) and the Spark-Cassandra connector APIs for various tasks (data migration, business report generation, etc.)
Moved relational database data into the S3 data lake using Sqoop and created external Hive tables on top of the raw data.
Created an e-mail notification service that runs upon job completion for the team that requested the data
Played a key role in productionizing the application after testing by BI analysts
Worked in Agile methodology, actively participating in sprint planning, review calls, retrospectives and daily scrums to address problems and issues.
Scheduled jobs using the Airflow orchestration tool.
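
A minimal sketch of the S3-to-Cassandra bulk load described above using the Spark-Cassandra connector's DataFrame API; the keyspace, table, host and bucket names are hypothetical, and the connector package must be on the classpath at submit time.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("s3-to-cassandra-sketch")
         .config("spark.cassandra.connection.host", "cassandra.example.internal")
         .getOrCreate())

# Read cleaned data that an upstream job landed in S3
cleaned = spark.read.parquet("s3://example-bucket/cleaned/claims/")

# Bulk load into Cassandra through the connector's DataFrame write path
(cleaned.write
    .format("org.apache.spark.sql.cassandra")
    .options(table="claims", keyspace="analytics")
    .mode("append")
    .save())

# Read the table back for a quick row-count validation
loaded = (spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(table="claims", keyspace="analytics")
    .load())
print(loaded.count())
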


Capgemini Technology Services
Role: Hadoop Developer Aug 2017 - Apr 2019
Responsibilities:

Played a major role in gathering requirements, analyzing the entire system and providing estimates for development and testing efforts
Worked on configuring and installing the Cloudera Hadoop environment.
Wrote Spark applications in Python to interact with the PostgreSQL database using Spark JDBC and accessed Hive tables using HiveContext
Created multiple Hive tables and implemented dynamic partitioning and bucketing in Hive for efficient data access
Created Hive external tables and used custom SerDes based on the structure of the input file so that schema-on-read is implemented in the Hive tables
Converted Hive/SQL queries into Spark transformations (map, filter, reduceByKey, groupByKey and many more) and actions on Spark RDDs using Python
Implemented Spark Structured Streaming and Spark SQL using DataFrames
Integrated product data feeds from Kafka into the Spark processing system and stored the order details in a PostgreSQL database (see the sketch following this section)
Scheduled all jobs using Oozie.
Migrated existing data from RDBMS to the Hadoop environment (HDFS clusters) using Sqoop and developed an ETL pipeline in Python to process the data efficiently.
Implemented common use cases such as data integrity checks and data sourcing using Spark.
Used Jira for ticketing and issue tracking and Jenkins for continuous integration and continuous deployment
Monitored system health and logs and responded to any warning or failure conditions
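
A minimal sketch of the Kafka-to-PostgreSQL flow described above, combining a Spark Structured Streaming read from Kafka with a Spark JDBC write inside foreachBatch; the topic name, message schema and connection details are hypothetical.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-to-postgres-sketch").getOrCreate()

# Hypothetical order schema carried as JSON in the Kafka message value
order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("product_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("order_ts", TimestampType()),
])

orders = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical brokers
    .option("subscribe", "orders")
    .load()
    .select(F.from_json(F.col("value").cast("string"), order_schema).alias("o"))
    .select("o.*"))

def write_to_postgres(batch_df, batch_id):
    # Each micro-batch is appended to PostgreSQL over Spark JDBC
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://db-host:5432/orders")   # hypothetical connection
        .option("dbtable", "public.order_details")
        .option("user", "etl_user")
        .option("password", "etl_password")
        .mode("append")
        .save())

query = (orders.writeStream
    .foreachBatch(write_to_postgres)
    .option("checkpointLocation", "/tmp/checkpoints/orders")
    .start())
query.awaitTermination()
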


Cognizant Technology Solutions
Role: SQL/ETL Developer Aug 2014 - Jul 2017
Responsibilities:

Understood requirements by interacting with business users and mapped them to design and implementation under the Agile development methodology.
Worked as an SQL Developer, establishing strong relationships with end users and technical resources throughout established projects.
Worked closely with the development team to support database objects (stored procedures, triggers, views) and performed code reviews to ensure proper design and implementation of database systems.
Performed database administration tasks including performance monitoring, access management and storage management.
Created, developed and maintained SQL processes for channel performance data warehousing.
Participated in designing database schemas to ensure that relationships between data are governed by tightly bound key constraints.
Worked extensively with Data Definition, Data Manipulation, Data Query, Transaction Control and Data Control Language.
Loaded data into Oracle tables using SQL*Loader.
Debugged procedures, functions and packages using DBMS_UTILITY.
Designed and created tables, views, indexes, stored procedures, cursors, triggers and transactions.
Configured data mappings between sources and destinations in the Oracle environment and tested performance- and accuracy-related queries under SQL Server.
Worked closely with the development manager and onsite team to deliver solutions
