
SABARIESH THAVAMANI
Lead Big Data Engineer / Developer
[email protected] | 732-387-5744
Jersey City, New Jersey, USA | Open to relocation | Visa: H1B


14+ years of experience in the analysis, design, development, testing and implementation of banking and Medicare applications using DataStage, Informatica, Ab Initio and shell scripting on Linux and Windows platforms. This includes 5 years of expertise in Big Data and the Hadoop ecosystem, as well as more than a year of experience in the Azure cloud environment.

Expertise Snapshot:

Over 8 years of experience in data warehousing, implementing data warehousing solutions for large corporate clients in industries such as banking, financial services and healthcare.
Over 5 years of strong experience working with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop and Spark, using Python (PySpark) and Scala (Spark with Scala).
More than a year of experience working in the Azure cloud environment, with extensive use of Azure Data Factory, Azure Databricks and Azure Data Lake Storage.
Hands-on experience in Spark-Scala programming with good knowledge of Spark architecture and its in-memory processing.
Experience consuming data from Kafka topics through Scala Spark.
Experience writing custom UDFs to extend Hive functionality (see the sketch after this list).
Designed and implemented a Hive-based data warehouse solution.
Worked on Hive Ranger policy and security matrix implementation.
Imported and exported data between RDBMS and Hadoop using the bulk data exchange tool Sqoop, and wrote declarative HQL in Hive for OLAP activities.
Extensively worked on functional programming with Scala.
Excellent understanding and knowledge of NoSQL databases such as HBase.
Orchestrated jobs with scheduling tools such as Control-M, Oozie and Tivoli.
Wrote Unix shell scripts to automate Spark jobs through Control-M.
Extensive experience in data analysis and good working knowledge of data warehousing concepts.
Expertise in data warehousing / ETL programming and in data warehouse project tasks such as data extraction, cleansing, aggregation, validation, transformation and loading using DataStage, Informatica and Ab Initio.
Worked on various databases such as Oracle, SQL Server and MySQL.
Used GitHub for version control and source code management.
Extensively worked on all phases of the waterfall model and hosted sprint and daily stand-up meetings in Agile; well versed in Jira, Bitbucket and Jenkins deployments.
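
A minimal sketch of the custom UDF work referenced above; the resume lists custom Hive UDFs, and this sketch shows the equivalent pattern registered through Spark SQL for use against Hive tables. The function, schema, table and column names are hypothetical examples, not project code.

    # PySpark sketch: register a Python function as a UDF callable from SQL on
    # Hive tables. All names below (mask_account, edw.customer_accounts) are
    # hypothetical examples.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = (SparkSession.builder
             .appName("hive-udf-sketch")
             .enableHiveSupport()
             .getOrCreate())

    def mask_account(acct):
        # Keep the last four characters of an account number, mask the rest.
        if acct is None:
            return None
        return "*" * max(len(acct) - 4, 0) + acct[-4:]

    # Register so the function can be used inside Spark SQL / HiveQL queries.
    spark.udf.register("mask_account", mask_account, StringType())

    spark.sql("""
        SELECT customer_id, mask_account(account_no) AS account_masked
        FROM edw.customer_accounts
    """).show()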

Technical Snapshot:

Azure Services: ADLS, ADF, Databricks
Hadoop/Big Data: Spark, Scala, Hive, Sqoop, Oozie, Kafka, ZooKeeper, Python, YARN, HDFS, MapReduce, Pig
ETL Tools: DataStage 8.1, Informatica 9.5, Ab Initio GDE 2.15
Programming Languages: Scala 2.11, Python 3.0, Shell Script (KSH), Core Java, XML, SQL
Operating Systems: Unix, Windows
Databases: MySQL, Oracle, HBase, SQL Server
Version Control Tools: Git
IDEs: Eclipse, IntelliJ
Scheduling Tools: Oozie, Control-M, Tivoli


Professional Experience

Johnson & Johnson, NJ Aug 2022 to Present

Data Engineer Technical Lead - EDL - Atom & Omega Azure Migration
Technology: Azure ADLS, ADF, Databricks, Spark, Shell Script, Control-M, Tableau

Description

This project migrates EDL Atom & Omega from on-premises to the Azure cloud, lifting and shifting the existing program onto Azure components such as ADF, ADLS and Databricks.

Responsibilities

Collaborated with the team to drive architecture and deliver customer solutions on the Microsoft Azure platform.
Designed and implemented end-to-end solutions (storage, integration, processing and visualization) in Azure.
Performed capacity planning and resource allocation to optimize cloud usage against business needs, based on the existing on-premises program's data volume and CPU hours.
Rebuilt the framework to pull data from various source systems using Azure Data Factory.
Orchestrated workflows using Azure Data Factory and Databricks workflows/jobs.
Migrated existing Hive queries to Spark SQL for execution in Azure Databricks (see the sketch after this list).
Created a framework to execute the existing Python code in Azure Databricks for generating the final Tableau reports.
Implemented metadata storage and access control management through Azure Databricks Unity Catalog.
Conducted performance testing and validated the application post-migration to confirm existing functionality.
Actively participated in daily scrum calls to discuss progress, blockers and priorities, ensuring alignment with goals.
Collaborated with product owners, scrum masters and team members to define sprint goals and prioritize backlog items.
Participated in all Agile ceremonies, including scrum calls, sprint planning, sprint reviews, sprint retrospectives and grooming calls.
Contributed to various testing phases, including unit testing, SIT and technical regression testing.
Conducted detailed KT sessions for the support team, ensuring they were well equipped to handle the systems, tools and processes.
Assisted in the creation and execution of the project cutover plan and delivered Hypercare support following implementation.
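
A minimal sketch of the Hive-to-Spark SQL migration pattern mentioned above: a former Hive query is run as Spark SQL in a Databricks notebook and the result is persisted to ADLS. The table, container and storage account names are illustrative assumptions, not the actual project objects.

    # Databricks sketch: run a former Hive query as Spark SQL and persist the
    # result to ADLS as Delta. All names and paths below are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # Databricks provides this session

    daily_sales = spark.sql("""
        SELECT region, sale_date, SUM(amount) AS total_amount
        FROM edl.sales_transactions
        GROUP BY region, sale_date
    """)

    (daily_sales.write
        .format("delta")
        .mode("overwrite")
        .save("abfss://curated@examplestorage.dfs.core.windows.net/atom/daily_sales"))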


Data Engineer Technical Lead - TranSCend - EDL Migration
Technology: Sqoop, Hive, Spark, Python, Shell Script, Control-M, Tableau

Description

The TranSCend program aims to harmonize 40+ global medical devices ERPs into a single instance of SAP S/4HANA, modernize the code and enable a digitized end-to-end business. EDL aims to accommodate all source-level changes from decommissioning the jet & jes ERP systems and to source data from the S/4HANA system.

Responsibilities

Worked with product owners, designers, QA and other engineers in an Agile development environment to deliver timely solutions to the customer as per requirements.
Took part in several grooming calls with the business team to gain an understanding of the project's functional requirements.
Worked with the business team to groom user stories and identify the EDL report logic changes corresponding to upstream data flow changes.
Converted functional requirements into technical user stories and loaded them into Jira.
Prepared Jira stories, participated in sprint planning, managed task assignment and oversaw both onsite and offshore teams.
Created the data model for bringing in data from the new data source to support various processes.
Worked with data formats such as JSON and Parquet and compression codecs such as Snappy (see the sketch after this list).
Optimized Hive queries for performance.
Applied Hive queries to perform data analysis on EDL data.
Updated the scheduling and orchestration as part of the new integration to meet SLAs.
Worked with upstream teams to obtain source data, and performed data loads and report refreshes for all types of testing.
Prepared test plans and test cases for unit testing, SIT and technical regression testing, and provided support for business regression testing and UAT.
Worked on bug fixing, code review and version control.
Supported all consumers and the business by analyzing and troubleshooting issues through monitoring during UAT and Hypercare.
Actively participated in all Agile ceremonies, including scrum calls, sprint planning, sprint reviews, sprint retrospectives and grooming calls.
Prepared the technical design document, operational run book and Hypercare document.
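
A small PySpark sketch of the format handling noted above: JSON source data read in and written out as Snappy-compressed Parquet. The input and output paths are hypothetical.

    # PySpark sketch: read JSON source files and write Snappy-compressed Parquet.
    # Input and output paths are hypothetical examples.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("format-handling-sketch").getOrCreate()

    orders = spark.read.json("/landing/s4hana/orders/")

    (orders.write
        .option("compression", "snappy")
        .mode("overwrite")
        .parquet("/curated/s4hana/orders/"))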

Standard Chartered Bank - GBS, Chennai, India Feb 2020 to Jul 2022
Hadoop Developer - Standard Chartered TFRM ETL, TFRM ECM & ETL Automation
Technology: PySpark, Scala Spark, Kafka, Hive, Shell Script, Control-M

Description
The TFRM ETL code requires various inputs such as T1 Hive tables, BVD external files and MFU business-pushed files. This project collects all of these inputs and creates tables/views in the T3 layer, which acts as a single place from which to source all data required by the TFRM ETL.

Responsibilities
Involved in the end-to-end design of the framework from scratch.
The project loads data into T3 from various sources: T1 tables, MFU Excel files and BVD external text files.
Created views in the T3 layer on top of the T1 tables that read filtered data as requested, based on recon entries.
Set up the project environment and implemented the security matrix by creating AD groups and user IDs specific to the project.
Read MFU and BVD files from their respective sources and performed pre-validations such as checksum, header and column validations.
Consumed business input data from Kafka topics and loaded it into the T3 layer (see the sketch after this list).
Worked with Python to develop analytical jobs using the PySpark API of Spark for the BVD load.
Used the Control-M job scheduler to execute the workflow.
Implemented static partitioning, dynamic partitioning and bucketing in Hive using internal and external tables.
Developed and implemented Spark custom UDFs for date transformations such as date formatting and age calculations, as per business requirements.
Wrote Spark programs in Scala and Python for data quality checks.
Loaded data into the target T3 layer and performed count validation against the source; once validated, made an entry in the recon table for downstream consumption.
Involved in the complete design, and took ownership of development, testing and production deployment.
Worked in an Agile environment, delivering the agreed user stories within the sprint.
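
A minimal sketch of the Kafka-to-T3 load described above: a topic is read as a batch, the JSON payload is parsed and the records are appended to a partitioned table in the T3 schema. The resume mentions Scala Spark for this consumption; PySpark is used here only to keep one language across the sketches, and the broker, topic, schema and table names are hypothetical.

    # PySpark sketch: consume a Kafka topic as a batch read, parse the JSON
    # payload and append it to a partitioned table in the T3 schema.
    # Broker, topic, schema and table names are hypothetical examples.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json, to_date
    from pyspark.sql.types import StructType, StructField, StringType

    spark = (SparkSession.builder
             .appName("kafka-to-t3-sketch")
             .enableHiveSupport()
             .getOrCreate())

    payload_schema = StructType([
        StructField("record_id", StringType()),
        StructField("payload", StringType()),
        StructField("event_ts", StringType()),
    ])

    raw = (spark.read
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "business.input.topic")
           .option("startingOffsets", "earliest")
           .option("endingOffsets", "latest")
           .load())

    parsed = (raw.selectExpr("CAST(value AS STRING) AS json_value")
                 .select(from_json(col("json_value"), payload_schema).alias("r"))
                 .select("r.*")
                 .withColumn("business_date", to_date(col("event_ts"))))

    (parsed.write
        .mode("append")
        .partitionBy("business_date")
        .saveAsTable("t3.business_input"))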


Cognizant Technology Solutions, Chennai, India Jan 2014 to Dec 2019
ETL Developer - Kaiser Permanente MDW
Technology: Informatica, Ab Initio, Shell Script, Tivoli, Oracle, Sqoop, Hive

Description

Kaiser Permanente is an integrated managed care consortium, based in Oakland, California, United States, founded in 1945. It is America's leading nonprofit integrated health plan that provides care throughout seven regions in the United States. Two or three (four, in the case of California) distinct but interdependent legal entities form the Kaiser system within each region.
The Membership Data Warehouse was initially developed using Ab Initio; this project converts those Ab Initio applications into Informatica jobs.
Responsibilities

Involved in effort estimation and tracking of the project plan for development activities.
Worked closely with the customer to address and resolve all issues.
Handled severity bridge calls and resolved production job issues on time.
Converted Ab Initio code to Informatica as per the requirements.
Tested the Informatica code against the requirements.
Converted 40+ applications from Ab Initio to Informatica.
Handled space and environment issues while both Informatica and Ab Initio jobs ran in parallel during the Informatica go-live period.
Wrote shell scripts to validate the outputs of the Informatica and Ab Initio jobs against each other (see the sketch after this list).
Prepared test cases for validating the job conversions.
Interacted with the business team to understand gaps or issues in the logic.
Prepared a code review checklist for the developed jobs.
Involved in migrating Oracle 10g to 11g.
Created a script to send periodic updates on Ab Initio job status by mail.
Created a script to report DB volume and Unix server space availability automatically by mail.
Supported more than 40 applications and ensured they completed on time.
Involved in decommissioning OH region data/jobs from the MDW data warehouse.
Analyzed and fixed production data and job run issues on time.
Created various reports for clients and business users.
Scheduled and controlled Ab Initio/Informatica jobs through the Tivoli scheduling system.
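
A minimal sketch of the parallel-run validation referenced above, comparing the Ab Initio and Informatica outputs of the same job on record count and an order-independent checksum. The resume states this was done with shell scripts; Python is used here only to keep one language across the sketches, and the file paths are hypothetical.

    # Python sketch: compare two job outputs on record count and an
    # order-independent checksum. File paths are hypothetical examples.
    import hashlib

    def file_summary(path):
        # Return (record count, XOR of per-line hashes) so row order does not matter.
        count, digest = 0, 0
        with open(path, "rb") as f:
            for line in f:
                count += 1
                digest ^= int(hashlib.md5(line.rstrip(b"\r\n")).hexdigest(), 16)
        return count, digest

    abinitio = file_summary("/data/mdw/abinitio/member_extract.dat")
    informatica = file_summary("/data/mdw/informatica/member_extract.dat")

    if abinitio == informatica:
        print("MATCH: record counts and checksums agree")
    else:
        print("MISMATCH: abinitio=%s informatica=%s" % (abinitio, informatica))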

Virtusa, Chennai, India Mar 2010 to Nov 2013
ETL Developer - Standard Chartered CEMSODS MEPA
Technology: DataStage, Shell Script, Control-M, Oracle

Description
CEMSODS development for 10 Middle Asian Pacific region (MEPA) countries.

Responsibilities

Understood the business challenges and requirements and proposed the solution.
Designed the ETL flow and created the ETL technical specification for coding the jobs.
Developed jobs using stages such as Transformer, Join, Lookup, Funnel, Sequential File, Pivot and DB2 Connector.
Involved in creating test cases based on functional requirements, business requirements, mapping specifications and entity relationship documents.
Performed data validation in ETL code; batch jobs were scheduled using Control-M.
Tuned transformations and jobs for performance enhancement.
Involved in unit testing and error handling, and prepared the technical document.
Played a key role in resolving defects during SIT and UAT.
Designed and developed DataStage parallel jobs for dimension and fact tables.
Used import/export utilities to transfer data from the development instance to the production environment.
Extensively worked on shell scripting and File Transfer Protocol.


ETL Developer - Standard Chartered CEMSODS Support
Technology: DataStage, Shell Script, Control-M, Oracle

Description

DataStage support for CEMS-ODS countries.

Responsibilities

Automated the monitoring process with shell scripts that send mails for all job statuses.
Developed shell scripts to send alert mails for environment issues such as database or space problems.
Monitored and updated performance statistics for production runs.
Involved in query optimization to minimize job execution time.
Reordered and restructured Control-M jobs to reduce job execution times.
Responsible for solving day-to-day issues faced by the client to their satisfaction.
Performed root-cause analysis, ran coordination meetings and participated in quarterly process audits.
Performed recon validation of reports against tables.
Prepared test cases for new-country rollout UAT and validated the functionality.
Handled all activities related to new-country rollouts in the production environment.
Prepared a code review checklist for the production deployment jobs.
Prepared production support documents that help newcomers understand the complete project.


ETL Developer - Standard Chartered CEMSODS
Technology: DataStage, Shell Script, Control-M, Oracle

Description
Standard Chartered Bank, listed on the London, Hong Kong and Mumbai stock exchanges, ranks among the top 20 companies in the FTSE-100 by market capitalization. Consumer banking offers a broad range of products and services to meet the borrowing, wealth management and transaction needs of individuals. Wholesale Banking has a client-focused strategy, providing trade finance, cash management, securities services, foreign exchange and risk management, capital raising and corporate finance solutions. Our SME Banking division offers products and services to help small and medium enterprises manage the demands of a growing business, including the support of our international network and trade expertise.
By integrating disparate data sources into a single enterprise-wide warehouse while maintaining consistently high data quality, the solution tightly integrates capabilities for data modeling, data quality and data transformation (DataStage) into a complete, end-to-end platform. Built upon shared services including enterprise metadata management, any-to-any connectivity and a parallel processing framework, it provides the performance and scalability to address the massive data volumes and real-time/right-time delivery of data required by today's information-intensive enterprises.

Responsibilities

Involved in requirements study, design and development.
Extensively used DataStage to develop jobs for extracting, transforming, integrating and loading data into target tables.
Developed parallel jobs using stages such as Funnel, Transformer, Sort, Lookup, Sequential File and Dataset.
Extensively created PL/SQL procedures, packages and dynamic SQL.
Involved in SQL query tuning.
Developed DataStage jobs for loading data from source to target.
Used dynamic job parameters for file names and passwords, validated at run time.
Created sequence jobs using Job Activity, Wait For File Activity, Exception Handling and Notification Activity.
Used DataStage Director and its run-time engine to schedule the solution, test and debug its components, and monitor the resulting executables on an ad hoc or scheduled basis.
Used DataStage Manager to import and export repositories across projects.
Developed jobs according to the design documents, prepared unit test plans, tested the jobs and coordinated with the onsite team during UAT.
Resolved UAT defects within the specified timelines.
Responsible for resolving technical and non-technical issues, interacting with the client.
Interacted with the business team to analyze gaps and implement the logic.
Created Control-M scripts to trigger jobs for production runs.
Prepared a code review checklist for the developed jobs.