Veda Anand - Sr Big Data Engineer Cloudera, HWX, MapR, AWS, Azure and GCP Consultant
[email protected]
Location: Whitmore Lake, Michigan, USA
Relocation: No
Visa: H1B
Uma Kanth Machapur
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/uma-goud-5301ab191/

Professional Summary:
Over 13 years of experience in the design, development, and implementation of software applications and BI/DWH solutions. Experienced in data discovery and advanced analytics, and in building business solutions, with knowledge of strategic approaches to deploying Big Data solutions in both cloud and on-premise environments to efficiently solve Big Data processing requirements.
Built advanced analytics applications on different ecosystems: Cloudera, HWX, GCP, Snowflake, and AWS.
Strong understanding of distributed systems, RDBMS, large-scale and small-scale non-relational data stores, MapReduce systems, database performance, data modelling, and multi-terabyte data warehouses.
Extensively used Hadoop open-source tools like Hive, HBase, Sqoop, and Spark for ETL on Hadoop clusters.
Detail-oriented Data Analyst with a strong analytical background and a proven track record
of transforming data into actionable insights.
Proficient in ETL (Extract, Transform, Load) processes, Master Data Management (MDM),
Data Security, and Data Governance.
Proficient in analysing and interpreting complex data sets to identify trends, patterns, and
actionable insights.
Skilled in using statistical and data visualization tools to communicate findings effectively.
Worked with several data integration and replication tools, such as Attunity Replicate.
Strong knowledge of system development life cycles and project management for BI implementations.
Extensively used RDBMS like Oracle and SQL Server for developing different applications.
Built several data lakes on top of S3 and HDFS to help clients perform advanced analysis on big data.
Worked with data science teams to provide and feed data for AI, ML, and deep learning projects.
Real-time experience with the Hadoop Distributed File System, the Hadoop framework, and parallel-processing implementations (AWS EMR, Cloudera), with hands-on experience in HDFS, MapReduce, Pig/Hive, HBase, YARN, Sqoop, Spark, PySpark, RDBMS, Linux/Unix shell scripting, and Linux internals.
Experience writing UDFs and MapReduce programs in Java for Hive and Pig.
Created Kafka data pipelines with producer and consumer applications for log stream data.
Experience in data visualization tools like Tableau and Looker.
Experience in creating scripts and macros using Microsoft Visual Studio to automate tasks.
Strong expertise in Master Data Management, ensuring data accuracy, consistency, and
reliability across the organization.
Capable of designing and implementing MDM solutions to maintain a single source of truth
for critical business data.

Knowledgeable in data security best practices, ensuring the confidentiality, integrity, and
availability of sensitive information.
Proficient in implementing data security measures, including encryption, access controls, and
data classification.
Well-versed in establishing and maintaining data governance frameworks to manage data
assets effectively.
Skilled in defining data policies, standards, and processes to ensure data quality, compliance,
and accountability.
Other Experiences:
Have experience working with web design tools like Adobe Dreamweaver CC, WordPress, and Joomla.
Proficient in Manual, Functional and Automation testing.
Also experienced in Smoke, Integration, Regression, Functional, Front End and Back End
Testing.
Capable in developing/writing Test Plans, Test Cases, and Test Scripts based on User
Requirements, and SAD documentation.
Highly experienced in writing test cases and executing in HP Interactive Testing Tools:
Quality Centre, Quick Test Professional (QTP).
Technical Skills:
Reporting Tools: Tableau and Looker
Big Data Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, Zookeeper, HBase, CAWA, Spark, Spark SQL, Impala, MapR-DB, Azure, Oracle Big Data Discovery, Kafka, NiFi
Hadoop Ecosystems: MapR, Cloudera, AWS EMR, Hortonworks
Cloud Platforms: AWS, GCP, Azure
Servers: Application Servers (WAS, Tomcat), Web Servers (IIS 6/7, IHS)
Operating Systems: Windows 2003 Enterprise Server, XP, 2000, UNIX, Red Hat Enterprise Linux Server release 6.7
Databases: SQL Server 2005, SQL Server 2008, Oracle 9i/10g, DB2, MS Access 2003, Teradata, PostgreSQL
Languages: Python, Bash, SQL, XML, JSP/Servlets, Struts, Spring, HTML, PHP, JavaScript, jQuery, Web services, Scala
Data Modelling: Star schema and Snowflake schema
ETL Tools: Informatica, IBM DataStage 8.1, SSIS

Education:
Title of the Degree | College/University | Year of Passing
Master of Information Technology & Management Studies | University of Ballarat, Vic, Australia | 2013
Bachelor of Information Technology | University of Ballarat, Vic, Australia | 2011
Board of Intermediate Education | Narayana Jr. College, Telangana, India | 2008
Board of Secondary Education | St. Ann's Grammar High School, Malkajgiri, Hyd, India | 2006

Work Experience:
iTech-Go, Clarkston, MI Apr 2022 - Till Date
Client: Palo Alto Networks
Sr Data Engineer
Responsibilities:
Designed and implemented robust data architectures on Google Cloud Platform (GCP),
incorporating industry best practices and ensuring scalability and performance.
Conducted data modeling and schema design for efficient storage and retrieval, optimizing
BigQuery performance for complex queries.
Leveraged BigQuery to process and analyze large datasets efficiently, optimizing query
performance and reducing costs.
Implemented partitioning and clustering strategies to enhance BigQuery query efficiency and
reduce data processing time.
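As an illustrative sketch of the partitioning and clustering strategy above, the snippet below composes a BigQuery DDL statement for a date-partitioned, clustered table. The dataset, table, and column names are invented for illustration; a real deployment would submit this through the BigQuery client.

```python
# Hypothetical sketch: compose a BigQuery CREATE TABLE statement with
# date partitioning and clustering. All names are placeholders.

def build_partitioned_table_ddl(table, partition_col, cluster_cols, columns):
    """Return a CREATE TABLE statement partitioned by date and clustered."""
    col_defs = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE `{table}` (\n  {col_defs}\n)\n"
        f"PARTITION BY DATE({partition_col})\n"
        f"CLUSTER BY {', '.join(cluster_cols)}"
    )

ddl = build_partitioned_table_ddl(
    table="hr_analytics.requisitions",
    partition_col="created_at",
    cluster_cols=["department", "recruiter_id"],
    columns=[("req_id", "STRING"), ("department", "STRING"),
             ("recruiter_id", "STRING"), ("created_at", "TIMESTAMP")],
)
print(ddl)
```

Partitioning on the date column lets BigQuery prune whole partitions from a query, and clustering orders data within each partition so filters on the clustered columns scan less data.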
Led data ingestion projects in BigQuery and Databricks, incorporating API, Gsheets, file,
RDBMS, and SFTP sources.
Developed and maintained yearly and quarterly reports on BigQuery, contributing to the
creation of intricate executive dashboards.
Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
Contributed to various agile projects, including Smart Recruiters pipeline, HR Dashboards,
Tableau Automation through ServiceNow, IT360, SR Dashboards, and Internal Mobility,
leveraging Databricks for enhanced data processing.
Acquired extensive knowledge in the HR domain, enhancing recruiting capabilities.
Loaded data from SAP into the Hadoop environment using Sqoop.
Responsible for Hadoop cluster monitoring using tools like Nagios, Ganglia, and Ambari.
Monitor Red Hat Linux server health (master, edge, and worker nodes) in Zabbix and troubleshoot/resolve reported issues.
Plan, schedule, and apply patches on Linux servers to mitigate security vulnerabilities.
Install, configure, upgrade, and support Red Hat Enterprise Linux servers and packages in a VMware vSphere/ESX environment.
Utilized GCP technologies, including BigQuery, Kubernetes Engine, GCP buckets, and
Cloud Functions, alongside Databricks for advanced data processing and analytics.
Built and orchestrated hundreds of pipelines on Airflow and Databricks, ensuring near real-
time availability of data and dashboard reporting.
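At its core, the orchestration described above means running tasks in dependency order. A minimal stdlib sketch of that idea follows; the task names are invented, and a real pipeline would be an Airflow DAG rather than plain callables.

```python
# Simplified stand-in for DAG-style orchestration: execute callables in an
# order that respects a dependency graph. Task names are hypothetical.
from graphlib import TopologicalSorter

def run_pipeline(deps, tasks):
    """Run each task after all of its dependencies have completed."""
    results = []
    for name in TopologicalSorter(deps).static_order():
        results.append(tasks[name]())
    return results

# "load" depends on "transform", which depends on "extract".
deps = {"load": {"transform"}, "transform": {"extract"}, "extract": set()}
tasks = {
    "extract": lambda: "extracted",
    "transform": lambda: "transformed",
    "load": lambda: "loaded",
}
print(run_pipeline(deps, tasks))  # runs extract, then transform, then load
```

Airflow adds scheduling, retries, and backfills on top, but the topological ordering of tasks is the same principle.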
Designed and exposed APIs in Java Spring Boot and Flask for ServiceNow Tableau access
requests, facilitating data processing into the Tableau system.

Proficient in Python, SQL, Spark, and Databricks for data engineering tasks.
Implemented debugging and monitoring solutions using Airflow, Datadog, Grafana, Kibana, and Google Cloud Monitoring, with notifications via Datadog to Slack and email.
Played a pivotal role in designing and planning new solutions for data pipelines, ensuring
seamless communication with Business, Data Analysts, Business Intelligence, Technical
Directors, and Product teams.
Provided production job support, addressing issues, and enhancing features in an agile
environment.
Demonstrated expertise in building and enhancing data pipelines, aligning with business
requirements.
Engineered scalable systems that effectively meet project requirements, guaranteeing
efficient data processing and handling.
Designed and implemented multiple ETL solutions with more than 50 data sources using extensive SQL scripting, ETL tools, Python, shell scripting, and scheduling tools, including Databricks.
Wrote scripts in BQ SQL and Spark for creating complex tables with high-performance
metrics like partitioning, clustering, and skewing.
Worked with Google Data Catalogue, Databricks, and other Google Cloud APIs for
monitoring, query, and HR-related analysis for BigQuery and Databricks usage.
Created BigQuery authorized views for row-level security or exposing the data to other
teams.

Cross Sense Analytics, Farmington Hills, MI Aug 2021 - Mar 2022
Client: State of Ohio (Bureau of Workers' Compensation)
Sr Data Engineer
Responsibilities:
Responsible for creating Technical Design documents, Source to Target mapping documents
and Test Case documents to reflect the ELT process.
Extracted data from various source systems like Oracle, SQL Server, and flat files as per the requirements.
Installed and configured Hadoop MapReduce; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
Writing scripts for data cleansing, data validation, data transformation for the data coming
from different source systems.
Worked on Hadoop cluster and data querying tools to store and retrieve data from the stored
databases.
Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
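The core of such a health check is comparing the daemons actually running against the set expected on a node. A small illustrative sketch (the daemon list and the canned `jps`-style sample are examples only, since no cluster is available here):

```python
# Illustrative sketch: flag expected Hadoop daemons that are absent from
# `jps`-style status output. Daemon names and sample output are invented.

EXPECTED = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}

def find_missing_daemons(jps_output):
    """Return the expected daemons not present in the status output."""
    running = {line.split(None, 1)[1]
               for line in jps_output.splitlines() if " " in line}
    return sorted(EXPECTED - running)

sample = "12345 NameNode\n12346 DataNode\n12350 NodeManager"
print(find_missing_daemons(sample))  # ['ResourceManager']
```

In practice the script would run on a schedule, feed the real `jps` output in, and alert or restart services when the returned list is non-empty.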
Environment: Hadoop, Hive, Java
Worked on data processing and testing using Spark SQL, and on real-time processing with Spark Streaming and Kafka using Python.
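Stream processing of this kind folds each Kafka micro-batch into a running aggregate. As a hedged, dependency-free illustration (the event names are invented; real code would use Spark Streaming's Kafka integration):

```python
# Pure-Python analogue of a streaming event count: fold micro-batches of
# (key, value) events into running per-key counts. Event names are invented.
from collections import Counter

def process_micro_batches(batches):
    """Update running counts with each micro-batch, as a stream job would."""
    running = Counter()
    for batch in batches:
        running.update(key for key, _ in batch)
    return dict(running)

batches = [
    [("claim_opened", 1), ("claim_closed", 1)],
    [("claim_opened", 1), ("claim_opened", 1)],
]
print(process_micro_batches(batches))  # {'claim_opened': 3, 'claim_closed': 1}
```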
Perform assorted Unix/Linux administration tasks, including daily audits on Linux, Solaris,
VMware ESXi hosts, SAN and NAS devices, and resolve reported errors.
Assisted in preparations for controlled changes introduced each weekend into the Linux, Solaris, and storage environments.

Scripted using Python and PowerShell for setting up baselines, branching, merging, and
automation processes across the process using GIT.
Worked with different file formats like Parquet files and also Impala using PySpark for
accessing the data and performed Spark Streaming with RDDs and Data Frames.
Worked on Data Integration for extracting, transforming, and loading processes for the
designed packages.
Designed and deployed automated ETL workflows using AWS Lambda, organized and cleansed the data in S3 buckets using AWS Glue, and processed the data using Amazon Redshift.
Used Informatica admin tools to manage logs, user permissions, and domain reports.
Generate and upload node diagnostics. Monitor Data Integration Service jobs and applications. Domain objects include application services, nodes, grids, folders, database connections, operating system profiles, etc.

GreenByte Technologies, Hyderabad, India Mar 2017 - Jun 2021
Sr Big Data Developer
Responsibilities:
Helped the client understand performance issues on the cluster by analysing Cloudera stats.
Designed and implemented Optum Data Extracts and HCG Grouper Extracts on AWS.
Improved memory and time performances for several existing pipelines.
Developed data ingestion modules (both real-time and batch data load) to load data into various layers in S3, Redshift, and Snowflake using AWS Kinesis, AWS Glue, AWS Lambda, and AWS Step Functions.
Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
Used Bash Shell Scripting, Sqoop, AVRO, Hive, Impala, HDP, Pig, Python, Map/Reduce
daily to develop ETL, batch processing, and data storage functionality.
Built pipelines using Spark, Spark SQL, Hive, and HBase, built pipelines using Airflow on AWS, and explored the power of distributed computing on AWS EMR.
Loaded processed data into different consumption points like Apache Solr, HBase, and AtScale cubes for visualization and search.
Automated the workflow using Talend Big Data.
Scheduled jobs using Autosys.
Experienced in managing and reviewing Hadoop log files.
Involved in moving all log files generated from various sources to HDFS for further
processing through Flume.
Involved in loading and transforming large sets of structured, semi structured and
unstructured data from relational databases into HDFS using Sqoop imports.
Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date.
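Date-based incremental Sqoop loads like those above are typically parameterized scripts that rebuild the `sqoop import` invocation with the last-loaded value. A hypothetical sketch (the JDBC connection string, table, and paths are placeholders):

```python
# Hypothetical sketch: assemble a sqoop import command for an incremental,
# date-based load. Connection string, table, and target paths are invented.

def sqoop_incremental_cmd(table, check_column, last_value):
    """Build the argv for a sqoop incremental-append import."""
    return [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//db-host:1521/ORCL",
        "--table", table,
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", last_value,
        "--target-dir", f"/data/raw/{table.lower()}",
    ]

cmd = sqoop_incremental_cmd("TRANSACTIONS", "TXN_DATE", "2021-01-01")
print(" ".join(cmd))
```

On each run the wrapper records the new high-water mark of the check column and passes it back in as `--last-value`, so only rows added since the previous run are imported.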
Environment: AWS services, AWS S3, AWS Glue, Lambda, Oracle SQL, Cloudera, Spark, Python, SQL, Talend workload automation, Jenkins, Git, PostgreSQL

The Australian health system, Melbourne, Australia Mar 2015 - Jan 2017
Sr Big Data Developer/ Digital Transformation (Cloudera)
Responsibilities:
Designed and implemented data integration solutions to extract, transform, and load (ETL) Epic
Electronic Health Record (EHR) data into data warehouses, enabling comprehensive reporting
and analytics.
Developed data models to ensure accurate representation of clinical and operational data from Epic
Systems, facilitating a better understanding of patient care and hospital performance.
Established robust data quality checks and validation procedures to ensure the integrity and
accuracy of clinical and operational data transferred from Epic Systems to Foundry.
Ensured the secure handling of sensitive patient data by implementing data encryption, access
controls, and adherence to healthcare compliance standards, such as HIPAA.
Tuned ETL workflows to improve data processing efficiency and performance, reducing data
latency and ensuring timely access to healthcare data for analytics.
Established and maintained data warehousing infrastructure specifically tailored to Epic EHR data,
optimizing data storage and retrieval for reporting and analysis.
Built and maintained data extraction processes from Epic Systems, including Clarity, Caboodle,
Chronicles, and other Epic modules, ensuring data accuracy and consistency.
Automated ETL workflows using tools such as Informatica, Talend, or custom scripts,
streamlining data processing from Epic sources to the data warehouse.
Implemented data quality checks and validation processes to ensure the accuracy and integrity of
clinical and operational data from Epic Systems.
Designed and developed Epic-specific reports and dashboards for clinical and administrative teams
using BI tools like Tableau, Power BI, or Cognos.
Tuned ETL processes and data warehouse structures to enhance query performance, reducing
report generation time and improving user experience.
Implemented data governance policies, including data lineage, data dictionary, and data access
controls, to maintain data consistency and ensure compliance with healthcare regulations.
Leveraged advanced analytics and machine learning techniques to extract insights from Epic data,
aiding in clinical decision support, patient outcomes analysis, and operational improvement.
Ensured the security and privacy of patient data by implementing robust data encryption, access
controls, and compliance with HIPAA regulations.
Worked closely with healthcare professionals and clinicians to understand their reporting and
analytics needs, translating them into actionable data solutions.
Successfully managed data migration and transformation during Epic EHR system upgrades,
ensuring continuity of data access and reporting capabilities.
Created comprehensive documentation and conducted training sessions for end-users and IT staff
on the use of Epic data and BI tools.
Provided technical support and troubleshooting for Epic-related data issues and assisted in problem
resolution, ensuring minimal disruptions to clinical operations.
Implemented data transformation processes to standardize and cleanse Epic data, making it ready
for analysis and reporting within the Foundry environment.
Successfully managed data migration and transformation during upgrades to Epic EHR and
Foundry data platform, maintaining data accessibility and reporting capabilities.

Telstra, Melbourne, Australia Dec 2012 - Jan 2015
Sr Big Data Advanced Analytics Consultant
Responsibilities:
Worked collaboratively with MapR vendor and client to manage and build out of large data
clusters.
Helped design big data clusters and administered them.
Worked both independently and as an integral part of the development team.
Communicated all issues and participated in weekly strategy meetings.
Administered back-end services and databases in the virtual environment.
Ran several benchmark tests on Hadoop SQL engines (Hive, Spark SQL, Impala) and on different data formats (Avro, Sequence, Parquet) using different compression codecs like Gzip, Snappy, etc.
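A codec benchmark of this kind boils down to compressing the same payload under each codec and comparing sizes. A minimal stdlib analogue (Snappy is not in the standard library, so gzip, bz2, and lzma stand in; the sample payload is invented):

```python
# Stdlib sketch of a compression-codec comparison: compress one payload
# under several codecs and compare sizes. The CSV payload is invented.
import bz2
import gzip
import lzma

payload = b"2014-07-01,ORD-1,widget,9.99\n" * 2000

sizes = {
    "raw": len(payload),
    "gzip": len(gzip.compress(payload)),
    "bz2": len(bz2.compress(payload)),
    "lzma": len(lzma.compress(payload)),
}
for codec, size in sizes.items():
    print(f"{codec}: {size} bytes")
```

Real benchmarks would also time compression/decompression and query latency per engine, since the smallest output is not always the fastest to read.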
Worked on sentiment analysis and structured content programs for creating text analytics
app.
Created and Implemented applications on Oracle Big Data Discovery for Data visualization,
Dashboard and Reports.
Collected data from different databases (i.e. Oracle, MySQL) into Hadoop. Used CA Workload Automation for workflow scheduling and monitoring.
Worked on designing and developing ETL workflows using Java for processing data in MapR-FS/HBase using Oozie.
Experienced in managing and reviewing Hadoop log files. Involved in moving all log files
generated from various sources to HDFS for further processing through Flume.
Involved in loading and transforming large sets of structured, semi structured and
unstructured data from relational databases into HDFS using Sqoop imports.
Developed Sqoop scripts to import/export data from relational sources (Teradata) and handled incremental loading of customer and transaction data by date.
Developed simple and complex MapReduce programs in Java for Data Analysis on different
data formats.
Optimized MapReduce Jobs to use HDFS efficiently by using various compression
mechanisms.
Worked on partitioning Hive tables and running the scripts in parallel to reduce script run-time. Worked on data serialization formats for converting complex objects into sequence bits using Avro, Parquet, JSON, and CSV formats.
Responsible for analysing and cleansing raw data by performing Hive queries and running Pig scripts on the data. Created Hive tables, loaded data, and wrote Hive queries that run within the map phase.
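The serialization work above means writing the same records out in several formats. A small stdlib illustration using the formats Python ships with (Avro and Parquet need external libraries; the records are invented):

```python
# Stdlib sketch: serialize the same records as JSON lines and as CSV.
# The record shape is invented for illustration.
import csv
import io
import json

records = [
    {"id": 1, "event": "login", "ok": True},
    {"id": 2, "event": "purchase", "ok": False},
]

# JSON lines: one self-describing object per line.
json_lines = "\n".join(json.dumps(r) for r in records)

# CSV: positional columns with a single header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "event", "ok"])
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()

print(json_lines)
print(csv_text)
```

The trade-off the bullet alludes to: row formats like Avro/JSON suit write-heavy ingestion, while columnar formats like Parquet suit analytical scans over a few columns.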
Environment: MapR ecosystem, ODI, Oracle Endeca, Oracle Big Data Discovery, CA Workload Automation
Origin, Melbourne, Australia Jan 2011 - Oct 2011
Java Developer (Contract)

Responsibilities:
Designed and developed Web Services using Java/J2EE in WebLogic environment.
Developed web pages using Java Servlet, JSP, CSS, Java Script, DHTML, HTML5, and
HTML. Added extensive Struts validation.
Involved in the analysis, design, development, and testing of business requirements.
Developed business logic in JAVA/J2EE technology.
Implemented business logic and generated WSDL for those web services using SOAP.
Worked on Developing JSP pages
Implemented Struts Framework
Developed Business Logic using Java/J2EE
Modified stored procedures in the MySQL database.
Developed the application using Spring Web MVC framework.
Worked with Spring Configuration files to add new content to the website.
Worked on the Spring DAO module and ORM using Hibernate. Used Hibernate Template
and HibernateDaoSupport for Spring-Hibernate Communication.
Configured association mappings such as one-to-one and one-to-many in Hibernate.
Worked with JavaScript calls, as the search is triggered through JS calls when a search key is entered in the search window.
Worked on analyzing other Search engines to make use of best practices.
Collaborated with the Business team to fix defects.
Worked on XML, XSL and XHTML files.
Interacted with project management to understand, learn and to perform analysis of the
Search Techniques.
Used Ivy for dependency management.