| Veda Anand - Data Engineer |
| [email protected] |
| Location: Bentonville, Arkansas, USA |
| Relocation: Yes |
| Visa: H4EAD |
|
QUALIFICATIONS

Data Engineer skilled in technical leadership, communication, and presentations, with 8 years of professional experience in project development, implementation, deployment, and maintenance using Java and Big Data technologies. Equipped with a solid understanding of data processing, ETL (Extract, Transform, Load) methodologies, and Big Data technologies. Adept at ensuring data accuracy, reliability, and performance.

SUMMARY

- Over 8 years of IT experience in the design, development, deployment, support, and implementation of enterprise applications.
- Experienced in implementing Big Data technologies: the Hadoop ecosystem (HDFS, MapReduce framework), Spark with Python (PySpark), Sqoop, Oozie, Cassandra, ZooKeeper, and the Hive data warehousing tool.
- Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Dataflow, bq command-line utilities, Dataproc, and Airflow.
- Expertise in developing MapReduce jobs to scrub, sort, filter, join, and query data.
- Experience building data pipelines using Spark on Google Cloud Platform.
- Skilled in developing data models for efficient data processing.
- Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, and DataFrames.
- Hands-on experience in the Analysis, Design, Coding, and Testing phases of the Software Development Life Cycle (SDLC).
- Experience creating tables and partitioning, bucketing, loading, and aggregating data using Hive.
- Migrated code from Hive to Apache Spark using Spark SQL.
- Managed DEV, QA, UAT, and PROD environments for various releases and designed instance strategies.
- Good knowledge of CI/CD.
- Hands-on experience with build automation tools such as Maven and version control tools such as Git and GitHub.
- Great team player and team builder; highly motivated fast learner, able to pick up new technology quickly and manage workload seamlessly to meet deadlines.
- Excellent global exposure to various work cultures and client interaction with diverse teams.

WORK EXPERIENCE

Fidelity | Sr. Data Engineer | 01/2022 to Present

- Experience building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
- Designed data models (star schema) for various datasets and loaded the data into BigQuery.
- Developed data pipelines that pull data from different sources (Teradata, SQL, and FTP sites) and load it to the cloud (Azure Blob Storage and GCS).
- Experience building data pipelines in Python/PySpark/BigQuery and building Python DAGs in Apache Airflow.
- Hands-on experience using Google Cloud Platform services: BigQuery, Cloud Dataproc, and Apache Airflow.
- Experience processing bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python.
- Built a configurable Python- and Spark-based framework to connect to common data sources such as MySQL, Oracle, SQL Server, and BigQuery and load the data into BigQuery.
- Used Cloud Functions with Python to load data into BigQuery for CSV files on arrival in a GCS bucket.
- Created a program to download a SQL dump from the equipment maintenance site and load it into a GCS bucket; loaded that SQL dump from the GCS bucket into MySQL (hosted in Google Cloud SQL) and loaded the data from MySQL into BigQuery using Python, Spark, and Dataproc.
- Built data pipelines in Airflow on GCP for ETL jobs using different Airflow operators.
- Created firewall rules to access Google Dataproc from other machines.
- Wrote Python programs for Spark transformations in Dataproc.
- Monitored BigQuery, Dataproc, and Cloud Dataflow jobs across all environments.
- Submitted Spark jobs using gsutil and spark-submit for execution on the Dataproc cluster.
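The configurable source-to-BigQuery framework described above could be sketched as follows. This is a minimal, hypothetical illustration, not the production code: the template table, helper names, and example hosts are assumptions, and in the real framework the resulting URL would feed Spark's JDBC reader rather than be printed.

```python
# Minimal sketch of a config-driven ingestion framework (hypothetical
# names). Each source is described by a dict; a JDBC URL is built from a
# per-type template. In Spark, the URL would be passed to
# spark.read.format("jdbc").option("url", ...).

JDBC_TEMPLATES = {
    "mysql": "jdbc:mysql://{host}:{port}/{db}",
    "oracle": "jdbc:oracle:thin:@{host}:{port}:{db}",
    "sqlserver": "jdbc:sqlserver://{host}:{port};databaseName={db}",
}

def jdbc_url(source: dict) -> str:
    """Build a JDBC connection URL from a source-config dict."""
    try:
        template = JDBC_TEMPLATES[source["type"]]
    except KeyError:
        raise ValueError(f"unsupported source type: {source.get('type')}")
    # str.format ignores unused keys, so extra config fields are harmless.
    return template.format(**source)

# Example config for one source (illustrative values).
mysql_cfg = {"type": "mysql", "host": "db01", "port": 3306, "db": "sales"}
print(jdbc_url(mysql_cfg))  # jdbc:mysql://db01:3306/sales
```

Keeping the per-source details in plain config dicts is what makes such a framework "configurable": adding a new source means adding a dict entry, not new code.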
- Created a Python program to maintain raw-file archival in a GCS bucket.
- Analyzed various types of raw files (JSON, CSV, XML) with Python using pandas and similar libraries.
- Developed MapReduce and Spark jobs to discover trends in data usage by users.
- Implemented Spark using Python and Spark SQL for faster data processing.
- Implemented algorithms for real-time analysis in Spark.
- Imported data into Spark DataFrames and performed transformations and actions on them.

Capgemini | Sr. Analyst/Software Engineer | 12/2018 to 12/2021 | Hyderabad, Telangana, India

- Developed Spark applications using the DataFrame and Spark SQL APIs for faster data processing.
- Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities according to requirements.
- Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for high data volumes.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames.
- Implemented system-wide monitoring and alerts.
- Installed and configured Hive, Oracle Big Data, Apache Spark, Sqoop, Spark SQL, etc.
- Imported and exported data into Hive using Sqoop.
- Used Bash shell scripting, Sqoop, Hive, HDP, Java, and MapReduce daily to develop ETL, batch processing, and data storage functionality.
- Responsible for developing a data pipeline using Flume and Sqoop to extract data from weblogs and store it in HDFS.
- Worked on loading all tables from the reference source database schema through Sqoop.
- Designed, coded, and configured server-side J2EE components such as JSP and Java, along with AWS.
- Collected data from different databases (Oracle, MySQL) into Hadoop.
- Used CA Workload Automation for workflow scheduling and monitoring.
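The custom input adapters and raw-file analysis mentioned above could look like this minimal standard-library sketch. The adapter names and the list-of-dicts record shape are illustrative assumptions; the real adapters fed Spark and Hive rather than plain Python lists.

```python
# Minimal sketch of pluggable input adapters (hypothetical names) that
# normalize raw JSON and CSV payloads into a common list-of-dicts shape
# before downstream transformation.
import csv
import io
import json

def json_adapter(payload: str) -> list:
    """Parse a JSON array payload into a list of records."""
    return json.loads(payload)

def csv_adapter(payload: str) -> list:
    """Parse a CSV payload (with a header row) into a list of records."""
    return list(csv.DictReader(io.StringIO(payload)))

# Registry: dispatch on declared format; a new format only needs a new entry.
ADAPTERS = {"json": json_adapter, "csv": csv_adapter}

def ingest(fmt: str, payload: str) -> list:
    """Route a raw payload to the adapter registered for its format."""
    return ADAPTERS[fmt](payload)

records = ingest("csv", "id,amount\n1,10\n2,20\n")
print(records)  # [{'id': '1', 'amount': '10'}, {'id': '2', 'amount': '20'}]
```

Normalizing every source into one record shape at the edge is what lets the rest of the pipeline (Spark/Hive transformations in the real system) stay format-agnostic.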
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive scripts to de-normalize and aggregate data.
- Scheduled and executed workflows in Oozie to run various jobs.

Computer Science Corporation of India | Software Engineer (Intern) | 01/2016 to 11/2018 | Hyderabad, Telangana, India

- Involved in requirements analysis and the design of an object-oriented domain model.
- Designed use case diagrams, class diagrams, sequence diagrams, and object diagrams.
- Involved in designing user screens using HTML per user requirements.
- Used Spring-Hibernate integration in the back end to fetch data from Oracle and MySQL databases.
- Used Spring dependency injection to provide loose coupling between layers.
- Implemented the web service client for login authentication, credit reports, and applicant information.
- Used web services (SOAP) for transmission of large blocks of XML data over HTTP.
- Wrote JUnit test cases for unit testing of classes.
- Developed the application to be implemented on Windows XP.
- Created the application using the Eclipse IDE.
- Installed WebLogic Server for handling HTTP requests/responses.
- Used Subversion for version control and created automated build scripts.
- Assists in post-implementation and continuous-improvement efforts to enhance performance and improve future outcomes.
- Works with ITIL concepts (incident management, change management, and problem management): joins bridge lines, provides timely updates while troubleshooting production issues, and handles vendor engagement.
- Works with the sustainment team to organize, plan, manage, and execute new releases.
- Documents and creates new knowledge-base articles to provide the most effective solutions to application issues; solutions can be new code development or defect fixes.
- Provides documentation and makes technical presentations to management.

TECHNICAL SKILLS

Programming Languages: Java, Python, C, SQL | Big Data Technologies: Hadoop, Spark | Hadoop Ecosystem: HDFS, MapReduce, Oozie, Hive, Sqoop, Flume, ZooKeeper, HBase | Web Technologies: HTML | Databases: MySQL | IDEs: Eclipse, PyCharm | Cloud: GCP, Databricks | CI/CD: Maven, Jenkins | Version Control: Git, GitHub | Agile Methodologies: Scrum | Test Management Tools: JIRA

EDUCATION

Bachelor's Degree, JNTUH, Hyderabad, India, 2017.