Sai Lakshmi - Data Analyst
[email protected]
Location: Bordentown, New Jersey, USA
Relocation:
Visa: GC |
Name: Sai Lakshmi
Email: [email protected] | Phone: +1 (201) 609-8006

PROFESSIONAL SUMMARY:
- 10+ years of experience as a Data Analyst, with functional and industry experience covering process and project responsibilities such as data analysis, design, development, user acceptance, and performance management.
- Strong professional experience with emphasis on analysis, design, development, testing, maintenance, and implementation of data mapping, data validation, and requirements gathering in data warehousing environments.
- Experience in data warehousing applications using ETL tools and programming languages including Python, R, Java, Scala, MATLAB, and SQL/PL-SQL, with Oracle and SQL Server databases and SSIS.
- Experience handling huge data sets and archiving data using cloud platforms such as Amazon Web Services (AWS), MS Azure, Amazon Redshift, and Hadoop.
- Performed data analysis and data profiling using complex SQL on various source systems including Oracle and Teradata (the same profiling idea is illustrated in the Python sketch after this summary).
- Experience providing custom solutions such as eligibility criteria, match, and basic contribution calculations for major clients using Informatica, and reports using Power BI, Tableau, Looker, and QlikView.
- Extensively used Python libraries: PySpark, Pytest, Pymongo, cx_Oracle, PyExcel, Boto3, Psycopg, embedPy, NumPy, and Beautiful Soup.
- Experience in data analysis, data profiling, data migration, data integration, and validation of data across all integration points.
- Familiarity with ETL (Extract, Transform, Load) processes; tools like Apache NiFi, Apache Airflow, and dbt (Data Build Tool) are crucial for working with data pipelines.
- Built predictive models using Google AutoML, H2O.ai, and TPOT, achieving increased revenue and improved accuracy.
- Familiarity with GDPR and CCPA regulations; maintaining data privacy is increasingly important.
- Extensive experience with ETL and reporting tools such as SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS).
- Experienced in Big Data technologies including Apache Hadoop and Apache Spark, with expertise in data extraction and exploratory analysis.
- Implemented Scikit-learn, PyTorch, Keras, and TensorFlow for machine learning, developing predictive models to forecast waste generation patterns and optimize resource allocation.
- Designed and developed weekly and monthly reports using MS Excel techniques (charts, graphs, pivot tables), spreadsheet tools such as Quip, Zoho Sheet, and WPS Spreadsheets, and PowerPoint presentations.
- Experience with data analysis and SQL querying, database integrations, and data warehousing for large financial organizations.
- Strong working experience in data cleaning, data warehousing, and data massaging using Python libraries and MySQL.
- Experience creating ad-hoc reports and data-driven subscription reports using SQL.
- Expertise in Power BI, Power BI Pro, and Power BI Mobile; expert in creating and developing Power BI dashboards.
- Experienced in RDBMSs such as Oracle, MySQL, and IBM DB2.
- Hands-on experience in complex query writing and query optimization in relational databases including Oracle, T-SQL, Teradata, and SQL Server, as well as Python.
- Experienced in business requirements collection using Agile, Scrum, and Waterfall methods, and in software development life cycle (SDLC) testing methodologies, disciplines, tasks, resources, and scheduling.
- Extensive knowledge of data profiling using the Informatica Developer 9.x/8.6.0/8.1.1/7.x/6.x tool.
- Understanding of version control systems like Git for collaborating on data-related projects, especially in team settings.
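Illustrative sketch: a minimal example of the pandas-based data profiling referenced above. The inline sample frame and its column names (member_id, plan, premium) are hypothetical stand-ins for a real source extract, not original client code.

```python
# Minimal data-profiling sketch in pandas; the inline frame stands in
# for a real source extract such as a CSV or database pull.
import pandas as pd

df = pd.DataFrame({
    "member_id": [101, 102, 102, None, 105],   # hypothetical columns
    "plan": ["A", "B", "B", "A", None],
    "premium": [120.5, 99.0, 99.0, 130.0, 110.0],
})

# Basic profile: dtype, null counts/rates, and distinct counts per column.
profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "nulls": df.isna().sum(),
    "null_pct": (df.isna().mean() * 100).round(2),
    "distinct": df.nunique(),
})
print(profile.sort_values("null_pct", ascending=False))

# Exact-duplicate rows are a common first data-quality flag.
print("duplicate rows:", df.duplicated().sum())
```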
TECHNICAL SKILLS:
Languages: SAS, SQL, Python, R, Java, Scala, MATLAB
BI Tools: Tableau, Microsoft Power BI, PowerPivot
Data Warehousing Tools: Talend, SSIS, SSAS, SSRS, Toad Data Modeler
Python Libraries: Scikit-learn, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Plotly
Data Visualization: Tableau, Microsoft Power BI
ETL: Informatica PowerCenter, SSIS
Machine Learning Frameworks: scikit-learn, TensorFlow
Microsoft Tools: Microsoft Office, MS Project
Database Tools: SQL Server, MySQL, MS Excel, PostgreSQL, SQLite, MongoDB
Data Analysis: Web Scraping, Statistical Modelling, Hypothesis Testing, Predictive Modelling
Data Mining Algorithms: Decision Trees, Clustering, Random Forest, Regression

CERTIFICATIONS:
- Google Data Analytics Professional Certificate (Coursera)
- Microsoft Certified: Power BI Data Analyst Associate

PROFESSIONAL EXPERIENCE:

Client: Southwest Airlines, Dallas, TX                                April 2023 - Present
Role: Data Analyst
Responsibilities:
- Implemented and followed Agile development methodology within the cross-functional team and acted as a liaison between the business user group and the technical team.
- Analyzed and validated large datasets to support ad-hoc analysis, reporting, and remediation using SAS.
- Processed data from tools such as Snowflake, writing complex queries in SQL or SAS using complex joins, sub-queries, table creation, and aggregation, and applying DDL, DQL, and DML concepts (a sketch of this query style follows this section).
- Performed ETL (Extract, Transform, Load) using tools such as Informatica and Azure to integrate and transform disparate data sources.
- Performed data scraping, data cleaning, data analysis, and data interpretation, generating meaningful reports using Python libraries like pandas and matplotlib.
- Leveraged cloud platforms such as AWS, Azure, and Google Cloud to support scalable data storage and analytics solutions.
- Generated weekly reports with visualizations using tools such as MS Excel (pivot tables and macros) and Tableau to enable business decision making.
- Performed data analysis and statistical analysis and generated reports, listings, and graphs using SAS tools: SAS/Base, SAS/Macros, SAS/Graph, SAS/SQL, SAS/Connect, and SAS/Access.
- Generated various dashboards and created calculated fields in Tableau for data intelligence and analysis based on business requirements.
- Played a pivotal role in the selection and utilization of ML frameworks including PyTorch, TensorFlow, and Scikit-learn, aligning technology choices with project requirements.
- Applied advanced analytical techniques to solve business problems, typically medium to large scale, with impact on current and/or future business strategy.
- Applied innovative, scientific/quantitative analytical approaches to draw conclusions and make "insight to action" recommendations that answer the business objective and drive the appropriate change.
- Translated recommendations into communication materials to effectively present to colleagues for peer review and to mid- to upper-level management.
- Familiar with Spark SQL and PySpark for large-scale data analysis.
- Incorporated visualization techniques to support the relevant points of the analysis and ease understanding for less technical audiences.
- Used Power BI Desktop to develop data analysis across multiple data sources and visualize reports.
- Identified and gathered the relevant, quality data sources required to fully address the problem for the recommended strategy through testing or exploratory data analysis (EDA).
- Transformed disparate data sources and determined the appropriate data hygiene techniques to apply.
- Understood and adopted emerging technology that can affect the application of scientific methodologies and/or quantitative analytical approaches to problem resolution.
- Delivered analyses and findings in a manner that conveys understanding, influences mid- to upper-level management, garners support for recommendations, drives business decisions, and influences business strategy.
Environment: SAS, Dremio, Snowflake, Tableau, Pandas, NumPy, Seaborn, SciPy, Matplotlib, Power BI, T-SQL, MS SQL Server, MS Excel, UML, MS Visio, ETL - SSIS, SSRS, Data Modelling - Star Schema, Snowflake Schema.
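Illustrative sketch: the Snowflake/SAS environment above cannot be reproduced here, so this example demonstrates the same complex-query pattern (CTE, join, aggregation, subquery filter) against an in-memory SQLite database. The airline-flavored tables, columns, and values are synthetic assumptions.

```python
# Sketch of a CTE + join + aggregation + subquery-filter query, the style
# of SQL described above, run on in-memory SQLite with synthetic data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flights  (flight_id INTEGER, route_id TEXT);
    CREATE TABLE bookings (flight_id INTEGER, fare REAL);
    INSERT INTO flights  VALUES (1, 'DAL-HOU'), (2, 'DAL-AUS'), (3, 'DAL-HOU');
    INSERT INTO bookings VALUES (1, 120.0), (1, 95.0), (2, 150.0), (3, 110.0);
""")

rows = conn.execute("""
    WITH route_revenue AS (
        SELECT f.route_id, SUM(b.fare) AS revenue, COUNT(*) AS bookings
        FROM flights f
        JOIN bookings b ON b.flight_id = f.flight_id
        GROUP BY f.route_id
    )
    SELECT route_id, revenue, bookings
    FROM route_revenue
    WHERE revenue > (SELECT AVG(revenue) FROM route_revenue)  -- above-average routes
    ORDER BY revenue DESC
""").fetchall()

for route, revenue, n in rows:
    print(f"{route}: revenue={revenue:.2f} over {n} bookings")
```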
Client: JME Insurance, Dallas, TX                                Nov 2021 - Mar 2023
Role: Data Analyst
Responsibilities:
- Collaborated with stakeholders and cross-functional teams to elicit, elaborate, and capture functional and non-functional requirements.
- Translated business requirements into technical requirements and performed data modeling per those requirements.
- Performed analytical modeling, database design, data analysis, regression analysis, data integrity checks, and business analytics.
- Created data mappings between the source and target systems, and created documentation mapping source and target table columns and datatypes.
- Applied machine learning algorithms for anomaly detection and predictive analytics, leveraging frameworks like PyOD and XGBoost.
- Conducted geographic analysis using geospatial frameworks like GeoPandas and Folium.
- Built text analytics, generated data visualizations using R and Python, and created dashboards using tools like Tableau and Power BI.
- Strong experience migrating other databases to Snowflake.
- Performed DML and DDL work such as writing subqueries, window functions, and CTEs; wrote SQL queries using insert, update, and delete statements and exported data as CSV, XML, TXT, etc.; also wrote SQL queries with inner, outer, left, right, and self-joins in SQL Server.
- Summarized data from samples using statistics such as the mean and standard deviation, and performed linear regression.
- Compared different WFHM DB environments and determined, resolved, and documented discrepancies.
- Involved in ETL development, creating the required mappings for the data flow using SSIS.
- Generated various graphical capacity planning reports using Python packages like NumPy, Matplotlib, and SciPy.
- Designed ETL flows implemented in Informatica PowerCenter per the mapping sheets provided.
- Optimized data sources for the route distribution analytics dashboard to improve Power BI report runtime.
- Developed UML use-case and class diagrams using MS Visio.
- Worked on predictive analytics use cases using Python.
- Conducted A/B testing and statistical analysis using frameworks like SciPy stats and statsmodels to evaluate the effectiveness of marketing campaigns and product features (a sketch follows this section).
- Developed serverless data processing pipelines using AWS Lambda functions to automate data workflows, integrating with API Gateway for event-driven processing.
- Managed departmental reporting systems, troubleshooting daily issues and integrating existing Access databases with numerous external data sources (SQL, Excel, and Access).
- Utilized Power BI and custom SQL features to create dashboards and identify correlations.
- Prepared scripts in R and Shell for automation of administration tasks.
- Developed and implemented data governance frameworks and policies to ensure data quality and compliance with regulatory requirements such as GDPR and CCPA.
- Wrote several Teradata SQL queries using SQL Assistant for ad-hoc data pull requests.
- Extracted source data from Oracle tables, MS SQL Server, sequential files, and Excel sheets.
- Created data quality scripts using SQL to validate successful data loads and the quality of the data.
- Created various types of data visualizations using R and Power BI.
- Performed data analysis and data profiling using complex SQL on various source systems.
- Categorized and generated reports on multiple parameters using MS Excel and Power BI.
- Worked on logical and physical modeling of various data marts as well as DW/BI architecture using Teradata.
Environment: Matplotlib, Power BI, T-SQL, MS SQL Server, MS Excel, UML, MS Visio, ETL - SSIS, SSRS, Data Modelling - Star Schema, Snowflake Schema, Shell Script, Jira, Git, Teradata, Workflows.
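Illustrative sketch: a minimal version of the A/B testing described above, using SciPy's two-sample t-test. The metric arrays are synthetic stand-ins, not campaign data.

```python
# Hedged A/B-test sketch: Welch's two-sample t-test on synthetic metrics.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=0.112, scale=0.03, size=500)   # baseline group metric
variant = rng.normal(loc=0.121, scale=0.03, size=500)   # treated group metric

# equal_var=False gives Welch's t-test, robust to unequal variances.
t_stat, p_value = stats.ttest_ind(control, variant, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Statistically significant difference between groups.")
```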
Client: Zensar Technologies, India                                Oct 2018 - Dec 2020
Role: Data Analyst
Responsibilities:
- Involved in requirements gathering, analysis, design, development, and testing through production of the application using the SDLC Agile/Scrum model.
- Worked on the entire data analysis project life cycle and was actively involved in all phases, including data cleaning, data extraction, and data visualization with large sets of structured and unstructured data; created ER diagrams and schemas.
- Advanced knowledge of Python, especially libraries like NumPy, pandas, scikit-learn, TensorFlow, PyTorch, and PyCaret.
- Manipulated benefits-related data for lab work and clinic services by writing SQL queries with inner, outer, left, right, and self-joins in SQL Server, and exported the data as CSV, TXT, XML, etc.
- Summarized data from samples using statistics such as the mean and standard deviation, and performed linear regression.
- Worked on data warehousing principles: fact tables, dimension tables, and dimensional data modelling with Star and Snowflake schemas.
- Expertise in data manipulation, statistical analysis, and visualization using tools like dplyr, ggplot2, and Shiny.
- Created ad-hoc reports for users in Tableau by connecting various data sources; used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports.
- Involved in defining source-to-target data mappings, business rules, and business and data definitions.
- Worked closely with stakeholders and subject matter experts to elicit and gather business data requirements.
- Used Pandas, NumPy, Seaborn, SciPy, and Matplotlib in Python to develop various machine-learning algorithms, utilizing algorithms such as linear regression and multivariate regression for data analysis (see the regression sketch after this section).
- Worked with business analyst groups to ascertain their database reporting needs.
- Created a database using MongoDB and wrote several queries to extract data from it.
- Wrote scripts in Python to extract data from HTML files.
- Connected PostgreSQL databases from Python.
- Tracked velocity, capacity, burn-down charts, and other metrics during iterations; created data flow diagrams.
- Using R, automated a process to extract data and various document types from a website, save the documents to a specified file path, and upload the documents into an Excel template.
- Performed data analysis and data profiling using SQL on various source systems including Oracle and Teradata.
- Utilized the SSIS ETL toolset to analyze legacy data for data profiling.
- Utilized Power BI reporting to create, test, and validate various visualizations, ad-hoc reports, dashboards, and KPIs.
- Designed and published visually rich, intuitively interactive Power BI/Excel workbooks and dashboards for executive decision making.
- Orchestrated big data processing workflows using EMR clusters, leveraging frameworks like Apache Spark and Hadoop for distributed data processing.
- Streamlined the CRM database and built SQL queries for data analysis of 1 million+ records.
- Generated new Market and Investment Banking reports using SSRS, increasing efficiency by 50%.
- Introduced Power BI and designed dashboards for time-based data, improving performance by 40%.
- Built ETL workflows for automated reporting of Investment Banking data, reducing the workload by 40% using SSIS.
Environment: Tableau, SQL Server, NumPy, Seaborn, SciPy, Matplotlib, Python, SDLC (gathering, analysis, design, development, testing), Agile/Scrum, Data Warehouse, MongoDB, PostgreSQL, Oracle, Teradata, Informatica (Informatica Data Explorer, Informatica Data Quality), ETL, Data Modelling - Star Schema, Snowflake Schema, KPI.
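Illustrative sketch: a minimal linear-regression example with scikit-learn, matching the modeling work described above. The features, coefficients, and data are synthetic assumptions, not project data.

```python
# Hedged linear-regression sketch on synthetic data with a held-out split.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                    # three hypothetical features
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

print("coefficients:", model.coef_.round(3))
print("R^2 on held-out data:", round(r2_score(y_test, model.predict(X_test)), 3))
```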
Company: HSBC, India                                Nov 2016 - Sep 2018
Role: Data Analyst
Responsibilities:
- Generated energy consumption reports using SSRS showing trends over day, month, and year.
- Performed ad-hoc analysis and data extraction to resolve 20% of critical business issues.
- Well versed in the Agile Scrum development methodology, used in day-to-day work on Building Automation Systems (BAS) application development.
- Produced weekly, monthly, and quarterly insight reports on pricing trends and opportunities utilizing Excel, Tableau, and SQL databases.
- Streamlined and automated Excel/Tableau dashboards for improved speed through Python- and SQL-based solutions.
- Familiarity with cloud-native analytics tools like AWS QuickSight, Azure Synapse Analytics, and Google Data Studio.
- Designed creative dashboards and storylines for a fashion store dataset using Tableau features.
- Developed SSIS packages for extraction, load, and transformation of source data into a DW/BI architecture and OLAP cubes per the functional/technical design, conforming to the data mapping/transformation rules.
- Developed data cleaning strategies in Excel (multilayer fuzzy match) and SQL (automated typo detection and correction) to organize alternative datasets daily and produce consistent, high-quality reports (see the fuzzy-match sketch after this section).
- Created views to facilitate easy user interface implementation, and triggers on them to facilitate consistent data entry into the database.
- Involved in data analysis and data validation by extracting records from multiple databases using SQL in the Oracle SQL Developer tool.
- Understanding of SQL-based querying engines for big data platforms such as Apache Hive, Impala, and Presto.
- Identified data sources and defined them to build the data source views.
- Involved in designing ETL specification documents such as the source-to-target mapping document.
- Used ETL (SSIS) to develop jobs for extracting, cleaning, transforming, and loading data into the data warehouse.
- Created stored procedures and executed them manually before calling them in the SSIS package creation process.
- Wrote SQL test scripts to validate data for different test cases and test scenarios.
- Created SSIS packages to export and import data from CSV files, text files, and Excel spreadsheets.
- Performed data manipulation: inserting, updating, and deleting data from data sets.
- Developed various stored procedures for data retrieval from the database and generated different types of reports using SQL Server Reporting Services (SSRS).
Environment: Windows, SDLC Agile/Scrum, SQL Server, SSIS, SSAS, SSRS, ETL, PL/SQL, Tableau, Excel, CSV files, text files, OLAP, Data Warehouse, SQL - join, inner join, outer join, and self-joins.
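Illustrative sketch: the typo detection above was implemented in Excel and SQL; this analogous Python example uses the standard library's difflib to show the same fuzzy-match idea. The canonical city list, sample values, and cutoff are hypothetical.

```python
# Fuzzy-match typo correction with difflib (standard library); an
# analogous sketch of the Excel/SQL cleaning strategy described above.
import difflib

CANONICAL = ["Chicago", "Dallas", "Houston", "New York", "Philadelphia"]

def correct_city(raw: str, cutoff: float = 0.8) -> str:
    """Map a possibly misspelled city to its canonical form, else keep as-is."""
    match = difflib.get_close_matches(raw.strip().title(), CANONICAL, n=1, cutoff=cutoff)
    return match[0] if match else raw

for value in ["dalas", "Houstonn", "NY"]:
    # "NY" falls below the similarity cutoff, so it is left unchanged.
    print(value, "->", correct_city(value))
```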
Client: Sagarsoft Pvt Limited, India                                Feb 2013 - Oct 2016
Role: Data Analyst
Responsibilities:
- Evaluated new applications and identified system requirements.
- Visualized KPI metrics such as resource utilization, net profit margin, gross profit margin, and burn rate using Tableau.
- Worked on time series analysis using Pandas to identify patterns in how asset variables change, which in turn helped project completion by 70%.
- Conducted data extraction, transformation, and loading (ETL) using tools like Apache NiFi and Talend to ingest healthcare data from disparate sources.
- Designed and implemented data models using tools like Erwin and SQL Server Management Studio to ensure efficient storage and retrieval of banking data.
- Recommended solutions to increase revenue, reduce expenses, and maximize operational efficiency, quality, and compliance.
- Identified business requirements and analytical needs from potential data sources.
- Performed SQL validation to verify data extract integrity and record counts in the database tables.
- Worked with ETL developers on testing and data mapping, and used data models to translate and migrate data.
- Created Requirements Traceability Matrices (RTMs) using Rational RequisitePro to ensure complete requirements coverage against the low-level design document and test cases.
- Assisted the Project Manager in developing both high-level and detailed application architecture to meet user requests and business needs; also assisted with project expectations, evaluated the impact of changes on project plans, conducted project-related presentations, and performed risk assessment, management, and mitigation.
- Collaborated with different teams to analyze, investigate, and diagnose the root cause of problems and publish root cause analysis (RCA) reports.
- Used advanced SQL queries and analytic functions for date calculations, cumulative distribution, and NTILE calculations (see the window-function sketch below).
- Used advanced Excel formulas and functions such as pivot tables, VLOOKUP, IF with AND, and INDEX/MATCH for data cleaning.
Environment: SQL, ETL, data mapping, Tableau, NTILE, RCA, RTMs, pivot tables, KPI metrics.
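Illustrative sketch: the NTILE and cumulative-distribution analytic functions mentioned above, run against an in-memory SQLite database (window functions require SQLite 3.25+, bundled with recent Python builds). The projects table and margin values are synthetic.

```python
# NTILE / CUME_DIST window-function sketch on in-memory SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (name TEXT, margin REAL)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?)",
    [("A", 0.12), ("B", 0.31), ("C", 0.08), ("D", 0.22), ("E", 0.18), ("F", 0.27)],
)

rows = conn.execute("""
    SELECT name,
           margin,
           NTILE(4)    OVER (ORDER BY margin) AS quartile,   -- bucket into 4 tiles
           CUME_DIST() OVER (ORDER BY margin) AS cume_dist   -- cumulative distribution
    FROM projects
    ORDER BY margin
""").fetchall()

for name, margin, quartile, cd in rows:
    print(f"{name}: margin={margin:.2f} quartile={quartile} cume_dist={cd:.2f}")
```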