ayu sh - data analyst
[email protected]
Location: Edison, New Jersey, USA
SUMMARY
5 years of experience in Data Analytics, Business Intelligence, and Machine Learning across banking, insurance, and consulting domains.
Extensive hands-on expertise with SQL, Python, Tableau, Power BI, PySpark, and Snowflake for building analytics pipelines, dashboards, and ML models.
Proficient in developing automated fraud risk scoring and negative news screening systems using Logistic Regression, Decision Trees, Selenium, and NLP techniques.
Designed and deployed interactive dashboards using Tableau and Power BI for claims tracking, hiring trends, and operational metrics, reducing manual reporting by up to 75%.
Built scalable PySpark pipelines in Databricks to process 200K+ transactions daily, improving data refresh latency by 99% and supporting near real-time decision-making.
Integrated structured data from on-prem SQL, MySQL, Oracle, and Azure Data Lake into Snowflake to streamline reporting and AML case investigation workflows.
Applied machine learning algorithms (XGBoost, Random Forest, ARIMA, Prophet) for forecasting, anomaly detection, and customer segmentation, achieving 85-90% model accuracy.
Strong experience working in Agile environments using JIRA and Confluence, collaborating with cross-functional teams on data ingestion, modeling, and reporting.
Developed severity scoring models for claim reserving using regression and NLP, contributing to $4M+ in capital optimization and improving high-loss claim identification by 79%.
EXPERIENCE
Data Consultant II | Mitsubishi UFJ Financial Group | Jan 2022 - May 2024
Developed an automated investigation toolkit, with the modules below, to support case management reviews of high-risk financial clients.
Enhanced fraud detection by building an Entity Resolution system using Python and Snowflake to link duplicate and related records across disparate sources. Applied fuzzy matching and ML-driven clustering techniques, reducing data inconsistencies by ~15% and enabling faster, more reliable investigations.
Optimized fraud detection processes and investigator throughput by implementing a risk scoring framework to prioritize high-risk customer investigations. Developed predictive models using Logistic Regression and Decision Trees, and designed risk indicators from transaction patterns, locations, and customer behavior. Enabled early identification of critical cases, reducing triage time and increasing case closures from 1 to 4 per investigator per day.
Improved adverse media screening by automating manual reviews using Selenium and PySpark. Enabled real-time retrieval from public sources and automated negative news report generation, increasing daily case throughput from 4-5 to 20-25 and reducing manual effort by over 75%.
Enabled near real-time analytics by building a centralized Azure-based data lake and a high-throughput ETL pipeline using PySpark in Databricks, Azure Data Factory, and Azure Synapse, with curated outputs delivered to Snowflake.
Analyzed data across 5+ on-prem systems using SQL and Python to define 100+ key reporting fields aligned with business KPIs, reducing manual data mapping by 60%.
Engineered a PySpark-based ingestion pipeline to process over 200,000 daily transactions, reducing data latency by 99% (from 24 hours to hourly refreshes) and improving dashboard performance for claims and transaction analytics.
Data Consultant I | State Farm Insurance | March 2020 - Dec 2021
Developed an end-to-end Tableau dashboard suite enabling real-time visibility into Workers' Compensation and Auto claims. Delivered interactive views for claim search, reserve trends, transaction history, and performance tracking. Reduced manual reporting of claim metrics, reserve changes, and adjuster performance by 70%, streamlining operational insights across business teams.
Enhanced reserve accuracy and capital efficiency by building ML models to predict medical and indemnity loss amounts for Workers' Compensation claims. Developed a severity scoring framework using regression algorithms and NLP on FNOL descriptions, improving high-loss claim detection by 79% and reducing under-reserving errors by 18%, contributing to $4M+ in annual capital optimization.
Refined AML investigations and SAR effectiveness by developing machine learning models to replace static rule-based logic with dynamic risk scoring. Designed transaction-level indicators capturing jurisdictional risk, velocity anomalies, and transfer patterns, and applied ensemble algorithms like XGBoost and Random Forest, increasing true positive SAR identification by 28% and reducing false positive alerts by 35%.
Associate Data Scientist | Antim Technologies - India | March 2019 - Feb 2020
Improved product relevance and user engagement for a mid-scale Indian retail client with ~2M monthly active users by developing a sentiment-aware recommendation system in Python, NLTK, and scikit-learn. Combined NLP-based sentiment analysis of user reviews with a collaborative filtering engine; sentiment-derived features increased precision@5 by 12%, leading to more personalized recommendations and higher customer satisfaction.
Developed classification models in Python to predict online shopper conversions and identify key purchase drivers. Engineered features to improve performance using Logistic Regression and Decision Trees. Achieved 76% accuracy and enabled personalized on-site recommendations, contributing to a 12% increase in conversions among high-intent users.
SKILLS
Data Science: Inferential Statistics, Machine Learning, Microsoft Azure Databricks, Big Data stack, A/B Testing
Analytics: Python, C#, JavaScript, SQL Server, R, SQL, Tableau, Power BI, Redshift, PySpark, Excel
EDUCATION
PACE UNIVERSITY | MS in Data Science
GUJARAT TECHNOLOGICAL UNIVERSITY | B.E. in Computer Science