Ruchitha Thota - Data Analyst
[email protected]
Location: Cary, North Carolina, USA
Relocation: Open to relocate
Visa: OPT EAD
RUCHITHA THOTA
Data Analyst | Data Scientist
414-241-7475 github.com/Ruchitha52
[email protected] linkedin.com/in/ruchithathota

Professional Summary
Data Analyst with 4+ years of experience transforming complex data into actionable insights across healthcare, finance, and technology. Skilled in SQL, Python, Power BI, and Azure for developing predictive models, automating analytics workflows, and optimizing data pipelines. Experienced in building ETL processes, data models, and ML/AI systems integrating LLMs, RAG, and Generative AI for analytics automation. Proven record improving forecasting accuracy and reporting efficiency by 20-55% through process automation and AI-driven analytics. Strong communicator adept at translating technical findings for executives and mentoring teams in Agile, cloud-based environments.
Technical Skills
Programming & Analytics: Python (Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch), SQL (PostgreSQL, SQL Server, MySQL), R, Excel (Advanced Formulas, Power Query, VBA), DAX.
Machine Learning & AI: Predictive Modeling, Forecasting, Gradient Boosting, Neural Networks, NLP, Explainable AI (SHAP, LIME), A/B Testing, Model Evaluation.
Generative AI & LLMs: Retrieval-Augmented Generation (RAG), LangChain, Azure OpenAI, Prompt Engineering, LLMOps, Generative AI System Design.
Visualization & BI: Power BI (DAX, Star Schema, RLS), Tableau, Google Data Studio, Data Storytelling, KPI Dashboards, Business Intelligence Reporting.
Cloud & Data Engineering: Azure (Data Factory, Synapse, Blob Storage), AWS (S3, Lambda, RDS), GCP (BigQuery), Snowflake, Databricks, Data Warehousing, Data Lake Design.
DevOps & Automation: Git, Docker, CI/CD (Azure DevOps), Airflow, ETL Pipeline Development, API Integration, Data Quality Assurance.
Frameworks & Methods: Agile, CRISP-DM, MLOps, Lean Six Sigma, Data Governance, Feature Store Management, Cross-Functional Collaboration.
Soft Skills: Communication, Stakeholder Management, Data Storytelling, Analytical Thinking, Mentorship, Problem Solving.
Professional Experience
Title: Data Analyst Jul 2025 to Present
Client: Cardinal Health - Navista, Data & Advanced Analytics Team, NJ, United States
Responsibilities:
Designed automated Power BI dashboards for revenue cycle and performance KPIs, reducing manual reporting time by 55% and achieving 90% executive adoption within 3 months.
Built advanced SQL data models integrating financial, clinical, and operational data using Azure Data Factory, Snowflake, and BigQuery, supporting 15+ oncology practices.

Applied predictive analytics (regression, clustering, anomaly detection) to forecast patient volume and revenue metrics, improving forecast accuracy by 21% YoY.
Engineered production-ready RAG pipeline using Azure OpenAI and LangChain to automate clinical data extraction from unstructured oncology notes, reducing manual chart review by 40%.
Implemented data quality framework with Great Expectations and Azure Data Factory, achieving 99.5% data accuracy across 50+ clinical and financial data points.
Led bi-weekly sprint reviews with 15+ oncology clinic stakeholders, translating business requirements into 25+ user stories with 95% on-time delivery.
Designed medallion architecture (Bronze/Silver/Gold) in Snowflake, integrating structured EMR data and semi-structured notes, improving query performance by 60%.
Automated root-cause analysis workflows with Python and Azure pipelines, cutting anomaly investigation time from 4 hours to 30 minutes per incident.
Built RESTful APIs using FastAPI to expose predictive model scores, enabling real-time risk scoring for 10,000+ monthly patient encounters with <100ms latency.
Facilitated requirements discovery between finance, clinical operations, and IT teams to align on 20+ KPIs, reducing reporting conflicts by 65%.
Tools: Power BI, SQL, Python, Azure Data Factory, Snowflake, BigQuery, LangChain, Azure OpenAI, FastAPI, Great Expectations, Agile, MLflow

Title: Data & AI Intern Jan 2025 to Jun 2025
Client: Brillient, Detroit, MI, United States
Responsibilities:
Supported data analytics modernization projects for sales and finance operations.
Optimized SQL data models and queries, improving runtime from 9.2s to 6s and enabling same-day decisions across analytics teams through schema normalization and query refactoring.
Designed and deployed Power BI dashboards increasing executive adoption from 40% to 90%, reducing manual reporting by 40%, and improving time-to-insight for pricing and campaign analysis.
Built and validated machine learning models using Python and Scikit-learn, achieving 28% F1-score improvement in churn prediction and reducing false negatives by 22%.
Supported data integration and ETL pipelines from Salesforce, HubSpot, and internal SQL systems, streamlining cross-departmental reporting and improving accuracy by 35%.
Enhanced KPI alignment by collaborating with Sales, Marketing, and Finance teams, reducing reporting turnaround by 30% through automation and visualization enhancements.
Applied feature scaling, hyperparameter tuning, and cross-validation to optimize classification models for customer segmentation and retention strategies.
Developed scripts for data quality checks and validation, increasing accuracy and completeness of analytical datasets.

Tools: Python, SQL, Power BI, Scikit-learn, Pandas, DAX, AWS, ETL, Agile
Title: Data Analyst Intern Oct 2024 to Dec 2024
Client: Food FIXR, Milwaukee, WI, United States
Responsibilities:

Delivered analytics dashboards and automation for sales and operations teams.
Designed ETL pipelines reducing data errors from 14% to 4% through Python automation and SQL validation.
Built interactive Power BI dashboards tracking sales, forecasts, and inventory KPIs, improving executive decision speed and forecast reliability by 18%.
Conducted customer segmentation and seasonality analysis using clustering and time-series forecasting, increasing order fulfilment accuracy by 15%.
Enhanced SQL-based workflows for weekly reporting, reducing failed report executions by 65% and improving data refresh stability.
Implemented data models for revenue and cost forecasting, leveraging statistical methods for trend detection and scenario planning.
Collaborated with marketing and product teams to operationalize insights into pricing and promotional strategies.
Tools: SQL, Python, Power BI, Tableau, Forecasting, AWS, Google Analytics
Title: Graduate Student Assistant May 2024 to Sep 2024
Concordia University Wisconsin, Mequon, WI, United States
Responsibilities:
Supported AI research in privacy-preserving data analytics and federated learning.
Developed a federated learning system across two institutions, improving model accuracy from 79% to 87% while preserving student data privacy.
Implemented differential privacy and anonymization techniques, retaining 96% model performance while ensuring FERPA compliance.
Standardized and cleaned datasets using Python preprocessing and SQL normalization, improving reporting accuracy by 22% and reducing prep time by 75%.
Created student-risk dashboards and analytics workflows to identify at-risk students, enabling proactive intervention strategies.
Collaborated with IT and academic teams to automate data ingestion and monthly retraining pipelines.
Tools: Python, SQL, PySyft, TensorFlow, Power BI, Data Privacy, Federated Learning
Title: Data Visualization Analyst May 2021 to Feb 2024
Wipro Limited, Hyderabad, India
Responsibilities:
Developed enterprise dashboards and analytics for global IT operations.
Automated SQL-driven SLA and operational reporting across multiple business units, reducing report failure rates from 11% to 2%.
Conducted data mining and segmentation on 180K+ workload records, reducing migration errors by 20% through clustering and pattern detection.
Improved ServiceNow dashboard performance by optimizing queries and visualization logic, enhancing response times and SLA adherence.
Collaborated with technical leads to enhance backend APIs and UI components, increasing user satisfaction and operational visibility.
Streamlined incident management workflows, reducing average resolution time by 32% through improved triage and escalation automation.
Provided cross-functional support between IT, engineering, and business teams to ensure KPI alignment and reporting consistency.

Tools: SQL, Power BI, Tableau, ServiceNow, JavaScript, SLA Reporting, Agile
Projects
AI-Powered Talent Sourcing & Ranking System | Python, Hugging Face Transformers, Unsloth, LoRA, Pandas
Built an ML pipeline to automate candidate screening for "Human Resources Manager" roles, processing 50+ profiles and reducing manual review time by 80%.
Fine-tuned three LLMs (Gemma 3 4B, Mistral-7B, Qwen3-4B) using 4-bit quantization + LoRA via Unsloth, achieving GPU-efficient inference on T4 hardware.
Designed prompt engineering strategies instructing LLMs to act as expert recruiters, generating normalized 1-10 fitness scores based on semantic profile matching.
Implemented search-term pre-filtering and batch scoring, extracting numerical scores (0-10) with deterministic generation for consistent, unbiased candidate ranking.
Ranked senior HR professionals (Directors, Managers) at 8-9, while aspirational candidates scored 3-7, demonstrating deep role understanding beyond keyword matching.
Tools: Python, Hugging Face Transformers, Unsloth, LoRA (QLoRA), 4-bit Quantization, Gemma 3 (4B), Mistral-7B, Qwen3-4B, T4 GPU, Pandas, Prompt Engineering, Batch Processing, Semantic Matching

Term Deposit Subscription Prediction
Developed a two-phase machine learning system for a European bank, designing a pre-call classifier to filter out 9,426 unlikely subscribers (saving an estimated 785+ call center hours) and a post-call classifier to identify 1,221 high-potential converters from initial "No" responses.
Engineered and evaluated multiple boosting models (XGBoost, LightGBM, AdaBoost) to optimize for business-specific metrics (recall for resource savings, AUC for conversion potential), surpassing the project's 81% accuracy benchmark.
Performed PCA-based dimensionality reduction and KMeans clustering (optimal K=7, Silhouette Score=0.59) to segment customers into 7 distinct personas (e.g., "High Value," "Resource Drain"), generating actionable marketing strategies for each segment.
Conducted extensive exploratory data analysis (EDA) to identify key drivers of subscription, such as call duration, seasonality (March/October), and customer financial history (housing/loan status).
Implemented a complete, end-to-end classification pipeline using Scikit-Learn, XGBoost, and LightGBM, including data preprocessing, outlier treatment, feature engineering, SMOTE for class balancing, and threshold tuning.
Delivered business recommendations based on feature importance analysis (identifying age, balance, total_contact_time as top predictors) and cluster profiling, enabling targeted marketing campaigns to maximize ROI.
Tools: Python, Scikit-Learn, XGBoost, LightGBM, AdaBoost, Pandas, NumPy, Matplotlib, Seaborn, SMOTE, PCA, KMeans, GridSearchCV, Optuna, Feature Engineering, Threshold Optimization, Cross-Validation


Customer Churn Prediction Model | Independent Project | Nov 2024
Designed and deployed a machine learning model to classify 45,200 customers by churn risk, enabling early retention strategies that reduced projected churn by 8.7%.
Utilized Python (Scikit-learn, Pandas, NumPy) for feature engineering, model training, and validation; achieved an F1-score improvement of 28% after hyperparameter tuning.
Applied SHAP explainability workflows to interpret feature importance and communicate actionable retention insights to marketing and finance teams.
Developed a lightweight Flask API for real-time prediction and dashboard integration with Power BI, enabling automated reporting and CRM flagging.
Leveraged AWS S3 & Lambda for scalable data storage and model invocation, supporting 7 churn-segment dashboards within 60 days.
Tools: Python, Scikit-learn, Flask, Power BI, SHAP, AWS, Data Visualization, Predictive Analytics, A/B Testing

Healthcare Diabetes Prediction | Capstone Project | May 2025
Built an end-to-end predictive system to identify high-risk patients from anonymized EHR data, improving recall from 0.64 to 0.83 and reducing false negatives by 31%.
Implemented Gradient Boosting Classifier with SMOTE for class balancing, achieving 97.15% accuracy, 95% precision, and F1 = 0.81 on test data.
Integrated Azure Machine Learning Studio for experiment tracking, model registry, and deployment automation.
Enabled explainability and fairness monitoring via SHAP and partial-dependence plots, supporting clinical interpretability.
Collaborated with healthcare domain experts to validate model thresholds and integrate findings into Power BI dashboards for clinician use.
Tools: Python, Azure ML Studio, TensorFlow, Power BI, Gradient Boosting, SHAP, Data Privacy, Model Deployment

Generative AI Financial Summarization Assistant | Prototype | Jan 2026
Developed a proof-of-concept Retrieval-Augmented Generation (RAG) assistant using LangChain and Azure OpenAI to summarize financial reports and highlight P&L variances.
Implemented prompt-optimization and embedding indexing to improve context retrieval accuracy by 34%.
Integrated data ingestion from Azure Blob Storage and Snowflake to dynamically update response contexts.
Delivered automated variance analysis explanations to finance stakeholders via a FastAPI microservice, demonstrating early integration of GenAI in analytics workflows.
Deployed insights through FastAPI microservice integrated with Snowflake.
Tools: LangChain, Azure OpenAI, RAG, Python, FastAPI, Azure Data Factory, Generative AI
Certifications
AWS Certified Cloud Practitioner (Amazon Web Services)
Microsoft Certified: Data Analyst Associate (Power BI)
Microsoft Certified: Azure AI Engineer Associate
TensorFlow Developer Certificate
Google Data Analytics Professional Certificate
HackerRank SQL (Intermediate)
Databricks Certified Data Analytics
Education
Master of Science, Computer Science, Concordia University Wisconsin