Sai Harshitha Padala
GenAI Engineer
[email protected] (+1 9803526677) | LinkedIn
Location: Remote, USA

Professional Summary

Senior Data Scientist and AI Engineer with 9+ years of experience designing and deploying GenAI and data-driven systems across healthcare, payer, and enterprise finance domains.
Oracle Cloud Infrastructure 2025 Certified Generative AI Professional.
Hands-on expertise in fine-tuning Large Language Models (GPT, LLaMA, Falcon) and building Retrieval-Augmented Generation (RAG) pipelines, vector databases (FAISS, Pinecone), and LangGraph-based multi-agent systems for financial and operational intelligence.
Strong background in software engineering, developing modular APIs, optimizing model performance, and deploying containerized AI solutions with Docker, Kubernetes, and CI/CD.
Proficient in SQL, PL/SQL, and data modeling for structured and unstructured data integration across GCP and AWS. Experienced in developing LLM-powered copilots, prompt-tuned agents, and explainable GenAI workflows for audit automation and reporting.
Experienced in architecting GraphRAG, metadata-aware retrieval, and multi-hop reasoning pipelines to improve grounding, financial traceability, and enterprise search relevance.
Skilled in designing LLM evaluation suites using Ragas, custom scoring, and prompt analytics to measure accuracy, hallucination rate, and financial-document consistency.
Hands-on experience building enterprise copilots for claims, audits, financial summaries, and operational forecasting using LangGraph, tool-calling, and domain-specific prompt flows.
Strong expertise integrating LLMs with Snowflake, Databricks, BigQuery, Athena, and structured SQL logic for hybrid retrieval and KPI-aware response generation.
Demonstrated ability to lead the end-to-end GenAI lifecycle (data prep, fine-tuning, grounding, orchestration, deployment, monitoring, and model governance) in regulated environments.
Proven track record working with ledger-linked data models, cost-variance analytics, financial reporting structures, and payer reimbursement workflows using ML + RAG hybrids.
Experienced in designing MLOps pipelines with Vertex AI, SageMaker, MLflow, and GitLab CI/CD, enabling automated training, experiment tracking, and production-scale rollout.
Adept at collaborating with finance, actuarial, compliance, and clinical SMEs to build trustworthy, auditable AI systems aligned with CMS, HIPAA, and enterprise governance controls.
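The LoRA-style fine-tuning (PEFT) referenced above trains only a pair of low-rank matrices and merges them into the frozen base weights. A minimal NumPy sketch of that merge step, where the shapes and scaling values are illustrative assumptions rather than any specific model's configuration:

```python
import numpy as np

def merge_lora(W, A, B, alpha=16, r=4):
    # LoRA keeps the base weight W frozen and learns low-rank factors
    # A (r x d_in) and B (d_out x r); the effective weight is
    # W + (alpha / r) * B @ A, merged once before inference.
    return W + (alpha / r) * (B @ A)
```

Because only A and B are trained, the number of updated parameters is a small fraction of the full weight matrix, which is what makes PEFT practical on large models.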

TECHNICAL SKILLS


Programming & Frameworks
Python, SQL, PL/SQL, PyTorch, TensorFlow, Scikit-learn, Hugging Face Transformers, FastAPI, Flask, LangGraph, LangChain, Streamlit, Gradio
AI & Machine Learning
LLM Fine-Tuning (PEFT, LoRA), RAG, GraphRAG, NLP, Classification, Regression, Clustering, Forecasting (ARIMA, Prophet), Reinforcement Learning (RLHF/RLAIF), Feature Engineering, Model Evaluation & Explainability, Prompt Optimization, Retrieval Performance Tuning
Generative & Agentic AI
OpenAI GPT-4/Claude 3, AWS Bedrock (Claude, Titan), Vertex AI (Gemini), LangGraph, LangChain, Vector Databases (FAISS, Pinecone, Chroma, Weaviate), Agent Orchestration, Tool Calling, Prompt Engineering, Context Caching, Query Re-ranking
Cloud & MLOps
GCP (Vertex AI, BigQuery, Cloud Run), AWS (SageMaker, ECS, Lambda, Step Functions, S3, CloudWatch), Azure ML, Docker, Kubernetes, MLflow, Prometheus, Grafana, GitLab CI/CD, Model Governance
Data Engineering & Storage
Snowflake, Databricks, Delta Lake, Apache Spark, BigQuery, Athena, SQL/NoSQL, ETL/ELT Pipelines, Dataform, DBT, Feature Stores
Visualization & Reporting
Looker, Power BI, Tableau, Matplotlib, Plotly, Seaborn
Other Tools & Utilities
Pandas, NumPy, Boto3, REST/GraphQL APIs, JSON, Git, GitLab, JIRA, Confluence, Agile/Scrum Collaboration

PROFESSIONAL EXPERIENCE

Client: Optum, Chicago, IL
Role: Data Scientist / GenAI Engineer
Duration: October 2023 - Present

Project Scope:
Designed and deployed GenAI-driven financial intelligence systems automating payer reconciliation, cost forecasting, and audit summarization. Built and fine-tuned domain-specific LLMs for financial operations and compliance workflows using Vertex AI and LangGraph, ensuring regulatory traceability across sensitive healthcare finance datasets.

Fine-tuned Llama and Falcon models using Hugging Face and Vertex AI for financial reconciliation and cost-variance analysis within payer networks.
Built LangGraph-based multi-agent systems for automating CMS audit summaries, cost reporting, and financial performance Q&A.
Implemented Retrieval-Augmented Generation (RAG) pipelines with FAISS and Pinecone for real-time access to payer financial data and ledger-level insights.
Engineered GenAI copilots integrated with internal finance systems to generate balance summaries, utilization forecasts, and variance alerts for leadership dashboards.
Applied reinforcement learning and prompt optimization to improve LLM accuracy in financial narrative generation and compliance documentation.
Collaborated with finance and regulatory teams to ensure CMS/HIPAA-aligned explainability, audit readiness, and compliance governance.
Developed cost forecasting and revenue prediction models using PyTorch and TensorFlow integrated with GCP BigQuery and Looker dashboards for CFO-level visibility.
Automated structured and unstructured data ingestion pipelines for financial KPIs, reimbursement summaries, and cost-per-patient reporting.
Enhanced GenAI model explainability through XAI frameworks and built evaluation metrics for bias mitigation and financial accuracy validation.
Integrated GenAI-based anomaly detection models for identifying outlier transactions and abnormal reimbursement trends, reducing financial discrepancies across payer systems.
Deployed intelligent audit copilots capable of automatically validating invoice-to-claim matching and generating exception reports for internal finance audits.
Designed NLP-driven summarization models to convert detailed financial and operational logs into concise executive summaries for monthly and quarterly reviews.
Developed ledger-aware vector retrieval systems enabling explainable drill-downs from aggregated KPIs to underlying claim or cost-center records.
Implemented secure data governance workflows for AI-driven financial analytics using GCP IAM and audit trails to ensure end-to-end compliance and traceability.
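The dense-retrieval step behind the RAG pipelines above reduces to a cosine-similarity search over embedding vectors. A minimal NumPy sketch of that idea, standing in for the FAISS/Pinecone index (the embeddings and k are illustrative):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    # Normalize so the dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # Indices of the k most similar document embeddings, best first
    return np.argsort(-scores)[:k]
```

A production index replaces this brute-force scan with approximate nearest-neighbor search, which is what FAISS and Pinecone provide at scale.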

Environment: Python, PyTorch, TensorFlow, Hugging Face Transformers, LangGraph, Vertex AI, FAISS, Pinecone, BigQuery, Looker, GCP, Docker, FastAPI, GitHub Actions, CI/CD, REST APIs, CMS compliance, financial data governance.



Client: Spencer Health Solutions, Morrisville, NC
Role: Data Scientist
Duration: December 2021 - July 2023

Project Scope:
Developed predictive and analytical models to improve payer analytics, claims risk prediction, and patient adherence forecasting. Designed data-driven workflows using Python, AWS SageMaker, and Athena to extract clinical and operational insights. In 2023, extended existing analytics workflows to integrate retrieval-based AI components (early RAG prototypes) for summarizing patient feedback and provider documentation.

Built predictive models using Python (scikit-learn, XGBoost) to identify high-cost claimants, forecast medical expenditure, and optimize payer financial risk across commercial insurance lines.
Engineered feature pipelines on AWS using S3, Glue, and Athena to process multi-source data including eligibility, enrollment, and pharmacy claims.
Conducted exploratory data analysis (EDA) and designed model explainability layers to support actuarial and care management teams in decision-making.
Developed financial forecasting models for claim volume, premium inflow, and revenue trend prediction, enhancing quarterly financial planning and budget accuracy for payer partners.
Automated ETL workflows integrating claims, provider, and utilization data, reducing manual data refresh effort by over 40%.
Implemented RAG-style document retrieval prototypes using pre-trained embeddings and FAISS to support early exploration of claim summarization and anomaly detection.
Developed SQL-based data validation scripts to detect outliers and ensure data completeness before model ingestion.
Collaborated with AWS engineering teams to containerize model endpoints with Docker and deploy them on SageMaker endpoints for live scoring.
Worked with product owners and compliance teams to ensure HIPAA-aligned data access and storage controls for all ML workloads.
Created interactive dashboards in QuickSight and Looker to visualize provider performance, claim rejection rates, and patient adherence KPIs.
Partnered with data governance teams to document data dictionaries, lineage, and business rules ensuring transparent analytical pipelines.
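The pre-ingestion validation described above can be as simple as a z-score screen on numeric claim fields. A hedged sketch, where the threshold and the idea of screening claim amounts are illustrative assumptions:

```python
import numpy as np

def flag_outliers(values, z_thresh=3.0):
    # Flag entries (e.g. claim amounts) more than z_thresh standard
    # deviations from the mean, for review before model ingestion
    arr = np.asarray(values, dtype=float)
    z = (arr - arr.mean()) / arr.std()
    return np.where(np.abs(z) > z_thresh)[0]
```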

Environment: AWS SageMaker, Lambda, Step Functions, ECS, S3, Athena, Python, Scikit-learn, XGBoost, TensorFlow, PyTorch, boto3, Pandas, NumPy, CloudWatch, QuickSight, GitHub, Docker

Client: United States Cellular Corporation (USCC), Chicago, IL
Role: Data Scientist / Machine Learning Engineer
Duration: December 2019 - November 2021
Project Scope:
Designed and implemented scalable data science and machine learning solutions to support telecom customer analytics, churn reduction, and revenue optimization. Developed distributed data pipelines using Python, PySpark, and SQL on on-premise and hybrid cloud infrastructure. Focused on end-to-end model development, operationalization, and automation using Airflow and containerized deployments.
Developed end-to-end churn forecasting pipelines using PySpark on AWS EMR, integrating daily subscriber activity, billing records, and call center interactions into ML-ready datasets.
Built and maintained PySpark-based ETL pipelines to process customer usage, billing, and interaction data for downstream predictive modeling.
Automated data extraction, preprocessing, and scoring workflows using Airflow DAGs and shell scripts to ensure repeatable production runs.
Implemented model monitoring and retraining triggers based on data drift and performance degradation using Python-based automation.
Collaborated with marketing and operations teams to translate predictive insights into retention strategies and campaign targeting.
Deployed models as RESTful APIs using Flask and Docker, integrating outputs with internal analytics dashboards.
Conducted data profiling and quality checks to ensure consistency, accuracy, and audit readiness of model-ready datasets.
Performed hyperparameter tuning, cross-validation, and model explainability studies to improve prediction accuracy and transparency.
Supported migration of analytical workloads from on-prem Hadoop to early Databricks and cloud-based infrastructure for scalability and maintainability.
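One way to implement the drift-based retraining trigger mentioned above is a mean-shift check on model scores between a baseline window and the current window. A minimal sketch, where the threshold is an illustrative assumption rather than a production value:

```python
def needs_retrain(baseline_scores, current_scores, threshold=0.1):
    # Trigger retraining when the mean prediction score drifts
    # further than `threshold` from the baseline window
    base_mean = sum(baseline_scores) / len(baseline_scores)
    curr_mean = sum(current_scores) / len(current_scores)
    return abs(curr_mean - base_mean) > threshold
```

Production drift monitoring typically compares full score distributions (e.g. PSI or a KS test) rather than raw means, but the trigger pattern is the same.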

Environment: Snowflake, PySpark, AWS EMR, SageMaker, Redshift, Lambda, Step Functions, SQL, Python, scikit-learn, XGBoost, TensorFlow, Tableau, QuickSight, GitHub, AWS Glue, Confluence

Client: Cygnet Infotech, Hyderabad, India
Role: Data Analyst
Duration: June 2016 - September 2019

Project Scope:
Developed Power BI and QlikView dashboards integrating SQL-based data models to monitor SLA compliance, NPS, and CSAT metrics; automated recurring Excel and SQL reports; and provided analytical insights that improved customer-support performance and reduced escalation rates.
Analyzed customer-service and product-usage data to identify trends in resolution times, escalation rates, and recurring issue categories for performance optimization.
Created interactive Power BI and QlikView dashboards tracking SLA compliance, customer-satisfaction metrics, and agent-level performance KPIs used by operations leadership.
Developed Excel-based reconciliation reports to track monthly billing discrepancies, refunds, and invoice-level anomalies, ensuring financial accuracy and transparency.
Collaborated with business teams to define KPI logic and automated weekly/monthly reporting using SQL queries and Excel macros, improving report turnaround time.
Built user-friendly Excel dashboards with PivotTables, slicers, and conditional formatting to help non-technical stakeholders filter data by region, agent, or issue type.
Partnered with QA and product teams to categorize issues by severity and frequency, helping prioritize bug fixes.
Designed scorecards and ranking charts to visualize weekly NPS, CSAT, and agent-level satisfaction metrics for performance reviews.
Supported quarterly business reviews by preparing trend analyses and visual summaries of performance metrics, customer feedback, and SLA attainment.

Environment: SQL, Power BI, QlikView, Excel, PivotTables, VLOOKUP, Excel Macros, Slicers, Conditional Formatting, Customer Support KPIs, NPS, CSAT, SLA Metrics

Education
Bachelor of Technology (B.Tech) in Information Technology, KL University, May 2016
Vijayawada, Andhra Pradesh, India