Sai Harshitha Padala
[email protected] | +1 980-352-6677 | LinkedIn

Professional Summary
- Senior Data Scientist and AI Engineer with 9+ years of experience designing and deploying GenAI and data-driven systems across healthcare, payer, and enterprise finance domains.
- Oracle Cloud Infrastructure 2025 Certified Generative AI Professional (LINK).
- Hands-on expertise in fine-tuning Large Language Models (GPT, LLaMA, Falcon) and building Retrieval-Augmented Generation (RAG) pipelines, vector databases (FAISS, Pinecone), and LangGraph-based multi-agent systems for financial and operational intelligence.
- Strong software engineering background: developing modular APIs, optimizing model performance, and deploying containerized AI solutions with Docker, Kubernetes, and CI/CD.
- Proficient in SQL, PL/SQL, and data modeling for structured and unstructured data integration across GCP and AWS.
- Experienced in developing LLM-powered copilots, prompt-tuned agents, and explainable GenAI workflows for audit automation and reporting.
- Experienced in architecting GraphRAG, metadata-aware retrieval, and multi-hop reasoning pipelines to improve grounding, financial traceability, and enterprise search relevance.
- Skilled in designing LLM evaluation suites using Ragas, custom scoring, and prompt analytics to measure accuracy, hallucination rate, and financial-document consistency.
- Hands-on experience building enterprise copilots for claims, audits, financial summaries, and operational forecasting using LangGraph, tool calling, and domain-specific prompt flows.
- Strong expertise integrating LLMs with Snowflake, Databricks, BigQuery, Athena, and structured SQL logic for hybrid retrieval and KPI-aware response generation.
- Demonstrated ability to lead the end-to-end GenAI lifecycle (data prep, fine-tuning, grounding, orchestration, deployment, monitoring, and model governance) in regulated environments.
- Proven track record working with ledger-linked data models, cost-variance analytics, financial reporting structures, and payer reimbursement workflows using ML + RAG hybrids.
- Experienced in designing MLOps pipelines with Vertex AI, SageMaker, MLflow, and GitLab CI/CD, enabling automated training, experiment tracking, and production-scale rollout.
- Adept at collaborating with finance, actuarial, compliance, and clinical SMEs to build trustworthy, auditable AI systems aligned with CMS, HIPAA, and enterprise governance controls.

TECHNICAL SKILLS
Programming & Frameworks: Python, SQL, PL/SQL, PyTorch, TensorFlow, Scikit-learn, Hugging Face Transformers, FastAPI, Flask, LangGraph, LangChain, Streamlit, Gradio
AI & Machine Learning: LLM Fine-Tuning (PEFT, LoRA), RAG, GraphRAG, NLP, Classification, Regression, Clustering, Forecasting (ARIMA, Prophet), Reinforcement Learning (RLHF/RLAIF), Feature Engineering, Model Evaluation & Explainability, Prompt Optimization, Retrieval Performance Tuning
Generative & Agentic AI: OpenAI GPT-4, Claude 3, AWS Bedrock (Claude, Titan), Vertex AI (Gemini), LangGraph, LangChain, Vector Databases (FAISS, Pinecone, Chroma, Weaviate), Agent Orchestration, Tool Calling, Prompt Engineering, Context Caching, Query Re-ranking
Cloud & MLOps: GCP (Vertex AI, BigQuery, Cloud Run), AWS (SageMaker, ECS, Lambda, Step Functions, S3, CloudWatch), Azure ML, Docker, Kubernetes, MLflow, Prometheus, Grafana, GitLab CI/CD, Model Governance
Data Engineering & Storage: Snowflake, Databricks, Delta Lake, Apache Spark, BigQuery, Athena, SQL/NoSQL, ETL/ELT Pipelines, Dataform, DBT, Feature Stores
Visualization & Reporting: Looker, Power BI, Tableau, Matplotlib, Plotly, Seaborn
Other Tools & Utilities: Pandas, NumPy, Boto3, REST/GraphQL APIs, JSON, Git, GitLab, JIRA, Confluence, Agile/Scrum Collaboration

PROFESSIONAL EXPERIENCE

Client: Optum, Chicago, IL
Role: Data Scientist / GenAI Engineer
Duration: October 2023 – Present
Project Scope: Designed and deployed GenAI-driven
financial intelligence systems automating payer reconciliation, cost forecasting, and audit summarization. Built and fine-tuned domain-specific LLMs for financial operations and compliance workflows using Vertex AI and LangGraph, ensuring regulatory traceability across sensitive healthcare finance datasets.
- Fine-tuned LLaMA and Falcon models using Hugging Face and Vertex AI for financial reconciliation and cost-variance analysis within payer networks.
- Built LangGraph-based multi-agent systems for automating CMS audit summaries, cost reporting, and financial performance Q&A.
- Implemented Retrieval-Augmented Generation (RAG) pipelines with FAISS and Pinecone for real-time access to payer financial data and ledger-level insights.
- Engineered GenAI copilots integrated with internal finance systems to generate balance summaries, utilization forecasts, and variance alerts for leadership dashboards.
- Applied reinforcement learning and prompt optimization to improve LLM accuracy in financial narrative generation and compliance documentation.
- Collaborated with finance and regulatory teams to ensure CMS/HIPAA-aligned explainability, audit readiness, and compliance governance.
- Developed cost forecasting and revenue prediction models using PyTorch and TensorFlow, integrated with GCP BigQuery and Looker dashboards for CFO-level visibility.
- Automated structured and unstructured data ingestion pipelines for financial KPIs, reimbursement summaries, and cost-per-patient reporting.
- Enhanced GenAI model explainability through XAI frameworks and built evaluation metrics for bias mitigation and financial accuracy validation.
- Integrated GenAI-based anomaly detection models to identify outlier transactions and abnormal reimbursement trends, reducing financial discrepancies across payer systems.
- Deployed intelligent audit copilots capable of automatically validating invoice-to-claim matching and generating exception reports for internal finance audits.
- Designed NLP-driven summarization models to convert detailed financial and operational logs into concise executive summaries for monthly and quarterly reviews.
- Developed ledger-aware vector retrieval systems enabling explainable drill-downs from aggregated KPIs to underlying claim or cost-center records.
- Implemented secure data governance workflows for AI-driven financial analytics using GCP IAM and audit trails to ensure end-to-end compliance and traceability.

Environment: Python, PyTorch, TensorFlow, Hugging Face Transformers, LangGraph, Vertex AI, FAISS, Pinecone, BigQuery, Looker, GCP, Docker, FastAPI, GitHub Actions, CI/CD, REST APIs, CMS compliance, financial data governance

Client: Spencer Health Solutions, Morrisville, NC
Role: Data Scientist
Duration: December 2021 – July 2023
Project Scope: Developed predictive and analytical models to improve payer analytics, claims risk prediction, and patient adherence forecasting. Designed data-driven workflows using Python, AWS SageMaker, and Athena to extract clinical and operational insights. In 2023, extended existing analytics workflows to integrate retrieval-based AI components (early RAG prototypes) for summarizing patient feedback and provider documentation.
- Built predictive models using Python (scikit-learn, XGBoost) to identify high-cost claimants, forecast medical expenditure, and optimize payer financial risk across commercial insurance lines.
- Engineered feature pipelines on AWS using S3, Glue, and Athena to process multi-source data including eligibility, enrollment, and pharmacy claims.
- Conducted exploratory data analysis (EDA) and designed model explainability layers to support actuarial and care management teams in decision-making.
- Developed financial forecasting models for claim volume, premium inflow, and revenue trend prediction, enhancing quarterly financial planning and budget accuracy for payer partners.
- Automated ETL workflows integrating claims, provider, and utilization data, reducing manual data refresh effort by over 40%.
- Implemented RAG-style document retrieval prototypes using pre-trained embeddings and FAISS to support early exploration of claim summarization and anomaly detection.
- Developed SQL-based data validation scripts to detect outliers and ensure data completeness before model ingestion.
- Collaborated with AWS engineering teams to containerize models with Docker and deploy them on SageMaker endpoints for live scoring.
- Worked with product owners and compliance teams to ensure HIPAA-aligned data access and storage controls for all ML workloads.
- Created interactive dashboards in QuickSight and Looker to visualize provider performance, claim rejection rates, and patient adherence KPIs.
- Partnered with data governance teams to document data dictionaries, lineage, and business rules, ensuring transparent analytical pipelines.

Environment: AWS SageMaker, Lambda, Step Functions, ECS, S3, Athena, Python, Scikit-learn, XGBoost, TensorFlow, PyTorch, Boto3, Pandas, NumPy, CloudWatch, QuickSight, GitHub, Docker

Client: United States Cellular Corporation (USCC), Chicago, IL
Role: Data Scientist / Machine Learning Engineer
Duration: December 2019 – November 2021
Project Scope: Designed and implemented scalable data science and machine learning solutions to support telecom customer analytics, churn reduction, and revenue optimization. Developed distributed data pipelines using Python, PySpark, and SQL on on-premise and hybrid cloud infrastructure.
Focused on end-to-end model development, operationalization, and automation using Airflow and containerized deployments.
- Developed end-to-end churn forecasting pipelines using PySpark on AWS EMR, integrating daily subscriber activity, billing records, and call center interactions into ML-ready datasets.
- Built and maintained PySpark-based ETL pipelines to process customer usage, billing, and interaction data for downstream predictive modeling.
- Automated data extraction, preprocessing, and scoring workflows using Airflow DAGs and shell scripts to ensure repeatable production runs.
- Implemented model monitoring and retraining triggers based on data drift and performance degradation using Python-based automation.
- Collaborated with marketing and operations teams to translate predictive insights into retention strategies and campaign targeting.
- Deployed models as RESTful APIs using Flask and Docker, integrating outputs with internal analytics dashboards.
- Conducted data profiling and quality checks to ensure consistency, accuracy, and audit readiness of model-ready datasets.
- Performed hyperparameter tuning, cross-validation, and model explainability studies to improve prediction accuracy and transparency.
- Supported migration of analytical workloads from on-prem Hadoop to early Databricks and cloud-based infrastructure for scalability and maintainability.

Environment: Snowflake, PySpark, AWS EMR, SageMaker, Redshift, Lambda, Step Functions, SQL, Python, scikit-learn, XGBoost, TensorFlow, Tableau, QuickSight, GitHub, AWS Glue, Confluence

Client: Cygnet Infotech, Hyderabad, India
Role: Data Analyst
Duration: June 2016 – September 2019
Project Scope: Developed Power BI and QlikView dashboards integrating SQL-based data models to monitor SLA compliance, NPS, and CSAT metrics; automated recurring Excel and SQL reports; and provided analytical insights that improved customer-support performance and reduced escalation rates.
- Analyzed customer-service and product-usage data to identify trends in resolution times, escalation rates, and recurring issue categories for performance optimization.
- Created interactive Power BI and QlikView dashboards tracking SLA compliance, customer-satisfaction metrics, and agent-level performance KPIs used by operations leadership.
- Developed Excel-based reconciliation reports to track monthly billing discrepancies, refunds, and invoice-level anomalies, ensuring financial accuracy and transparency.
- Collaborated with business teams to define KPI logic and automated weekly/monthly reporting using SQL queries and Excel macros, improving report turnaround time.
- Built user-friendly Excel dashboards with PivotTables, slicers, and conditional formatting to help non-technical stakeholders filter data by region, agent, or issue type.
- Partnered with QA and product teams to categorize issues by severity and frequency, helping prioritize bug fixes.
- Designed scorecards and ranking charts to visualize weekly NPS, CSAT, and agent-level satisfaction metrics for performance reviews.
- Supported quarterly business reviews by preparing trend analyses and visual summaries of performance metrics, customer feedback, and SLA attainment.

Environment: SQL, Power BI, QlikView, Excel, PivotTables, VLOOKUP, Excel Macros, Slicers, Conditional Formatting, Customer Support KPIs, NPS, CSAT, SLA Metrics

Education
Bachelor of Technology (B.Tech) in Information Technology, KL University, Vijayawada, Andhra Pradesh, India (May 2016)