
Sowmya Pidathala - Senior Data Scientist
[email protected]
Location: Atlanta, Georgia, USA
Relocation: yes
Visa: GC
Resume file: Sowmya Pidathala - Resume- Senior Data Scientist_1778174574326.docx
Sowmya Pidathala
Senior Data Scientist
E-Mail: [email protected]
Phone: +17244872313
LinkedIn: www.linkedin.com/in/sowmya-pidathala-753274214

PROFESSIONAL SUMMARY:

Senior Data Scientist / AI-ML / Generative AI Engineer with 10+ years of experience designing and deploying enterprise-scale machine learning, deep learning, and GenAI solutions across healthcare, banking, marketing analytics, industrial automation, and retail domains. Experienced in KPI development, measurement frameworks, A/B testing, experimental design, and customer analytics to drive data-driven business decisions and actionable recommendations.
Proven expertise in LLMOps, Retrieval-Augmented Generation (RAG), and multi-agent AI workflows using LangChain, LangGraph, vector databases (Faiss, Chroma), and modern LLM frameworks to build context-aware AI applications.
Extensive hands-on coding experience across Python, SQL, and Scala for production-grade ML systems.
Specialized in designing scalable model architectures and in fine-tuning and optimizing large language models such as GPT-4, LLaMA, Mistral, and Gemma using parameter-efficient techniques including LoRA and QLoRA to support domain-specific summarization, reasoning, and conversational intelligence.
Designed and implemented end-to-end MLOps and LLMOps pipelines using MLflow, Azure ML, SageMaker, Databricks, and CI/CD workflows to automate model training, deployment, monitoring, and governance in production environments.
Built scalable Generative AI and machine learning applications for fraud detection, predictive maintenance, clinical summarization, customer analytics, and intelligent document processing using RAG frameworks and distributed data pipelines.
Developed and deployed deep learning models (CNN, RNN, LSTM, Transformers) using PyTorch and TensorFlow for NLP, predictive analytics, anomaly detection, and time-series forecasting across large-scale data ecosystems.
Strong proficiency in big-data processing and data engineering using Spark, Databricks, Snowflake, and cloud-native pipelines to build secure and scalable machine learning data platforms.
Experienced in Explainable and Responsible AI, implementing SHAP, LIME, and governance-driven monitoring frameworks to ensure transparency, model reliability, and regulatory compliance.
Proficient across multi-cloud AI ecosystems including AWS and Azure, delivering production-ready AI solutions integrated with enterprise data platforms and analytics tools.
Recognized as a subject matter expert in GenAI, LLMOps, and applied machine learning across healthcare, banking, and industrial domains.

TECHNICAL SKILLS:

Programming Languages: Python, SQL, R, Scala, Java, Bash, JavaScript, TypeScript, C++, MATLAB
Machine Learning & Deep Learning: Scikit-learn, TensorFlow, PyTorch, Keras, XGBoost, LightGBM, Prophet, Hugging Face Transformers, Transfer Learning (VGG, ResNet, MobileNet), CNN, RNN, LSTM, GANs, VAEs, Diffusion Models, Time-Series Forecasting (ARIMA, LSTM), Reinforcement Learning, Recommendation Systems, Explainable AI (SHAP, LIME)
Generative AI & LLMs: GPT-4, GPT-3.5, Claude, LLaMA-2, Mistral, Gemma, LangChain, LangGraph, LangSmith, Ollama, RAG Pipelines, vLLM, LoRA/QLoRA Fine-Tuning, Semantic Kernel, Prompt Engineering, PEFT, Multi-Agent Orchestration, Bedrock, Groq Inference
MLOps / LLMOps: MLflow, Kubeflow, Vertex AI, Azure ML, SageMaker, Airflow, DVC, FastAPI, Docker, Kubernetes, Terraform, Helm, GKE, AKS, CI/CD (Azure DevOps, Jenkins, GitLab), TruLens, PromptLayer, Model Registry, Drift & Token Monitoring, Recommendation Systems, Personalization Models, Ranking Algorithms, Collaborative Filtering, Content-Based Filtering
Big Data & Data Engineering: Spark, Hadoop, Hive, Kafka, BigQuery, Dataflow (Beam), Snowflake, Databricks, Redshift, Delta Lake, Azure Data Factory, Kinesis
Vector, Graph & Embedding Tools: Faiss, Chroma, Pinecone, Weaviate, Neo4j, Milvus
Natural Language Processing (NLP) & Computer Vision (CV): spaCy, NLTK, T5, BERT, BioBERT, Word2Vec, FastText, TF-IDF, LDA, Attention Mechanisms, OpenCV, YOLOv8, Detectron2, Tesseract OCR, Semantic Search, Embedding Models, Similarity Search, Vector Retrieval
Statistical & Analytical Techniques: Regression (Linear, Logistic, Ridge, Lasso), SVM, KNN, Naive Bayes, Decision Trees, Random Forest, Ensemble Learning, Clustering (K-Means, Hierarchical), ANOVA, Chi-Square, Hypothesis Testing, Confidence Intervals, A/B Testing, Experimental Design, Randomized Controlled Trials (RCT), KPI Development, Campaign Measurement, Observational Analytics
Visualization & BI Tools: Tableau, Power BI, QuickSight, Looker, Streamlit, Plotly, Matplotlib, Seaborn, R Shiny, Excel
DevOps & Monitoring: Git, GitHub, GitLab, Jenkins, Docker Compose, YAML, Prometheus, Grafana, Evidently AI, Kibana, Splunk
Cloud Platforms: AWS (Bedrock, SageMaker, Lambda, Glue, S3, EC2, Kinesis, ECS, EKS), GCP (Vertex AI, BigQuery, Cloud Composer, Dataflow), Azure (Azure AI, Azure ML, AKS, Databricks, Delta Lake)

PROFESSIONAL EXPERIENCE:
CVS Health, New York, NY
Role: Senior Data Scientist Feb 2025 - Present

Architected GenAI-driven healthcare analytics systems on AWS Bedrock processing high-volume pharmacy and claims data, enabling real-time anomaly detection and reducing model response latency by ~40% through optimized inference pipelines.
Built personalization and recommendation pipelines using embedding-based retrieval and LLM-driven ranking to improve clinical content relevance and decision support accuracy.
Implemented semantic search solutions using transformer-based embeddings to enable efficient retrieval across healthcare knowledge bases.
Applied model evaluation, performance tuning, and prompt optimization techniques to improve accuracy, latency, and response quality of AI systems.
Built a production-grade RAG platform using LangChain, Faiss, and Chroma on healthcare knowledge bases enabling explainable, context-aware clinical summaries for pharmacy and care management operations teams.
Designed knowledge graph pipelines integrating claims, pharmacy, and clinical records to enhance LLM retrieval accuracy and contextual reasoning across PBM and insurance data workflows.
Implemented multi-agent orchestration workflows using LangGraph and Semantic Kernel to automate reasoning across healthcare use cases including patient risk stratification, operational alerts, and care coordination.
Fine-tuned LLMs (GPT-4, LLaMA, Mistral, Gemma) using LoRA/QLoRA techniques for healthcare text summarization, clinical document analysis, and pharmacy insights generation.
Operationalized LLMOps pipelines using LangChain, AWS ML, and MLflow automating prompt versioning, model evaluation, drift monitoring, and CI/CD deployment for production LLM applications.
Ensured compliance with HIPAA requirements for handling PHI and PII by collaborating with clinical, compliance, and legal teams to govern AI system behavior and data access protocols.
Deployed scalable LLM inference APIs using FastAPI and vLLM on AWS ECS/EKS, supporting high-volume healthcare AI workloads with integrated monitoring and governance.
Developed Power BI dashboards tracking model performance, pharmacy claims trends, and operational KPIs across care management platforms for executive stakeholders.
Defined and tracked KPIs and measurement frameworks to evaluate model performance and operational outcomes, presenting findings to executive stakeholders through compelling data-driven narratives.
Collaborated directly with customer-facing clinical and operations stakeholders to gather data requirements, communicate analytical findings, and deliver actionable recommendations aligned with business objectives.
Provided mentoring and technical guidance to junior data scientists and ML engineers, conducting code reviews and supporting professional development.

Environment: Python, LangChain, LangGraph, Semantic Kernel, AWS Bedrock, AWS Machine Learning, Spark (PySpark), MLflow, FastAPI, vLLM, Faiss, Chroma, Docker, Power BI, AWS CloudWatch, Monitoring & Observability Tools.

Puffer Sweiven, Stafford, TX
Role: Senior Data Scientist Aug 2022 - Dec 2024

Led end-to-end AI and machine learning initiatives to improve operational efficiency, predictive maintenance, and equipment performance across industrial automation and energy operations, using Azure Machine Learning and Azure Databricks to process large volumes of operational and sensor data.
Developed predictive models such as XGBoost, Random Forest, LSTM, and Prophet to forecast equipment failures, maintenance cycles, and production demand, helping engineering teams reduce downtime and improve operational planning.
Built deep learning models using TensorFlow and PyTorch to analyze industrial sensor data, equipment logs, and operational reports, enabling automated anomaly detection and real-time equipment health monitoring.
Implemented NLP workflows using transformer-based models such as BERT and DistilBERT to extract insights from field reports, maintenance logs, and engineering documentation, improving knowledge retrieval and troubleshooting efficiency.
Designed deep learning architectures including LSTM and Transformer-based models for industrial sensor data analysis.
Developed Generative AI prototypes using Azure OpenAI Service and Azure AI Studio to summarize technical reports and operational documents, helping engineers quickly review large datasets and identify potential operational issues.
Established MLOps pipelines using Azure DevOps, MLflow, and Databricks to automate model training, versioning, monitoring, and deployment into production environments.
Built explainability dashboards using SHAP, Power BI, and Azure Monitor to provide engineering and operations teams with transparent insights into predictive models and equipment risk indicators.
Collaborated with engineering and operations teams to implement secure data pipelines, role-based access controls, and monitoring frameworks to ensure reliable data processing across operational systems.
Deployed containerized inference services using FastAPI and Azure Kubernetes Service (AKS), enabling near real-time predictions for equipment monitoring platforms and operational dashboards.
Presented operational insights and predictive analytics results to engineering leadership, supporting data-driven decisions for asset optimization and industrial process improvements.
Designed and executed A/B testing and observational analytics frameworks to evaluate model effectiveness and validate data-driven decisions across operational use cases.
Developed reusable analytics templates and libraries for multi-dimensional dataset analysis, enabling scalable insights across various operational verticals and use cases.
Reduced equipment downtime by building predictive maintenance models on Azure ML and Databricks that analyzed sensor data in real time, helping engineering teams detect potential failures earlier and improving operational reliability by about 30%.

Environment: Python, TensorFlow, PyTorch, Hugging Face Transformers, Azure Machine Learning, Azure Databricks, Azure OpenAI Service, Azure AI Studio, MLflow, Azure DevOps, Azure Kubernetes Service (AKS), Power BI, Azure Monitor, FastAPI, Docker, SQL, SHAP, Pandas, NumPy.

Bank of America - Charlotte, North Carolina
Role: Data Scientist March 2020 - April 2022
Designed, implemented, and managed end-to-end machine learning pipelines to support model development, deployment, and monitoring.
Developed and trained supervised learning models such as Logistic Regression, Decision Trees, Random Forest, Naive Bayes, and XGBoost to identify anomalies and understand customer behavior trends.
Led measurement initiatives and developed KPIs to track customer behavior trends, campaign performance, and anomaly detection outcomes, enabling data-driven business optimization.
Consulted with internal business teams to gather analytical requirements, design experimental frameworks, and communicate insights through structured presentations and dashboards.
Built complete MLOps workflows using AWS SageMaker, including automated training, tuning, model packaging, and production deployment at scale.
Built NLP pipelines using transformer-based models such as BERT to analyze customer transcripts and automate document summarization.
Built and deployed production API services using FastAPI and Flask for serving ML and LLM models, with Docker containers and Kubernetes orchestration ensuring reliability and scalability.
Designed Snowflake-based analytical data marts and created dimensional data models using both star and snowflake schema patterns.
Implemented secure CI/CD pipelines using Azure DevOps to automate ML model deployment across hybrid cloud infrastructures.
Collaborated with engineering, security, and compliance teams to align AI workflows with regulatory and governance requirements.
Developed NLP preprocessing workflows for tokenization, text normalization, lemmatization, and other transformations to prepare data for downstream modeling tasks.
Performed time-series analysis and applied deep learning techniques such as RNNs and CNNs to detect anomalies in power grid and sensor-based datasets.
Built fraud detection models using SVM and implemented them with Scikit-learn, NumPy, and Matplotlib, significantly enhancing model accuracy.
Developed knowledge graph structures using Neo4j to model relationships between financial transactions, enabling graph-based anomaly detection and entity linking.
Designed Snowflake data marts and dimensional models to support analytical and reporting workloads across business units.
Created interactive visualizations, dashboards, and executive summaries in Tableau and Excel to communicate key findings from ML models.
Utilized AWS services including Lambda, EC2, Transcribe, and CloudWatch to automate model workflows, enable audio transcript processing, and maintain robust monitoring.
Packaged and deployed ML models into Docker containers and integrated them with Flask-based services for seamless operation across on-prem and cloud environments.
Environment: SDLC, Python, Scikit-learn, NumPy, SciPy, Matplotlib, Pandas, AWS S3, DynamoDB, AWS Lambda, AWS EC2, SageMaker, Lex, EMR, Redshift, Snowflake, RNN, Machine Learning, Deep Learning, OLAP, ODS, OLTP, 3NF, Naive Bayes, Random Forest, K-means clustering, KNN, PCA, Tableau.

Toyota Financial Services - Dallas, TX
Role: Data Science Engineer July 2018 - Feb 2020
Designed and deployed ML models for auto-loan risk scoring, churn prediction, and fraud detection, ensuring scalable and compliant solutions for vehicle financing and leasing operations.
Developed classification and clustering models using Logistic Regression, Random Forest, SVM, Gradient Boosting, K-Means, and PCA with optimized feature engineering and hyperparameter tuning.
Built large-scale PySpark batch pipelines to process transactional and loan datasets, integrating curated data into Snowflake for analytics and ML workflows.
Performed in-depth exploratory data analysis on customer behavior, payment history, and financing data to generate insights that improved retention and marketing strategies.
Built feature engineering pipelines using Pandas, NumPy, Scikit-learn, SMOTE, binning, and advanced categorical encoding techniques.
Developed executive dashboards in Tableau to visualize delinquency trends, high-risk accounts, portfolio health, and campaign performance.
Automated ETL pipelines using Python and SQL and designed star/snowflake schemas to support analytics and reporting across finance and operations teams.
Maintained model governance through detailed documentation, validation procedures, performance monitoring, and regulatory-ready reporting.

Environment: Python, Pandas, NumPy, Scikit-learn, PySpark, SQL, Oracle, Teradata, Snowflake, AWS (S3, Redshift, RDS), Tableau, Seaborn, Matplotlib, NLTK, XGBoost, ETL Pipelines, ER/Studio, Erwin.

Tata Consultancy Services, India
Role: Data Analyst Jan 2016 - Aug 2017

Collected, cleaned, and transformed structured and semi-structured data from multiple sources (databases, APIs, flat files) to ensure data quality and integrity.
Performed Exploratory Data Analysis (EDA) to identify trends, patterns, and anomalies using Python (Pandas, NumPy) and SQL.
Developed interactive dashboards and reports using Tableau to provide real-time business insights.
Wrote complex SQL queries involving joins, subqueries, CTEs, window functions, and aggregations for business reporting.
Conducted statistical analysis including hypothesis testing, correlation analysis, regression, and A/B testing.
Automated recurring reporting processes using Python and scheduled SQL jobs, reducing manual effort and turnaround time.
Collaborated with cross-functional teams (Product, Marketing, Operations) to define KPIs and translate business requirements into data-driven solutions.
Designed and maintained data models to support reporting and analytics workflows.
Identified performance bottlenecks and optimized queries to improve data retrieval speed.
Built predictive insights using basic machine learning techniques such as classification and clustering.
Ensured data governance by validating data accuracy, maintaining documentation, and implementing data validation checks.
Presented findings to stakeholders using clear visualizations and executive-ready reports.
Monitored data pipelines and resolved discrepancies to maintain reporting consistency.
Supported data warehousing initiatives and assisted in ETL pipeline validation.

Environment: Python, SQL, Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn (basic ML), Tableau, MySQL, PostgreSQL, SQL Server, Snowflake (basic), Excel (Advanced), ETL Workflows, Data Cleaning & Transformation, Statistical Analysis (Hypothesis Testing, Regression, A/B Testing), Query Optimization (Joins, CTEs, Window Functions), Jupyter Notebook, Git, GitHub, Linux (basic), Shell Scripting (basic), REST APIs (data extraction), CI/CD basics.

Dell Technologies, India
Role: Python Developer Oct 2013 - Dec 2015

Developed and maintained backend components using Python, Django, and Flask for enterprise-level applications.
Built and consumed RESTful APIs, ensuring seamless communication between frontend and backend systems.
Implemented business logic, form validations, and server-side processing to enhance application functionality.
Designed database schemas and performed complex queries using MySQL and PostgreSQL.
Improved application performance by optimizing SQL queries and refactoring inefficient code.
Assisted in developing reusable Python modules to reduce redundancy and improve maintainability.
Integrated third-party services such as payment gateways, authentication APIs, and cloud storage services.
Implemented JWT/OAuth-based authentication and role-based access control mechanisms.
Wrote unit and integration test cases using PyTest and unittest, increasing code coverage and reducing bugs.
Debugged and resolved production issues by analyzing logs and performing root cause analysis.
Worked closely with frontend developers to integrate APIs and ensure smooth UI functionality.
Participated in Agile/Scrum methodology, including sprint planning, backlog grooming, and retrospectives.
Used Git, GitHub/GitLab for version control and collaborative development.
Assisted in CI/CD processes and supported deployments in AWS (EC2, S3, RDS, Lambda) environments.
Documented APIs using Swagger/OpenAPI and maintained technical documentation.

Environment: Python, Django, Flask, Django REST Framework (DRF), MySQL, PostgreSQL, SQLite, REST APIs, HTML5, CSS3, Git, GitHub/GitLab, PyTest, Postman, Swagger (OpenAPI), AWS (EC2, S3, RDS, Lambda), Docker (basic), Linux, Shell Scripting, CI/CD Pipelines.