Vaibhav Sainath Reddy - GenAI Engineer
[email protected]
Location: Wilmington, Delaware, USA
Relocation: Yes, anywhere in the USA
Visa: H-1B
Vaibhav Sainath Reddy
Office: (402) 418-5715 | [email protected] | LinkedIn: linkedin.com/in/vaibhav575
Senior GenAI & Data Engineer | AWS Bedrock | Azure OpenAI | PySpark | Databricks | RAG | MLOps
Open to onsite, hybrid, and remote roles | Available immediately

Professional Profile:
Results-driven Senior GenAI & Data Engineer with 9 years of experience building enterprise AI, machine learning, and large-scale data platforms across the healthcare (HIPAA), telecom, and insurance domains. Expertise in AWS Bedrock, Azure OpenAI, LangChain, RAG pipelines, PySpark, Databricks, Kafka, and cloud-native MLOps architectures on AWS, Azure, and GCP. Experienced in developing real-time streaming pipelines, AI agents, scalable ETL frameworks, and production-grade ML systems in regulated enterprise environments.

Core Technical Competencies:
Generative AI & LLM Orchestration: Expert in building production-ready applications using AWS Bedrock, Azure OpenAI, and GPT-4. Specialized in LangChain for agentic workflows and advanced prompt engineering (Chain-of-Thought, ReAct).
RAG & Vector Intelligence: Extensive experience designing Retrieval-Augmented Generation (RAG) pipelines using vector databases and Azure Cognitive Search to ground LLMs in proprietary enterprise data.
Model Optimization: Hands-on experience fine-tuning open-source models (LLaMA 3, Mistral, Falcon) using parameter-efficient fine-tuning (PEFT) techniques such as LoRA.
Advanced Machine Learning: Proficient in both traditional ML (XGBoost, Random Forest) and deep learning (PyTorch, TensorFlow) for computer vision, NLP, and predictive analytics.
Cloud-Native MLOps: Architecting scalable AI infrastructure on AWS, Azure, and GCP. Expert in Docker, Kubernetes, and MLflow for CI/CD, model versioning, and real-time monitoring of hallucinations and drift.
Data Engineering at Scale: Building robust ETL pipelines using PySpark, Databricks, and Snowflake to support large-scale model training and inference.

Impact & Leadership:
A strategic collaborator skilled at translating complex business requirements into technical roadmaps, with a proven track record of leading Agile teams and delivering high-stakes projects, including AI-driven diagnostics for medical imaging and automated customer support ecosystems for global enterprises.

Core Competencies:
Generative AI & LLMs | RAG & Vector Search | Prompt Engineering | MLOps & CI/CD | AWS Bedrock / Azure ML | NLP / Text Analytics | Deep Learning (CNN/RNN) | AI Agents (LangChain) | Data Pipelines & ETL | Power BI / Tableau | Python / SQL / PySpark | Healthcare AI (HIPAA) | Cloud (AWS/Azure/GCP) | Docker / Kubernetes | Stakeholder Management

Technical Skills:
Generative AI: AWS Bedrock, Azure OpenAI, OpenAI GPT-4/3.5, LLaMA 2/3, Prompt Engineering, Fine-Tuning, RAG, AI Agents, Foundation Models.
ML Frameworks: TensorFlow, PyTorch, Keras, Scikit-learn, XGBoost, FastAI, Hugging Face Transformers, MLflow, AutoML.
NLP: LangChain, NLTK, spaCy, Sentiment Analysis, Entity Recognition, Text Classification, Vector Search, Embeddings.
Cloud Platforms: AWS (Bedrock, SageMaker, Lambda, S3, EC2, Comprehend), Azure (ML, OpenAI, Cognitive Search, Data Factory, Synapse, DevOps), GCP (Vertex AI, BigQuery).
MLOps & DevOps: Docker, Kubernetes, MLflow, CI/CD, Azure DevOps, SageMaker Endpoints, Model Monitoring, Drift Detection.
Programming: Python (Pandas, NumPy, PySpark, Scikit-learn), SQL, UNIX Shell, VBA, Excel Macros.
Data & Visualization: Tableau, Power BI, Looker, Snowflake, Redshift, Teradata, BigQuery, Azure Data Lake, SSAS.
APIs & Web: FastAPI, Flask, REST APIs, Microservices.
Data Engineering: Apache Spark, PySpark, Databricks, Apache Kafka, Airflow, Azure Data Factory, Snowflake, Redshift, BigQuery, Delta Lake, ETL/ELT Pipelines, Data Modeling (Star/Snowflake Schema).
Computer Vision: OpenCV, DICOM Processing, CNN-based Anomaly Detection.

Professional Experience:

Senior GenAI/ML Engineer & Data Engineer | Healthcare AI Platform
CFHP (Community First Health Plans) | San Antonio, TX (Remote) | Feb 2023 - Present
Collaborated with clinical operations and business stakeholders to gather requirements and deliver AWS Bedrock-based generative AI solutions for automated internal support response generation.
Designed and deployed LangChain AI agents integrated with AWS Bedrock foundation models to automate patient-device troubleshooting workflows, reducing manual intervention and improving client service SLAs.
Built prompt engineering frameworks for enterprise LLM applications using GPT-4 and Bedrock, improving response quality and business relevance.
Built HIPAA-compliant AI inference pipelines combining serverless AWS Lambda functions with S3 and EC2 to process unstructured healthcare data and deliver real-time AI-powered recommendations at scale.
Implemented model monitoring dashboards tracking hallucination rates, latency, and response quality to ensure production reliability across AI-powered healthcare systems.
Applied Retrieval-Augmented Generation (RAG) with Azure Cognitive Search to build intelligent clinical record retrieval tools, significantly improving clinician search speed and accuracy.
Developed and deployed a GPT-4-powered conversational assistant for medical device troubleshooting, user training, and FAQ support, reducing support ticket volume for healthcare workers.
Designed and implemented real-time data pipelines using Apache Kafka and Spark Structured Streaming to process high-volume event data, enabling low-latency analytics, CDC-based ingestion, and real-time inference for downstream business applications.
Improved image recognition model performance by 20% by developing and fine-tuning custom deep learning architectures using PyTorch for healthcare diagnostic imaging.
Built NLP pipelines using PyTorch, spaCy, and Hugging Face to parse unstructured clinical notes and device logs, enabling automated extraction of device performance issues for post-market surveillance.
Implemented computer vision models using PyTorch and OpenCV for anomaly detection in medical imaging (X-rays) and ECG waveforms, achieving a 20% improvement in recognition accuracy.
Integrated Azure ML and AutoML with FDA-regulated datasets to build classification models for device status monitoring and patient risk prediction, maintaining full HIPAA compliance throughout.
Developed Python-based ETL pipelines for structured and unstructured healthcare data using Pandas and PySpark, and built FastAPI microservices to expose ML models to downstream applications.
Created Power BI and Looker dashboards visualizing device performance trends, patient vitals, and predictive failure alerts for healthcare providers and executive stakeholders.
Deployed scalable ML workloads across Vertex AI, SageMaker, and Azure ML using Kubernetes, and established CI/CD pipelines that shortened model release cycles and kept environments consistent.
Environment: AWS Bedrock, AWS Lambda, S3, EC2, SageMaker, Azure OpenAI, Azure ML, Azure Cognitive Search, GCP Vertex AI, LangChain, GPT-4, LLaMA 3, Hugging Face, PyTorch, TensorFlow, Scikit-learn, OpenCV, DICOM, FastAPI, PySpark, Databricks, Apache Kafka, Snowflake, Delta Lake, Airflow, Docker, Kubernetes, MLflow, Power BI, Looker, Python, SQL, Azure DevOps, CI/CD

Senior Data Scientist / ML & Data Engineer | Telecom AI & Analytics Platform
Gainwell Technologies | Wilmington, Delaware | Oct 2021 - Jan 2023
Worked on early-stage NLP and transformer-based ML solutions using SageMaker and Hugging Face models to support intelligent telecom analytics and automation use cases.
Developed custom prompt orchestration workflows for LLMs to support telecom billing analytics, allowing business users to query complex datasets in natural language.
Applied A/B testing and statistical validation to compare prompt variations and model outputs, helping clients select the most effective AI configurations for production deployment.
Implemented Apache Airflow DAGs to orchestrate complex, multi-step data workflows across ingestion, transformation, and model scoring stages, ensuring SLA-compliant pipeline execution and automated failure alerting.
Designed and optimized data warehouse models in BigQuery using partitioning and clustering strategies, reducing analytical query costs by 35% and improving dashboard refresh speeds for telecom reporting teams.
Built distributed ETL pipelines using PySpark and Databricks, processing 3TB+ of enterprise data daily with 99.9% pipeline reliability.
Built real-time data streaming pipelines using Apache Kafka and GCP Dataflow to capture subscriber behavior events, enabling near-real-time fraud detection and churn propensity scoring.
Implemented Vertex AI pipelines for automated data preprocessing and model training, significantly improving workflow efficiency for telecom analytics use cases.
Fine-tuned Hugging Face transformer models for NLP tasks, including sentiment analysis and entity recognition on telecom customer data, improving model performance by more than 15%.
Streamlined team-wide adoption of MLflow (experiment tracking and Model Registry) for experiment tracking, model versioning, and lifecycle management.
Built NLP solutions with Azure Cognitive Services, and Azure Data Factory + Databricks pipelines for data ingestion, transformation, and model deployment.
Built backend ML scoring APIs with Python and FastAPI/Flask, automating reporting to reduce manual effort and improving forecasting accuracy for client stakeholders.
Environment: SageMaker, GCP Vertex AI, BigQuery, GCP Dataflow, Apache Kafka, Apache Airflow, PySpark, Databricks, Azure Data Factory, Azure Cognitive Services, Hugging Face Transformers, MLflow, FastAPI, Flask, Python, SQL, Docker, LLMs, Prompt Engineering, A/B Testing, SparkSQL, Power BI, Tableau

Data Analyst / Python | Insurance Analytics & ML Platform
Allstate Insurance | Hyderabad, India | Apr 2019 - Jul 2021
Designed and maintained logical and physical data models (star schema, snowflake schema) for complex insurance data structures, enhancing data integrity and analytical usability across business units.
Built Python-based ML inference scripts using Scikit-learn (Random Forest, SVM) for predictive modeling of insurance claims and customer churn, and tracked experiments with MLflow, reducing defect rates by 20%.
Developed automated Python ETL pipelines (Pandas, NumPy) to ingest, validate, transform, and load large-scale structured and semi-structured insurance datasets into enterprise data warehouses on Snowflake and Teradata.
Built and orchestrated batch data workflows using Apache Airflow, automating daily data ingestion jobs from upstream policy and claims systems and ensuring SLA compliance with alerting on pipeline failures.
Wrote complex SQL queries and stored procedures for back-end data validation, testing, and reconciliation; designed SSAS cubes, named queries, and calculated columns for robust reporting.
Created automated data validation and cleaning scripts in Python (Pandas, NumPy), improving quality assurance for large-scale data loads and ETL workflows.
Developed interactive Tableau dashboards and Power BI reports to surface actionable insights on claims trends, customer satisfaction scores, and policy renewal rates for business partners and senior leadership.
Partnered with enterprise data warehouse and data governance teams to resolve data quality issues and develop integrated information delivery solutions across the policy, billing, and claims domains.
Applied NLP techniques (NLTK) to improve customer satisfaction analytics, and used VBA macros to automate repetitive Excel-based workflows, increasing operational efficiency.
Environment: Python, Pandas, NumPy, Scikit-learn, MLflow, SQL, Snowflake, Teradata, SSAS, Apache Airflow, Tableau, Power BI, Excel (VBA/Macros), Random Forest, SVM, NLTK, ETL Pipelines, Data Modeling, Star/Snowflake Schema

Junior Data Analyst | Big Data Analytics & Financial Modeling
CADSYS Limited | Hyderabad, India | Jun 2016 - Mar 2019
Leveraged PySpark, SparkSQL, and MLlib on Hadoop big data platforms to perform distributed, real-time analysis of loan default and financial risk datasets.
Designed and built ETL pipelines using PySpark to extract data from RDBMS and flat-file sources, apply transformation logic based on source-to-target mapping documents, and load the results into analytical data marts.
Developed and maintained dimensional data models (star schema, snowflake schema) to support self-serve BI reporting; documented logical and physical data models and data dictionaries.
Used Google BigQuery for large-scale data analysis, developing optimized queries and data models that improved analytical performance and scalability for financial datasets.
Applied unsupervised ML (K-Means clustering, DBSCAN) and statistical models to segment business datasets for customer risk profiling; tracked experiments with MLflow and incorporated test suites to reduce defect rates.
Built automated Python data validation and reconciliation scripts to ensure the accuracy of ETL outputs across source and target systems, enabling reliable audit trails for compliance reporting.
Created use cases, activity diagrams, and state diagrams; worked in an Agile/Scrum methodology to deliver data requirements on time and within scope.
Environment: PySpark, SparkSQL, Hadoop, MLlib, Google BigQuery, Python, Pandas, NumPy, SQL, MLflow, Scikit-learn, K-Means Clustering, ETL Pipelines, Dimensional Data Modeling, VBA, Excel, Agile/Scrum, RDBMS

Education:
Master of Science, Information Technology | Wilmington University | Delaware, USA | 2022
Bachelor of Science, Computer Science | Osmania University (IPE) | Hyderabad, India | 2016

Key Achievements:
Reduced cloud processing costs by 25% by optimizing model training workflows on Google Cloud Platform through efficient resource scheduling and pipeline redesign.
Improved image recognition model performance by 20% by developing and fine-tuning custom deep learning architectures using PyTorch for healthcare diagnostic imaging.
Reduced defect rate by 20% by integrating a comprehensive test case suite and MLflow experiment tracking into the ML development lifecycle.
Automated patient-device troubleshooting workflows using LangChain AI agents and AWS Bedrock, significantly reducing manual intervention and average response time for client service teams.
Built HIPAA-compliant AI inference pipelines combining serverless AWS Lambda functions with EC2 and S3, enabling real-time recommendations from unstructured healthcare data at scale.
Fine-tuned transformer-based NLP models (Hugging Face) for telecom billing analytics, enabling business users to extract insights from complex datasets in natural language without SQL expertise.
Delivered end-to-end MLOps pipelines on Azure ML and SageMaker, reducing model deployment time through automated CI/CD workflows.

Certifications & Additional Information:
Databricks Certified Data Engineer Associate
Azure AI Engineer Associate
Visa Status: H-1B; authorized to work in the United States (no sponsorship required for current employer)
Domain Experience: Healthcare (HIPAA), Insurance, Telecom, Retail Banking, Lending, Mortgage, Digital Banking
Methodologies: Agile / Scrum, SDLC, CI/CD, DataOps, MLOps