| Sai Teja Ankuru - Lead AI/ML Engineer |
| [email protected] |
| Location: New York City, New York, USA |
| Relocation: Yes |
| Visa: H1B |
| Resume file: Sai Teja - AIML Resume_1777916278660.docx |
|
PROFESSIONAL SUMMARY:
AI/ML Engineer with around 11 years of experience designing and deploying production-grade ML and Generative AI solutions using Vertex AI, TensorFlow/PyTorch, and BigQuery within cloud-native architectures. Strong expertise in building scalable MLOps pipelines, data engineering workflows, and AI-driven systems on GCP, integrating structured and unstructured data for enterprise applications. Proven track record of implementing LLMs, RAG pipelines, and agentic workflows (LangChain/LangGraph) with focus on performance optimization, reliability, and real-time inference systems. Extensive experience building HIPAA-compliant, cloud-native AI platforms leveraging AWS, GCP, and Azure services, integrating structured and unstructured data at enterprise scale. Hands-on expertise in big data and distributed systems, including Apache Spark, PySpark, Kafka, EMR, Databricks, and cloud data warehouses (Redshift, BigQuery, Snowflake) for large-scale feature engineering and analytics. Strong applied ML experience across classification, regression, time-series forecasting, anomaly detection, NLP, and computer vision, with deep learning frameworks such as TensorFlow and PyTorch. Experienced in real-time and batch inference architectures, developing low-latency, production-ready APIs using FastAPI, AWS Lambda, API Gateway, and Cloud Run. Adept in MLOps and LLMOps, implementing CI/CD pipelines, model registries, automated retraining, monitoring, and rollback strategies using CodePipeline, Jenkins, Docker, and Airflow. Demonstrated ability to apply explainable AI (SHAP, LIME) to ensure regulatory transparency, interpretability, and stakeholder trust in healthcare and insurance environments. Strong collaborator with a track record of working closely with product managers, clinicians, actuaries, data scientists, and business stakeholders to translate complex requirements into impactful AI solutions. 
TECHNICAL SKILLS:
Machine Learning & Model Development: Supervised & Unsupervised Learning, Model Development & Deployment, Feature Engineering, Model Evaluation, Hyperparameter Tuning, Deep Learning, NLP, Explainable AI (SHAP, LIME), Generative Models (GANs, VAEs), Transformer Architectures, Attention Mechanisms, Multi-modal Learning
Programming & APIs: Python, FastAPI, Flask, REST APIs, Async Workflows, API Integration, Microservices Architecture, Model Context Protocol (MCP), Production Implementations, FastMCP Library
Cloud Platforms: AWS (Lambda, API Gateway, SageMaker, S3, Step Functions, CloudWatch, IAM, AWS Bedrock Agents, Bedrock Guardrails, Bedrock Knowledge Bases, EC2, RDS, VPC (Subnets, NACLs, Security Groups), Route 53, CloudTrail, KMS, AWS Backup), Azure (Azure OpenAI, Azure Functions (Serverless AI APIs), Azure Data Lake, Azure DevOps, Azure AI Search (Vector Search Optimization), Azure AI Foundry), GCP (Vertex AI, BigQuery, Cloud Run, Kubeflow, TensorFlow Extended (TFX))
Generative AI & AWS Bedrock: Amazon Bedrock, Bedrock Agents, Bedrock AgentCore, Bedrock Knowledge Bases, Bedrock Guardrails, Prompt Engineering, Structured Outputs, Agent Memory & Context Management, LLM Evaluation
Agentic AI & GenAI: LangChain, LangGraph, AutoGen, Agentic AI Systems, Multi-Agent Workflows, Tool-Using Agents, Custom Tool Development, MCP-style Architectures, Agents as Tools, Prompt Engineering, Structured Outputs, Agent Sessions & Memory Management, LLM Response Evaluation, Guardrails, Hallucination Mitigation, LlamaIndex, CrewAI, Semantic Kernel, AI Foundry
LLM Platforms: OpenAI, Azure OpenAI Service, Anthropic, AWS Bedrock (Claude, Llama), Hugging Face Transformers, Google Gemini.
Frontend Technologies: React, JavaScript, TypeScript, HTML5, CSS3, Modern Frontend Frameworks, Component-Based UI Design
Backend & APIs: Node.js, Python, REST APIs, Microservices Architecture, API Integration, Async Processing
Retrieval & Vector Databases: Retrieval-Augmented Generation (RAG), Embeddings, Chunking Strategies, Reranking, Pinecone, FAISS, Chroma, Weaviate.
LLMOps / MLOps & DevOps: LLMOps, Prompt Versioning, Response Monitoring, Cost Governance, CI/CD, Git, Docker, Jenkins, GitHub Actions, Kubernetes, MLflow
Big Data & Distributed Computing: Apache Spark, PySpark, Spark SQL, Spark Streaming, Spark Structured Streaming, Hadoop (HDFS, YARN), Hive, Kafka, Amazon EMR, AWS Glue, Databricks, Cloudera, Azure HDInsight.
DevOps, MLOps & Automation: Git, Jenkins, AWS CodePipeline, CodeBuild, CodeCommit, GitHub Actions, Docker, Amazon ECR, AWS Fargate, MLflow, SageMaker Model Registry
Data Management & Processing: Pandas, NumPy, Amazon Redshift, BigQuery, Snowflake, Oracle, PostgreSQL, Cassandra, HBase, Amazon S3, Google Cloud Storage
Data Warehousing & BI: Amazon Redshift, BigQuery, Snowflake, Tableau, Looker, Power BI
Testing, Explainability & Quality: PyTest, Postman, Model Validation, SHAP, LIME, Drift Detection, LLM Evaluation Frameworks.
Monitoring & Observability: Prometheus, Grafana
Operating Systems: Linux, UNIX, Ubuntu, CentOS, Windows
PROFESSIONAL EXPERIENCE:
BCBS, Richardson, TX | Feb 2024 -- Present
Lead AI/ML Engineer (Healthcare & Generative AI)
Designed and deployed production-grade ML and Generative AI solutions on GCP, leveraging Vertex AI for model training, deployment, and lifecycle management. Built and operationalized Vertex AI Pipelines to automate model training, validation, deployment, and monitoring workflows. Developed and trained predictive and NLP models using TensorFlow and PyTorch, integrating them into enterprise healthcare applications.
Implemented LLM-based RAG architectures with optimized retrieval strategies, improving contextual accuracy of AI-driven responses. Designed agentic AI workflows using LangChain and LangGraph, enabling multi-step reasoning and tool orchestration. Built scalable data pipelines using BigQuery and SQL, supporting ingestion, preprocessing, and feature engineering for structured and unstructured datasets. Developed data transformation and ingestion workflows aligned with large-scale healthcare datasets including clinical notes and claims data. Deployed serverless AI components using Cloud Functions and Cloud Run for low-latency inference. Implemented distributed data processing pipelines aligned with Dataflow patterns for scalable data engineering workflows. Integrated ML and LLM models into production APIs using FastAPI, enabling real-time inference across enterprise systems. Designed and implemented end-to-end MLOps pipelines, including CI/CD automation, model versioning, and rollback strategies. Containerized applications using Docker and deployed on Kubernetes (GKE-compatible architecture) for scalable production environments. Monitored model performance, latency, and drift, implementing automated retraining and alerting mechanisms. Optimized model inference performance for latency, throughput, and cost efficiency in production systems. Built evaluation frameworks for LLM outputs, measuring accuracy, relevance, and consistency across use cases. Implemented logging, monitoring, and observability frameworks for AI systems using enterprise-grade tooling. Applied prompt engineering and response optimization techniques to improve LLM output quality. Ensured data governance, security, and HIPAA compliance across AI pipelines and deployments. Collaborated with cross-functional teams to translate business requirements into scalable AI/ML architectures. Led design discussions and provided technical mentorship, ensuring adherence to best practices in AI engineering. 
Environment: AWS SageMaker, Redshift ML, GCP Vertex AI, Generative AI, Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), NLP (Transformers), TensorFlow, PySpark, Python, SQL, AWS Glue, Amazon EMR, Amazon S3, Amazon Kinesis, AWS Lambda, API Gateway, FastAPI, BigQuery, GCS, SHAP, LIME, MWAA (Airflow), CodePipeline, CodeBuild, JIRA, Confluence
Best Buy, Richfield, MN | Jul 2022 -- Jan 2024
Senior AI/ML Engineer
Collaborated with product and analytics teams to translate retail use cases such as personalization, demand forecasting, and inventory optimization into scalable AI/ML solutions using Amazon SageMaker. Designed and optimized complex Amazon Redshift SQL queries with joins, aggregations, and window functions to support feature engineering and large-scale analytics. Built data cleansing and feature engineering pipelines using Python, Pandas, and AWS Glue to convert raw transactional and clickstream data into ML-ready datasets. Automated end-to-end ETL workflows using AWS Step Functions, Lambda, and Python for reliable ingestion and transformation across S3, Redshift, and SageMaker. Scaled distributed data processing and feature generation using Amazon EMR and PySpark to support high-volume retail datasets and model training workloads. Enabled real-time and near-real-time ML inference by integrating streaming pipelines with Amazon Kinesis Data Streams and Kinesis Data Analytics. Applied NLP techniques using spaCy for text analytics on product and customer data, and developed computer vision models using Faster R-CNN on SageMaker. Designed and evaluated Generative AI and LLM-based solutions using SageMaker JumpStart and Hugging Face Transformers for intelligent search and content enrichment. Trained and optimized deep learning models using PyTorch and SageMaker Training Jobs, incorporating automated hyperparameter tuning for improved performance.
Implemented model validation and explainability using cross-validation and LIME to ensure transparency and trust in ML-driven decisions. Managed the full ML lifecycle using SageMaker Model Registry and MLflow, enabling experiment tracking, versioning, and controlled production releases. Developed and deployed RESTful inference APIs using Amazon API Gateway, AWS Lambda, and FastAPI for seamless integration with retail systems. Implemented automated testing using PyTest, Postman, and WireMock to ensure reliability of ML-backed microservices. Integrated SonarQube with GitHub Actions to enforce static code analysis, security checks, and maintainability standards. Containerized ML workloads using Amazon ECR and deployed scalable inference services via AWS Fargate and SageMaker Endpoints. Implemented monitoring and security controls using Amazon CloudWatch, IAM, and S3 to support observable and compliant ML operations. Built CI/CD pipelines using AWS CodePipeline, CodeBuild, and CodeCommit to automate testing, deployment, and rollback of ML services.
Environment: Python, NumPy, Pandas, PyTorch, FastAPI, Amazon SageMaker, spaCy, Faster R-CNN, Hugging Face Transformers, LIME, AWS Glue, Amazon Redshift, Amazon S3, EMR, PySpark, Kinesis, ECR, Fargate, CloudWatch, Step Functions, Lambda, CodePipeline, CodeBuild, CodeCommit, IAM, Git, Agile, JIRA, Confluence
Nationwide Mutual Insurance, Columbus, OH | Sep 2021 -- Apr 2022
AI/ML Engineer
Designed and productionized end-to-end machine learning pipelines on AWS and GCP, transforming large-scale insurance and financial datasets into deployable predictive models. Collaborated with actuarial, underwriting, and analytics stakeholders to translate business problems into supervised and time-series ML use cases. Built automated feature engineering pipelines using PySpark, Spark SQL, and AWS Glue to generate ML-ready datasets from Oracle, Postgres, and streaming sources.
Developed and trained classification and regression models to predict claim risk, policy lapse probability, and anomaly patterns in financial transactions. Implemented batch and near-real-time inference pipelines using AWS Lambda, Kafka, Spark Streaming, and GCP Pub/Sub for scoring incoming events. Migrated legacy rule-based risk logic into ML-driven models using Random Forest, Gradient Boosting, and XGBoost for improved accuracy and scalability. Leveraged Amazon SageMaker and GCP AI Platform for model training, hyperparameter tuning, model versioning, and deployment. Integrated trained models into downstream analytics platforms (Tableau, Looker, Power BI) to support business decision-making and monitoring. Applied data validation, drift detection, and performance monitoring techniques to ensure long-term model reliability in production. Implemented explainability techniques (feature importance, SHAP-style analysis) to support regulatory transparency and stakeholder trust. Designed CI/CD pipelines for ML workflows using Git, Jenkins, AWS CodePipeline, and GCP Cloud Build to automate testing and deployment. Optimized model performance and inference latency by tuning Spark jobs, data partitioning, and cloud resource configurations.
Environment: Python, PySpark, Spark MLlib, Scikit-learn, XGBoost, AWS SageMaker, AWS Lambda, S3, Redshift, EMR, Glue, Kafka, Spark Streaming, GCP AI Platform, BigQuery, Dataproc, Pub/Sub, Tableau, Looker, Jenkins, Git, Linux.
Wavelabs Technologies, Hyderabad, India | Apr 2017 -- Oct 2019
Data Scientist
Delivered applied data science solutions for enterprise clients by translating business problems into analytical and predictive modeling use cases. Designed end-to-end analytical pipelines using Spark, Kafka, and Databricks to transform raw data into ML-ready datasets. Performed advanced feature engineering on large-scale customer, transaction, and event datasets using Spark DataFrames and Spark SQL.
Built and evaluated supervised and unsupervised machine learning models for churn prediction, customer segmentation, and behavior analysis. Implemented real-time analytics pipelines using Kafka and Spark Streaming to generate near-real-time predictive features and insights. Migrated on-prem analytical workloads to Azure Data Lake and Snowflake, enabling scalable analytics and model training environments. Developed Databricks notebooks to conduct exploratory data analysis, feature validation, and model performance assessment. Partnered with business and analytics stakeholders to define KPIs, success metrics, and model evaluation criteria. Integrated model outputs and analytical insights into BI dashboards and client reporting platforms. Optimized Spark jobs and data layouts to improve performance of analytics and model-training pipelines. Supported model deployment and batch scoring workflows by operationalizing data pipelines used in production environments. Ensured data quality, consistency, and reproducibility across analytical workflows through validation and monitoring checks.
Environment: Linux, Cloudera, Apache Hadoop, HDFS, YARN, Hive, Spark, Scala, Pig, Sqoop, Kafka, Snowflake, Azure Data Lake Storage, Databricks, Spark SQL, Zookeeper, Spark Streaming
ADP, Hyderabad, India | Jan 2014 -- Mar 2017
Data Scientist I
Analyzed large-scale payroll and workforce datasets using Spark, Hive, and Python to support employee attrition, compliance, and workforce planning analytics. Performed exploratory data analysis and feature engineering on historical employee, payroll, and tenure data to enable downstream predictive modeling. Built scalable batch analytics pipelines on Azure HDInsight to aggregate and transform HR and payroll data into ML-ready analytical datasets. Engineered time-based, categorical, and behavioral features (tenure buckets, compensation deltas, role transitions) for attrition and workforce stability analysis.
Developed statistical anomaly detection summaries to identify unusual payroll patterns and support audit and compliance investigations. Implemented Spark Structured Streaming pipelines on Azure Databricks to process workforce and payroll-related event data in near real time. Created reusable Hive views and Spark SQL transformations to support recurring analytical and hypothesis-driven business queries. Validated data quality and consistency through reconciliation checks between Hive, Azure Data Lake Storage, and downstream analytical systems. Prepared structured datasets and analytical summaries for senior data scientists to use in forecasting and regression-based workforce models. Integrated curated analytical datasets into Snowflake and Azure Synapse to enable BI reporting and exploratory analysis. Automated ingestion and transformation workflows using Azure Data Factory, Bash scripting, and Spark jobs to support scheduled and on-demand analytics. Collaborated with analytics, compliance, and business teams to translate workforce and payroll questions into scalable data science solutions.
Environment: Python, Spark, Spark Structured Streaming, Hive, Pig, MapReduce, Azure HDInsight, Azure Databricks, Azure Data Lake Storage Gen2, Azure Data Factory, Snowflake, Azure Synapse, Cassandra, HBase, Azure Functions, SQL, Bash, Linux |