Ganesh - AI ML Engineer
[email protected]
Location: Jersey City, New Jersey, USA
Relocation: yes
Visa: Green card
Sai Ganesh
Email Id: [email protected]
linkedin.com/in/sai-ganesh-k-811aa618b

PROFESSIONAL SUMMARY:
Senior AI/ML Engineer and Data Scientist with 12+ years of experience delivering scalable machine learning, advanced analytics, and Generative AI solutions across financial services, healthcare, e-commerce, infrastructure, and transportation domains. Proven track record of developing predictive models, risk analytics systems, and intelligent decision-support platforms within regulated enterprise environments.
Extensive experience designing and deploying cloud-native AI systems on AWS and Azure, including data engineering pipelines, feature engineering frameworks, model training, real-time and batch inference, and large-scale distributed processing using PySpark and Apache Spark. Strong expertise in building production-grade ML platforms using Amazon SageMaker, Amazon Bedrock, and Azure Machine Learning.
Hands-on experience with Generative AI, transformer-based NLP, and Retrieval-Augmented Generation (RAG), implementing semantic search, vector-based retrieval, and LLM-powered automation to enhance document intelligence and business workflows. Proficient in integrating OpenAI, Claude, and LLaMA models with enterprise systems using APIs, microservices, and event-driven architectures.
Demonstrated expertise in MLOps, model governance, explainability, and monitoring, including drift detection, SHAP-based interpretability, CI/CD automation, and secure deployment practices. Adept at aligning AI initiatives with compliance, auditability, and performance optimization requirements to support enterprise-scale decision-making.

TECHNICAL SKILLS:
GenAI & LLMs: OpenAI, GPT, Llama, Gemini, Claude, Anthropic, CrewAI, AutoGen, LangChain, LangGraph, Prompt Engineering, RAG, Guardrails.
ML/AI Libraries & Frameworks: TensorFlow, PyTorch, Keras, Scikit-learn, XGBoost, LightGBM, CatBoost, H2O.ai, FastAI, MXNet, Transformers, Spark MLlib, Random Forest, Gradient Boosting, Time-Series Forecasting, SHAP.
Vector/LLM Databases: Pinecone, FAISS, ChromaDB, Azure Cognitive Search, Qdrant.
NLP Tools: BERT, TF-IDF, BM25, Word2Vec, Hugging Face Transformers, NLTK, spaCy, Elasticsearch, OpenSearch, Stanford CoreNLP.
Cloud Platforms: AWS (S3, Glue, Athena, Redshift, SageMaker, Bedrock, Lambda, ECS/Fargate, Step Functions, SQS, DynamoDB, API Gateway, CloudWatch, KMS), Azure (Azure Machine Learning, Azure Data Factory, Azure Blob Storage, Azure Key Vault, AKS, Azure DevOps), GCP (Vertex AI, BigQuery).
Big Data Ecosystem: Apache Spark, PySpark, SparkSQL, Kafka, Hadoop, HDFS, Hive, Oozie, Zookeeper, ETL Pipelines, Airflow, DBT.
MLOps & DevOps: MLflow, Jenkins, Terraform, CloudFormation, Docker, Kubernetes, CI/CD, GitHub Actions.
Programming Languages: Python, SQL, PL/SQL, Scala, Java, R.
Databases (SQL/NoSQL): Snowflake, Redshift, MongoDB, Cassandra, PostgreSQL, MySQL, Oracle, SQL Server, CosmosDB, Neo4j, BigQuery.
Version Control: Git, GitHub, Bitbucket, GitLab.
Operating Systems: Windows, Linux, Ubuntu, CentOS, Red Hat, macOS.

WORK EXPERIENCE:
Freddie Mac, McLean, VA | Nov 2024 – Present
Sr. AI/ML Engineer
Project: Designed and delivered enterprise-grade AI/ML and Generative AI solutions for mortgage risk analytics, underwriting intelligence, and loan performance forecasting within a regulated financial environment. Built production-scale cloud platforms integrating predictive modeling, Retrieval-Augmented Generation (RAG), and agent-driven automation to enable explainable, audit-ready decision support across underwriting and servicing operations.
Key Responsibilities:
Integrated predictive modeling, RAG pipelines, and agent-driven automation into a unified AI-driven decision intelligence framework supporting underwriting and servicing operations.
Developed scalable data ingestion and transformation pipelines using Amazon S3, AWS Glue, Athena, and PySpark to process structured loan datasets and unstructured mortgage documents at enterprise scale.
Implemented Retrieval-Augmented Generation (RAG) workflows using Amazon Bedrock embeddings with vector stores in FAISS and Pinecone to enable citation-backed semantic search across policy documents and loan files.
Applied large language models (Claude, Llama via Bedrock, OpenAI through API) for document summarization, guideline interpretation, and exception analysis to streamline underwriting and servicing review workflows.
Fine-tuned transformer-based models (BERT) for document classification and policy tagging to improve grounding accuracy and reduce hallucination risk in downstream GenAI workflows.
Built and deployed credit risk and delinquency prediction models (XGBoost, Logistic Regression) in Amazon SageMaker, incorporating SHAP-based explainability to support model governance and regulatory transparency.
Exposed ML models and GenAI services via FastAPI-based microservices and RESTful APIs, enabling secure integration with underwriting portals, case management systems, and downstream enterprise applications.
Integrated AI risk scores and document-grounded GenAI outputs into AI-assisted workflows using event-driven architectures with AWS Lambda, API Gateway, SQS, and Step Functions.
Designed multi-step AI agent workflows using LangChain and LangGraph to perform document validation, policy cross-referencing, and conditional action recommendations with human-in-the-loop approvals.
Orchestrated scalable and stateful agent execution using ECS/Fargate, DynamoDB, and S3, supporting concurrent processing of high-volume loan review scenarios.
Established model monitoring and evaluation frameworks using SageMaker Model Monitor and CloudWatch to track data drift, hallucination risk, grounding accuracy, latency, and inference cost across ML and LLM workloads.
Implemented CI/CD-driven MLOps pipelines for automated training, containerized deployment (Docker), version control, and infrastructure automation, ensuring reproducibility and controlled model lifecycle management.
Tech stack: Python, SQL, PySpark | AWS (S3, Glue, Athena, Redshift, SageMaker, Bedrock, Lambda, API Gateway, ECS/Fargate, Step Functions, SQS, DynamoDB, CloudWatch) | OpenAI SDK | FAISS, Pinecone | BERT | LangChain, LangGraph | SHAP | FastAPI | Docker | CI/CD
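As an illustration of the retrieval step behind the RAG workflows described above, the following is a minimal cosine-similarity sketch in plain Python. It is a generic teaching example, not Freddie Mac's implementation; the document IDs and three-dimensional "embeddings" are hypothetical stand-ins for Bedrock embeddings held in FAISS or Pinecone.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, doc_vecs, k=2):
    # Rank document embeddings by similarity to the query embedding and
    # return the top-k (doc_id, score) pairs -- the core retrieval step
    # of a RAG pipeline, independent of which vector store holds them.
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

# Toy embeddings (hypothetical): real systems use high-dimensional
# model-generated vectors, but the ranking logic is identical.
docs = {
    "policy_a": [0.9, 0.1, 0.0],
    "policy_b": [0.1, 0.9, 0.0],
    "loan_memo": [0.8, 0.2, 0.1],
}
top = retrieve([1.0, 0.0, 0.0], docs, k=2)
```

A production system would add the citation step: each returned `doc_id` is passed to the LLM alongside the query so generated answers can be grounded in, and attributed to, specific documents.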

HCA Healthcare, Nashville, TN | May 2023 – Oct 2024
Data Scientist/AI Engineer
Project: Designed and delivered cloud-based AI/ML solutions integrating EMR, claims, lab, and operational datasets to improve readmission risk prediction, patient stratification, and staffing optimization. Built scalable ML, NLP, and GenAI systems across hybrid AWS/Azure environments to enhance clinical decision support and operational efficiency.
Key Responsibilities:
Built and deployed patient risk scoring, 30-day readmission prediction, and length-of-stay forecasting models using XGBoost, TensorFlow, and Scikit-learn to improve proactive care management.
Developed transformer-based NLP pipelines (BERT, Hugging Face) to extract structured insights from clinical notes, discharge summaries, and physician documentation.
Applied parameter-efficient fine-tuning techniques (LoRA, QLoRA) to adapt transformer models for clinical text summarization and risk categorization, improving domain relevance while optimizing GPU memory usage.
Implemented Retrieval-Augmented Generation (RAG) workflows over clinical guidelines and treatment protocols to provide context-aware recommendations while reducing hallucination risks in AI-assisted summaries.
Engineered hybrid data ingestion pipelines using Azure Data Factory and AWS Glue to integrate EMR, claims, and lab datasets into Azure Blob Storage and Amazon S3 data lakes.
Trained and deployed ML models using Amazon SageMaker and Azure Machine Learning, supporting experimentation, validation, and production inference across cloud environments.
Containerized and deployed inference services using Docker, Kubernetes, and Azure Kubernetes Service (AKS) to enable scalable real-time and batch model serving.
Exposed risk scores and NLP outputs through secure FastAPI-based REST services integrated with EHR-linked enterprise applications.
Established MLOps and monitoring frameworks using MLflow and Azure DevOps pipelines, with drift detection and automated retraining workflows to maintain model stability across evolving patient populations.
Ensured HIPAA compliance through encrypted storage (Azure Key Vault, AWS KMS), role-based access controls, and secure inference endpoints.
Designed Spark-based feature engineering workflows to process high-volume clinical and operational datasets efficiently.
Tech stack: Python, SQL, PySpark | AWS (S3, Glue, Athena, SageMaker, KMS) | Azure (Azure Data Factory, Azure Machine Learning, Azure Blob Storage, Azure Key Vault, AKS, Azure DevOps) | XGBoost, TensorFlow, PyTorch, Scikit-learn | Hugging Face, BERT, LoRA, QLoRA | OpenAI APIs, LangChain, RAG | MLflow | Docker, Kubernetes | FAISS | Snowflake | FastAPI | CI/CD
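The drift-detection step mentioned above can be sketched with a Population Stability Index (PSI) check, a standard technique for comparing a live feature distribution against its training baseline. This is a generic plain-Python illustration under synthetic data, not HCA's actual monitoring code; the 0.2 threshold is a common rule of thumb, not a universal standard.

```python
import math

def psi(expected, actual, bins=4):
    # Population Stability Index between a baseline sample and a live
    # sample: bucket both on the baseline's quantile edges, then sum
    # (a - e) * ln(a / e) over bucket proportions. PSI above ~0.2 is
    # commonly read as significant drift warranting retraining.
    srt = sorted(expected)
    edges = [srt[len(srt) * i // bins] for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        # Smooth zero buckets so the log term stays finite.
        return [(c + 1e-6) / len(sample) for c in counts]

    return sum((a - e) * math.log(a / e)
               for e, a in zip(proportions(expected), proportions(actual)))

# Synthetic feature values: a stable population vs. a shifted one.
baseline = [float(i) for i in range(100)]
shifted = [x + 50.0 for x in baseline]
```

In a monitoring pipeline this check runs per feature on each scoring batch; a PSI breach triggers the automated retraining workflow rather than an immediate model swap.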

Market America, Greensboro, NC | Jul 2022 – Mar 2023
Data Scientist
Project: Delivered scalable machine learning and NLP solutions for e-commerce customer analytics, search optimization, and recommendation systems. Built high-volume data processing and ranking pipelines on AWS to improve product discovery, personalization, and customer engagement.
Key Responsibilities:
Designed collaborative filtering and matrix factorization-based recommendation systems to enhance product discovery and reduce cold-start issues.
Implemented traditional NLP pipelines using TF-IDF, BM25 ranking, and N-gram features to improve search precision and query matching accuracy.
Integrated transformer-based models (BERT) for contextual embeddings and product classification, enhancing semantic relevance in search and recommendation workflows.
Built customer segmentation, churn prediction, and purchase propensity models using XGBoost, Random Forest, and Logistic Regression to support targeted marketing and personalization strategies.
Engineered large-scale PySpark ETL pipelines processing 500M+ records per day from Amazon S3 into Apache Iceberg tables queried via Athena.
Developed scalable feature engineering frameworks leveraging clickstream and transactional datasets to improve downstream model performance.
Deployed ML and NLP models as Docker-based microservices exposed via REST APIs to support real-time recommendation and ranking services.
Conducted A/B testing experiments to evaluate improvements in search relevance, recommendation quality, and customer engagement metrics.
Tech stack: Python, SQL, PySpark | AWS (S3, EC2, EMR, Athena) | Scikit-learn, Spark MLlib, XGBoost | TF-IDF, BM25, BERT | Elasticsearch / OpenSearch | Apache Iceberg | Docker | REST APIs | CI/CD
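The BM25 ranking mentioned in the search pipeline above can be sketched from first principles. This is the standard Okapi BM25 formula in plain Python, shown for illustration only; the toy product documents and query are hypothetical, and production search engines (Elasticsearch/OpenSearch) implement the same scoring internally.

```python
import math

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    # Okapi BM25: score each document by query-term frequency with
    # saturation (k1) and document-length normalization (b).
    # `docs` is a list of token lists.
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    scores = [0.0] * N
    for term in query_terms:
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        for i, d in enumerate(docs):
            tf = d.count(term)
            denom = tf + k1 * (1 - b + b * len(d) / avgdl)
            scores[i] += idf * tf * (k1 + 1) / denom
    return scores

# Hypothetical product titles standing in for a catalog index.
docs = [
    "red running shoes".split(),
    "blue running jacket".split(),
    "red wine glass".split(),
]
scores = bm25_scores("red shoes".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Hybrid search systems typically combine a lexical score like this with the transformer-based embedding similarity described above, so exact keyword matches and semantic matches both contribute to the final ranking.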

Baird Financial Services, CA | Nov 2020 – Jun 2022
Data Scientist
Project: Worked on developing advanced analytics and machine learning solutions to enhance customer insights, credit risk assessment, and fraud detection across multiple financial product lines. The role focused on building scalable data pipelines, predictive models, and decision intelligence systems using cloud-based platforms and modern ML frameworks.
Key Responsibilities:
Designed and implemented scalable data pipelines for customer financial and transactional data using Python and Azure Data Factory, enabling predictive modeling and real-time analytics.
Built machine learning models for credit scoring, delinquency prediction, and fraud detection using Scikit-learn and PyTorch, deployed via Azure ML for production inference.
Applied advanced feature engineering and statistical analysis (ANOVA, correlation, regression) to uncover behavioral and demographic factors impacting credit performance.
Partnered with product and risk teams to enhance financial planning models, supporting segmentation and portfolio optimization across retail, mortgage, and commercial lending.
Conducted A/B testing and hypothesis testing to evaluate marketing and underwriting strategies, measuring impact on customer acquisition and default rates.
Developed and optimized SQL stored procedures, views, and analytical queries in SQL Server to support BI, compliance, and credit risk reporting.
Automated recurring reports and performance metrics using Python scripts and Azure Data Factory, reducing manual effort and reporting latency.
Collaborated on R&D initiatives using OpenCV and computer vision for check deposit fraud detection leveraging image-based analytics.
Tech stack: Python, SQL | Pandas, NumPy, SciPy, Statsmodels | Scikit-learn, PyTorch | Azure (Azure Machine Learning, Azure Data Factory, Azure DevOps) | SQL Server | OpenCV | Tableau | Seaborn, Matplotlib | Git | Windows, Linux
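The A/B and hypothesis testing described above commonly reduces to a two-proportion z-test when the metric is a conversion or default rate. The sketch below is a textbook version in plain Python with made-up counts, not Baird's methodology; the pooled-proportion standard error is the standard formulation for this test.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    # Two-sided z-test comparing the success rates of two variants,
    # using the pooled proportion for the standard error.
    # Returns (z statistic, two-sided p-value).
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Standard normal survival function via the error function.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical campaign results: 120/1000 vs 150/1000 conversions.
z, p = two_proportion_ztest(120, 1000, 150, 1000)
```

At conventional significance (alpha = 0.05) this example sits near the decision boundary, which is exactly the situation where pre-registered sample sizes and power analysis matter most.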

Equinix, Inc., Redwood City, CA | Oct 2018 – Oct 2020
Data Scientist / ML Engineer
Project: Developed and operationalized machine learning solutions to optimize data center operations, predictive maintenance, and energy efficiency across Equinix's global infrastructure using large-scale telemetry and IoT data.
Key Responsibilities:
Built predictive maintenance and time-series forecasting models using sensor and telemetry data to anticipate equipment failures and optimize energy consumption.
Developed end-to-end ML pipelines using Python, Spark MLlib, and TensorFlow, supporting batch and near real-time inference workloads.
Applied unsupervised learning and statistical techniques for anomaly detection, improving system reliability and reducing unplanned downtime.
Implemented scalable ETL workflows using PySpark and Airflow to process multi-terabyte operational and IoT datasets.
Implemented model versioning, retraining, and performance tracking using MLflow, ensuring reproducibility and controlled model lifecycle management.
Integrated ML pipelines with AWS services (S3, EC2, Lambda) and automated deployments using Kubernetes and CI/CD pipelines.
Built executive dashboards and operational reports in Power BI and Tableau to visualize infrastructure KPIs, anomaly trends, and energy efficiency metrics.
Tech stack: Python, PySpark | TensorFlow, Scikit-learn | Spark MLlib | MLflow, Airflow | Docker, Kubernetes | AWS (S3, EC2, Lambda, SageMaker) | Power BI, Tableau | Git | Linux
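A minimal version of the unsupervised anomaly detection applied to telemetry above is a rolling z-score check: flag any reading that deviates sharply from its own trailing window. This plain-Python sketch uses synthetic sensor data and an assumed window/threshold; real deployments would tune both and typically layer model-based detectors on top.

```python
import math
from collections import deque

def rolling_zscore_anomalies(series, window=10, threshold=3.0):
    # Flag indices whose value deviates from the trailing window's mean
    # by more than `threshold` standard deviations -- a simple
    # unsupervised anomaly detector for streaming telemetry.
    buf = deque(maxlen=window)
    anomalies = []
    for i, x in enumerate(series):
        if len(buf) == window:
            mean = sum(buf) / window
            std = math.sqrt(sum((v - mean) ** 2 for v in buf) / window)
            if std > 0 and abs(x - mean) / std > threshold:
                anomalies.append(i)
        buf.append(x)
    return anomalies

# Synthetic readings: a steady sensor with one injected spike at index 15.
readings = [20.0 + 0.1 * (i % 3) for i in range(30)]
readings[15] = 45.0
anomalies = rolling_zscore_anomalies(readings)
```

Note the trade-off baked into the trailing window: once the spike enters the buffer it inflates the local standard deviation, temporarily desensitizing the detector, which is one reason production systems often pair this with robust statistics (median/MAD) or model-based scores.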

Amtrak, Washington, DC | Jun 2016 – Aug 2018
Data Scientist
Project: Worked on data-driven optimization initiatives focused on improving passenger experience, predictive maintenance, and operational efficiency across Amtrak's national rail network. The role involved developing machine learning models and analytical dashboards that provided insights into ridership trends, asset reliability, and schedule performance.
Key Responsibilities:
Collected, cleansed, and processed structured and unstructured datasets from ticketing systems, IoT sensors, and customer feedback sources.
Built predictive models for train delay forecasting, maintenance scheduling, and demand forecasting using Python, Scikit-learn, and Spark MLlib.
Implemented classification and regression models to identify factors contributing to delays and optimize on-time performance.
Designed and automated ETL workflows using Python, PySpark, and SQL for integrating data from multiple internal systems.
Conducted feature engineering and model validation to improve accuracy, recall, and F1 scores for predictive models.
Created interactive dashboards and visual analytics in Power BI and Tableau for operations and executive leadership teams.
Partnered with infrastructure and operations teams to embed ML insights into business decision workflows.
Utilized AWS S3, Lambda, and EC2 for scalable data processing and model execution.
Performed root cause analysis and scenario simulations to support cost optimization and maintenance planning.
Tech Stack: Python, PySpark, SQL, R | Scikit-learn, TensorFlow | Spark MLlib | AWS (S3, EC2, Lambda) | Tableau, Power BI | Hadoop | Git | Jupyter | Linux
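The model-validation metrics cited above (accuracy, recall, F1) are worth stating precisely, since F1 is the harmonic mean of precision and recall rather than a simple average. The sketch below computes F1 for a binary delay classifier from first principles; the label vectors are made-up examples, and libraries like Scikit-learn provide the same metric as `f1_score`.

```python
def f1_score(y_true, y_pred):
    # Precision, recall, and F1 for binary labels, computed from the
    # confusion-matrix counts: tp (both 1), fp (predicted 1, actually 0),
    # fn (predicted 0, actually 1).
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical delay labels: 1 = delayed, 0 = on time.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
f1 = f1_score(y_true, y_pred)
```

For delay prediction the class balance is usually skewed toward on-time trips, which is why F1 (or recall at a fixed precision) is a more honest report than raw accuracy.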

MFS Investment Management, Boston, MA | Jun 2014 – May 2016
NLP Engineer
Project: Developed NLP and predictive analytics solutions to extract insights from unstructured financial data, enabling faster investment research, sentiment analysis, and portfolio decision-making.
Key Contributions:
Built NLP pipelines for text ingestion, cleaning, tokenization, and named-entity recognition using Python (NLTK, spaCy) and Scikit-learn.
Developed sentiment analysis models to evaluate company outlooks and financial narratives, improving the speed and consistency of analyst evaluations.
Implemented topic modeling (LDA, TF-IDF) and clustering algorithms to identify key market drivers from news feeds and internal research documents.
Designed and executed predictive models to forecast asset movements based on historical market indicators and text-based sentiment patterns.
Integrated structured and unstructured data from multiple sources (Bloomberg, internal reports, and web APIs) into centralized analytics repositories.
Created visualization dashboards in Tableau and matplotlib to communicate insights and trends to investment strategists and risk teams.
Worked closely with portfolio managers and data engineering teams to operationalize models for near-real-time use in investment workflows.
Conducted model validation, performance tuning, and documentation to ensure transparency and reproducibility of analytical results.
Tech Stack: Python, R | Pandas, NumPy | NLTK, spaCy | Scikit-learn, TensorFlow | Hadoop, Spark | AWS (EC2, S3) | SQL Server, Oracle | Tableau | REST APIs | Bloomberg Data Feeds | Git | Jupyter | Linux/Unix
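The sentiment analysis work described above, at its simplest, starts from a lexicon-based baseline before moving to learned models. The sketch below scores financial text with a tiny hand-built polarity lexicon; both the word lists and the sample sentences are hypothetical illustrations, far smaller than the finance-specific lexicons (or classifiers) a real pipeline would use.

```python
# Tiny hypothetical polarity lexicons; real systems use finance-tuned
# lexicons or supervised models trained on labeled analyst text.
POSITIVE = {"growth", "beat", "strong", "upgrade", "record"}
NEGATIVE = {"loss", "miss", "weak", "downgrade", "lawsuit"}

def sentiment_score(text):
    # Net lexicon polarity normalized by token count:
    # (+1 per positive hit, -1 per negative hit) / total tokens.
    tokens = [t.strip(".,").lower() for t in text.split()]
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

bullish = sentiment_score("Record revenue and strong growth beat estimates.")
bearish = sentiment_score("Earnings miss and downgrade signal weak demand.")
```

A baseline like this is transparent and auditable, which matters in investment research; its scores also make a useful sanity-check feature alongside model-based sentiment.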

Landmark Systems & Solutions | Nov 2010 – Dec 2013
Data Analyst
Project: The project focused on supporting infrastructure and finance operations through data-driven reporting and MIS frameworks. It involved consolidating large volumes of financial, operational, and transactional data from multiple internal systems to enable management reporting, performance tracking, and decision support.
Key Contributions:
Designed and maintained MIS and management reports using structured datasets generated through ETL-based reporting workflows.
Performed data extraction, transformation, and loading (ETL) from source systems into reporting tables using SQL queries, Excel-based transformations, and batch processing techniques.
Built and optimized reporting datasets and summary tables to support finance, infrastructure, and operations teams.
Developed automated reports using Excel (advanced formulas, pivot tables, VBA macros) to streamline recurring reporting cycles.
Conducted data validation, reconciliation, and quality checks across source and target datasets to ensure reporting accuracy.
Analyzed financial and operational data to identify trends, variances, and performance gaps, supporting budgeting and audit activities.
Collaborated with stakeholders to gather reporting requirements and translate them into ETL logic and reporting outputs.
Supported ad-hoc analysis and data requests by creating custom extracts and summary reports.
Tech Stack: Excel (Advanced), VBA | SQL | ETL Reporting | MIS Reporting & Dashboards | Data Validation & Reconciliation


Education: Bachelor of Computer Science, Lovely Professional University, India.
