| Dhanush - GenAI & AI Engineer |
| [email protected] |
| Location: Columbus, Ohio, USA |
| Relocation: yes |
| Visa: |
| Resume file: DHANUSH_KATA_1777566398358.docx |
Dhanush Kata
GenAI Engineer | LLM & Agentic AI Specialist
[email protected]

PROFESSIONAL SUMMARY
- GenAI Engineer and Python Developer with 6+ years of experience in Generative AI, LLM engineering, agentic AI systems, Python, and full-stack AI application development, delivering production-grade GenAI and LLM-powered solutions across the finance, banking, retail, and enterprise domains.
- Strong expertise in building end-to-end Python-based AI/ML pipelines covering data ingestion, preprocessing, feature engineering, model training, evaluation, deployment, and monitoring (see the sketch after this summary).
- Hands-on experience with ML algorithms such as Random Forest, Gradient Boosting, XGBoost, LightGBM, ARIMA, and neural networks for predictive analytics and forecasting.
- Proficient in Natural Language Processing (NLP), including text classification, embeddings, semantic search, named entity recognition, and transformer-based models.
- Experienced in developing Generative AI applications using OpenAI models, including prompt engineering, text generation, summarization, and intelligent automation.
- Knowledgeable in Retrieval-Augmented Generation (RAG) architectures, vector databases (Pinecone, Chroma, pgvector), and LangGraph/LlamaIndex-based agent workflows for building context-aware AI systems and enterprise copilot solutions.
- Skilled in MLOps practices, including model versioning, experiment tracking, CI/CD pipelines, and workflow orchestration with tools such as MLflow and Apache Airflow.
- Experienced in building LangGraph-based multi-agent systems and copilot applications with tool/function calling, prompt orchestration, and Human-in-the-Loop (HITL) workflows to support enterprise AI automation and user-centric AI experiences.
- Skilled in developing React-based frontend applications that deliver intuitive AI-driven user experiences, including explainability dashboards, feedback interfaces, and conversational AI frontends.
- Experienced in deploying ML/AI models with Docker, Kubernetes, and cloud platforms (Azure, AWS, GCP) for scalable real-time and batch inference systems.
- Strong experience building Python-based RESTful APIs and microservices for LLM inference, agent orchestration, and GenAI workflow integration, connecting AI/ML models to production applications and business workflows.
- Proficient in big data technologies (Spark, PySpark, Hadoop) for large-scale GenAI data pipelines and LLM training data preparation, and in data visualization tools such as Power BI and Tableau for large-scale data processing and insights generation.
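To make the end-to-end pipeline claim above concrete, here is a minimal sketch of a scikit-learn training pipeline; the file name, column names, and model choice are hypothetical placeholders rather than details from any project below.

```python
# Minimal end-to-end tabular ML pipeline sketch (all data details hypothetical).
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Ingest: load raw records (path and schema are illustrative only).
df = pd.read_csv("customers.csv")
X, y = df.drop(columns=["churned"]), df["churned"]

# Preprocess: scale numeric features, one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["balance", "tenure_months"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["segment", "region"]),
])

# Train and evaluate: a single Pipeline keeps preprocessing and the model
# in sync between training and inference.
model = Pipeline([("prep", preprocess), ("clf", GradientBoostingClassifier())])
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```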
TECHNICAL SKILLS
Programming: Python (6+ yrs), SQL, R
AI / ML / Deep Learning: TensorFlow, PyTorch, Keras, Scikit-learn, XGBoost, LightGBM, CatBoost
Generative AI & NLP: OpenAI GPT-4/4o, Claude (Anthropic), Gemini, Prompt Engineering, LangChain, LangGraph, LlamaIndex, AutoGen, CrewAI, Agent Frameworks
NLP & AI Techniques: Text Generation, Embeddings, Semantic Search, Fine-tuning (LoRA/PEFT), RAG, Advanced RAG (HyDE, Re-ranking), Vector DBs (Pinecone, Chroma, pgvector, Weaviate), Tool/Function Calling, MCP Protocols, HITL Workflows, Multilingual NLP, NER
Big Data & Data Engineering: Hadoop, Spark, PySpark, Hive, MapReduce
Cloud Platforms: AWS, Azure, GCP, Azure Machine Learning, Azure Cognitive Services
MLOps & Deployment: Docker, Kubernetes, Kubeflow, MLflow, Airflow, CI/CD
Databases: MySQL, Oracle, PostgreSQL
Web & APIs: React, Node.js, Django, Flask, FastAPI, REST APIs
Visualization & Tools: Tableau, Power BI
AI Observability & Evaluation: MLflow, Langfuse, Prompt Tracking, Model Evaluation Metrics
GenAI Platforms & Tools: Azure OpenAI, AWS Bedrock, Google Vertex AI, Hugging Face, Ollama, GitHub Copilot

PROFESSIONAL EXPERIENCE

Client: Fifth Third Bank, Cincinnati, OH (Sep 2025 - Present)
Role: AI & GenAI Engineer
Project: Banking AI Platform
- Developed a Loan Underwriting Assistant using a Retrieval-Augmented Generation (RAG) setup: all credit policy documents were converted into vector embeddings with the Hugging Face Embeddings API and stored in Pinecone and pgvector so the system could search them instantly; Chroma was used during development for local testing (see the retrieval sketch at the end of this section).
- Added HyDE (which generates a hypothetical answer first to improve search quality) and a LangChain re-ranking layer to sort results by relevance before passing them to the LLM, so underwriters could ask a question in plain English and get a grounded, policy-cited answer in seconds instead of reading hundreds of pages manually.
- Used LangGraph to build a multi-agent workflow with three CrewAI agent roles: a retrieval agent (searches Pinecone/pgvector), a compliance-validation agent (checks the answer against policy rules), and a formatting agent (structures the final response with citations).
- Connected OpenAI GPT-4/4o via Azure OpenAI to generate the final answer from the validated context, using LangChain prompt templates with few-shot examples to keep answers consistent and policy-grounded; also used LlamaIndex for document indexing and AutoGen for orchestrating agent conversations in certain workflow steps.
- Built a Customer Service AI Assistant that handled thousands of daily bank customer queries on accounts, loans, and products; used LangGraph to design a four-agent workflow: a routing agent, a knowledge-retrieval agent (searches Pinecone/pgvector), a live-data agent (uses tool/function calling to query live bank SQL/PostgreSQL databases for real account balances and loan status), and a handoff agent.
- Used Weaviate to store conversation memory across sessions so returning customers had full context continuity; when model confidence dropped, a Human-in-the-Loop (HITL) workflow paused the conversation and passed the full state to a human agent.
- Ran the primary LLM on Azure OpenAI (GPT-4/4o) with AWS Bedrock as a high-availability backup, and used Claude (Anthropic) for compliance-sensitive financial queries where accuracy and careful tone were critical; applied few-shot prompting, chain-of-thought reasoning, and structured output techniques via LangChain prompt templates.
- Used Azure Cognitive Services and Azure Bot Framework for OCR, image classification, intent detection, multilingual NLP, document automation, text summarization, translation, and NER across enterprise chatbot workflows.
- Built Credit Risk Scoring and Customer Churn Prediction models using XGBoost, LightGBM, and ARIMA to support loan approval decisions and proactive customer retention, trained on structured banking data with feature engineering in Scikit-learn, Pandas, and NumPy.
- Used TensorFlow and PyTorch for a custom intent classifier that pre-routed customer messages before LLM calls, reducing latency; added SHAP-based explainability so compliance officers could see exactly why the model made a decision before approving any automated action.
- Built the customer- and underwriter-facing UIs in React with a Node.js backend; monitored LLM response quality, token costs, and latency using Langfuse, and tracked all model versions and prompt experiments in MLflow.
- Built Power BI and Tableau dashboards to show model outputs and business KPIs to bank stakeholders; used GitHub Copilot and AI coding assistants throughout development to speed up writing boilerplate code.
- Deployed everything as Docker containers on Kubernetes via Azure ML, exposed through FastAPI and Flask REST APIs across Azure, AWS, and GCP for geo-redundant serving.

Environment: Python, SQL, Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch, XGBoost, LightGBM, OpenAI GPT-4/4o, Claude (Anthropic), LangChain, LangGraph, LlamaIndex, AutoGen, CrewAI, Azure OpenAI, AWS Bedrock, React, Node.js, Azure ML, Docker, Kubernetes, MLflow, Langfuse, Pinecone, Chroma, pgvector, Weaviate, FastAPI, Flask, GitHub Copilot, Hugging Face, Azure Cognitive Services, Azure Bot Framework, Azure, AWS, GCP
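As a rough illustration of the Loan Underwriting Assistant's retrieval step, the sketch below uses Chroma (the store used for local testing) and the OpenAI Python SDK; the collection name, policy snippets, and prompt are hypothetical, and the production Pinecone/pgvector stores, HyDE step, and re-ranking layer are omitted for brevity.

```python
# Minimal RAG sketch: embed-and-retrieve with Chroma, then answer with an LLM.
# Collection name, documents, and prompt are hypothetical placeholders.
import chromadb
from openai import OpenAI

chroma = chromadb.Client()  # in-memory client, as used for local testing
policies = chroma.create_collection("credit_policies")
policies.add(
    ids=["pol-1", "pol-2"],
    documents=[
        "Loans above $500k require two senior underwriter approvals.",
        "Applicants with DTI over 43% need documented compensating factors.",
    ],
)

question = "When are two approvals required on a loan?"
hits = policies.query(query_texts=[question], n_results=2)
context = "\n".join(hits["documents"][0])

llm = OpenAI()  # reads OPENAI_API_KEY; production traffic went via Azure OpenAI
answer = llm.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer only from the cited policy context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```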
Client: Walmart, California (Aug 2024 - Jun 2025)
Role: AI Engineer
Project: Retail AI & Data Platform
- Set up fully automated end-to-end ML training pipelines on GCP and Azure for Walmart's pricing and supply chain models: raw retail data (transactions, clicks, inventory records) was stored in Hadoop HDFS and queried using Hive SQL (backed by MapReduce), then processed with PySpark on Databricks, where distributed feature engineering ran across terabytes of data in hours.
- Trained Scikit-learn, XGBoost, and LightGBM models on each processed dataset, with all runs tracked in the MLflow Model Registry for version comparison and rollback; managed all cloud infrastructure (clusters, storage, networking) with Terraform on GCP and Azure.
- Orchestrated every pipeline step with Apache Airflow DAGs (data pull, Spark transform, train, evaluate, deploy) running weekly on schedule with automatic alerts on failures (see the DAG sketch at the end of this section); built CI/CD pipelines with Jenkins and GitHub Actions to automatically test and deploy pipeline code on every push.
- Set up a model monitoring and data drift detection system that logged real-time predictions against actual sales in MLflow, computed rolling accuracy metrics (MAE, RMSE, MAPE) with Python, Pandas, and NumPy, stored results in PostgreSQL and MySQL, and automatically triggered Airflow retraining DAGs when models drifted, which was critical during high-traffic periods like Black Friday.
- Developed a Customer Review Sentiment Analysis platform that processed millions of Walmart product reviews automatically, using spaCy for named entity recognition (extracting product names, brands, and specific attributes customers mentioned) and dependency parsing to link each attribute to the sentiment expressed about it.
- Used NLTK for text preprocessing (tokenization, stopword removal, stemming, and n-gram extraction) and ran the full NLP pipeline in parallel across millions of reviews using PySpark on Databricks, cutting batch processing time from days to hours.
- Trained Scikit-learn text classifiers (Logistic Regression, SVM, Naive Bayes) on labeled reviews for positive/negative/neutral scoring, storing all sentiment results and topic summaries in PostgreSQL and MySQL.
- Applied SHAP-based model explainability and fairness audits across customer-facing models for responsible AI compliance; scheduled daily pipeline refreshes with Apache Airflow.
- Deployed all models as Docker containers on Kubernetes on AWS and GCP with REST API endpoints; built Tableau and Power BI dashboards showing accuracy trends, drift alerts, and sentiment insights for category managers and leadership teams.

Environment: Python, Spark, PySpark, Hadoop, Hive, MapReduce, Databricks, Apache Airflow, GCP, Docker, Kubernetes, Terraform, MLflow, CI/CD (Jenkins / GitHub Actions), PostgreSQL, MySQL, Power BI, Tableau, Scikit-learn, NLP (spaCy, NLTK), XGBoost, LightGBM, Pandas, NumPy, SHAP, Azure, AWS, REST APIs
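The weekly orchestration bullet above follows a standard Airflow pattern; below is a minimal, assumed version of such a DAG, with the dag_id, schedule, and task callables invented for illustration (Airflow 2.x API; the schedule argument is named schedule_interval on older versions).

```python
# Sketch of the weekly retraining DAG pattern described above.
# dag_id, task callables, and schedule are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def pull_data():  ...  # query Hive / pull HDFS extracts
def transform():  ...  # submit the PySpark feature-engineering job
def train():      ...  # fit XGBoost/LightGBM, log the run to MLflow
def evaluate():   ...  # compute MAE/RMSE/MAPE against a holdout set
def deploy():     ...  # promote the model version if metrics pass

with DAG(
    dag_id="pricing_model_weekly",
    start_date=datetime(2024, 8, 1),
    schedule="@weekly",
    catchup=False,
) as dag:
    steps = [
        PythonOperator(task_id=name, python_callable=fn)
        for name, fn in [
            ("pull_data", pull_data), ("transform", transform),
            ("train", train), ("evaluate", evaluate), ("deploy", deploy),
        ]
    ]
    # Chain tasks linearly: pull >> transform >> train >> evaluate >> deploy.
    for upstream, downstream in zip(steps, steps[1:]):
        upstream >> downstream
```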
Client: Siemens, Bangalore, India (Sep 2021 - Aug 2023)
Role: AI & ML Engineer
Project: Industrial AI Platform
- Developed a Predictive Maintenance system using Random Forest and XGBoost trained on two years of labeled sensor data (temperature, vibration, pressure, speed) from factory equipment; engineered features including rolling averages, rate-of-change values, vibration spike counts, and cross-sensor correlations using Scikit-learn's preprocessing pipeline.
- Used Statsmodels for time-series decomposition to separate trend and seasonal components from raw sensor signals; performed all exploratory data analysis in Jupyter Notebook with Matplotlib and Seaborn (sensor distribution plots, correlation heatmaps, fault occurrence timelines).
- Evaluated models using Precision, Recall, F1-score, and RMSE, with Grid Search and Random Search hyperparameter tuning.
- Built an automated end-to-end sensor data ingestion and model training pipeline: custom Python connector scripts pulled data from SCADA systems, IoT devices, and PLC logs in different formats; Pandas and NumPy handled cleaning (missing value imputation, outlier removal, timestamp resampling); and Scikit-learn preprocessing pipelines applied feature scaling and a custom sliding-window aggregate transformer.
- Trained Random Forest, Decision Tree, Linear Regression, and SVM models automatically on each new data batch; Matplotlib and Seaborn auto-generated evaluation report charts (precision/recall curves, feature importance plots) saved to the QA folder after every run.
- Also developed production yield and energy consumption forecasting models connected to Siemens' plant management system so operators could see real-time predictions and adjust settings to cut waste; maintained all code and dataset versions in Git with basic CI/CD automation.
- Built production REST APIs with Flask and Django connecting the ML models to Siemens' Manufacturing Execution System (MES) and Quality Management System: Flask handled real-time low-latency prediction endpoints (the MES sent sensor readings as a POST request; the API applied Scikit-learn fitted transformers using Pandas and NumPy, ran XGBoost/Random Forest inference, and returned a fault probability score in milliseconds; see the sketch at the end of this section), while Django REST Framework handled management endpoints (historical logs, model metrics, equipment profiles) backed by PostgreSQL and MySQL.
- Used Statsmodels for a trend analysis endpoint and Matplotlib/Seaborn for on-demand chart generation endpoints, returning base64-encoded sensor images in the JSON response; deployed as Docker containers on GCP.

Environment: Python, Pandas, NumPy, Scikit-learn, Statsmodels, Matplotlib, Seaborn, Flask, Django, REST APIs, MySQL, PostgreSQL, Git, Jupyter Notebook, GCP, ML algorithms (Random Forest, Decision Trees, Linear Regression, SVM, XGBoost), Feature Engineering, Docker, CI/CD basics
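A minimal sketch of the Flask prediction endpoint pattern described above, assuming model artifacts saved with joblib; the artifact paths, route, and payload shape are hypothetical placeholders.

```python
# Sketch of the low-latency Flask prediction endpoint described above.
# Artifact paths, feature names, and route are hypothetical placeholders.
import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
transformer = joblib.load("artifacts/feature_transformer.joblib")  # fitted sklearn pipeline
model = joblib.load("artifacts/fault_model.joblib")                # trained classifier

@app.post("/predict")
def predict():
    # The MES posts one JSON record of raw sensor readings.
    readings = pd.DataFrame([request.get_json()])
    features = transformer.transform(readings)
    fault_probability = float(model.predict_proba(features)[0, 1])
    return jsonify({"fault_probability": fault_probability})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```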
Client: Persistent Systems, Bangalore, India (Jun 2018 - Sep 2021)
Role: Python Developer

Project 1: Enterprise Client Portal
- Developed the full-stack backend for an Enterprise Client Portal using Python and Django with schema-per-tenant separation in PostgreSQL, so each enterprise client had completely isolated data at the database level.
- Built Django models, views, serializers, role-based access control, report generation, and ERP/CRM integration endpoints; used Flask for lightweight internal microservice APIs and exposed all functionality via Django REST Framework REST APIs.
- Built portal pages in JavaScript, HTML, and CSS (forms, data tables, report viewers, and dashboards); used Pandas and NumPy in the report layer for data aggregation and Excel/CSV export.
- Designed and indexed PostgreSQL and MySQL schemas for performance; configured Nginx as a reverse proxy (SSL termination, routing, caching) and Apache for static file serving.
- Monitored production health with Dynatrace, tracking response times, slow queries, error rates, and memory usage against alert thresholds; containerized with Docker (basic) on Linux servers.
- Maintained all code in Git with a CI/CD pipeline for automated testing and deployment; used structured logging analyzed through Dynatrace and Linux performance profiling tools for debugging under load.

Project 2: Data Integration Platform & Healthcare Database Optimization
- Built a Data Integration Platform exposed as unified REST APIs with Django REST Framework and Flask: custom Python connector scripts extracted data from clients' ERP, CRM, and SaaS tools; Pandas and NumPy cleaned and transformed it (field normalization, type conversion, null handling, business rules); and all data landed in PostgreSQL and MySQL under a normalized schema.
- Monitored all API endpoints with Dynatrace (latency, error rates, slow queries); Nginx handled load distribution and rate limiting; deployed on Linux with Apache serving static pages.
- Built JavaScript, HTML, and CSS integration status dashboards; used performance profiling tools to optimize slow transformation steps; relied on Docker (basic) for environment consistency across dev and production; kept all code in Git with CI/CD.
- Redesigned MySQL and PostgreSQL schemas for a healthcare client managing patient records, appointments, lab results, and billing: diagnosed slow queries with Django ORM query inspection and SQL EXPLAIN plans (which revealed full table scans, missing indexes, and redundant joins); restructured PostgreSQL with normalization, date-range partitioning, and composite indexes; and rewrote MySQL stored procedures, replacing nested subqueries with efficient JOINs.
- Ran the data migration ETL in Python with Pandas and NumPy, using batched extraction, transformation, validation, and reload (see the sketch at the end of this section); rebuilt all Django ORM models and updated every dependent Django REST Framework and Flask API endpoint, along with the JavaScript, HTML, and CSS admin UI.
- Monitored post-migration query performance with Dynatrace; previously slow reports now ran in seconds.
- Deployed on Linux with Nginx/Apache, versioned in Git with CI/CD regression tests, and used Docker (basic) throughout.

Environment: Python, Django, Flask, REST APIs, MySQL, PostgreSQL, JavaScript, HTML, CSS, Pandas, NumPy, Git, Linux, Dynatrace, Logging & Monitoring tools, Performance profiling tools, Apache server, Nginx, Docker (basic), CI/CD pipelines
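The migration ETL bullet above follows a chunked extract-transform-load pattern; below is a minimal sketch of that approach with Pandas and SQLAlchemy, where the connection strings, table name, and cleaning rules are all hypothetical.

```python
# Sketch of the batched migration ETL pattern described above.
# Connection strings, table names, and cleaning rules are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

source = create_engine("mysql+pymysql://user:pass@legacy-db/healthcare")
target = create_engine("postgresql+psycopg2://user:pass@new-db/healthcare")

# Extract in fixed-size chunks so memory stays bounded on large tables.
for chunk in pd.read_sql("SELECT * FROM lab_results", source, chunksize=50_000):
    # Transform: normalize field types and drop records failing validation.
    chunk["result_value"] = pd.to_numeric(chunk["result_value"], errors="coerce")
    chunk["collected_at"] = pd.to_datetime(chunk["collected_at"])
    valid = chunk.dropna(subset=["patient_id", "result_value"])

    # Load: append each validated batch into the restructured schema.
    valid.to_sql("lab_results", target, if_exists="append", index=False)
```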
EDUCATION
Master's in Data Science, Lindsey Wilson University, 2025