| Nitish - Data Scientist |
| [email protected] |
| Location: Auburn, Georgia, USA |
| Relocation: Yes |
| Visa: H1B |
| Resume file: Nitish_L_Resume_1761672379395.docx |
|
Nitish L
Email: [email protected] | Phone: 214-810-7089

Professional Summary:
- IT professional with 9 years of experience as a Data Scientist and a proven track record of transforming data into actionable insights that drive strategic business decisions across the Healthcare, Pharma, Finance, Marketing, and Retail domains.
- Skilled in designing and implementing end-to-end data science solutions on cloud platforms such as AWS and Azure, delivering cost savings of 20% and performance improvements of 25%.
- Strong expertise in data analysis, machine learning, and statistical modeling to solve complex business problems and improve operational efficiency.
- Proficient in leveraging advanced analytics and data visualization tools to communicate insights effectively and enhance decision-making.
- Led cross-functional teams in developing innovative data-driven solutions, enhancing customer experiences and increasing revenue growth by 15%.
- Adept at implementing data governance and compliance measures, ensuring data quality and regulatory adherence and achieving 100% compliance.
- Experienced in optimizing data delivery architectures, resulting in a 20% improvement in project efficiency and collaboration across the marketing, sales, and business analytics domains.
- Proven ability to manage and streamline data pipelines and infrastructure, enhancing data accuracy and availability for real-time analysis.
- Strong communication and problem-solving skills, with the ability to present complex data insights to stakeholders and drive strategic initiatives, contributing to more efficient strategic decision-making.
- Implemented CI/CD workflows for machine learning models, reducing deployment time by 20% and ensuring consistent model performance.
- Developed interactive visualizations and dashboards using Tableau and Power BI, improving data accessibility and user engagement by 30%.
- Leveraged MLOps practices for scalable model management and monitoring, increasing model reliability and reducing maintenance effort by 25%.
- Set up automated model monitoring and alerting systems, ensuring continuous model accuracy and performance and reducing downtime by 40%.

Technical Skills:
Languages & Models: Python, R, PySpark.
Classification: Logistic Regression, Decision Trees, Random Forest, Gradient Boosting techniques, Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), Naive Bayes, Neural Networks, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM).
Regression: Linear Regression, Polynomial Regression, Stepwise Regression, Decision Tree Regression, Random Forest Regression, Support Vector Regression, Ridge & Lasso Regression, Elastic Net, Principal Components Regression (PCR), Partial Least Squares (PLS) Regression, Bayesian Linear Regression.
Time Series: Autoregressive Integrated Moving Average (ARIMA), Seasonal-Trend Decomposition of Time Series (STL), Exponential Smoothing (ETS), Seasonal ARIMA (SARIMA), Seasonal ARIMA with Exogenous Regressors (SARIMAX), Vector Autoregression (VAR), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), State Space Models, Facebook Prophet, Kalman Filters, Holt-Winters' Method.
NLP (Natural Language Processing) & Large Language Models (LLMs): Tokenization, Lemmatization, Stemming, Named Entity Recognition (NER), Part-of-Speech (POS) Tagging, Text Classification, Sentiment Analysis, Topic Modeling (LDA, LSA), Word Embeddings (Word2Vec, GloVe, FastText), BERT (Bidirectional Encoder Representations from Transformers), GPT (Generative Pre-trained Transformer), Transformer Models, Seq2Seq (Sequence-to-Sequence) Models, Attention Mechanisms, PyTorch, TensorFlow.
Machine Learning & Data Analysis Packages: Pandas, NumPy, SciPy, Scikit-learn, TensorFlow, Keras, XGBoost, LightGBM, CatBoost, Matplotlib, Seaborn, Plotly, StatsModels, Prophet, NetworkX, Graph-tool, DGL (Deep Graph Library), BeautifulSoup, Scrapy, Selenium.
Databases: Snowflake, SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra, SAP HANA.
Version Control Tools: GitHub, Azure DevOps, AWS CodeCommit, Bitbucket.
Cloud Technologies: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), IBM Cloud, Databricks, Kubernetes, Docker.
AWS: Amazon S3, Amazon EC2, Amazon EMR, Amazon Redshift, Amazon Athena, Amazon SageMaker, AWS Glue, AWS Lambda.
Azure: Azure Blob Storage, Azure Virtual Machines, Azure HDInsight, Azure Synapse Analytics, Azure Databricks, Azure Machine Learning, Azure Data Factory, Azure Functions.
GCP (Google Cloud Platform): Google Cloud Storage, Google Compute Engine, Google Cloud Dataproc, BigQuery, AI Platform (formerly Cloud Machine Learning Engine), Cloud Dataflow, Cloud Dataprep, Cloud Functions.
Databricks: Databricks Workspace, Databricks Runtime, Delta Lake, MLflow, Databricks Connect.
Machine Learning Deployment: MLflow (Tracking, Projects, Models, Registry), Kubernetes, Docker, Flask, TensorFlow Serving, TorchServe, Amazon SageMaker, Azure Machine Learning, Google Cloud AI Platform, Apache Airflow, Kubeflow, AWS Lambda, Azure Functions, Google Cloud Functions, Heroku, Apache Kafka, Apache Flink, Apache Spark.
BI/Reporting Tools: Advanced MS Excel, Tableau, Power BI, Amazon QuickSight, Google Data Studio, QlikView/Qlik Sense, Looker (Google Cloud), IBM Cognos Analytics, SAP Business Intelligence.
Operating Systems: Windows, Linux, Unix.

Professional Experience:

Verizon Connect, Remote | June 2023 - Current
Senior Engineer, Gen AI/ML
Responsibilities:
- Reduced clinical documentation drafting time by 40-50% by spearheading the development and deployment of a GenAI-powered Clinical Documentation Assistant on Azure.
- Engineered a Python and FastAPI backend supporting enterprise-level GenAI applications, handling 1,000+ requests daily and scaling to 100 concurrent users within Optum's secure Azure environment.
- Improved information retrieval accuracy by implementing an advanced RAG pipeline on Azure, leveraging Azure OpenAI Service (GPT-4) and Azure Cognitive Search across a proprietary clinical knowledge base of over 15,000 curated medical documents.
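A RAG pipeline of the kind described above blends lexical and vector search over a document store before handing the top passages to the LLM. A minimal, self-contained sketch in plain Python (the toy bag-of-words "embedding", sample documents, and scoring weights are hypothetical stand-ins for Azure Cognitive Search and Azure OpenAI embeddings):

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real pipeline would call an
    # embedding model (e.g. Azure OpenAI) instead.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def keyword_score(query, doc):
    # Fraction of query terms that appear verbatim in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_search(query, docs, k=2, alpha=0.5):
    # Blend vector and lexical scores; alpha weights the vector side.
    qv = embed(query)
    scored = [(alpha * cosine(qv, embed(d))
               + (1 - alpha) * keyword_score(query, d), d) for d in docs]
    return [d for _, d in sorted(scored, reverse=True)[:k]]

docs = [
    "hypertension treatment guidelines for adults",
    "pediatric asthma inhaler dosage notes",
    "adult hypertension medication dosage reference",
]
print(hybrid_search("hypertension dosage", docs, k=2))
```

In production the top-k passages returned here would be inserted into the LLM prompt as grounding context; the blending weight is a tuning knob, not a fixed constant.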
- Enhanced AI workflow modularity and control by developing a multi-agent system with LangChain/LangGraph, orchestrating 4 core AI agents responsible for distinct information retrieval, content generation, and compliance-formatting tasks.
- Increased LLM output relevance and consistency by developing and optimizing over 30 distinct prompt strategies and templates, aligned with communication standards and branding.
- Achieved a 25% improvement in first-pass acceptance rates of AI-generated drafts through a comprehensive evaluation framework incorporating automated metrics and feedback from a panel of 15+ clinical domain experts.
- Ensured 99.9% uptime and high availability for GenAI applications by managing deployment and scaling with Docker and Azure Kubernetes Service (AKS), serving an initial user base of 500+ clinicians and medical staff.
- Applied expertise across 5+ major LLM families (including BERT, GPT-4, and Llama variants) to architect and implement solutions for complex enterprise challenges in a regulated healthcare setting.
- Increased RAG retrieval precision for relevant medical information by over 20% by implementing hybrid search and fine-tuning models on a specialized vocabulary of 3,000+ domain-specific medical terms using parameter-efficient methods such as LoRA and QLoRA.
- Decreased critical LLM hallucination rates by an estimated 60% through targeted prompt engineering, robust context augmentation from the validated document corpus, and iterative refinement based on continuous domain-expert feedback.
- Fine-tuned a GenAI RAG model using Azure Machine Learning and the Ray framework, reducing document retrieval time by 40%; deployed the model on Azure Kubernetes Service (AKS), achieving 99.9% availability and handling over 30K requests per day.
- Established an automated CI/CD pipeline using Azure DevOps, reducing model deployment time by 60%.
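Prompt strategies like those mentioned above are commonly maintained as a versioned template registry rather than inline strings. A minimal sketch, assuming templates keyed by task name (the task names and wording below are illustrative, not the production prompts):

```python
# Hypothetical prompt-template registry; each entry is a plain
# str.format template with named placeholders.
TEMPLATES = {
    "summarize": (
        "You are a clinical documentation assistant.\n"
        "Summarize the following note in plain language:\n{note}"
    ),
    "compliance": (
        "Rewrite the draft below to match the documentation style guide, "
        "keeping all clinical facts unchanged:\n{draft}"
    ),
}

def build_prompt(task, **fields):
    # Fail fast on unknown tasks or missing fields rather than sending
    # a malformed prompt to the model.
    template = TEMPLATES[task]
    return template.format(**fields)

prompt = build_prompt("summarize", note="BP 150/95, started lisinopril 10mg.")
print(prompt)
```

Keeping templates in one registry makes A/B comparison of prompt variants and alignment reviews straightforward, since every prompt the system can emit is enumerable.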
- Integrated Azure Monitor and Log Analytics, improving error detection by 20% and ensuring 24/7 operational resilience.
- Deployed Azure Monitor and Application Insights for model performance tracking, leading to a 10% increase in model accuracy.
- Implemented a Human-in-the-Loop (HITL) feedback loop, enhancing content relevance and reducing false positives by 4%.
- Built a secure data pipeline using Azure Data Factory and Synapse Analytics, enabling real-time processing of 100,000 records per second.
- Integrated vector databases for efficient retrieval, improving response time by 30% while maintaining strict compliance with data governance policies.
- Integrated Azure Cognitive Search, OpenAI, and Cosmos DB, enabling seamless hybrid cloud operations; the system scaled to 5,000 concurrent users, supporting enterprise-level workloads with zero downtime.

MTX Group Inc, Dallas, TX | July 2022 - May 2023
Senior Engineer - Artificial Intelligence
Responsibilities:
- Web-scraped customer feedback data from multiple websites using Python (Beautiful Soup, Scrapy), gathering over 500,000 reviews and comments.
- Stored the scraped data in Azure SQL Database, designing a normalized schema to ensure data integrity and optimize query performance, resulting in 50% faster data retrieval.
- Used R, RStudio, and SparkR for comprehensive statistical analysis and feature engineering, employing a wide array of R packages: dplyr for data manipulation, ggplot2 for data visualization, caret for model training and evaluation, randomForest and xgboost for machine learning, forecast for time series analysis, lme4 for mixed-effects modeling, and rpart for decision tree models.
- Developed and implemented advanced statistical models and machine learning algorithms in R to extract meaningful insights from complex datasets and enhance predictive model performance.
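The review-scraping step described above can be illustrated with only the standard library's html.parser, standing in for the Beautiful Soup/Scrapy stack (the HTML snippet and the "review" class name are hypothetical; real sites vary):

```python
from html.parser import HTMLParser

class ReviewExtractor(HTMLParser):
    # Collects text inside <div class="review"> elements; a stdlib
    # stand-in for the Beautiful Soup/Scrapy extraction described above.
    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "div" and ("class", "review") in attrs:
            self.in_review = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_review = False

    def handle_data(self, data):
        if self.in_review and data.strip():
            self.reviews.append(data.strip())

html = '<div class="review">Great product</div><div class="ad">Buy now</div>'
parser = ReviewExtractor()
parser.feed(html)
print(parser.reviews)
```

At the 500,000-review scale mentioned above, the same per-page extraction would sit behind a crawler with rate limiting and retries, and the results would be bulk-loaded into the normalized Azure SQL schema.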
- Applied robust feature engineering techniques in R to improve model accuracy and interpretability, ensuring the delivery of actionable business insights.
- Leveraged R's rich ecosystem to automate workflows, streamline data processing, and optimize analytical outcomes.
- Developed an NLP framework in R using classical ML models (SVMs, Random Forests, Naive Bayes) and advanced approaches (spaCy, GloVe, BERT, LSTM/RNN) to analyze customer feedback, achieving sentiment classification accuracy of ~95%.
- Integrated Azure SQL Database with Power BI, creating interactive dashboards that provided real-time visibility and reduced decision-making time by 40%.
- Deployed the NLP model on Azure Machine Learning and Azure Kubernetes Service (AKS), ensuring 99% uptime and seamless scalability.
- Employed Git for version control and implemented continuous integration and deployment (CI/CD) with Azure DevOps, reducing deployment time by 30% and improving development efficiency by 20%.
- Improved customer loyalty and NPS by 21%, contributing to $2 million in YoY revenue growth through data-driven insights and strategic recommendations.
- Served as interim Scrum Master, leading agile project management through daily stand-ups and sprint planning sessions, and bridging the development team and stakeholders to keep delivery aligned with business objectives.
Environment: Python, Azure SQL Database, Power BI, PySpark, R, VS Code, Azure Machine Learning, Azure Kubernetes Service (AKS), Azure DevOps.

Amazon, Seattle, WA | Feb 2022 - June 2022
Data Scientist
Environment: AWS, Amazon S3, Amazon Redshift, Amazon QuickSight, SQL, Amazon SageMaker, Python, PySpark, R, Amazon EC2, Model Development, Model Deployment.

Emids Technologies Pvt Ltd, India | April 2019 - Aug 2021
Data Scientist
Responsibilities:
- Engineered a model-based collaborative filtering recommender system using Databricks for data processing and Snowflake for data storage, increasing cart value by 12% and average session time from 1.47 to 3.18 minutes through personalized recommendations.
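Item-based collaborative filtering of the kind described above scores unseen items by their similarity to items a user has already rated. A minimal pure-Python sketch (the rating matrix is made up, standing in for the Databricks/Snowflake pipeline):

```python
import math

# Hypothetical user -> {item: rating} matrix.
ratings = {
    "alice": {"shoes": 5, "socks": 4, "hat": 1},
    "bob":   {"shoes": 4, "socks": 5},
    "carol": {"hat": 5, "scarf": 4},
}

def item_vector(item):
    # An item's vector is the ratings it received, keyed by user.
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    common = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user, k=1):
    seen = set(ratings[user])
    items = {i for r in ratings.values() for i in r}
    scores = {}
    for cand in items - seen:
        # Score unseen items by similarity to what the user already rated.
        scores[cand] = sum(cosine(item_vector(cand), item_vector(s))
                           for s in seen)
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("bob"))
```

A model-based system as described in the bullet would learn latent factors (e.g. via ALS on Databricks) instead of raw cosine over co-ratings, but the scoring-and-ranking shape is the same.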
- Applied advanced feature engineering techniques such as interaction terms and user behavior metrics in Databricks, enhancing data quality and improving model accuracy by 15%.
- Utilized MLflow for experiment tracking, model registry, and deployment, leading to a 20% reduction in deployment time and ensuring reproducibility and streamlined model lifecycle management.
- Conducted A/B testing to evaluate the recommendation system's impact on click-through rates, achieving an 18% increase in CTR and optimizing marketing campaigns, which reduced the marketing budget by ~12%.
- Developed a comprehensive Google Analytics dashboard integrated with data from Snowflake, providing real-time insights into marketing campaign KPIs and enabling an 18% faster response to market trends.
- Implemented continuous monitoring of model performance and system health using Databricks, ensuring sustained model accuracy and reliability and improving overall system uptime by 20%.
Environment: Python, Databricks, Snowflake, MLflow, PySpark, Pandas UDFs, EDA, Data Modeling, Model Development.

Sasken Technologies Ltd, India | May 2016 - March 2019
Data Analyst
Responsibilities:
- Built out the data and reporting infrastructure from the ground up using Tableau and SQL, providing real-time insights into the product, marketing funnels, and business KPIs.
- Analyzed trends, performed forecasting, and tracked KPIs impacting consumer awareness and product management.
- Analyzed and interpreted data from various sources to identify trends, patterns, and anomalies.
- Developed and optimized SQL queries and data manipulation scripts, improving query performance by 25%.
- Generated actionable reports and dashboards using data visualization tools (Tableau, Power BI) for business stakeholders.
- Conducted data quality assessments and cleansing processes, resulting in a 20% increase in data accuracy.
- Collaborated with cross-functional teams to understand data requirements and translate them into analytical solutions.
- Performed ad-hoc data analysis to address specific business questions and challenges.
- Designed and maintained data models for efficient data storage and retrieval.
- Assisted in the development and execution of data-driven marketing campaigns, resulting in a 15% increase in conversion rates.
- Presented data-driven insights and recommendations to key decision-makers, contributing to improved strategies and decisions.
- Created data documentation and data dictionaries for reference and knowledge sharing.
- Collaborated with data engineers to optimize data pipelines for data ingestion and transformation.
Environment: SQL, Tableau, Forecasting, Trend Analysis, EDA, Data Modeling, Ad-hoc Data Analysis, Data Manipulation.

EDUCATION/CERTIFICATIONS
Master's degree in Analytics and Project Management - University of Connecticut, Hartford, CT (2021)
Bachelor of Science - GITAM University, Hyderabad, IN (2016)