Nava Chaitanya - Sr. Data Scientist
[email protected]
Location: Dallas, Texas, USA
PROFESSIONAL SUMMARY:
Senior Data Scientist with 12 years of experience delivering end-to-end data solutions across the Healthcare, Finance, Retail, Telecom, E-commerce, and Manufacturing domains. Started as a Data Analyst, building strong foundations in data collection, cleaning, and reporting, and progressed to designing and deploying advanced machine learning and AI models. Adept at translating complex business problems into actionable insights, optimizing operational workflows, and enabling data-driven decision-making. Proficient in a wide range of technologies and tools, including Python, R, SQL, Java, Big Data frameworks (Hadoop, Spark, Databricks), cloud platforms (AWS, Azure), ML/DL libraries (TensorFlow, PyTorch, Keras, scikit-learn), NLP (spaCy, NLTK, Transformers), visualization tools (Tableau, Power BI), and MLOps tools (Docker, Kubernetes, MLflow, Airflow). Experienced in building predictive, classification, clustering, and recommendation models, applying statistical methods, dimensionality reduction, and deep learning techniques to deliver practical business solutions. Demonstrated expertise in managing multiple clients and projects, delivering solutions for 5 to 7 major clients across diverse industries. Skilled at mentoring junior data scientists, deploying scalable machine learning models, automating ETL pipelines, and integrating cloud-based solutions to improve productivity and business outcomes. Known for strong analytical thinking, problem-solving skills, and the ability to communicate technical concepts to non-technical stakeholders.
TOOLS AND TECHNOLOGIES:
Languages: SQL, Python, Java, JavaScript, jQuery, ReactJS, Next.js, HTML, CSS, C, C++, Angular, R, Impala, Hive
Statistical Methods: Hypothesis Testing, ANOVA, Time Series, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Autocorrelation
Artificial Intelligence/Machine Learning: Regression Analysis, Bayesian Methods, Decision Trees, Random Forests, Support Vector Machines, Neural Networks, Sentiment Analysis, K-Means Clustering, KNN, Ensemble Methods
R Packages: dplyr, sqldf, data.table, randomForest, gbm, caret
Big Data: Hadoop, Spark
Python Packages: NumPy, SciPy, Pandas, Matplotlib, Seaborn, scikit-learn, Requests, urllib3, NLTK, Pillow, Pytest
Deep Learning: CNN, RNN, ANN, Reinforcement Learning, Transfer Learning, TensorFlow, PyTorch, Keras
Python Frameworks: Django, Flask
Methodologies: SDLC, Agile/Scrum, TDD, BDD
Databases: SQL, MySQL, MongoDB, Oracle
Cloud: AWS - EMR, EC2, ENS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, SQS, DynamoDB
BI/Analysis Tools: SAS, Stata, Tableau, Power BI, Docker, Git, SAP, MS Office Suite, Anaconda, SSIS
Data Modelling: Snowflake Schema, Star Schema
Reporting Tools: Tableau, Power BI
Operating Systems: Windows, Linux

EDUCATION:
Bachelor's in Computer Science and Engineering, Osmania University, India - Mar 2011
Master's in Computer Science, University of Houston-Clear Lake, USA - Dec 2012

PROFESSIONAL EXPERIENCE

Role: Sr. Data Scientist
Client: ETC (Electronic Transaction Consultants)
May 2023 to Present
Responsibilities:
- Built predictive models to identify toll violations and high-risk fraudulent transactions using Python, R, Random Forest, XGBoost, and neural networks, integrating vehicle profiles, trip history, and payment records; deployed models on AWS SageMaker.
- Developed NLP pipelines with spaCy, NLTK, and Hugging Face Transformers to extract structured insights from unstructured customer service logs and license plate data, processing large datasets with Spark and storing results in Snowflake and AWS S3.
- Created interactive dashboards in Tableau and Power BI for transportation authorities to monitor toll revenue, vehicle classifications, and operational KPIs.
- Automated ETL workflows for large-scale tolling data using Spark, Airflow, AWS Glue, and Snowflake to streamline analytics and reporting.
- Containerized ML models and APIs with Docker and deployed them via Kubernetes, ensuring reproducible and scalable mobility analytics pipelines.
- Managed ML experiment tracking, versioning, and monitoring with MLflow to maintain model governance and track predictive performance across tolling systems.
- Built recommendation systems to optimize dynamic pricing and congestion management, leveraging ensemble models and advanced feature engineering.
- Applied dimensionality reduction techniques such as PCA and t-SNE to simplify high-dimensional vehicle and trip datasets for machine learning applications.
- Conducted cross-validation, hyperparameter tuning, and ensemble methods to optimize predictive accuracy for traffic forecasting and violation detection models.
- Processed large-scale toll and traffic datasets using Hadoop, Spark, and Databricks to enable real-time and batch analytics.
- Integrated cloud data sources including AWS S3, RDS, Athena, and DynamoDB to securely store and query transaction data.
- Designed and developed Java-based microservices and React front-end components capable of processing healthcare data encoded with ICD-10 and ICD/CPT codes for clinical and billing workflows.
- Built secure API integrations to consume and transmit HL7 and FHIR messages between Java services and external healthcare systems.
- Integrated LangChain and LangGraph within Java services to orchestrate LLM-powered workflows alongside traditional microservices.
- Developed CrewAI agentic pipelines using Java and Spring Boot to automate complex, multi-step healthcare processes.
- Created AutoGen-driven components within the full-stack application to dynamically generate responses and workflows for end users.
- Mentored junior data scientists, providing guidance on model development, cloud deployment, and MLOps best practices for transportation analytics.
- Ensured compliance with government data regulations and maintained data quality standards across tolling datasets.
- Applied reinforcement learning and transfer learning to optimize traffic routing, congestion mitigation, and toll pricing strategies.
- Developed automated monitoring systems to detect model drift in traffic and violation detection models, alerting teams for timely interventions.

Role: Data Scientist
Client: The Hartford
Aug 2021 to Apr 2023
Responsibilities:
- Built recommendation engines for millions of users using collaborative filtering, content-based methods, LightGBM, and Python, integrated into production via Flask APIs.
- Conducted A/B testing and ML-driven optimization for marketing campaigns using Python, SQL, and Power BI dashboards to visualize results.
- Developed ETL pipelines using Spark, Airflow, and AWS EMR for high-volume clickstream and transaction datasets.
- Designed real-time streaming analytics with Spark Streaming and Kafka for live recommendation updates.
- Applied text mining and sentiment analysis on product reviews using Python NLP libraries for trend detection and customer feedback analysis.
- Built CNN models for image classification to automate product categorization and tagging.
- Automated model retraining, monitoring, and versioning with MLflow to maintain high prediction accuracy.
- Integrated cloud-based data sources including AWS S3, RDS, and Athena for scalable and secure storage.
- Mentored junior data scientists on machine learning, cloud deployment, and data visualization best practices.
- Applied dimensionality reduction, feature engineering, and ensemble methods to optimize model performance.
- Created AutoGen-driven components within the full-stack application to dynamically generate responses and workflows for end users.
- Ensured compliance, auditing, and performance tuning across all modules handling ICD-10, HL7, X12 EDI, and FHIR data within the Java full-stack environment.
- Developed CrewAI agentic pipelines using Java and Spring Boot to automate complex, multi-step healthcare processes.
- Developed interactive dashboards in Tableau and Power BI to visualize user behavior, sales metrics, and inventory performance.
- Collaborated with engineers to deploy models using Docker and Kubernetes for scalable production systems.

Role: Data Scientist
Client: Pfizer | Alpharetta, GA
July 2019 to July 2021
Responsibilities:
- Developed NLP pipelines using spaCy, NLTK, and Transformers to extract entities from clinical notes, storing structured outputs in Snowflake for downstream analytics.
- Built predictive models in Python and R to forecast patient risk and optimize hospital resource allocation.
- Created dashboards in Tableau and Power BI to monitor patient flow, medication usage, and hospital operations.
- Automated ETL pipelines with Spark, AWS Glue, and Snowflake for processing large-scale clinical datasets.
- Applied machine learning techniques such as Random Forest, SVM, and neural networks for patient outcome prediction.
- Conducted cohort analysis to identify patient treatment patterns and improve care strategies.
- Used Databricks and Spark for high-performance processing of clinical and operational datasets.
- Applied dimensionality reduction and feature selection to improve model interpretability and accuracy.
- Built ensemble models combining multiple algorithms for more robust predictions.
- Applied time series models for forecasting medication stock levels and patient inflow.
- Mentored junior data scientists on ML methods, data processing, and cloud deployment best practices.

Role: Data Scientist
Client: UnitedHealth Group | New York, NY
Jan 2017 to Jun 2019
Responsibilities:
- Built fraud detection models using Python, R, and SQL with supervised and unsupervised learning to identify suspicious transactions.
- Developed portfolio risk forecasting models using regression, time series, and ensemble methods to aid investment decisions.
- Created dashboards in Tableau and Power BI to visualize financial KPIs, fraud trends, and portfolio risks.
- Processed large-scale financial datasets using Spark, SQL, and Databricks to improve speed and efficiency.
- Applied cross-validation, hyperparameter tuning, and ensemble methods to improve model accuracy.
- Automated data pipelines and model scoring with Python, Airflow, and AWS Lambda.
- Built unsupervised learning models to detect unusual account behavior for early risk intervention.
- Deployed models securely in cloud environments and ensured compliance with financial regulations.
- Mentored junior analysts and data scientists on ML algorithms, data processing, and visualization.

Role: Data Analyst
Client: Walmart | Bentonville, AR
Feb 2015 to Dec 2017
Responsibilities:
- Built sales forecasting models using ARIMA, Prophet, and Python to improve inventory planning and reduce stockouts.
- Developed recommendation engines for cross-selling using collaborative filtering and content-based methods.
- Optimized inventory allocation using clustering and predictive analytics for better product availability.
- Created dashboards in Tableau to monitor sales, stock levels, and promotion effectiveness.
- Processed large datasets using Spark, SQL, and Snowflake for faster and scalable analysis.
- Automated reporting pipelines with Python and Airflow to save manual effort.
- Applied feature engineering, dimensionality reduction, and ensemble methods for predictive modeling.
- Conducted customer segmentation and pricing optimization using regression and decision tree techniques.

Role: Data Analyst
Client: Verizon Telecom | New York, NY
Feb 2013 to Jan 2015
Responsibilities:
- Built predictive models in Python and R to forecast customer churn and retention, improving retention strategies.
- Developed dashboards in Tableau and Power BI to visualize customer metrics, campaign effectiveness, and network usage.
- Applied statistical methods such as hypothesis testing and ANOVA to understand customer behavior.
- Built dashboards in Tableau and Excel to monitor production KPIs, machine utilization, and workflow efficiency.
- Automated weekly reporting using Python scripts to reduce errors and save time.
- Conducted statistical analysis to identify operational inefficiencies and optimize production processes.
- Built predictive maintenance models using regression and classification techniques to reduce machine downtime.
- Developed visualizations to highlight production bottlenecks and resource allocation.
- Analyzed production datasets using SQL and R to provide actionable insights to engineering teams.
- Developed basic forecasting models to estimate production outcomes and maintenance schedules.
- Collaborated with engineering teams to implement data-driven process improvements.
- Automated reporting processes using Python and SQL for faster and more accurate delivery.
- Analyzed network traffic data using Hive and Impala to optimize resource allocation.
- Built predictive models for call center load forecasting to improve staffing efficiency.
- Collaborated with marketing teams.