
Niharika - Lead Data Scientist/ML Engineer

Location: Maitland, Florida, USA

Relocation: Yes

Visa: H-4 EAD


NIHARIKA Y
Email: [email protected]
Contact No: (585) 505-6972

CAREER SUMMARY
10+ years of experience in data analysis, predictive modeling, and statistical analysis using Python, SAS, and SQL.
Strong expertise in exploratory data analysis, machine learning, and predictive analytics.
Proficient in advanced statistical and data mining techniques, including ANOVA, MANOVA, hypothesis testing (t-test, chi-square), linear and logistic regression, design of experiments, decision trees, multivariate analysis, cluster analysis, time series, and A/B testing, as well as ML and deep learning techniques such as ensemble methods, ANN, CNN, and RNN.
Strong knowledge of big data tools such as Hive, Sqoop, Spark RDD, Spark SQL, and PySpark.
Strong expertise in all phases of machine learning model development and deployment, including containerization, optimization, monitoring, CI/CD automation, and MLflow for seamless model deployment.
Strong knowledge of generative AI methodologies, including fine-tuning and pretraining LLMs, prompt engineering, and multimodal AI applications. Experienced in leveraging GPT, BERT, T5, and diffusion models for text generation, summarization, and AI-powered assistants. Passionate about integrating GenAI solutions into business workflows to drive automation, personalization, and better decision-making.

TECHNICAL SKILLS:

Languages: Python, PySpark, SQL, Spark SQL, SAS, R
Big Data: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, Spark
BI Tools: Tableau, Power BI, Excel & MS Office tools
Databases: Hadoop, PL/SQL, MySQL, SQL Server, Snowflake
MLOps: Jenkins, Docker, Kafka, Kubernetes, Bitbucket, GitHub, Confluence, Kanban
OS: Windows, Unix
Statistical Techniques: Time Series Forecasting, Survival Analysis, Linear & Nonlinear Regression, GLM & GLMM, Logit & Probit Regression, Design of Experiments, Hypothesis Testing, ANOVA & Variants, p-values, Confidence Intervals, PCA, Outlier Detection, Autocorrelation
Packages/Libraries: Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, MLlib, ggplot2, NLP (NLTK, spaCy, Gensim), XGBoost
Machine Learning/AI: Linear Regression, MLR, Logistic Regression, Decision Trees, Random Forest, Boosting Methods, Association Rule Mining, Clustering (K-Means, Hierarchical), Gradient Descent, SVM, KNN, Recommendation Engines, Text Analytics, Time Series (ARMA, ARIMA)
Deep Learning/AI: PyTorch, CNN, RNN, LSTM models using TensorFlow (Keras), GAN models, OpenAI, Hugging Face, Llama 2, KerasNLP, BERT, LangChain, GPT-4, TensorFlow Text
Cloud Compute: AWS, Azure, Databricks, GCP

THESIS WORK/PUBLICATIONS

Published research on structured survey interviewing, developing a qualitative proportional randomized response model to reduce bias in survey responses. (Authors: Niharika Yennum, Dr Stefen A Sedory, Dr Sarjinder Singh) https://www.academia.edu/94339301/Improved_strategy_to_collect_sensitive_data_by_using_geometric_distribution_as_a_randomization_device

CERTIFICATIONS:
Python Programming certification by Udemy
SAS Certified Statistical Business Analyst Using SAS 9 (License No: SBARM003586v9)
SAS Certified Base Programmer for SAS 9 (License No: BP072794V9)
SQL Server certification (MCSA)

EDUCATION:
Master's in Statistical Analytics, Computing and Modeling, Texas A&M University | Jan 2016 - May 2017
Integrated Master's in Mathematical Sciences, University of Hyderabad, India | Jul 2008 - Jul 2013

WORK EXPERIENCE
Lead Data Scientist/ML Engineer:
Be The Match/NMDP (CIBMTR Research Group), MN | Sep 2019 - Jul 2025
Designed and deployed a machine learning model to predict cancer outcomes, enabling physicians to make data-driven clinical decisions.
Implemented regression, bagging (Random Forest), and boosting (XGBoost) techniques to enhance model accuracy, improving the efficiency of clinical decision-making.
Developed and automated ETL pipelines using AWS Glue, integrating data from multiple sources into S3, processing it in Glue, and loading it into an Oracle database.
Worked with high-cardinality healthcare datasets, applying feature engineering techniques such as Principal Component Analysis (PCA) to reduce and optimize the feature set.
Applied k-fold cross-validation to improve model selection and performance by training and validating models on multiple data partitions.
Trained and optimized ML/DL models using TensorFlow and Scikit-learn, improving model accuracy by 12% through hyperparameter tuning and advanced feature selection.
Deployed machine learning models to production using AWS SageMaker to ensure seamless integration.
Created and maintained comprehensive documentation for ML pipelines, deployment processes, and performance metrics, providing clear guidelines and references for the team.

Data Scientist/ML Engineer:
Arizona Department of Education, AZ | Apr 2018 - Sep 2019
Implemented a classification model that predicts student performance from past academic results, helping maximize students' learning productivity.
Collaborated with other data scientists on gathering data from various sources, data validation, preprocessing, and feature engineering.
Provided statistical support to vendors in test construction, item selection, and validation using advanced techniques such as regression and Rasch models.
Performed ANOVA and A/B testing on student clusters to detect bias in performance with respect to the accommodations provided and student demographics.
Reviewed stored procedures for reports and wrote test queries against the source system (SQL Server) to reconcile results with the reports generated from the data mart (Oracle).

Sr. Statistical Analyst:
AMEX, AZ | Aug 2017 - Apr 2018
Assisted in the development of predictive scoring models for credit risk assessment, resulting in a 5% increase in prediction accuracy.
Analyzed large datasets using SQL and SAS, identifying key trends and insights for the credit risk team.
Collaborated with a senior data scientist to improve model performance, achieving a 15% reduction in processing time.
Participated in data cleaning and preprocessing activities, ensuring the accuracy and reliability of model inputs.
Delivered presentations on data findings and model results to stakeholders, facilitating informed decision-making.

Data Scientist/Risk Analyst:
Igreen Systems, India | Jun 2013 - Dec 2015
Developed predictive models, scorecards, and segmentation models that predict customer behavior such as delinquency, payment rate, and profitability for regional banks.
Prepared monthly and ad hoc reports on membership purchasing, lifecycle, and customer engagement, with Tableau visualizations summarizing key insights.
Implemented statistical models to predict customer churn and worked with the marketing team to reduce churn by 4%.
Designed and developed advanced Python and SAS programs to prepare, transform, and harmonize datasets for modeling.
Wrote SQL code to gather data from various sources, transform it for consistency, and aggregate and load it into an analytical data warehouse.
