Rishi - Data Scientist (Gen AI/ML Engineer/AI Architect)
[email protected]
Location: Jersey City, New Jersey, USA
Relocation: Yes
Visa: GC
RISHI
Gen-AI/AI Engineer (Data Scientist)
E-mail: [email protected] | Phone: 210-699-4016

Professional Summary:
- AI Engineer with 11+ years of experience in Python, ML/LLM engineering, and data engineering. Skilled in designing scalable ML pipelines and integrating Google Cloud LLMs, Google Vector Search, and hybrid search methodologies. Expert in AWS SageMaker, Docker, Django, Flask, and distributed team collaboration, with a proven ability to deploy production-grade ML solutions and optimize MLOps workflows for large-scale applications.
- Proven experience collaborating with data scientists, data engineers, and DevOps to build and deploy ML models using AWS SageMaker in hybrid environments. Skilled in SageMaker Pipelines, model monitoring, fine-tuning LLMs, and implementing CI/CD workflows for robust MLOps.
- Experience in Explainable AI (XAI) and production-ready LLM customization for agentic systems.
- Strong hands-on knowledge of Python, PyTorch, TensorFlow, Keras, and NLP frameworks such as SpaCy, NLTK, and Hugging Face Transformers, with deep implementation experience in GPT, LLaMA, Claude, BERT, RoBERTa, and LangChain. Skilled in working with vector databases, LoRA, RAG, PEFT, and knowledge graphs.
- Well-versed in the AWS cloud ecosystem (SageMaker, EC2, Lambda, EMR, S3), with robust experience in Jupyter Notebooks, agentic programming, and building AI solutions such as GraphRAG, Chain of Thought (CoT), and Tree of Thought (ToT).
- Expert in processing structured and unstructured data, deploying models to production, and delivering high-impact AI/ML solutions using advanced data science toolkits.
- Proficient in SQL, Spark, Linux, Git/GitHub, and tools such as Tableau, QuickSight, and Kibana. Adept at translating ambiguous business problems into data-driven solutions and working in collaborative, fast-paced environments.
- 8+ years of hands-on experience and comprehensive industry knowledge of Artificial Intelligence/Machine Learning, statistical modeling, deep learning, data analytics, data modeling, data analysis, data mining, text mining, Natural Language Processing (NLP), and business intelligence.
- Good experience with analytics models such as decision trees and linear and logistic regression, and with Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL, and Erwin.
- Strong knowledge of all phases of the SDLC (Agile/Scrum), from analysis and design through development, testing, implementation, and maintenance.
- Strong leadership in data cleansing, web scraping, data manipulation, predictive modeling with R and Python, and data visualization with Power BI and Tableau.
- Experienced in data modeling techniques employing data warehousing concepts such as Star/Snowflake schema and Extended Star.
- Hands-on experience across the entire data science project life cycle, including data extraction, data cleaning, statistical modeling, and data visualization with large structured and unstructured data sets; created ER diagrams and schemas.
- Excellent knowledge of AI/ML, mathematical modeling, and operations research. Comfortable with R, Python, SAS, Weka, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
- Expertise in data analysis, data migration, data profiling, data cleansing, transformation, integration, and data import/export using ETL tools such as Informatica PowerCenter.
- Experience working in the AWS environment using S3, Athena, Lambda, SageMaker, Lex, Aurora, QuickSight, CloudFormation, CloudWatch, IAM, Glacier, EC2, EMR, Rekognition, and API Gateway.
- Proficient in AI/ML, data and text mining, statistical analysis, and predictive modeling.
- Good knowledge of and experience with deep learning algorithms such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), including LSTM/RNN-based speech recognition using TensorFlow.
- Expertise in using AWS S3 to stage data and to support data transfer and data archival.
- Excellent knowledge and experience in OLTP/OLAP system study with a focus on the Oracle Hyperion suite of technology, developing database schemas such as Star and Snowflake (fact and dimension tables) used in relational, dimensional, and multidimensional modeling, and physical and logical data modeling using Erwin.
- Created functions and assigned roles in AWS Lambda to run Python scripts, and used AWS Lambda with Python for event-driven processing.
- Proficient in Python scripting; worked with stats functions in NumPy, visualization with Matplotlib, and Pandas for organizing data.
- High-level understanding of deep learning concepts including CNN, RNN, ANN, reinforcement learning, and transfer learning, and of data augmentation using Generative Adversarial Networks (GANs).
- High-level analytical thinking, extensively leveraging statistical techniques such as t-tests, p-value analysis, z-score analysis, ANOVA, confidence intervals, confusion matrices, precision, recall, and ROC/AUC curve analysis.
- Good knowledge of Natural Language Processing (NLP) and of time series analysis and forecasting using ARIMA models in Python and R.
- Very good experience and knowledge in provisioning virtual clusters on the AWS cloud using services such as EC2, S3, and EMR.
- Strong programming experience in MATLAB and in Python vision libraries.
- Experienced in building active dashboards and reports that are both visually appealing and functional using Python Matplotlib, R Shiny, Power BI, and Tableau; proficient in creating, publishing, and customizing Tableau dashboards and dynamic reports with user filters.
- Experience in dimensional modeling, ER modeling, Star/Snowflake schemas, fact and dimension tables, and Operational Data Stores (ODS).
- Extensive knowledge of Amazon EC2 as a solution for computing, query processing, and storage across a wide range of applications.
- Strong Python program development capabilities, with experience in image processing algorithms.
- Experience in building models with deep learning frameworks such as TensorFlow, PyTorch, and Keras, including CNNs for image identification, RNNs for stock price prediction, autoencoders for a movie recommender system (PyTorch), and image captioning (CNN-RNN encoder-decoder architecture).
- Proficient in Python, with experience building and productionizing end-to-end systems; strong programming expertise and strong database SQL skills.
- Solid coding and engineering skills in Artificial Intelligence/Machine Learning, with broad exposure to Python and its package ecosystem.
- Experience developing solution-driven views and dashboards with varied chart types, including pie charts, bar charts, tree maps, circle views, line charts, area charts, and scatter plots in Power BI.
- Valued contributor in shaping products and services; works successfully in fast-paced, multitasking environments both independently and in collaborative teams, with the communication skills needed for swift delivery of data science and analytics projects.

Technical Skills:

Languages: SQL, Python, Java, JavaScript, jQuery, ReactJS, Next.js, HTML, CSS, C, C++, Angular, R, Impala, Hive
Statistical Methods: Statistical Inference, Hypothesis Testing, Confidence Intervals, p-values, Statistical Significance, Probability Distributions, Descriptive Statistics, Correlation and Covariance, Sampling Techniques, ANOVA, Chi-Square Tests, Bayes' Theorem, Cross-Validation, Time Series Analysis, Autocorrelation, Statistical Modeling, Experimental Design, Central Limit Theorem, Law of Large Numbers, Residual Analysis, Multivariate Analysis
Artificial Intelligence/Machine Learning: Supervised Learning, Unsupervised Learning, Model Evaluation Metrics, Cross-Validation, Feature Engineering, Hyperparameter Tuning, Overfitting & Regularization (L1, L2), Ensemble Methods, Decision Trees, Random Forest, Gradient Boosting (XGBoost, LightGBM), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Naive Bayes, Regression (Linear, Logistic), Classification Algorithms, Clustering (K-Means, Hierarchical), Dimensionality Reduction (PCA, t-SNE), Time Series Forecasting, ARIMA, SARIMA, Holt-Winters Exponential Smoothing, Prophet, LSTM for Time Series, Model Interpretability (SHAP, LIME), Recommendation Systems, Scikit-learn
R Packages: dplyr, sqldf, data.table, randomForest, gbm, caret
Natural Language Processing (NLP): Text Preprocessing (Tokenization, Lemmatization, Stemming), Bag of Words (BoW), TF-IDF, Word Embeddings (Word2Vec, GloVe, FastText), Sequence-to-Sequence Models, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), Attention Mechanism, Encoder-Decoder Models, Gated Recurrent Units (GRU), Transformer Models (BERT, GPT, T5), Named Entity Recognition (NER), Part-of-Speech Tagging, Sentiment Analysis, Text Classification, Machine Translation, Text Summarization, Question Answering Systems, Language Modeling, Word and Sentence Similarity, Dependency Parsing, Text Generation, Sequence Labeling, Topic Modeling (LDA), SpaCy, NLTK, Hugging Face
Big Data: Hadoop, Hive, HBase, Apache Spark, Scala, Kinesis, Pig, Sqoop
Python Packages: NumPy, SciPy, Pandas, Matplotlib, Seaborn, scikit-learn, Requests, urllib3, NLTK, Pillow, Pytest
Deep Learning: Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), Autoencoders, Transfer Learning (VGG, ResNet, InceptionNet, MobileNet), Attention Mechanism, Object Detection (YOLO, Faster R-CNN), OCR, Image Segmentation (U-Net, Mask R-CNN), Activation Functions (ReLU, Sigmoid, Tanh, Softmax), Optimization Algorithms (SGD, Adam, RMSprop), Loss Functions (Cross-Entropy, MSE), Regularization (Dropout, Batch Normalization), Model Deployment, TensorFlow, Keras, PyTorch
Generative AI: LLMs, Ollama, LangChain, LangSmith, Agentic AI, Fine-tuning Techniques (LoRA, QLoRA; see the brief sketch after this table), Local LLMs (Mistral, Llama, Gemma, etc.), Inference APIs (Groq, Nvidia NIM, etc.), Retrieval-Augmented Generation (RAG), Graph Databases (Neo4j), Vector Databases (FAISS, Chroma, Pinecone), Embedding Models (Mistral, Google AI embedding models), Prompt Engineering, Prompt Templates, Prompt Versioning, Document Chunking & Indexing, Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), Conditional GANs, CycleGAN, StyleGAN, Deep Convolutional GANs (DCGAN), Transformer-based Generative Models (GPT, T5), Text-to-Image Generation (DALL-E, CLIP), Image-to-Image Translation, Neural Style Transfer, Chatbots, AI Search Engines, Semi-Supervised Learning, Data Augmentation, Zero-Shot Learning, Self-Supervised Learning, Reinforcement Learning for Generation, Diffusion Models, Latent Space Exploration
MLOps & LLMOps: CI/CD for ML, Model Deployment (REST API, Batch), Model Monitoring, Model Versioning, Model Retraining Pipelines, ML Pipelines (MLflow, Airflow, Kubeflow), Experiment Tracking (MLflow, Weights & Biases), Data Versioning (DVC), Logging and Alerting (Prometheus, Grafana), Containerization (Docker), Orchestration (Kubernetes), Feature Stores (Feast), Cloud ML Platforms (AWS SageMaker, AWS Bedrock, GCP AI Platform, Google Vertex AI, Azure AI), LLM Response Evaluation (TruLens, PromptLayer), Model Registry, Token Usage Tracking, Scalable LLM Deployment (TGI, vLLM)
Python Frameworks: Django, Flask
Methodologies: SDLC (Agile/Scrum), TDD, BDD
Databases: MySQL, MongoDB, Microsoft SQL Server, SQLite, Redshift, RDS, PostgreSQL, SQLAlchemy, Teradata
Cloud: AWS SageMaker, AWS Bedrock, Google Cloud AI Platform, Google Vertex AI; Amazon Web Services: EC2, Lambda, SageMaker, Bedrock, EMR, S3, Glue, MSK, Kinesis, QuickSight, API Gateway, Athena, Lex, Rekognition, CI/CD, CodeCommit, DynamoDB, Transcribe, CloudFormation, CloudWatch, Glacier, IAM
BI/Analysis Tools: SAS, Stata, Tableau, Power BI, Docker, Git, SAP, MS Office Suite, Anaconda, SSIS
Data Modeling: Snowflake, Star Schema
Vector Search Engines: FAISS, Pinecone, Google Vector Search
Reporting Tools: Tableau, Power BI
Operating Systems: Windows, Linux, macOS
Other Tools & Technologies: Git, GitHub, GitLab, Docker, Docker Compose, Kubernetes, VS Code, Jupyter Notebook, Google Colab, CUDA Toolkit, Postman, Swagger, REST APIs, Linux/Unix Command Line, Bash Scripting, Conda, Virtualenv, Makefile, YAML, JSON, Terraform, Nginx, Redis, Jenkins, Power BI, Tableau, Excel, Google Sheets, Notion, Trello, Slack, Figma
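As a brief illustration of the LoRA fine-tuning technique listed under Generative AI above, the sketch below shows a typical adapter setup with the Hugging Face PEFT library. The checkpoint, rank, and target modules are illustrative assumptions, not tied to any client engagement.

```python
# Minimal LoRA setup with the PEFT library: freeze the base model and train
# only low-rank adapters. Checkpoint and hyperparameters are illustrative.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
lora = LoraConfig(
    r=8,                        # rank of the adapter matrices
    lora_alpha=16,              # scaling factor applied to the adapter output
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # adapters are a tiny fraction of the base
```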
Education:
M.S. in Information Technology and Management, Saint Louis University

Professional Experience:

Client: Capital One, Dallas, TX | Oct 2023 to Present
Role: GenAI/AI Engineer
Responsibilities:
- Involved in requirement analysis, application development, application migration, and maintenance using the Software Development Lifecycle (SDLC) and Python technologies.
- Developed MLOps pipelines on AWS using SageMaker Pipelines, Lambda, and Step Functions to orchestrate training, tuning, and deployment.
- Adapted existing Retrieval-Augmented Generation (RAG) pipelines to leverage Google Vertex AI and Google LLM APIs for scalable LLM deployments.
- Designed hybrid search strategies combining vector similarity search and keyword-based search using Google Vector Search to improve document retrieval accuracy (see the retrieval sketch after this section).
- Set up model monitoring and alerting workflows using CloudWatch and SageMaker Model Monitor to ensure ongoing model performance.
- Deployed LLM-based workflows for call transcript summarization using GPT-4, LangChain, and vector databases (RAG architecture).
- Fine-tuned domain-specific LLMs using LoRA/PEFT and created performance benchmarks using accuracy, BLEU, perplexity, and human feedback.
- Developed end-to-end ML pipelines using AWS SageMaker Pipelines, Step Functions, and Lambda.
- Integrated Google LLMs and Google Vector Search into NLP workflows for intelligent document search and hybrid search solutions.
- Built RAG pipelines utilizing LangChain and custom embedding models.
- Deployed and monitored fine-tuned LLMs (GPT-4, Claude) using LoRA and PEFT techniques.
- Built REST APIs using Flask and deployed them in Docker containers across hybrid AWS/Google Cloud environments.
- Optimized model training costs through advanced instance management and parallel processing.
- Designed, implemented, and monitored ML solutions, ensuring high performance and low latency.
- Built Support Vector Machine models to detect fraudulent and dishonest customer behavior using Scikit-learn, NumPy, SciPy, Matplotlib, and Pandas in Python.
- Used AWS S3, DynamoDB, AWS Lambda, and AWS EC2 for data storage and model deployment.
- Worked extensively with AWS services such as SageMaker, Lambda, Lex, EMR, S3, and Redshift.
- Used AWS Transcribe to obtain call transcripts and performed text processing (cleaning, tokenization, and lemmatization).
- Participated in feature engineering, including feature intersection generation, feature normalization, and label encoding with Scikit-learn preprocessing.
- Designed data marts in dimensional data modeling using Snowflake schemas.
- Generated executive summary dashboards in Power BI to display performance monitoring measures.
- Developed and implemented predictive models using AI/ML algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forest, K-means clustering, KNN, PCA, and regularization for data analysis.
- Leveraged AWS SageMaker to build, train, tune, and deploy state-of-the-art AI/ML and deep learning models.
- Built classification models including Logistic Regression, SVM, Decision Tree, and Random Forest.
- Used the Pandas API to arrange data in time series and tabular formats for easy timestamp-based manipulation and retrieval.
- Created ETL specification documents, flowcharts, process workflows, and data flow diagrams.
- Designed both 3NF data models for OLTP systems and dimensional data models using Star and Snowflake schemas.
- Snowflaked dimensions to remove redundancy.
- Created reports using Excel services and Power BI.
- Applied deep learning (RNN) to find the optimum route for guiding the tree-trim crew.
- Used the XGBoost algorithm to predict storms under different weather conditions and deep learning to analyze the severity of post-storm effects on power lines and circuits.
- Worked with the Snowflake SaaS platform for cost-effective cloud data warehouse implementation.
- Developed data mapping, transformation, and cleansing rules for the Master Data Management architecture involving OLTP, ODS, and OLAP.
- Produced A/B test readouts to drive launch decisions for search algorithms, including query refinement, topic modeling, signal boosting, and machine-learned weights for ranking signals.
- Implemented an image recognition (CNN + SVM) anomaly detector and convolutional neural networks to detect fraudulent purchases.
- Designed and developed Power BI graphical and visualization solutions from business requirement documents, with plans for creating interactive dashboards.
Environment: SDLC, Python, Scikit-learn, NumPy, SciPy, Matplotlib, Pandas, AWS S3, DynamoDB, AWS Lambda, AWS EC2, SageMaker, Lex, EMR, Redshift, Snowflake, RNN, Machine Learning, Deep Learning, OLAP, ODS, OLTP, 3NF, Naive Bayes, Random Forest, K-means clustering, KNN, PCA, Power BI.
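A minimal sketch of the hybrid retrieval pattern referenced in the bullets above, blending keyword (TF-IDF) and vector similarity scores. The embed() function is a hypothetical stand-in for an embedding-model API call such as Vertex AI, and the corpus, weighting, and names are illustrative assumptions.

```python
# Hybrid search sketch: blend keyword (TF-IDF) and vector similarity.
# embed() is a hypothetical stand-in for an embedding-model API call.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Summary of customer call about a credit card dispute.",
    "Loan servicing policy for mortgage escrow accounts.",
    "Fraud alert handling procedure for debit transactions.",
]

def embed(texts):
    # Placeholder embeddings; a real pipeline would call the embedding model.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 768))

tfidf = TfidfVectorizer().fit(docs)
doc_kw, doc_vec = tfidf.transform(docs), embed(docs)

def hybrid_search(query, alpha=0.5, k=2):
    kw = cosine_similarity(tfidf.transform([query]), doc_kw)[0]
    vec = cosine_similarity(embed([query]), doc_vec)[0]
    blended = alpha * vec + (1 - alpha) * kw  # weighted blend of both signals
    return [docs[i] for i in np.argsort(blended)[::-1][:k]]

print(hybrid_search("how are card disputes handled?"))
```

The alpha weight controls how much the dense (semantic) signal dominates the sparse (keyword) signal; tuning it per corpus is the usual lever in this pattern.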
Client: Elevance Health | June 2021 to Sep 2023
Role: Data Scientist
Responsibilities:
- Involved in data analysis, data validation, data cleansing, data verification, and identifying data mismatches.
- Performed data imputation using the Scikit-learn package in Python.
- Built advanced GenAI capabilities leveraging Hugging Face Transformers and LLaMA for multi-label classification of architectural documentation (see the classification sketch after this section).
- Integrated NLP pipelines with LangChain and vector search components for document similarity scoring, semantic search, and clustering.
- Applied fine-tuning techniques such as LoRA and PEFT to improve model accuracy on specialized architectural text datasets, and implemented GraphRAG for structured output generation.
- Built several predictive models using machine learning algorithms such as Logistic Regression, Linear Regression, Lasso Regression, K-Means, Decision Tree, Random Forest, Naive Bayes, Social Network Analysis, Cluster Analysis, Neural Networks, XGBoost, KNN, and SVM.
- Built detection and classification models using Python, TensorFlow, Keras, and scikit-learn.
- Worked with AWS provisioning, with good knowledge of services such as EC2, S3, Redshift, Glacier, Bamboo, API Gateway, ELB (load balancers), RDS, SNS, SWF, and EBS.
- Integrated NLP pipelines with LangChain and Google Vector Search for semantic document clustering and hybrid retrieval.
- Built fine-tuned models on architectural datasets using Hugging Face Transformers and Google Cloud Vertex AI.
- Implemented Flask-based microservices deployed on Docker for scalable ML model inference.
- Developed monitoring dashboards using AWS CloudWatch, Google Cloud Monitoring, and QuickSight.
- Led initiatives enhancing communication across distributed teams using agile methodologies.
- Developed the required data warehouse model using a Snowflake schema for the generalized model.
- Processed collected data using the Python Pandas and NumPy packages for statistical analysis.
- Applied cognitive science in AI/ML for neurofeedback training, which is essential for intentional control of brain rhythms.
- Performed data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
- Developed Star and Snowflake schema-based dimensional models for the data warehouse.
- Used NumPy, SciPy, Pandas, NLTK (Natural Language Toolkit), and Matplotlib to build models.
- Performed text analytics, generated data visualizations using Python, and created dashboards using tools such as Power BI.
- Applied Naive Bayes, KNN, Logistic Regression, Random Forest, SVM, and XGBoost to identify whether a design would default or not.
- Managed database design and implemented a comprehensive Snowflake schema with shared dimensions.
- Applied various Artificial Intelligence/machine learning algorithms and statistical modeling techniques, including decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, and regression models.
- Implemented an ensemble of Ridge regression, Lasso regression, and XGBoost to predict potential loan default loss.
- Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks.
- Scheduled refreshes of Power BI reports, both hourly and on demand.
Environment: SDLC, Python, Scikit-learn, NumPy, SciPy, Matplotlib, Pandas, AWS S3, DynamoDB, AWS Lambda, AWS EC2, SageMaker, NLTK, Lex, EMR, Redshift, Machine Learning, Deep Learning, Snowflake, OLAP, OLTP, Naive Bayes, Random Forest, K-means clustering, KNN, PCA, PySpark, XGBoost, TensorFlow, Keras, Power BI.
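A condensed sketch of the multi-label document classification approach described above, using Hugging Face Transformers. The checkpoint and label set are illustrative placeholders, not the actual document taxonomy.

```python
# Multi-label classification sketch: sigmoid over independent labels rather
# than a softmax over mutually exclusive classes. Labels are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["hvac", "electrical", "structural"]  # hypothetical document tags
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(labels),
    problem_type="multi_label_classification",  # trains with BCE loss
)

batch = tok(["Rooftop unit wiring diagram with panel schedule attached."],
            return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.sigmoid(model(**batch).logits)[0]  # per-label probabilities
print({name: round(float(p), 2) for name, p in zip(labels, probs)})
```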
Client: USAA, San Antonio, TX | Nov 2018 to May 2021
Role: Data Scientist
Responsibilities:
- Facilitated agile team ceremonies, including daily stand-up, backlog grooming, sprint review, and sprint planning.
- Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to extract data to fit analytical requirements.
- Built database models, APIs, and views using Python to create an interactive web-based solution.
- Performed univariate and multivariate analysis on the data to identify underlying patterns and associations between variables.
- Used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, and NLTK in Python to develop various AI/ML models, including XGBoost (see the sketch after this section).
- Developed the required data warehouse model using a Snowflake schema for the generalized model.
- Performed data cleaning in Python and R and visualized data and findings using Tableau.
- Developed dashboards and visual KPI reports in Power BI highlighting keyword trends, clicks, click-through rates, and impressions by device, month, and state, leading to a 30% increase in client reach.
- Designed and implemented a data cleansing process and statistical analysis with Python.
- Built an end-to-end data engineering pipeline with automated data ingestion using Snowflake and AWS (S3 and RDS).
- Analyzed the technical and economic feasibility of client projects, performing requirement gathering to optimize and reduce project expenses by up to 60%.
- Performed data imputation using the Scikit-learn package in Python.
- Ensured models had a low false positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
- Curated SEO-optimized solutions for business enterprises, boosting sales and internet presence by 70%.
- Worked with the data analytics team on time series models and optimizations.
- Developed an end-to-end multilingual, SCORM-compliant e-learning management system based on Articulate 360 and Redwood web authoring tools.
- Created logical and physical data models with Star and Snowflake schema techniques using Erwin, in both the data warehouse and data marts.
- Utilized Power Query in Power BI to pivot and un-pivot the data model for data cleansing and massaging.
- Handled ad-hoc client requests using SQL queries to extract and format requested information.
- Involved in data analysis, data validation, data cleansing, data verification, and identifying data mismatches.
- Designed data models and analyzed data for online transactional processing (OLTP) and online analytical processing (OLAP) systems.
- Worked with normalization and de-normalization concepts and design methodologies.
- Wrote and executed customized SQL code for ad-hoc reporting duties and other routine tasks.
- Created customized reports in Power BI for data visualization.
Environment: ER/Studio, SQL, Python, APIs, OLAP, OLTP, PL/SQL, Oracle, Teradata, Power BI, ETL, Redshift, Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, NLTK, XGBoost, Tableau, Power Query, Snowflake, AWS, S3, RDS, Erwin.
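An illustrative sketch of the XGBoost modeling work noted above, with the low-false-positive-rate focus the bullets mention; the data is synthetic and the threshold is an arbitrary example.

```python
# XGBoost classification sketch on synthetic, imbalanced data, tuning the
# decision threshold to keep the false positive rate low.
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = XGBClassifier(n_estimators=300, max_depth=4,
                    learning_rate=0.05, eval_metric="aucpr")
clf.fit(X_tr, y_tr)

probs = clf.predict_proba(X_te)[:, 1]
# Raising the threshold trades recall for a lower false positive rate.
tn, fp, fn, tp = confusion_matrix(y_te, probs > 0.7).ravel()
print(f"FPR: {fp / (fp + tn):.3f}  recall: {tp / (tp + fn):.3f}")
```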
Client: Walmart, Bentonville, AR | July 2015 to Oct 2018
Role: Data Analyst
Responsibilities:
- Performed data analysis, data migration, and data preparation for customer segmentation and profiling.
- Implemented investigation algorithms in Python using Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK.
- Applied data warehousing and data modeling procedures to build ETL pipelines that extract and transform data across multiple sources.
- Architected scalable algorithms in Python capable of data mining and predictive modeling with the full range of required statistical algorithms.
- Utilized ETL tooling to build, template, and rapidly deploy new pipelines for gathering and cleaning data.
- Developed multivariate data validation scripts in Python for equity, derivative, currency, and commodity data, improving pipeline efficiency by 17% (see the validation sketch after this section).
- Used predictive analysis to develop and design sample methodologies and analyzed data for pricing of clients' products.
- Optimized the ETL process from Alteryx to Snowflake.
- Used data visualization tools such as Tableau, advanced MS Excel (macros, INDEX, conditional lists, arrays, pivots, and lookups), Alteryx Designer, and Modeler.
- Performed data analytics and data automation, and coordinated custom visualization tooling using Python, Mahout, Hadoop, and MongoDB.
- Performed day-to-day Git support for different projects; responsible for the design and maintenance of Git repositories and access control strategies.
- Fostered teamwork, communication, and collaboration while managing competing weekly, bi-weekly, monthly, and quarterly priorities.
- Worked extensively with ER/Studio on several projects in both OLAP and OLTP applications.
- Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy Oracle and SQL Server database systems.
- Analyzed business requirements and updated functional specifications while testing multiple versions and resolving critical bugs to improve the functionality of the learning management system.
- Built and deployed a UI/UX e-learning web application using jQuery, JavaScript, HTML, and NodeJS for various courses.
- Cleaned and transformed data using Python; developed dashboards and visual KPI reports using Tableau.
- Published various live, interactive data visualizations, dashboards, reports, and workbooks from Tableau Desktop to Tableau Server.
Environment: Python, Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, NLTK, ETL, Alteryx, Snowflake, Tableau, jQuery, JavaScript, HTML, NodeJS, Hadoop, MongoDB, OLTP, OLAP, ER/Studio, Oracle, SQL Server, SQL, Tableau Server.
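A sketch of the kind of rule-based multivariate validation script described above; the column names, rules, and thresholds are illustrative assumptions, not the actual feed schema.

```python
# Rule-based data validation sketch over a stand-in market-data feed.
import pandas as pd

df = pd.DataFrame({  # illustrative rows; the real feed came from upstream ETL
    "asset_class": ["equity", "currency", "commodity"],
    "price": [101.5, -0.2, None],
    "quote_time": ["2015-06-01 09:30", "2015-06-01 09:31", "2015-05-20 12:00"],
})

checks = {
    "missing_price": df["price"].isna(),
    "negative_price": df["price"] < 0,
    "stale_quote": pd.to_datetime(df["quote_time"])
                   < pd.Timestamp("2015-06-01") - pd.Timedelta("1D"),
    "fx_out_of_band": (df["asset_class"] == "currency")
                      & ~df["price"].between(0.01, 1000),
}

flags = pd.concat(checks, axis=1)   # one boolean column per rule
print(flags.sum())                  # violation counts per rule
print(df[flags.any(axis=1)])        # rows failing at least one rule
```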
Client: Broadridge, NY | Feb 2014 to June 2015
Role: Python Developer
Responsibilities:
- Interacted with the business team to understand requirements and prepared technical documents for implementing solutions to business needs.
- Worked with multiple teams to analyze data from various SQL databases and Microsoft Excel data sheets to generate monthly and quarterly business insights.
- Created REST web services in Python (see the endpoint sketch after this section).
- Designed a model to identify peak months and days for products and the minimum stock required to avoid out-of-stock situations, increasing product availability and company sales by 4.2%.
- Improved the accuracy of predicted prices using multiple Python and Excel models.
- Coordinated a team of 16 customer sales representatives working on 5 different projects across 3 locations to gather and sanitize data.
- Wrote SQL queries and implemented stored procedures, functions, packages, tables, views, cursors, and triggers.
- Used Jenkins for continuous integration and code quality inspection; built a local repository mirror and managed source code with GitHub.
- Assisted the development team in sending correct data via query strings, using SQL as the backend for storing data.
- Used data mining to extract information from data sets and identified correlations and patterns across data from both SQL databases and Excel sheets.
- Conducted market analysis and prepared presentations for business leaders.
- Analyzed C++ code to work out the business logic and prepared documentation for future users.
Environment: SQL, Agile, Python 2.7 and 3.6, Microsoft Office, Django, SOAP, REST, Flask, GitHub.
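A minimal Flask sketch of the REST web service work and the stock-forecast model described above; the route, payload shape, and returned values are illustrative placeholders, not the client's actual service contract.

```python
# Minimal Flask REST endpoint sketch; route and payload are illustrative.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/stock/forecast", methods=["POST"])
def stock_forecast():
    payload = request.get_json(force=True)
    # The real service queried SQL for sales history and applied the model;
    # a fixed placeholder keeps this sketch self-contained.
    return jsonify({
        "product_id": payload["product_id"],
        "min_stock": 120,             # placeholder model output
        "peak_months": ["Nov", "Dec"],
    })

if __name__ == "__main__":
    app.run(port=5000)
```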