Eshwar Hemanth
Machine Learning Engineer / Data Scientist
[email protected] | Seattle, WA
Relocation: Yes | Visa: OPT

PROFILE SUMMARY
Specialist in Data Science with 6+ years of expertise in data extraction, pre-processing, validation, exploratory data analysis, feature engineering, data wrangling, data engineering, machine learning, and data visualization.
Improved algorithm efficiency using Spark Context, Spark SQL, MLlib, DataFrames, Pair RDDs, and Spark on YARN. Gained hands-on experience with Azure SQL Database, Data Warehouse, Analysis Services, HDInsight, Data Lake, and Data Factory.
Configured ETL processes in Azure Data Factory to migrate data from external sources such as Azure Blob Storage and text files into Azure Synapse.
Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables. Experience in automating day-to-day activities using Windows PowerShell.
Strong expertise in Data Analysis, Data Cleansing, Data Validation, Data Verification, and Data Visualization to identify discrepancies and ensure data integrity.
Proficient in publishing and presenting interactive dashboards and storylines on both web and desktop platforms for decision-makers.
Skilled in applying Agile methodology for iterative development and effective collaboration within teams.
Strong in writing complex SQL queries for various RDBMS and proficient with NoSQL databases such as MongoDB for handling unstructured data.
Hands-on experience working with Azure Cloud, including Azure Data Lake Gen2 for data storage, Azure Data Factory for data integration, and Azure Databricks for big data processing.
Skilled in identifying and resolving Data Mismatches to ensure high-quality datasets for modeling and analysis.
Proficient in container systems like Docker and container orchestration using Amazon EC2 Container Service (ECS) and Kubernetes, with experience working with Terraform.
Designed and developed an integrated Resource Description Framework (RDF) data model using Oracle Spatial RDF database technology.
Proficient in building production-quality and large-scale deployment of applications related to deep learning algorithms such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), LSTMs, and Hugging Face Transformers.
Experience in Python for data manipulation with key libraries such as NumPy, Pandas, Matplotlib, Scikit-learn, Seaborn, TensorFlow, PyTorch, Keras, OpenCV, Beautiful Soup, and NLTK, along with generative AI tooling.
Proficient in managing AWS cloud resources like EC2, S3, Lambda, EBS, EMR, DynamoDB, SQS, SNS, and CloudWatch.
Good experience working with PyTest, PyMock, and Selenium WebDriver frameworks for testing front-end and back-end components.
Experience working with NoSQL databases such as MongoDB and AWS DynamoDB for storing and retrieving JSON documents.
Performed exploratory data analysis, predictive analysis, and hypothesis testing (t-test, z-test) on large datasets (a sketch follows this list).
Proven data visualization and statistical analysis expertise, enabling data-driven decision-making for business intelligence initiatives using Tableau, Power BI, and Looker.
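A minimal sketch of the kind of two-sample hypothesis test referenced above, using SciPy; the synthetic data and significance threshold are illustrative assumptions, not details from a specific project:

```python
# Minimal two-sample t-test sketch (synthetic data; threshold is illustrative).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=100.0, scale=15.0, size=500)    # baseline group metric
treatment = rng.normal(loc=104.0, scale=15.0, size=500)  # post-change group metric

# Welch's t-test: does not assume equal variances between groups.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:  # conventional 5% significance level
    print("Reject the null hypothesis: group means differ.")
```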

PROFESSIONAL EXPERIENCE
Optum | Machine Learning Engineer    Apr 2024 - Present

Collaborated with multiple business stakeholders, data owners, and Cloud teams including Azure and AWS Architects to align AI/ML strategies with business objectives.
Involved in multiple AI and Machine Learning programs within our product suites, including:
Developed baselining and forecasting models for performance and security KPIs.
Implemented clustering models for devices based on behavioral patterns over time.
Leveraged technologies such as NLP, LSTM, KubeFlow, Docker, AWS SageMaker, and AWS Greengrass.
Implemented iterative development procedures using a JupyterLab-based AI Notebooks IDE to streamline experimentation and deployment workflows.
Utilized Data Flow and Python to build dynamic data workflow pipelines, enabling efficient processing and experimentation.
Partnered with developers and technical leads to create detailed data designs and specifications, ensuring alignment with project goals and the organization's long-term data strategy.
Acquired and cleaned structured and unstructured data from multiple sources, and maintained graph database systems using text mining, natural language processing, and semantic web technologies (SPARQL, RDF/OWL, and knowledge graphs).
Used Neo4j and D3.js to model payer and payee relationships and build visualizations from payment transaction data.
Designed and built frameworks to orchestrate data pipelines and integrate Machine Learning models into scalable systems.
Extracted and parsed RDF data using Sesame, a Java API, from the Semaphore ontology system, which processes unstructured resources and supports NLP capabilities.
Applied deep learning and supervised machine learning, including Random Forest models, using Dataiku.
Performed advanced qualitative and quantitative analyses on high-volume databases, identifying trends, patterns, and correlations to drive improvements in overall business performance using Python.
Ensured adherence to data management best practices while developing and maintaining new data assets.
Implemented SPARQL queries to enhance data quality by finding, cleaning, remodeling, and linking data (see the sketch after this list).
Managed technical aspects of data model maintenance, migration, and deployments across environments.
Designed, developed, and implemented data warehouse architectures utilizing logical and physical data models to support analytics and ML applications.
Worked with the Protégé and TopQuadrant ontology editors, along with Jena, RDF, and OWL.
Managed resource roles and utilization through IAM and Terraform, ensuring effective governance across development, test, and production environments.
Performed and documented collaboration mechanisms such as stand-ups and sprints in line with Agile development principles, acting as the Scrum Master to facilitate smooth project execution.
Developed a new data schema for the data consumption store to enhance processing times for Machine Learning and AI models using SQL, Hadoop, and Cloud services.
Created innovative and efficient reporting architectures to convey KPIs to relevant internal and external stakeholders using tools like Google Data Studio and Tableau Server.
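As a hedged illustration of the SPARQL-based data-quality work above, a minimal rdflib sketch; the source file, namespace, and predicate names are hypothetical assumptions, not drawn from the actual project:

```python
# Minimal SPARQL data-quality sketch with rdflib
# (file path, namespace, and predicates are hypothetical).
from rdflib import Graph

g = Graph()
g.parse("payments.ttl", format="turtle")  # hypothetical RDF source

# Find payer/payee pairs whose transaction is missing an amount
# (a simple linked-data quality check).
query = """
PREFIX ex: <http://example.org/payments#>
SELECT ?payer ?payee
WHERE {
    ?txn ex:payer ?payer ;
         ex:payee ?payee .
    FILTER NOT EXISTS { ?txn ex:amount ?amount }
}
"""
for payer, payee in g.query(query):
    print(f"Transaction missing amount: {payer} -> {payee}")
```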

Best Buy | Data Scientist    Jun 2023 - Mar 2024

Collaborated with Engineering and business teams to identify gaps in existing data and implemented data tracking mechanisms, enabling the development of deep learning (DL) models that delivered high accuracy.
Leveraged analytical tools to uncover trends, relationships, and insights within datasets, transforming findings into actionable risk management and marketing strategies that drive measurable value.
Oversaw end-to-end project management, including approving project plans, prioritizing tasks, managing stakeholder engagements, allocating resources, budgeting, and aligning deliverables with business objectives.
Managed data storage and processing pipelines in Google Cloud Platform (GCP), supporting AI and ML services in production, development, and testing environments using SQL, Spark, Python, and AI VMs.
Utilized the Hadoop ecosystem of tools, including HBase, Hive, Kafka, and Solr, alongside Dataiku for machine learning.
Conducted research and documented advanced solution architectures for enterprise customers, including the design and implementation of CI/CD pipelines and monitoring systems to ensure robust model performance.
Automated data extraction from diverse sources using Python scripts for seamless data analysis.
Created and optimized SQL queries for data aggregation and analysis to meet business needs.
Designed and developed a graph database to enable future development of a machine learning model for fraud detection; leveraged the open-source Neo4j database and Hadoop to build a fraud detection model and device fingerprinting to detect fraudulent device cycles.
Analyzed past sales data to identify key factors driving higher sales for individual stores.
Designed and deployed machine learning models to predict and recommend safety stock levels and identify top-performing stores based on order cancellation probabilities.
Built and maintained databases accessible to end users via SQL, enabling tailored data utilization.
Applied advanced data mining techniques using SQL and MSSQL to extract actionable insights.
Developed machine learning models to solve business problems, utilizing Python for implementation.
Established and enhanced business intelligence (BI) capabilities to derive value from data and support customer IT needs.
Led Project Athena, leveraging GCP's machine learning and AI capabilities to build a predictive analytics engine accessible via API, utilizing a containerized microservices architecture for scalability.
Implemented predictive maintenance and behavioral modeling for IoT data in manufacturing and healthcare using LSTMs.
Built microservices-based architectures for machine learning applications, including XGBoost-based implementations.
Developed time-series models and statistical methods to forecast inventory and procurement cycles using Python (see the sketch below).
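A minimal sketch of the time-series forecasting approach mentioned above, using statsmodels; the synthetic demand series and ARIMA order are illustrative assumptions:

```python
# Minimal ARIMA forecasting sketch (synthetic data; order is illustrative).
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly demand series standing in for real inventory data.
idx = pd.date_range("2022-01-01", periods=36, freq="MS")
rng = np.random.default_rng(0)
demand = pd.Series(100 + np.arange(36) * 2 + rng.normal(0, 5, 36), index=idx)

model = ARIMA(demand, order=(1, 1, 1))  # (p, d, q) chosen for illustration
fit = model.fit()
forecast = fit.forecast(steps=6)  # six-month-ahead point forecast
print(forecast)
```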

HSBC | Machine Learning Engineer    Dec 2020 - Jul 2022

Built an interactive, dynamic Tableau dashboard to uncover customer-level insights and developed Tableau reports to measure KPIs for cleared chargebacks data of GTS North America.
Predicted chargeback payment likelihood with 90% accuracy using a decision tree model, identifying a higher chance of payment when the chargeback amount and customer risk are low (see the sketch after this list).
Conducted data preprocessing, cleaning, and filtering using Pandas, including exploratory analysis and data integrity analysis.
Performed customer behavior analysis and value assessment using Uplift models, implementing K-means and uplift random forest with Hadoop MapReduce.
Conducted time series analysis using Spark, ensuring scalability and speed to detect events, establish thresholds, model behavior, and predict anomalous events.
Designed and implemented new data collection strategies by integrating data from newly added sources into the analytics data warehouse in the cloud using Flume and Kafka, ensuring streamlined usability.
Developed predictive data models for forecasting, evaluated their effectiveness through A/B testing, and gathered feedback to refine the models further.
Created and presented detailed reports and dashboards to executives, showcasing the results and workings of models prior to production deployment.
Built multi-layer dashboards for a business intelligence platform, supporting company leadership development using Python and Tableau.
Conducted advanced qualitative and quantitative analysis of high-volume databases to identify trends, patterns, and correlations, driving improved business performance using Python.
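A minimal sketch of the decision-tree approach described in the chargeback bullet above, using scikit-learn; the features, synthetic labels, and hyperparameters are hypothetical stand-ins for the real data:

```python
# Minimal decision-tree sketch for chargeback prediction
# (features and labels are synthetic and hypothetical).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(7)
n = 2000
# Hypothetical features: chargeback amount and a customer risk score.
X = np.column_stack([rng.uniform(10, 5000, n), rng.uniform(0, 1, n)])
# Synthetic label mirroring the stated finding: payment is more
# likely when both the amount and the risk score are low.
y = ((X[:, 0] < 1500) & (X[:, 1] < 0.5)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=7
)
clf = DecisionTreeClassifier(max_depth=4, random_state=7).fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2f}")
```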

DTDC | Data Analyst    Dec 2019 - Dec 2020

Partnered with cross-functional teams to produce automated dashboards and reporting using Power BI, provided stakeholders with data-driven insights, and increased workforce performance by 25%.
Analyzed delivery time data and route efficiency using Python and SQL, identifying patterns that led to a 14% reduction in delivery delays and enhanced on-time performance.
Cleaned and prepared data by handling missing values, duplicates, and inconsistencies; performed data transformations such as normalization, scaling, and feature extraction.
Supported route optimization projects by analyzing volume, weight, and frequency of shipments, improving last-mile efficiency by 18%.
Conducted qualitative surveys and mathematical analysis to identify a potential customer base 3.5 times larger than the active customer base for commercial LPG.
Familiar with Hadoop ecosystem components like HDFS, MapReduce, HiveQL, and SparkSQL.
Skilled in using Apache Spark and PySpark for distributed data processing and analysis (see the sketch after this list).
Hands-on experience in working with Hadoop and MapReduce in large-scale database environments.
Proficient in analyzing time series data using AR (AutoRegressive), MA (Moving Average), ARIMA (AutoRegressive Integrated Moving Average) models.
Expertise in volatility modeling using ARCH (Autoregressive Conditional Heteroskedasticity) and GARCH (Generalized ARCH) models.
Strong experience in developing analytic models and solutions using Python (2.x, 3.x).
Proficient with SciPy stack including NumPy, Pandas, SciPy, Matplotlib, and IPython for data manipulation and visualization.
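A minimal PySpark sketch of the kind of distributed aggregation used for the delivery-time analysis above; the input path and column names are hypothetical assumptions:

```python
# Minimal PySpark aggregation sketch
# (input path and column names are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("delivery-delays").getOrCreate()

deliveries = spark.read.csv("deliveries.csv", header=True, inferSchema=True)

# Average delay and shipment count per route, longest delays first.
(
    deliveries.groupBy("route_id")
    .agg(
        F.avg("delay_minutes").alias("avg_delay"),
        F.count("*").alias("shipments"),
    )
    .orderBy(F.desc("avg_delay"))
    .show(10)
)

spark.stop()
```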

EDUCATION
Master of Science in Data Science | Wichita State University, Wichita, Kansas    Aug 2022 - Dec 2023


SKILLS
Programming Languages: Python, R, SQL, Java, Golang.
ML & Data Libraries: NumPy, Pandas, Matplotlib, Scikit-learn, Seaborn, TensorFlow, Keras, NLTK, OpenCV, XGBoost, Beautiful Soup.
AI Technology: Generative AI, Natural Language Processing (NLP), Large Language Models (LLMs).
Machine Learning: Linear and Logistic Regression, Decision Trees, Random Forests, Naive Bayes, SVM, A/B Testing.
Deep Learning: CNNs, RNNs, LSTMs, GANs, and Transformers (BERT, GPT-3).
Data Processing & ETL: Apache Spark, SPARQL, Dask, Apache Kafka, Apache NiFi, and Azure Data Factory.
Big Data Technologies: Hadoop, HDFS, Hive, Spark, and Snowflake.
Cloud Platforms: AWS (S3, EC2, Lambda, SageMaker), GCP (BigQuery, Dataflow), and Azure (Synapse, Data Lake, Azure ML).
Data Visualization: Tableau, Power BI, Looker, and Python libraries (Matplotlib, Seaborn).
Relational Databases: MySQL, PostgreSQL, SQL Server.
NoSQL Databases: MongoDB, Cassandra, DynamoDB.
Model Deployment: Flask, FastAPI, Docker, Kubernetes, and cloud services (see the sketch after this list).
Version Control & IDEs: Git, GitHub, Jupyter Notebooks, Google Colab.
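As a hedged illustration of the model-deployment stack listed above, a minimal FastAPI serving sketch; the model file, endpoint name, and feature schema are hypothetical assumptions:

```python
# Minimal FastAPI model-serving sketch
# (model path and feature schema are hypothetical).
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

with open("model.pkl", "rb") as f:  # hypothetical pre-trained model
    model = pickle.load(f)

class Features(BaseModel):
    values: list[float]  # flat feature vector expected by the model

@app.post("/predict")
def predict(features: Features):
    prediction = model.predict([features.values])
    return {"prediction": prediction.tolist()}

# Run with: uvicorn app:app --reload  (assuming this file is app.py)
```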
