
Sai Hemanth Paturi - Data Scientist/AI/ML Engineer/Agentic AI Engineer
[email protected]
Location: Milwaukee, Wisconsin, USA
Relocation: Yes
Visa: Green Card
Sai Hemanth Paturi
Data Scientist
+1 (513) 987-0449 | LinkedIn: www.linkedin.com/in/sai-hemanth-p/ | [email protected]

PROFESSIONAL SUMMARY:

I am a senior data scientist with eleven years of experience, specializing in enterprise-grade Power BI reporting solutions and predictive analytics workflows for high-governance environments across multiple global industries.
I am an expert in designing complex data models with Power Query and DAX, transforming multi-terabyte datasets into interactive, high-performance dashboards that drive critical executive decision making.
I maintain a testing-heavy engineering posture, implementing automated data validation suites and schema checks to ensure the accuracy and consistency of all reporting outputs in production environments.
As an observability architect, I use OpenTelemetry and Jaeger to profile the performance of data pipelines and reporting services, ensuring low-latency usability for end users across the organization.
I bridge the gap between deep data science and business intelligence, translating complex model insights into clear, actionable visualizations that support stakeholders' strategic goals.
I serve as a strategic Agile collaborator, leading data-driven initiatives from requirements gathering through deployment of optimized reporting frameworks and automated quality gates for global business clients.
I am proficient in designing API gateways for large language models, leveraging Kong or white-labeled solutions to manage model routing, rate limiting, and centralized telemetry for generative AI applications.
I am a cloud-native and containerization specialist, highly skilled with Docker and edge platforms, building and deploying resilient model-scoring services with AWS Lambda and orchestrated workflows.
I apply advanced statistical analysis and model validation, including ANOVA, hypothesis testing, and variance diagnostics, to validate model assumptions and ensure every deployment is interpretable and compliant in regulated, high-governance environments.
I apply rigorous software engineering practices, including modular Python, reusable components, and code reviews, to keep data science codebases maintainable, robust, and audit-friendly over the long term.
I have led complex data science initiatives from problem framing to production rollout, translating business requirements into technical roadmaps that prioritize automated testing and continuous monitoring.
I own end-to-end data quality and drift governance frameworks that monitor drift, detect anomalies, and validate schemas using OpenTelemetry, protecting downstream models from hidden data issues and performance degradation over time (a minimal drift check is sketched below).
I am a technical mentor and documentation lead, producing structured model cards, API specs, and technical summaries while mentoring junior scientists on methodology and coding standards.
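Illustrative of the drift governance described above, the following is a minimal Python sketch of a schema check plus a Population Stability Index (PSI) drift measure; the column names and the 0.2 threshold are hypothetical, not taken from any client engagement.

```python
import numpy as np
import pandas as pd

# Hypothetical expected schema for a reporting extract.
EXPECTED_SCHEMA = {"account_id": "int64", "balance": "float64", "segment": "object"}

def validate_schema(df: pd.DataFrame) -> list:
    """Return a list of schema violations (missing columns or dtype mismatches)."""
    errors = []
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in df.columns:
            errors.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            errors.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    return errors

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a new sample of one feature."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

# A PSI above roughly 0.2 is a common rule of thumb for actionable drift.
```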

TECHNICAL SKILLS:
AI & Agentic Orchestration: Agentic orchestration, task decomposition, multi-step reasoning flows, fallback & error handling. Frameworks: LangGraph (stateful multi-agent design), LangChain. GenAI Ops: LLM API gateways (Kong), model routing, rate limiting, centralized telemetry. Search: GraphRAG, hybrid vector retrieval, semantic indexing, knowledge grounding. (A minimal LangGraph sketch follows this table.)
Power BI & Visualization: Dashboard design: interactive reporting, user experience optimization, mobile viewports. Data modeling: Power Query (M), DAX (calculated columns and measures), star schema, snowflake schema. Tools: Power BI Desktop, Power BI Service, Power BI Gateway, Tableau, Matplotlib, Seaborn, Plotly. IDE/Env: VS Code, Jupyter, Google Colab.
Big Data & Cloud: Platforms: Databricks (dbx), AWS (Lambda, EC2, S3, IAM, EMR), Azure, Snowflake. Distributed computing: Apache Spark (PySpark), Hadoop, Hive, HDFS, Kafka. Orchestration: Apache Airflow, AWS Step Functions.
Data Engineering: Pipelines: ETL/ELT, automated ingestion, schema design (star/snowflake), feature stores. Quality: drift detection, schema validation, lineage tracking.
Data Science & Analytics: Languages: Python (Pandas, NumPy, Scikit-learn, Statsmodels), R, SQL. Modeling: predictive analytics, ANOVA, hypothesis testing, time series forecasting. Databases: PostgreSQL, MySQL, SQL Server, Oracle.
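As a small illustration of the stateful multi-agent design listed above, here is a minimal LangGraph sketch; the two-node planner/executor flow, the node bodies, and the state fields are hypothetical stand-ins for real LLM calls.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    plan: str
    answer: str

def planner(state: AgentState) -> dict:
    # Task decomposition step; a real node would call an LLM here.
    return {"plan": f"steps to answer: {state['question']}"}

def executor(state: AgentState) -> dict:
    # Execution step; fallback and error handling would wrap this call.
    return {"answer": f"executed: {state['plan']}"}

graph = StateGraph(AgentState)
graph.add_node("planner", planner)
graph.add_node("executor", executor)
graph.set_entry_point("planner")
graph.add_edge("planner", "executor")
graph.add_edge("executor", END)

app = graph.compile()
print(app.invoke({"question": "summarize Q3 risk", "plan": "", "answer": ""}))
```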

PROFESSIONAL EXPERIENCE:
Client: Citibank, New York, NY | August 2023 – Present
Role: Data Scientist
Responsibilities:

Designed and developed interactive Power BI dashboards to visualize complex credit risk metrics for executive leadership, using advanced DAX functions to build calculated measures for deep financial performance analysis.
Leveraged Power Query and the M formula language to perform sophisticated transformations on multi-terabyte datasets, maintaining a testing-heavy posture with automated data validation suites for reporting accuracy (a sample quality gate is sketched after this list).
Engineered optimized star and snowflake schema data models to handle high-velocity credit data in Power BI, with rigorous data validation to maintain integrity for mission-critical workflows.
Processed massive datasets using PySpark on EMR clusters to extract high-value features for downstream reporting, translating machine learning outcomes into clear visualizations for non-technical stakeholders.
Optimized dashboard performance by refining DAX query logic and reducing report load times, and configured backend data ingestion from AWS DynamoDB to support real-time reporting requirements.
Instrumented reporting pipelines with OpenTelemetry to monitor data health and identify refresh failures, using MLflow to track data lineage and ensure the reproducibility of insights in production dashboards.
Partnered with business stakeholders to translate regulatory requirements into technical analytics solutions, developing reporting roadmaps aligned with enterprise governance and applicable data privacy standards.
Maintained GitHub Actions pipelines to automate deployment and version control of Power BI report updates, integrating automated quality gates into the reporting lifecycle to prevent release of inaccurate data.
Performed rigorous statistical analysis, including ANOVA and hypothesis testing, to validate reporting assumptions, ensuring visualized trends remained interpretable for regulatory review and internal compliance.
Developed reusable Power BI templates to standardize visualization practices and design language across the organization, with comprehensive documentation of data models and workflows to support audit-ready operations.
Analyzed large, complex datasets to deliver actionable insights for strategic business decisions across the bank, validating data accuracy and consistency within all reporting tools.
Collaborated with business stakeholders to translate reporting requirements into effective visualizations and analytics solutions, optimizing Power BI reports for high performance and daily usability.
Documented data models, reports, and reporting processes to ensure transparency and ease of maintenance, applying a strong grounding in reporting and visualization to solve complex business problems.
Communicated insights clearly to technical and non-technical stakeholders to drive alignment on project goals, delivering high-quality, reliable results in data-driven, analytics-focused environments.
Scaled reporting infrastructure to handle growing data volumes while keeping dashboards responsive and interactive for all users, regardless of dataset complexity.
Researched new Power BI features to implement cutting-edge visualization techniques, sharing knowledge with junior team members to foster a culture of learning.
Evaluated data quality and implemented cleanup procedures so that only accurate information reached reports, monitoring data ingestion processes to detect and resolve errors.
Designed custom visualizations for business needs that standard Power BI tools could not address, ensuring all reports remained accessible and easy to use.
Balanced multiple projects and priorities to meet tight deadlines while maintaining high quality across deliverables and stakeholder satisfaction.
Facilitated workshops and training sessions to help business users adopt Power BI effectively, gathering feedback to continuously improve the usability and impact of reporting dashboards.
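A minimal sketch of the kind of PyTest-based validation suite used as a reporting quality gate; the file path, columns, and one-day staleness tolerance are hypothetical.

```python
import pandas as pd
import pytest

@pytest.fixture
def credit_metrics() -> pd.DataFrame:
    # In the real pipeline this would load the freshly refreshed reporting extract.
    return pd.read_parquet("credit_risk_metrics.parquet")  # hypothetical path

def test_required_columns_present(credit_metrics):
    required = {"account_id", "exposure_usd", "risk_rating", "as_of_date"}
    assert required.issubset(credit_metrics.columns)

def test_no_duplicate_accounts(credit_metrics):
    assert not credit_metrics["account_id"].duplicated().any()

def test_exposure_is_non_negative(credit_metrics):
    assert (credit_metrics["exposure_usd"] >= 0).all()

def test_refresh_is_current(credit_metrics):
    # Fail the gate if the extract is stale by more than one day.
    latest = pd.to_datetime(credit_metrics["as_of_date"]).max()
    assert (pd.Timestamp.now() - latest).days <= 1
```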

Environment: Python (PyTest, Playwright), Kubeflow, Vertex AI, H2O.ai, AutoFlow, GitHub Actions, OpenTelemetry (OTel), MLflow, Jaeger, Databricks (dbx), AWS (Lambda, DynamoDB, EventBridge, VPC, S3), LangGraph, SQL, PySpark, Docker, Scikit-Learn, Power BI.

Client: UnitedHealth Group (Optum), Minnetonka, MN | May 2022 – July 2023
Role: Machine Learning Engineer
Responsibilities:

Healthcare Analytics and Visualization: Designed and maintained clinical Power BI dashboards to visualize patient risk-scoring outcomes, utilizing complex DAX measures to deliver actionable population health insights to healthcare providers.
Automated Data Validation: Engineered automated data validation frameworks for clinical datasets, implementing rigorous consistency checks to ensure the accuracy of reporting in a highly regulated healthcare environment.
Large-Scale Reporting Infrastructure: Managed the reporting lifecycle for multi-terabyte clinical datasets, leveraging Power Query and star schema modeling to optimize dashboard performance for real-time risk stratification.
Distributed Data Transformation: Architected high-throughput Spark pipelines on Databricks to clean and transform healthcare data, ensuring seamless integration between back-end clinical databases and front-end Power BI reports.
MLOps for Business Intelligence: Orchestrated automated workflows using GitHub Actions and Airflow to update reporting datasets, integrating model versioning and data quality gates to maintain reliable analytics.
Observability and Reporting Health: Implemented observability stacks using OpenTelemetry to monitor report refresh success and data lineage, identifying performance bottlenecks in the flow from Kafka and SQL Server to Power BI (a minimal tracing sketch follows this list).
NLP-Driven Visual Insights: Processed unstructured clinical text using NLP techniques to generate high-value reporting features, enabling the visualization of provider notes and diagnosis trends within interactive dashboards.
Model Interpretability for Stakeholders: Utilized Power BI to communicate complex machine learning outcomes, translating statistical metrics and AUC results into clear visual stories for clinical and non-technical stakeholders.
Compliance and Governance Documentation: Developed structured model cards and reporting runbooks, ensuring that all data visualizations and underlying models adhered to strict healthcare audit and bias assessment standards.
Cloud-Native Reporting Layers: Deployed scalable reporting components using AWS and Docker, optimizing the cost efficiency and speed of large-scale population health analytics.
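A minimal sketch of OpenTelemetry instrumentation around a dataset refresh, in the spirit of the observability work above; the span names, attributes, and the run_refresh() call are hypothetical, and the console exporter stands in for a Jaeger/OTLP backend.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.trace import Status, StatusCode

# Console exporter for illustration; production would export to Jaeger or an OTLP collector.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
tracer = trace.get_tracer("reporting.refresh")

def refresh_dataset(dataset_id: str) -> None:
    """Wrap a (hypothetical) dataset refresh in a span so latency and failures are traceable."""
    with tracer.start_as_current_span("powerbi.dataset.refresh") as span:
        span.set_attribute("dataset.id", dataset_id)
        try:
            run_refresh(dataset_id)  # hypothetical call into the refresh pipeline
            span.set_attribute("refresh.status", "success")
        except Exception as exc:
            span.record_exception(exc)
            span.set_status(Status(StatusCode.ERROR))
            raise
```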
Environment: Python (PyTest, Playwright), Kubeflow, Vertex AI, H2O.ai, AutoFlow, GitHub Actions, OpenTelemetry (OTel), Jaeger, Databricks (dbx), AWS (Lambda, EMR, S3, CloudWatch), Airflow, Docker, PySpark, Spark MLlib, SQL, Kafka, Hive, EvidentlyAI.

Client: State of Virginia, Richmond, VA | April 2019 – May 2022
Role: Data Engineer
Responsibilities:

Designed and maintained clinical Power BI dashboards to visualize patient risk-scoring outcomes, using complex DAX measures to deliver actionable population health insights to diverse healthcare providers.
Engineered automated data validation frameworks for clinical datasets, implementing rigorous consistency checks with Python-based testing tools to ensure reporting accuracy in a highly regulated healthcare environment.
Managed the reporting lifecycle for multi-terabyte clinical datasets, leveraging Power Query and star schema modeling to optimize dashboard performance for real-time risk stratification by medical staff.
Architected high-throughput Spark pipelines on Databricks to clean and transform healthcare data, ensuring seamless integration between back-end clinical databases and front-end Power BI reports (a PySpark sketch follows this list).
Orchestrated automated workflows using GitHub Actions and Airflow to update reporting datasets, integrating model versioning and data quality gates to maintain reliable analytics for every user in the system.
Implemented observability stacks using OpenTelemetry to monitor report refresh success and data lineage, identifying performance bottlenecks in the flow from Kafka and SQL Server to the final Power BI dashboards.
Processed unstructured clinical text using NLP techniques to generate high-value reporting features, enabling visualization of provider notes and diagnosis trends within interactive clinical dashboards.
Used Power BI to communicate complex machine learning outcomes, translating statistical metrics and AUC results into clear visual stories for clinical and non-technical stakeholders across the organization.
Developed structured model cards and reporting runbooks, ensuring all visualizations and underlying models adhered to strict healthcare audit and bias assessment standards for compliance and regulatory review.
Deployed scalable reporting components using AWS and Docker, optimizing the cost efficiency and speed of large-scale population health analytics across the provider network.
Analyzed large healthcare datasets to deliver actionable insights for strategic clinical decisions, validating data accuracy and consistency within all Power BI reporting tools.
Collaborated with clinical stakeholders to translate reporting requirements into effective visualizations and analytics solutions, optimizing Power BI reports for performance and daily usability by medical teams.
Documented data models, reports, and reporting processes to ensure transparency and ease of maintenance, applying reporting and visualization expertise to complex clinical problems.
Communicated insights clearly to medical and non-technical stakeholders to drive alignment on project goals, delivering high-quality, reliable results in data-driven environments.
Scaled clinical reporting infrastructure to handle growing volumes of patient data while keeping dashboards responsive and interactive, regardless of the complexity of the underlying healthcare datasets.
Researched new Power BI features to implement cutting-edge visualization techniques, sharing knowledge with junior team members to foster a culture of learning.
Evaluated healthcare data quality and implemented cleanup procedures so that only accurate information reached reports, monitoring data ingestion processes to detect and resolve errors.
Designed custom visualizations for clinical needs that standard Power BI tools could not address, ensuring all clinical reports remained accessible and easy to use.
Balanced multiple healthcare projects and priorities to meet tight deadlines while maintaining high quality across deliverables and clinical stakeholder satisfaction.
Facilitated workshops and training sessions to help medical users adopt Power BI effectively, gathering feedback to continuously improve patient reporting dashboards.
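A minimal sketch of the Spark cleanup-and-transform step described above; the S3 paths, table, and column names are hypothetical placeholders for the real clinical sources.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("clinical-cleanup").getOrCreate()

# Hypothetical source location; the real pipeline read from clinical databases.
claims = spark.read.parquet("s3://clinical-lake/claims/")

clean = (
    claims
    .dropDuplicates(["claim_id"])
    .filter(F.col("patient_id").isNotNull())
    .withColumn("service_date", F.to_date("service_date", "yyyy-MM-dd"))
    .withColumn("allowed_amount", F.col("allowed_amount").cast("double"))
    .filter(F.col("allowed_amount") >= 0)
)

# Land the curated view where the Power BI gateway (or a SQL endpoint) can read it.
clean.write.mode("overwrite").parquet("s3://clinical-lake/curated/claims/")
```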

Environment: Python (PyTest), GitHub Actions, Databricks (dbx), AWS (S3, EMR, Lambda, CloudWatch), Hadoop, Spark, Kafka, Snowflake, Docker, SQL/PL-SQL, Pandas, NumPy, Scikit-learn, Hive, Jenkins.

Client: American Airlines, Fort Worth, TX | December 2017 – March 2019
Role: Sr. Python Developer
Responsibilities:

Testing-Heavy Software Engineering: Enforced a rigorous testing-heavy development posture by implementing comprehensive unit testing (PyTest, PyUnit) and mocking frameworks (MOCK) for flight operations and scheduling analytics tools, achieving a 32% increase in system reliability.
Automated CI/CD & Quality Gates: Built and optimized CI/CD pipelines using Jenkins and Git, integrating automated quality gates and unit test suites to ensure reliable, zero-defect delivery of high-value automation features for 24/7 airline operations.
Microservices & API Infrastructure: Designed a microservices-driven architecture using Python, Flask, and Django, exposing business logic and ML-enabled insights through secure REST APIs integrated with enterprise scheduling and crew management systems (a minimal endpoint is sketched after this list).
Observability & Instrumentation: Implemented application performance monitoring using Python instrumentation and log analytics, creating proactive alerting systems to maintain operational stability and resolve production issues in near-real-time.
Scalable Data Engineering: Developed ingestion pipelines to pull real-time airline operational feeds from Kafka, SQL Server, and Oracle, utilizing Python-based distributed processing patterns for flight schedule and aircraft health log extractions.
High-Performance Python Optimization: Optimized runtime performance through vectorization (NumPy), concurrency, and memory-efficient techniques, significantly reducing latency for high-volume operational event streams and historical data processing.
Containerization & Portability: Containerized Python services with Docker to ensure consistent runtime performance and portability across on-prem and cloud-based airline infrastructure.
Distributed Big Data Integration: Integrated Python-based processing frameworks with Spark, Kafka, and Hadoop to enable scalable, near-real-time ingestion of aircraft and operations datasets into HDFS environments.
Collaborative Technical Lead: Partnered with Product Owners and QA engineers in Agile/Scrum environments, translating flight optimization requirements into modular OOP-based Python designs and maintainable codebases.
Technical Documentation: Produced high-quality API specifications, system flow diagrams, and deployment guides to support long-term maintainability and cross-team knowledge transfer within the engineering organization.
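A minimal Flask sketch of the REST-API pattern referenced above; the route, payload fields, and score_flight() helper are hypothetical, with the helper standing in for the real model call behind the service boundary.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def score_flight(flight_number: str) -> float:
    # Placeholder logic for illustration only; a real service would call a trained model.
    return 0.17

@app.route("/v1/flight-delay-score", methods=["POST"])
def flight_delay_score():
    """Hypothetical endpoint exposing a model-backed delay-risk score for a flight."""
    payload = request.get_json(force=True)
    flight = payload.get("flight_number")
    if not flight:
        return jsonify({"error": "flight_number is required"}), 400
    return jsonify({"flight_number": flight, "delay_risk": score_flight(flight)})

if __name__ == "__main__":
    app.run(port=8080)
```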
Environment: Python 2.7, MySQL, Microsoft SQL Server, Cassandra, LDAP, Git, Bitbucket, Linux, Windows, JSON, XML, HTML, CSS, JavaScript, jQuery, AngularJS, REST APIs, Bootstrap, Rally, Agile/Scrum, PyCharm, PyUnit, PyTest, MOCK, Beautiful Soup, Matplotlib, Apache Directory Studio, DataStax DevCenter, Ansible, Jenkins, and CI/CD automation tools.

Client: Microsoft, India | June 2014 – May 2017
Role: Python Developer
Responsibilities:

Testing-Heavy Development Posture: Engineered Python-based diagnostic tools by enforcing a testing-heavy posture, developing comprehensive unit test suites using PyTest and mocking frameworks to ensure system reliability and prevent regressions in mission-critical monitoring tools.
Modern Observability (o11y) & Telemetry: Designed distributed data-processing architectures integrated with internal telemetry platforms; utilized Python instrumentation and logging frameworks to build threshold-based alerting and real-time diagnostic dashboards for system health.
Automated CI/CD & Quality Gates: Built and managed CI/CD pipelines using Jenkins and Git, automating builds and integrating quality gates to ensure faster release cycles and reduced manual intervention for high-volume diagnostic workflows.
RESTful API Infrastructure: Developed scalable APIs with Flask and Django to expose telemetry metrics and automation results, ensuring secure and standardized data consumption across internal engineering applications and monitoring systems.
Containerization & Consistency: Containerized Python services using Docker to ensure standardized deployments and operational reliability across development, QA, and production environments, eliminating "it works on my machine" issues.
Data Ingestion & Extraction: Engineered ingestion pipelines to retrieve data from APIs, SQL Server, and telemetry event streams, utilizing modular OOP-based Python and Pandas to automate extraction and maintain real-time diagnostic data availability.
High-Performance Optimization: Optimized processing pipelines through vectorization using NumPy and concurrency strategies, significantly improving throughput and reducing latency for large-scale performance logging and data parsing.
Data Integrity & Schema Validation: Performed extensive parsing and schema validation using regex and JSON/XML handlers, ensuring high-quality datasets for downstream analytics and engineering leadership insights (a parsing sketch follows this list).
Strategic Agile Collaboration: Partnered with Program Managers and QA teams in Agile/Scrum environments, translating complex diagnostic requirements into technical roadmaps and detailed architectural documentation.
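A minimal sketch of regex-plus-JSON parsing with field validation in the spirit of the work above; the log line format and required fields are hypothetical.

```python
import json
import re
from typing import Optional

# Hypothetical diagnostic log line format: "2017-03-01T12:00:00Z host-42 {json payload}"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+(?P<payload>\{.*\})$")
REQUIRED_FIELDS = {"metric", "value"}

def parse_line(line: str) -> Optional[dict]:
    """Parse one log line, returning a validated record or None if any check fails."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    try:
        payload = json.loads(match.group("payload"))
    except json.JSONDecodeError:
        return None
    if not REQUIRED_FIELDS.issubset(payload):
        return None
    return {"ts": match.group("ts"), "host": match.group("host"), **payload}

print(parse_line('2017-03-01T12:00:00Z host-42 {"metric": "cpu", "value": 93.5}'))
```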

Environment: Python (PyTest, PyUnit, MOCK), Jenkins (CI/CD), Docker, SQL Server, MySQL, Teradata, NumPy, SQLAlchemy, Flask, Django, Tableau, Anaconda, Beautiful Soup.


Education: Bachelor of Technology in Computer Science and Engineering
GITAM University, India | 2010 – 2014