| Kalyan - DATA SCIENCE |
| [email protected] |
| Location: Dallas, Texas, USA |
| Relocation: YES |
| Visa: GC |
| Resume file: Kalyan_Data Scientist Resume (1)_1761227323728.docx |
|
Kalyan Chakravarthy Paleti
Senior Data Scientist
Phone: 9453480515 | E-mail: [email protected]

PROFESSIONAL SUMMARY

Experienced and versatile Senior Data Scientist with 12+ years of experience delivering enterprise-grade, AI/ML-driven data solutions across the telecom, legal, media, finance, energy, retail, and public sectors. Proven expertise in advanced analytics, cloud-native ML development, and scalable deployments. Key highlights include:

- Expertise in supervised and unsupervised ML: classification, clustering, regression, anomaly detection, recommendation engines, risk/fraud scoring, churn prediction, and forecasting.
- Advanced experience in deep learning using TensorFlow, PyTorch, and Keras, covering CNNs, RNNs, LSTMs, Transformers, and ONNX models.
- Strong NLP background: NER, topic modeling, summarization, sentiment analysis, semantic search, embeddings, and advanced LLM-based pipelines (BERT, GPT, Hugging Face Transformers, LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen).
- Hands-on in MLOps and LLMOps: MLflow, Airflow, Docker, Kubernetes (EKS, AKS, OKE), Jenkins, GitHub Actions, and OCI/Azure ML/SageMaker pipelines for CI/CD, monitoring, drift detection, and explainability (SHAP, LIME).
- Proven ability in time-series forecasting: ARIMA, SARIMA, Prophet, Holt-Winters, LSTM, and deep forecasting for demand, financial, and operational predictions.
- Designed and deployed real-time data/ML pipelines leveraging Kafka, Kinesis, Flink, Spark Streaming, and cloud-native event-driven architectures.
- Cloud-native experience across AWS (S3, Redshift, Glue, SageMaker, Lambda, Step Functions, DynamoDB), Azure (Azure ML, Synapse, AKS), and GCP (Vertex AI, Dataproc, BigQuery).
- Built enterprise-scale ETL/data pipelines using PySpark, Dask, Oracle Data Integrator (ODI), Snowflake, Hive, SQL, and Delta Lake.
- Skilled in data visualization and BI with Power BI, Tableau, Grafana, and Looker, delivering executive dashboards and interactive analytics.
- Strong programming foundation: Python, R, Java, C/C++, SQL, JavaScript, React, Angular, HTML/CSS, enabling full-stack integration when required.
- Experienced in data governance, Responsible AI, and compliance-driven ML (FCC, HIPAA, GDPR, financial/insurance regulations), embedding fairness, bias detection, and auditability into ML lifecycles.
- Domain expertise across Telecom (network analytics, fraud detection), Legal (litigation analytics, e-discovery), Finance & Insurance (fraud detection, claims forecasting), Retail (churn, recommendations, demand forecasting), Energy (reservoir optimization), Media (ad targeting, content recommendations), and Public Sector (infrastructure monitoring, policy compliance).

TECHNICAL SKILLS:

- Programming Languages: Python, R, Java, JavaScript, TypeScript, jQuery, ReactJS, Next.js, Angular, HTML, CSS, C, C++, SQL, Hive, Impala
- Machine Learning & AI: Scikit-learn, Spark MLlib, XGBoost, LightGBM, H2O.ai, MLflow, RapidMiner, Weka, IBM Watson ML, Amazon Comprehend
- Deep Learning: TensorFlow, PyTorch, Keras, ONNX, Hugging Face Transformers, MXNet, CNTK, DeepLearning4j
- NLP: BERT, GPT, TF-IDF, NLTK, spaCy, Hugging Face Transformers, Sentence Transformers, LDA, NER, Summarization
- Time Series & Forecasting: ARIMA, SARIMA, Holt-Winters, Prophet, LSTM, RNN
- MLOps / LLMOps: MLflow, Airflow, Kubeflow, Docker, Kubernetes (EKS/AKS/OKE), Jenkins, GitHub Actions, GitLab CI/CD, SageMaker Pipelines, Azure ML, Vertex AI
- Big Data & Streaming: Apache Spark, PySpark, Hadoop, MapReduce, Hive, Pig, HBase, Kafka, Kinesis, Flink, Databricks
- Cloud Platforms: AWS (S3, Redshift, SageMaker, Glue, Lambda, Step Functions, DynamoDB, EC2), Azure (ML, Synapse, AKS), GCP (Vertex AI, BigQuery, Dataflow, Dataproc, Pub/Sub)
- Visualization / BI: Tableau, Power BI, Looker, Grafana, Oracle Analytics Cloud
- Databases: PostgreSQL, MySQL, SQL Server, MongoDB, DynamoDB, Oracle, Snowflake, Delta Lake
- LLM Agent Frameworks: LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen
- DevOps / Version Control: Git, GitHub, GitLab, Bitbucket, Jenkins, Docker, Kubernetes, CI/CD
- Other Tools & APIs: REST APIs, FastAPI, Flask, OpenCV, OCR, Prometheus, Evidently AI, SHAP, LIME

PROFESSIONAL EXPERIENCE

Client: Oracle | Oct 2023 - Present
Role: Senior Data Scientist
Location: Connecticut

Responsibilities:
- Delivered predictive and prescriptive analytics for customer retention, product adoption, and sales optimization using Logistic Regression, Random Forests, Gradient Boosting, and XGBoost.
- Utilized Python (Pandas, NumPy, Scikit-learn, Seaborn, Matplotlib) for EDA, feature engineering, and model development, ensuring interpretability and alignment with business stakeholders.
- Designed and optimized SQL queries and data pipelines across Oracle Autonomous Data Warehouse, Oracle Data Lakehouse, and Delta Lake, enabling large-scale data processing for enterprise analytics.
- Built interactive dashboards in Oracle Analytics Cloud, Power BI, and Tableau to track KPIs such as customer engagement, product usage, churn risk, and financial performance.
- Engineered scalable ETL workflows using Oracle Data Integrator (ODI), Python, and OCI Data Integration to automate ingestion and transformation of structured, semi-structured, and streaming data.
- Applied clustering (K-Means, DBSCAN) and PCA to uncover customer segments, usage trends, and cross-sell/upsell opportunities, driving data-informed go-to-market strategies.
- Integrated Git-based version control and automated ML pipelines with OCI Data Science, MLflow, and GitHub Actions, streamlining experimentation and deployment workflows.
- Partnered with product managers and business leaders to translate complex ML insights into actionable strategies for customer success, revenue growth, and operational efficiency.
- Collaborated with ML engineers and DevOps teams to containerize ML models with Docker and deploy them to Kubernetes (OKE) and OCI AI Services for scalable real-time inference.
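The K-Means customer segmentation mentioned above can be illustrated with a minimal sketch. This is a hypothetical toy example in plain NumPy (the actual engagement work used Scikit-learn on real customer data), showing Lloyd's algorithm separating two engagement tiers:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: assign each point to its nearest
    centroid, recompute centroids, repeat until stable."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # distances of every point to every centroid, shape (n, k)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# hypothetical customer features: [monthly_spend, logins_per_week]
X = np.array([[10, 1], [12, 2], [11, 1],      # low-engagement customers
              [90, 20], [95, 22], [88, 19]],  # high-engagement customers
             dtype=float)
labels, centroids = kmeans(X, k=2)
```

On this toy data the two engagement tiers end up in separate clusters; in practice the features would be standardized (and often PCA-reduced) first.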
- Ensured data quality and reliability through anomaly detection, automated validation checks, and monitoring via OCI Monitoring, Logging, and Alerts.
- Developed reusable Python utilities for feature engineering, data validation, and model evaluation, standardizing practices and reducing development effort across teams.
- Designed and executed A/B testing and uplift modeling to measure the impact of new customer engagement strategies and product features, delivering data-backed recommendations.
- Conducted model explainability analysis using SHAP and LIME, increasing trust and adoption of ML models in revenue-critical applications.
- Researched and prototyped advanced ML and GenAI approaches (e.g., LSTM for financial time-series forecasting, LLMs for customer support automation), laying the foundation for enterprise-wide AI adoption.
- Championed Responsible AI practices, embedding bias detection, fairness evaluations, and governance processes to align with Oracle's global compliance and ethics standards.

Environment & Tools: Python (Pandas, NumPy, Scikit-learn, XGBoost, TensorFlow, Seaborn, Matplotlib), SQL, Oracle Cloud Infrastructure (OCI Data Science, OCI Data Integration, Oracle Autonomous Data Warehouse, Oracle Analytics Cloud, OKE), Power BI, Tableau, Git/GitHub Actions, MLflow, Docker, Kubernetes, Agile.

Client: Bulls Attorneys P.A. | Aug 2022 - Sep 2023
Role: Senior Data Scientist
Location: Wichita, Kansas

Responsibilities:
- Built predictive models for legal risk scoring, litigation forecasting, and fraud detection using Logistic Regression, Decision Trees, Random Forest, and XGBoost.
- Utilized Python (Pandas, NumPy, Scikit-learn, NLTK, spaCy) for exploratory data analysis, feature engineering, NLP-based text classification, and entity recognition from unstructured legal documents.
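The entity recognition over legal documents described above was done with trained NLP models (NLTK, spaCy); as a self-contained illustration only, a toy rule-based stand-in with hypothetical patterns and text might look like:

```python
import re

def extract_entities(text):
    """Toy rule-based extractor standing in for a trained NER model:
    pulls case citations ('X v. Y') and ISO-style dates from legal text."""
    cases = re.findall(r"[A-Z][a-zA-Z]+ v\. [A-Z][a-zA-Z]+", text)
    dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
    return {"CASE": cases, "DATE": dates}

# hypothetical sentence, not from any real case file
doc = "The court in Smith v. Jones (filed 2021-03-14) relied on Roe v. Wade."
ents = extract_entities(doc)
# ents["CASE"] holds both citations; ents["DATE"] holds the filing date
```

A production pipeline would use a statistical model rather than regexes, but the input/output shape (text in, labeled spans out) is the same.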
- Extracted, transformed, and analyzed structured data from Oracle and Snowflake along with unstructured case files, contracts, and regulatory filings to build litigation and compliance analytics pipelines.
- Developed interactive dashboards in Power BI and Tableau to provide attorneys and compliance officers with insights on case volumes, risk exposure, and litigation timelines.
- Collaborated with legal teams to deploy production-ready models via Azure ML pipelines, integrating with firm-wide case management and document review systems.
- Applied clustering and topic modeling (LDA, K-Means) to organize large volumes of legal documents and support e-discovery workflows.
- Built document similarity and semantic search solutions using TF-IDF, word embeddings, and sentence transformers, enabling faster case preparation and precedent identification.
- Ensured transparency and compliance of AI systems by applying explainability techniques (SHAP, LIME), critical for decisions influencing legal strategy and client risk management.
- Conducted time-series modeling (ARIMA, Prophet) to forecast case backlogs, trial timelines, and settlement trends, supporting resource allocation and strategic planning.
- Partnered with governance and compliance teams to ensure adherence to HIPAA, GDPR, and legal data confidentiality standards across data pipelines and model deployments.
- Developed reusable Python modules for automated document preprocessing, OCR integration, and model evaluation, standardizing workflows across multiple legal analytics projects.
- Collaborated with DevOps to containerize ML workflows using Docker and deploy them on Kubernetes clusters (AKS) for scalable inference across large legal datasets.
- Conducted root cause analyses on data inconsistencies in case files and regulatory datasets, implementing automated anomaly detection and data validation pipelines.
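The TF-IDF document similarity mentioned above can be sketched in a few lines of standard-library Python, using hypothetical snippets rather than real filings (the deployed solution used vectorized libraries and sentence transformers):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF per document: raw term frequency times smoothed IDF."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: c * idf[t] for t, c in Counter(toks).items()}
            for toks in tokenized]

def cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb)

# hypothetical document snippets
docs = ["breach of contract damages",
        "contract breach and damages claim",
        "patent infringement filing"]
vecs = tfidf_vectors(docs)
```

Here the two contract-related snippets score higher against each other than against the unrelated patent snippet, which is the property the precedent-search workflow relies on.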
- Delivered training workshops for attorneys and paralegals to enhance data literacy, promoting adoption of AI-driven legal research, compliance monitoring, and litigation support tools.

Environment & Tools: Python (Pandas, NumPy, Scikit-learn, XGBoost, NLTK, spaCy, Transformers), SQL, Oracle, Snowflake, Power BI, Tableau, Azure ML, Git, Docker, Kubernetes (AKS), SHAP, LIME, OCR/NLP pipelines, CI/CD, Legal Analytics & Compliance AI.

Client: State of SD | May 2020 - July 2022
Role: Senior Data Scientist
Location: South Dakota

Responsibilities:
- Developed and deployed ML models for telecom network reliability, dropped-call prediction, and signal degradation, cutting outage risk by 25%.
- Engineered and optimized large-scale data pipelines using Python (Pandas, NumPy, Dask, Scikit-learn, XGBoost, PySpark) to process multi-terabyte telecom datasets such as tower signal metrics, subscriber behavior logs, and network KPIs.
- Built fraud detection and anomaly detection systems leveraging classification (Logistic Regression, Random Forest, Gradient Boosting) and clustering algorithms (K-Means, DBSCAN, Hierarchical Clustering) to flag abnormal call records and network traffic patterns in near real time.
- Developed SQL and PySpark scripts to extract, cleanse, and aggregate structured/unstructured telecom operational data from OSS/BSS systems, CDRs (Call Detail Records), and network logs, ensuring high data quality and readiness for ML modeling.
- Implemented time-series forecasting (ARIMA, SARIMA, Facebook Prophet, LSTM models) to predict usage spikes, network congestion, and bandwidth demand, enabling proactive infrastructure scaling.
- Created interactive dashboards in Power BI and Tableau for visualization of network KPIs, predictive alerts, fraud detection metrics, and customer experience trends, supporting executives and field engineers in decision-making.
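The bandwidth-demand forecasting above used ARIMA, Prophet, and LSTM models; as the simplest runnable illustration of the same idea, here is Holt's linear-trend exponential smoothing on hypothetical demand data (a hand-rolled sketch, not the production implementation):

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=3):
    """Holt's linear-trend exponential smoothing: update a level and a
    trend estimate, then extrapolate them over the forecast horizon."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]

# hypothetical hourly bandwidth demand with a steady upward trend
demand = [100, 110, 120, 130, 140, 150]
forecast = holt_forecast(demand)
```

On perfectly linear input the smoother tracks the trend exactly, so the three-step forecast continues at the same slope; real telemetry would also need the seasonal (Holt-Winters) component.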
- Partnered with telecom engineers to integrate ML outputs with automated alerting and monitoring systems, enabling proactive incident management and reducing mean time to repair (MTTR).
- Developed model monitoring frameworks (concept drift detection, data quality checks, performance benchmarking) using MLflow, Evidently AI, and custom monitoring scripts, ensuring long-term stability of production models.
- Conducted feature engineering and correlation analysis to uncover the impact of geography, device type, and user behavior on network performance, improving model interpretability and business adoption.
- Embedded model governance, fairness, and explainability (LIME, SHAP) into the ML lifecycle, ensuring compliance with FCC regulatory requirements and internal audit standards.
- Applied cloud-based deployments (AWS S3, EC2, Lambda, SageMaker) and integrated models into CI/CD pipelines (Git, Jenkins, Docker, Kubernetes) for scalable and automated production rollouts.
- Supported network expansion planning by providing data-driven insights on coverage optimization, bandwidth allocation, and rural network expansion strategies.
- Automated reporting pipelines (Python, Airflow, Power BI Service) to deliver weekly and monthly insights on service reliability, latency, and call failure trends for executive and regulatory board reviews.
- Collaborated with cross-functional teams including network architects, data engineers, and business analysts to align ML models with SLA compliance, customer satisfaction, and operational efficiency KPIs.

Environment & Tools: Python (Pandas, NumPy, Scikit-learn, XGBoost, PySpark, TensorFlow, Prophet, ARIMA/LSTM), SQL, Hive, OSS/BSS/CDR data, Power BI/Tableau, AWS (S3, Redshift, SageMaker, Lambda), Git, Jenkins, Docker, Kubernetes, MLflow, SHAP/LIME, Telecom Analytics.

Client: Zion Oil & Gas Inc.
August 2017 - April 2020
Role: Data Scientist
Location: Texas

Responsibilities:
- Designed and deployed advanced predictive models using ensemble methods and time-series forecasting to optimize well performance, increasing production efficiency by identifying high-yield targets.
- Processed real-time sensor telemetry from drilling rigs using AWS Kinesis and Lambda, automating predictive maintenance alerts that reduced unplanned downtime by 20%.
- Applied deep learning (CNNs) to classify seismic imagery, enhancing reservoir simulation accuracy and supporting strategic exploration decisions.
- Developed geospatial analytics pipelines with Python and OpenCV to integrate satellite, field, and sensor data, enabling accurate reservoir mapping and well placement.
- Deployed scalable machine learning models via SageMaker, supporting continuous inference for reservoir forecasting and operational fault detection.
- Automated end-to-end ingestion pipelines with AWS Step Functions and Kinesis, standardizing sensor-to-model workflows for faster analytics.
- Integrated heterogeneous datasets (relational and NoSQL) to build a centralized data hub, streamlining analytics and reporting.
- Created interactive operational dashboards using Power BI and Grafana to monitor drilling KPIs, production efficiency, and maintenance events in real time.
- Implemented anomaly detection algorithms to flag irregular pressure, temperature, and flow metrics, enabling proactive operational interventions.
- Trained reservoir performance models on GPU clusters, reducing model training times and improving predictive accuracy.
- Developed RESTful APIs to deliver model insights to field engineers, enhancing decision-making speed and accuracy.
- Leveraged LightGBM and XGBoost for high-accuracy regression and classification analyses on oilfield production and exploration datasets.
- Managed structured and unstructured data across AWS S3, Redshift, and HDFS, ensuring end-to-end data lifecycle management.
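The anomaly detection on pressure, temperature, and flow metrics described above can be sketched as a rolling z-score rule; the values below are hypothetical telemetry, and the production system used richer streaming models rather than this minimal version:

```python
import statistics

def zscore_anomalies(readings, window=5, threshold=3.0):
    """Flag readings more than `threshold` standard deviations away
    from the mean of the preceding `window` readings."""
    flagged = []
    for i in range(window, len(readings)):
        hist = readings[i - window:i]
        mu = statistics.mean(hist)
        sigma = statistics.stdev(hist)
        if sigma > 0 and abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# hypothetical pressure telemetry: steady near 50 psi, one spike at index 8
pressure = [50, 51, 49, 50, 52, 50, 51, 49, 120, 50]
anomalies = zscore_anomalies(pressure)
```

The spike is flagged because it sits far outside the variability of its recent window, while the return to normal at the next reading is not, since the spike itself has inflated the window's spread.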
- Partnered with engineering and geoscience teams to validate model outputs, document assumptions, and integrate analytics into operational workflows.
- Conducted training sessions and workshops for field analysts, enabling adoption of predictive insights for drilling optimization and reservoir management.

Environment: AWS SageMaker, PyTorch, LightGBM, XGBoost, AWS Lambda, Kinesis, Step Functions, Redshift, S3, Grafana, OpenCV, Python, SQL, NoSQL, REST APIs, Power BI.

Client: Lowe's | Nov 2012 - May 2016
Role: Big Data Engineer / Data Scientist
Location: North Carolina, USA

Responsibilities:
- Designed and implemented large-scale data processing architectures using Hadoop, MapReduce, Hive, Pig, and HBase, enabling analysis of multi-terabyte retail transaction data.
- Developed customer segmentation and churn prediction models using logistic regression, decision trees, and Naive Bayes, supporting targeted marketing campaigns and loyalty programs.
- Built market basket analysis and recommendation engines using association rule mining (Apriori, FP-Growth) and early Apache Mahout, improving cross-sell and upsell opportunities.
- Created demand forecasting models using ARIMA, Holt-Winters, and regression-based methods, helping optimize inventory management and supply chain planning.
- Automated ETL pipelines with Sqoop, Oozie, and Airflow (early adoption) to ingest and schedule batch processing of sales and CRM data.
- Leveraged Spark (1.x) and Spark Streaming for faster processing and near real-time insights on pricing, promotions, and online transactions.
- Built NLP pipelines using TF-IDF and topic modeling (LDA) to analyze customer reviews and call center transcripts, driving product improvement strategies.
- Developed sentiment analysis using SVM classifiers and logistic regression, helping the business measure customer satisfaction at scale.
- Deployed predictive models via Flask REST APIs and Docker containers, creating reusable inference services for marketing and operations teams.
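The market basket analysis above used Apriori-style association rule mining; the first pass of that algorithm (counting item-pair support) can be sketched with standard-library Python on hypothetical baskets:

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(baskets, min_support=0.4):
    """First Apriori pass over item pairs: keep pairs whose support
    (fraction of baskets containing both items) meets min_support."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# hypothetical retail baskets
baskets = [
    {"hammer", "nails", "tape"},
    {"hammer", "nails"},
    {"paint", "brush"},
    {"hammer", "nails", "paint"},
    {"brush", "tape"},
]
pairs = frequent_pairs(baskets)
```

Only the hammer-and-nails pair clears the support threshold here; a full Apriori implementation would extend surviving pairs to larger itemsets and derive confidence-scored rules for cross-sell recommendations.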
- Designed self-service BI dashboards in Tableau and Power BI, enabling executives and store managers to track KPIs in real time.
- Conducted data quality profiling and anomaly detection to validate CRM and POS data, ensuring accurate insights for analytics and reporting.
- Led internal initiatives to create reusable ML code templates and establish early experiment tracking frameworks for consistent model governance.

Environment: Hadoop, MapReduce, Hive, Pig, HBase, Spark (1.x), Sqoop, Oozie, Kafka (early adoption), Python (pandas, scikit-learn, NLTK), R, Apache Mahout, Flask, Docker, Tableau, Power BI, SQL Server, SAS.