Lalith Kumar - Principal BI/Data Architect | Snowflake | Databricks | Power BI
[email protected]
Location: San Jose, California, USA
Relocation: Open
Visa:
Lalith Kumar
Summary
Principal BI/Data Architect with 13 years of experience delivering enterprise data platforms, analytics, and ML-driven BI solutions across the security, technology, healthcare, and manufacturing industries. Expertise in Snowflake, Databricks, Power BI, Hadoop/Spark, AWS, and Azure data services. Proven track record in architecting scalable data platforms, building predictive analytics pipelines, and delivering executive-ready dashboards.
Technical Skills
Data Platforms & Warehousing: Snowflake, Databricks, Hadoop/Spark, Redshift, Synapse, BigQuery, Teradata, SQL Server
ETL/ELT & Integration: ADF, SSIS, Informatica, Talend, Glue, dbt, Kafka, Spark Streaming, API pipelines
Visualization & Analytics: Power BI (RLS, DAX, Paginated Reports, Composite Models), QlikView/Qlik Sense, Tableau, SSRS
Programming & ML: Python (Pandas, scikit-learn, PySpark), SQL (Advanced), Snowpark UDFs
Governance & Compliance: RBAC, Encryption, Lineage, GDPR, SOC 2, Purview, Unity Catalog
DevOps & Automation: Azure DevOps, Git, CI/CD for ADF & Power BI, pbi-tools, REST APIs
Experience
NVIDIA | Santa Clara, CA | Dec 2025 - Present
Data Engineer Lead
Led design and implementation of enterprise-scale data platforms on Databricks (Lakehouse architecture) supporting semiconductor supply chain and planning analytics.
Architected and optimized high-performance PySpark ETL pipelines processing large-scale datasets (10M-100M+ records), reducing pipeline runtime by 30-50%.
Owned end-to-end development of Delta Lake data models leveraging MERGE, CDC, and schema evolution to enable scalable incremental processing and high data reliability.
Designed and implemented event-driven data ingestion frameworks (file arrival triggers), enabling near-real-time data availability for business-critical use cases.
Established data quality, validation, and reconciliation frameworks across source, target, and delete layers, significantly improving data accuracy and trust.
Led CI/CD automation using GitLab and YAML-based Databricks bundles, standardizing deployments across dev, QA, and production environments.
Directed development and orchestration of Databricks workflows and jobs, including monitoring, alerting, and failure handling.
Optimized Spark SQL and PostgreSQL performance, improving query efficiency for large-scale analytical workloads.
Drove integration of enterprise data from SAP IBP (OData/APIs) into the analytics ecosystem, enabling seamless planning and reporting workflows.
Partnered with cross-functional teams (data science, business, and engineering) to deliver scalable data solutions and curated datasets for executive reporting and analytics.
Enabled ML-ready data pipelines by designing clean, feature-engineered datasets for predictive analytics and forecasting models.
Collaborated with data science teams to support machine learning use cases such as demand forecasting and lead-time optimization.
Built reusable data layers supporting MLflow-based workflows in Databricks, accelerating experimentation and model deployment.
Applied feature engineering and statistical techniques using Python to enhance model performance.
Contributed to AI-driven solutions including anomaly detection and intelligent data validation in data pipelines.

Key Technologies: Databricks, Apache Spark (PySpark), Delta Lake, SQL, Python, PostgreSQL, GitLab CI/CD, SAP IBP, MLflow, Data Modeling, Data Quality, Performance Optimization
SonicWall, Inc. | Milpitas, CA | Jan 2022 - Nov 2025
Principal Engineer (Enterprise Data & Analytics Consultant)
Designed and implemented a Snowflake enterprise data warehouse with multi-zone architecture (Raw, Conformed, Silver/Gold layers); configured S3 external stages, Snowpipe for continuous ingestion, and clustering keys aligned with micro-partitioning to improve query efficiency.
Built ADF pipelines with linked services, datasets, parameterized copy activities, and triggers for incremental loads and CDC; automated deployments using ARM templates.
Modeled star-schema and snowflake-schema datasets for Power BI; developed 30+ dashboards (executive KPIs, churn, ARR, product telemetry) with advanced DAX measures (CALCULATE, VAR, SWITCH, RANKX); configured composite models (Import + DirectQuery) for hybrid performance.
Developed predictive churn and revenue models using Snowpark, Python (scikit-learn), and SQL UDFs; registered ML models in Snowflake, generated predictions, and exposed outputs via Power BI dashboards.
Implemented data security frameworks: RBAC roles, masking policies, row-level/column-level security in Snowflake; static/dynamic RLS in Power BI; Azure AD SSO integration.
Reduced query latency by 40% through Snowflake warehouse performance tuning with multi-cluster auto-scaling, clustering strategies, and query profiling; optimized Power BI incremental refresh and aggregations.
Built CI/CD pipelines for Power BI (using pbi-tools CLI and DevOps pipelines) and ADF JSON templates; automated dataset refresh and report deployment via REST APIs.
Environment: Snowflake, Snowpark, ADF, Power BI, Python, SQL, Azure DevOps, Salesforce, Zuora, NetSuite, Gainsight
SonicWall, Managed Security Services | Milpitas, CA | Oct 2018 - Jan 2022
SW Dev Principal Engineer (Data Engineering & Analytics Consultant)
Architected Hadoop/Spark data lake processing 3TB+/day of firewall and threat telemetry; designed Hive partitioned tables with ORC/Parquet formats for efficient storage and analytics.
Migrated workloads from Hadoop to Azure Databricks with Delta Lake; built PySpark notebooks with complex ETL (joins, aggregations, window functions) and Delta tables with Z-Ordering and Time Travel for compliance.
Developed Kafka + Spark Structured Streaming pipelines for high-velocity threat feeds; implemented watermarking, checkpointing, and sliding windows for reliable streaming analytics.
Designed and delivered real-time threat dashboards in Power BI and Qlik integrating Databricks outputs; built geo heatmaps, drill-through reports, and risk scoring models.
Reduced ETL runtime from 4 hours to 1.2 hours through Spark job optimization via broadcast joins, partition pruning, and caching strategies.
Configured governance and lineage using Apache Atlas and Unity Catalog; integrated RLS into Power BI for client-specific access.
Environment: Hadoop, Spark, Databricks, Delta Lake, Kafka, Hive, Power BI, Qlik, Angular
Dell Technologies | Santa Clara, CA | Aug 2016 - Oct 2018
Senior BI Developer / Consultant
Built Informatica ETL mappings (Source Qualifier, Lookup, Router, Aggregator, Update Strategy) to ingest Salesforce, Eloqua, and Google Analytics data into BigQuery; implemented CDC for incremental loads.
Automated data ingestion with Python REST API jobs to pull campaign and CRM data into staging layers.
Modeled data into dimensional schemas (star schema, SCD Type 2) for marketing analytics.
Created QlikView/Qlik Sense dashboards with Set Analysis, alternate states, and section access for security; delivered KPIs including MQL-to-SQL funnel, campaign ROI, and lead attribution.
Piloted Power BI dashboards with DAX measures for funnel conversion and marketing spend analysis.
Improved Informatica pipeline performance by 35% via partitioning and caching.
Environment: Informatica, BigQuery, Salesforce, Eloqua, Python APIs, QlikView/Qlik Sense, Power BI

Toro | El Cajon, CA | Dec 2015 - Jul 2016
Senior BI Analyst / Developer (Consultant)
Integrated SAP ECC finance and sales data into SQL Server staging using SSIS packages (Derived Column, Merge Join, Lookup, Conditional Split).
Built QlikView dashboards for sales margin and regional analysis using Set Analysis and cyclic groups.
Developed early Power BI POCs connecting directly to SQL Server; designed KPIs and drill-through reports for sales leadership.
Reduced reporting latency by 50% via query tuning and caching.
Environment: QlikView, SQL Server, SSIS, SAP ECC, Power BI (POC)
GlobalFoundries | Santa Clara, CA | Jul 2014 - Nov 2015
Senior Programmer/Analyst (Consultant)
Designed and deployed an Oracle + Ab Initio enterprise data warehouse for semiconductor yield, defect tracking, and equipment downtime analytics.
Built SSRS drill-down reports and QlikView dashboards with alternate states for operations teams.
Created fact and dimension models aligned to manufacturing KPIs; enabled near-real-time dashboards that improved fab yield efficiency by 8%.
Environment: Oracle, Ab Initio, QlikView, SSRS, SQL Server
Premier, Inc. | Charlotte, NC | Jan 2014 - Jun 2014
Senior QlikView Developer (Consultant)
Enhanced QlikView dashboards with advanced Set Analysis, complex expressions, and incremental load strategies.
Developed healthcare cost and patient outcome dashboards supporting HIPAA compliance and driving multi-million-dollar savings for member hospitals.
Environment: Oracle, SQL Server, QlikView
UnitedHealth Group | Minneapolis, MN | May 2013 - Dec 2013
QlikView Developer (Consultant)
Built customer support dashboards in QlikView integrating SQL Server and Excel data.
Applied section access for row-level security and used nested conditional expressions for SLA adherence tracking.
Improved case visibility, reducing resolution times and SLA breaches.
Environment: Oracle, SQL Server, QlikView
Key Achievements
Reduced PySpark ETL pipeline runtimes by 30-50% at NVIDIA through architectural optimization on Databricks Lakehouse.
Cut Snowflake query latency by 40% at SonicWall via multi-cluster auto-scaling and clustering strategies.
Reduced ETL runtime from 4 hours to 1.2 hours at SonicWall MSS through Spark job optimization (broadcast joins, partition pruning, caching).
Improved Informatica pipeline performance by 35% at Dell Technologies through partitioning and caching.
Reduced reporting latency by 50% at Toro via query tuning and caching.
Improved semiconductor fab yield efficiency by 8% at GlobalFoundries through near-real-time operational dashboards.
Developed HIPAA-compliant healthcare dashboards at Premier, Inc. driving multi-million-dollar savings for member hospitals.
Architected and delivered 30+ Power BI dashboards at SonicWall covering executive KPIs, churn, ARR, and product telemetry.

Education
M.S., Electrical Engineering | Texas A&M University, Kingsville, TX | 2013
B.Tech, Electrical & Electronics Engineering | JNTU, India | 2011
Certifications
AWS Certified Data Analytics - Specialty
AWS Certified Database - Specialty
Microsoft Certified: Power BI Data Analyst Associate (PL-300)
Google Cloud: Generative AI, LLMs, Responsible AI