Sai Verry Keerthan - Data Engineer
[email protected]
Location: Dallas, Texas, USA
Relocation: Any Location
Visa: OPT
Sai Varre
Richmond, VA | [email protected] | +1 (346) 285-2722 | LinkedIn | Trailhead
Summary
Data Engineer with 4+ years of experience designing, building, and optimizing batch and streaming data pipelines
across Azure, AWS, and GCP environments. Hands-on with Python, SQL, Spark, Kafka, Airflow, dbt, and Snowflake,
delivering scalable ETL/ELT workflows and high-quality analytical data models supporting enterprise reporting.
Skilled in implementing data quality validation, schema enforcement, and orchestration reliability using Great
Expectations, Kafka Schema Registry, and parameterized Airflow pipelines. Experienced in partnering with analytics
and ML teams to operationalize production-ready dataflows, improve reporting accuracy, and support predictive and
anomaly detection use cases through modern DataOps practices and governed data infrastructure.
Work Experience
AT&T | Data Engineer
Cloud Data Platform | ETL Orchestration | DataOps & Observability | Telecom Analytics
May 2023 - Present
Engineered and maintained scalable real-time ETL pipelines using Azure Data Factory, AWS Glue, and
Kafka, processing 10+ TB/month of telecom data (Parquet/Delta) for analytics dashboards.
Implemented Spark Streaming and Snowflake Streams to achieve sub-minute latency for
network performance analytics and monitoring.
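For illustration, a minimal sketch of a sub-minute Spark Structured Streaming read from Kafka into Delta, in the spirit of this pipeline; the broker, topic, event schema, and paths are hypothetical, and the Delta Lake Spark package is assumed to be available:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("network-perf-stream").getOrCreate()

    # Hypothetical event schema for network performance metrics
    schema = StructType([
        StructField("cell_id", StringType()),
        StructField("latency_ms", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "network-metrics")            # hypothetical topic
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # 30-second micro-batches keep end-to-end latency under a minute
    (events.writeStream.format("delta")
        .option("checkpointLocation", "/chk/network-metrics")
        .trigger(processingTime="30 seconds")
        .start("/delta/network_metrics"))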
Contributed to standardized ingestion + transformation patterns across Azure, GCP, and Snowflake
environments, improving code reusability and reducing onboarding time for new pipelines.
Automated data validation and quality checks using Great Expectations, detecting schema drift and null
anomalies, and reducing data quality incidents by 35%.
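A minimal sketch of the kind of check involved, using the classic (pre-1.0) Great Expectations pandas API; the file and column names are hypothetical:

    import great_expectations as ge
    import pandas as pd

    df = pd.read_parquet("batch.parquet")  # hypothetical input batch
    gdf = ge.from_pandas(df)

    results = [
        gdf.expect_column_to_exist("customer_id"),               # guards against schema drift
        gdf.expect_column_values_to_not_be_null("customer_id"),  # guards against null anomalies
        gdf.expect_column_values_to_be_between("latency_ms", 0, 60000),
    ]

    if not all(r.success for r in results):
        raise ValueError("Data quality validation failed; quarantining batch")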
Introduced producer-consumer data contracts using Kafka Schema Registry and CI checks
(backward compatibility, required fields), preventing breaking changes before deploys.
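As a sketch, such a CI gate can POST a candidate schema to the Schema Registry compatibility endpoint and fail the build on an incompatible change; the registry URL, subject, and schema file below are hypothetical:

    import json
    import sys

    import requests

    REGISTRY = "http://schema-registry:8081"  # hypothetical registry URL
    SUBJECT = "telecom.events-value"          # hypothetical subject

    with open("event.avsc") as f:             # hypothetical candidate schema file
        candidate = f.read()

    resp = requests.post(
        f"{REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        data=json.dumps({"schema": candidate}),
        timeout=10,
    )
    resp.raise_for_status()

    if not resp.json().get("is_compatible", False):
        sys.exit("Candidate schema breaks backward compatibility; blocking deploy")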
Developed and automated modular dbt transformations for 200+ business tables (with CI/CD), supporting
customer behavior, fraud detection, and predictive maintenance analytics.
Deployed cross-cloud pipelines using GCP Dataflow, Cloud Storage, BigQuery, and Cloud Composer,
automating ingestion from Azure Data Lake and enabling unified, monitored analytics across environments.
Configured Pub/Sub connectors for near real-time synchronization between Kafka and BigQuery,
improving multi-region data accessibility and query performance by 20%.
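One way such a relay can look, consuming from Kafka and republishing to Pub/Sub for a BigQuery subscription to ingest; the broker, topics, and project ID are hypothetical:

    from confluent_kafka import Consumer
    from google.cloud import pubsub_v1

    consumer = Consumer({
        "bootstrap.servers": "broker:9092",  # hypothetical broker
        "group.id": "pubsub-bridge",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["network-metrics"])  # hypothetical Kafka topic

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "network-metrics")  # hypothetical

    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        # Forward the raw Kafka payload; BigQuery ingests it via a Pub/Sub subscription
        publisher.publish(topic_path, data=msg.value())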
Added table-level lineage and run-history metadata logging for batch and streaming pipelines, improving
traceability and speeding up root-cause analysis during data incidents.
Improved Snowflake performance and reduced cloud costs by ~20% through warehouse tuning and
optimized query caching.
Enhanced data observability with Grafana, Azure Monitor, and Power BI dashboards to monitor pipeline
throughput, SLA breaches, and error trends across distributed workloads.
Collaborated with Data Scientists to operationalize ML pipelines within Databricks, integrating predictive
churn and anomaly detection models into production dataflows.
Created parameterized Airflow DAGs to manage cross-cloud dependencies among Snowflake, Redshift,
and PostgreSQL workloads, improving reliability and automation.
Enhanced pipeline reliability by introducing fault-tolerant execution patterns, including retry handling
and targeted backfills, which reduced on-call intervention during pipeline failures.
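A hedged sketch of the pattern from the last two items, a parameterized DAG with retry handling, assuming Airflow 2.4+; the DAG ID, parameter, and callable are hypothetical:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def run_extract(source: str, **context):
        # Placeholder for a Snowflake/Redshift/PostgreSQL extract step
        print(f"extracting from {source} for {context['ds']}")

    with DAG(
        dag_id="cross_cloud_sync",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,  # targeted backfills run explicitly via `airflow dags backfill`
        default_args={
            "retries": 3,                          # fault-tolerant execution
            "retry_delay": timedelta(minutes=5),
        },
        params={"source": "snowflake"},            # overridable at trigger time
    ) as dag:
        extract = PythonOperator(
            task_id="extract",
            python_callable=run_extract,
            op_kwargs={"source": "{{ params.source }}"},  # templated parameter
        )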
Partnered with product teams to define telecom data governance standards, including cataloging,
lineage tracking, and access controls, using Azure Purview and AWS Glue Data Catalog.
Authored runbooks, support playbooks, and post-incident documentation, and coordinated RCA review
sessions, improving onboarding, knowledge transfer, and support handoffs across data teams.
BluePal Solutions Pvt Ltd, India | Data Engineer
Nov 2020 - Aug 2022
Built Python ETL pipelines to process academic and financial data into PostgreSQL, reducing manual
reporting work by 40%.
Designed PostgreSQL schemas with indexes and partitions, cutting query latency from 5s+ to under
2s for high-volume analytics queries.
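Illustrative DDL for that kind of partitioned, indexed schema, applied with psycopg2 and assuming PostgreSQL 11+; the table, columns, and DSN are hypothetical:

    import psycopg2

    DDL = """
    CREATE TABLE IF NOT EXISTS transactions (
        txn_id      BIGINT,
        student_id  BIGINT,
        amount      NUMERIC(12,2),
        created_at  TIMESTAMPTZ NOT NULL
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE IF NOT EXISTS transactions_2022
        PARTITION OF transactions
        FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');

    -- An index on the partitioned parent cascades to each partition (PG 11+)
    CREATE INDEX IF NOT EXISTS idx_txn_student
        ON transactions (student_id, created_at);
    """

    with psycopg2.connect("dbname=reporting") as conn:  # hypothetical DSN
        with conn.cursor() as cur:
            cur.execute(DDL)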
Integrated 10+ university systems with secure REST APIs, enabling consistent data exchange for
enrollment, finance, and student management.
Automated ETL testing and validation with Python, SQL, and JUnit, improving trust in nightly
pipelines and reducing defects by 25%.
Developed Kafka producers and consumers to stream enrollment and attendance events in real time,
providing administrators timely operational insights.
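A minimal sketch of such an event producer using kafka-python; the broker, topic, and event fields are hypothetical:

    import json

    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="broker:9092",  # hypothetical broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    event = {"student_id": "S123", "course": "CS500", "status": "enrolled"}
    producer.send("enrollment-events", value=event)  # hypothetical topic
    producer.flush()  # block until the broker acknowledges delivery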
Containerized ETL workloads with Docker and Jenkins CI/CD, improving consistency across
environments and cutting deployment effort by half.
Implemented Python + Great Expectations validation suites for nightly ETL jobs, reducing reporting
errors by 25% and improving PostgreSQL data reliability.
Migrated legacy academic data pipelines to GCP BigQuery (sandbox) and Cloud Storage for
benchmarking, introducing cost-efficient analytics at scale.
Collaborated with BI teams and delivered 5+ forecasting dashboards for enrollment planning (used by
200+ administrators) to improve resource allocation.
Supported deployment of containerized ETL services to Azure App Service using Azure DevOps
pipelines, contributing to the team's early cloud migration efforts.
Designed role-based access and basic data-anonymization scripts in Python/SQL to secure sensitive
student records.
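A sketch of one such script: deterministic salted hashing keeps joins possible without exposing raw identifiers; the salt handling here is simplified and hypothetical:

    import hashlib

    import pandas as pd

    SALT = b"load-me-from-a-secret-store"  # hypothetical; never hardcode in production

    def pseudonymize(value: str) -> str:
        # Deterministic: the same input always maps to the same token
        return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()[:16]

    df = pd.DataFrame({"student_id": ["S123", "S456"], "gpa": [3.7, 3.2]})
    df["student_id"] = df["student_id"].map(pseudonymize)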
Wrote technical documentation and onboarding guides, reducing new engineer ramp-up time by
30% and promoting best practices.
Monitored production by analyzing PostgreSQL logs and Kafka streams, diagnosing bottlenecks and
sustaining 99.9% pipeline uptime.
Technical Skills
Programming Languages: Java, Python, Shell, C, C++, Bash, Scala, SQL, YAML
Data Engineering & Big Data: ETL/ELT, ETL Testing, Apache Kafka, Apache Spark, Hadoop, Airflow, dbt, Delta Lake,
Databricks, Great Expectations, DataOps, Data Modeling
Frameworks / Libraries: Flask, TensorFlow, PyTorch, Scikit-learn, spaCy, NLTK, BERT, PySpark
Cloud & DevOps: AWS (S3, Glue, Kinesis, Lambda, CDK, CloudWatch, CodePipeline, CodeBuild), Azure (ADF, Synapse,
Azure DevOps, Purview, Monitor), GCP (BigQuery, Dataflow, Pub/Sub, Composer), Docker, Jenkins, Terraform
Databases & Warehousing: PostgreSQL, MySQL, Oracle, MongoDB, Snowflake, Redshift, BigQuery, NoSQL
Monitoring & Logging: ELK Stack (Elasticsearch, Logstash, Kibana), Datadog, CloudWatch (logs, metrics,
alarms)
Data Analysis & Visualization: Tableau, Power BI, Excel (Advanced), Matplotlib, Seaborn
Testing Tools: JUnit, PyTest, JMeter, REST Assured (for pipeline/API testing)
Development & Collaboration Tools: Git, GitHub, Bitbucket, Jira, Confluence, Jupyter, Pandas, NumPy, Tkinter
Education
University of Wisconsin-Milwaukee | Master's in Computer Science
May 2024
Related Coursework: Natural Language Processing, Robot Motion Planning, Machine Learning, Computational
Intelligence, Computer Networks, Computational Models of Decision Making, Operating Systems, Data Analytics
Jawaharlal Nehru Technological University, Hyderabad | B.Tech in Computer Science Engineering
Nov 2020