Veda Anand - Sr. Data Engineer
[email protected]
Location: Booneville, Arkansas, USA
Relocation: Yes
Visa: H4EAD
NAVYASRI A
Sr. Data Engineer
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/navyasri-a-5ab819211/
PROFESSIONAL SUMMARY:
Senior Data Engineer with 8+ years of experience designing, developing, and optimizing large-scale data
pipelines and cloud-native architectures across AWS, Azure, and GCP, with a strong focus on
performance, scalability, and reliability.
Extensive hands-on expertise in Python and Java, building modular, production-ready data
engineering solutions and API-driven workflows in GCP, Databricks, and on-premise environments.
Proven experience in developing and orchestrating complex ETL/ELT frameworks using Python, DBT,
Talend, Matillion, and Databricks Notebooks to support batch and streaming workloads across multi-
cloud platforms, including GCP.
Strong command of Apache Spark (Core, SQL, PySpark, Streaming) and Databricks for processing
high-volume structured and semi-structured data, leveraging GCP Dataproc and BigQuery for large-
scale distributed computing.
Expertise in building robust, cloud-native data pipelines using GCP Dataflow, Apache Beam, Cloud
Composer (Airflow), Cloud Storage, and Pub/Sub, supporting real-time and batch ingestion pipelines.
Skilled in implementing real-time event streaming solutions using Apache Kafka, GCP Pub/Sub, and
Databricks Structured Streaming, enabling event-driven analytics and time-sensitive processing.
Deep experience with modern data warehousing platforms such as BigQuery (GCP), Snowflake,
Redshift, and Azure Synapse, including performance tuning, cost optimization, and data lifecycle
management.
Proficient in data modeling using Kimball methodology, designing scalable Star and Snowflake
schemas for BI tools, and implementing model-driven designs in Python, SQL, and Java pipelines.
Strong SQL, PL/SQL, and Python-based scripting capabilities, with expertise in window functions,
CTEs, stored procedures, and performance optimization for both traditional and cloud-native
databases.
Experience integrating machine learning models into production using Python, Databricks ML, and Vertex AI
(GCP) to support predictive analytics and intelligent automation.
Experienced in developing RESTful APIs using Python (Flask/FastAPI) and Java (Spring Boot) for
automated data delivery, service integration, and microservices architecture within GCP.
Familiar with modern DevOps and CI/CD tools for data engineering, including Terraform for GCP,
GitHub Actions, and Docker, streamlining deployment pipelines for Databricks and GCP-based
workflows.
Expertise in dashboarding and data storytelling using Power BI and D3.js, and in integrating visualizations
into web applications powered by Python or Java backends.
Strong advocate for data security and governance in GCP, implementing IAM, VPC-SC, data encryption,
and access control policies to ensure compliance with regulatory standards.
Collaborative team player and communicator, frequently partnering with data scientists, ML engineers,
and GCP solution architects to build scalable, cloud-native systems in Databricks and GCP.
Passionate about continuous learning, always exploring new capabilities within GCP, Python, Java, and
Databricks to improve efficiency, performance, and team productivity in Agile environments.
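The SQL capabilities listed above (window functions and CTEs for pipeline logic) can be sketched with Python's stdlib sqlite3; the orders table, column names, and data below are purely illustrative, not from any project named in this resume:

```python
import sqlite3

# Hypothetical orders table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('acme', '2024-01-05', 100.0),
        ('acme', '2024-02-10', 250.0),
        ('globex', '2024-01-20', 75.0);
""")

# A CTE narrows the data set, then window functions rank each customer's
# orders by date and total their spend without collapsing rows (unlike GROUP BY).
rows = conn.execute("""
    WITH recent AS (
        SELECT customer, order_date, amount
        FROM orders
        WHERE order_date >= '2024-01-01'
    )
    SELECT customer,
           order_date,
           ROW_NUMBER() OVER (
               PARTITION BY customer ORDER BY order_date
           ) AS order_seq,
           SUM(amount) OVER (PARTITION BY customer) AS customer_total
    FROM recent
    ORDER BY customer, order_seq
""").fetchall()

for row in rows:
    print(row)
```

The same pattern carries over to BigQuery, Snowflake, and Spark SQL, which share this window-function syntax.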
TECHNICAL SKILLS:
Cloud Platforms: AWS: S3, EC2, RDS, Redshift, Lambda, Glue, Athena, EMR, Lake Formation, Kinesis, Step Functions, CodePipeline, IAM; Azure: Data Factory, ADLS, Synapse Analytics, Stream Analytics, Functions, Monitor, DevOps, RBAC; GCP: BigQuery, Cloud Storage, Dataflow, Cloud Functions, Dataproc
Big Data & Distributed Computing: Apache Spark (Core, SQL, PySpark, Streaming), Apache Kafka, Apache Beam, Hadoop (MapReduce, Hive, HDFS, HBase)
ETL & Data Integration: DBT, Matillion, Talend, Informatica
Programming & Scripting: Python (Pandas, NumPy, Matplotlib), Scala, Java, PL/SQL, SQL, PowerShell
Orchestration & Workflow: Apache Airflow, GitHub Actions
Data Warehousing & Storage: Azure Synapse Analytics, Snowflake, Redshift, BigQuery
Data Modeling & Governance: Kimball Methodology, Star & Snowflake Schemas, Erwin Data Modeler, Data Lineage, Encryption, Audit Logging
DevOps & CI/CD: Jenkins, Terraform, AWS CloudFormation, Git
Visualization & BI: Power BI, D3.js, Excel (Pivot Tables)
APIs & Integration: RESTful APIs, Internal APIs, Event-Driven Architecture
Security & Access Control: IAM, RBAC
Other Tools & Practices: Bugzilla, Azure Monitor, Agile (Kanban, Scrum), JIRA
CERTIFICATIONS:
Freedom with AI - AI Tools, Prompt Engineering, Content Creation
AWS: Certified Solutions Architecture Job Simulation
Data Visualisation: Empowering Business with Effective Insights - TATA Consultancy Services

EXPERIENCE:
Client: InfoSmart Technologies (Remote)
Data Engineer | Dec 2022 to Present
Essential Contributions:
Built and maintained scalable data pipelines for the Equinix Digital Ecosystem Platform, enabling real-
time analytics and seamless data transfer across worldwide data centers.
Developed data quality frameworks and automated processes utilizing Apache Airflow, improving the reliability
and efficiency of enterprise data management.

Roles & Responsibilities:
Developed, maintained, and automated data processing workflows on the Google Cloud Platform (GCP) to
ensure efficient data handling and transformation.
Built and optimized scalable ETL pipelines on GCP using Spark and Airflow to support large-scale data
processing and analytics.
Partnered with data science teams to prepare data sets and support ML model deployment for personalization
use cases.
Developed Airflow DAGs to orchestrate and monitor data workflows, improving pipeline reliability and
transparency.
Designed and implemented scalable data pipelines for data ingestion and transformation using Spark.
Performed data cleaning and transformation tasks to ensure data quality and consistency, using SQL and
Python for data manipulation and preparation.
Engaged with stakeholders to understand data needs and deliver tailored insights, effectively communicating
findings and recommendations to drive strategic decisions.
Validated data quality and completeness using SQL and BigQuery, ensuring consistent, trustworthy outputs
for downstream analytics.
Created detailed reports and dashboards utilizing Power BI and Looker for comprehensive data analysis and
insights.
Optimized SQL performance through indexing, filtering, and efficient joins to reduce query time and enhance
data accessibility across business units on platforms like Redshift, BigQuery (GCP), Snowflake, and
Databricks.
Engineered real-time data pipelines using AWS Kinesis, Apache Beam, and GCP Pub/Sub to capture and
process high-velocity event streams for immediate downstream consumption.
Designed and implemented OLTP systems focusing on transaction throughput and consistency,
supporting real-time operations across critical GCP- and AWS-based platforms.
Applied AWS Lambda and Google Cloud Functions to build lightweight, serverless functions for data
transformations using Python, improving flexibility and reducing infrastructure overhead.

Used Presto, BigQuery, and Snowflake to unify querying across distributed sources, improving access to
cross-platform data with minimal performance lag.
Developed and maintained ETL frameworks using DBT, AWS Glue, and Databricks Notebooks built with
Python to ensure timely data delivery, clean transformations, and scalable logic across AWS, GCP, and
Snowflake environments.
Created and managed Kimball-style data marts on Redshift, BigQuery, Snowflake, and Databricks to
support reporting and dashboarding use cases for various business domains.
Leveraged Spark-SQL and Databricks with advanced SQL constructs such as window functions and CTEs to
streamline logic and optimize data transformations.
Utilized Pandas and NumPy in Python to perform complex data cleaning, feature engineering, and analytical
preprocessing for both batch and streaming pipelines on GCP, Databricks, and Snowflake.
Integrated RESTful APIs and internal systems using Python, Java, and Google Cloud Endpoints to automate
ingestion workflows and expand data connectivity across platforms.
Developed interactive data visualizations using D3.js, JavaScript, and Python libraries to present real-time
insights clearly to business stakeholders, deployed via GCP App Engine.
Streamlined governance practices using data lineage tracking tools like Databricks Unity Catalog, Google
Cloud Data Catalog, and Snowflake to ensure compliance, traceability, and trust in analytical outputs.
Employed MapReduce processing on Hadoop clusters and Dataproc (GCP) to efficiently handle large-scale
datasets and reduce batch runtime.
Applied bucketing and partitioning strategies in Apache Hive, BigQuery, and Snowflake to reduce scan
overhead and accelerate query execution.
Constructed data pipelines using Apache Spark, Apache Beam, and Databricks, enabling high-volume
batch and streaming transformations in GCP, Snowflake, and multi-cloud environments.
Used Amazon Redshift, BigQuery, and Snowflake to build performant data warehouse environments,
optimize schema design, and improve analytical response times.
Enforced data governance policies using AWS Lake Formation, GCP IAM, and Databricks Access Control
to maintain secure, well-cataloged data lakes.
Utilized Databricks and Snowflake extensively for collaborative engineering, version control, and efficient
execution of Python- and SQL-based analytics workflows across GCP and AWS.
Collaborated with DevOps teams to implement end-to-end CI/CD automation via AWS CodePipeline, Cloud
Build (GCP), and Databricks Repos, enabling smooth code releases.
Automated infrastructure setup and configuration through AWS CloudFormation, Terraform for GCP, and
Databricks CLI, ensuring consistent, repeatable deployments across cloud environments.
Built and deployed containerized data applications using Docker, orchestrated via Kubernetes (GKE in GCP)
for high availability, auto-scaling, and simplified microservices management.
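The bucketing and partitioning strategies mentioned above can be illustrated with a minimal, stdlib-only sketch of how one record might be routed to a Hive-style date partition and hash bucket; the table name, key format, and bucket count are assumptions for illustration only:

```python
import zlib
from datetime import datetime

NUM_BUCKETS = 8  # illustrative; real tables tune this to data volume

def partition_path(table: str, event_ts: str, key: str) -> str:
    """Return a Hive-style partition path for one record.

    Date partitions let engines such as Hive and BigQuery prune whole
    directories at query time; hashing the key into a fixed number of
    buckets co-locates related rows, which speeds up joins and sampling.
    """
    dt = datetime.fromisoformat(event_ts)
    # zlib.crc32 is stable across runs, unlike Python's salted hash().
    bucket = zlib.crc32(key.encode()) % NUM_BUCKETS
    return (f"{table}/year={dt.year:04d}/month={dt.month:02d}/"
            f"day={dt.day:02d}/bucket={bucket:02d}")

path = partition_path("events", "2024-03-15T10:30:00", "customer-42")
print(path)
```

A query filtered on year/month/day then scans only the matching directories rather than the full table.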
Environment: SQL, GCP, AWS Kinesis, AWS Lambda, Presto, DBT, AWS Glue, Databricks, Spark-SQL, Python
(Pandas, NumPy), Java, RESTful APIs, D3.js, JavaScript, Hadoop, MapReduce, Apache Hive, Apache Spark, Apache
Beam, Amazon Redshift, BigQuery, AWS Lake Formation, AWS CodePipeline, AWS CloudFormation, Terraform,
Docker, Kubernetes.
Client: Capgemini (Johnson and Johnson)
Sr Analyst/Software Engineer | Dec 2018 to Dec 2021
Main Contributions:
Created and enhanced data pipelines on AWS with PySpark and SQL to analyze large-scale customer
behavior and transaction data for immediate insights.
Developed automated workflows for data validation and transformation with Apache Airflow and DBT,
enhancing data accuracy and decreasing manual intervention by 40%.
Roles & Responsibilities:
Utilized Hadoop technologies such as Hive and Spark to construct efficient data pipelines, significantly
improving data flow and processing within the company.
Collaborated with team members to design and implement data models for efficient data processing,
contributing to improved business intelligence and decision-making.
Assisted in migrating on-premises data to Google Cloud, ensuring a seamless transition with minimal
downtime, and scheduled the migration jobs using Airflow.
Worked with various teams to understand their data needs, using this information to develop effective data
pipelines that supported business goals.
Collected and organized data from various sources, including the Learning Management System.
Resolved data processing problems as they arose, ensuring the delivery of high-quality, reliable data.
Committed to continuous learning of new data engineering methodologies and tools, contributing to the team's
technical knowledge and proficiency.
Developed interactive dashboards and reports using Power BI and Tableau to visualize key metrics.
Communicated key findings from data to multiple stakeholders to facilitate data-driven decisions.
Built scalable ETL workflows in Matillion to extract, transform, and load data across cloud
environments, including AWS, GCP, and Databricks, accelerating pipeline development.
Developed distributed batch processing jobs on Amazon EMR and Dataproc (GCP) using Apache
Spark, Hive, and Python, efficiently handling large-scale transformations and joins.

Designed and optimized advanced PL/SQL procedures to streamline critical business operations, ensuring
efficient data transformation and reliable reporting delivery across cloud and hybrid platforms.
Applied Apache Spark and Databricks for high-performance data aggregation, cleansing, and shaping
across structured and semi-structured data sources on GCP, AWS, and on-prem environments.
Configured Informatica workflows and reusable mappings to automate ingestion from diverse enterprise
systems, enhancing reusability, modularity, and consistency across cloud platforms like GCP and AWS.
Developed and integrated RESTful APIs using Python and Java to connect enterprise data systems with
external platforms, enabling real-time data exchange and automation.
Created dashboards and performance visualizations using Python (Matplotlib), Databricks, and GCP
Looker Studio to monitor data pipeline health and optimize runtime efficiency.
Automated job execution on Amazon EC2 and GCP Compute Engine, scheduling compute-intensive
processes during off-peak hours for cost efficiency.
Enforced data security using AWS IAM policies and GCP IAM, implementing least-privilege access and
role-based controls across multi-cloud environments.
Implemented centralized data governance using AWS Lake Formation, Google Cloud Data Catalog, and
Databricks Unity Catalog to define fine-grained access policies and ensure regulatory compliance.
Built real-time streaming data pipelines using Apache Kafka, GCP Pub/Sub, and Databricks Structured
Streaming, facilitating rapid, reliable data exchange across distributed microservices.
Designed event-triggered data processing solutions using AWS Lambda, Cloud Functions (GCP), and
Python, connecting real-time ingestion with downstream transformation layers.
Coordinated serverless data workflows using AWS Step Functions and GCP Workflows, improving fault
tolerance and workflow orchestration across dependent cloud services.
Led migration of legacy data warehouses to Snowflake, BigQuery, and Databricks Delta Lake, reducing
query times and simplifying data access for business stakeholders.
Provisioned cloud infrastructure using Terraform for AWS, GCP, and Databricks, enabling repeatable,
version-controlled deployments across staging and production.
Deployed CI/CD pipelines using Jenkins, GitHub Actions, and Databricks Repos to automate data
pipeline builds, testing, and releases, improving code reliability and reducing deployment risks.
Designed dimensional models in Snowflake using star and snowflake schemas to support high-
performance analytics and self-service BI tools like Power BI and Looker (GCP).
Queried petabyte-scale datasets using Amazon Athena, BigQuery, and Presto, enabling analysts to gain
near real-time insights without heavy infrastructure overhead.
Structured raw and curated datasets in Amazon S3, Google Cloud Storage, and Delta Lake
(Databricks), creating a robust, scalable data foundation for analytics and long-term archiving.
Managed processing of unstructured big data using Hadoop-based architectures, GCP Dataproc, and
Apache Spark, enabling downstream analysis in BI and reporting platforms.
Established data quality checks and exception handling frameworks using Python, Databricks
Workflows, and Airflow to track anomalies, validate integrity, and ensure reliable reporting.
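The Python-based data quality checks and exception handling described above can be sketched as a simple batch validation gate; the field names and the failure threshold below are illustrative assumptions, not details from the original project:

```python
# Minimal data quality gate, assuming records arrive as dicts.
REQUIRED_FIELDS = {"order_id", "customer", "amount"}

class DataQualityError(Exception):
    """Raised when a batch fails validation and must not be published."""

def validate_batch(records):
    """Split a batch into clean rows and per-row anomaly reports."""
    clean, anomalies = [], []
    for i, rec in enumerate(records):
        missing = REQUIRED_FIELDS - rec.keys()
        if missing:
            anomalies.append((i, f"missing fields: {sorted(missing)}"))
        elif not isinstance(rec["amount"], (int, float)) or rec["amount"] < 0:
            anomalies.append((i, "amount must be a non-negative number"))
        else:
            clean.append(rec)
    # Fail the whole batch if most rows are bad (threshold is illustrative).
    if records and len(anomalies) / len(records) > 0.5:
        raise DataQualityError(f"{len(anomalies)}/{len(records)} rows failed")
    return clean, anomalies

clean, anomalies = validate_batch([
    {"order_id": 1, "customer": "acme", "amount": 10.0},
    {"order_id": 2, "customer": "acme"},                  # missing amount
    {"order_id": 3, "customer": "globex", "amount": -5},  # invalid amount
    {"order_id": 4, "customer": "acme", "amount": 3.5},
])
```

In an orchestrated pipeline, the anomaly list would be written to a quarantine table and the exception would halt the downstream Airflow task.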
Environment: Matillion, Amazon EMR, Apache Spark, Hive, PL/SQL, Informatica, RESTful APIs, Python (Matplotlib),
Amazon EC2, AWS IAM, AWS Lake Formation, Apache Kafka, AWS Lambda, AWS Step Functions, Snowflake,
Terraform, Jenkins, Amazon Athena, Amazon S3, Hadoop.
Client: Capgemini (Johnson and Johnson)
Software Analyst/Engineer | Sep 2016 to Nov 2018
Roles & Responsibilities:
Designed and implemented a microservices architecture using Spring Boot.
Assisted in the development of a healthcare management system using Spring MVC.
Implemented RESTful APIs for user authentication and progress tracking, adhering to REST principles and
best practices.
Utilized Hibernate for object-relational mapping, facilitating efficient interaction with the MySQL database.
Communicated with the end client to support the application, analyzing and resolving issues.
Collaborated with cross-functional teams in an Agile development environment, ensuring the timely delivery of
project milestones.
Extensively used Java OOP concepts to develop automation frameworks using Eclipse, Selenium
WebDriver, Cucumber, and TestNG.
Implemented POM, Data-driven framework, and executed automation scripts and manual test cases in
different environments.
Developed automated solutions to expedite testing to address unit testing, regression testing, negative testing
and bug retests.
Managed full-lifecycle data pipelines in Azure Data Factory, coordinating data ingestion, transformation, and
validation across hybrid cloud sources.
Structured multi-zone data architecture in Azure Data Lake Storage (ADLS), organizing raw, refined, and
curated layers to meet enterprise reporting needs.
Automated infrastructure provisioning with PowerShell scripts, accelerating environment setup, and
minimizing human error.
Built and optimized high-performance data marts using Azure Synapse Analytics, enabling large-scale
analytics through serverless SQL pools.

Designed and deployed real-time analytics pipelines using Azure Stream Analytics to monitor IoT telemetry
and detect anomalies, enhancing operational visibility.
Developed and optimized data transformation scripts with PySpark, enabling machine learning readiness and
reducing ETL latency.
Built scalable ETL workflows in Azure Databricks, leveraging Delta Lake to streamline batch and streaming
processes for improved data availability.
Automated event-driven processing using Azure Functions, enhancing scalability and reducing response time
in real-time data applications.
Deployed Azure services focusing on high availability and compliance, using Azure Monitor and Application
Insights for end-to-end observability.
Implemented RBAC (Role-Based Access Control) to secure sensitive data access and enforce governance
policies across environments.
Designed dimensional models using star schema principles, improving BI performance and simplifying data
navigation for analysts.
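The star-schema dimensional modeling described above can be illustrated with a toy surrogate-key lookup; table names, columns, and data are hypothetical, and a full Kimball design would also use a date dimension:

```python
# A fact row stores compact surrogate keys into dimension tables rather
# than repeating descriptive attributes on every transaction.
dim_product = {}   # natural key (SKU) -> surrogate key
dim_rows = []      # the product dimension table
fact_sales = []    # the fact table

def product_key(sku: str, name: str) -> int:
    """Return the surrogate key for a product, inserting it on first sight."""
    if sku not in dim_product:
        dim_product[sku] = len(dim_product) + 1
        dim_rows.append({"product_key": dim_product[sku], "sku": sku, "name": name})
    return dim_product[sku]

def record_sale(sku: str, name: str, date: str, amount: float) -> None:
    fact_sales.append({
        "product_key": product_key(sku, name),  # join key to dim_product
        "date": date,       # would be a date-dimension key in full Kimball
        "amount": amount,   # additive measure, safe to SUM across any slice
    })

record_sale("SKU-1", "Widget", "2024-01-05", 9.99)
record_sale("SKU-1", "Widget", "2024-01-06", 9.99)
record_sale("SKU-2", "Gadget", "2024-01-06", 24.50)
```

BI tools then join the narrow fact table to the small dimensions, which keeps queries fast and lets analysts filter by readable attributes.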
Environment: ADF, ADLS, PowerShell, Azure Synapse Analytics, Azure Stream Analytics, PySpark, Azure
Databricks, Delta Lake, Azure Functions, Azure Monitor, Application Insights, RBAC, Apache Spark.