Sri Sindhu R
Senior Data Engineer
Email: [email protected] | Phone: 515-605-7328
Location: Iowa City, Iowa, USA | Relocation: Open | Visa: H4 EAD


PROFESSIONAL SUMMARY

9+ years of expertise in data engineering and data science, with a focus on developing scalable end-to-end ETL/ELT pipelines covering data collection, ingestion, transformation, modeling, integration, and analytics for structured and unstructured data sources.
Extensive hands-on experience with the Hadoop ecosystem (HDFS, MapReduce, Spark, Scala, Hive, Pig, Sqoop, Flume, Oozie, Impala, HBase, YARN) and real-time data streaming with Kafka, Storm, and Spark Streaming.
Extensive expertise in creating secure and scalable cloud-native data systems using AWS (EC2, S3, EMR, RDS, Redshift, Glue, Lambda, IAM, CloudWatch, SQS, SNS), Azure (ADF, Data Lake, Databricks), and GCP (Compute Engine, Cloud Storage, Cloud SQL) technologies.
Experience developing batch and real-time data pipelines in PySpark, Spark SQL, Scala, and Python, as well as orchestrating processes in Airflow, NiFi, AWS Step Functions, and Azure Data Factory.
Deep understanding of data warehousing and dimensional modeling (Star Schema, Snowflake Schema), as well as the creation of enterprise data lakes and optimized data marts for analytics and business intelligence reporting.
Practical knowledge of Snowflake (SnowSQL, Snowpipe) and Amazon Redshift, with performance tuning via complex SQL queries, stored procedures, indexing, and query optimization techniques.
Extensive expertise with NoSQL and RDBMS databases such as MongoDB, Cassandra, DynamoDB, MySQL, PostgreSQL, Oracle, and SQL Server, ensuring data integrity, migration, and validation.
Used Scikit-learn, TensorFlow, Keras, PyTorch, and SageMaker to build regression, clustering, PCA, SVM, decision-tree, and deep learning models for predictive analytics and business insights.
Designed, built, and maintained scalable, production-grade data pipelines on Google Cloud Platform (GCP) to support fraud analytics and reporting use cases.
Experience with data preparation, feature engineering, exploratory data analysis (EDA), statistical modeling, and large-scale data transformations with NumPy, Pandas, and PySpark (see the sketch following this summary).
Created interactive dashboards and reporting solutions with Tableau, AWS QuickSight, and Data Studio, allowing business stakeholders to gain real-time insights.
Analyzed complex relational datasets in PostgreSQL to identify data patterns, anomalies, and optimization opportunities supporting enterprise analytics initiatives.
Implemented CI/CD and DevOps practices using Git, Jenkins, and cloud automation tools to deliver scalable data products with monitoring, logging, and performance optimization.
Strong grasp of Agile processes and excellent communication skills, with experience coaching team members and delivering enterprise-grade, secure, high-performance data solutions.
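
As a hedged illustration of the feature-engineering and EDA work summarized above, the following is a minimal PySpark sketch; the table and column names (transactions, customer_id, amount, event_ts) are hypothetical, not from any specific engagement.

```python
# Minimal PySpark feature-engineering sketch; table and column names
# (transactions, customer_id, amount, event_ts) are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-engineering").getOrCreate()

# Load raw transactions (hypothetical source table).
df = spark.table("transactions")

# Basic EDA-style profiling: null rate per column.
null_rates = df.select(
    *[F.avg(F.col(c).isNull().cast("int")).alias(c) for c in df.columns]
)
null_rates.show()

# Simple per-customer aggregate features for downstream modeling.
features = (
    df.groupBy("customer_id")
      .agg(
          F.count("*").alias("txn_count"),
          F.sum("amount").alias("total_amount"),
          F.avg("amount").alias("avg_amount"),
          F.max("event_ts").alias("last_txn_ts"),
      )
)
features.write.mode("overwrite").saveAsTable("customer_features")
```
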

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, YARN, Pig, Hive, Kafka, Apache NiFi, Flume, Sqoop, Spark Core, Spark SQL, Spark Streaming, HBase, Oozie, ZooKeeper
Languages: Python, SQL, R, Scala
Databases: Oracle, MySQL, MS SQL Server, PostgreSQL, Teradata, Sydata, SAS Studio
NoSQL Databases: MongoDB, Cassandra
Machine Learning Libraries: TensorFlow, PyTorch, Scikit-learn, Keras
ETL Tools: Informatica, DataStage, AWS Glue
Reporting Tools: Tableau, Power BI, Cognos
Version Control Tools: Git, TortoiseSVN
Visualization Tools: Tableau, Python (Matplotlib, Seaborn)
IDE/Testing Tools: Eclipse, IntelliJ, PyCharm, Anaconda, RStudio
DevOps: Docker, Kubernetes, Terraform, CI/CD pipelines

Brown Brothers Harriman Financial | New York, NY | Sept 2023 – Present
Senior Data Engineer
Responsibilities:

Designed and developed scalable batch and real-time data pipelines for large-scale data ingestion, transformation, and analytics using Hadoop (HDFS, MapReduce, Hive, Pig, HBase), Spark (PySpark, Spark SQL, Scala), Databricks, and Kafka/Kinesis.
Developed distributed ETL/ELT workflows with Oozie, Apache NiFi, StreamSets, Airflow, and AWS Glue to integrate data from RDBMS (Oracle, SQL Server, Teradata, PostgreSQL, MySQL) into data lakes and warehouses (Redshift, Snowflake).
Used CI/CD and Infrastructure-as-Code (Terraform, CloudFormation, GitLab, CodePipeline, CodeBuild, CodeDeploy) with Docker, Kubernetes, and OpenShift for containerized deployments and automated data platform management.
Built and optimized end-to-end data pipelines for ingestion, transformation, and storage of structured and semi-structured data, improving data availability and performance for downstream analytics and reporting.
Developed and optimized data processing solutions using Databricks and Snowflake for high-performance distributed data processing and analytics workloads.
Designed and implemented data pipelines integrating with Microsoft Azure services to support scalable data ingestion, processing, and analytics workloads; developed and orchestrated ETL workflows using Azure Data Factory to automate data movement and transformation across enterprise data platforms.
Led design and architecture reviews for data pipelines, platform components, and infrastructure-as-code, identifying technical risks early and implementing mitigation strategies; implemented CI/CD pipelines using Azure DevOps and Git, enabling automated deployments, version control, and improved release quality across data engineering workflows.
Maintained comprehensive documentation of data models, transformation rules, lineage, and metric definitions to promote transparency, governance compliance, and cross-team knowledge sharing.
Integrated Cosmos DB with Azure analytics services to support real-time data streaming and reporting.
Performed schema design, data modeling, and query performance tuning to optimize large-scale data processing workloads; troubleshot production issues by analyzing logs, monitoring pipeline performance, and implementing root-cause fixes to ensure minimal downtime.
Optimized data pipeline performance by implementing partitioning strategies, query tuning, and efficient data storage formats (see the sketch after this section); maintained scalable data warehouse solutions supporting enterprise analytics and reporting environments.
Implemented secure data access and identity management using Microsoft Entra ID (formerly Azure Active Directory) to ensure compliance with enterprise security standards; integrated data pipelines with reporting tools such as Microsoft Power BI to enable real-time dashboards and business intelligence reporting.
Participated in data architecture discussions to design scalable data models, ingestion strategies, and storage architectures, working in a cloud-based data platform environment that supports modern data engineering practices and scalable infrastructure.
Created AI/ML solutions with Python, Pandas, NumPy, Scikit-learn, TensorFlow, Keras, and NLTK, including regression, clustering (K-Means, Gaussian Mixture), Random Forest, KNN, time-series forecasting, and real-time prediction models (buy probability, customer segmentation).
Created and implemented analytics dashboards in Tableau and AWS QuickSight, allowing business stakeholders to track KPIs and campaign performance using governed, secure data.
Monitored data pipeline performance and implemented optimization strategies for scalability, reliability, and cost efficiency within the Azure ecosystem.
Proficient in working with MongoDB, Cassandra, and DynamoDB, including creating NoSQL data models for high-volume applications, data migration, API integration, and ensuring security and governance compliance.
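
The partitioning and storage-format optimization noted above can be illustrated with a short, hedged PySpark sketch; the S3 paths and partition columns (trade_date, region) are hypothetical placeholders.

```python
# Illustrative PySpark sketch of a partitioned, columnar-storage write;
# paths and column names (event_ts, trade_date, region) are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pipeline-optimization").getOrCreate()

raw = spark.read.json("s3://example-bucket/raw/trades/")  # hypothetical source

# Derive a partition column and write Parquet partitioned so downstream
# queries prune by date and region instead of scanning the full dataset.
curated = raw.withColumn("trade_date", F.to_date("event_ts"))
(
    curated.repartition("trade_date", "region")
           .write.mode("overwrite")
           .partitionBy("trade_date", "region")
           .parquet("s3://example-bucket/curated/trades/")
)
```
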
Affinity Bank | Atlanta, GA | June 2021 – Aug 2023
Senior Data Engineer
Responsibilities:

Developed scalable batch and real-time data pipelines with Spark (PySpark/Scala/Spark SQL), Kafka, Flink, Kinesis, and Apache Iceberg, allowing for fault-tolerant streaming and large-scale data processing across cloud platforms.
Created and optimized enterprise data warehouses using AWS Redshift and Snowflake, including dimensional modeling (Star/Snowflake schemas, SCDs), performance tuning, query optimization, and secure access via IAM and encryption standards.
Created cloud-native ETL/ELT solutions using AWS Glue, Lambda, EMR, S3, Azure Data Factory, Azure Databricks (Delta Lake), and GCP Dataflow, enabling smooth data movement from on-premises to cloud environments.
Monitored and maintained production pipelines, troubleshooting failures and ensuring high availability and reliability of data workflows; collaborated in agile teams to deliver data engineering solutions in a fast-paced, cross-functional development environment.
Collaborated with cross-functional teams to design secure and scalable data architectures aligned with enterprise governance frameworks and regulatory standards.
Built and optimized data models in Snowflake and BigQuery, leveraging clustering, partitioning, and materialized views for high-performance analytics and cost-efficient storage.
Built and maintained web applications using Django, designing modular, developer-centric solutions with robust backend functionality. Designed, developed, and integrated REST and GraphQL APIs to enable seamless communication between microservices, third-party applications, and client systems.
Set up and managed Databricks clusters, refactored ETL notebooks, and developed PySpark transformations, including hashing and encryption approaches for sensitive data compliance and governance.
Collaborated closely with Agile teams and stakeholders, facilitating sprint ceremonies, backlog prioritization, UAT support, technical documentation, and cross-functional coordination for successful enterprise data delivery.
Implemented highly optimized data models in BigQuery, leveraging table partitioning, clustering, and materialized views to reduce query execution times and storage costs, improving analytics performance for large-scale fraud datasets.
Integrated enterprise data pipelines with BI and reporting tools such as Microsoft Power BI, enabling business users to access curated datasets and actionable insights.
Demonstrated strong understanding of Hadoop ecosystem components, including YARN resource management and HDFS storage architecture.
Streamlined microservice and containerized data application deployments by automating CI/CD pipelines with Jenkins, GitHub, Docker, Kubernetes, and integrated DevOps tools (Jira, Slack).
Designed and built REST APIs in Python (Flask), PostgreSQL, and MongoDB to provide high-performance data access layers and NoSQL solutions for scalable applications.
Applied extensive data modeling, mapping, and transformation techniques, including SQL validation checks for duplicates, null handling, and aggregations (see the sketch after this section), to ensure high data quality across ETL cycles.
Created AI/ML solutions utilizing Python, Pandas, NumPy, and Scikit-learn, including regression models, forecasting, customer segmentation, and real-time purchase probability prediction models for business intelligence.
Created interactive BI dashboards with Tableau and Qlik, combining numerous data sources (SQL, Salesforce, APIs, and Excel) to allow executive reporting and decision-making.
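
A minimal sketch of the duplicate, null, and aggregation validation checks mentioned above, assuming a PySpark environment; the table names (curated.payments, staging.payments) and columns are illustrative only.

```python
# Hedged sketch of SQL-style data-quality checks (duplicates, nulls,
# reconciliation totals); table and column names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.table("curated.payments")  # hypothetical curated table

# 1. Duplicate check on the business key.
dupes = df.groupBy("payment_id").count().filter(F.col("count") > 1)
assert dupes.count() == 0, "duplicate payment_id values found"

# 2. Null check on required columns.
for col in ["payment_id", "amount", "posted_date"]:
    nulls = df.filter(F.col(col).isNull()).count()
    assert nulls == 0, f"{nulls} null values in required column {col}"

# 3. Aggregate reconciliation against the staging-layer total.
target_total = df.agg(F.sum("amount")).first()[0]
source_total = spark.table("staging.payments").agg(F.sum("amount")).first()[0]
assert target_total == source_total, "amount totals do not reconcile"
```
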
Kin Insurance Company | Chicago, IL | Nov 2018 – May 2021
Data Engineer
Responsibilities:

Migrated data pipelines from legacy systems to AWS Snowflake using DBT, Glue, and Python.
Built scalable ELT processes on Redshift, enhancing load and query efficiency.
Migrated on-prem data ingestion workflows to ADF and Snowflake, improving performance and maintainability.
Automated data extraction and ingestion from relational databases and flat files using ADF and Python scripts.
Developed and maintained data ingestion frameworks integrating AWS S3 and Snowflake.
Ensured data security, compliance, and governance in cloud deployments and API integrations.
Mentored junior engineers on Python development, Django frameworks, API best practices, and cloud deployment strategies.
Enhanced data ingestion monitoring and alerting with ADF triggers and log analytics for operational visibility.
Implemented real-time stream ingestion using Kafka and Spark Streaming for transactional data feeds.
Collaborated with cross-functional teams to integrate cloud-based applications, data pipelines, and APIs, ensuring consistency, reliability, and adherence to best practices.
Worked on Spark SQL and Scala to replace legacy Hive queries, improving performance.
Developed One Lake ingestion workflows using AWS Lambda and S3 for Capital One data lake.
Built and scheduled Control-M workflows for data orchestration and batch processing.
Integrated SSIS with Azure Data Factory for hybrid ETL workflows in cloud-native environments.
Implemented real-time stream processing using Apache Spark Structured Streaming integrated with Apache Kafka and AWS S3 (see the sketch after this section).
Utilized Jira and Confluence in Agile-based sprint planning and documentation.
Engineered Delta Lake structures on Databricks and enabled incremental loads using ADF.
Conducted performance tuning for Spark and Snowflake jobs, reducing processing times.
Built reusable PySpark modules and unit-tested data transformations.
Wrote scalable ETL jobs using Apache Spark in PySpark and Scala to handle structured and semi-structured data (JSON, Parquet, Avro).
Collaborated with business analysts to convert business logic into efficient ETL workflows.
Handled JSON/XML data parsing and transformation for ingestion into Redshift.
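
As a hedged illustration of the Kafka-to-S3 Structured Streaming flow described above: broker address, topic name, schema, and S3 paths below are hypothetical placeholders, not actual project values.

```python
# Minimal Kafka -> Spark Structured Streaming -> S3 sketch; broker,
# topic, schema, and paths are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("txn-stream").getOrCreate()

schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
])

stream = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "transactions")
         .load()
)

# Kafka delivers raw bytes; parse the JSON payload into typed columns.
parsed = stream.select(
    F.from_json(F.col("value").cast("string"), schema).alias("txn")
).select("txn.*")

query = (
    parsed.writeStream.format("parquet")
          .option("path", "s3://example-bucket/streams/transactions/")
          .option("checkpointLocation", "s3://example-bucket/checkpoints/transactions/")
          .start()
)
query.awaitTermination()
```
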
New York Life Insurance Company | New York, NY | May 2017 – Oct 2018
ETL Developer
Responsibilities:

Designed and developed ETL processes using Informatica and SQL for financial data systems.
Performed data extraction, transformation, and loading from multiple relational sources.
Developed star and snowflake schema models to support BI reporting and analytics.
Implemented job scheduling and monitoring using Control-M and Autosys.
Developed complex SQL queries and stored procedures for data transformation logic.
Ensured data consistency and accuracy through automated validation scripts (see the sketch after this section).
Prepared detailed technical documentation for ETL design, mapping, and testing.
Implemented reusable mappings and workflows to optimize development time.
Collaborated with QA and UAT teams for validation and deployment support.
Monitored ETL workflows for failures and executed root cause analysis for incident resolution.
Enhanced performance through tuning of database queries and indexes.
Automated report generation for daily, weekly, and monthly data quality summaries.
Worked closely with business users to define and refine ETL specifications.
Supported production deployments and resolved post-implementation issues.
Contributed to the migration of legacy ETL systems to Informatica PowerCenter.
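
A minimal sketch of the kind of automated source-vs-target validation script referenced above, assuming a Python/ODBC setup; the DSNs and table names are hypothetical, and pyodbc is one possible driver choice rather than the tool actually used.

```python
# Hedged sketch of an automated source-vs-target row-count validation;
# connection DSNs and table names are hypothetical placeholders.
import pyodbc  # assumes an ODBC driver for the databases is configured

SRC_DSN = "DSN=source_db"     # hypothetical source connection
TGT_DSN = "DSN=warehouse_db"  # hypothetical warehouse connection

# (staging table, warehouse table) pairs to reconcile; illustrative names.
CHECKS = [
    ("policies_stg", "dim_policy"),
    ("claims_stg", "fact_claims"),
]

def row_count(dsn: str, table: str) -> int:
    """Return the row count of a trusted, known table name."""
    with pyodbc.connect(dsn) as conn:
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]

for src, tgt in CHECKS:
    src_n, tgt_n = row_count(SRC_DSN, src), row_count(TGT_DSN, tgt)
    status = "OK" if src_n == tgt_n else "MISMATCH"
    print(f"{src} -> {tgt}: source={src_n} target={tgt_n} [{status}]")
```
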
Educational Details:
Master of Business Administration (MBA), National Institute of Technology, Warangal, 2011