Niharika Pande - Data Engineer |
[email protected] |
Location: Jersey City, New Jersey, USA |
Relocation: Yes |
Visa: H1 |
Resume file: Niharika Pande Data Engineer C2C ETL Python AWS_1755742791264.pdf |
NIHARIKA
(732) 338-8659 | [email protected] |

SUMMARY
- 8 years of professional experience in Data Engineering, Cloud Solutions, and Big Data Analytics across finance, e-commerce, and technology domains
- Expertise in developing and automating ETL pipelines, real-time data processing, and orchestration of data pipelines using Apache Spark, Kafka, and Airflow
- Hands-on experience in cloud computing environments, including AWS (S3, Lambda, Redshift, EBS) and Azure
- Proficient in Python, SQL, Pandas, and NumPy, using data-centric tools for large-scale data manipulation and analysis
- Designed and deployed data lakes, implemented data warehousing architectures, and optimized Redshift and Snowflake queries for analytics use cases
- Developed interactive dashboards and self-service BI tools using Tableau, Power BI, and Redash, enabling timely, data-driven decision-making
- Experienced in managing CI/CD pipelines using Git, integrating with containerized deployments and automated testing
- Skilled in data modeling, feature engineering, data governance, metadata management, and cloud-native storage solutions
- Well-versed in Agile software development methodologies, with experience managing production support, incident management, and system troubleshooting

SKILLS
Programming: Python, SQL, Java, R, C/C++
Cloud & Big Data: AWS (EBS, S3, Redshift, Lambda), Hadoop, Spark, Hive, Kafka
Databases: MySQL, MongoDB, Oracle APEX, SSIS
Tools: PyCharm, Jupyter, Git, Tableau, Power BI, SSRS
Data Science: Pandas, NumPy, Scikit-learn, TensorFlow
Methodologies: Agile, SDLC, ETL, Data Warehousing
Project Management Tools: MS Project, MS Excel, MS SharePoint, Jira, Rally
Analysis Skills: Cost/Benefit, Impact, GAP, Risk, SWOT Analysis
Documentation: BRD, FRD, SRS, BPMN, UML, Use Cases, Test Plans
Soft Skills: Time Management, Leadership, Problem-Solving, Decision-Making

EXPERIENCE
Bluescape, CA | Data Engineer | March 2023 - Present
- Developed end-to-end ETL pipelines using Apache Spark and Python, reducing batch
processing latency.
- Designed and implemented an AWS-based data lake architecture with S3, Lambda, and Redshift Spectrum, supporting large-scale analytics
- Built robust real-time data ingestion pipelines using Kafka, Spark Streaming, and AWS Glue for near-real-time insights
- Orchestrated ETL jobs using Apache Airflow, including DAG monitoring, logging, SLA alerts, and retries
- Performed feature engineering and cleansing on petabyte-scale datasets using Pandas, NumPy, and custom Spark UDFs
- Integrated machine learning models for predictive analytics using Spark MLlib and MapReduce patterns
- Designed and published Tableau dashboards and KPI visualizations for operations and finance teams, reducing manual reporting by 60%
- Conducted schema optimization and query tuning in Amazon Redshift, improving report response time by 30%
- Designed data quality frameworks to validate batch and streaming data loads across multiple systems
- Implemented data versioning and lineage tracking mechanisms for auditability and compliance
- Supported multi-environment deployments and coordinated with DevOps to manage CI/CD pipelines using Git and Jenkins
- Led migration of legacy ETL workflows to distributed Spark jobs using AWS Glue, reducing job runtime by 50%
- Collaborated with data scientists to integrate pre-processed feature sets into model training pipelines using S3 and Athena
- Created monitoring dashboards in CloudWatch and Grafana for Airflow DAGs and batch job health
- Standardized data schema definitions and enforced data governance through schema registry and versioning

Amazon Web Services, Boston | SDE | July 2022 - March 2023
- Built scalable SQL-based ETL workflows on Amazon Redshift, aggregating application logs into business-ready tables
- Designed Python-based automation scripts using Boto3 for EBS snapshot creation, cross-region replication, and lifecycle archival
- Built data integrity validation tools to ensure end-to-end consistency of snapshots and backups using Python and SQL
- Designed real-time
dashboards in Redash, pulling data from CloudWatch logs and Redshift views for production monitoring
- Contributed to the implementation of Git-based CI/CD pipelines integrated with code review and rollback automation
- Enhanced data governance through metadata tracking and implemented data quality checks using Python assertions
- Performed root-cause analysis for production incidents and authored runbooks to improve on-call response time
- Designed and optimized Lambda functions for data processing and automated alerting
- Supported multi-region deployments for compliance and developed tools for WORM-compliant backup policies
- Enabled performance tuning of SQL queries in Redshift using query plans, EXPLAIN output, and compression encodings
- Designed and implemented event-driven Lambda architectures triggered by S3 events and SNS notifications
- Collaborated on building a data pipeline for internal usage analytics using Redshift Spectrum and S3-partitioned data
- Created templated CloudFormation stacks for deploying snapshot management infrastructure across environments
- Authored extensive documentation and internal wikis for CI/CD automation, pipeline architecture, and compliance auditing

Accenture Solutions, India | SE | Nov 2018 - May 2020
- Automated end-to-end deployment workflows across 100+ projects using Robot Framework, Selenium, and shell scripting
- Built PowerShell-based ETL scripts and Azure Scheduler jobs to ingest and transform large datasets from various sources
- Simulated event-driven test scenarios using MySQL and Linux-based lab servers, enabling robust QA for mission-critical systems
- Designed and published Power BI dashboards with DAX, providing clients real-time visibility into financial and operational data
- Collaborated with cross-functional teams in an Agile setup, managing incidents, testing workflows, and client demos
- Developed test automation frameworks in Python to validate data ingestion and processing for BAU pipelines
- Scheduled ETL and reporting jobs using cron jobs
and automated logs/metrics collection
- Worked closely with stakeholders to conduct impact analysis, create test plans, and support UAT
- Facilitated production readiness and supported cutovers during high-priority release cycles
- Developed a reusable Python module for log parsing and alerting, integrated with Splunk and internal monitoring systems
- Used SSIS and Azure Data Factory to modernize legacy ETL jobs for cloud compatibility
- Conducted peer code reviews and contributed to the team's best practices for Git branching and version control
- Automated server health checks and ETL job validation scripts for pre-deployment testing phases

Xceed Technologies, India | DE | September 2017 - Nov 2018
- Built robust real-time and batch data pipelines using Apache Spark and Kafka to ingest multi-source data into the Hadoop ecosystem
- Developed HiveQL queries and Spark transformations to cleanse, join, and prepare data for BI and ML workloads
- Designed and deployed interactive Power BI reports powered by DAX for operations, sales, and customer success teams
- Managed data transfers from Oracle and MySQL to HDFS, improving refresh cycles by 30%
- Tuned Hadoop job configurations to optimize YARN container usage and reduce ingestion time by 25%
- Created custom Spark UDFs to implement complex transformation logic not natively available in Spark
- Maintained Git repositories for version control and collaborated with DevOps to integrate builds into Jenkins pipelines
- Implemented alerting scripts and job retry strategies for pipeline resilience and data quality enforcement
- Created documentation for onboarding, including architecture diagrams, data flow maps, and a glossary of ETL terms
- Engaged in continuous integration practices and enhanced ETL monitoring via dashboards and alerting tools

EDUCATION
Master of Science in Information Technology & Analytics
Rutgers University, Newark, New Jersey, USA
Graduate Assistant:
- Automated data workflows using Python and SQL, reducing manual data processing time by a few
hours/week.
- Developed Tableau dashboards for financial data analysis, enabling non-technical stakeholders to track KPIs
- Collaborated with faculty to design academic content using Adobe Illustrator and MS Office Suite

Bachelor of Technology in Computer Science and Technology
SNDT University, Mumbai, Maharashtra, India

CERTIFICATION
SQL Gold (HackerRank), AWS CCP, Business Analysis Fundamentals (Udemy)
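As an illustration of the Pandas-based cleansing and feature-engineering work described in the Bluescape role, the sketch below shows the general pattern: parse timestamps, drop unparseable records, impute missing values, and derive a feature. The column names (`order_ts`, `amount`) and sample values are illustrative assumptions, not taken from any actual dataset.

```python
import numpy as np
import pandas as pd

# Illustrative sample data; columns and values are assumptions for the sketch.
df = pd.DataFrame({
    "order_ts": ["2023-03-01 10:00", "2023-03-02 11:30", None],
    "amount": [120.0, np.nan, 75.5],
})

df["order_ts"] = pd.to_datetime(df["order_ts"])           # parse timestamps (None -> NaT)
df = df.dropna(subset=["order_ts"])                       # drop rows without a valid timestamp
df["amount"] = df["amount"].fillna(df["amount"].median()) # impute missing amounts with the median
df["order_dow"] = df["order_ts"].dt.dayofweek             # derived feature: day of week (Mon=0)

print(df[["amount", "order_dow"]].to_dict("list"))
```

At scale the same transformations would typically run as Spark jobs or UDFs rather than single-node Pandas, but the cleansing logic is the same.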
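The AWS role mentions data quality checks implemented with Python assertions; a minimal stdlib-only sketch of that pattern, with field names and rules assumed purely for demonstration, might look like:

```python
# Hypothetical row-level data quality checks; field names and rules are
# assumptions for illustration, not from any real pipeline.
def validate_row(row: dict) -> list[str]:
    """Return the names of the checks this record fails."""
    failures = []
    if not row.get("id"):
        failures.append("missing_id")
    if row.get("amount") is not None and row["amount"] < 0:
        failures.append("negative_amount")
    return failures

rows = [
    {"id": "a1", "amount": 10.0},
    {"id": None, "amount": -5.0},
]
bad = {r["id"]: validate_row(r) for r in rows}
print(bad)  # → {'a1': [], None: ['missing_id', 'negative_amount']}
```

In practice such checks would gate batch loads (failing or quarantining bad records) before data lands in Redshift.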