| Nikitha - Data Engineer |
| [email protected] |
| Location: Remote, Remote, USA |
| Relocation: YES |
| Visa: GC |
| Resume file: Nikitha_DE_1772814121940.docx Please check the file(s) for viruses. Files are checked manually and then made available for download. |
|
Nikitha Bheemireddy
Phone: +1 781-813-4320 Email: [email protected] PROFESSIONAL SUMMARY Experienced AI & Data Engineer with 12+ years in cloud, big data, and intelligent automation specializing in building scalable AI-driven data pipelines, autonomous agents, and LLM-integrated systems across Azure, AWS, and GCP. Experienced in Agile-based data engineering delivery with expertise in AWS monitoring and alerting tools such as CloudWatch and OpenSearch for proactive issue detection and performance tracking. Designed and implemented agentic retrieval workflows that dynamically pull code context, historical changes, metadata, and embeddings to improve LLM-based scoring and classification accuracy. Skilled in designing Star/Snowflake data models, optimizing curated datasets, and enabling secure, governed data access through RBAC and integration with BI teams. Built scalable ingestion pipelines to collect and normalize GitHub/GitLab PR metadata, including commits, branches, merge status, timestamps, and reviewer activity. Built context-retrieval agents that aggregate information from Git repositories, commit history, diffs, and metadata services, enabling higher-fidelity LLM reasoning for automated analysis. Hands-on experience with Large Language Models (LLMs), GPT-based models, RAG pipelines, and Agentic AI workflows Hands-on experience supporting BI and reporting teams with scalable, validated, and performance-optimized datasets aligned with enterprise data governance standards. Developed RAG-based pipelines that retrieve relevant code snippets, documentation, and historical artifacts using vector search (FAISS / Pinecone) to enhance LLM-driven evaluations. Experienced SDET with strong expertise in automation frameworks, API testing, and CI/CD-driven test pipelines. Skilled in designing scalable automation frameworks using Java/Python, Selenium, and REST Assured. Hands-on experience in integrating automated tests into AWS DevOps/Jenkins pipelines for continuous testing. Adept in Azure Services and its components, including Azure Data Factory (ADF), Azure Databricks, Azure Synapse Analytics, Azure Data Lake Gen 2 (ADLS GEN 2), Azure Blob Storage, Key Vault, Azure Logical Apps, Azure function Apps, and Azure DevOps services. Implemented PR-to-commit and branch-to-main linking logic, enabling accurate historical tracking, backfills, and longitudinal analysis across repositories Strong DevOps background with hands-on experience in automated code promotion, release management, and environment governance across cloud data platforms. Designed enterprise lakehouse architectures using Delta Lake, Apache Iceberg, and Apache Hudi with ACID transactions, schema evolution, and time travel. Skilled in deploying Snowflake objects (roles, warehouses, schemas, RBAC policies) using Terraform IaC modules and GitOps workflows. Expert in building and optimizing batch and streaming data pipelines on Databricks using Python, PySpark, Delta Lake, and Structured Streaming, enabling feature engineering for ML and LLM workflows. Hands-on experience building data pipelines on GCP using BigQuery, Pub/Sub, Cloud Storage, and Cloud Run, with strong understanding of real-time and event-driven architecture. Skilled in operational pipeline management, debugging workflows, and resolving production data issues to ensure SLA compliance and high data reliability. Experienced in building and automating CI/CD pipelines for Snowflake, Airflow, and Informatica IICS using GitHub Actions/Azure DevOps. 
Proficient in Grafana and Datadog to implement dashboards, alerts, and distributed monitoring across cloud data pipelines and compute workloads. Multi-cloud expert with hands-on experience across Azure, AWS, and GCP, integrating cross-cloud ingestion, storage, compute, and governance layers. Familiar with core mainframe data structures, including fixed-width datasets, COBOL copybooks, and hierarchical IMS/DB-style formats, with hands-on experience integrating these sources into modern cloud ETL pipelines. Worked on legacy modernization projects migrating on-prem and traditional systems to cloud platforms, providing transferable understanding of mainframe-to-cloud integration patterns Designed streaming pipelines using Kafka, Spark Streaming, and Pub/Sub-style messaging systems for low-latency analytics. Strong experience administering Databricks Unity Catalog, implementing catalog/schema/table permissions, lineage, ACLs, and fine-grained governance. Experienced in designing and developing high-performance data pipelines using Databricks (PySpark, Delta Lake), Azure Data Factory (ADF), and Azure SQL Server for scalable data integration and analytics. Hands-on experience with Apache Iceberg table creation, schema evolution, partitioning, and integration with Snowflake. Senior Data Engineer / Architect with expertise in designing enterprise-scale Data Fabric architectures on Azure and AWS for multi-source integration, analytics, and governance. Built AI orchestration frameworks enabling autonomous agents to interact with APIs, databases, and enterprise systems through secure REST endpoints. Performed deep data analysis to identify data quality gaps, detect anomalies, and improve business data accuracy across financial datasets. Implemented API-driven ingestion frameworks in Python and PySpark for batch and near real-time workloads. Skilled in designing and managing Azure Kubernetes Service (AKS) clusters including node pool configuration, networking, and upgrades for scalable microservice deployments. Collaborated across data science and backend teams to operationalize AI models using Azure Functions, AWS Lambda, and Kubernetes-based microservices for scalable deployment. Strong experience in data analysis, data tracing, and remediation using SQL, AWS Athena, and Redshift to ensure data integrity and accuracy across large datasets. Partnered with architecture and DevOps teams to deploy automated, version-controlled CI/CD pipelines (GitHub Actions, Azure DevOps) for repeatable and reliable data engineering workflows. Implemented observability and feedback loops for AI pipelines, using Grafana, Prometheus, and Azure Monitor to monitor model performance and decision outcomes. Designed and maintained PySpark-based ETL pipelines in Azure Databricks and AWS EMR, enabling scalable data ingestion, transformation, and validation for analytical reporting. Applied responsible AI practices including governance, transparency, and secure data handling in production AI workloads. Experience developing and orchestrating Airflow DAGs for GCP and multi-cloud deployments. Hands-on experience with Infrastructure as Code (IaC) using Terraform and Azure DevOps pipelines for automated provisioning and configuration of Azure infrastructure. Expert in designing and deploying Snowflake-based ETL pipelines on AWS cloud, leveraging SnowSQL, Snowpark, and Python for efficient data processing. 
Adept at gathering, analyzing, and interpreting business data requirements, translating them into actionable technical solutions and documentation. Collaborated with business analysts and actuarial teams to map financial and claims-style data into standardized data models supporting risk and reinsurance reporting frameworks. Expertise in modern data lakehouse architecture on Azure, leveraging Snowflake, Databricks, and Iceberg. Proficient in developing scalable batch data processing workflows on AWS (EMR, Glue, Lambda) using Spark and PySpark, with hands-on experience integrating MongoDB for schema design, aggregation pipelines, and optimized querying. Hands-on experience in building Delta Live Tables (DLT) pipelines within Azure Databricks, implementing declarative ETL workflows, data quality expectations, and Medallion architecture (Bronze Silver Gold) for batch and streaming data. Strong hands-on expertise in Azure Data Factory (ADF) and Azure Databricks for large-scale data integration, transformation, and production support, with proficiency in IBM DataStage and Python automation for AMS environments. Developed data integration workflows across Denodo, Snowflake, and Azure Data Lake, conceptually aligned with Palantir Foundry data lineage and ontology principles for unified analytics. Strong hands-on expertise in ETL/ELT development, data modeling, and data pipeline automation using Python, SQL, and Spark. Explored AI-assisted code generation using LLMs and automation frameworks to accelerate SQL and ETL development. End-to-end ETL validation using SQL, SnowSQL, and Informatica IDQ for data accuracy and completeness. Experience managing external Iceberg catalogs (AWS Glue, Hive Metastore) and Snowflake-managed Iceberg tables. Skilled in multimodal data processing (text, documents, embeddings) and integrating structured and unstructured data for AI-driven search and generation. Familiar with GitOps principles, leveraging GitHub Actions and FluxCD/ArgoCD for continuous deployment and environment consistency. Demonstrated ability to develop and review data mappings, database specifications, and report documentation to support both business and technical stakeholders. Built self-healing, event-driven data pipelines with integrated observability and automated recovery to ensure reliability under dynamic workloads. Proficient in scheduling and orchestrating batch and streaming workflows using Control-M, Autosys, and Airflow Extensive experience developing and optimizing stored procedures in SQL Server, Azure SQL Database, and Synapse Analytics. Implemented data quality, validation, and metadata management frameworks ensuring compliance, lineage traceability, and governance across enterprise datasets. Adept at explaining technical concepts, pipeline issues, and data architecture solutions to non-technical stakeholders for actionable insights. Designed backfill-safe pipelines to replay historical PR and commit data, ensuring consistency across downstream analytics and scoring systems. Experience implementing observability and monitoring solutions using Grafana, Prometheus, and Azure Monitor, including alerting and health checks for distributed systems. Experienced in designing and managing information systems, performing data audits, and producing detailed operational reports and dashboards. Skilled in optimizing PySpark jobs using Adaptive Query Execution (AQE), caching, and broadcast joins to enhance performance and reduce processing time. 
Experienced in leveraging Azure AI Foundry, Azure Cognitive Search, and Azure Functions for scalable and secure data retrieval frameworks. Working knowledge of service mesh technologies such as Istio for secure inter-service communication and traffic management. Strong communicator with experience presenting complex data solutions to business stakeholders and mentoring data engineering teams. Adept at building and optimizing data lakehouse architectures using Azure Synapse Analytics, Azure Data Lake Gen2, and Databricks for batch and streaming workloads. Extensively utilized Azure Data Lake Storage Gen2 (ADLS Gen 2) seamlessly integrated with Azure Databricks enables efficient data storage and processing, empowering advanced analytics and insights generation. Experienced Data Engineer with proven expertise in Dremio Lakehouse architecture, designing semantic layers, and optimizing analytical SQL queries for performance and scalability. Managed the provisioning and scaling of virtual warehouses in Snowflake to meet varying workload demands, optimizing resource allocation and cost management. Implemented Azure Data Governance frameworks, including cataloging, lineage tracking, and access control policies for enterprise-scale pipelines. Configured Azure resources including Key Vault, Application Gateway, Storage Accounts, and Azure Container Registry (ACR) for enterprise-scale environments. Skilled in advanced SQL analytics including CTEs, window functions, and aggregation strategies for transforming and analyzing large-scale datasets across Dremio, Snowflake, and Azure platforms Strong background in ETL/ELT pipeline design using distributed SQL engines and Lakehouse technologies (Delta Lake, Apache Iceberg) to support unified analytics and self-service BI. Collaboration with BA and QA teams for requirement analysis, test planning, and UAT. Experience in AI governance, responsible AI practices, and compliance frameworks. Proficient in Delta Lake features including ACID transactions, time travel, and Delta Optimization (Z-ordering, file compaction, vacuum) for scalable data management. Skilled in CI/CD for ML (MLOps), containerization, and orchestration using Docker/Kubernetes. Utilized Snowpark to build data validation frameworks that ensured data integrity and quality throughout the ETL process. Utilized SnowSQL (Snowflake SQL) to efficiently load and unload data between local files and Snowflake, optimizing data ingestion processes for analytics. Experienced in developing data ingestion frameworks, managing dependencies, and implementing end-to-end monitoring using Azure Monitor, CloudWatch, and custom alerting solutions. Experienced in integrating Azure Data Factory (ADF) with Databricks workflows, automating pipeline orchestration and deployment via Azure DevOps and Databricks CLI. Adept in query performance tuning and workload optimization in Dremio and PostgreSQL environments to ensure interactive response times. Proficiency in multi-cloud environments (Azure, AWS, GCP). Conducted performance tuning of virtual warehouses, resulting in improved query response times and enhanced overall system efficiency. Created comprehensive test cases for Snowpipe processes to ensure data integrity and successful error handling during loading operations. Developed automated scripts for monitoring the usage of virtual warehouses, providing insights that informed capacity planning and optimization strategies. 
Collaborated with data engineering teams to design workflows that effectively utilized virtual warehouses for concurrent data processing tasks. Implemented Snowpipe for continuous data ingestion, enabling real-time loading of data from cloud storage to Snowflake tables without manual intervention. Hands-on experience implementing data governance, cataloging, and lineage tracking for enterprise analytics environments. Designed and implemented scalable data architecture solutions, utilizing modern data storage technologies and cloud platforms to support high-volume data processing and analytics. Over 5+ years of hands-on experience in creating ETL data pipelines utilizing Spark and PySpark on Azure Databricks. Collaborated with data architects to design optimized file formats and partitioning strategies for efficient use of Snowpipe. Developed automated pipelines utilizing Snowpipe to ensure efficient data processing and minimize latency in analytics workflows. Core Skills: Databricks (Delta Lake, DLT, Unity Catalog, Structured Streaming), PySpark, MLflow, Feature Engineering, Azure Data Factory, Terraform, Airflow, Great Expectations, Kafka, Vector Databases (FAISS, Pinecone), RAG Pipelines, CI/CD (Azure DevOps, GitHub Actions) Expert in SQL optimization and query performance tuning across Oracle, SQL Server, Snowflake, and Dremio, supporting large-scale financial data systems. Developed data transformation workflows using Snowpark to leverage the power of programming languages like Python, allowing for more complex data processing directly within Snowflake. Led migration projects to Microsoft Azure architecture, optimizing existing infrastructure. Executed complex SQL queries using SnowSQL (Snowflake SQL) and applied performance tuning techniques to improve query efficiency and reduce execution times. Widely utilized Azure Logic App Integration for intricate workflows and implemented advanced analytics solutions on Azure Synapse, integrating data warehousing and big data analytics capabilities. Proven ability to perform data analysis, profiling, and validation to ensure high data quality, supporting business intelligence and analytics initiatives. Designed and implemented Azure Functional apps to deploy serverless, event-driven applications, utilizing triggers and integrating with Azure Key Vault for secure management of cryptographic keys and secrets. Expertly deployed Azure Functions, Azure Storage, and Service Bus queries, optimizing enterprise ERP integration systems for streamlined data processing and communication in complex environments. Experienced in creating and managing Azure DevOps tools for continuous integration and deployment (CI/CD) pipelines Designed, developed, and optimized data transformation workflows using dbt to build and manage scalable data models in a cloud-based data warehouse, improving data accessibility and reporting efficiency Implemented dbt-based ETL pipelines to automate the transformation of raw data into structured datasets for analytics, ensuring high data accuracy and consistency for business users. Collaborated with data engineers and analysts to create reusable dbt models, ensuring modularity and efficiency in data transformations and reducing development time. Integrated dbt with cloud-based data warehouses to automate data transformation processes, enabling faster and more accurate analytics. 
Built and maintained complex dbt models to clean, aggregate, and transform data from various sources, making it ready for reporting and business intelligence analysis. Designed and implemented UDFs (User-Defined Functions) using Snowpark, facilitating custom calculations and data manipulations that optimized performance for analytical queries. Designed and deployed interactive dashboards and reports using Power BI, streamlining data analysis and enabling real-time decision-making. Integrated data from multiple sources, including Excel, SQL Server, and cloud-based databases, into Power BI for comprehensive business analysis. Created and optimized DAX measures and calculated columns in Power BI to enhance reporting accuracy and analytical capabilities. Developed scripts with SnowSQL (Snowflake SQL) to automate ETL processes, streamlining data transformations and enhancing data pipeline workflows. Managed Snowflake databases using SnowSQL (Snowflake SQL) , including creating, modifying, and dropping tables, views, and other database objects to support data organization. Designed and implemented ETL data pipelines using PySpark, Spark SQL, and Scala, demonstrating proficiency in big data processing and maintain corporate solutions, facilitating seamless data extraction, transformation, and loading for effective integration. Developed and managed Azure PaaS databases to enable seamless integration with web applications, enhancing performance and reliability. Monitored Snowpipe performance and utilization metrics, optimizing configurations to improve data ingestion speed and reliability. Leveraged SnowSQL (Snowflake SQL) for data transformation tasks, ensuring data quality and consistency before loading it into target tables for reporting and analysis. Extensive experience in crafting and optimizing Data Pipeline Development and Data Modelling strategies, crucial for driving efficient data processing and analysis workflows. Formulated data ingestion workflows for efficient storage and retrieval, working with Avro, Parquet, Sequence, JSON, and ORC file formats. Implemented data governance frameworks within the data architecture, establishing standards for data quality, validation, and metadata management to ensure accuracy and compliance. Created complex Synapse stored procedures for data transformation and aggregation, significantly improving ETL processes for large datasets. Integrated SnowSQL (Snowflake SQL) with business intelligence tools, such as Tableau and Power BI, to facilitate seamless data visualization and reporting. Designed and managed data architectures using Snowflake to support scalable data warehousing solutions, improving data accessibility for analytics teams. Adept in configuring Apache Oozie workflows, orchestrating Hadoop jobs efficiently, and proficient in SQOOP for seamless HDFS to relational database data transfer. Implemented partitioning and bucketing strategies for performance tuning in data engineering workflows, enhancing data processing efficiency. Optimized query performance within Snowflake by leveraging clustering keys and materialized views, resulting in significant reductions in query execution time. Monitored and optimized data sharing configurations to enhance performance and reliability, ensuring seamless access to shared datasets. Developed and executed batch processing scripts using SnowSQL(Snowflake SQL) to handle large volumes of data efficiently, ensuring timely data availability. 
Developed Spark scripts with Scala shell commands, tailored to project needs, demonstrating adeptness in advanced programming for efficient big data processing. Designed and enforced data security measures as part of the data architecture, including encryption, access control, and auditing, to safeguard sensitive information and meet regulatory requirements. Implemented real-time streaming with Kafka as data pipeline, integrating Spark Streaming for continuous data processing, showcasing expertise in event processing and distributed computing. Leveraging Informatica PowerCenter for orchestrating data integration, transformation, and ETL operations, ensuring uninterrupted data flow and precision in intricate corporate requirements. Experienced in infrastructure as code practices using Terraform, enabling automation and scalability in deploying and managing cloud infrastructure for data engineering projects, ensuring efficient resource utilization and reproducibility. Developed and maintained ETL processes in Snowflake, integrating diverse data sources to create a unified analytics environment. Hands-on experience with Master Data Management (MDM) concepts, reference data governance, and Azure Entra integration for secure identity management. Collaborated with data engineers to optimize data models and enhance data pipelines using SnowSQL (Snowflake SQL) , ensuring alignment with business requirements. Architected solutions using Cosmos DB to provide globally distributed data storage, ensuring low-latency access for customer-facing applications. Proficient in Teradata, leveraging its powerful data warehousing and analytics capabilities to manage and analyse large volumes of data effectively in complex enterprise environments. Extensive experience in developing, maintaining, and implementing Enterprise Data Warehouse (EDW), Data Marts, ODS, and Data warehouses with Star schema and Snowflake schema. Collaborated with cross-functional teams to deploy data solutions in Snowflake, ensuring alignment with business objectives and data governance standards. Demonstrated mastery in SDLC management, skilfully applying Agile Methodology to steer iterative development and continuous software project improvement. Proof of concept (POC) initiatives utilizing Snowflake, Airflow, and DBT, exploring data warehousing, workflow orchestration, and transformation capabilities within a modern data engineering ecosystem. EDUCATION Master s in Computer Science at University of Central Missouri, Dec 2013 Bachelors in Computer Science and Engineering at A.S.N Women s Engineering College, May 2012 Certifications DP-900 Microsoft Certified Azure Data Fundamentals. DP-203 Microsoft Certified Azure Data Engineer Associate. AZ-305 Designing Microsoft Azure Infrastructure Solutions SnowPro Core Certification SnowPro Advanced TECHNICAL SKILLS Cloud Services(AWS): Data Factory (ADF), Databricks, Synapse, Data Lake, Event Hubs, Key Vault, Logic Apps , Functions, Azure DevOps,EMR, Glue, EKS, Kubernetes, HashiCorp Vault, Lambda, S3, Redshift, DynamoDB. AWS: EMR, Lambda, Aurora, OpenSearch, S3, Glue, Redshift, DynamoDB. Data Quality Tools: Informatica Data Quality (IDQ) rule creation, data profiling, monitoring. Financial Data Platforms: FactSet, Security Master, Holdings datasets. Monitoring & Automation: SLA Monitoring, Alerting Systems, AWS CloudWatch, Splunk, OpenSearch, SLA Monitoring, Alerting Systems. 
Oracle (11g/12c/19c) :- PL/SQL, Partitioning, Index Optimization, Materialized Views, Performance Tuning, Oracle Analytics Cloud (conceptual) Mainframe-Adjacent Skills: COBOL copybook integration, fixed-width parsing, mainframe extract processing, VSAM-style dataset handling, JCL-inspired batch workflow orchestration (Control-M, Autosys) Data Virtualization: Denodo (Data Federation, Caching, Query Optimization, Lineage Tracking) Palantir Foundry (conceptual experience ontology, pipeline design, governance workflows) Big Data Technologies: Hadoop (1.0X and 2.0X), Hortonworks HDP (2.4/2.6), HDFS, YARN, MapReduce, PigHBase, Hive, Sqoop, Flume, Spark, Oozie, Airflow, Ambari and Apache Kafka. Containers & Orchestration: Docker, Kubernetes (EKS). Programming languages: MapReduce,PIG,Java,Python,C#,PySpark,SparkSQL,Linux,Unix,Shell,Scripting, SQL,PL/SQL. Databricks: Delta Live Tables (DLT), Unity Catalog, MLflow, Databricks Workflows, Delta Optimization (Z-ORDER, file compaction) ETL Tools: IBM Information Server 11.5/9.1/8.7/8.5, IBM Infosphere DataStage 8.1.0, Assential DataStage.7.5.X, Quality Stage, Talend 6.4, SSIS, SSRS, Informatica. Business Intelligence: Power BI, SAP Business Objects 11.5, Qlik Sense, Tableau. Scheduling: Control-M, Autosys, Oozie, Apache Airflow. Version Control Tools: Git, CI/CD, Jenkins. Databases: NoSQL: HBase and Cassandra Row-Oriented: Oracle 11g/10g, MS SQL Server, MySQL, Teradata V2R5/V2R6, DB2. Columnar: HP Vertica. WORK EXPERIENCE Client: NTT Data Jan 2023-Present Role: Lead. Azure Data Engineer, Canton, MI Responsibilities: Integrated various data sources with Tableau to build real-time visualizations for decision-makers in marketing, sales, and finance. Worked with Cardinal Commerce, a global financial services technology company that offers a wide range of services Configured and maintained Apache Atlas workflows for metadata cataloging, lineage tracking, and policy enforcement across fund administration data pipelines. Monitor the pipelines and make sure that they are running well, and periodically do unit testing for the data that we receive from these pipelines. Designed and implemented large-scale data architectures using Azure Synapse, ADLS Gen2, and Databricks to support batch, streaming, and real-time analytics. Experience in using Azure Logic Apps to trigger notifications and updates based on the transaction status. Designed and delivered Star/Snowflake data models to support Power BI and enterprise reporting needs. Designed automated ETL pipelines integrating Oracle on-prem data sources with cloud data platforms (Snowflake, Azure SQL), applying reusable transformation logic in Python and SQL. Built RAG-based LLM applications integrating vector search, embeddings, and LangChain orchestration for enterprise use cases. Designed and automated CI/CD pipelines using AWS CodePipeline and CodeBuild to continuously execute UI and API test suites for every code commit, improving deployment reliability and reducing manual validation effort by 70%. Integrated automated tests into the build-and-release workflow, enabling early defect detection and faster feedback loops for developers Built curated Gold/Silver/Bronze layers in ADLS/Synapse for analytics and BI consumption. Designed and implemented scalable ETL/ELT pipelines using Qlik Replicate and Qlik Compose (conceptually aligned with Databricks and ADF frameworks) for real-time data ingestion from relational and cloud data sources into Snowflake. 
Led data domain activation across multiple business units by defining sub-domains, identifying critical data elements (CDEs), and driving stewardship assignments. Built end-to-end CI/CD pipelines for Snowflake, automating SQL deployments, RBAC provisioning, Streams/Tasks, and object creation using GitHub Actions/Azure DevOps. Developed autonomous AI agents using LangChain and Azure OpenAI to automate data validation, transformation, and metadata cataloging tasks. Implemented Terraform-based IaC for Snowflake roles, warehouses, resource monitors, pipes, and integrations, ensuring repeatable and compliant deployments. Architected Data Lake zones (Bronze Silver Gold) with Delta Lake/Iceberg/Hudi enabling ACID, schema evolution, and time travel. Developed high-quality, performance-optimized datasets used by BI teams for dashboards and reports. Built and maintained Qlik Sense dashboards to visualize operational metrics and data quality KPIs, integrating with Snowflake and AWS data sources for interactive analysis. Collaborated on Oracle data migration and modernization initiatives, ensuring seamless data transfer and performance optimization in hybrid environments. Designed and implemented enterprise Data Governance frameworks including data cataloging, metadata lineage, and stewardship processes across cloud platforms (Azure, AWS). Collaborated with mainframe teams during data migrations from legacy systems, validating extract files, reconciling structured datasets, and mapping COBOL copybook formats into modern cloud models. Integrated API-based data ingestion from Oracle applications and flat files into cloud data lakes, establishing automated refresh and validation processes. Integrated LLM-based reasoning APIs with backend data services, enabling workflow automation and intelligent data enrichment. Automated Informatica IICS deployments (mappings, tasks, connections) using API-based scripts and CI/CD workflows for Dev QA Prod promotion. Configured Denodo data virtualization layer to integrate data from Oracle, Snowflake, and AWS S3, enabling a unified analytics view without physical data movement. Built Python and Java-based microservices exposing AI capabilities (summarization, classification, and data retrieval) via REST endpoints. Configured AWS CloudWatch metrics and alarms to monitor ETL job performance, reducing downtime and improving pipeline reliability. Collaborated with cross-functional teams to define data ownership, stewardship, and accountability models, ensuring compliance and governance across all data domains. Worked with fixed-width data formats, hierarchical structures, and batch extracts commonly originating from mainframe systems (IMS/DB, VSAM). Deployed AI workloads on Azure Kubernetes Service (AKS) with CI/CD pipelines, ensuring scalability, resilience, and performance optimization. Automated Atlas metadata updates using REST API scripts, improving catalog freshness and reducing manual governance overhead. Managed, monitored, and debugged complex ETL/ELT pipelines across Snowflake, Databricks, and Azure Data Factory, ensuring operational stability and high performance. Developed custom Atlas metadata enrichments to tag fund structures, data ownership, and processing lineage for improved traceability. Built and optimized GCP-based pipelines leveraging BigQuery for analytics, Pub/Sub for event ingestion, and Cloud Run for containerized microservices. 
Collaborated with data governance and investment operations teams to define metadata standards and lineage mapping between Atlas and Canoe datasets. Designed and managed AKS clusters for containerized data processing workloads, optimizing node pools, ingress controllers, and network security policies. Designed and automated batch workflows that replicate JCL-style sequencing, dependency handling, error recovery, and SLA adherence in cloud environments. Implemented automated code generation templates using Python and Jinja for ETL framework standardization and faster deployment. Created and maintained business glossaries, technical metadata, and reference data dictionaries in collaboration with SMEs and domain owners. Designed Denodo base and derived views, optimized query federation, and implemented caching strategies to improve reporting performance by 30%. Developed PySpark batch and streaming jobs for real-time transformations processed through GCP and Kafka. Designed, developed, and optimized end-to-end data pipelines using Azure Data Factory, Databricks, and Azure SQL Server, enabling scalable data processing across multiple sources. Developed SQL-based validation and reconciliation frameworks to ensure consistency and accuracy across multiple data layers and systems. Implemented Terraform-based infrastructure automation to provision Azure services (Key Vault, Storage Accounts, Container Registry, Application Gateway, and Databricks workspaces). Developed AWS Cloud batch workflows using EMR, Glue, and Lambda for large-scale data processing, automating ETL operations and optimizing cost efficiency. Implemented API ingestion frameworks consuming REST/JSON data into cloud storage and downstream pipelines. Developed and maintained operational dashboards and alerting systems for proactive monitoring of data workflows. Provided L2/L3 production support for Azure Data Factory and Databricks pipelines, handling incident resolution, root cause analysis (RCA), and performance optimization. Integrated GitOps workflows using GitHub Actions and FluxCD, ensuring consistent and version-controlled deployments across environments. Standardized terminology, definitions, and data usage rules across domains to support organization-wide data consistency. Migrated Oracle OTC and legacy ERP datasets into AWS Snowflake using Glue and Python ETL. Conducted data quality assessments and issue remediation using SQL, Athena, and Redshift for critical production datasets. Designed and implemented Data Fabric architecture integrating Snowflake, Azure Data Lake, SQL Server, and Cosmos DB to provide unified access and governed analytics across the enterprise. Scheduled and monitored ETL pipelines using Control-M and Autosys for batch and streaming workflows, ensuring timely data availability. Created and executed SQL-based test scripts to verify transformation logic, joins, and aggregations within Snowflake and Azure pipelines. Created and maintained data documentation artifacts, including data dictionaries, lineage diagrams, and mapping documents for ETL workflows. Built an indexing and retrieval framework to support enterprise data onboarding, integrating Azure AI Search, Databricks, and FastAPI services. Collaborated in Agile sprints to deliver incremental data engineering solutions, ensuring timely releases and continuous integration. Deployed and configured Istio service mesh to manage inter-service traffic, mutual TLS (mTLS) authentication, and zero-trust communication between workloads. 
Collaborated with QA and business teams for defect tracking, RCA, and regression testing to ensure production data quality and reliability. Implemented AWS SNS/SQS for automated event notifications and pipeline orchestration across ETL workflows. Collaborated with cross-functional teams, including business stakeholders and engineering teams, to resolve operational issues and implement enhancements. Implemented Snowflake Streams and Tasks to enable incremental ETL processing and real-time data ingestion. Led architectural decisions for enterprise-scale data pipelines ensuring scalability, performance, and operational efficiency. Communicated technical findings, pipeline issues, and remediation steps to non-technical users and senior stakeholders, facilitating informed decision-making. Wrote modular and testable Python scripts following clean code principles, ensuring maintainable and reusable ETL components. Implemented ELT workflows in Databricks using PySpark for data transformation and integrated results into Azure SQL Server for downstream analytics. Developed Delta Live Tables (DLT) pipelines to automate data ingestion and transformation with built-in data validation and quality checks. Developed custom Splunk dashboards for pipeline health checks, enabling proactive monitoring and SLA compliance. Mentored team on cloud-native ETL patterns, data modeling best practices, and Data Fabric concepts. Collaborated in Agile sprints, contributing to sprint planning, code reviews, and CI/CD deployments for continuous improvement. Supported ad hoc analytical and reporting requests by extracting and transforming large datasets using Athena, S3, and Redshift. Built parameterized and modular pipelines to handle dynamic data ingestion from APIs, SQL databases, and blob storage into Databricks Delta tables. Presented Snowflake ETL architecture and pipeline design to business and technical stakeholders, ensuring clear understanding of complex data flows. Integrated On-Premises (MYSQL, Cassandra) and cloud data storage (Blob storage, Azure SQL Database) using Azure Data Factory (ADF) and applied transformations after loading into Snowflake. Developed Snowflake-based ETL pipelines on AWS cloud, integrating SnowSQL, Snowpark, and Python for scalable and maintainable data processing. Implemented API ingestion frameworks consuming REST/JSON data into cloud storage and downstream pipelines. Built end-to-end ML pipelines using Databricks, ADF, and PySpark for high-volume data processing. Designed and maintained Dremio-based data models and semantic layers, integrating data from Azure Data Lake and Snowflake to deliver unified analytical views. Created runbooks, job-monitoring dashboards, and alerting systems in Azure Monitor to ensure 24x7 operational stability and SLA adherence. Integrated Git-based CI/CD pipelines for version-controlled SQL scripts, transformations, and analytics code deployments. Automated event-driven ETL workflows using AWS Lambda integrated with S3 triggers. Leveraged OpenSearch dashboards for real-time log analytics and monitoring. Implemented CI/CD workflows for ML models with GitHub Actions, Docker, and Kubernetes (EKS/AKS). Built and optimized Amazon Aurora clusters for transactional reporting, ensuring high availability Modeled data in Snowflake using data warehousing techniques, performed data cleansing, managed Slowly Changing Dimensions(SCD), assigned Surrogate keys, and implemented change data capture. 
Automated pipeline health checks and data validation processes using Python and REST APIs for proactive anomaly detection. Designed and implemented data quality rules, profiling, and monitoring using Informatica Data Quality (IDQ) to ensure accuracy and consistency across financial datasets. Implemented Medallion architecture in Databricks using Delta Lake to support scalable and maintainable data pipelines. Collaborated with DevOps and CloudOps teams to support deployment management, hotfix releases, and version control through Azure DevOps and GitHub Actions. Azure Data Lake Storage Gen2 in Azure cloud as a central repository, enabling scalable storage and efficient management of diverse data types for streamlined processing and analysis. Migrated Azure/AWS workloads into GCP environments as part of multi-cloud standardization. Collaborated with business stakeholders to define KPIs and ROI for AI initiatives. Partnered with business stakeholders to analyze recurring failures and implement long-term preventive measures to enhance data reliability. Built complex PySpark transformations and UDFs for handling large-scale batch and streaming data in Databricks notebooks. Developed complex SQL queries in Dremio leveraging window functions, CTEs, and analytical aggregations for performance-intensive analytics workloads. Conducted query profiling and performance optimization in Dremio and PostgreSQL, reducing latency and improving interactive query response times. Developed SLA monitoring and alerting frameworks to proactively detect pipeline failures and ensure compliance with business requirements. Built ETL/ELT pipelines for structured and semi-structured datasets using Dremio, ensuring seamless integration across data sources and consumers. Prototyped agentic AI workflow using LangChain and Azure OpenAI, enabling multi-agent coordination for autonomous document processing. Built and optimized Airflow DAGs (Directed Acyclic Graphs) to orchestrate complex data workflows, ensuring the reliable execution of data transformations and monitoring for failures or delays. Utilized Airflow to integrate DBT with cloud-based data platforms such as Snowflake or BigQuery, streamlining the ETL process and automating data pipeline management for improved operational efficiency. Worked with Azure Cosmos DB to manage and optimize the storage and retrieval of large-scale, globally distributed data, ensuring high availability and low-latency access to mission-critical data for real-time applications. Implemented and optimized stored procedures in JavaScript within Azure Cosmos DB for efficient data processing, enabling seamless integration between the database and application layer for real-time data querying and reporting. Designed and implemented Cosmos DB collections and partitioning strategies to ensure efficient data access, improve query performance, and scale as data volumes grew, meeting high-performance and scalability requirements. Developed and optimized stored procedures in JavaScript to perform complex data manipulations and transformations within Cosmos DB, reducing the need for external data processing and ensuring data consistency. Applied scaling strategies to Cosmos DB by implementing partitioning and indexing techniques, enabling efficient querying and faster response times even as the data volume increased. Addressed data issues by conducting thorough root cause analysis to identify and resolve discrepancies in data integrity, ensuring data consistency and accuracy across systems. 
Developed data validation and reconciliation frameworks using Python and SQL to ensure data integrity across bronze, silver, and gold layers. Performed root cause analysis on production data failures and performance bottlenecks, implementing corrective actions to improve data quality and reduce downtime in production environments. Provided production support for data pipelines, troubleshooting issues in DBT models, Airflow workflows, and Cosmos DB integrations, ensuring that data processes remained operational and efficient. Implemented Snowflake Snowpipe and Azure Blob Storage integrations for real-time data ingestion and monitoring of large datasets. Collaborated with cross-functional teams to address and resolve data issues by designing and implementing solutions that improved data quality and streamlined data flows across systems. Improved data governance and quality processes by addressing root cause issues in production environments, identifying recurring problems, and implementing proactive measures to prevent future data inconsistencies. Optimized data workflows in DBT, Airflow, and Cosmos DB to ensure that ETL processes were efficient, scalable, and capable of handling large volumes of data while minimizing errors and performance degradation. Worked closely with stakeholders to troubleshoot and resolve production issues across DBT, Airflow, and Cosmos DB, ensuring that data transformations and integrations were accurate, timely, and aligned with business requirements. Created procedures to leverage Time Travel for efficient rollback of changes, enhancing data reliability in production environments. Migrated existing Delta Lake datasets to Iceberg format for better interoperability Implemented Azure Databricks, leveraging its advanced analytics capabilities for efficient data processing, and collaborative development, enhancing overall performance. Utilized Azure Data Factory (ADF), Azure Data Lake, and Azure Synapse Analytics to solve business problems with an analytical approach. Skilled in leveraging Informatica PowerCenter for orchestrating data integration, transformation, and ETL operations, ensuring uninterrupted data flow and precision in intricate corporate requirements. Conducted performance assessments of Time Travel usage, ensuring that historical queries did not adversely impact overall system performance. Built and managed Iceberg-based tables in Snowflake with schema evolution and snapshot retention. Implemented AWS EMR clusters with PySpark for large-scale data processing and integrated workflows with EKS and Kubernetes for containerized job orchestration Integrated third-party and custom applications with Azure Active Directory (AAD) using Azure Services for authentication and authorization. Implemented data quality and governance controls, including lineage tracking, auditing, and catalog management using Dremio and Azure Data Catalog. Designed and developed interactive Tableau dashboards to visualize operational metrics, data trends, and quality KPIs for business users. Integrated Snowflake with AWS Glue Catalog for Iceberg metadata. Leveraged Delta Lake s Delta Optimization features, such as data compaction and file pruning, to improve query performance and reduce storage costs. Designed ELT/ETL pipelines to enable bidirectional data transfer between Snowflake, utilizing Snowflake Snow SQL to guarantee seamless integration and transformation processes. 
Enabled secure data sharing across departments and external partners using Snowflake s data sharing features, facilitating collaboration without data duplication. Established data governance KPIs, reporting dashboards, and status presentations to track progress and communicate to leadership. Built RAG-based API layer using Python FastAPI + vector database to deliver contextual search and summarization capabilities. Utilized Docker to containerize ETL components, ensuring portability and consistent deployments across environments. Developed and managed data pipelines that utilize Delta Lake's capabilities for handling streaming and batch data processing, ensuring real-time data accuracy and consistency. sources including databases, APIs, and file systems, ensuring seamless data integration. Developed strategies for effective data sharing that adhered to compliance requirements while maximizing data accessibility for stakeholders. Implemented data lineage and metadata tracking in Denodo to support governance and audit compliance for enterprise data fabric architecture. Secured secrets management by integrating pipelines with HashiCorp Vault, improving compliance and reducing manual credential handling. Developed and published Tableau dashboards integrating Snowflake and Dremio datasets, enabling near real-time visualization of data quality metrics, ETL performance, and business KPIs. Implemented secure storage and management of cryptographic keys, secrets, and certificates using Azure Key Vault service, using Azure Key Vault cloud integration features like encryption at rest, access policies, and integration with Azure services for enhanced data protection and compliance. Implemented role-based access controls in data sharing setups to ensure that sensitive data was only accessible to authorized users. Environment: Azure Databricks, Azure Event Hubs, Azure Data Factory, Azure Synapse Analytics, Key Vault, Logic Apps, Functional Apps, Informatica, Snowflake, MS SQL, Vertica, Oracle, Cassandra, HDFS, MapReduce, YARN, Spark,Lambda, Hive, SQL, Python, C#, Scala, Pyspark, shell scripting, JIRA, Agile, Jenkins, Kafka, Tableau. Client: Deloitte Nov 2019-Jan 2023 Role: Azure Data Engineer Suwanee, GA Responsibilities: Ingested data into various Azure services such as Azure Data Factory, Azure Data Lake Storage Gen 2, Azure SQL server, Azure Blob storage, and Azure Data Warehouse, leveraging Azure Databricks for data processing. Worked with BI teams to validate data models for RLS/OLS and governed data access requirements. Performed ETL operations using Azure Databricks and successfully migrated on-premises Oracle ETL processes to Azure Synapse Analytics. Integrated IICS deployment steps into existing DevOps pipelines, enabling zero-downtime releases and consistent parameter management. Implemented Synapse Dedicated SQL Pool pipelines and stored procedures for enterprise-grade transformations. Developed Qlik data models (Snowflake schema) following associative modeling principles to enable self-service analytics and faster query responses. Facilitated data ownership and stewardship governance operating models, ensuring accountability across business, analytics, and IT teams. Provisioned scalable test infrastructure on AWS using EC2, S3, and IAM, enabling isolated, on-demand testing environments with controlled access and improved test execution speed. 
Leveraged AWS Lambda for serverless test triggers, log processing, and automated cleanup tasks, reducing resource usage and operational costs by 30 40%. Partnered closely with Power BI developers to deliver datasets optimized for dashboard performance and refresh cycles. Implemented Azure RBAC and data-access controls to ensure secure and compliant data consumption. Established business glossaries, data dictionaries, and standardized naming conventions to enhance data discoverability and consistency. Implemented monitoring and alerting for IICS job failures and latency issues using CloudWatch / Grafana. Implemented metadata management processes including glossary alignment, lineage documentation, and attribute definitions. Built and optimized data ingestion pipelines for fund admin data using Python, PySpark, and Atlas APIs, ensuring accurate metadata capture and schema consistency. Monitored and optimized Qlik ETL performance, applying caching, incremental reloads, and optimized query structures to reduce latency and improve system throughput. Developed governance frameworks covering data quality, metadata, lifecycle management, access controls, and compliance. Implemented RAG (Retrieval-Augmented Generation) pipelines combining Azure OpenAI with Pinecone/FAISS for context-aware responses. Integrated Qlik data ingestion pipelines with AWS and Azure cloud storage, ensuring seamless data flow between hybrid cloud environments. Led Oracle OTC to Snowflake migration, designing data models, transformation logic, and validation frameworks using AWS Glue and SnowSQL. Partnered with business and IT stakeholders to define data domain activation playbooks covering definitions, lineage, data quality, and lifecycle management. Enhanced platform observability by integrating Prometheus, Grafana, and Loki for real-time metrics, distributed tracing, and log aggregation. Integrated Canoe platform APIs for document ingestion and automated fund data extraction, mapping outputs to Atlas-managed data domains. Collaborated with BI and Data Science teams to enable self-service analytics through curated Dremio datasets and governed semantic models. Designed LLM service interfaces to integrate with enterprise APIs and Snowflake data layers, enabling automated insight generation. Developed and maintained ETL/ELT data pipelines in Databricks and Azure Data Factory to process structured and semi-structured financial datasets. Collaborated with DevOps to containerize LLM services using Docker and deploy to EKS/AKS clusters for production inference. Performed source-to-target data mapping and lineage tracing between legacy Cisco systems and the Snowflake reporting layer, ensuring consistency and accuracy across migration stages. Utilized Control-M and Airflow to schedule and orchestrate data workflows across multiple cloud environments. Migrated SQL databases to Azure Data Factory (ADF), Azure Data Lake Gen 2 (ADLS GEN 2), Azure Synapse Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse. Managed database access control and facilitated the migration of On-premises databases to Azure Data Lake Storage Gen 2 (ADLS GEN 2) using Azure Data Factory (ADF). Collaborated with security teams to implement RBAC for Kubernetes clusters and Azure role assignments for resource-level governance. Developed automated data validation scripts in Python and SQL to reconcile migrated data volumes and ensure referential integrity across systems. 
Integrated Azure SQL Server with Databricks for data loading and transformations, optimizing read/write performance through partitioning and indexing strategies. Designed incremental load processes using Snowflake Streams and Tasks, ensuring low-latency and high-performance data pipelines. Implemented Azure Application Gateway/WAF for ingress traffic management, SSL termination, and centralized security enforcement. Automated backup and disaster recovery workflows for critical services using Azure-native tooling and IaC modules. Utilized Python automation to generate operational health reports and manage error-handling mechanisms in production pipelines. Built and deployed custom LLM endpoints using Azure OpenAI Service for internal knowledge automation. Proficient in utilizing Informatica PowerCenter for data integration, transformation, and ETL processes, ensuring seamless data flow and accuracy within complex business environments and experienced in designing and implementing scalable solutions for data warehousing and analytics. Performed deep data analysis on complex loan-level datasets to identify data quality issues, detect anomalies, and extract actionable business insights. Leveraged Azure Synapse Analytics and Polybase for seamless and optimized data ingestion, integration, and transfer, enhancing data processing efficiency and scalability. Authored Python ETL scripts integrating AWS Glue workflows and Redshift Spectrum for efficient data ingestion and transformation. Implemented PySpark-based transformation logic in Databricks notebooks, applying schema validation and automated quality checks for production-grade reliability. Implemented real-time streaming data processing using Azure Event Hubs, enabling timely insights and proactive interventions for business decision-making and these technologies such as data partitioning, indexing, and stream processing for optimized performance and scalability. Developed enterprise-level solutions using batch processing and streaming frameworks, including Spark Streaming and Apache Kafka. Built metadata-driven frameworks for pipeline orchestration using ADF, enabling reusability and reduced manual intervention. Implemented data quality and governance controls, including lineage tracking, auditing, and catalog management using Dremio and Azure Data Catalog.Elemented data modeling strategies for Cosmos DB to optimize performance, achieving high throughput and efficient scaling. Implemented Retrieval-Augmented Generation (RAG) pipelines combining LLMs with vector databases (Pinecone, FAISS, ChromaDB). Utilized Azure Active Directory (AAD) expertise to manage identities, authentication, and access control for applications and services, implementing single sign-on (SSO) and multi-factor authentication (MFA) solutions. Built and optimized analytical SQL queries and stored procedures in Redshift to support compliance, risk management, and portfolio reporting Leveraging Azure Monitor for proactive monitoring, performance optimization, resource availability, usage analysis, metrics configuration, log management, alert configuration, dashboard setup, operational efficiency, timely issue identification, continuous improvement, reliability enhancement, scalability optimization, cloud solutions. Collaborated with business analysts and data governance teams to define KPIs and metrics for financial data validation and audit readiness. Applied quantization, pruning, and distillation techniques to optimize model performance for production. 
Developed real-time data synchronization processes utilizing Cosmos DB, enhancing the user experience for applications with dynamic data requirements. Delivered analytical insights and pipeline performance reports to stakeholders, demonstrating improvements in data accuracy, reliability, and refresh frequency. Used Git as a version control tool for code repository management. Environment: Azure Databricks, Azure Event Hubs, Informatica, Azure Data Factory, Azure Synapse Analytics, Azure Monitoring, Key Vault, Logic Apps, Functional App, Snowflake, MS SQL, Vertica, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, C#, Scala, Pyspark, shell scripting, GIT, JIRA, Kafka, ADF Pipeline, Power BI. Client: GM Financial Dec 2017-Nov 2019 Role: Big Data Engineer Atlanta, GA Responsibilities: Utilized Sqoop for periodic ingestion of data from MySQL into HDFS, ensuring seamless integration and efficient data transfer within big data environments, facilitating robust data processing and analysis workflows. Performed aggregations on large amounts of data using Apache Spark and Scala and stored the data in Hive warehouse for further analysis. Implemented CI/CD pipelines for AWS Glue and Lambda ETL jobs using CodePipeline, CodeBuild, and CloudFormation templates, automating deployment and version control. Utilized Unity Catalog for secure data governance and lineage tracking across Databricks environments. Contributed to Lakehouse optimization by leveraging Apache Iceberg and Delta Lake for versioned, ACID-compliant, and performant analytical tables. Automated Airflow environment configuration (connections, variables, secrets) using Terraform and Python-based admin scripts. Designed audit dashboards and reconciliation reports to monitor fund data ingestion status, validation errors, and Atlas lineage coverage. Developed and maintained ETL workflows using IBM DataStage 11.5, integrating data from relational and big data systems for downstream analytics. Automated Databricks deployments and pipeline versioning using Azure DevOps, Databricks CLI, and Git integration. Engaged with Data Lakes and prominent big data ecosystems such as Hadoop, Spark, Hortonworks, and Cloudera, orchestrating data processing and analytics tasks within scalable and distributed computing environments. Integrated AWS CloudWatch alarms and SNS notifications to monitor job performance and trigger remediation workflows. Ingested and transformed extensive volumes of Structured, Semi-structured, and Unstructured data, leveraging big data technologies to handle diverse data formats efficiently within scalable distributed computing environments. Implemented Apache Ambari for centralized management and monitoring of Big Data infrastructure, streamlining administration tasks and ensuring optimal performance across Hadoop clusters. Wrote Hive queries to meet business requirements and conducted data analysis. Built HBASE tables by leveraging HBASE integration with HIVE on the Analytics Zone. Utilized a range of big data analytics tools like Hive and MapReduce to analyze Hadoop clusters, alongside developing a robust data pipeline with Kafka, Spark, and Hive for end-to-end data ingestion, transformation, and analysis. Wrote Hive queries to meet specified business requirements, created Hive tables, and utilized Hive QL to simulate MapReduce functionalities. 
Client: GM Financial  Dec 2017 - Nov 2019
Role: Big Data Engineer  Atlanta, GA
Responsibilities:
Utilized Sqoop for periodic ingestion of data from MySQL into HDFS, ensuring seamless integration and efficient data transfer for downstream processing and analysis within the big data environment.
Performed aggregations on large volumes of data using Apache Spark and Scala and stored the results in the Hive warehouse for further analysis (see the sketch following this section).
Implemented CI/CD pipelines for AWS Glue and Lambda ETL jobs using CodePipeline, CodeBuild, and CloudFormation templates, automating deployment and version control.
Utilized Unity Catalog for secure data governance and lineage tracking across Databricks environments.
Contributed to Lakehouse optimization by leveraging Apache Iceberg and Delta Lake for versioned, ACID-compliant, and performant analytical tables.
Automated Airflow environment configuration (connections, variables, secrets) using Terraform and Python-based admin scripts.
Designed audit dashboards and reconciliation reports to monitor fund data ingestion status, validation errors, and Atlas lineage coverage.
Developed and maintained ETL workflows using IBM DataStage 11.5, integrating data from relational and big data systems for downstream analytics.
Automated Databricks deployments and pipeline versioning using Azure DevOps, the Databricks CLI, and Git integration.
Worked with data lakes and major big data ecosystems, including Hadoop, Spark, Hortonworks, and Cloudera, orchestrating data processing and analytics tasks within scalable, distributed computing environments.
Integrated AWS CloudWatch alarms and SNS notifications to monitor job performance and trigger remediation workflows.
Ingested and transformed large volumes of structured, semi-structured, and unstructured data, leveraging big data technologies to handle diverse data formats efficiently at scale.
Implemented Apache Ambari for centralized management and monitoring of big data infrastructure, streamlining administration tasks and ensuring optimal performance across Hadoop clusters.
Wrote Hive queries and created Hive tables to meet business requirements, used HiveQL to simulate MapReduce functionality, and conducted data analysis.
Built HBase tables by leveraging HBase integration with Hive on the Analytics Zone.
Utilized big data analytics tools such as Hive and MapReduce to analyze Hadoop clusters, and developed a robust data pipeline with Kafka, Spark, and Hive for end-to-end data ingestion, transformation, and analysis.
Implemented UNIX shell and YAML scripts to orchestrate use-case workflows, automating data file processing, job execution, and deployments, and improving efficiency and scalability within big data environments.
Migrated data from Oracle RDBMS to Hadoop using Sqoop, enabling efficient data processing and integration within the big data ecosystem.
Environment: Sqoop, MySQL, HDFS, Apache Spark, Scala, Hive, Hadoop, Cloudera, HBase, Kafka, MapReduce, ZooKeeper, Oozie, Data Pipelines, RDBMS, Python, C#, PySpark, JVM, shell scripting, Flume, YAML, Unix, Cassandra, Ambari, JIRA, Git.
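The sketch below illustrates the aggregation-to-Hive pattern described above. It is shown in PySpark for brevity (the resume cites Spark with Scala for this work), and the source path, table, and column names (transactions, account_id, txn_ts, amount) are hypothetical placeholders.

```python
# Minimal sketch: aggregate transaction data and persist results to the Hive warehouse.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("daily_aggregations")
         .enableHiveSupport()          # required to write managed Hive tables
         .getOrCreate())

# Source data previously ingested into HDFS (e.g. via Sqoop); path is a placeholder.
txns = spark.read.parquet("hdfs:///data/raw/transactions")

daily = (txns.groupBy("account_id", F.to_date("txn_ts").alias("txn_date"))
             .agg(F.sum("amount").alias("total_amount"),
                  F.count("*").alias("txn_count")))

# Store the aggregated results in the Hive warehouse for downstream analysis.
daily.write.mode("overwrite").saveAsTable("analytics.daily_account_totals")
```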
Client: Morgan Stanley  Feb 2014 - Dec 2017
Role: Data Warehouse Developer  Jersey City, NJ
Responsibilities:
Worked as a SQL Server Analyst/Developer/DBA specializing in SQL Server 2012, 2014, and 2016 within data warehousing environments.
Developed complex stored procedures, efficient triggers, and required functions, and created indexes and indexed views for performance.
Developed and deployed Airflow DAGs through CI/CD pipelines, using Git triggers and containerized test environments.
Designed and implemented enterprise-class data warehouses on Oracle 11g/12c, leveraging dimensional (star and snowflake) data models for analytical reporting and performance optimization.
Developed advanced PL/SQL procedures, packages, and functions to automate ETL workflows, ensuring high data quality and efficient batch processing.
Created and tuned complex SQL and PL/SQL scripts for data extraction, transformation, and loading from flat files and relational sources.
Implemented Airflow observability dashboards, SLA monitoring, retry rules, and automated failure alerts integrated with Slack/email/Splunk (see the DAG sketch following this section).
Optimized query performance using indexes, partitioning, and materialized views to reduce data processing time across large datasets.
Designed and fine-tuned transformer-based models (BERT, GPT, T5) for NLP tasks such as text classification, summarization, and entity recognition.
Monitored and fine-tuned SQL Server performance to optimize data warehouse operations and enhance query efficiency.
Designed ETL data flows using SSIS, creating mappings/workflows to extract data from SQL Server and to migrate and transform data from Access/Excel sheets using SSIS.
Performed dimensional data modeling for Data Mart design, identifying facts and dimensions and developing fact tables and dimension tables using Slowly Changing Dimensions (SCD).
Developed jobs, configured SQL Mail Agent, set up alerts, and scheduled DTS/SSIS packages within the data warehouse environment.
Developed end-to-end ETL solutions by integrating SSIS for data extraction, transformation, and loading with SSRS for detailed, interactive reports based on the processed data.
Designed and implemented automated data workflows using SSIS and delivered actionable business insights through SSRS reports, enabling decision-makers to access real-time data.
Implemented error-handling and logging mechanisms in SSIS packages, ensuring data integrity and reliability, and developed troubleshooting reports in SSRS to monitor the success or failure of ETL processes.
Delivered end-user training on SSRS reports and dashboards while ensuring seamless data flow and integration using SSIS for backend processes.
Optimized SSIS data flows to reduce processing time and improved the efficiency of SSRS reports by enhancing their data models and query performance.
Managed and updated Erwin models (logical/physical data modeling) for the Consolidated Data Store (CDS), Actuarial Data Mart (ADM), and Reference DB according to user requirements.
Designed and implemented dimensional modeling techniques, including star schema and snowflake schema, to optimize data storage, streamline querying, and improve reporting efficiency in data warehousing environments.
Conducted UAT testing and validation of Canoe-ingested fund data, verifying transformations against business rules and data quality metrics.
Utilized TFS for source control and for tracking environment-specific script deployments, ensuring version management and traceability in data warehouse development.
Exported data models from Erwin to PDF format and published them on SharePoint, enabling access for diverse users in data warehouse development.
Environment: SQL Server 2012/2014/2016 Enterprise Edition, SSRS, SSIS, SSAS, T-SQL, shell scripting, Windows Server 2012, PerformancePoint Server 2007, Oracle 12c, Visual Studio 2010/2013, SharePoint, Star Schema, Snowflake Schema, Dimensional Modeling, Normalization, Git, MDX Scripting.
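The sketch below illustrates the Airflow retry, SLA, and failure-alert pattern referenced above; it is a minimal example, and the DAG name, task, and alerting callback are hypothetical placeholders (a real deployment would post to Slack/email/Splunk rather than print).

```python
# Minimal Airflow DAG sketch: retry rules, task SLA, and an on-failure alert callback.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def notify_failure(context):
    # Placeholder alert hook; a real callback would notify Slack/email/Splunk.
    print(f"Task {context['task_instance'].task_id} failed")

def load_warehouse():
    print("running warehouse load")  # placeholder for the actual load logic

default_args = {
    "owner": "data-eng",
    "retries": 2,                          # retry rule
    "retry_delay": timedelta(minutes=10),
    "sla": timedelta(hours=1),             # SLA miss surfaces in Airflow's SLA report
    "on_failure_callback": notify_failure, # automated failure alert
}

with DAG(
    dag_id="dw_nightly_load",              # illustrative DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    PythonOperator(task_id="load_warehouse", python_callable=load_warehouse)
```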