| Vishnu Sunil - Mainframe COBOL Developer |
| [email protected] |
| Location: Remote, USA |
| Relocation: yes |
| Visa: GC |
| Resume file: Vishnu - Mainframe Developer (1)_1776785505976.docx |
|
Professional Experience:
Senior Data Engineer with 12+ years of experience designing and delivering scalable, secure, and high-performance data platforms across the banking, healthcare, and retail industries, including 6+ years as an AWS Data Engineer building scalable ETL/ELT pipelines with AWS Glue, Lambda, S3, and Redshift. Strong background in implementing CI/CD pipelines, automating infrastructure with Terraform, and deploying data workflows with Jenkins and AWS CodePipeline. Skilled in data quality validation, monitoring with CloudWatch, and developing reliable, high-performance data platforms aligned with business needs.
- Experienced in Microsoft Fabric, Azure Data Explorer (ADX), and KQL for real-time analytics and dashboarding.
- Proficient in building cloud-native data solutions on AWS, Azure, and GCP, implementing real-time, batch, and event-driven pipelines to support advanced analytics, machine learning, and regulatory compliance.
- Strong Infrastructure as Code (IaC) expertise with Terraform, enabling automated, compliant cloud provisioning across multi-cloud environments.
- Experienced in containerized deployments with Docker and Kubernetes (AKS, GKE) for microservices and data processing applications.
- Designed and developed Apache Airflow DAGs to orchestrate complex ETL/ELT pipelines with task dependencies and dynamic scheduling (a minimal sketch follows the experience bullets).
- Designed and implemented cloud migration strategies covering assessment, planning, re-hosting, re-platforming, and re-architecting.
- Expert in building scalable ETL/ELT pipelines with Apache Spark (PySpark/Scala), AWS Glue, Azure Data Factory, Databricks, Apache Beam, and Informatica/Talend.
- Built highly scalable ETL/ELT pipelines with Apache Beam on Cloud Dataflow, supporting both batch and streaming workloads with low latency and fault tolerance.
- Implemented dbt Core on Azure Databricks to build modular transformation pipelines supporting a medallion architecture (Bronze, Silver, Gold layers); skilled in dbt for modular transformations and CI/CD integration of analytics-ready datasets.
- Proficient in T-SQL, PL/SQL, and PostgreSQL for writing high-performance queries, stored procedures, and data models.
- Architected low-latency data pipelines with Kafka, Flink, Kinesis, Spark Structured Streaming, and Event Hubs for fraud detection, retail POS systems, and IoT telemetry.
- Integrated real-time analytics with AWS SageMaker, Vertex AI, and Cloud Functions for ML-driven alerting and scoring.
- Built and managed large-scale data lakes on AWS S3, Azure Blob Storage, and Google Cloud Storage using Parquet, Avro, and Iceberg formats.
- Designed star/snowflake schemas and optimized warehouse performance on Snowflake, BigQuery, Amazon Redshift, and Synapse Analytics.
- Implemented lakehouse-style architectures on Google Cloud Storage using Parquet and Iceberg formats for scalable, analytics-ready data lakes.
- Enabled data governance and lineage with Data Catalog, ensuring compliance with HIPAA, GDPR, PCI-DSS, and SOC 2 standards.
- Integrated BigQuery streaming inserts for real-time dashboards and operational reporting.
- Implemented medallion architecture (bronze/silver/gold) and data mesh patterns for domain-oriented data ownership.
- Provisioned and managed GCP infrastructure with Terraform (GCP provider) following Infrastructure-as-Code and security best practices.
- Built ML pipelines with SageMaker, Azure Databricks, and Vertex AI for customer segmentation, credit risk scoring, and demand forecasting.
- Orchestrated complex data workflows with Cloud Composer (Apache Airflow), including SLA monitoring, retries, and dependency management.
- Engineered real-time streaming pipelines with Pub/Sub and Dataflow, achieving near real-time ingestion for IoT, POS, fraud detection, and clickstream use cases.
- Implemented feature stores using Delta Lake and integrated MLflow for model tracking.
- Developed and optimized BigQuery data warehouses, leveraging partitioning, clustering, materialized views, and BI Engine to improve query performance and reduce costs (a brief sketch appears after the Technical Skills section).
- Created dashboards and reports with Power BI, Tableau, and Looker for actionable business insights.
- Designed and maintained Apache Airflow and Step Functions DAGs for ETL orchestration and workflow automation.
- Developed data validation frameworks using Great Expectations, pytest, and custom Python libraries.
- Managed metadata and lineage with Glue Data Catalog, Talend Metadata Manager, and internal cataloging tools.
- Deep experience with RDBMS (PostgreSQL, SQL Server, Oracle, MySQL) and NoSQL databases including MongoDB, Cassandra, HBase, DynamoDB, and Cosmos DB.
- Led end-to-end cloud migration initiatives, moving on-premise applications and workloads to cloud platforms to improve scalability and reliability.
- Used SSIS, SSAS, and SQL Server Agent for legacy systems integration and OLAP analytics.
- Automated testing, deployment, and CI/CD with GitHub Actions, Azure DevOps, GitLab, Bitbucket, and Jenkins.
- Created centralized monitoring/logging with CloudWatch, Elasticsearch, Azure Monitor, and custom dashboards.
- Built monitoring and alerting with Cloud Monitoring and Cloud Logging, improving observability and SLA adherence.
- Led cross-functional teams using Agile/Scrum methodologies with tools such as Jira and Confluence.
- Hands-on experience with the Hadoop ecosystem, including HDFS, Hive, HBase, Pig, Oozie, Sqoop, and YARN for large-scale batch processing.
- Built high-performance Spark jobs with Apache Spark (PySpark/Scala) on YARN, EMR, Databricks, and Azure Synapse.
- Worked with Apache Flink, Kafka Streams, and Apache Beam for real-time big data pipelines.
- Utilized Delta Lake, Iceberg, and Parquet formats for scalable, ACID-compliant data lake architectures.
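To ground the Airflow orchestration bullets above, here is a minimal sketch of an extract-transform-load DAG with explicit task dependencies and retries. The DAG id, task names, and function bodies are illustrative placeholders, not code from any specific engagement.

```python
# Minimal Airflow DAG sketch: three tasks wired extract -> transform -> load.
# All names below are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    ...  # placeholder: pull raw records from a source system (e.g., S3)


def transform(**context):
    ...  # placeholder: clean and reshape the extracted records


def load(**context):
    ...  # placeholder: write curated records to the warehouse (e.g., Redshift)


default_args = {
    "owner": "data-engineering",
    "retries": 2,                          # retry transient task failures
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="example_etl_pipeline",         # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                     # Airflow 2.4+ spelling; older releases use schedule_interval
    default_args=default_args,
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Dependency chain: extract must finish before transform, then load.
    t_extract >> t_transform >> t_load
```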
Education:
Bachelor of Technology, C.S.E. - JNTUK, 2008 - 2012
Master of Science, Data Science & AI - Saint Peter's University, 2012 - 2014

Technical Skills:
| Category | Skills |
| Programming Languages | Python, Scala, SQL (advanced), Java, R, Shell Scripting |
| Python Libraries | Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, PySpark, Beautiful Soup, PyTorch, TensorFlow |
| Big Data Technologies | Apache Spark, Hadoop, HDFS, MapReduce, Apache Beam, Apache Flink, HBase, Hive, Pig, Sqoop |
| Streaming & Messaging | Apache Kafka, Spark Structured Streaming, Azure Event Hubs, AWS Kinesis, Pub/Sub, Splunk (familiar) |
| ETL/ELT Tools | Airflow, Informatica, SSIS, dbt, AWS Glue, Databricks Delta Live Tables, DataStage, Microsoft Fabric, Talend; advanced ETL development |
| Cloud Platforms (AWS) | S3, EMR, EC2, Lambda, Redshift, Glue, DynamoDB, CloudWatch, SageMaker, Step Functions, Aurora, Bedrock |
| Cloud Platforms (Azure) | Azure Databricks, Data Factory, Event Grid, Functions, Blob Storage, Synapse Analytics, HDInsight, Microsoft Fabric, Azure Data Explorer (ADX) |
| Cloud Platforms (GCP) | BigQuery, Dataflow, Dataproc, GCS, Cloud Functions, Pub/Sub, Dataprep, Vertex AI, Cloud Composer, Cloud Run, Data Catalog, IAM, Cloud Monitoring, Cloud Logging, Cloud KMS, VPC, Terraform (GCP provider); building scalable cloud-based data pipelines |
| Data Warehousing/Lakes | Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse, data lake design patterns, Apache Iceberg |
| Databases | MySQL, PostgreSQL, SQL Server, Oracle, MongoDB, Cassandra, DynamoDB, HBase, T-SQL, Azure Data Studio, dbt |
| Data Modeling | Star schema, snowflake schema, Data Vault, dimensional modeling, entity-relationship modeling; handling large datasets across hybrid (on-prem + cloud) environments |
| Visualization Tools | Tableau, Power BI, Looker, Google Data Studio |
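As a small illustration of the BigQuery partitioning and clustering work referenced in the experience bullets, here is a minimal sketch using the google-cloud-bigquery client; the project, dataset, table, and column names are hypothetical.

```python
# Minimal BigQuery partitioning/clustering sketch; all names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()  # relies on application-default credentials

# Date partitioning plus clustering prunes the bytes scanned when queries
# filter on event date and customer_id, improving both latency and cost.
ddl = """
CREATE TABLE IF NOT EXISTS `my_project.analytics.events`
PARTITION BY DATE(event_ts)
CLUSTER BY customer_id
AS SELECT * FROM `my_project.staging.events_raw`
"""
client.query(ddl).result()  # result() blocks until the DDL job completes
```

Queries that filter on DATE(event_ts) or customer_id then touch only the relevant partitions and clustered blocks, which is where the query-cost reduction described above typically comes from.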