| Dheeraj Kumar - Sr. DevOps Engineer (Infra & Storage) |
| [email protected] |
| Location: Little Elm, Texas, USA |
| Relocation: Yes |
| Visa: H-1B |
|
PROFESSIONAL EXPERIENCE:
With more than 10 years of diverse IT experience, I specialize in roles such as Kubernetes Platform Engineer, AWS Cloud Automation Engineer, Cloud Migration Engineer, and Platform Engineer. I have a proven track record in leading Build & Release projects, overseeing on-premises data centers, and implementing serverless automation with Python Boto3. My expertise also includes designing and scaling NoSQL data structures and implementing advanced Kubernetes strategies. I have a deep understanding of platform virtualization, continuous integration and deployment, and configuration management, all while effectively managing cloud security protocols. My combination of technical proficiency and leadership abilities allows me to deliver cutting-edge solutions that optimize operational efficiency and contribute to business growth.

PROFESSIONAL SUMMARY:
Knowledgeable in designing, deploying, and operating highly available, scalable, and fault-tolerant systems using Amazon Web Services (AWS).
Adept at working on bare-metal, VMware ESXi, OpenShift 3 & 4, AWS, and Azure platforms.
Well-versed in commissioning, installing, and provisioning UNIX systems in greenfield data centers.
Experienced in performance monitoring, security, troubleshooting, backup, disaster recovery, maintenance, and support of UNIX systems.
Specialized in implementing organizational DevOps strategy across Linux and server operating environments, along with AWS cloud strategies.
Stay updated on Databricks features and migrate legacy systems to the platform.
Proficient in writing CloudFormation templates (CFT) and Terraform code to build AWS services following the Infrastructure as Code paradigm.
Expertise in integrating CephFS and Ceph object storage in OpenStack and OpenShift environments.
Worked on OpenStack infrastructure upgrades, expansion, scaling, troubleshooting, and debugging for clients in challenging, complex environments.
Expertise in Azure cloud resources, including Azure Active Directory, Azure API Management, AKS, VMs, Azure Storage, Virtual Networks, Azure Functions, SQL services, and Azure App Service.
Proficient in generating Helm values files and deploying production load on Tanzu Kubernetes and EKS.
Expert in troubleshooting Linux, DNS, DHCP, NAS, SAN, and volume managers.
Hands-on experience in deploying VNFs and CNFs using ETSI-MANO architecture and guidelines.
Experienced with event-driven and scheduled AWS Lambda functions to trigger various services.
Acquired practical exposure to Continuous Integration/Continuous Delivery tools like Jenkins and CodeBuild to merge development with testing through pipelines.
In-depth working knowledge of 5G architecture (CU/DU/EMS) and experience in deploying 5G telco workloads on AWS and VMware platforms.
Implement and automate CI/CD pipelines for Databricks notebooks.
Carried out automation processes using configuration management tools such as Ansible.
Conducted proofs of concept (POC) using the Docker and Podman container platforms to showcase the feasibility and benefits of containerization, including encapsulating code within a file system and enhancing abstraction and automation in the development workflow.
Migrated systems from physical to virtual environments and evolved virtual services into microservices architectures.
Expertise in re-architecting and re-factoring legacy on-prem applications for the public cloud.
Experienced in database and schema migration to MySQL servers on public cloud (AWS, Azure).
Experienced in writing complex NoSQL queries to perform CRUD operations on NoSQL databases (DynamoDB).
Experienced in version control and source code management tools like Git and CodeCommit.
Designed serverless architecture to enhance application scalability and reduce infrastructure management overhead.

Certifications

TECHNICAL SKILLS:
SCM Tool: Git, Bitbucket, CodeCommit
Build Tool: CodeBuild, Maven
Languages: Python Boto3, Shell
Operating System: Photon OS, UNIX, Linux, Microsoft Windows
Containers: Docker, Podman, Kubernetes EKS, Tanzu Kubernetes, OpenShift 3, 4
Load Balancers: F5 SPK, NLB, ALB
Registries: ECR, Harbor
Migration Services: Database Migration Service (DMS), DataSync, NetApp ONTAP SnapMirror, ONTAP FlexCache, AWS FSx for Lustre
Database System: MySQL, Aurora MySQL, NoSQL DynamoDB, Elasticsearch, EFS
Storage: NetApp Trident, Ceph, EBS, Longhorn, FSx, EFS
Monitoring Tools: CloudWatch, Grafana, Prometheus
Ticketing System: Jira, Salesforce Remedy, ServiceNow
Build Tools / Release Engineering / DevOps / Open Source: Terraform Sentinel, CodeDeploy, CodeBuild, Git CI/CD, Jenkins, Ansible, CloudFormation.

WORK EXPERIENCE:

PROJECT 6:
Organization: Accenture Limited, Plano, Texas.
Client: JP Morgan Chase
Role: AWS Lead Engineer (Infra & Storage)
July 2024 to Present
As an AWS Lead Product Engineer at JPMC, I build and automate cloud-based storage solutions using AWS services like FSx for NetApp ONTAP, EFS, S3, EBS, IAM, DynamoDB, AWS FSx for Lustre, and EKS. I develop serverless workflows with Lambda and Step Functions and automate infrastructure with Terraform. I also focus on creating secure and resilient storage architectures, supporting real-time data streaming with Kafka, and ensuring compliance with farm breaks and security processes.
AWS Engineer Roles & Responsibilities:
Design and implement robust storage solutions using AWS FSx for NetApp ONTAP, EFS, EBS, and S3 to ensure high availability, performance, and fault tolerance.
Developed Infrastructure as Code (IaC) using Terraform to automate the provisioning and management of AWS storage and compute resources and serverless components (Lambda, Step Functions, DynamoDB).
Automate data ingestion, transformation, and real-time processing using AWS Lambda with Python Boto3, enhancing the efficiency of data-driven applications.
Deploy, manage, and scale containerized applications on AWS EKS, automating CI/CD pipelines with Jenkins and Bitbucket for streamlined code deployment.
Implemented automated backup strategies using AWS FSx for NetApp ONTAP and S3, enabling data protection and disaster recovery for critical workloads.
Leveraged AWS Step Functions to build and manage complex serverless workflows, integrating AWS Lambda functions for seamless automation and data processing.
Led data migration from source to destination using the AWS DMS service.
Leveraged AWS IAM to enforce robust identity and access management policies, securing AWS resources and storage environments.
Design, build, and maintain ETL/ELT pipelines using Databricks (PySpark, Scala, SQL).
Implement real-time data streaming using AWS MSK (Kafka) to facilitate high-throughput data processing and integration with serverless workflows.
Configured centralized logging and monitoring for serverless applications using CloudWatch, integrating with AWS Lambda for automated event responses.
Collaborated closely with AWS and NetApp product managers to introduce and implement new features.
Implemented secure data migration from on-prem file systems to AWS using AWS DataSync and FSx for NetApp ONTAP.
Integrated on-premises infrastructure with AWS storage solutions to enable hybrid cloud replication and seamless data flow.
Implemented serverless data processing pipelines by orchestrating Lambda functions and Step Functions through Boto3.
Developed and maintained ETL/ELT workflows using Databricks notebooks, jobs, and workflows.
Designed and maintained high-performance file systems using AWS FSx for Lustre to handle data-heavy workflows.
Integrated FSx with S3 to speed up data processing.
Automated the setup with Terraform to save time and keep environments consistent.
Responsible for setting up AWS EventBridge to connect different services and automate workflows efficiently.
Build and maintain Databricks jobs, workflows, and job clusters; manage job scheduling and retries.
Technical Scope: AWS DMS, DataSync, Databricks, CloudWatch, Load Balancer, NetApp (Astra Trident), SnapMirror, FlexCache, Lustre, FSx, EFS, ECR, S3, Python Boto3, Shell, Ansible, EventBridge, Apache Kafka.

PROJECT 5:
Organization: Samsung Electronics America, Plano, Texas.
Client: Dish Wireless 5G, Verizon
Role: Lead Platform Engineer
Feb 2023 to July 2024
Samsung is a globally known electronics manufacturer; alongside electronics, Samsung develops and sells its 5G/4G vRAN telco solutions worldwide.
Project Objective: To build and maintain on-premises VMware and OpenShift labs to deploy Samsung 5G applications for testing before rolling out in customers' lab/production environments (DISH/Verizon).
Kubernetes Platform Engineer Roles & Responsibilities:
Design and implement cloud-based solutions, ensuring high availability, security, and scalability for enterprise applications and products.
Manage Prod, QA, and Dev environments, providing continuous optimization and cost control.
Responsible for core platform engineering of critical applications and databases.
Responsible for the design, installation, and maintenance of Ceph clusters and integrating them with the OpenShift environment.
Responsible for configuring the ESXi hypervisor on bare-metal servers and adding the servers to vCenter.
Responsible for creating VMs on vCenter and adding the cluster to Telco Cloud Automation orc.
Integrated RGW with OpenShift to provide object storage for applications.
Responsible for delivering the software package to customers and deploying the application on Tanzu Kubernetes Platform (VMware), ensuring the application is up and running in the customer environment.
Deployed RAN 5G-DU applications on the VMware platform, resolved deployment issues, and performed root cause analysis to address underlying problems on both the kernel and application sides.
Configured Longhorn storage on the EKS cluster to set up highly available PVCs.
Created CephFS and integrated it with OpenShift for applications that require shared filesystems.
Designed and developed security policies and IAM rules to enforce HIPAA-compliant protection of sensitive data.
Manage and conduct SOX and PCI compliance, adhering to financial and data security standards.
Experience conducting risk assessments and remediation planning, and working with teams to achieve and maintain compliance.
Implemented Dynatrace for end-to-end monitoring of applications and infrastructure.
Created Management Zones, port monitoring, alerting profiles, metric events, and dashboards for application monitoring.
Leveraged Ansible to automate the management of 5G RAN services running on DU servers, including starting, stopping, and restarting critical network functions, to reduce operational errors and ensure high availability and reliability of the network.
Developed Ansible playbooks to install OneAgent across all servers.
Participate in incident response activities, troubleshooting and resolving issues to minimize service disruptions.
Aggregate and analyze logs to conduct root cause analysis and work with the respective stakeholders to prevent similar issues.
Design and execute migration of three-tier web applications and databases from on-premises infrastructure to AWS.
Led the migration from Virtual Network Functions (VNFs) to Cloud-Native Network Functions (CNFs), enhancing network agility and scalability by leveraging containerization and microservices architecture.
Executed migration of SVN repositories from TeamForge to GitLab.
Played a significant role in successful physical-to-virtual and virtual-to-virtual migrations.
Deployed the telco application on AWS EKS.
Installed OpenShift 3 & 4 on bare metal and configured Service Proxy for Kubernetes (F5 SPK) and F5 load balancers.
Configured Astra Trident (NetApp), a storage orchestrator for containers and Kubernetes distributions, on OpenShift (on-premises).
Technical Scope: Dell R740, ESXi, OpenShift, AWS, Telco Cloud, 5G (DU, CU), Tanzu Kubernetes, AWS EC2, Cost Optimization, CloudWatch, Grafana, F5 Load Balancer, NetApp (Astra Trident), vCenter, ECR, Harbor, EKS, OpenStack, Longhorn, S3, Python, Shell, Ansible.

PROJECT 4:
Organization: Tech M, Plano, Texas
Client: Dish Wireless 5G
Role: Platform Engineer
Jul 2022 to Feb 2023
DISH Wireless is an American television provider building the nation's first 5G mobile network on the AWS Cloud.
Project Objective: To develop and deploy GitOps pipelines for the ISV providers (Samsung, Infoblox, Mavenir, Nokia, Radom, etc.) and to automate the pipelines as per business requirements (one-touch provisioning).
Platform Engineer (AWS) Roles & Responsibilities:
Developed the CDK script to provision AWS infrastructure and integrated the CDK into the pipeline.
Developed an end-to-end CodePipeline and deployed the applications on EKS for ISV providers.
Integrated Apache and Tomcat with CI/CD pipelines to automate deployments and streamline application updates.
Wrote Ansible roles to enable, disable, and change the security configuration and monitoring rules on remote servers.
Created pipeline failure alerts using SNS CDK for ETL (data pipelines) for ISV providers.
Wrote platform automation scripts using the Python Boto3 module to pull keys from Secrets Manager and Parameter Store for authenticating cross-account services.
Responsible for build and deployment automation using AWS, ECS, and EKS.
Automated continuous integration and deployments using CodeBuild, CodePipeline, Ansible, and AWS CloudFormation templates.
Developed Ansible playbooks to automate 5G RAN deployments, including starting and stopping services, to handle complex setups and keep configurations consistent across the network.
Installed OpenShift and Tanzu platforms on Dell bare-metal servers to test solution compatibility across platforms.
Deployed Samsung's 5G-RAN DU and CU solution in DISH production and labs on VMware, AWS, and OpenShift platforms.
Configured and analyzed logs in Tomcat and Apache, and integrated tools like Nagios and Prometheus to monitor server health.
Configured SSL/TLS for secure connections, implemented role-based access controls, and safeguarded web applications against vulnerabilities.
Developed an end-to-end automation flow to audit existing CU infrastructure using AWS Step Functions.
AWS Cost Optimization:
Developed automation scripts to identify orphaned services that incur costs and terminate them while handling their dependencies.
Developed and maintained cloud infrastructure capacity and demand forecasts, including day-to-day cost monitoring and tracking of cloud capacity and resource management for all services.
Recommended savings opportunities (Reserved Instances, Savings Plans, Spot, Dedicated, Hybrid) and tracked utilization of these commitments.
Technical Scope: Tanzu Kubernetes, OpenShift, VMware, CodePipeline, AWS Lambda, DynamoDB, CodeCommit, S3, ECR, Samsung DU, CU 5G-RAN, EKS, Ansible, CloudFormation, Python Boto3, Shell, AWS SSM, Rancher, CloudWatch, AWS CDK.

PROJECT 3:
Organization: Infosys Limited, Plano, Texas
Client: Applied Materials
Role: IoT Cloud Engineer AWS / Application SME
Dec-2021 to Jul-2022
AMAT is the leader in materials engineering solutions used to produce virtually every new chip and advanced display in the world.
Project Objective: Developed, deployed, and monitored an IoT application on AWS Cloud to oversee the lab components used in manufacturing semiconductor chipsets and wafers at Applied Materials fabs, enabling predictive maintenance to prevent component overheating after extended use.
IoT/Cloud Engineer Roles and Responsibilities:
Connected to remote embedded hardware devices to install Raspberry Pi OS and wrote Python scripts on the microcontroller to publish data to AWS using notification topics.
Established connections between AWS IoT Core and Raspberry Pi devices using the MQTT protocol.
Created and registered Things in AWS IoT to gather telemetry data from devices.
Wrote IoT Rules and Actions to send telemetry data to other AWS services like Kinesis Firehose, Elasticsearch, Lambda, DynamoDB, and SNS to store, monitor, and predict incidents.
Used the Python Boto3 module to write the business logic in Lambda to query and update the data.
Used Kinesis Firehose to structure the raw inbound data and store it in an S3 bucket.
Created REST APIs in API Gateway to establish a secure connection between the React UI and AWS Lambda functions.
Configured CloudWatch Events and cron jobs to trigger Lambda functions at regular intervals.
Deployed the application on Lambda using CodeDeploy, CodeBuild, and CodePipeline.
Designed and developed pipelines in Jenkins to publish container images to cloud container registries and deploy applications on Kubernetes clusters managed by EKS.
Responsible for Continuous Integration (CI) and Continuous Delivery (CD) process implementation using Jenkins along with Python and Shell scripts to automate routine jobs.
Created custom Helm charts for deploying a microservices monitoring stack (i.e., Prometheus, Grafana) on Kubernetes clusters (EKS).
Configured CloudFront with Lambda@Edge using SigV4 certificates to serve the static website from the nearest edge location, and hosted the Smart Sense IoT application as a static website in an S3 bucket.
Provisioned the DynamoDB tables and cloud infrastructure using CloudFormation.
Configured the Ping Identity provider with SAML authentication for Cognito users to log in to the application.
Established a VPN connection between the AMAT corporate network and the AWS VPC.
Wrote fine-grained IAM policies for AWS services to prevent unauthorized users from accessing the production environment.
Configured SNS topics with AWS IoT Core and Raspberry Pi devices to alert the business lab team about the status of the hardware components.
Encrypted the static website and telemetry data in the S3 bucket using SSE-KMS.
Configured CloudFront with Lambda@Edge using SigV4 certificates to deliver the application with low latency.
Created Cognito user pools and configured rules for every pool accessing the application.
Technical Scope: Python Boto3, Secrets Manager, SSE-KMS, CodeCommit, S3, Kubernetes (EKS), CloudFront, Lambda@Edge, CodeDeploy, SNS, IoT Core, IoT Rules & Actions, Kinesis Firehose, Elasticsearch, VPC, VPN, API Gateway, CloudFormation, DynamoDB, NAT Gateway, AWS Cognito, React JS.

PROJECT 2:
Organization: Infosys Limited
Client: Global Foundries
Role: Cloud Migration Engineer AWS / Low-code Developer (Mendix)
Sep-2019 to Dec-2021
Global Foundries is one of the leading semiconductor manufacturing foundries, based in the USA, Germany, and Singapore. As part of the cloud migration, COTS, Java, SVN, and .NET web applications were migrated to AWS. The scope of work included re-architecting, re-factoring, and lift-and-shift of the fleet of applications into the AWS environment.
Project Objective: To migrate, re-factor, and re-architect the on-premises infrastructure onto AWS Cloud and to develop applications on a low-code platform.
Cloud Migration / Low-code Roles and Responsibilities:
Analyzed and gathered business requirements from the client, such as existing on-premises infrastructure details, middleware, third-party integration components, and their framework compatibility with the cloud.
Provisioned highly available cloud infrastructure using CloudFormation.
Upgraded software to the latest versions and installed Message Queue (IBM MQ Enterprise), MySQL Enterprise Edition, .NET Framework, Oracle Enterprise Edition, IIS Server, and dependent packages on App, Web, and DB servers.
Configured highly available active-active and active-passive clusters for Web, App, and DB servers using server machine keys.
Configured Amazon FSx for synchronous data replication between active-active DB clusters.
Provisioned a CI/CD pipeline for deploying applications on servers using CodeBuild, CodeCommit, and CodePipeline.
Migrated on-premises data in DB servers to AWS RDS servers via the Schema Conversion Tool and DMS.
Migrated on-premises COTS, Java, and .NET apps to AWS cloud.
Created linked DB servers to connect:
a) EC2 Windows 2019 with MS SQL 2016 to RDS MS SQL 2016 server via SQL Server Management Studio (SSMS).
b) RDS MS SQL 2016 server to EC2 Windows 2019 with MS SQL 2016 (the reverse of step a).
c) EC2 MS SQL server to another EC2 MS SQL server via SSMS and DBA.
d) EC2 Oracle 11g server to RDS MS SQL 2016 server via SQL Developer and ORCL ODBS providers.
e) RDS MS SQL 2016 server to EC2 Oracle 11g via DB links and stored procedures.
Created a domain controller (LDAP) server in AWS cloud and allowed the app to authenticate users via SAML (Tivoli Identity Federation).
Performed high-availability load tests on migrated apps by stopping one server and then the other.
Configured the AWS Batch service to trigger jobs at regular intervals.
Migrated tasks from on-premises Windows servers to cloud Windows servers and configured Task Scheduler to trigger the jobs at regular intervals.
Configured cross-cloud SSO so that a user logged in to AWS can also log in to Azure without re-authenticating.
MENDIX CLOUD:
Developed applications on the Mendix platform by integrating external PostgreSQL relational databases.
Integrated AWS with Mendix to access objects stored in an S3 bucket, i.e., applications on the Mendix platform use the S3 bucket to store their object data.
Supported cloud instances running Linux and Windows on AWS, with experience in Elastic IP, Security Groups, and Virtual Private Cloud.
Extensive experience configuring Amazon EC2, Amazon S3, Amazon Elastic Load Balancing, IAM, and Security Groups in public and private subnets in VPC, and other AWS services.
Managed network security using load balancers, Auto Scaling, security groups, and NACLs.
Create, assess, update, and maintain documentation pertaining to PCF platforms.
Worked extensively on Jenkins CI/CD pipeline jobs for end-to-end automation to build, test, and deliver artifacts, and troubleshot build issues during the Jenkins build process.
Technical Scope: Kubernetes, Docker, Linux, Quality Center, Java/J2EE, DB2, WebSphere, Windows, LoadRunner, Oracle, SQL, PL/SQL, MS Excel, MS Office, EC2, AWS, ELB, Terraform, CloudFormation, VPC, VPN, S3, EFS, FSx, DMS, SMS, SKT, TGW, SSO, Red Hat Linux, Git, Jenkins, CodeCommit.

PROJECT 1:
Organization: Archeplay.com, Bengaluru, India.
Role: Cloud Automation Engineer (AWS Serverless) / DevOps AWS
Oct-2015 to Sep-2019
Archeplay is a cloud-based application development organization.
Project Objective: Archeplay.com is a unified platform that removes the overhead for end users of designing and deploying their applications on the public cloud (AWS); with Archeplay.com, end users can take their application live with the click of a button.
Roles and Responsibilities:
Responsible for creating cross-account IAM roles to authorize resource provisioning on AWS accounts.
Created Amazon Machine Images (AMIs) used by Auto Scaling groups to launch or terminate instances.
Created multiple VPCs with private and public subnets as required by the architecture and distributed them across availability zones.
Created NAT gateways and instances to allow communication from the private VMs to the internet through bastion hosts.
Created and deployed APIs in API Gateway to trigger AWS Lambda functions, using security groups, internet gateways, and route tables to establish a secure network environment in the AWS public cloud.
Created and configured elastic load balancers and Auto Scaling groups to distribute traffic and provide a cost-efficient, fault-tolerant, and highly available environment.
Created S3 buckets to store files, including static content served for web applications.
Created CloudWatch dashboards to monitor AWS service metrics and alarms in one place.
Configured S3 buckets with lifecycle policies to archive infrequently accessed data to appropriate storage classes.
Wrote CloudFormation templates in JSON to create custom VPCs, subnets, NAT, and other infrastructure services to ensure successful deployment of web applications.
Implemented DNS through Route 53 for highly available and scalable applications.
Created EBS volumes for storing app files for use with EC2 when the existing disk ran out of space.
Created snapshots to back up volumes, and images to store EC2 launch configurations.
Wrote Infrastructure as Code templates using Terraform to build staging and production environments.
Wrote Step Functions automation code to orchestrate Lambda functions.
Implemented CI/CD using CodeBuild and CodeCommit from scratch.
Created hosted zones in Route 53 for domains, added SSL/TLS certificates to the domains to encrypt requests, and verified domains using SES for sending e-mails.
Technical Scope: Docker, ECS, Step Functions, Secrets Manager, ACM, IAM, Batch, CloudWatch, CloudTrail, SQS, EC2, EKS, ELB, CloudFormation, Terraform, Lambda, DynamoDB, API Gateway, Route 53, VPC, S3, TGW, SSO, Amazon Linux, RHEL, Git, CodeBuild, CodePipeline, Jenkins/Maven, Shell scripting, Python Boto3, JSON forms. |