Surya Kiran - Devops SRE
[email protected]
Location: Columbus, Ohio, USA
Relocation: YES
Visa: H1B
SURYAKIRAN REDDY PURAMANI
SRE & DevOps Engineer
+1 732 338 8535
[email protected]
https://www.linkedin.com/in/suryakiranreddy-puramani

PROFESSIONAL SUMMARY:
Over 10 years of IT experience spanning DevOps, Observability & Monitoring, Cloud Engineering, and Linux System Administration & Automation, with proven success in automating build and deployment processes for operational excellence.
Responsible for ensuring the reliability, scalability, and performance of large-scale, cloud-based infrastructure, and the applications/services it supports, through Service Level Objectives (SLOs), metrics/Service Level Indicators (SLIs) and continuous operational process improvements.
Responsible for developing and maintaining internal tools to automate processes related to system reliability. This includes building infrastructure tools and automating tasks to reduce manual intervention.
Responsible for managing incidents, preparing disaster recovery plans, and responding to on-call incidents to maintain production stability.
Proficient in setting up and configuring Splunk Enterprise & Cloud environments for monitoring and alerting.
Responsible for writing efficient SPL (Search Processing Language) queries to filter, search, and analyse large volumes of data in Splunk.
Proficient in a wide range of AWS services including EC2, ELB, EBS, Security Groups, VPC, ECS, S3, IAM, SQS, Route 53, RDS, DynamoDB, Lambda, CloudWatch, Storage Gateway, CloudFormation, and Auto Scaling.
Expertise in setting up and maintaining CI/CD pipelines using Jenkins and GitLab CI to streamline application deployment processes.
Skilled in using automation tools such as Ansible, Chef, and Puppet for configuration management and infrastructure provisioning.
Experienced in containerizing applications using Docker and orchestrating them with Kubernetes, ECS, and EKS.
Responsible for incident handling and for assisting Tier 2 support with after-action reports such as Root Cause Analysis reports, including researching potential incident/problem mitigation and documenting the troubleshooting steps.
Expertise in using build tools like Maven to build deployable artifacts such as JAR and WAR files from source code.
Proficient in scripting languages such as Python and Bash for automation and task management.
Adept at using Terraform for Infrastructure as Code (IaC), ensuring consistent and repeatable deployments.
Skilled in configuring cross-account deployments using AWS CodePipeline, CodeBuild, and CodeDeploy by creating cross-account IAM policies and roles.
Proficient with Helm charts for managing and releasing Helm packages.
Strong knowledge of version control systems such as Git, GitHub, and Bitbucket for collaborative development.
Implemented robust monitoring and logging solutions using AWS CloudWatch, ELK Stack, and Prometheus with Grafana to ensure system reliability and performance.
Experience in provisioning, maintenance, hardening, load balancing and scaling of Web Servers like Nginx and Apache.
Experience in maintaining serverless applications using AWS Lambda and DynamoDB.
Experience in Linux administration tasks such as managing disk and memory resources, managing processes, user and group administration, setting up and managing SSH keys, and managing file systems and file permissions.
Experience in setting up and expanding RAID volumes, creating and managing LVMs, and troubleshooting networking issues through packet capture and analysis.
Experience in installing and configuring patches and packages using RPM and YUM.
Good understanding of Linux networking concepts such as firewalls, Ethernet, IP, TCP, UDP, and the OSI model. Experience working with netstat, dig, traceroute, port forwarding, nmap, and rsync.
Hands-on experience in configuring, maintaining, and hardening enterprise Linux/Unix servers.
CORE SKILL SET:
Cloud Platforms: AWS & GCP
Configuration Management: Ansible, Chef, Puppet
CI/CD Tools: Jenkins, Argo CD, GitLab CI.
Containerization: Docker, Kubernetes, ECS, EKS
Scripting Languages: Bash and Python
Version Control: Git, GitHub, Bitbucket
Monitoring & Logging: Splunk, Datadog, AWS CloudWatch, Prometheus with Grafana, and ELK.
Infrastructure as Code: Terraform
Operating Systems: Linux (Ubuntu, CentOS, Red Hat), Windows
Programming: C, Python, C-Linux, and YAML.
Project Management Tools: ServiceNow and JIRA.
Databases: MySQL, Amazon Aurora, and DynamoDB.
IDE Tools: VS Code, AWS Cloud9 & Eclipse.

CERTIFICATIONS:
AWS CERTIFIED DEVOPS ENGINEER Date of Issue: OCT 02, 2021
VALIDATION NUMBER: ZP5XM44JEJQ41HSP Date of Expiry: OCT 02, 2024
GOOGLE CLOUD CERTIFIED CLOUD ENGINEER Date of Issue: JAN 27, 2024
VALIDATION NUMBER: 0F9EAE9C2B564E7FA2C88B8EDC8076C7 Date of Expiry: JAN 27, 2027

ACADEMIC RECORD:
BACHELOR OF TECHNOLOGY (ELECTRONICS & COMMUNICATION ENGINEERING) GRADUATED, APRIL 2014
JNTU-HYDERABAD, T.S-INDIA MARKS (%) 76%
PG DIPLOMA IN CLOUD COMPUTING GRADUATED, JUNE 2024
GREAT LEARNING CGPA A++

PROFESSIONAL EXPERIENCE:

BRISTOL MYERS SQUIBB (BMS) NJ
Sr. SRE Engineer
Sept 2024 - Present
June 2022 - August 2024
Roles & Responsibilities:
Designed and implemented scalable, secure, and highly available AWS infrastructure using Terraform.
Developed and maintained CI/CD pipelines using Jenkins to automate application deployment and infrastructure provisioning, and used Ansible for configuration management.
Responsible for writing custom Python scripts for monitoring and alerting using the psutil library.
Responsible for CI/CD pipeline automation using Python scripts.
Extensively used Ansible playbooks for configuration management of resources, including installing software, configuring services, and managing system settings.
Responsible for ensuring the reliability, scalability, and performance of large-scale, cloud-based infrastructure, and the applications/services it supports, through Service Level Objectives (SLOs), metrics/Service Level Indicators (SLIs) and continuous operational process improvements.
Responsible for developing and maintaining internal tools to automate processes related to system reliability. This includes building infrastructure tools and automating tasks to reduce manual intervention.
Responsible for managing incidents, preparing disaster recovery plans, and responding to on-call incidents to maintain production stability.
Responsible for code analysis, code build and release activities as a part of Continuous Integration and Deployment Projects by using GitHub, Jenkins, SonarQube, CI/CD pipelines.
Managed containerized applications using Docker and Kubernetes, ensuring efficient resource utilization and high availability.
Implemented monitoring and logging solutions using CloudWatch and Splunk to ensure system reliability and performance.
Collaborated with development teams to optimize application performance and troubleshoot issues in production environments.
Conducted regular security assessments and implemented best practices to ensure compliance with industry standards.
Writing hardening scripts in Ansible to secure servers and automate repetitive tasks.
Wrote shell scripts for provisioning resources in multiple regions and deploying applications to multiple Kubernetes clusters.
Provisioning and maintaining high-availability Kubernetes clusters in AWS.
Building servers using AWS: importing volumes, launching EC2 and RDS instances, and creating security groups, auto-scaling groups, and load balancers (ELBs) in the defined virtual private cloud (VPC).
Creating EC2 Image Builder recipes to build the latest AMIs and distribute them across regions.
Creating RBAC roles and role-bindings for different Users.
Creating automated Jenkins jobs to back up databases, and integrating Opsgenie and Slack channels with Jenkins for failure management.
Creating automated scripts to sync data between multi-region buckets.
Troubleshooting and fixing IAM permissions so that developers can work from their local machines with remote clusters.
Integrating Datadog with Kubernetes, creating monitors for high memory usage, low disk space, and container restarts, and sending the alerts to Slack channels.
Provisioning an ELK cluster and monitoring logs for multiple environments.
Collecting business requirements from developers and product owners, and architecting solutions with the support of colleagues.
Responsible for code analysis, code build and release activities as a part of Continuous Integration and Deployment Projects by using GitHub, Jenkins and CircleCI.
Writing automation scripts in Python to generate Terraform files and create Jenkins jobs.
Performing disaster recovery (DR) exercises, maintaining updated Technical Review Guides (TRG), and preparing documentation for SOC audits.
Writing scripts to automate processes and improve efficiency and platform availability using Terraform.

AMERICAN AIRLINES TX Apr 2017 - June 2022
Cloud DevOps Specialist
Roles & Responsibilities:
Initial setup and ongoing management of custom AWS VPC and Security Groups specific to product region, environment and LAMP stack.
Migrated on-premises applications to AWS, ensuring minimal downtime and seamless integration with existing systems.
Automated infrastructure provisioning and configuration management using Ansible and Chef.
Developed and maintained CI/CD pipelines using Jenkins and Travis CI to streamline application deployment processes.
Implemented monitoring and alerting solutions using CloudWatch and Splunk to proactively identify and resolve issues.
Collaborated with cross-functional teams to design and implement disaster recovery and backup solutions.
Provisioning servers, maintenance, hardening and performance tuning of Nginx Web server using AWS EC2 instances for development, testing and Staging/Production environments.
Creating virtual hosts and deploying SSL certificates on AWS EC2 for domains, and handling DNS propagation at the domain registrar.
Experience in building servers using AWS: importing volumes, launching EC2 and RDS instances, and creating security groups, auto-scaling groups, and load balancers (ELBs) in the defined virtual private cloud (VPC).
Responsible for code analysis, code build and release activities as a part of Continuous Integration and Deployment Projects by using GitHub, Jenkins, SonarQube, CI/CD pipelines.
Responsible for database security, user creation and access management for multiple applications to enable better troubleshooting by adding row level policies for tables.
Responsible for cost optimization and troubleshooting of Amazon Web Services.
Writing scripts to automate processes and improve efficiency and platform availability using Terraform, Ansible and Jenkins.
Architecting serverless applications using AWS Lambda, Cognito, DynamoDB.
Creating Lambda jobs for exponential backups of stateful AWS EBS volumes.
Building, configuring, and managing Kubernetes clusters using Terraform.
Responsible for creation, maintenance, hardening and performance tuning of AWS RDS instances with PostgreSQL and ElastiCache instances with Redis.

TELSTRA-CDW SYDNEY, AUS Apr 2014 - Apr 2017
Linux System Engineer
Roles & Responsibilities:
Responsible for providing assistance in administration, troubleshooting, automation, configuration, installation, deployment, maintenance, upgrades for high availability design projects.
Created, extended, reduced, and administered Logical Volume Manager (LVM) volumes in RHEL environments.
Worked with database team to resolve performance issues, database capacity issues, replication, and other distributed data issues.
Updated and ran source code for migrations and updates, following release management procedures.
Performed upgrades without downtime using yum, and set up servers locally for upgrades.
Participate in an on-call rotation to provide 24x7 troubleshooting and Infrastructure support.
Upgrading, updating and packaging Red Hat and Debian Linux distributions using the YUM, RPM and APT package managers.
Experience installing, configuring, securing, and troubleshooting CentOS, RHEL, and Ubuntu Linux systems.
Experience in installing, configuring and managing the Apache Web server.
Responsible for setting up cron job scripts on QA and Pre-Production servers.
Responsible for taking regular, periodic backups using Linux command-line utilities like rsync.
Strong knowledge and experience in writing, modifying and running shell scripts to automate day-to-day administration tasks and schedule jobs on various Linux machines.
Experience in backend solutions, day-to-day system administration and monitoring, file system and disk management, and creation of shell scripts for task automation.
Strong networking knowledge and experience with TCP/IP, NIS, NFS, DNS, DHCP, FTP/TFTP, SSH, SFTP, and ARP.
Experience installing multiple Linux distributions and configuring the systems, including installing packages and dependencies.
Experience in user management, including creating single and multiple users, setting up home directories and passwords, granting sudo privileges, and assigning permissions to files and directories.
Experience in controlling access to files and directories with Linux file system permissions.
Monitored Linux systems and processes using utilities such as top, htop, and ps.
Experience in creating and setting up cron jobs to schedule automation for repetitive tasks.
Worked on file transfers on local and remote machines using FTP, NFS.
Solid understanding of and hands-on experience with Linux monitoring to avoid performance issues, covering CPU, memory, disk, and swap space utilization as well as network monitoring.
Worked on implementing regular patches and installing the latest packages on Linux systems to maintain system integrity. Strong knowledge of system upgrades including hardware, operating system, and periodic patches.
Participated in the on-call rotation and provided systems maintenance support.
Create, manage, and delete user accounts and permissions. Ensure proper access control and security for user accounts.
Install and configure Linux operating systems, including back-end databases and scripts.
Set up and configure hardware and software components to meet organizational requirements.
Provide day-to-day Tier 3 level operational support including responding to incidents, troubleshooting, and executing corrective action.
Assisting Tier 2 support with after-action reports such as Root Cause Analysis reports, including researching potential incident/problem mitigation and preventive measures and assisting with putting those measures in place.