| Pratibha Chaurasia - Production Support SRE |
| [email protected] |
| Location: Phoenix, Arizona, USA |
| Relocation: Yes |
| Visa: H1B |
| Resume file: Pratibha Chaurasia_ Application Support Engineer _ SRE-3 - Copy_1760370475543.docx |
|
Professional Experience:
Highly skilled Application Support Engineer with almost 10 years of experience designing, implementing, and maintaining resilient and scalable infrastructure solutions.
Proficient in bridging the gap between operations and development, implementing best practices, and providing 24/7 support.
Set up monitoring and alerting systems to detect issues proactively.
Experienced in creating automation scripts for testing environments.
Excellent analytical, decision-making, and communication skills with a focus on delivering exceptional reliability and performance.
Domains: Health & Insurance, Banking, Telecom, Entertainment, Retail.

Skill Set & Tools:
Monitoring Tools: Airflow, Splunk, ELK, AppDynamics, Dynatrace, Elasticsearch, Grafana, Prometheus
Middleware / File Transfer Tools: Secure File Transfer (SFT), Apigee API Gateway
Operating Systems: Linux, Windows
Cloud Technology: AWS, Azure
SCM Tools: Git
Continuous Integration: Jenkins, TeamCity, GitHub Actions
Containerization and Orchestration: Docker, Kubernetes
Automation / IaC Tools: Terraform, Ansible
Bug Tracking Tools: Jira
DevSecOps: SonarQube
Project Management / ITSM Tools: ServiceNow, Pier
Automation & Reporting Tools: Splunk, Power BI
Job Scheduling: Autosys, Airflow
Incident Response: PagerDuty
Programming & Scripting Languages: Bash, Java
Methodologies: Agile, SAFe Agile, Waterfall, SDLC

Leadership skills:
Work with the IT team to formulate, define, and implement strategies and plans to promote and customize the company's IT services.
Work with clients to understand their requirements and convert business requirements into functional and technical requirements.
Strong commitment to team dynamics, with the ability to contribute expertise and follow leadership directives at appropriate times.
Work closely with each team member on active tasks or blockers, and with the product owner on the release pipeline.
Contributed to the creation of an on-call rotation schedule, ensuring 24/7 system availability with minimal disruptions.
Excellent written and verbal communication skills.

Educational Qualification:
Bachelor of Technology from UPTU University, India

Certifications:
Multi Cloud Network Associate - Aviatrix Certified Engineer
AWS Certified Cloud Practitioner

Work History:

American Express:
Title: Engineer 2
Nov 2024 - August 2025
Owned and led support for Secure File Transfer (SFT) applications, acting as the primary point of contact for developers and business teams and ensuring smooth coordination and delivery.
Supported API-based traffic flows across internal and external systems using Apigee API Gateway, monitoring request/response logs, analyzing traffic patterns, and ensuring high availability.
Managed API security and certificate lifecycle (one-way and two-way SSL/TLS), including monitoring for expiry, assisting consumers with certificate renewals, and validating secure connectivity for customer-facing and internal APIs.
Provided flexible 24x7 support, including activities during off-shore hours such as release calls, development calls, and testing and validation of new SFT application changes.
Supported and resolved Tier-0 outages for business-critical applications, coordinating with multiple stakeholders, driving quick recovery, and minimizing business impact.
Successfully executed Disaster Recovery (DR) exercises for two SFT applications, ensuring business continuity and validating recovery processes under outage scenarios.
Worked extensively with Apigee API Gateway and ELK/Splunk, and developed dashboards to improve observability and real-time monitoring of application traffic and system health.
Designed and implemented a POC using Prometheus and Grafana, driving adoption of proactive monitoring and setting observability standards for SFT metrics.
Automated incident reporting using Power BI, reducing manual dependency and improving management-level visibility into incident trends.
Directed the resolution of Jira tickets for incidents, service requests, and change management, collaborating with cross-functional teams to ensure timely resolution within SLAs.
Coordinated certificate renewals and validations, ensuring secure and uninterrupted system communication.
Provided leadership during critical incidents through on-call/bridge call support, driving root cause analysis, reducing downtime, and ensuring client confidence.
Led release calls and development calls with on-shore/off-shore teams, ensuring smooth coordination, validation of changes, and successful go-live activities.
Acted as a mentor and guide for junior engineers during incident resolution and API troubleshooting, improving team effectiveness and knowledge sharing.
Gained exposure to GenAI applications, collaborating with internal teams to support next-generation AI-driven solutions.

Upskilling & Professional Development:
Currently preparing for Terraform certification and AWS Certified Solutions Architect Associate, strengthening cloud and infrastructure expertise to align with modern DevOps/SRE practices.

Albertsons:
Title: Application Support Engineer/SRE/DevOps
Nov 2023 - October 2024
Roles & Responsibilities:
Partnered with application teams to establish and enforce SRE standards, SLIs/SLOs/SLAs, and operational policies, driving reliability and consistency across environments.
Designed and implemented observability solutions (Grafana, dashboards, KPIs, alerts), enabling proactive detection and resolution of incidents and reducing MTTR.
Created and maintained Grafana dashboards to visualize application metrics, infrastructure performance, and API traffic in real time.
Built and customized Kibana dashboards to visualize log data, application metrics, and API traffic patterns, improving real-time monitoring.
Analyzed logs stored in Elasticsearch via Kibana for root cause analysis (RCA), error tracking, and incident investigation, reducing MTTR significantly.
Automated infrastructure provisioning and deployments across AWS and Azure using Terraform, Ansible, Kubernetes (EKS/Helm), and CI/CD pipelines, minimizing manual effort and configuration drift.
Supported high-availability MongoDB environments, optimizing performance and ensuring data reliability.
Led risk assessments, change management initiatives, and release activities, improving service stability and reducing deployment failures.
Integrated ITSM tools (ServiceNow, Jira Service Management) for request tracking, change peer review, and SLA adherence.
Conducted cloud cost analysis and capacity planning for observability clusters/resources, optimizing spend while ensuring system scalability.
Mentored and guided junior engineers, fostering a culture of automation, reliability, and continuous improvement.

Additional BA operations support provided to the team:
Work closely with business stakeholders to understand their needs and objectives.
Conduct document analysis to gather functional and technical requirements.
Break down high-level business requirements into detailed, actionable tasks for the development team, and prioritize them based on business goals and technical feasibility.
Develop documentation such as business requirements documents (BRD), functional specification documents (FSD), and system requirements specifications (SRS).
Prepare detailed user stories or use cases that outline specific business scenarios and map them to technical solutions.
Create visual models such as process flow diagrams and use case diagrams to communicate system interactions and workflows.
Created and maintained documentation, process flows, and swim lane diagrams.
Communicate technical requirements and progress to business stakeholders in a way they can understand, and conversely, explain business needs to the technical team.
Provide regular updates to stakeholders, ensuring they are informed about project progress, risks, and issues.
Evaluate existing systems and processes to identify areas for improvement or optimization.
Assist in troubleshooting and resolving any issues that arise after implementation.
Coordinate and participate in Non-Prod and Prod validation, helping end users verify that the system meets their needs.
Performed cloud cost analysis for observability clusters/resources.

T-Mobile:
Title: Operations SRE
April 2022 - October 2023
Primary Roles & Responsibilities:
Managed and was responsible for the end-to-end BOOST to DISH migration system.
Used Autosys for job scheduling.
Created and maintained scripts for system checks.
Created runbooks on Confluence for various RCAs, disaster recovery, and mitigation plans.
Supported project release activities.
Developed and maintained Power BI dashboards for monitoring the migration rate.
Responsible for defect resolution and for leading the team's defect and triage cycle.
Used Snowflake for data retrieval and manipulation activities.
Led incident response and RCA, thereby enhancing system scalability.
Spearheaded the design and implementation of monitoring and alerting systems, reducing downtime.
Conducted regular system shakeouts and disaster recovery planning, thereby ensuring system stability.
Provided weekly reports to the client detailing project progress, roadblocks, and mitigation plans.
Reported any technical problems affecting project completion, and worked with the development team to coordinate the support required for resolution.
Presented monthly project status, roadblocks, issues, mitigation plans, and project dashboards to stakeholders and senior leadership.
Conducted daily status meetings (stand-up calls), sprint planning, backlog tracking, and task prioritization.

Keywords: continuous integration continuous deployment business analyst artificial intelligence business intelligence information technology golang Colorado |