The Production Operations team is responsible for keeping technical operations at Zapr up and running 24x7. The team manages and monitors the Service Level Indicators and Objectives for various mission-critical apps and makes sure that the business is not disrupted by issues or failures. The team is also responsible for tracking the agreed objectives and presenting weekly and monthly reports on the health of various systems.
As the Production Engineer, the candidate will be responsible for managing and maintaining a scalable, fault-tolerant, high-performing distributed infrastructure. At ZAPR, we’re talking incredible scale. The candidate will closely monitor, track, and diagnose system health and performance using a variety of tools.
The candidate should be able to plan, create, and maintain the infrastructure required to deploy the software stack. This involves tracking the Service Level Indicators and Objectives for each service and working with DevOps and other development teams whenever an indicator falls below its agreed threshold.
The candidate should have a degree in Computer Science or IT, prior experience handling scalable systems, and strong troubleshooting skills. The candidate should have a good understanding of Linux systems and various monitoring tools. The ideal candidate will also have good documentation skills to manage and maintain runbooks for various scenarios.
The candidate should be comfortable working in shifts with a rotating on-call roster.
Roles & Responsibilities: