We are looking for an SRE to join our team. As an SRE, you’ll maintain our Kubernetes cluster along with our applications and services. You’ll be someone with a deep interest in system stability and reliability.
The successful candidate will have experience automating infrastructure tasks and developing solutions (primarily in Go) to support their role, fix issues, and improve efficiency and reliability across multiple services.
On a day-to-day basis, you will deploy applications (infrastructure as code/GitOps), think through and implement improvements to existing processes, and write code for new ones. In addition, you will respond to the rare Prometheus alert, write Prometheus rules, co-ordinate with development teams to enable useful instrumentation, create Grafana dashboards, and manage our fleet of servers and systems.