This is a remote position.
Interested in helping us change the world of payments forever? The Stellar Development Foundation (SDF) is looking for a talented, experienced, and hands-on Site Reliability Engineer to join our team. In this role, you will be ensuring the reliability of our services, building infrastructure to enable our team's production and testing environments, and greasing the rails of our systems to ensure they're robust, efficient, and easy to deploy.
- Maintain, improve, scale and secure our AWS infrastructure and Ubuntu Linux systems.
- Assist our development teams in running, packaging, deploying and troubleshooting applications.
- Work with developers on streamlining deployment processes with Jenkins and other tooling.
- Maintain, monitor and improve our Kubernetes clusters.
- Work with development teams on migrating applications to Kubernetes.
- Be responsible for maintenance and improvements to multiple internal services, for example Kubernetes, Prometheus, ELK and LDAP.
- Monitor, triage and respond to alerts in our 24/7/365 environment.
- Participate in design and code reviews, and ensure that the foundation for our services is best in class.
- Evaluate new technologies, and design and implement them as appropriate.
- Identify automation opportunities and implement them using custom or off-the-shelf solutions.
- You have 3+ years of experience working in cloud-based systems operations as a Linux systems administrator, SRE or DevOps engineer.
- You’re very comfortable with the Linux command line.
- You're a natural at troubleshooting and debugging - no issue is impossible to solve.
- You have a good understanding of computer networking, TCP/UDP, load balancing, distributed computing, web services, and the fundamental protocols used by the internet (HTTP, HTTPS, DNS, etc.).
- You have experience supporting production workloads and are familiar with monitoring concepts and tooling. You’re able to take part in an on-call rotation.
- You're proficient in at least one scripting language and you are familiar with a few (Ruby, Perl, Python, Bash, etc.).
- You have first-hand experience with configuration management tools (Puppet, Chef, etc.), preferably Puppet.
- You're always willing to do what it takes to help your teammates - especially in stressful situations.
- You're enthusiastic about working in a small, growing team. You are open and empathetic, and you care about putting the best ideas forward in a collaborative and helpful manner.
- You can work independently and are able to deliver results without supervision.
Nice to have
- Familiarity with Docker and Kubernetes
- Experience with Prometheus and Grafana
- Experience with AWS
- Ability to understand Go, C++ and TypeScript source code
- Experience with CI pipelines and Jenkins
- Deb or RPM packaging experience
Why work for us
- You’ll have a lot of autonomy in the team
- You’ll work with Kubernetes in production, and we’ll help you get up to speed if needed
- You will be able to make a visible impact quickly and will have a strong influence on the team’s direction, tooling, processes and technology choices
- You will work on many open source projects that aim to improve financial inclusion on a global scale