Bonsai is hiring a Senior Platform Engineer to help build, scale and support the underlying technical platform that helps us manage thousands of Elasticsearch clusters on AWS and GCP. This is a 100% remote full-time position. Salary ranges from $120k to $130k, based on experience.
About the job
“Hey, we’ve put your add-on in production. Good luck. Don’t crash.” —Heroku
The essence of platform engineering at Bonsai will be to operate and support Elasticsearch at scale. The emphasis here is more on the scale part than the Elasticsearch part, but you’ll definitely become intimate with Elasticsearch and Lucene along the way.
There are several key components involved. First we have Elasticsearch itself. Then a handful of proprietary plugins to enhance its functionality and support its operation. From there, the networking stack that handles connections and does diagnostic tracing. Telemetry and observability across the board. Finally our packaging and deployment, and internal services for fleet orchestration.
If that sounds like more than one person’s job, we agree. Your future colleague Dan is going to be particularly stoked to work with you.
You can think of this as similar to an “SRE” position. When there’s an issue with performance or reliability, you’ll dig in, trace requests, and analyze everything from load balancers down to memory managers, then help code and ship a patch to make the problem visible and make it better.
There’s a heavy dose of Java and Linux involved in all of this, but if you have some experience in systems programming in other languages, we can certainly teach all of that.
We’re a small team, but we punch above our weight in systems engineering and operations. Launching at the right place and time dropped us into the deep end of early adopters, and we’ve been scaling ever since. Fortunately, our early team was heavily engineering-minded. Our original founder was a database engineer at Twitter during their years of crazy scaling. We also hosted some massive sites like Pinterest, whose 100x growth on our platform was a true trial by fire.
This position does involve wearing the metaphorical pager in a rotation with other engineers on the team. We’re on call not because we expect to be woken up, but so we’re accountable for shipping systems that never need us to!
Some example projects
- Move decentralized server-initiated threshold alerts into a centralized time-series stream analysis service.
- Build a continuous delivery service that performs gradual fleetwide rollouts of new and updated services, subject to canary stages and operational verification at certain checkpoints.
- Build and package new versions of Elasticsearch OSS, and update our suite of plugins to use the latest plugin interfaces, including customer-supplied proprietary plugins.
- Troubleshoot a customer-supplied Elasticsearch plugin with a performance hot-spot, trace the problem to a likely location and provide support and guidance to improve efficiency.
- Diagnose a server-side agent as having problematic memory usage, and port it from Ruby to Crystal to improve performance and resource usage.
- Collaborate with Product engineers to build a data pipeline to support customer-facing metrics graphs.
- Assist our customer support by triaging operational incidents and performing incident response.
The ideal engineer
We’re looking for someone experienced, who’s ready to dive in. You don’t need to be an Elasticsearch expert — you’ll learn all of that on the job. We’ll have plenty of conversations about how Lucene is really a data structures library optimized for disk access.
Experience with Java is most helpful, although C, C++ and Golang would be good starting points. We’ll also be looking for solid fundamentals in networking, disk access, memory management, and schedulers.
Several of our systems make heavy use of Netty, as does Elasticsearch itself. So familiarity with Netty or evented systems will be helpful.
How we work
One More Cloud (OMC) is small, remote-first, and team-conscious.
OMC has always been a small team. As such, each of our colleagues wears many hats. We have no middle managers or dedicated Project Managers who slice and dice out work; OMC managers are also contributors. They serve as a sounding board and coach on higher-level project and career questions. Everyone is expected to manage projects together. So our team works best with individuals who take responsibility for their to-do lists, enjoy working collaboratively with teammates to plan out projects, and don’t shy away from offering their opinions.
OMC has been a remote-first company from day one, and we have a lot of experience in managing and communicating across multiple timezones. One of our key ways of getting focused work done is avoiding excessive meetings and video calls by writing out our thought processes, documenting the steps we take, and sharing them with the team. Our ideal teammate should be comfortable with, and undaunted by, writing clear and logical longer-form English prose on a regular cadence.
We are team-conscious. Yes, we have a company hierarchy based on skills and the level of risk a position incurs (like responsibility for servers that have big costs attached to them). However, the onus is on every individual at OMC, regardless of hierarchy, to create a culture that makes space for creativity, honesty, and autonomy for everyone who joins OMC. We don’t look for team heroes or martyrs but rather strive to create healthy and realistic team responsibility. We collaborate best with those who are considerate of their teammates, respect boundaries, and are dedicated to pursuing our work with curiosity, respect, and optimism.
Benefits for working at OMC include:
- Medical and Dental Insurance
- 40-hour work week. We practice healthy work and life boundaries.
- Work where you want. We've been a remote-first company since day one in 2009.
- 401k, with company contributions
- Wellness allowances
- Annual continuing education allowance
- Paid parental leave