About the role
Kentik, the leading Network Analytics platform, is looking for an experienced Site Reliability Engineer to join its SRE team. The SRE team is responsible for architecting, building, monitoring and maintaining the infrastructure and services that the Kentik platform runs on top of.
We operate a well-organized, well-instrumented platform, with an emphasis on results rather than process. We’re expanding fast and offer enormous opportunities for employee growth.
We’re looking for a seasoned, self-driven SRE with an eye for detail and the ability to collaborate in a global company, someone ready to take Kentik to the next level by applying their experience and knowledge.
Because of our follow-the-sun on-call schedule, this position requires working hours that fall within North American time zones.
What to expect
- A truly remote, global team and company across many countries and many time zones.
- A real-time, scalable, microservices-based infrastructure, running on free software across multiple locations and all major cloud vendors.
- Deep-diving into diverse topics, from NetFlow and IP routing, to database replication strategies or HTTP optimization.
- Contribute code, code reviews and tools or patches to all kinds of existing code.
- Take part in incident handling, including compiling postmortems and RCAs and providing input at all stages.
- Provide valuable feedback on team goals, projects, and processes. We believe in continuously improving our team.
- Write design documents or collaborate on colleagues’ docs to introduce new features or changes into our infrastructure.
You may be a good fit if you have
- 4+ years of relevant SRE experience. Ideally, you've worked in a microservices environment in the past and feel comfortable working with complex architectures.
- Communication: Kentik is a remote-first company, so we're looking for a team player who is able to collaborate in an asynchronous environment via tools such as email, Google Docs, Slack, Zoom, and Git.
- HTTP experience: You know what TLS, nginx, HAProxy, and HTTP headers are.
- Networking experience: Terms such as routes and iptables sound familiar.
- An urge to document code, processes, and infrastructure in runbooks and wikis.
- Infrastructure as code sounds like a good idea.
- A preference to automate your way out of tedious and repetitive tasks - toil is bad.
- Some familiarity with coding in Python, Ruby, or Go, while using Git to record your changes.
- Experience with cloud architectures and technologies.
The list is not exhaustive or in any way a checklist; we'll be happy to hear about any other skills or experience you may have!
Our tech stack
- Our core data engine and platform are primarily written in Go
- We use Node.js + Express for application serving, and React as our primary UI framework
- We also use some JS and Python for tooling/scripting
- In addition to our own database, we use Postgres, Kafka, MySQL, and Redis
- Internal and public APIs expose both REST/JSON and gRPC endpoints
- HAProxy and Envoy for API traffic routing and balancing
- GitHub for source control, PRs, issues
- Jenkins for automated builds
Healthcare and Leave*
- Medical, dental, and vision plans that are 100% covered for employees & dependents
- Health Reimbursement Account
- Maternity/Paternity leave, medical leave to care for yourself or a loved one
- Stock Options
- Flexible time off
- Vanguard 401(k)
* Benefits are as listed for US employees. Local health insurance coverage and similar benefits are provided for employees outside of the US.
Perks and Reimbursement Programs
- $100 per month for wellness expenses and activities
- Internet / Phone Reimbursement
Come work with us
We promise it will be worth it! You will be working at a fast-growing, well-funded startup, alongside world-class engineers and thought leaders as we build the future of network observability and digital operations. With a competitive salary and first-rate benefits on top of meaningful, challenging projects, we’re sure you will enjoy joining our team!