Site Reliability Engineer - Postgres

Supabase

Remote work

Regular employment

7 - 12 years of experience

Full Time

Remote - Worldwide

Responsibilities

Supabase is an open source, fully remote company building developer tools for databases.

We’re hiring a Site Reliability Engineer to lead efforts in database resiliency and failover. Your mission is to ensure high availability for Postgres clusters by building automated recovery workflows, testing disaster scenarios, and improving our replication and switchover strategies. You’ll own incident response playbooks, HA architectures (e.g., Patroni, Stolon), and replication topologies.

You should be fluent in diagnosing replication lag, tuning synchronous standby setups, and designing systems that degrade gracefully under pressure. This role is critical to uptime guarantees for our enterprise customers.

What You’ll Work On:

Design and maintain HA Postgres architectures using tools like Patroni, Stolon, or custom tooling
Build and improve automated failover and recovery workflows to minimize downtime
Tune and monitor replication topologies, ensuring performance and data integrity across synchronous and asynchronous setups
Create and own incident response playbooks for Postgres-related issues
Work with platform and infra teams to improve fault tolerance, rollback paths, and system reliability under failure conditions
Enhance observability into Postgres replication, lag, switchover timing, and cluster health
Continuously improve our ability to meet uptime guarantees for production systems

We offer:

100% remote work from anywhere in the world. No location-based adjustment to your salary.
Autonomous work. We work collaboratively on projects, but you set your own pace.
Health, Vision and Dental benefits. Supabase covers 100% of the cost for employees and 80% for dependants.
Tech Allowance for any office setup you need.
Annual Education Allowance.
Annually run off-sites.

About the team

We're a startup. It's unstructured.
Collectively, we have founded more than 30 startups.
Globally distributed team with more than 30 different nationalities.
We deeply believe in the efficacy of collaborative open source. We support existing communities and tools, rather than building "yet another xx".
We "dogfood" everything. If you use it in your project, we use it in Supabase.

Process

The entire process is fully remote and all communication will happen over email or via video chat.
Once you've submitted your application, the team will review your submission and may reach out for a short screening interview over a video call.
If you pass the screen, you will be invited to up to four follow-up interviews.
The calls:
- usually take between 20 and 45 minutes each, depending on the interviewer.
- are 1:1 most of the time.
- will be with the founders, a member of either the growth or engineering team (depending on the role) and usually one other person from your immediate team or function.
Once the interviews are over, the team will meet to discuss several roles and candidates and may:
- ask one or two follow-up questions over email or a quick call.
- go directly to making an offer.

Required skills

Disaster Recovery

IT Management

Performance Testing

Performance Tuning

System Design

PostgreSQL

Cloud architectures

SRE principles

Observability

Incident response

Switches

SQL databases

IT resilience

English
