Site Reliability Engineer - Postgres

Supabase

Remote work

Regular employment

7 - 12 years of experience

Full Time

Remote - Worldwide

Responsibilities

Supabase is an Open Source and fully remote company building developer tools for databases.

We’re hiring a Site Reliability Engineer to lead efforts in database resiliency and failover. Your mission is to ensure high availability for Postgres clusters by building automated recovery workflows, testing disaster scenarios, and improving our replication and switchover strategies. You’ll own incident response playbooks, HA architectures (e.g., Patroni, Stolon), and replication topologies.

You should be fluent in diagnosing replication lag, tuning synchronous standby setups, and designing systems that degrade gracefully under pressure. This role is critical to uptime guarantees for our enterprise customers.

What You’ll Work On:

  • Design and maintain HA Postgres architectures using tools like Patroni, Stolon, or custom tooling

  • Build and improve automated failover and recovery workflows to minimize downtime

  • Tune and monitor replication topologies, ensuring performance and data integrity across synchronous and asynchronous setups

  • Create and own incident response playbooks for Postgres-related issues

  • Work with platform and infra teams to improve fault tolerance, rollback paths, and system reliability under failure conditions

  • Enhance observability into Postgres replication, lag, switchover timing, and cluster health

  • Continuously improve our ability to meet uptime guarantees for production systems

We offer:

  • 100% remote work from anywhere in the world. No location-based adjustment to your salary.

  • Autonomous work. We work collaboratively on projects, but you set your own pace.

  • Health, vision, and dental benefits. Supabase covers 100% of the cost for employees and 80% for dependants.

  • Tech Allowance for any office setup you need

  • Annual Education Allowance

  • Annual off-sites.

About the team

  • We're a startup. It's unstructured.

  • Our team has collectively founded more than 30 startups.

  • Globally distributed team with more than 30 different nationalities.

  • We deeply believe in the efficacy of collaborative open source. We support existing communities and tools, rather than building "yet another xx".

  • We "dogfood" everything. If you use it in your project, we use it in Supabase.

Process

  • The entire process is fully remote and all communication will happen over email or via video chat.

  • Once you've submitted your application, the team will review your submission and may reach out for a short screening interview over a video call.

  • If you pass the screen, you will be invited to up to four follow-up interviews.

  • The calls:

    • usually take 20-45 minutes each, depending on the interviewer.

    • are usually 1:1.

    • will be with the founders, a member of either the growth or engineering team (depending on the role) and usually one other person from your immediate team or function.

  • Once the interviews are over, the team will meet to discuss several roles and candidates and may:

    • ask one or two follow-up questions over email or a quick call.

    • go directly to making an offer.

Required skills

Disaster Recovery
IT Management
Performance Testing
Performance Tuning
System Design
PostgreSQL
Cloud architectures
SRE principles
Observability
Incident response
Switches
SQL databases
IT resilience
English
Job posted 1 day ago