Fixing a data loss bug in Zero

February 2026

Zero is a local-first sync engine by Rocicorp. I use it in Dara as the sync layer between the client and our API. In production, we hit two related bugs in how Zero handles JWT expiry that together caused silent data loss.

Bug 1: Race condition in connection state machine

When a JWT expires, the server sends an AuthInvalidated error followed by a websocket close with code 3000. Both events call #disconnect() on the client. The error handler transitions state to NeedsAuth via needsAuth(). The close handler calls connecting() with a CleanClose reason.

The problem is ordering. connecting() in connection-manager.ts only guards against transitions from Closed and Disconnected:

if (this.#state.name === ConnectionStatus.Closed) {
  return;
}
if (this.#state.name === ConnectionStatus.Disconnected && !isHiddenDisconnect) {
  return;
}

NeedsAuth and Error are defined as terminal states in TERMINAL_STATES, but connecting() does not check for them. If the close event is processed after the error, connecting() overwrites NeedsAuth and the client enters a 60-second retry loop with an expired token.
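
The ordering problem can be reproduced with a toy model. This is a minimal sketch, not Zero's code: the ConnectionSketch class, the simulateExpiry helper, and the guardTerminal flag are hypothetical, the Disconnected guard is left out for brevity, and the terminal set is assumed to hold only the two states named above.

```typescript
// Hypothetical, simplified model of the connection state machine.
// State names mirror the post; everything else is illustrative.
type Status =
  | "Connecting"
  | "Connected"
  | "Disconnected"
  | "NeedsAuth"
  | "Error"
  | "Closed";

// Assumption: the terminal set contains at least these two states.
const TERMINAL: ReadonlySet<Status> = new Set<Status>(["NeedsAuth", "Error"]);

class ConnectionSketch {
  status: Status = "Connected";
  readonly guardTerminal: boolean;

  constructor(guardTerminal: boolean) {
    this.guardTerminal = guardTerminal;
  }

  // Called by the error handler when the server reports AuthInvalidated.
  needsAuth(): void {
    this.status = "NeedsAuth";
  }

  // Called by the close handler after the websocket closes cleanly.
  connecting(): void {
    if (this.status === "Closed") {
      return;
    }
    // The fix: refuse to leave a terminal state.
    if (this.guardTerminal && TERMINAL.has(this.status)) {
      return;
    }
    this.status = "Connecting";
  }
}

// Replays the problematic ordering: error first, close second.
function simulateExpiry(guardTerminal: boolean): Status {
  const conn = new ConnectionSketch(guardTerminal);
  conn.needsAuth();
  conn.connecting();
  return conn.status;
}
```

Without the guard, simulateExpiry(false) ends in Connecting and the auth signal is lost. With it, simulateExpiry(true) stays in NeedsAuth.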

I opened #5500 with a fix and test case. Rocicorp shipped a cleaner version in #5504 that guards connecting() against all terminal states.

Bug 2: Stale token forwarding in zero-cache

With the race condition fixed, NeedsAuth fired correctly and the client refreshed its token. But mutations kept failing with JWTExpired on the API side.

The issue is in syncer-ws-message-handler.ts. Zero-cache caches the auth token at connection time in a private field:

// DEPRECATED: remove #token
// and forward auth and cookie headers that were
// sent with the push.
readonly #token: string | undefined;

This token is used for all subsequent pushes to the upstream API. When the client refreshes its JWT and sends a mutation, zero-cache ignores the fresh token in the push body and forwards the stale connection token instead.

When the API returns 401, zero-cache marks the mutation as processed without retrying or surfacing an error to the client. The write is lost.

Timeline of what this looked like in production with a 5-minute token lifetime:

token issued:  00:44:44 (5 min lifetime)
token expires: 00:49:44

00:49:49 - zero-cache forwards token, expiresIn: -5s  → auth ok (60s tolerance)
00:50:24 - zero-cache forwards token, expiresIn: -40s → auth ok
00:50:52 - zero-cache forwards token, expiresIn: -68s → 401 FAIL

The client refreshed multiple times during this window. Zero-cache kept using the 6-minute-old connection token.
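
The arithmetic behind the timeline can be checked with a short sketch. The toSeconds and check helpers are hypothetical, and treating the 60-second tolerance as inclusive is an assumption (it does not affect these three data points).

```typescript
// Convert "HH:MM:SS" to seconds since midnight.
function toSeconds(hms: string): number {
  const parts = hms.split(":").map(Number);
  return (parts[0] ?? 0) * 3600 + (parts[1] ?? 0) * 60 + (parts[2] ?? 0);
}

const TOLERANCE_S = 60; // expiry tolerance seen in the log above
const LIFETIME_S = 5 * 60; // 5-minute token lifetime
const expiresAt = toSeconds("00:44:44") + LIFETIME_S; // 00:49:44

// Returns the remaining lifetime (negative once expired) and whether
// the API would still accept the token under the tolerance.
function check(now: string): { expiresIn: number; accepted: boolean } {
  const expiresIn = expiresAt - toSeconds(now);
  return { expiresIn, accepted: expiresIn >= -TOLERANCE_S };
}

for (const t of ["00:49:49", "00:50:24", "00:50:52"]) {
  const { expiresIn, accepted } = check(t);
  console.log(`${t} expiresIn: ${expiresIn}s accepted: ${accepted}`);
}
```

The first two forwards land inside the tolerance window, which is why the failure only surfaced about a minute after the token had actually expired.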

Fix

I added an optional auth field to the PushBody schema. The client includes its current token with each push, and zero-cache prefers the push auth over the cached connection token:

const authToken = msg[1].auth ?? this.#token;

The change is backwards compatible: clients that do not send the field fall back to the existing behavior. I bumped PROTOCOL_VERSION to 47 since the schema change affects the protocol hash.
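
The fallback can be isolated in a few lines. This is a sketch under assumptions: PushBodySketch and selectAuthToken are hypothetical names, and only the optional auth field is taken from the actual change.

```typescript
// Hypothetical, trimmed-down push body. Only `auth` reflects the real change.
interface PushBodySketch {
  mutations: unknown[];
  auth?: string;
}

// Prefer the token sent with the push. Fall back to the token cached
// at connection time for clients that predate the field.
function selectAuthToken(
  push: PushBodySketch,
  connectionToken: string | undefined,
): string | undefined {
  return push.auth ?? connectionToken;
}

// New client: the refreshed token wins over the stale connection token.
const fromNewClient = selectAuthToken({ mutations: [], auth: "fresh" }, "stale");

// Old client: no auth field, so behavior is unchanged.
const fromOldClient = selectAuthToken({ mutations: [] }, "stale");
```

Because ?? only falls back on null or undefined, an empty-string auth value would be forwarded as-is rather than replaced by the connection token.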

PR #5503: 250 additions, 9 deletions, 9 files changed. Reviewed and merged by Matt Wonlaw. There was discussion about whether the new field would break old servers, since connection.ts rejects unknown fields in schema validation. We kept the approach and documented that servers need to update before clients.

Follow-up

The fix led to #5530 by Rocicorp, extending the same pattern to changeDesiredQueries so query auth is also refreshed per-request rather than cached at connection time.