Progressive Rollout with Feature Flags for Remotes

Ship a new version of a federated remote to a slice of traffic by letting a feature flag decide—at host bootstrap—whether each session loads the v1 or v2 remoteEntry.js, and flip back automatically when the error rate spikes.

This is a low-risk way to release a remote you cannot fully trust yet. The host keeps a single loading contract; only the resolved remoteEntry URL changes per cohort. If v2 misbehaves, the flag service moves everyone back to v1 without a redeploy.

Prerequisites

This builds directly on dynamically loading remote modules at runtime; if static remotes are still in your config, migrate first.

[Figure: Flag-driven progressive rollout with auto-rollback. User session → Flag service (percentage / segment) → Host (resolve + load remoteEntry) → v1 remoteEntry (stable) or v2 remoteEntry (canary) → Telemetry (cohort + errors) → Auto-rollback (error-rate threshold)]
Each session resolves a flag, the host loads the matching remoteEntry, and telemetry feeds an automated rollback that flips the flag back to v1.

Step 1 — Define the rollout flag

Model the flag as a structured value, not a boolean. You want a percentage the rollback automation can zero, an explicit version map, forced segments, and an enabled kill switch that turns the canary off for everyone, segments included.

{
  "key": "checkout-remote-rollout",
  "enabled": true,
  "rolloutPercentage": 10,
  "segments": ["internal-staff"],
  "versions": {
    "stable": {
      "version": "1.4.0",
      "remoteEntry": "https://cdn.example.com/remotes/checkout/1.4.0/remoteEntry.js"
    },
    "canary": {
      "version": "2.0.0",
      "remoteEntry": "https://cdn.example.com/remotes/checkout/2.0.0/remoteEntry.js"
    }
  }
}

rolloutPercentage: 10 sends ~10% of sessions to the canary. segments force specific cohorts (staff, beta opt-ins) onto the canary regardless of percentage. The automation in Step 5 sets rolloutPercentage back to 0 when it trips.
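Because the host trusts this payload to pick a script URL, it can help to check the shape before using it. The following is a minimal sketch, not part of the original setup: validateFlag and its messages are hypothetical, and the rule that each remoteEntry URL contains its own version string is an assumption based on the URL layout in the example above.

```typescript
// Shape mirrors the flag JSON above. validateFlag is a hypothetical helper.
interface VersionEntry { version: string; remoteEntry: string }

function isVersionEntry(v: unknown): v is VersionEntry {
  if (typeof v !== 'object' || v === null) return false;
  const e = v as Record<string, unknown>;
  return typeof e.version === 'string' && typeof e.remoteEntry === 'string';
}

// Returns a list of problems; an empty list means the payload looks usable.
function validateFlag(raw: unknown): string[] {
  if (typeof raw !== 'object' || raw === null) return ['flag is not an object'];
  const f = raw as Record<string, unknown>;
  const problems: string[] = [];

  if (typeof f.enabled !== 'boolean') problems.push('enabled must be a boolean');

  const pct = f.rolloutPercentage;
  if (typeof pct !== 'number' || !Number.isFinite(pct) || pct < 0 || pct > 100) {
    problems.push('rolloutPercentage must be a number in 0..100');
  }

  if (!Array.isArray(f.segments) || !f.segments.every((s) => typeof s === 'string')) {
    problems.push('segments must be an array of strings');
  }

  const versions = f.versions as Record<string, unknown> | null | undefined;
  for (const key of ['stable', 'canary']) {
    const entry = versions?.[key];
    if (!isVersionEntry(entry)) {
      problems.push(`versions.${key} must have version and remoteEntry`);
    } else if (!entry.remoteEntry.includes(`/${entry.version}/`)) {
      // Assumption: versioned URLs embed the version as a path segment.
      problems.push(`versions.${key}.remoteEntry should contain its version`);
    }
  }
  return problems;
}
```

A host could treat any non-empty result as a reason to stay on a known-good stable URL rather than loading whatever the payload points at.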

Step 2 — Resolve the cohort deterministically at bootstrap

Resolution must run before any remote loads and must be sticky for the session—otherwise a user can flip between versions mid-visit. Hash a stable session identifier into a 0–99 bucket so the same user always lands in the same cohort for a given percentage.

// flagResolver.ts
export interface RolloutFlag {
  enabled: boolean;
  rolloutPercentage: number;
  segments: string[];
  versions: {
    stable: { version: string; remoteEntry: string };
    canary: { version: string; remoteEntry: string };
  };
}

export interface Cohort {
  name: 'stable' | 'canary';
  version: string;
  remoteEntry: string;
}

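// FNV-1a 32-bit hash -> bucket 0..99 (deterministic per id).
function bucketFor(id: string): number {
  let h = 2166136261; // FNV offset basis
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 16777619); // FNV prime, 32-bit multiply
  }
  // Reinterpret the signed 32-bit result as unsigned before taking the bucket.
  return (h >>> 0) % 100;
}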

export function resolveCohort(
  flag: RolloutFlag,
  ctx: { sessionId: string; segments: string[] }
): Cohort {
  const onCanary =
    flag.enabled &&
    (ctx.segments.some((s) => flag.segments.includes(s)) ||
      bucketFor(ctx.sessionId) < flag.rolloutPercentage);

  const pick = onCanary ? flag.versions.canary : flag.versions.stable;
  return { name: onCanary ? 'canary' : 'stable', ...pick };
}

Use a persisted sessionId (a cookie or localStorage value), not a random per-page-load value. That stickiness is what prevents the mixed-version session described in Troubleshooting.
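The ./session module that later steps import is not shown in this article. One way it could look is sketched below, under stated assumptions: the storage key, the function names, and the injected id generator are all hypothetical, and the store is passed in so the same code works with localStorage in a browser or an in-memory map in tests.

```typescript
// Minimal storage contract; window.localStorage satisfies it in a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const SESSION_KEY = 'rollout-session-id'; // hypothetical storage key

// Returns the persisted id, creating and storing one on first use.
function getOrCreateSessionId(store: KeyValueStore, newId: () => string): string {
  const existing = store.getItem(SESSION_KEY);
  if (existing) return existing;
  const id = newId();
  store.setItem(SESSION_KEY, id);
  return id;
}

// In-memory store for tests or environments without localStorage.
function memoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}
```

In a browser this might be called as getOrCreateSessionId(localStorage, () => crypto.randomUUID()). The id is generated once and read back on every later bootstrap, which is what keeps the bucket stable across page loads.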

Step 3 — Fetch the flag and resolve before first load

Bootstrap fetches the flag, resolves the cohort once, and stashes it where the loader can read it synchronously.

// bootstrap.ts
import { resolveCohort, type Cohort, type RolloutFlag } from './flagResolver';
import { getSessionId } from './session';

// Segments are assumed to be injected by the host page before bootstrap runs.
declare global {
  interface Window {
    __USER_SEGMENTS__?: string[];
  }
}

let resolvedCohort: Cohort | null = null;

export async function initRollout(): Promise<Cohort> {
  // cache: 'no-store' makes the browser bypass its HTTP cache for this request.
  const res = await fetch('/api/flags/checkout-remote-rollout', { cache: 'no-store' });
  if (!res.ok) throw new Error(`Flag fetch failed: ${res.status}`);
  const flag: RolloutFlag = await res.json();

  resolvedCohort = resolveCohort(flag, {
    sessionId: getSessionId(),
    segments: window.__USER_SEGMENTS__ ?? [],
  });
  return resolvedCohort;
}

export function getCohort(): Cohort {
  if (!resolvedCohort) throw new Error('initRollout() must run before loading remotes');
  return resolvedCohort;
}

The cache: 'no-store' option matters: a cached flag is a common cause of a rollback that “doesn’t take.” The option only controls the browser's own cache, so the flag endpoint should also respond with Cache-Control: no-store to keep CDNs and proxies from serving a stale copy. See Troubleshooting for the cache-pinning failure mode.
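If the flag service is slow or down, bootstrap fails and no remote can load. One possible mitigation, sketched here as an assumption rather than something this setup prescribes, is to race the flag resolution against a timeout and fall back to a known stable cohort. withFallback is a hypothetical helper; the fallback value would have to come from somewhere other than the flag, for example a build-time default.

```typescript
// Resolves with load()'s value, or with `fallback` if load() rejects, throws,
// or takes longer than timeoutMs. Hypothetical helper, not part of bootstrap.ts.
function withFallback<T>(
  load: () => Promise<T>,
  fallback: T,
  timeoutMs: number
): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), timeoutMs);
    // Promise.resolve().then(load) also captures synchronous throws from load().
    Promise.resolve()
      .then(load)
      .then(
        (value) => {
          clearTimeout(timer);
          resolve(value);
        },
        () => {
          clearTimeout(timer);
          resolve(fallback);
        }
      );
  });
}
```

To keep getCohort() working, a wrapper like this would need to sit inside initRollout() around the fetch-and-resolve step, so that resolvedCohort is always assigned. The trade-off is that an outage of the flag service silently pauses the canary, which is usually the safer direction to fail.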

Step 4 — Dynamically load the chosen remoteEntry

Inject the resolved remoteEntry script, initialize the share scope, then pull the exposed module. This is the runtime-loading pattern adapted to read its URL from the cohort instead of a hard-coded constant.

// loadRemote.ts
import { getCohort } from './bootstrap';
import { emitCohort } from './telemetry';

declare const __webpack_init_sharing__: (scope: string) => Promise<void>;
declare const __webpack_share_scopes__: { default: unknown };

const loaded = new Map<string, Promise<void>>();

function injectScript(url: string): Promise<void> {
  if (!loaded.has(url)) {
    loaded.set(
      url,
      new Promise<void>((resolve, reject) => {
        const el = document.createElement('script');
        el.src = url;
        el.onload = () => resolve();
        el.onerror = () => {
          // Forget the failed attempt so a later call can retry the load.
          loaded.delete(url);
          el.remove();
          reject(new Error(`Failed to load ${url}`));
        };
        document.head.appendChild(el);
      })
    );
  }
  return loaded.get(url)!;
}

export async function loadCheckout<T>(exposed: string): Promise<T> {
  const cohort = getCohort();
  emitCohort({ remote: 'checkout', cohort: cohort.name, version: cohort.version });

  await injectScript(cohort.remoteEntry);
  await __webpack_init_sharing__('default');

  const container = (window as any).checkout;
  await container.init(__webpack_share_scopes__.default);
  const factory = await container.get(exposed);
  return factory().default as T;
}

Because the URL carries the version (/2.0.0/remoteEntry.js), the canary and stable scripts never collide in the browser cache—each version is its own immutable file.
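The loader above reaches for the container through (window as any).checkout, so a remote that failed to register surfaces as an opaque TypeError. A typed lookup can fail with a clearer message. This is a sketch under assumptions: RemoteContainer describes only the init and get methods the loader already calls, and getContainer is a hypothetical helper that takes the global object as a parameter so it can be exercised outside a browser.

```typescript
// Minimal typing for the container object a remoteEntry.js registers globally.
interface RemoteContainer {
  init(shareScope: unknown): Promise<void> | void;
  get(module: string): Promise<() => unknown>;
}

function isContainer(value: unknown): value is RemoteContainer {
  if (typeof value !== 'object' || value === null) return false;
  const c = value as Record<string, unknown>;
  return typeof c.init === 'function' && typeof c.get === 'function';
}

// Looks up a container by its global name and fails loudly when it is missing.
function getContainer(globals: Record<string, unknown>, name: string): RemoteContainer {
  const candidate = globals[name];
  if (!isContainer(candidate)) {
    throw new Error(`Remote container "${name}" not found; did remoteEntry.js load?`);
  }
  return candidate;
}
```

In the browser the call could be getContainer(window as unknown as Record<string, unknown>, 'checkout'), where 'checkout' is the global name the remote's build is assumed to use.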

Step 5 — Emit cohort telemetry and wire automated rollback

Every session reports which cohort and version it used. The rollback watcher joins those cohort events to error events and flips the flag when the canary’s error rate crosses a threshold.

// telemetry.ts
import { getSessionId } from './session';

export function emitCohort(e: { remote: string; cohort: string; version: string }) {
  navigator.sendBeacon(
    '/api/telemetry/cohort',
    JSON.stringify({ ...e, sessionId: getSessionIdSafe(), ts: Date.now() })
  );
}

function getSessionIdSafe() {
  try {
    return getSessionId();
  } catch {
    return 'unknown';
  }
}
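The join between cohort beacons and error events can be sketched as a per-session count. The event shapes and the countSessions name are assumptions for illustration; a real pipeline would likely do this in the log store's query language. Sessions are counted once however many beacons or errors they emit, and the 'unknown' id produced by getSessionIdSafe() is skipped because it would merge unrelated sessions.

```typescript
interface CohortBeacon { sessionId: string; cohort: 'stable' | 'canary' }
interface ErrorBeacon { sessionId: string }
interface CohortCounts { canary: number; canaryErr: number; stable: number; stableErr: number }

function countSessions(cohorts: CohortBeacon[], errors: ErrorBeacon[]): CohortCounts {
  // One entry per session; repeated beacons from the same session overwrite.
  const cohortBySession = new Map<string, 'stable' | 'canary'>();
  for (const e of cohorts) {
    if (e.sessionId !== 'unknown') cohortBySession.set(e.sessionId, e.cohort);
  }

  // A session counts as erroring once, regardless of how many errors it sent.
  const erroring = new Set<string>(errors.map((e) => e.sessionId));

  const counts: CohortCounts = { canary: 0, canaryErr: 0, stable: 0, stableErr: 0 };
  cohortBySession.forEach((cohort, sessionId) => {
    counts[cohort] += 1;
    if (erroring.has(sessionId)) {
      counts[cohort === 'canary' ? 'canaryErr' : 'stableErr'] += 1;
    }
  });
  return counts;
}
```

The result has the same four fields the rollback watcher consumes, so it can feed evaluate() directly.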

The watcher runs server-side (a cron, queue consumer, or alerting webhook). It compares the canary error rate against both an absolute threshold and a multiple of the stable baseline, and calls the flag service to set rolloutPercentage to 0 if either is exceeded.

// rollbackWatcher.ts
const ABS_THRESHOLD = 0.05; // 5% of canary sessions erroring
const REL_MULTIPLIER = 3;   // or 3x the stable error rate
const MIN_SESSIONS = 200;   // ignore early noise

// Server-side fetch needs an absolute URL; this is the flag service host used in Verification.
const FLAG_URL = 'https://flags.example.com/api/flags/checkout-remote-rollout';

// Session and error counts per cohort for one evaluation window.
interface RollbackWindow { canary: number; canaryErr: number; stable: number; stableErr: number; }

export async function evaluate(w: RollbackWindow): Promise<void> {
  if (w.canary < MIN_SESSIONS) return;

  const canaryRate = w.canaryErr / w.canary;
  const stableRate = w.stable ? w.stableErr / w.stable : 0;
  // Note: with a zero stable error rate, any canary error trips the relative check.
  const tripped = canaryRate > ABS_THRESHOLD || canaryRate > stableRate * REL_MULTIPLIER;

  if (tripped) {
    const res = await fetch(FLAG_URL, {
      method: 'PATCH',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ rolloutPercentage: 0, enabled: true }),
    });
    if (!res.ok) throw new Error(`Rollback PATCH failed: ${res.status}`);
    console.warn(`Rollback: canary ${(canaryRate * 100).toFixed(1)}% vs stable ${(stableRate * 100).toFixed(1)}%`);
  }
}

Keep enabled: true and only zero the percentage—that keeps segments intact so staff can keep validating the fix on the canary while real users are back on stable.
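To see how the two thresholds interact, it helps to separate the decision from the side effect and run a few numbers through it. The sketch below restates the watcher's comparison as a pure function; decide and the Verdict type are hypothetical names, and the constants are copied from the watcher.

```typescript
const ABS_THRESHOLD = 0.05; // 5% of canary sessions erroring
const REL_MULTIPLIER = 3;   // or 3x the stable error rate
const MIN_SESSIONS = 200;   // ignore early noise

interface CohortWindow { canary: number; canaryErr: number; stable: number; stableErr: number }
type Verdict = 'insufficient-data' | 'healthy' | 'rollback';

// Same comparison as the watcher, without the PATCH call.
function decide(w: CohortWindow): Verdict {
  if (w.canary < MIN_SESSIONS) return 'insufficient-data';
  const canaryRate = w.canaryErr / w.canary;
  const stableRate = w.stable ? w.stableErr / w.stable : 0;
  const tripped =
    canaryRate > ABS_THRESHOLD || canaryRate > stableRate * REL_MULTIPLIER;
  return tripped ? 'rollback' : 'healthy';
}

// Too few canary sessions to judge, despite a high raw rate.
const early = decide({ canary: 150, canaryErr: 30, stable: 1350, stableErr: 5 });
// 6% canary errors exceeds the 5% absolute threshold.
const absolute = decide({ canary: 1000, canaryErr: 60, stable: 9000, stableErr: 45 });
// 2% is under 5% but above 3 x 0.5% = 1.5%, so the relative check trips.
const relative = decide({ canary: 1000, canaryErr: 20, stable: 9000, stableErr: 45 });
// 1% is under both 5% and 1.5%.
const fine = decide({ canary: 1000, canaryErr: 10, stable: 9000, stableErr: 45 });
```

Here early is 'insufficient-data', absolute and relative are both 'rollback', and fine is 'healthy'. The relative case is the one a fixed threshold alone would miss.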

Verification

Confirm the cohort split in logs. Run a count over the cohort beacon stream for a recent window and check the ratio matches your rolloutPercentage:

# Group last 1k cohort events by cohort name
curl -s "https://logs.example.com/query?q=remote:checkout&limit=1000" \
  | jq -r '.events[].cohort' | sort | uniq -c
#   902 stable
#    98 canary   <- ~10%, matches rolloutPercentage: 10
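"Matches" is easier to judge with a tolerance. A rough check, assuming sessions are bucketed independently, treats the canary count as binomial and accepts anything within a few standard errors of the target. splitLooksRight is a hypothetical helper, and the three-sigma default is an arbitrary choice.

```typescript
// True when the observed canary share is within `sigmas` standard errors of
// the target share, using the normal approximation to the binomial.
function splitLooksRight(
  canary: number,
  total: number,
  rolloutPercentage: number,
  sigmas = 3
): boolean {
  if (total <= 0) return false;
  const p = rolloutPercentage / 100;
  const stdErr = Math.sqrt((p * (1 - p)) / total);
  return Math.abs(canary / total - p) <= sigmas * stdErr;
}

// 98 of 1000 at a 10% target: off by 0.2 points, well inside about 2.8 points.
const sampleOk = splitLooksRight(98, 1000, 10);
// 200 of 1000 at a 10% target: off by 10 points, far outside the tolerance.
const sampleOff = splitLooksRight(200, 1000, 10);
```

Sessions forced onto the canary through segments add to the canary count, so an observed share somewhat above the target is expected when staff traffic is a meaningful part of the sample.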

Confirm stickiness. Reload the canary path several times in one browser session; the network panel should request the same versioned remoteEntry.js every time. A different version on reload means resolution is not session-sticky.

Force a rollback. Temporarily lower ABS_THRESHOLD or inject errors into the canary, then watch the flag flip and the canary share fall to zero on the next bootstrap:

curl -s https://flags.example.com/api/flags/checkout-remote-rollout | jq '.rolloutPercentage'
# 0   <- watcher tripped; new sessions resolve to stable

Troubleshooting

Flag flicker causing mixed versions in one session. Symptom: a user loads v1 for one widget and v2 for another in the same visit, often crashing on shared singletons. Diagnosis: the cohort was re-resolved per component instead of once at bootstrap, or sessionId is regenerated per page load. Fix: resolve exactly once in initRollout(), persist sessionId, and have every loader read getCohort()—never re-fetch the flag mid-session.
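One way to make "resolve exactly once" hard to violate is to wrap the initializer so that every caller shares the same promise. The once helper below is a hypothetical sketch, not part of the bootstrap module shown earlier.

```typescript
// Wraps an async initializer so all callers share one in-flight or settled result.
function once<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) pending = init();
    return pending;
  };
}
```

A host could export const initRolloutOnce = once(initRollout) and call that from every entry point. Note that this also caches a rejection: if the first flag fetch fails, later callers see the same failure for the rest of the session, which is consistent with never re-fetching mid-session but means failure handling has to happen inside the initializer.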

CDN or browser cache pinning a version. Symptom: you flip the flag to 0% but canary traffic continues for minutes or hours. Diagnosis: the flag JSON is being served from cache. Fix: serve the flag endpoint with Cache-Control: no-store or a very short TTL, keep the client fetch from reusing a cached copy (Step 3), and version the remoteEntry URLs so the canary script is immutable while the flag itself stays fresh. This pairs with CDN cache invalidation for federated remotes.
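The split between a never-cached flag and immutable versioned scripts can be expressed as a small policy function on the server or CDN edge. This is a sketch under assumptions: the path patterns follow the example URLs in this article, cacheControlFor is a hypothetical name, and the one-year max-age is a common convention rather than a requirement.

```typescript
// Picks a Cache-Control value by request path (hypothetical path conventions).
function cacheControlFor(path: string): string {
  // Flag JSON must always be fresh so a rollback takes effect immediately.
  if (path.startsWith('/api/flags/')) return 'no-store';

  // Versioned remote files never change, so they can be cached aggressively.
  if (/^\/remotes\/[^/]+\/\d+\.\d+\.\d+\//.test(path)) {
    return 'public, max-age=31536000, immutable';
  }

  // Everything else may be stored but is revalidated before reuse.
  return 'no-cache';
}
```

With this policy, rolling back only requires the flag response to change; the canary script can stay in caches because nothing resolves to its URL any more.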

SSR/CSR flag mismatch causing hydration errors. Symptom: React hydration warnings and a flash of the wrong remote when the server renders one cohort and the client resolves another. Diagnosis: the server and client bucketed differently because they used different sessionId sources. Fix: resolve the cohort once on the server, serialize it into the HTML (e.g. window.__COHORT__), and have the client read that serialized value instead of re-bucketing.
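The serialize-then-read handoff might look like the following. It is a minimal sketch: the function names are hypothetical, the global name window.__COHORT__ comes from the fix described above, and the globals object is passed in so the reader can be tested without a browser.

```typescript
interface SerializedCohort {
  name: 'stable' | 'canary';
  version: string;
  remoteEntry: string;
}

// Server side: render an inline script that publishes the resolved cohort.
// "<" is escaped so no value can close the script element early.
function cohortScriptTag(cohort: SerializedCohort): string {
  const json = JSON.stringify(cohort).replace(/</g, '\\u003c');
  return `<script>window.__COHORT__=${json}</script>`;
}

// Client side: read the published value; null when absent or malformed.
function readSerializedCohort(globals: Record<string, unknown>): SerializedCohort | null {
  const raw = globals['__COHORT__'];
  if (typeof raw !== 'object' || raw === null) return null;
  const v = raw as Record<string, unknown>;
  const name = v.name === 'canary' ? 'canary' : v.name === 'stable' ? 'stable' : null;
  const version = v.version;
  const remoteEntry = v.remoteEntry;
  if (name === null || typeof version !== 'string' || typeof remoteEntry !== 'string') {
    return null;
  }
  return { name, version, remoteEntry };
}
```

On the client, a null result is the signal that the server did not resolve a cohort, and the host can decide whether to fall back to client-side resolution or to the stable version.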

Rollback fires on early noise. Symptom: the canary trips within seconds of release on a handful of sessions. Diagnosis: MIN_SESSIONS is too low or the watcher window is too short. Fix: require a meaningful sample before evaluating and compare against the live stable baseline rather than a fixed number, as shown in Step 5.