Error Boundary Telemetry for Remote Apps #

When a single federated remote throws during render, the default behavior is brutal: React unmounts the whole tree and the host goes blank — so a bug owned by one team takes down every other team’s UI. This guide shows how to wrap each remote in an error boundary that contains the crash, renders a graceful fallback, and emits structured telemetry tagging exactly which remote and which remoteEntry version failed, so you can attribute and roll back fast.

Prerequisites #

The mental model: the boundary is a circuit breaker around each remote. The crash is contained, the host stays interactive, and a beacon carries the attribution out.

Error boundary containment and telemetry flow The host shell stays interactive while a crashing remote is contained by its boundary, which renders a fallback and sends a structured beacon to a telemetry sink. Host shell — stays interactive Remote A (ok) renders normally Error Boundary Remote B throws ✕ Fallback UI "Couldn't load" + Retry Telemetry sink remote + version + rate beacon
One remote's crash is caught by its boundary, which shows a fallback and beacons attribution — the host and sibling remotes keep running.

Step 1 — A telemetry-aware error boundary #

A class component is the only thing that can catch a child’s render error. The key design choice is that the boundary is parameterized by the remote it wraps — its remoteName and version props become the attribution tags on every event.

// RemoteErrorBoundary.tsx
import { Component, type ErrorInfo, type ReactNode } from 'react';
import { reportRemoteError } from './telemetry';

interface Props {
  remoteName: string;
  version: string;
  fallback: (retry: () => void) => ReactNode;
  children: ReactNode;
}
interface State {
  hasError: boolean;
}

export class RemoteErrorBoundary extends Component<Props, State> {
  state: State = { hasError: false };

  static getDerivedStateFromError(): State {
    return { hasError: true };
  }

  componentDidCatch(error: Error, info: ErrorInfo) {
    reportRemoteError({
      remote: this.props.remoteName,
      version: this.props.version,
      message: error.message,
      stack: error.stack,
      componentStack: info.componentStack ?? undefined,
    });
  }

  retry = () => this.setState({ hasError: false });

  render() {
    if (this.state.hasError) return this.props.fallback(this.retry);
    return this.props.children;
  }
}

getDerivedStateFromError flips into the fallback state; componentDidCatch is where the side effect — telemetry — lives. Capturing info.componentStack is what lets you see which mounted component inside the remote blew up, not just the error message.

Step 2 — Tag events with the remote name and remoteEntry version #

Attribution is the whole point. An untagged error tells you something broke; a tagged one tells you [email protected] broke, which is what drives a rollback decision. Define the event shape once and emit it to the sink.

// telemetry.ts
export interface RemoteErrorEvent {
  remote: string;
  version: string;
  message: string;
  stack?: string;
  componentStack?: string;
  kind?: 'render' | 'chunk-load';
}

const ENDPOINT = '/beacon/remote-error';

export function reportRemoteError(evt: RemoteErrorEvent) {
  const payload = JSON.stringify({
    ...evt,
    kind: evt.kind ?? 'render',
    href: location.href,
    ts: Date.now(),
  });

  // Sentry-style SDK path (preferred when present)
  if (window.Sentry) {
    window.Sentry.captureException(new Error(evt.message), {
      tags: { remote: evt.remote, remote_version: evt.version, kind: evt.kind },
      extra: { componentStack: evt.componentStack },
    });
    return;
  }

  // Custom beacon path — survives page unload
  navigator.sendBeacon(ENDPOINT, new Blob([payload], { type: 'application/json' }));
}

navigator.sendBeacon is deliberate: it fires even if the user navigates away mid-crash, where a fetch would be cancelled. The Sentry path uses tags (indexed, filterable) for remote and remote_version so you can group error rate per remote in the dashboard.

Step 3 — Wrap each remote with its version #

Lazy-load the remote and feed the boundary the metadata. The version comes from a build-time define so it always matches the deployed remoteEntry.js.

// MountCheckout.tsx
import { lazy, Suspense } from 'react';
import { RemoteErrorBoundary } from './RemoteErrorBoundary';

declare const __CHECKOUT_VERSION__: string; // injected by DefinePlugin

const Checkout = lazy(() => import('checkout/App'));

export function MountCheckout() {
  return (
    <RemoteErrorBoundary
      remoteName="checkout"
      version={__CHECKOUT_VERSION__}
      fallback={(retry) => (
        <div role="alert" className="remote-fallback">
          <p>Checkout is temporarily unavailable.</p>
          <button onClick={retry}>Try again</button>
        </div>
      )}
    >
      <Suspense fallback={<div>Loading checkout…</div>}>
        <Checkout />
      </Suspense>
    </RemoteErrorBoundary>
  );
}

Inject the version in the remote’s webpack config so it’s baked into the host’s view of that remote:

// host webpack.config.js (snippet)
const { DefinePlugin } = require('webpack');

plugins: [
  new DefinePlugin({
    __CHECKOUT_VERSION__: JSON.stringify(process.env.CHECKOUT_VERSION ?? 'dev'),
  }),
];

Step 4 — Handle async chunk-load failures with retry #

A render error boundary does not catch a failed import() of remoteEntry.js (a CDN blip, a stale hashed chunk, a deploy mid-flight). Those reject the promise as a ChunkLoadError. Wrap the dynamic import so it retries with backoff, and report the failure with kind: 'chunk-load'.

// loadRemote.ts
import { reportRemoteError } from './telemetry';

export function loadRemote<T>(
  remote: string,
  version: string,
  importer: () => Promise<T>,
  retries = 2,
): Promise<T> {
  return importer().catch((err: Error) => {
    const isChunkError =
      err.name === 'ChunkLoadError' || /Loading chunk|dynamically imported/.test(err.message);

    if (isChunkError && retries > 0) {
      const delay = (3 - retries) * 400; // 0ms, 400ms, 800ms
      return new Promise<T>((resolve) =>
        setTimeout(() => resolve(loadRemote(remote, version, importer, retries - 1)), delay),
      );
    }

    reportRemoteError({
      remote,
      version,
      message: err.message,
      stack: err.stack,
      kind: 'chunk-load',
    });
    throw err; // let the boundary render the fallback
  });
}

Use it in the lazy call so retries happen transparently and the boundary only sees the final failure:

const Checkout = lazy(() =>
  loadRemote('checkout', __CHECKOUT_VERSION__, () => import('checkout/App')),
);

Re-throwing after the retries are exhausted is what lets Suspense reject and the surrounding boundary catch it — so chunk-load failures and render failures funnel into the same fallback and the same telemetry shape.

Step 5 — Aggregate per-remote error rate to drive rollback #

A single error is noise; a rate is a signal. Have the beacon endpoint (or a Sentry alert) compute errors-per-session per remote@version. The decision rule is simple: if a freshly deployed version’s error rate crosses a threshold above baseline, roll it back.

// beacon-aggregator.ts (server side, illustrative)
const counts = new Map<string, { errors: number; sessions: Set<string> }>();

export function record(evt: { remote: string; version: string; session: string }) {
  const key = `${evt.remote}@${evt.version}`;
  const bucket = counts.get(key) ?? { errors: 0, sessions: new Set() };
  bucket.errors += 1;
  bucket.sessions.add(evt.session);
  counts.set(key, bucket);

  const rate = bucket.errors / bucket.sessions.size;
  if (bucket.sessions.size >= 50 && rate > 0.05) {
    triggerRollback(evt.remote, evt.version); // pin host back to previous remoteEntry
  }
}

The sessions.size >= 50 guard prevents a single flaky client from triggering a rollback. Tie triggerRollback into the same mechanism that points the host at a remote version — for the broader picture of correlating these failures across the system, see distributed tracing across micro-frontends.

Verification #

Force a remote to throw and confirm two things: the host survives, and an event lands in your sink.

  1. Add a throw switch inside the remote, gated on a query param:
// inside checkout/App.tsx
if (new URLSearchParams(location.search).has('boom')) {
  throw new Error('forced checkout crash');
}
  1. Load the host with ?boom and verify in DevTools:
✓ Host header/nav still rendered and clickable (no white screen)
✓ Checkout area shows "Checkout is temporarily unavailable" + Try again
✓ Network tab: POST /beacon/remote-error  (or Sentry envelope request)
✓ Payload includes remote:"checkout", version:"2.4.1", componentStack present
  1. Assert it in a test using a throwing child:
import { render, screen } from '@testing-library/react';
import { RemoteErrorBoundary } from './RemoteErrorBoundary';

const Boom = () => { throw new Error('kaboom'); };

test('contains crash and reports attribution', () => {
  render(
    <RemoteErrorBoundary remoteName="checkout" version="2.4.1" fallback={() => <p>down</p>}>
      <Boom />
    </RemoteErrorBoundary>,
  );
  expect(screen.getByText('down')).toBeInTheDocument();
});

Spy on reportRemoteError in the test to assert the remote and version tags are present.

Troubleshooting #

Boundary doesn’t catch the failure. React error boundaries only catch errors thrown during render, lifecycle, and constructors of descendants. They miss errors in event handlers, in setTimeout/Promise callbacks, and — crucially — the chunk-load rejection of import() itself. Route async work through loadRemote (Step 4) so the rejection surfaces through Suspense, and wrap event-handler logic in try/catch that calls reportRemoteError directly.

Events arrive with no remote attribution. Usually the version is 'dev' or undefined in production because DefinePlugin wasn’t given the deployed value, or the same generic <ErrorBoundary> wraps several remotes so every event looks identical. Give each remote its own boundary instance with its own remoteName/version, and confirm process.env.CHECKOUT_VERSION is set in the build pipeline, not just locally.

Error storms flood the sink. A remote that throws on every render can emit hundreds of beacons per second. Add a client-side cap (e.g. a per-session counter that stops after N events for the same remote@version) and a server-side rate limiter keyed on remote@version. Sampling at the SDK level (sampleRate) also keeps cost bounded while preserving the rate signal.

Retry loops on a permanently broken chunk. If remoteEntry.js is genuinely gone (deleted by a CDN purge), retries just delay the inevitable. Keep retries small (2 is plenty), use exponential-ish backoff, and make the final fallback actionable — a real “reload” button beats an infinite spinner.