Deployment & Observability for Micro-Frontends #

Independent deployment is the headline benefit of micro-frontends, and the source of their hardest operational failures. The moment a remote ships on its own schedule, the assumptions baked into monolithic release and monitoring tooling quietly break. This guide sits under Core Micro-Frontend Architecture Tradeoffs and covers the deployment and observability disciplines that keep autonomous remotes from turning into an undebuggable distributed system: distributed tracing across micro-frontends, so a single user action stays correlated as it crosses host and remote boundaries; error boundary telemetry for remote apps, so a crashing remote reports itself instead of silently dragging the page down; CDN cache invalidation for federated remotes, so a fresh deploy is actually served; and progressive rollout with feature flags for remotes, so a bad version reaches one percent of users instead of all of them.

What Breaks When Remotes Deploy Independently #

In a monolith, a deploy is atomic. One artifact ships, one version runs, one source map resolves every stack trace, and one cache-busting hash invalidates everything at once. Federation discards all four guarantees simultaneously.

Consider the failure mode in concrete terms. A user clicks “Checkout” in the host shell. The host renders a checkout remote, which calls a payments remote, which reads a token managed by an auth remote. The payment fails. Your monitoring shows a 500 from one backend, an unhandled rejection in the browser console, and nothing connecting the two — because each remote was built by a different team, deployed at a different time, and reports to its own dashboard with its own request IDs.

The pain compounds across four axes:

No correlation. Without a shared trace context, you cannot prove that the checkout error and the auth latency spike belong to the same user journey. Each remote is an island of telemetry.
Invisible failures. A remote that fails to load (404 on its remoteEntry.js, or a chunk-load error mid-session) often produces a blank region rather than a logged, attributable event.
Stale code in production. You deploy a fix, the CDN keeps serving the old remoteEntry.js for hours, and you “confirm” the bug is fixed against cached bytes.
All-or-nothing blast radius. A regression in one remote hits 100% of traffic the instant the manifest pointer flips, with no graduated exposure and no automatic way back.

Solving these is not optional polish. It is the cost of admission for independent deployment, and it is governed by how you treat versioning. The remote contract you ship — covered in versioning strategies for remote apps — determines whether a rollback is a one-line manifest revert or a coordinated firefight.

Figure: One root span in the host flows as W3C traceparent into each remote; all spans and errors land in a shared collector under one trace id, while the CDN serves immutable, versioned entry files.

Key Objectives #

A deployment and observability strategy for federated remotes should deliver:

End-to-end trace correlation — every span from host and remotes shares one trace id, so a user journey is reconstructable.
Attributable failures — every remote load failure and runtime crash emits a structured, owner-tagged event, not a blank <div>.
Deterministic cache behavior — a deploy is guaranteed to be served, with no manual purge guesswork.
Graduated exposure — new remote versions reach a slice of traffic first, gated by flags, with automated rollback on regression.
Pipeline independence — a remote ships without rebuilding the host, while still passing contract and smoke checks.

CI/CD Pipeline Shape for Independent Remote Deploys #

The pipeline’s job is to publish a new, immutably-named remoteEntry.js to the CDN and then atomically flip a manifest pointer that the host resolves at runtime. The host never rebuilds. This decoupling is what the remotes map in your Module Federation configuration makes possible — the host resolves the remote URL at runtime, so swapping the bytes behind that URL is a deploy.

Here is an annotated GitHub Actions pipeline for a single remote. Note where tracing, cache headers, flagging, and rollback hooks attach.

name: deploy-remote-checkout
on:
  push:
    branches: [main]

concurrency:
  group: deploy-checkout       # serialize deploys of THIS remote only
  cancel-in-progress: false    # never cancel a deploy mid-flight

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm

      - run: npm ci

      # Inject the build version into the bundle so spans/errors can tag it.
      - name: Set version
        run: echo "REMOTE_VERSION=${GITHUB_SHA::8}" >> "$GITHUB_ENV"

      # Output lands under a version-scoped path (see PUBLIC_PATH), so remoteEntry.js
      # and its chunks are immutable once uploaded.
      - name: Build
        run: npm run build
        env:
          REMOTE_VERSION: ${{ env.REMOTE_VERSION }}
          PUBLIC_PATH: https://cdn.example.com/checkout/${{ env.REMOTE_VERSION }}/

      # Contract test: assert exposed module surface still matches the host's expectation
      # before anything reaches the CDN. A breaking change fails the deploy here.
      - name: Verify remote contract
        run: npm run test:contract

      # Upload hashed assets with a long, immutable cache; never the manifest yet.
      - name: Upload assets to CDN
        run: |
          aws s3 sync dist/ s3://mfe-cdn/checkout/${REMOTE_VERSION}/ \
            --cache-control "public,max-age=31536000,immutable" \
            --exclude "manifest.json"

      # Smoke test the freshly uploaded entry in isolation (loads, exposes modules).
      - name: Smoke test deployed entry
        run: npm run test:smoke -- --url "https://cdn.example.com/checkout/${REMOTE_VERSION}/remoteEntry.js"

      # Register the version behind a flag — 0% traffic until rollout begins.
      - name: Register version with flag service
        run: |
          curl -fsS -X POST "$FLAG_API/remotes/checkout/versions" \
            -H "Authorization: Bearer $FLAG_TOKEN" \
            -H "Content-Type: application/json" \
            -d "{\"version\":\"${REMOTE_VERSION}\",\"rollout\":0}"
        env:
          FLAG_API: ${{ secrets.FLAG_API }}
          FLAG_TOKEN: ${{ secrets.FLAG_TOKEN }}

      # Flip the pointer LAST, with a short TTL so a rollback propagates fast.
      - name: Publish manifest pointer
        run: |
          echo "{\"checkout\":\"${REMOTE_VERSION}\"}" > manifest.json
          aws s3 cp manifest.json s3://mfe-cdn/checkout/manifest.json \
            --cache-control "public,max-age=30,must-revalidate"

The two cache policies are the crux. Hashed assets get max-age=31536000,immutable because their content can never change under a given URL. The small manifest.json that maps a remote name to its current version gets a short TTL (max-age=30) so that flipping or reverting the pointer reaches users in seconds. This separation — covered in depth in CDN cache invalidation for federated remotes — gives you both aggressive caching and fast rollback.
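As a rough sketch of the host side of this split, the helper below turns the short-lived manifest into the immutable entry URL. The function name, the `EntryManifest` type, and the CDN base URL are illustrative assumptions; the URL layout simply mirrors the upload path used in the pipeline above.

```typescript
// Hypothetical host-side helper. The layout <cdn>/<remote>/<version>/remoteEntry.js
// is an assumption that mirrors the pipeline's upload path.
type EntryManifest = Record<string, string>;

function remoteEntryUrl(cdnBase: string, remote: string, manifest: EntryManifest): string {
  const version = manifest[remote];
  if (!version) {
    throw new Error(`No version published for remote "${remote}"`);
  }
  const base = cdnBase.replace(/\/+$/, ''); // tolerate a trailing slash
  return `${base}/${remote}/${version}/remoteEntry.js`;
}

// The manifest is the only short-lived fetch; the URL it yields never changes content.
const entryUrl = remoteEntryUrl('https://cdn.example.com/', 'checkout', { checkout: 'a1b2c3d4' });
console.log(entryUrl); // https://cdn.example.com/checkout/a1b2c3d4/remoteEntry.js
```

Only the manifest fetch needs to respect the 30-second TTL; once a version string is known, everything derived from it can be cached for as long as the browser and CDN allow.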

Rollback is a Pointer Revert #

Because the host resolves remotes through the manifest at runtime, a rollback is one operation: rewrite manifest.json to the previous version string. The old immutable assets are still on the CDN, so the previous version is instantly serviceable. No rebuild, no re-upload — just flip the pointer back and wait out the 30-second TTL.
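A minimal sketch of the revert itself, assuming you keep an ordered log of deployed version strings somewhere (the article does not prescribe where; `deployLog` and `revertPointer` are hypothetical names):

```typescript
// Hypothetical rollback helper. `deployLog` (oldest to newest) is an assumption:
// any store that remembers previously published versions would do.
type PointerManifest = Record<string, string>;

function revertPointer(
  manifest: PointerManifest,
  remote: string,
  deployLog: readonly string[],
): PointerManifest {
  const current = manifest[remote];
  if (current === undefined) {
    throw new Error(`Manifest has no entry for "${remote}"`);
  }
  const idx = deployLog.lastIndexOf(current);
  const previous = idx > 0 ? deployLog[idx - 1] : undefined;
  if (previous === undefined) {
    throw new Error(`No earlier version of "${remote}" to roll back to`);
  }
  // Return a new object; the caller uploads it with the short-TTL cache policy.
  return { ...manifest, [remote]: previous };
}

const reverted = revertPointer({ checkout: 'bbbb2222' }, 'checkout', ['aaaa1111', 'bbbb2222']);
console.log(JSON.stringify(reverted)); // {"checkout":"aaaa1111"}
```

The function only computes the new manifest; publishing it is the same upload step the pipeline already uses for the pointer.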

Correlating Traces Across Host and Remotes #

The unit of correlation is the W3C traceparent, a single header/field format every remote can agree on regardless of framework. The host starts a root span when a user journey begins, and each remote either continues that trace (for in-page operations) or propagates the context onto its outbound fetch calls so the backend trace stitches to the frontend one.
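To make the format concrete, here is a small sketch that parses and formats a version-00 traceparent value (`00-<32 hex trace id>-<16 hex span id>-<2 hex flags>`). In practice an OpenTelemetry propagator does this for you; the helper names below are illustrative, and the sketch deliberately rejects anything other than version 00.

```typescript
interface TraceContext {
  traceId: string; // 32 lowercase hex chars, shared by every span in the journey
  spanId: string; // 16 lowercase hex chars, unique per span
  sampled: boolean;
}

// Only version 00 is accepted in this sketch.
const TRACEPARENT_V00 = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceContext | null {
  const match = TRACEPARENT_V00.exec(header.trim());
  if (!match) return null;
  const [, traceId, spanId, flags] = match;
  if (traceId === undefined || spanId === undefined || flags === undefined) return null;
  // All-zero ids are invalid per the W3C Trace Context spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? '01' : '00'}`;
}

// A child span keeps the trace id and sampling decision, and gets its own span id.
function childContext(parentCtx: TraceContext, newSpanId: string): TraceContext {
  return { ...parentCtx, spanId: newSpanId };
}

const hostCtx = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
if (hostCtx) {
  const remoteCtx = childContext(hostCtx, 'b7ad6b7169203331');
  console.log(formatTraceparent(remoteCtx));
  // 00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01
}
```

The point of the example is the invariant: the trace id is copied unchanged from host to remote to backend, and only the span id changes at each hop.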

The mechanism is a shared tracing context exposed by the host and consumed by remotes. Because all remotes share a single tracer instance (the same singleton discipline you use for avoiding bundle duplication of React), spans nest correctly instead of starting orphaned traces.

// host/tracing.ts: initialized once, shared with every remote via the share scope.
import {
  context,
  trace,
  propagation,
  SpanStatusCode,
  type Span,
} from '@opentelemetry/api';

// Build-time constant (for example, injected by a bundler define plugin).
declare const __HOST_VERSION__: string;

const tracer = trace.getTracer('mfe-host', __HOST_VERSION__);

// Remotes call this to wrap any operation in a child span of the active trace.
export function withRemoteSpan<T>(
  remoteName: string,
  remoteVersion: string,
  fn: (span: Span) => Promise<T>,
): Promise<T> {
  const span = tracer.startSpan(`remote:${remoteName}`, {
    attributes: {
      'mfe.remote.name': remoteName,
      'mfe.remote.version': remoteVersion, // the short GITHUB_SHA from the pipeline
    },
  });
  // Run fn inside this span's context so nested spans + fetches inherit it.
  // In browsers the context typically survives an await only when a zone-aware
  // context manager is registered; otherwise call tracedFetch before the first await.
  return context.with(trace.setSpan(context.active(), span), async () => {
    try {
      return await fn(span);
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR }); // mark the span failed, not just annotated
      throw err;
    } finally {
      span.end();
    }
  });
}

// A fetch wrapper that injects W3C traceparent into outbound requests so the
// backend trace joins the same trace id as the frontend span.
export async function tracedFetch(input: RequestInfo, init: RequestInit = {}) {
  const headers = new Headers(init.headers);
  propagation.inject(context.active(), headers, {
    set: (carrier, key, value) => (carrier as Headers).set(key, value),
  });
  return fetch(input, { ...init, headers });
}

A remote consumes this without importing its own tracer:

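// checkout-remote/submit.ts
import { withRemoteSpan, tracedFetch } from 'host/tracing'; // from the share scope

// Injected at build time from the pipeline's REMOTE_VERSION.
declare const __REMOTE_VERSION__: string;

// Minimal shape for illustration; the real payload type lives with the checkout team.
interface OrderPayload {
  items: unknown[];
}

export function submitOrder(payload: OrderPayload) {
  return withRemoteSpan('checkout', __REMOTE_VERSION__, async (span) => {
    span.setAttribute('order.items', payload.items.length);
    const res = await tracedFetch('/api/orders', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    });
    span.setAttribute('http.status_code', res.status);
    // Surface HTTP failures as exceptions so withRemoteSpan records them on the span.
    if (!res.ok) throw new Error(`Order submission failed with status ${res.status}`);
    return res.json();
  });
}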

Now the checkout span, the auth span, and the backend POST /api/orders span all carry the same trace id. The full pattern — sampling, context propagation across lazy-loaded remotes, and avoiding double instrumentation — is detailed in distributed tracing across micro-frontends.

Error Boundary Telemetry #

A traced happy path is half the story. The other half is making failures loud and attributable. Two failure classes matter: a remote that fails to load (network 404, chunk-load error, integrity mismatch) and a remote that throws after mounting.

Wrap every remote mount in an error boundary that reports to the same collector, tagged with the remote’s name and version so the right team is paged.

// host/RemoteBoundary.tsx
import { Component, type ErrorInfo, type ReactNode } from 'react';
import { trace } from '@opentelemetry/api';

interface Props { remote: string; version: string; children: ReactNode; }
interface State { failed: boolean; }

export class RemoteBoundary extends Component<Props, State> {
  state: State = { failed: false };

  static getDerivedStateFromError(): State {
    return { failed: true };
  }

  // Use React's own ErrorInfo type; componentStack is optional there.
  componentDidCatch(error: Error, info: ErrorInfo) {
    // May be undefined if no span is active when React invokes this hook.
    const span = trace.getActiveSpan();
    span?.recordException(error);
    // Attribute the failure: which remote, which version, where in the tree.
    navigator.sendBeacon('/telemetry/errors', JSON.stringify({
      kind: 'remote_runtime_error',
      remote: this.props.remote,
      version: this.props.version,
      message: error.message,
      stack: error.stack,
      componentStack: info.componentStack,
      traceId: span?.spanContext().traceId,
    }));
  }

  render() {
    if (this.state.failed) {
      return <div role="alert">This section is temporarily unavailable.</div>;
    }
    return this.props.children;
  }
}

The boundary degrades gracefully — the host keeps working while one region shows a fallback — and it converts a silent blank space into a structured event carrying the trace id. Tying error volume to a specific version is also what makes automated rollback possible: a spike in remote_runtime_error for version a1b2c3d4 is the rollback signal. The deeper treatment, including load-failure telemetry and source-map symbolication per remote, is in error boundary telemetry for remote apps.
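The boundary above tags everything as a runtime error. One way to cover the load-failure class as well is to wrap the remote's loader so a rejected import is reported before it reaches the boundary. This is a sketch under assumptions: the `remote_load_error` event name, the `RemoteLoadFailure` shape, and the injected `report` callback are all hypothetical, chosen to parallel the `remote_runtime_error` payload.

```typescript
interface RemoteLoadFailure {
  kind: 'remote_load_error'; // hypothetical name, parallel to remote_runtime_error
  remote: string;
  version: string;
  message: string;
  traceId: string | null;
}

function toLoadFailureEvent(
  remote: string,
  version: string,
  err: unknown,
  traceId: string | null = null,
): RemoteLoadFailure {
  const message = err instanceof Error ? err.message : String(err);
  return { kind: 'remote_load_error', remote, version, message, traceId };
}

// Wraps any loader (for example a dynamic import of the remote's exposed module).
async function loadRemoteWithTelemetry<T>(
  meta: { remote: string; version: string },
  loader: () => Promise<T>,
  report: (failure: RemoteLoadFailure) => void,
): Promise<T> {
  try {
    return await loader();
  } catch (err) {
    report(toLoadFailureEvent(meta.remote, meta.version, err));
    throw err; // rethrow so an enclosing error boundary still renders its fallback
  }
}

const sample = toLoadFailureEvent('checkout', 'a1b2c3d4', new Error('Loading chunk 42 failed'));
console.log(sample.kind, sample.message);
```

If the remote is mounted through `React.lazy`, passing a wrapped loader like this means the rejection is reported once as a load failure and then still surfaces to the boundary, which renders the fallback.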

CDN Cache Invalidation & Versioned remoteEntry #

The cache strategy follows directly from the pipeline. Two rules prevent the “fix deployed against cached bytes” failure mode:

Content-address everything except the pointer. Every asset, including the remote entry, lives under a version-scoped path (checkout/<version>/remoteEntry.js in the pipeline above) and is served immutable. Its bytes never change, so it can cache forever and rollback is just re-pointing.
Keep the manifest pointer tiny and short-lived. The only mutable object is manifest.json, served with a 30-second TTL. That is your invalidation surface — flip it, wait the TTL, done. You almost never issue a CDN purge.

The classic mistake is caching remoteEntry.js itself with a long TTL at a stable URL. Then a deploy uploads new bytes, edges that already hold the old copy keep serving it until the TTL expires, and edge nodes disagree for hours. With versioned paths and an immutable policy, a deploy never overwrites a URL that is already cached, so that failure cannot occur. Edge cases (surrogate keys, multi-CDN consistency, and import-map variants) are covered in CDN cache invalidation for federated remotes.
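The two rules can be enforced mechanically at upload time. The sketch below picks a Cache-Control value from the object key and refuses anything mutable that sits outside a version path. The key patterns assume the layout used in this guide (`<remote>/<8-hex-version>/...` and `<remote>/manifest.json`); the function and constant names are illustrative.

```typescript
const IMMUTABLE_POLICY = 'public,max-age=31536000,immutable';
const POINTER_POLICY = 'public,max-age=30,must-revalidate';

// Assumed key layout: <remote>/<8 hex chars>/<anything> for versioned assets.
const VERSIONED_KEY = /^[a-z0-9-]+\/[0-9a-f]{8}\/.+$/;
const POINTER_KEY = /^[a-z0-9-]+\/manifest\.json$/;

function cacheControlFor(objectKey: string): string {
  if (POINTER_KEY.test(objectKey)) return POINTER_POLICY;
  if (VERSIONED_KEY.test(objectKey)) return IMMUTABLE_POLICY;
  // A stable-URL remoteEntry.js would land here: the classic mistake.
  throw new Error(`Refusing to publish "${objectKey}": mutable asset outside a version path`);
}

console.log(cacheControlFor('checkout/a1b2c3d4/remoteEntry.js')); // public,max-age=31536000,immutable
console.log(cacheControlFor('checkout/manifest.json')); // public,max-age=30,must-revalidate
```

Failing the upload for an unversioned key is a design choice for the sketch; a gentler variant could fall back to a no-cache policy instead.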

Feature-Flagged Progressive Rollout & Automated Rollback #

Even with perfect tracing, telemetry, and caching, flipping 100% of traffic to a new remote is a gamble. Progressive rollout converts a binary deploy into a dial. The pipeline registered the new version at rollout: 0; rollout is the act of turning that dial up while watching telemetry.
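The dial only works if a given user lands on the same side of it every time. A common way to get that is deterministic bucketing: hash the user id into one of 100 buckets and compare the bucket with the rollout percentage. The sketch below uses an FNV-1a style hash over UTF-16 code units; the hash choice and function names are assumptions for illustration, not what any particular flag service does.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units (an illustrative choice).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned
}

// Salting with the remote name keeps one user's buckets independent across remotes.
function rolloutBucket(remote: string, userId: string): number {
  return fnv1a32(`${remote}:${userId}`) % 100;
}

function isInRollout(remote: string, userId: string, rolloutPercent: number): boolean {
  return rolloutBucket(remote, userId) < rolloutPercent;
}

// Turning the dial up only ever adds users; nobody already exposed drops out.
let exposedAtTen = 0;
for (let i = 0; i < 1000; i++) {
  if (isInRollout('checkout', `user-${i}`, 10)) exposedAtTen++;
}
console.log(`exposed at 10%: ${exposedAtTen} of 1000 synthetic users`);
```

Because the comparison is `bucket < rolloutPercent`, raising the percentage is monotonic: a user who saw the new version at 10% still sees it at 50%, which keeps sessions stable during a rollout.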

The host resolves which version to load per user from the flag service, falling back to the last-known-good version if the flag service is unreachable:

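// host/resolveRemoteVersion.ts
export async function resolveRemoteVersion(remote: string, userId: string): Promise<string> {
  try {
    const res = await fetch(
      `/flags/remotes/${encodeURIComponent(remote)}?u=${encodeURIComponent(userId)}`,
      { cache: 'no-store' },
    );
    // fetch only rejects on network failure, so treat HTTP errors as "flag service down".
    if (!res.ok) throw new Error(`Flag service responded ${res.status}`);
    const { version } = await res.json(); // service buckets the user by rollout %
    if (typeof version !== 'string') throw new Error('Flag response missing version');
    return version;
  } catch {
    // Flag service down: serve the stable manifest pointer, never block render.
    // Each remote publishes its own pointer at /<remote>/manifest.json.
    const manifest = await fetch(`/${remote}/manifest.json`).then((r) => r.json());
    return manifest[remote];
  }
}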

Automated rollback closes the loop. Wire an alerting rule against the telemetry you already emit — error rate for the candidate version, or a drop in a conversion span — and have it call the flag API to set rollout back to 0. Because version selection is runtime and the old immutable assets are still on the CDN, the rollback takes effect on the next page load with no rebuild. The bucketing strategy, canary cohorts, and metric thresholds are detailed in progressive rollout with feature flags for remotes.
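The alerting rule can be as simple as a pure function over version-tagged counters. The sketch below is one possible shape: the `VersionStats` and `RollbackPolicy` types and the threshold values in the usage lines are illustrative assumptions, not recommended numbers.

```typescript
interface VersionStats {
  requests: number; // sessions or remote mounts observed for this version
  errors: number; // remote_runtime_error events attributed to this version
}

interface RollbackPolicy {
  minRequests: number; // do not judge a candidate on too little traffic
  maxErrorRate: number; // absolute ceiling, e.g. 0.02 for 2%
  maxRelativeIncrease: number; // candidate rate may be at most N times baseline
}

function shouldRollback(
  candidate: VersionStats,
  baseline: VersionStats,
  policy: RollbackPolicy,
): boolean {
  if (candidate.requests < policy.minRequests) return false;
  const candidateRate = candidate.errors / candidate.requests;
  if (candidateRate > policy.maxErrorRate) return true;
  // Relative check only makes sense when the baseline has a non-zero error rate.
  if (baseline.requests > 0 && baseline.errors > 0) {
    const baselineRate = baseline.errors / baseline.requests;
    return candidateRate > baselineRate * policy.maxRelativeIncrease;
  }
  return false;
}

const policy: RollbackPolicy = { minRequests: 200, maxErrorRate: 0.02, maxRelativeIncrease: 3 };
const stable: VersionStats = { requests: 10000, errors: 100 }; // 1% baseline
console.log(shouldRollback({ requests: 1000, errors: 50 }, stable, policy)); // true
console.log(shouldRollback({ requests: 1000, errors: 10 }, stable, policy)); // false
```

When the function returns true, the caller would set the candidate's rollout back to 0 through the flag API. The minimum-traffic guard matters at low rollout percentages, where a handful of errors can otherwise look like a large rate.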

Testing & Validation #

Validate the observability story the same way you validate features — in the pipeline, before production:

Contract test the exposed module surface so a breaking change fails the deploy, not a user’s session.
Smoke test the deployed remoteEntry.js in isolation: it loads, exposes its modules, and initializes the share scope.
Trace assertion test in an integration environment: drive a host + remote flow and assert a single trace id spans both, with the expected mfe.remote.version attribute.
Telemetry test the error boundary: throw on purpose in a mounted remote and assert the beacon payload carries remote, version, and traceId.
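The trace assertion in particular is easy to express as a helper over whatever spans your test collector captured. The sketch below assumes a simplified `RecordedSpan` shape (a real exporter's span type will differ) and checks the two properties named above: one trace id, and a version-tagged span per remote.

```typescript
// Simplified span shape for the sketch; adapt to your test exporter's type.
interface RecordedSpan {
  name: string;
  traceId: string;
  attributes: Record<string, string | number | boolean>;
}

function assertSingleTrace(
  spans: readonly RecordedSpan[],
  expectedVersions: Record<string, string>, // remote name -> expected version
): void {
  if (spans.length === 0) throw new Error('No spans were recorded');
  const traceIds = new Set(spans.map((s) => s.traceId));
  if (traceIds.size !== 1) {
    throw new Error(`Expected one trace id, saw ${traceIds.size}`);
  }
  for (const [remote, version] of Object.entries(expectedVersions)) {
    const found = spans.some(
      (s) =>
        s.attributes['mfe.remote.name'] === remote &&
        s.attributes['mfe.remote.version'] === version,
    );
    if (!found) throw new Error(`No span tagged ${remote}@${version}`);
  }
}

// Passes silently when host and remote spans share a trace and carry the version.
assertSingleTrace(
  [
    { name: 'journey:checkout', traceId: 't1', attributes: {} },
    {
      name: 'remote:checkout',
      traceId: 't1',
      attributes: { 'mfe.remote.name': 'checkout', 'mfe.remote.version': 'a1b2c3d4' },
    },
  ],
  { checkout: 'a1b2c3d4' },
);
```

Throwing plain errors keeps the helper independent of any test framework; inside a runner you would call it from a test case after driving the host and remote flow.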

Common Pitfalls #

Issue	Root cause & resolution
Errors from one remote can’t be tied to a user journey	Each remote starts its own trace. Share a single tracer through the federation share scope and propagate W3C `traceparent` so spans nest under one trace id.
A deployed fix isn’t served; users still hit the bug	`remoteEntry.js` cached long-lived at a stable URL. Content-address assets under version paths with `immutable`; keep only a tiny manifest pointer short-lived.
Remote load failure shows a blank region, no alert	No load-failure telemetry. Wrap mounts in an error boundary that beacons a structured, version-tagged event and renders a fallback.
Rolling back means a rebuild and re-deploy	Remote version baked into the host build. Resolve remote versions at runtime via the manifest/flag service so rollback is a pointer revert.
New version breaks 100% of users instantly	No graduated exposure. Register versions at 0% behind a flag and dial up while watching version-tagged telemetry; automate rollback on threshold breach.
Can’t tell which remote version caused a regression	Spans and errors lack a version attribute. Inject the build SHA at build time and tag every span and error with `mfe.remote.version`.

FAQ #

Do all remotes need to use the same observability vendor?

No. They need to agree on a propagation format — W3C traceparent and tracestate — and report to a shared collector. The collector can fan out to different backends. What cannot vary is the trace context format, because that is the contract that lets spans from different teams join one trace.

How do I keep distributed tracing from bloating every remote’s bundle?

Expose a single tracing module from the host and mark it singleton in the share scope, exactly as you would React. Remotes import the host’s tracer instead of bundling their own OpenTelemetry instance, so instrumentation adds near-zero per-remote weight and avoids duplicate, conflicting tracers.

What is the safest order of operations in a remote deploy?

Build the remote under a version-scoped path, run the contract test, upload the immutable assets, smoke test the uploaded entry, register the version at 0% rollout, and only then flip the short-TTL manifest pointer. The pointer flip is last and cheap to revert, so the riskiest step is also the most reversible one.

How fast can a rollback take effect?

Roughly the manifest’s TTL — seconds, if you keep it at 30. Because version resolution happens at runtime and the previous immutable assets remain on the CDN, reverting the pointer (or setting the flag rollout to 0) is served on the next page load with no rebuild and no CDN purge.
