
Sharing State Across Node.js Cluster Workers with IPC Sockets


When you scale a Node.js application using cluster mode (or PM2), you spin up multiple worker processes to utilize all CPU cores. But there's a catch: each worker has its own isolated memory space. Your in-memory cache becomes useless.

In this post, I'll show you how to solve this using Unix domain sockets for inter-process communication (IPC), creating a shared cache that all workers can access.

The Problem with Cluster Mode Caching

You might think this would work:

// lib/cache.ts
const cache = new Map<string, any>();
 
export function get(key: string) {
  return cache.get(key);
}
 
export function set(key: string, value: any) {
  cache.set(key, value);
}

It won't, at least not across workers. When running in cluster mode:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Worker 1   │     │  Worker 2   │     │  Worker 3   │
│  cache: {}  │     │  cache: {}  │     │  cache: {}  │
└─────────────┘     └─────────────┘     └─────────────┘
     ↑ sets              ↑ gets              ↑ gets
   user:123            user:123            user:123
   = data              = undefined!        = undefined!

Worker 1 caches some data, but Workers 2 and 3 have completely separate Map instances. They'll never see each other's cached values. Every worker ends up fetching the same data independently.

The Solution: External Cache Process

The fix is to move the cache outside the worker processes entirely. We'll create a small daemon that:

  1. Runs as a separate process (independent of cluster workers)
  2. Stores cache data in memory
  3. Communicates via Unix domain sockets (fast, no network overhead)

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Worker 1   │     │  Worker 2   │     │  Worker 3   │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │ Unix Socket
                    ┌──────▼──────┐
                    │ IPC Cache   │
                    │   Server    │
                    └─────────────┘

Why Unix Domain Sockets?

  • Speed: No TCP/IP stack overhead, just direct kernel-level IPC
  • Simplicity: No ports to manage, just a file path
  • Security: Socket file permissions control access
  • Reliability: The OS handles connection management

Implementation

The Cache Server

First, let's create a simple cache server:

// cache-server.ts
import { createServer, Socket } from 'net';
import { existsSync, unlinkSync } from 'fs';
 
const SOCKET_PATH = '/tmp/app-cache.sock';
const cache = new Map<string, { value: any; expires: number }>();
 
// Clean up stale socket file
if (existsSync(SOCKET_PATH)) {
  unlinkSync(SOCKET_PATH);
}
 
const server = createServer((socket: Socket) => {
  // NOTE: assumes each 'data' event carries one complete JSON command.
  // That holds for small one-shot messages, but not for persistent connections.
  socket.on('data', (data) => {
    try {
      const { action, key, value, ttl } = JSON.parse(data.toString());
      let response: any;
 
      switch (action) {
        case 'get': {
          const entry = cache.get(key);
          if (entry && entry.expires > Date.now()) {
            response = { ok: true, value: entry.value };
          } else {
            if (entry) cache.delete(key); // Clean expired
            response = { ok: true, value: null };
          }
          break;
        }
        case 'set': {
          const expires = Date.now() + (ttl || 60000); // Default 60s TTL
          cache.set(key, { value, expires });
          response = { ok: true };
          break;
        }
        case 'delete': {
          cache.delete(key);
          response = { ok: true };
          break;
        }
        case 'clear': {
          cache.clear();
          response = { ok: true };
          break;
        }
        case 'stats': {
          response = { ok: true, size: cache.size };
          break;
        }
        default:
          response = { ok: false, error: 'Unknown action' };
      }
 
      // end() rather than write(): the client treats 'end' as end-of-response
      socket.end(JSON.stringify(response));
    } catch (err) {
      socket.end(JSON.stringify({ ok: false, error: 'Parse error' }));
    }
  });
});
 
server.listen(SOCKET_PATH, () => {
  console.log(`Cache server listening on ${SOCKET_PATH}`);
});
 
// Graceful shutdown
process.on('SIGTERM', () => {
  server.close();
  if (existsSync(SOCKET_PATH)) unlinkSync(SOCKET_PATH);
  process.exit(0);
});

The Cache Client

Now the client that your Next.js app will use:

// lib/ipc-cache.ts
import { createConnection, Socket } from 'net';
 
const SOCKET_PATH = '/tmp/app-cache.sock';
 
function sendCommand(command: object): Promise<any> {
  return new Promise((resolve, reject) => {
    const socket: Socket = createConnection(SOCKET_PATH);
    let data = '';
 
    socket.on('connect', () => {
      socket.write(JSON.stringify(command));
    });
 
    socket.on('data', (chunk) => {
      data += chunk.toString();
    });
 
    socket.on('end', () => {
      try {
        const response = JSON.parse(data);
        if (response.ok) {
          // Check with `in` rather than `??`: a null value (cache miss)
          // must come back as null, not as the whole response object
          resolve('value' in response ? response.value : response);
        } else {
          reject(new Error(response.error));
        }
      } catch {
        reject(new Error('Invalid response'));
      }
    });
 
    socket.on('error', (err) => {
      reject(err);
    });
 
    // Timeout after 1 second, cleared once the socket closes normally
    const timer = setTimeout(() => {
      socket.destroy();
      reject(new Error('Cache timeout'));
    }, 1000);
    socket.on('close', () => clearTimeout(timer));
  });
}
 
export const ipcCache = {
  async get<T>(key: string): Promise<T | null> {
    try {
      return await sendCommand({ action: 'get', key });
    } catch {
      return null; // Graceful fallback
    }
  },
 
  async set(key: string, value: any, ttl?: number): Promise<void> {
    try {
      await sendCommand({ action: 'set', key, value, ttl });
    } catch {
      // Silent fail - cache is optional
    }
  },
 
  async delete(key: string): Promise<void> {
    try {
      await sendCommand({ action: 'delete', key });
    } catch {
      // Silent fail
    }
  },
 
  async clear(): Promise<void> {
    try {
      await sendCommand({ action: 'clear' });
    } catch {
      // Silent fail
    }
  },
};

Usage in Your App

Now you can use it anywhere; every worker shares the same cache:

// routes/users.ts
import { ipcCache } from './lib/ipc-cache';
 
export async function getUser(id: string) {
  const cacheKey = `user:${id}`;
 
  // Try cache first - works across ALL workers
  const cached = await ipcCache.get<User>(cacheKey);
  if (cached !== null) { // explicit null check, so falsy values like 0 or '' still count as hits
    return cached;
  }
 
  // Fetch fresh data
  const user = await db.users.findById(id);
 
  // Cache for 5 minutes - available to all workers instantly
  await ipcCache.set(cacheKey, user, 5 * 60 * 1000);
 
  return user;
}

Running with Cluster Mode

Here's a complete example with Node.js cluster:

// server.ts
import cluster from 'cluster';
import { cpus } from 'os';
import { spawn } from 'child_process';
 
if (cluster.isPrimary) {
  // Start the cache server first
  const cacheServer = spawn('node', ['--import', 'tsx', 'cache-server.ts'], {
    stdio: 'inherit',
  });
 
  // Fork workers
  const numWorkers = cpus().length;
  for (let i = 0; i < numWorkers; i++) {
    cluster.fork();
  }
 
  cluster.on('exit', (worker) => {
    console.log(`Worker ${worker.process.pid} died, restarting...`);
    cluster.fork();
  });
 
  process.on('SIGTERM', () => {
    cacheServer.kill();
    process.exit(0);
  });
} else {
  // Worker process - start your app
  import('./app');
}

Or with PM2, add to your ecosystem.config.js:

module.exports = {
  apps: [
    {
      name: 'cache-server',
      script: 'cache-server.ts',
      interpreter: 'tsx',
      instances: 1, // Single instance!
    },
    {
      name: 'api',
      script: 'app.ts',
      interpreter: 'tsx',
      instances: 'max', // One per CPU
      exec_mode: 'cluster',
    },
  ],
};

Performance Considerations

Connection Pooling

For high-throughput scenarios, you might want to maintain persistent connections:

// lib/ipc-cache-pooled.ts
import { createConnection, Socket } from 'net';
 
const SOCKET_PATH = '/tmp/app-cache.sock';
const pool: Socket[] = [];
const POOL_SIZE = 5;
 
function getConnection(): Promise<Socket> {
  const socket = pool.pop();
  if (socket && !socket.destroyed) {
    return Promise.resolve(socket);
  }
 
  return new Promise((resolve, reject) => {
    const newSocket = createConnection(SOCKET_PATH);
    newSocket.on('connect', () => resolve(newSocket));
    newSocket.on('error', reject);
  });
}
 
function releaseConnection(socket: Socket) {
  if (!socket.destroyed && pool.length < POOL_SIZE) {
    pool.push(socket);
  } else {
    socket.destroy();
  }
}

Binary Protocol

For even better performance, consider using a binary protocol instead of JSON:

// Simple length-prefixed binary protocol
function encodeMessage(obj: object): Buffer {
  const json = JSON.stringify(obj);
  const length = Buffer.byteLength(json);
  const buffer = Buffer.alloc(4 + length);
  buffer.writeUInt32BE(length, 0);
  buffer.write(json, 4);
  return buffer;
}

Alternatives Considered

Redis

Redis is the obvious choice for shared caching, but it:

  • Requires running another service (and managing it)
  • Has network overhead (even on localhost, TCP adds latency)
  • Is overkill when you just need simple cross-worker state

For large-scale deployments or when you need persistence, Redis is still the right choice. But for single-server cluster deployments, IPC sockets are simpler.

Node.js Built-in IPC

Node.js cluster has built-in IPC via worker.send(), but:

  • Messages go through the primary process (bottleneck)
  • No built-in request/response pattern
  • Primary process becomes a single point of failure

Shared Memory

Node.js doesn't have great native shared memory support. Libraries exist, but they're:

  • Platform-specific
  • Complex to set up correctly
  • Often require native modules

Unix domain sockets hit the sweet spot: fast, simple, and universally supported.

Conclusion

When running Node.js in cluster mode and you need shared state between workers, a separate cache process with Unix domain sockets is an elegant solution. It's fast (kernel-level IPC), simple (just a socket file), and avoids the complexity of Redis for single-server deployments.

This pattern also works for:

  • Rate limiting across workers (shared counters)
  • Session coordination (sticky sessions without load balancer support)
  • Distributed locks (ensure only one worker processes a job)
  • Real-time counters (live user counts, etc.)

The implementation above is intentionally minimal. For production use, you might want to add:

  • Health checks and automatic cache server restart
  • Automatic reconnection in the client
  • Cache eviction policies (LRU, etc.)
  • Metrics and logging

But for many use cases, this simple approach is all you need.