Process management

XMTP SDKs have a few guardrails in place to prevent crashes. However, it's difficult to guarantee 100% uptime for long-running processes. For that reason, we recommend using a process manager like PM2 to ensure proper restart behavior and logging.

Installation

Install PM2 as a dependency:

npm

npm i pm2

Ecosystem config file

Create ecosystem.config.cjs with the following structure:

const path = require("path");
 
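// Resolve the project root relative to this config file.
// Adjust "../../.." to match where this file sits in your repository.
const projectRoot = path.resolve(__dirname, "../../..");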
 
module.exports = {
  apps: [
    {
      name: "<bot-name>",
      script: "node_modules/.bin/tsx",
      args: "src/index.ts",
      cwd: projectRoot,
      autorestart: true,
      max_memory_restart: "1G",
      error_file: "./logs/pm2-<bot-name>-error.log",
      out_file: "./logs/pm2-<bot-name>-out.log",
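      restart_delay: 4000, // Wait 4 seconds (in ms) before restarting
      min_uptime: 1000, // Must run at least 1 second (in ms) to count as stable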
      unstable_restarts: 10000, // CRITICAL: Prevents PM2 from stopping restarts
      env: {
        NODE_ENV: "production",
      },
    },
  ],
};

Critical config settings

unstable_restarts: 10000 - REQUIRED. Without this, PM2 stops restarting the process once it considers it "unstable" (crashing too quickly). A high value lets PM2 keep restarting even during rapid crash cycles.
min_uptime: 1000 - The process must run for at least 1000 ms (1 second) to be considered stable.
restart_delay: 4000 - PM2 waits 4000 ms (4 seconds) before restarting, which prevents rapid restart loops.
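
Because a missing setting only shows up when the process stops coming back, it can help to check the config before deploying. The following is a minimal sketch, not part of PM2 or the XMTP SDK: the helper name missingRestartSettings and the RECOMMENDED_KEYS list are hypothetical and simply mirror the three settings listed above.

```javascript
// Hypothetical helper (not a PM2 API): reports which of the restart settings
// recommended in this guide are absent from each app in an ecosystem config.
const RECOMMENDED_KEYS = ["unstable_restarts", "min_uptime", "restart_delay"];

function missingRestartSettings(ecosystem) {
  const report = {};
  for (const app of ecosystem.apps ?? []) {
    // Collect every recommended key that the app entry does not define.
    const missing = RECOMMENDED_KEYS.filter((key) => app[key] === undefined);
    if (missing.length > 0) {
      report[app.name ?? "(unnamed)"] = missing;
    }
  }
  return report;
}

// Example usage from a CommonJS script next to the config (path is an assumption):
// console.log(missingRestartSettings(require("./ecosystem.config.cjs")));
```

An empty object means every app entry defines all three settings. The check only looks for presence; it does not judge whether the values are appropriate.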

Agent code pattern

1. Restart logging (FIRST THING)

Add restart logging as the very first line of code, before any async operations:

Node

// Immediate synchronous log - FIRST THING that runs
console.log(
  `[RESTART] <Bot Name> bot starting - PID: ${process.pid} at ${new Date().toISOString()}`,
);

This log appears immediately when PM2 restarts the process, making it easy to track restarts in logs.
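
Since the [RESTART] line has a fixed shape, restarts can be extracted from a log file with a small parser. This is a sketch under the assumption that the log line matches the template above exactly; parseRestarts and RESTART_LINE are hypothetical names, not part of PM2 or the XMTP SDK.

```javascript
// Matches lines such as:
// [RESTART] My bot starting - PID: 1234 at 2024-01-01T00:00:00.000Z
// The pattern is not anchored, so any prefix added by a log collector is tolerated.
const RESTART_LINE = /\[RESTART\] .* - PID: (\d+) at (\S+)/;

// Hypothetical helper: returns one { pid, at } entry per restart line found.
function parseRestarts(logText) {
  const restarts = [];
  for (const line of logText.split(/\r?\n/)) {
    const match = RESTART_LINE.exec(line);
    if (match) {
      restarts.push({ pid: Number(match[1]), at: new Date(match[2]) });
    }
  }
  return restarts;
}
```

Feeding it the contents of the out_file configured earlier gives a list of start times and PIDs, which is usually enough to see how often the process is being restarted.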

2. Error handlers

Add this error handler after agent creation but before agent.start():

Node

// Handle agent-level unhandled errors
agent.on("unhandledError", (error) => {
  console.error("<Bot Name> bot fatal error:", error);
  if (error instanceof Error) {
    console.error("Error stack:", error.stack);
  }
  console.error("Exiting process - PM2 will restart");
  process.exit(1);
});

Note: Process-level handlers (uncaughtException, unhandledRejection) are typically left out or commented out, because agent.on("unhandledError") handles most cases.
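
If you do decide to enable process-level handlers, one possible shape is sketched below. It follows the same log-then-exit approach as the agent handler so that PM2 performs the restart. The helper name installProcessHandlers and the [FATAL] log prefix are assumptions made for illustration; uncaughtException and unhandledRejection are standard Node.js process events.

```javascript
// Hypothetical helper: registers process-level handlers that log the failure
// and exit with a non-zero code, leaving the restart to PM2.
// `proc` and `log` are injectable so the behavior can be exercised without
// terminating the real process.
function installProcessHandlers(proc = process, log = console.error) {
  const exitAfter = (label) => (reason) => {
    log(`[FATAL] ${label}:`, reason);
    proc.exit(1);
  };
  proc.on("uncaughtException", exitAfter("uncaughtException"));
  proc.on("unhandledRejection", exitAfter("unhandledRejection"));
}

// Example usage, early in the entry file:
// installProcessHandlers();
```

Whether to enable this is a judgment call: it catches failures that occur outside the agent, at the cost of exiting on errors that might otherwise have been harmless.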

Troubleshooting

PM2 shows "waiting restart" but never restarts

Symptom: PM2 status shows "waiting restart" but no new process spawns.

Solution: Add unstable_restarts: 10000 to ecosystem config. PM2 is blocking restarts because it thinks the process is unstable.

No restart logs appearing

Symptom: Process crashes but [RESTART] logs never appear.

Solution:

Verify that the [RESTART] log is the very first line of code.
Check that the PM2 config has unstable_restarts: 10000.
Check the PM2 logs: pm2 logs <bot-name> --raw

Process restarts too quickly

Symptom: Process restarts in a tight loop.

Solution: Increase restart_delay to give the process time to initialize before allowing another restart.
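
To judge whether a loop is actually tight, it can help to measure the gaps between consecutive start times, for example the timestamps from the [RESTART] log lines. The sketch below is a hypothetical helper, not a PM2 feature; the name countRapidRestarts and the 10-second default threshold are arbitrary assumptions.

```javascript
// Hypothetical helper: counts how many consecutive starts happened less than
// `thresholdMs` apart. `timestamps` may contain Date objects or ISO strings.
function countRapidRestarts(timestamps, thresholdMs = 10000) {
  const times = timestamps
    .map((t) => new Date(t).getTime())
    .sort((a, b) => a - b);
  let rapid = 0;
  for (let i = 1; i < times.length; i++) {
    if (times[i] - times[i - 1] < thresholdMs) {
      rapid++;
    }
  }
  return rapid;
}
```

A count that stays high after raising restart_delay suggests the process is failing during startup rather than being restarted too eagerly.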

Application exits early

When PM2 runs in daemon mode, pm2 start exits right after launching the processes. In container or PaaS environments, this can make the platform think your app has finished and should shut down.

To keep PM2 in the foreground, use pm2-runtime instead of pm2 start:

Node

pm2-runtime ecosystem.config.cjs

This keeps the PM2 process running in the foreground, which is typically required by PaaS platforms such as Render or Heroku.