How to Deploy an AI Agent Worker to Railway
A step-by-step guide to packaging a Python AI agent as a persistent worker service on Railway, with environment management, health checks, and log tailing.
Running an AI agent on your laptop works fine for testing. The problem starts the moment you close the lid. For an autonomous workflow to be useful, it needs to run continuously, restart automatically when it crashes, and give you visibility into what it is doing. This tutorial shows you how to package a Python AI agent as a persistent worker service on Railway, with proper environment management and logging.
By the end you will have a worker that starts on push, restarts on failure, reads secrets from environment variables, and streams logs you can tail from anywhere.
What you will need
- A Railway account (free tier works for this tutorial)
- The Railway CLI installed (npm install -g @railway/cli)
- A Python 3.11 project with a working agent script
- A GitHub repository for your project
Step 1: Structure your project
Railway deploys whatever is in your repository. The directory layout below keeps things clean:
my-agent/
agent.py # your main agent loop
requirements.txt # Python dependencies
Procfile # tells Railway how to start the service
railway.toml # optional: service configuration
If you are starting from scratch, create the files now:
mkdir my-agent && cd my-agent
git init
touch agent.py requirements.txt Procfile railway.toml
Step 2: Write a minimal agent loop
A Railway worker is just a long-running process. The simplest pattern is an infinite loop with a sleep:
# agent.py
import os
import time
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger(__name__)

API_KEY = os.environ["OPENAI_API_KEY"]
POLL_INTERVAL = int(os.environ.get("POLL_INTERVAL_SECONDS", "60"))

def run_once():
    # Replace this with your actual agent logic
    log.info("Agent tick — checking for work")
    # e.g. pull from a queue, call an API, write results to a database

if __name__ == "__main__":
    log.info("Agent worker starting")
    while True:
        try:
            run_once()
        except Exception as exc:
            log.error("Tick failed: %s", exc)
        time.sleep(POLL_INTERVAL)
Two things to note:
- Secrets come from environment variables, never hardcoded.
- Exceptions inside run_once are caught and logged so the outer loop keeps running. Railway will restart the whole process if it exits with a non-zero code, but you do not want a bad API response to kill your worker every minute.
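If run_once fails repeatedly, say during an API outage, retrying on a fixed interval keeps hammering the upstream service. A common refinement is exponential backoff on consecutive failures. A minimal sketch (the base interval, cap, and function name are my choices, not anything Railway requires):

```python
BASE_INTERVAL = 60   # normal poll interval in seconds
MAX_BACKOFF = 900    # never wait longer than 15 minutes

def next_sleep(consecutive_failures: int) -> int:
    # Double the wait for each consecutive failure, capped at MAX_BACKOFF.
    if consecutive_failures == 0:
        return BASE_INTERVAL
    return min(BASE_INTERVAL * (2 ** consecutive_failures), MAX_BACKOFF)
```

In the main loop, increment a failure counter in the except branch, reset it to zero on success, and sleep for next_sleep(failures) instead of a fixed POLL_INTERVAL.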
Step 3: Add a Procfile
Railway uses a Procfile to know what command to run. For a worker (no HTTP port) use the worker process type:
worker: python agent.py
If your agent also exposes a health endpoint (recommended), use web instead and bind to $PORT:
web: python agent.py
Then in agent.py, start a minimal HTTP server in a background thread:
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence access logs

def start_health_server():
    port = int(os.environ.get("PORT", "8080"))
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    server.serve_forever()

threading.Thread(target=start_health_server, daemon=True).start()
This lets Railway's health checks confirm the service is alive, and it enables the uptime monitoring approach covered in the monitoring tutorial.
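You can sanity-check the handler locally before deploying. This self-contained sketch starts the same server on an OS-assigned port and requests it with urllib (binding to port 0 to get a free port is a testing convenience, not part of the deployed code):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence access logs

# Port 0 asks the OS for any free port; server_address reports what it chose.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

with urllib.request.urlopen(f"http://127.0.0.1:{port}/health") as resp:
    status, body = resp.status, resp.read()
print(status, body.decode())  # → 200 ok
server.shutdown()
```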
Step 4: Pin your dependencies
# requirements.txt
openai==1.30.1
Pin exact versions. Unpinned dependencies will eventually break your deploy when a package releases a breaking change and you are not watching.
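If the dependencies are already installed in your virtual environment, you can generate a fully pinned file instead of writing it by hand:

```shell
python -m pip freeze > requirements.txt
cat requirements.txt
```

Review the output before committing; pip freeze pins everything in the environment, including transitive dependencies.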
Step 5: Add railway.toml
This file is optional but useful. It locks the Python version and sets the restart policy:
[build]
builder = "nixpacks"
[deploy]
restartPolicyType = "on_failure"
restartPolicyMaxRetries = 10
on_failure means Railway restarts the container if it exits with a non-zero code. Without this, a crashed agent stays dead until you notice.
Step 6: Set environment variables
Never commit secrets to Git. Add them in Railway's dashboard or via the CLI.
Log in and link your project:
railway login
railway link
Set your variables:
railway variables set OPENAI_API_KEY=sk-...
railway variables set POLL_INTERVAL_SECONDS=30
To verify what is set:
railway variables
Variables are injected at runtime. Your agent reads them with os.environ["KEY"].
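Because a missing variable only surfaces when the code first touches os.environ, it helps to validate everything up front and exit with a clear message. A small fail-fast sketch (the variable names match this tutorial; the load_config helper is mine, so extend the REQUIRED list for your own agent):

```python
import os
import sys

REQUIRED = ["OPENAI_API_KEY"]

def load_config() -> dict:
    # Collect every missing variable so a single failed deploy reports all
    # of them, then exit non-zero so the restart policy and logs flag it.
    missing = [name for name in REQUIRED if name not in os.environ]
    if missing:
        print(f"Missing environment variables: {', '.join(missing)}", file=sys.stderr)
        sys.exit(1)
    return {
        "api_key": os.environ["OPENAI_API_KEY"],
        "poll_interval": int(os.environ.get("POLL_INTERVAL_SECONDS", "60")),
    }
```

Call load_config() once at startup instead of reading os.environ throughout the code.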
Step 7: Deploy
Push to GitHub and connect the repository in the Railway dashboard, or deploy directly from the CLI:
railway up
Railway will:
- Detect Python via Nixpacks
- Install requirements.txt
- Run the command from your Procfile
- Stream build logs to your terminal
A successful deploy ends with something like:
✔ Build complete
✔ Deploy complete
Service URL: https://my-agent-production.up.railway.app
Step 8: Tail logs
Once the worker is running, you want to see what it is doing. From the CLI:
railway logs
This streams live output. You should see your agent's log lines every POLL_INTERVAL_SECONDS seconds:
2026-03-27 09:00:01 INFO Agent worker starting
2026-03-27 09:00:01 INFO Agent tick — checking for work
2026-03-27 09:00:31 INFO Agent tick — checking for work
If the process crashed and restarted, you will see Railway's restart notice between log lines. That is the restart policy working.
Step 9: Handle configuration changes without a full redeploy
Environment variable changes take effect on the next restart. Force a restart without a code change:
railway redeploy
This is useful when you want to rotate an API key or change the poll interval without pushing a commit.
Common issues
Build fails with "No start command found": Railway could not find your Procfile. Check it is in the repository root and committed to Git.
Worker exits immediately: An unhandled exception at startup. Run railway logs to see the traceback. The most common cause is a missing environment variable: the os.environ["KEY"] form raises KeyError if the variable is not set, which exits the process. Use railway variables to confirm all required variables are present.
Logs stop appearing: The worker is still running but run_once is hanging. Add a timeout to any external API calls. For the openai library:
from openai import OpenAI
client = OpenAI(timeout=30.0)
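More generally, you can bound each tick with a hard deadline so any hang becomes a logged exception instead of a silent stall. A Unix-only sketch using signal.alarm (the helper and exception names are mine, not part of the tutorial's code, and SIGALRM only works from the main thread):

```python
import signal

class TickTimeout(Exception):
    pass

def _on_timeout(signum, frame):
    raise TickTimeout("run_once exceeded its time budget")

def run_once_with_deadline(fn, seconds=120):
    # SIGALRM fires if fn runs past the deadline, raising TickTimeout so
    # the outer loop's except clause logs it and the worker keeps going.
    signal.signal(signal.SIGALRM, _on_timeout)
    signal.alarm(seconds)
    try:
        fn()
    finally:
        signal.alarm(0)  # cancel the pending alarm on success or failure
```

In the main loop, call run_once_with_deadline(run_once) in place of run_once().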
Railway keeps restarting the service in a loop: Your agent is hitting restartPolicyMaxRetries. Check the logs for the root error. Once you fix it, the restart counter resets on the next successful run.
What to build next
A single worker polling on an interval covers a wide range of use cases: newsletter generation, data enrichment, queue processing, scheduled reports. Once this pattern is stable, the natural next steps are:
- Replace the sleep loop with a proper queue (Redis, SQS, or a Postgres table) so multiple workers can process tasks in parallel without stepping on each other.
- Add structured JSON logging so you can send logs to a log aggregation service and search across runs.
- Wire the /health endpoint into an uptime monitor so you get alerted the moment the worker goes silent.
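For the structured-logging step, the standard library is enough. A minimal JSON formatter sketch (the field names are a common convention, not something Railway or any aggregator mandates):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        # Emit one JSON object per line so log aggregators can parse entries.
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
```

Swap this in for the basicConfig call in agent.py; railway logs will then show one searchable JSON object per line.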
The deploy itself is the easy part. The work is in making the agent logic reliable enough that you stop thinking about it.