How to Deploy an AI Agent Worker to Railway
A step-by-step guide to packaging a Python AI agent as a persistent worker service on Railway, with environment management, health checks, and log tailing.
Running an AI agent on your laptop works fine for testing. The problem starts the moment you close the lid. For an autonomous workflow to be useful, it needs to run continuously, restart automatically when it crashes, and give you visibility into what it is doing. This tutorial shows you how to package a Python AI agent as a persistent worker service on Railway, with proper environment management and logging.
By the end you will have a worker that starts on push, restarts on failure, reads secrets from environment variables, and streams logs you can tail from anywhere.
What you will need
- A Railway account (free tier works for this tutorial)
- The Railway CLI installed (npm install -g @railway/cli)
- A Python 3.11 project with a working agent script
- A GitHub repository for your project
Step 1: Structure your project
Railway deploys whatever is in your repository. The directory layout below keeps things clean:
my-agent/
agent.py # your main agent loop
requirements.txt # Python dependencies
Procfile # tells Railway how to start the service
railway.toml # optional: service configuration
If you are starting from scratch, create the files now:
mkdir my-agent && cd my-agent
git init
touch agent.py requirements.txt Procfile railway.toml
Step 2: Write a minimal agent loop
A Railway worker is just a long-running process. The simplest pattern is an infinite loop with a sleep:
# agent.py
import os
import time
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger(__name__)

API_KEY = os.environ["OPENAI_API_KEY"]
POLL_INTERVAL = int(os.environ.get("POLL_INTERVAL_SECONDS", "60"))

def run_once():
    # Replace this with your actual agent logic
    log.info("Agent tick — checking for work")
    # e.g. pull from a queue, call an API, write results to a database

if __name__ == "__main__":
    log.info("Agent worker starting")
    while True:
        try:
            run_once()
        except Exception as exc:
            log.error("Tick failed: %s", exc)
        time.sleep(POLL_INTERVAL)
Two things to note:
- Secrets come from environment variables, never hardcoded.
- Exceptions inside run_once are caught and logged so the outer loop keeps running. Railway will restart the whole process if it exits with a non-zero code, but you do not want a bad API response to kill your worker every minute.
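If run_once fails repeatedly, say during an API outage, retrying on a fixed interval keeps hammering the upstream service. A common refinement is exponential backoff on consecutive failures. A minimal sketch (the base interval, cap, and function name are my choices, not anything Railway requires):

```python
BASE_INTERVAL = 60   # normal poll interval in seconds
MAX_BACKOFF = 900    # never wait longer than 15 minutes

def next_sleep(consecutive_failures: int) -> int:
    # Double the wait for each consecutive failure, capped at MAX_BACKOFF.
    if consecutive_failures == 0:
        return BASE_INTERVAL
    return min(BASE_INTERVAL * (2 ** consecutive_failures), MAX_BACKOFF)
```

In the main loop, increment a failure counter in the except branch, reset it to zero on success, and sleep for next_sleep(failures) instead of a fixed POLL_INTERVAL.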
Step 3: Add a Procfile
Railway uses a Procfile to know what command to run. For a worker (no HTTP port) use the worker process type:
worker: python agent.py
If your agent also exposes a health endpoint (recommended), use web instead and bind to $PORT:
web: python agent.py
Then in agent.py, start a minimal HTTP server in a background thread:
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence access logs

def start_health_server():
    port = int(os.environ.get("PORT", "8080"))
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    server.serve_forever()

threading.Thread(target=start_health_server, daemon=True).start()
This lets Railway's health checks confirm the service is alive, and it enables the uptime monitoring approach covered in the monitoring tutorial.
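You can sanity-check the handler locally before deploying. This self-contained sketch starts the same server on an OS-assigned port and requests it with urllib (binding to port 0 to get a free port is a testing convenience, not part of the deployed code):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence access logs

# Port 0 asks the OS for any free port; server_address reports what it chose.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

with urllib.request.urlopen(f"http://127.0.0.1:{port}/health") as resp:
    status, body = resp.status, resp.read()
print(status, body.decode())  # → 200 ok
server.shutdown()
```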
Step 4: Pin your dependencies
# requirements.txt
openai==1.30.1
Pin exact versions. Unpinned dependencies will eventually break your deploy when a package releases a breaking change and you are not watching.
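If the dependencies are already installed in your virtual environment, you can generate a fully pinned file instead of writing it by hand:

```shell
python -m pip freeze > requirements.txt
cat requirements.txt
```

Review the output before committing; pip freeze pins everything in the environment, including transitive dependencies.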
Step 5: Add railway.toml
This file is optional but useful. It locks the Python version and sets the restart policy:
[build]
builder = "nixpacks"
[deploy]
restartPolicyType = "on_failure"
restartPolicyMaxRetries = 10
on_failure means Railway restarts the container if it exits with a non-zero code. Without this, a crashed agent stays dead until you notice.
Step 6: Set environment variables
Never commit secrets to Git. Add them in Railway's dashboard or via the CLI.
Log in and link your project:
railway login
railway link
Set your variables:
railway variables set OPENAI_API_KEY=sk-...
railway variables set POLL_INTERVAL_SECONDS=30
To verify what is set:
railway variables
Variables are injected at runtime. Your agent reads them with os.environ["KEY"].
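Because a missing variable only surfaces when the code first touches os.environ, it helps to validate everything up front and exit with a clear message. A small fail-fast sketch (the variable names match this tutorial; the load_config helper is mine, so extend the REQUIRED list for your own agent):

```python
import os
import sys

REQUIRED = ["OPENAI_API_KEY"]

def load_config() -> dict:
    # Collect every missing variable so a single failed deploy reports all
    # of them, then exit non-zero so the restart policy and logs flag it.
    missing = [name for name in REQUIRED if name not in os.environ]
    if missing:
        print(f"Missing environment variables: {', '.join(missing)}", file=sys.stderr)
        sys.exit(1)
    return {
        "api_key": os.environ["OPENAI_API_KEY"],
        "poll_interval": int(os.environ.get("POLL_INTERVAL_SECONDS", "60")),
    }
```

Call load_config() once at startup instead of reading os.environ throughout the code.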
Step 7: Deploy
Push to GitHub and connect the repository in the Railway dashboard, or deploy directly from the CLI:
railway up
Railway will:
- Detect Python via Nixpacks
- Install requirements.txt
- Run the command from your Procfile
- Stream build logs to your terminal
A successful deploy ends with something like:
✔ Build complete
✔ Deploy complete
Service URL: https://my-agent-production.up.railway.app
Step 8: Tail logs
Once the worker is running, you want to see what it is doing. From the CLI:
railway logs
This streams live output. You should see your agent's log lines every POLL_INTERVAL_SECONDS seconds:
2026-03-27 09:00:01 INFO Agent worker starting
2026-03-27 09:00:01 INFO Agent tick — checking for work
2026-03-27 09:00:31 INFO Agent tick — checking for work
If the process crashed and restarted, you will see Railway's restart notice between log lines. That is the restart policy working.
Step 9: Handle configuration changes without a full redeploy
Environment variable changes take effect on the next restart. Force a restart without a code change:
railway redeploy
This is useful when you want to rotate an API key or change the poll interval without pushing a commit.
Common issues
Build fails with "No start command found": Railway could not find your Procfile. Check it is in the repository root and committed to Git.
Worker exits immediately: An unhandled exception at startup. Run railway logs to see the traceback. The most common cause is a missing environment variable: the os.environ["KEY"] form raises KeyError if the variable is not set, which exits the process. Use railway variables to confirm all required variables are present.
Logs stop appearing: The worker is still running but run_once is hanging. Add a timeout to any external API calls. For the openai library:
from openai import OpenAI
client = OpenAI(timeout=30.0)
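More generally, you can bound each tick with a hard deadline so any hang becomes a logged exception instead of a silent stall. A Unix-only sketch using signal.alarm (the helper and exception names are mine, not part of the tutorial's code, and SIGALRM only works from the main thread):

```python
import signal

class TickTimeout(Exception):
    pass

def _on_timeout(signum, frame):
    raise TickTimeout("run_once exceeded its time budget")

def run_once_with_deadline(fn, seconds=120):
    # SIGALRM fires if fn runs past the deadline, raising TickTimeout so
    # the outer loop's except clause logs it and the worker keeps going.
    signal.signal(signal.SIGALRM, _on_timeout)
    signal.alarm(seconds)
    try:
        fn()
    finally:
        signal.alarm(0)  # cancel the pending alarm on success or failure
```

In the main loop, call run_once_with_deadline(run_once) in place of run_once().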
Railway keeps restarting the service in a loop: Your agent is hitting restartPolicyMaxRetries. Check the logs for the root error. Once you fix it, the restart counter resets on the next successful run.
What to build next
A single worker polling on an interval covers a wide range of use cases: newsletter generation, data enrichment, queue processing, scheduled reports. Once this pattern is stable, the natural next steps are:
- Replace the sleep loop with a proper queue (Redis, SQS, or a Postgres table) so multiple workers can process tasks in parallel without stepping on each other.
- Add structured JSON logging so you can send logs to a log aggregation service and search across runs.
- Wire the /health endpoint into an uptime monitor so you get alerted the moment the worker goes silent.
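For the structured-logging step, the standard library is enough. A minimal JSON formatter sketch (the field names are a common convention, not something Railway or any aggregator mandates):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        # Emit one JSON object per line so log aggregators can parse entries.
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
```

Swap this in for the basicConfig call in agent.py; railway logs will then show one searchable JSON object per line.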
The deploy itself is the easy part. The work is in making the agent logic reliable enough that you stop thinking about it.