Remote Execution
By default, the scheduler schedules, dispatches, and executes trains all within the same process. Remote execution lets you separate where trains are scheduled from where they run, offloading execution to dedicated worker servers, AWS Lambda, ECS tasks, or any other compute.
Key Concept
Postgres is always the source of truth. Every deployment model (local, remote, or standalone) connects to the same Postgres database for metadata, manifests, and state. The only thing that changes is where the train code runs.
┌──────────────────────────────────────────────────────────────────────────┐
│ Shared PostgreSQL │
│ │
│ trax.manifest trax.metadata trax.work_queue trax.dead_letter │
│ (schedules) (job state) (dispatch queue) (failed jobs) │
└──────────────────────────────┬───────────────────────────────────────────┘
│
┌──────────────┼───────────────┬────────────────┬────────────────┐
│ │ │ │ │
Local Workers Remote Workers Lambda Workers SQS Workers Standalone Workers
(same process) (HTTP push) (direct SDK) (SQS + Lambda) (separate process)
Two abstraction boundaries control where trains execute:
For queued trains (queue* mutations, scheduled jobs): The IJobSubmitter interface controls where the JobDispatcher sends work.
| Implementation | What it does |
|---|---|
| PostgresJobSubmitter | Inserts into the background_job table (default when Postgres is configured) |
| HttpJobSubmitter | POSTs to a remote HTTP endpoint (used by UseRemoteWorkers) |
| SqsJobSubmitter | Sends to an SQS queue for Lambda consumption (used by UseSqsWorkers, requires Trax.Scheduler.Sqs) |
| LambdaJobSubmitter | Invokes an AWS Lambda function directly via the SDK (used by UseLambdaWorkers, requires Trax.Scheduler.Lambda) |
| InMemoryJobSubmitter | Runs inline, synchronously (automatic default when no database provider is configured) |
| Custom | Implement IJobSubmitter and register via OverrideSubmitter() |
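A custom submitter is typically a thin decorator over one of the built-ins. The sketch below is illustrative only: the exact IJobSubmitter method name and job payload type are not documented here, so "SubmitAsync" and "JobSubmission" are assumptions to verify against the actual interface.

```csharp
using Microsoft.Extensions.Logging;

// Sketch of a decorating submitter; method and payload names are assumptions.
public sealed class LoggingJobSubmitter : IJobSubmitter
{
    private readonly PostgresJobSubmitter _inner;
    private readonly ILogger<LoggingJobSubmitter> _logger;

    public LoggingJobSubmitter(PostgresJobSubmitter inner, ILogger<LoggingJobSubmitter> logger)
    {
        _inner = inner;
        _logger = logger;
    }

    public async Task SubmitAsync(JobSubmission job, CancellationToken cancellationToken)
    {
        _logger.LogInformation("Submitting job for train {Train}", job.TrainType);
        await _inner.SubmitAsync(job, cancellationToken); // delegate to the default path
    }
}

// Registered via the same hook shown in the Standalone Workers model:
// .AddScheduler(s => s.OverrideSubmitter(services =>
//     services.AddScoped<IJobSubmitter, LoggingJobSubmitter>()))
```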
For run trains (run* mutations, queries): The IRunExecutor interface controls where direct execution happens.
| Implementation | What it does |
|---|---|
| LocalRunExecutor | Executes in-process via ITrainBus.RunAsync (default) |
| HttpRunExecutor | POSTs to a remote HTTP endpoint, blocks until complete (used by UseRemoteRun) |
| LambdaRunExecutor | Invokes an AWS Lambda function directly via the SDK, blocks until complete (used by UseLambdaRun, requires Trax.Scheduler.Lambda) |
Deployment Models
Model 1: Local Workers
Everything runs on one process. This is the default and simplest setup.
services.AddTrax(trax => trax
.AddEffects(effects => effects
.UsePostgres(connectionString)
)
.AddMediator(assemblies)
.AddScheduler(scheduler => scheduler
.ConfigureLocalWorkers(opts => opts.WorkerCount = 8)
.Schedule<IMyTrain, MyInput>("my-job", new MyInput(), Every.Minutes(5))
)
);
┌─────────────────── Single Process ───────────────────┐
│ │
│ ManifestManager ──→ WorkQueue ──→ JobDispatcher │
│ │ │
│ ▼ │
│ PostgresJobSubmitter │
│ │ │
│ ▼ │
│ background_job table │
│ │ │
│ ▼ │
│ LocalWorkerService │
│ (N worker tasks) │
│ │ │
│ ▼ │
│ JobRunnerTrain │
│ └─→ Your Train │
└───────────────────────────────────────────────────────┘
When to use: Most applications. Simple, no network hops, easy to debug. Start here and scale out only when you need to.
Model 2: Remote Workers (Push-Based)
The scheduler dispatches jobs via HTTP POST to a remote endpoint. The remote process receives the request and runs the train.
Scheduler side:
services.AddTrax(trax => trax
.AddEffects(effects => effects
.UsePostgres(connectionString)
)
.AddMediator(assemblies)
.AddScheduler(scheduler => scheduler
.UseRemoteWorkers(
remote =>
{
remote.BaseUrl = "https://my-workers.example.com/trax/execute";
remote.Timeout = TimeSpan.FromSeconds(60);
},
routing => routing.ForTrain<IMyTrain>())
// Optional: also offload run* mutations to the remote endpoint
.UseRemoteRun(remote =>
remote.BaseUrl = "https://my-workers.example.com/trax/run"
)
.Schedule<IMyTrain, MyInput>("my-job", new MyInput(), Every.Minutes(5))
)
);
Remote side (ASP.NET Core host):
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddTrax(trax => trax
.AddEffects(effects => effects
.UsePostgres(connectionString)
.UseBroadcaster(b => b.UseRabbitMq(rabbitMqConnectionString))
)
.AddMediator(typeof(MyTrain).Assembly)
);
builder.Services.AddTraxJobRunner();
var app = builder.Build();
app.UseTraxJobRunner("/trax/execute"); // queue path
app.UseTraxRunEndpoint("/trax/run"); // synchronous run path
app.Run();
The UseBroadcaster() call is essential for cross-process subscriptions; without it, the API process has no way to receive lifecycle events from the remote worker. See UseBroadcaster for details.
Remote side (AWS Lambda with Trax.Runner.Lambda):
For Lambda deployments, consider using Lambda Workers (Direct Invocation) instead; it eliminates public endpoints entirely. The HTTP model shown here requires a Function URL or API Gateway, which creates a publicly routable endpoint.
If you do use HTTP with Lambda, the TraxLambdaFunction base class handles service provider lifecycle, envelope-based dispatching, cancellation from Lambda’s remaining time, and error handling. RunLocalAsync() exposes HTTP endpoints for local development. See the TraxLambdaFunction API reference for details.
┌──── Scheduler Process ────┐ ┌──── Remote Process ────────────┐
│ │ │ │
│ ManifestManager │ │ POST /trax/execute │
│ JobDispatcher │ │ │ │
│ │ │ │ ▼ │
│ ▼ │ HTTP │ JobRunnerTrain │
│ HttpJobSubmitter ─────────┼────────→│ └─→ Your Train │
│ │ POST │ │
└────────────────────────────┘ └────────────────────────────────┘
│
▼
Shared PostgreSQL
When to use:
- Serverless compute (AWS Lambda, Google Cloud Run, Azure Functions): trains only run when invoked, zero idle cost
- Isolation: trains run in a separate security boundary or VPC
- Heterogeneous compute: different train types need different hardware (GPU, high memory)
- Scaling: the remote endpoint can auto-scale independently of the scheduler
Sample: See Trax.Samples.ContentShield.Api and Trax.Samples.ContentShield.Runner in the samples/EphemeralWorkers/ directory of the Trax.Samples repository. The API serves GraphQL and dispatches queued mutations to the Runner via HTTP. No background_job table, no DB polling. The Runner uses UseBroadcaster with RabbitMQ so GraphQL subscriptions on the API are notified when queued trains complete.
Model 2b: SQS Workers (Queue-Based, AWS Lambda)
Like Remote Workers but with a durable SQS queue between the scheduler and workers. The scheduler sends RemoteJobRequest messages to SQS, and Lambda functions consume them. This adds guaranteed delivery, automatic retries, dead-letter queues, and backpressure that HTTP dispatch lacks.
Requires the Trax.Scheduler.Sqs package.
Scheduler side:
using Trax.Scheduler.Sqs.Extensions;
services.AddTrax(trax => trax
.AddEffects(effects => effects
.UsePostgres(connectionString)
)
.AddMediator(assemblies)
.AddScheduler(scheduler => scheduler
.UseSqsWorkers(
sqs => sqs.QueueUrl = "https://sqs.us-east-1.amazonaws.com/123456789/trax-jobs",
routing => routing.ForTrain<IMyTrain>())
// Optional: keep UseRemoteRun for synchronous mutations
.UseRemoteRun(remote =>
remote.BaseUrl = "https://my-runner.example.com/trax/run"
)
.Schedule<IMyTrain, MyInput>("my-job", new MyInput(), Every.Minutes(5))
)
);
Lambda consumer:
using Amazon.Lambda.Core;
using Amazon.Lambda.SQSEvents;
using Trax.Scheduler.Sqs.Lambda;
public class Function
{
    private static readonly IServiceProvider Services = BuildServiceProvider();
    private readonly SqsJobRunnerHandler _handler = new(Services);
    public async Task FunctionHandler(SQSEvent sqsEvent, ILambdaContext context)
    {
        // ILambdaContext exposes RemainingTime rather than a CancellationToken;
        // derive one so the handler stops before Lambda hard-kills the invocation.
        using var cts = new CancellationTokenSource(context.RemainingTime);
        await _handler.HandleAsync(sqsEvent, cts.Token);
    }
}
┌──── Scheduler Process ────┐ ┌──── SQS ────┐ ┌── Lambda ──────────────┐
│ │ │ │ │ │
│ ManifestManager │ │ trax-jobs │ │ SqsJobRunnerHandler │
│ JobDispatcher │ │ queue │ │ │ │
│ │ │ SQS │ │ SQS │ ▼ │
│ ▼ │ Send │ │ Event │ JobRunnerTrain │
│ SqsJobSubmitter ──────────┼────────→│ │──────→│ └─→ Your Train │
│ │ │ │ │ │
└────────────────────────────┘ └──────────────┘ └────────────────────────┘
│
▼
Shared PostgreSQL
When to use:
- AWS Lambda: event-driven, auto-scaling, zero idle cost with durable message delivery
- Guaranteed delivery: SQS retries failed messages and dead-letters after max retries
- Backpressure: SQS buffers burst traffic; Lambda drains at a controlled rate
- High volume: thousands of concurrent jobs without overwhelming endpoints
Note: The SQS transport is not yet production-ready and has no sample. See the Lambda Workers model for the recommended Lambda deployment pattern.
Model 2c: Lambda Workers (Direct Invocation)
Like Remote Workers but without any public endpoint. The scheduler invokes the Lambda function directly via the AWS SDK. No API Gateway, Function URLs, or HTTP endpoints. Access is controlled entirely by IAM policies.
Requires the Trax.Scheduler.Lambda package.
Scheduler side:
using Trax.Scheduler.Lambda.Extensions;
services.AddTrax(trax => trax
.AddEffects(effects => effects
.UsePostgres(connectionString)
)
.AddMediator(assemblies)
.AddScheduler(scheduler => scheduler
.UseLambdaWorkers(
lambda => lambda.FunctionName = "content-shield-runner",
routing => routing
.ForTrain<IReviewContentTrain>()
.ForTrain<ISendViolationNoticeTrain>())
// Optional: also offload run* mutations to Lambda
.UseLambdaRun(lambda => lambda.FunctionName = "content-shield-runner")
)
);
Lambda function:
using Amazon.Lambda.Core;
using Amazon.Lambda.Serialization.SystemTextJson;
using Trax.Runner.Lambda;
[assembly: LambdaSerializer(typeof(DefaultLambdaJsonSerializer))]
public class Function : TraxLambdaFunction
{
protected override void ConfigureServices(IServiceCollection services, IConfiguration configuration)
{
var connString = configuration.GetConnectionString("TraxDatabase")!;
var rabbitMqConnString = configuration.GetConnectionString("RabbitMQ")!;
services.AddTrax(trax => trax
.AddEffects(effects => effects
.SkipMigrations()
.UsePostgres(connString)
.UseBroadcaster(b => b.UseRabbitMq(rabbitMqConnString)))
.AddMediator(typeof(MyTrain).Assembly));
}
}
The TraxLambdaFunction base class receives a LambdaEnvelope payload directly from the SDK. No HTTP routing is needed. The envelope’s Type field determines whether the request is a fire-and-forget job execution (Execute) or a synchronous train run (Run). See the TraxLambdaFunction API reference for details.
┌──── Scheduler Process ────┐ ┌──── AWS Lambda ─────────────────┐
│ │ │ │
│ ManifestManager │ │ LambdaEnvelope (Execute/Run) │
│ JobDispatcher │ │ │ │
│ │ │ SDK │ ▼ │
│ ▼ │ Invoke │ TraxLambdaFunction │
│ LambdaJobSubmitter ───────┼────────→│ └─→ ITraxRequestHandler │
│ │ │ └─→ Your Train │
└────────────────────────────┘ └─────────────────────────────────┘
│
▼
Shared PostgreSQL
Two invocation modes:
| Mode | InvocationType | Behavior |
|---|---|---|
| Execute (queued trains) | Event | Fire-and-forget; the scheduler gets a 202 and the Lambda runs asynchronously |
| Run (synchronous trains) | RequestResponse | Scheduler blocks until Lambda completes and returns output |
When to use:
- AWS Lambda: direct SDK invocation, no public endpoints, access governed by IAM
- Security-sensitive workloads: no Function URLs or API Gateway; the Lambda is never publicly reachable
- Simpler infrastructure: fewer AWS resources to manage (no API Gateway, no Function URL configuration)
- Lower latency: direct invocation avoids the API Gateway routing layer
Local development: For local dev and testing, RunLocalAsync() starts a Kestrel server that exposes the same /trax/execute and /trax/run HTTP endpoints. Use UseRemoteWorkers() + UseRemoteRun() on the scheduler side during development, then switch to UseLambdaWorkers() + UseLambdaRun() for production deployment.
IAM permissions: The scheduler process needs lambda:InvokeFunction on the target function ARN. The Lambda execution role needs its normal permissions (database access, etc.).
Payload size limit: Lambda invocation payloads are limited to 6 MB for synchronous (RequestResponse) invocations and 256 KB for asynchronous (Event) invocations. If your serialized train input exceeds the limit, store the data externally and pass a reference.
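One common workaround is the claim-check pattern: upload oversized inputs to S3 and invoke with a small reference instead. The sketch below is hypothetical, not part of Trax; the bucket name, size threshold, and MyInput/MyInputRef types are all illustrative.

```csharp
using System.Text;
using System.Text.Json;
using Amazon.S3;
using Amazon.S3.Model;

// Hypothetical claim-check helper: none of these names come from Trax.
// The worker-side train would receive MyInputRef, fetch the object, and deserialize.
public sealed record MyInputRef(string Bucket, string Key);

public static async Task<object> ToLambdaSafePayloadAsync(MyInput input, IAmazonS3 s3)
{
    var json = JsonSerializer.Serialize(input);
    if (Encoding.UTF8.GetByteCount(json) <= 200_000) // safely under the 256 KB async limit
        return input;                                // small enough to send inline

    var key = $"trax-inputs/{Guid.NewGuid():N}.json";
    await s3.PutObjectAsync(new PutObjectRequest
    {
        BucketName = "my-trax-payloads", // illustrative bucket name
        Key = key,
        ContentBody = json,
    });
    return new MyInputRef("my-trax-payloads", key);
}
```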
Sample: See Trax.Samples.ContentShield.Api and Trax.Samples.ContentShield.Runner in the samples/EphemeralWorkers/ directory of the Trax.Samples repository. The sample uses UseRemoteWorkers() for local development with commented-out UseLambdaWorkers() configuration for production deployment.
Model 3: Standalone Workers (Poll-Based)
A separate, always-on process polls the background_job table and runs trains. No scheduler logic, just execution.
Scheduler side (scheduling only, no local execution):
services.AddTrax(trax => trax
.AddEffects(effects => effects
.UsePostgres(connectionString)
)
.AddMediator(assemblies)
.AddScheduler(scheduler => scheduler
// Register PostgresJobSubmitter without starting local workers.
// Jobs are written to background_job and picked up by the worker process.
.OverrideSubmitter(s => s.AddScoped<IJobSubmitter, PostgresJobSubmitter>())
.Schedule<IMyTrain, MyInput>("my-job", new MyInput(), Every.Minutes(5))
)
);
Tip:
OverrideSubmitter with PostgresJobSubmitter gives you a scheduler that only writes to the background_job table. No LocalWorkerService is started. By default (without OverrideSubmitter), local workers are started automatically when Postgres is configured, and you can run standalone workers alongside them for horizontal scaling.
Standalone worker process:
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddTrax(trax => trax
.AddEffects(effects => effects
.UsePostgres(connectionString)
)
.AddMediator(typeof(MyTrain).Assembly)
);
builder.Services.AddTraxWorker(opts => opts.WorkerCount = 4);
var app = builder.Build();
app.Run();
┌──── Scheduler ────────────┐ ┌──── Standalone Worker ─────────┐
│ │ │ │
│ ManifestManager │ │ LocalWorkerService │
│ JobDispatcher │ │ (4 worker tasks) │
│ │ │ │ │ │
│ ▼ │ │ ▼ │
│ PostgresJobSubmitter │ │ SELECT ... FOR UPDATE │
│ │ │ │ SKIP LOCKED │
│ ▼ │ │ │ │
│ background_job ───────────┼─────────┼──→ JobRunnerTrain │
│ table │ same │ └─→ Your Train │
└────────────────────────────┘ DB └────────────────────────────────┘
When to use:
- Separate servers: dedicated worker machines with different specs
- Horizontal scaling: run multiple worker processes, each polling the same table (PostgreSQL SKIP LOCKED prevents duplicates)
- Process isolation: scheduler crash doesn’t kill in-flight trains
- Kubernetes/ECS: deploy workers as a separate service with independent scaling
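The duplicate-free polling in that list relies on PostgreSQL row locking. A claim query of roughly this shape (illustrative only; the actual background_job column names are assumptions) shows the mechanism:

```sql
-- Illustrative only: actual background_job columns may differ.
UPDATE trax.background_job
SET status = 'InProgress', claimed_at = now()
WHERE id IN (
    SELECT id
    FROM trax.background_job
    WHERE status = 'Queued'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED  -- skip rows another worker already holds
)
RETURNING id;
```

SKIP LOCKED means a concurrent worker's inner SELECT silently ignores rows locked by another transaction, so each queued job is claimed exactly once without advisory locks or a message broker.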
Sample: See Trax.Samples.EnergyHub.Hub and Trax.Samples.EnergyHub.Worker in the samples/DistributedWorkers/ directory of the Trax.Samples repository for a working example. The Hub combines GraphQL API, scheduler, and dashboard in one process while offloading all train execution to the Worker.
Which Model Should I Use?
| Scenario | Recommended Model |
|---|---|
| Single-server deployment | Local Workers: simplest setup, no network overhead |
| Separate worker servers (always running) | Standalone Workers: poll-based, no HTTP layer needed |
| AWS Lambda (recommended) | Lambda Workers: direct SDK invocation, no public endpoints, IAM-governed |
| AWS Lambda with durable queuing | SQS Workers: guaranteed delivery, retries, DLQ, auto-scaling |
| Google Cloud Run / Azure Functions | Remote Workers: push-based HTTP, matches serverless event model |
| Different hardware per train type | Remote Workers: route to GPU/high-memory endpoints |
| Security-sensitive Lambda workloads | Lambda Workers: no Function URL or API Gateway needed |
| Just getting started | Local Workers: scale out later when you need to |
You can also mix models. For example, run local workers for fast trains and remote workers for expensive GPU trains, using per-train routing with ForTrain<T>():
.AddScheduler(scheduler => scheduler
.ConfigureLocalWorkers(opts => opts.WorkerCount = 8)
.UseRemoteWorkers(
remote => remote.BaseUrl = "https://gpu-workers/trax/execute",
routing => routing
.ForTrain<IHeavyComputeTrain>()
.ForTrain<IAiInferenceTrain>())
)
Trains not routed via ForTrain<T>() or [TraxRemote] execute locally.
Authentication
Trax does not bake in any authentication mechanism. Both the scheduler and remote sides use standard ASP.NET patterns:
Scheduler side: configure the HttpClient used by HttpJobSubmitter:
.UseRemoteWorkers(
remote =>
{
remote.BaseUrl = "https://my-workers.example.com/trax/execute";
// Bearer token
remote.ConfigureHttpClient = client =>
client.DefaultRequestHeaders.Add("Authorization", "Bearer my-token");
// Or API key
remote.ConfigureHttpClient = client =>
client.DefaultRequestHeaders.Add("X-Api-Key", "my-key");
// Or any custom header your endpoint expects
remote.ConfigureHttpClient = client =>
client.DefaultRequestHeaders.Add("X-Custom-Header", "value");
},
routing => routing.ForTrain<IMyTrain>())
Remote side: use ASP.NET middleware:
var app = builder.Build();
// Your choice of auth middleware:
app.UseAuthentication();
app.UseAuthorization();
app.UseTraxJobRunner("/trax/execute");
app.Run();
Or restrict the endpoint directly:
app.UseTraxJobRunner("/trax/execute").RequireAuthorization();
This keeps Trax focused on scheduling and execution while letting you use whatever auth strategy your infrastructure requires: API keys, JWT tokens, mTLS, IAM roles, or nothing at all.
Host Tracking
Every metadata record automatically captures which host executed the train: hostname, environment type (Lambda, ECS, Kubernetes, etc.), and instance ID. This works across all deployment models with zero configuration. You can also add custom labels (region, service, team) via the builder API.
See Host Tracking for details on auto-detection, custom labels, and querying by host.
Shared Requirements
Regardless of deployment model, every process that executes trains must:
- Reference the same train assemblies: the train types are resolved by fully-qualified name
- Connect to the same Postgres database: metadata, manifests, and state are shared
- Register the effect system: AddTrax() with UsePostgres() and AddMediator()
Failure Handling
When the JobDispatcher dispatches a job, the Metadata record is committed to the database before the job is submitted to the worker. This is necessary because the worker needs to read the Metadata. However, if the submission fails (network timeout, remote worker unreachable, throttling), the Metadata would be orphaned in Pending state.
Trax handles this with multiple layers of protection:
1. Retry with Exponential Backoff
All remote submitters (HTTP and Lambda) retry on transient failures with exponential backoff and jitter.
HttpJobSubmitter and HttpRunExecutor retry on HTTP 429, 502, and 503. If the server sends a Retry-After header, that value is used instead of the computed backoff delay.
LambdaJobSubmitter and LambdaRunExecutor retry on AWS status codes 429 (Throttling), 502, 503, and 504, plus network-level HttpRequestException.
| Option | Default | Description |
|---|---|---|
| Retry.MaxRetries | 5 | Maximum retry attempts before giving up |
| Retry.BaseDelay | 1 second | Starting delay, doubled on each attempt |
| Retry.MaxDelay | 30 seconds | Cap on exponential growth |
Configure via the options object on each transport:
// HTTP
.UseRemoteWorkers(
remote =>
{
remote.BaseUrl = "https://my-workers.example.com/trax/execute";
remote.Retry.MaxRetries = 10;
remote.Retry.BaseDelay = TimeSpan.FromSeconds(2);
},
routing => routing.ForTrain<IMyTrain>())
// Lambda
.UseLambdaWorkers(
lambda =>
{
lambda.FunctionName = "my-runner";
lambda.Retry.MaxRetries = 10;
lambda.Retry.BaseDelay = TimeSpan.FromSeconds(2);
},
routing => routing.ForTrain<IMyOtherTrain>())
Set MaxRetries = 0 to disable retries entirely.
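The delay schedule implied by the defaults can be sketched directly. The doubling-with-cap behavior follows from the options above; the specific jitter strategy (full jitter shown here) is an assumption, since the document only states that jitter is applied.

```csharp
// Delay schedule implied by the defaults: BaseDelay doubled per attempt,
// capped at MaxDelay, with jitter. Full jitter shown; Trax's exact strategy
// may differ.
var baseDelay = TimeSpan.FromSeconds(1);   // Retry.BaseDelay
var maxDelay  = TimeSpan.FromSeconds(30);  // Retry.MaxDelay
var rng = new Random();

for (int attempt = 0; attempt < 5; attempt++)  // Retry.MaxRetries = 5
{
    var capped = TimeSpan.FromTicks(Math.Min(baseDelay.Ticks << attempt, maxDelay.Ticks));
    var delay  = TimeSpan.FromTicks((long)(capped.Ticks * rng.NextDouble()));
    Console.WriteLine($"attempt {attempt + 1}: base {capped.TotalSeconds}s, jittered {delay.TotalSeconds:F1}s");
}
// Uncapped progression: 1s, 2s, 4s, 8s, 16s; a sixth attempt would cap at 30s.
```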
2. Dispatch Requeue
If the HTTP request still fails after exhausting retries, the work queue entry is automatically reset to Queued status so the next dispatcher cycle can try again. Each failed attempt:
- Marks the orphaned Metadata as Failed (immutable audit record)
- Increments dispatch_attempts on the work queue entry
- Resets status to Queued, clears metadata_id and dispatched_at
- On the next dispatch cycle, a new Metadata row is created
After MaxDispatchAttempts failures, the entry stays in Dispatched status and feeds into the dead letter pipeline.
.AddScheduler(scheduler => scheduler
.MaxDispatchAttempts(10) // default: 5
// ...
)
Set MaxDispatchAttempts(0) to disable requeuing (immediate failure, matching pre-1.2.0 behavior).
3. Stale Pending Reaper
The ManifestManager runs a ReapStalePendingMetadataJunction on every polling cycle. Any Metadata that has been in Pending state longer than StalePendingTimeout (default: 20 minutes) is automatically marked as Failed. This catches edge cases where the remote worker received the job but crashed before updating the Metadata.
.AddScheduler(scheduler => scheduler
.StalePendingTimeout(TimeSpan.FromMinutes(10))
// ...
)
Or at runtime via the Dashboard under Server Settings > Job Settings > Stale Pending Timeout.
4. Stale InProgress Reaper
The ManifestManager also runs a ReapStaleInProgressMetadataJunction on every polling cycle. Any Metadata that has been in InProgress state longer than StaleInProgressTimeout (default: 60 minutes) is automatically marked as Failed. This catches hard crashes where the worker dies without reaching FinishServiceTrain: Lambda hard-kills, OOM events, or process crashes that bypass all .NET exception handling.
.AddScheduler(scheduler => scheduler
.StaleInProgressTimeout(TimeSpan.FromMinutes(45))
// ...
)
This timeout should be longer than DefaultJobTimeout (default: 20 minutes) to give cooperative cancellation time to propagate before force-failing. The ordering in the ManifestManager pipeline is: CancelTimedOutJobsJunction (cooperative cancel) → ReapStalePendingMetadataJunction → ReapStaleInProgressMetadataJunction (force-fail) → ReapFailedJobsJunction (dead-letter).
5. Dead-Lettering
After MaxRetries failed executions (distinct from dispatch attempts), the ManifestManager creates a DeadLetter record and marks the manifest as AwaitingIntervention. Dead letters can be resolved via the Dashboard or programmatically.
Failed metadata feeds into the normal retry pipeline: if the manifest has retries remaining, the ManifestManager will create a new work queue entry on the next cycle.
Tuning for Throttled Environments
When deploying to capacity-limited backends (e.g., AWS Lambda with reserved concurrency), align these settings:
| Setting | Recommendation |
|---|---|
| MaxConcurrentDispatch | Match or stay below the backend’s concurrency limit |
| MaxActiveJobs | Match the backend’s concurrency limit so dispatch doesn’t outpace execution |
| Retry.MaxRetries | 5-10 for throttle-heavy environments |
| MaxDispatchAttempts | 5-10 to cover longer outages |
Structured Error Propagation
When a train fails on a remote worker, Trax preserves the full exception context across the HTTP boundary. Both endpoints (/trax/execute and /trax/run) return structured error responses with:
| Field | Description |
|---|---|
| IsError | Whether the execution failed |
| ErrorMessage | The error message |
| ExceptionType | The .NET exception type name (e.g., "InvalidOperationException") |
| FailureJunction | The train junction where the failure occurred (extracted from TrainExceptionData) |
| StackTrace | The remote stack trace |
On the API side, HttpJobSubmitter and HttpRunExecutor read the response body and reconstruct a TrainException with the structured data intact. Metadata.AddException() populates FailureException, FailureJunction, FailureReason, and StackTrace from the reconstructed exception. Locally-executed trains attach this data via Exception.Data["TrainExceptionData"]; remote trains carry it as JSON in the exception message instead.
Runner Process API Process
───────────────── ───────────────────
Train fails with exception
│
▼
TraxRequestHandler catches exception
Extracts: Type, Junction, Message, Stack
│
▼
RemoteRunResponse / RemoteJobResponse
(structured error fields)
│
├───── HTTP 200 + JSON body ──────→ HttpRunExecutor / HttpJobSubmitter
reads response body
│
▼
Reconstructs TrainException
with TrainExceptionData JSON
│
▼
Metadata.AddException() parses
into structured failure fields
If the HTTP call itself fails (network error, infrastructure 5xx before reaching the endpoint), the error body is read and included in the exception message for debugging: you’ll see the HTTP status code and the response body rather than a generic “500 Internal Server Error”.
Debugging Remote Failures
When a remote job fails, check these in order:
- Metadata table: SELECT failure_exception, failure_junction, failure_reason, stack_trace FROM trax.metadata WHERE id = <id>. These fields are populated from the structured error response.
- Log table: SELECT * FROM trax.log WHERE metadata_id = <id> ORDER BY id. If AddDataContextLogging() is enabled on the runner, junction-level logs are persisted.
- Stale pending check: If failure_exception = 'StalePendingTimeout', the runner never started executing. Check runner health, network connectivity, and deployment status.
Limitations
- Cancellation is process-local. The ICancellationRegistry is in-memory, so Dashboard “Cancel” only cancels trains running on the same process as the dashboard. Remote trains cannot be cancelled via the dashboard in v1.
- Type resolution requires shared assemblies. The remote process must reference the same NuGet packages and assemblies that define your train types. Types are resolved by fully-qualified name from loaded assemblies.
See Also
- Job Submission: architecture of the job submission pipeline
- ConfigureLocalWorkers: API reference for local worker configuration
- UseRemoteWorkers: API reference for remote workers (HTTP)
- UseLambdaWorkers: API reference for Lambda workers (direct SDK invocation)
- UseLambdaRun: API reference for Lambda run execution (direct SDK invocation)
- UseSqsWorkers: API reference for SQS workers (Lambda)
- UseRemoteRun: API reference for remote run execution (HTTP)
- AddTraxJobRunner: API reference for remote receiver setup
- AddTraxWorker: API reference for standalone worker setup