What are long running jobs?
A long-running job refers to any task that takes more than a few seconds to complete. Running such tasks directly within a web request can lead to timeouts and a poor user experience, as the request may hang or fail before the task finishes. To maintain application responsiveness and reliability, these jobs are usually offloaded to background processes or queued systems where they can run independently without blocking user interactions.
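The offloading pattern can be sketched in a few lines of TypeScript. This is a minimal in-process illustration with hypothetical names (`JobStore`, `handleRequest`), not a production design. The handler returns a job ID immediately while the work continues in the background, and the caller checks the status later.

```typescript
// Hypothetical in-memory job store: maps job IDs to their current status.
type JobStatus = "queued" | "running" | "done" | "failed";

class JobStore {
  private jobs = new Map<string, JobStatus>();
  private nextId = 1;

  create(): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, "queued");
    return id;
  }

  set(id: string, status: JobStatus): void {
    this.jobs.set(id, status);
  }

  get(id: string): JobStatus | undefined {
    return this.jobs.get(id);
  }
}

const store = new JobStore();

// Stand-in for slow work such as generating a report.
function slowTask(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// The "request handler": starts the work and returns a job ID without waiting.
function handleRequest(workMs: number): string {
  const id = store.create();
  // Fire and forget: this async block keeps running after the handler returns.
  void (async () => {
    store.set(id, "running");
    try {
      await slowTask(workMs);
      store.set(id, "done");
    } catch {
      store.set(id, "failed");
    }
  })();
  return id;
}

// Usage: the response is immediate; the status changes later.
const jobId = handleRequest(50);
console.log(store.get(jobId)); // running
setTimeout(() => console.log(store.get(jobId)), 100); // done
```

Because the job status lives only in memory, a process crash loses it. That gap is what the rest of this article is about.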
Why are they painful in traditional systems?
Imagine you’re baking a massive cake, but your oven can only bake one thing at a time. A “long-running job” is like that giant cake – it takes up the entire oven for a long time.
In traditional systems, these jobs are a problem because they tie up the whole system. While that one job is running, other, smaller tasks have to wait their turn. It’s like a line at a checkout counter where one person is paying with a million coins – everyone else is stuck waiting.
For example: A traditional system might have to generate a large number of payslips at the end of the month. This task could take several hours. During that time, the system might be too slow to process customer orders or update inventory, causing a bottleneck for everyone else who needs to use the system.
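The checkout-line effect is easy to put into numbers. The sketch below models a system that can run only one job at a time (a single first-in, first-out worker) and computes when each job finishes. The job names and durations are made up for illustration.

```typescript
interface Job {
  name: string;
  durationSec: number;
}

// With a single worker, each job starts only after every job ahead of it has finished.
function finishTimes(jobs: Job[]): Map<string, number> {
  const result = new Map<string, number>();
  let clock = 0;
  for (const job of jobs) {
    clock += job.durationSec;
    result.set(job.name, clock);
  }
  return result;
}

const jobs: Job[] = [
  { name: "monthly-payslips", durationSec: 3 * 60 * 60 }, // 3 hours
  { name: "customer-order", durationSec: 2 },
  { name: "inventory-update", durationSec: 1 },
];

const times = finishTimes(jobs);
console.log(times.get("customer-order")); // 10802
```

A 2-second customer order completes after 10,802 seconds (just over three hours) only because it was queued behind the payslip run.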
How Does Temporal.io Fit In?
Whether it’s a background task that retries payments, a multi-step user onboarding flow, or a report that takes 20 minutes to generate, traditional solutions often fail under real-world complexity.
You end up cobbling together:
- Message queues (RabbitMQ)
- Background workers (Hangfire)
- Retry logic (often custom)
- State storage (usually a DB table)
- Monitoring (if you remember)
It works – until it doesn’t. Jobs get lost. Crashes corrupt progress. Debugging is a nightmare in traditional systems.
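To see what "retry logic (often custom)" usually looks like, here is a minimal hand-written retry helper with exponential backoff. It is a generic sketch, not tied to RabbitMQ or Hangfire, and `retryWithBackoff` and its options are hypothetical names. Note what it cannot do: if the process crashes between attempts, the attempt count and any progress are gone.

```typescript
interface RetryOptions {
  maxAttempts: number;
  initialDelayMs: number;
  backoffCoefficient: number;
}

function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Calls fn until it succeeds or maxAttempts is used up, multiplying the wait
// by backoffCoefficient after each failure. All state lives in this process only.
async function retryWithBackoff<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let wait = opts.initialDelayMs;
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts) {
        await delay(wait);
        wait *= opts.backoffCoefficient;
      }
    }
  }
  throw lastError; // every attempt failed: surface the last error
}

// Usage: an operation that fails twice before succeeding.
let calls = 0;
retryWithBackoff(
  async () => {
    calls++;
    if (calls < 3) throw new Error("transient failure");
    return "ok";
  },
  { maxAttempts: 5, initialDelayMs: 10, backoffCoefficient: 2 },
).then((value) => console.log(value, calls)); // ok 3
```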
What is Temporal?
Temporal.io is a platform built to handle long-running, durable, resilient workflows – all through plain code. You define workflows and activities as regular functions. Temporal takes care of:
- Durability: State is persisted automatically
- Retries: Built-in and configurable
- Fault tolerance: Workers can crash, jobs won’t be lost
- Timeouts & cancellation: First-class features
- Observability: Comes with a web UI out of the box
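One way to build intuition for "state is persisted automatically" is a toy model of history replay, the general idea behind Temporal's durability: results of completed steps are recorded, and after a crash the workflow code runs again from the top, reusing recorded results instead of repeating side effects. This is a deliberately simplified sketch with hypothetical names (`History`, `runWithHistory`), not how the Temporal SDK is implemented internally.

```typescript
// One recorded event: which step ran and what it returned.
type History = { step: string; result: string }[];

interface Step {
  name: string;
  run: () => Promise<string>;
}

// Re-runs the whole sequence from the top. Steps already in the history are
// replayed (their recorded result is reused, no side effect is repeated);
// the first unrecorded step and everything after it actually execute.
async function runWithHistory(
  steps: Step[],
  history: History,
  executed: string[], // names of steps that really ran during this call
): Promise<string[]> {
  const results: string[] = [];
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    const recorded = history[i];
    if (recorded) {
      if (recorded.step !== step.name) {
        throw new Error(`history mismatch at step ${i}: did the code change?`);
      }
      results.push(recorded.result);
      continue;
    }
    const result = await step.run();
    executed.push(step.name);
    history.push({ step: step.name, result }); // "persist" before moving on
    results.push(result);
  }
  return results;
}

// Usage: the three onboarding steps from this article, with fake results.
const steps: Step[] = ["sendWelcomeEmail", "processPayment", "sendConfirmation"].map(
  (name) => ({ name, run: async () => `${name}:ok` }),
);

// Pretend the worker crashed after the first two steps were recorded.
const savedHistory: History = [
  { step: "sendWelcomeEmail", result: "sendWelcomeEmail:ok" },
  { step: "processPayment", result: "processPayment:ok" },
];
const executed: string[] = [];
runWithHistory(steps, savedHistory, executed).then(() => console.log(executed.join(", "))); // sendConfirmation
```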
Core Concepts in Temporal.io
Workflows
In Temporal, a workflow is a piece of code that orchestrates a series of tasks, known as Activities, to complete a larger process. It defines what needs to happen and in what order.
It is like a movie script.
Example: “First welcome the customer, then process payment, then send confirmation.”
// workflows/onboardingWorkflow.ts
import { proxyActivities } from '@temporalio/workflow';
import type * as activities from '../activities';
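// Typed proxy: calling these schedules Activities rather than running them locally
const { sendWelcomeEmail, processPayment, sendConfirmation } = proxyActivities<typeof activities>({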
startToCloseTimeout: '5 minutes',
});
export async function onboardingWorkflow(customerId: string) {
await sendWelcomeEmail(customerId);
await processPayment(customerId);
await sendConfirmation(customerId);
}
Activities
Activities are the individual, specific tasks a workflow asks to be performed, such as making a database call or sending an email. Activities do the actual work.
Each scene (activity) might succeed or fail.
Example: “Send welcome email” or “Charge credit card”.
// activities/index.ts
export async function sendWelcomeEmail(customerId: string) {
console.log(`Sending welcome email to ${customerId}`);
}
export async function processPayment(customerId: string) {
console.log(`Processing payment for ${customerId}`);
}
export async function sendConfirmation(customerId: string) {
console.log(`Sending confirmation email to ${customerId}`);
}
Workers
Workers are the processes that run your code. They are programs you deploy on your own servers or in the cloud, and they execute the script: they poll for work and run your workflow and activity functions.
If a worker stops, another can take over – the movie continues.
Example: A failed camera crew doesn’t stop the movie.
// worker.ts
import { Worker } from '@temporalio/worker';
import * as activities from './activities';
async function runWorker() {
const worker = await Worker.create({
workflowsPath: require.resolve('./workflows/onboardingWorkflow'),
activities,
taskQueue: 'onboarding-queue',
});
await worker.run(); // Keeps polling task queue and executing tasks
}
runWorker().catch((err) => {
console.error('Worker failed:', err);
process.exit(1);
});
Task Queues
A Task Queue in Temporal is like a waiting room for tasks. When a Workflow needs to run an Activity or advance its own logic, the Temporal Service places a task on a named queue. Workers continuously poll these queues, and when a task arrives, one worker picks it up and executes the corresponding code. This simple mechanism connects workflows to workers and spreads tasks reliably across however many workers are polling the same queue.
Example: “Scene 5: charge payment – waiting for available crew.”
// client.ts
import { Connection, WorkflowClient } from '@temporalio/client';
import { onboardingWorkflow } from './workflows/onboardingWorkflow';
async function startWorkflow() {
const connection = await Connection.connect();
const client = new WorkflowClient({ connection });
const handle = await client.start(onboardingWorkflow, {
args: ['customer-123'],
taskQueue: 'onboarding-queue', // This maps to where the worker is polling
workflowId: 'onboarding-workflow-customer-123',
});
console.log(`Started workflow ${handle.workflowId}`);
}
startWorkflow().catch(console.error);
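The task queue idea can be modeled in a few lines: several workers poll one named queue, and each task is handed to exactly one of them. This is an in-memory analogy with hypothetical names (`TaskQueue`, `runWorkers`); in Temporal the queue lives on the Temporal Service and workers poll it over the network. The round-robin turn-taking below is a simplification for the sketch.

```typescript
type Task = { id: string; kind: "workflow" | "activity" };

class TaskQueue {
  readonly name: string;
  private tasks: Task[] = [];

  constructor(name: string) {
    this.name = name;
  }

  enqueue(task: Task): void {
    this.tasks.push(task);
  }

  // Hands out the oldest task exactly once, or undefined when the queue is empty.
  poll(): Task | undefined {
    return this.tasks.shift();
  }
}

// Workers take turns polling until the queue is drained; returns who handled what.
function runWorkers(queue: TaskQueue, workerNames: string[]): Map<string, string[]> {
  const handled = new Map<string, string[]>();
  for (const w of workerNames) handled.set(w, []);
  let busy = true;
  while (busy) {
    busy = false;
    for (const worker of workerNames) {
      const task = queue.poll();
      if (task) {
        handled.get(worker)!.push(task.id);
        busy = true;
      }
    }
  }
  return handled;
}

// Usage: five activity tasks on the queue from the client example, two workers.
const queue = new TaskQueue("onboarding-queue");
for (let i = 1; i <= 5; i++) queue.enqueue({ id: `task-${i}`, kind: "activity" });
const handled = runWorkers(queue, ["worker-a", "worker-b"]);
console.log(handled.get("worker-a")!.join(", ")); // task-1, task-3, task-5
```

If one worker is removed from the list, the remaining worker simply drains the whole queue, which mirrors the "another can take over" idea from the Workers section.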
Handling Failures, Retries, and Timeouts Gracefully
Failures are everywhere in distributed systems – APIs go down, services restart, workers crash. In most systems, you have to glue together retry logic, persistence, and timeout tracking manually.
Temporal handles all of this out-of-the-box, with very little configuration.
Automatic Retries
By default, Temporal retries failed Activities automatically. If your activity throws an error, or the worker running it crashes mid-execution (detected when the activity's timeout expires), Temporal schedules another attempt according to a built-in default retry policy.
You don’t have to write try/catch, backoff logic, or state recovery.
Example:
// activities.ts
export async function unstableActivity() {
if (Math.random() < 0.5) {
throw new Error('Random failure');
}
console.log('Activity succeeded!');
}
// workflow.ts
import { proxyActivities } from '@temporalio/workflow';
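import type * as activities from './activities'; // type-only import keeps activity code out of the workflow bundle
const { unstableActivity } = proxyActivities<typeof activities>({
startToCloseTimeout: '10s', // a startToClose or scheduleToClose timeout is required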
});
export async function retryDemoWorkflow() {
await unstableActivity(); // Temporal will retry this on failure
}
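To see what the built-in policy means in numbers: Temporal's documented default Activity retry policy uses an initial interval of 1 second, a backoff coefficient of 2.0, a maximum interval of 100 times the initial interval, and unlimited attempts. The helper below (a hypothetical name, not part of the SDK) computes the wait before each retry under those defaults, so you can see the backoff flatten out at the cap.

```typescript
// Wait (in seconds) before retry number n: exponential backoff with a cap.
function retryIntervalSec(
  retryNumber: number, // 1 = first retry after the initial attempt failed
  initialIntervalSec = 1,
  backoffCoefficient = 2,
  maximumIntervalSec = 100 * initialIntervalSec,
): number {
  const uncapped = initialIntervalSec * Math.pow(backoffCoefficient, retryNumber - 1);
  return Math.min(uncapped, maximumIntervalSec);
}

// Waits before the first nine retries under the default values.
const schedule = Array.from({ length: 9 }, (_, i) => retryIntervalSec(i + 1));
console.log(schedule.join(", ")); // 1, 2, 4, 8, 16, 32, 64, 100, 100
```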
Custom Retry Policies
You can configure retry behavior with options like:
- Max attempts
- Backoff intervals
- Retryable/non-retryable error types
Example: Custom retry configuration
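import { proxyActivities } from '@temporalio/workflow';
import type * as activities from './activities'; // assumes activities.ts exports flakyTask
const { flakyTask } = proxyActivities<typeof activities>({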
startToCloseTimeout: '10s',
retry: {
maximumAttempts: 5,
initialInterval: '1s',
backoffCoefficient: 2,
nonRetryableErrorTypes: ['ValidationError'],
},
});
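| Now flakyTask is attempted at most 5 times in total (the first try plus 4 retries), with exponential backoff between attempts (1s, 2s, 4s, 8s), unless it throws a ValidationError, in which case it fails immediately without retrying.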
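The behavior described for flakyTask can be checked with a small simulation. The function below is a hypothetical model of the retry decision, not SDK code: it walks through a sequence of failure types and reports after which attempt the activity stops and how long it waited in total. It ignores the maximum-interval cap, which these short intervals never reach.

```typescript
interface Policy {
  maximumAttempts: number;
  initialIntervalSec: number;
  backoffCoefficient: number;
  nonRetryableErrorTypes: string[];
}

interface Outcome {
  attempts: number;
  totalWaitSec: number;
  reason: "non-retryable" | "max-attempts";
}

// failures[i] is the error type thrown on attempt i + 1 (every attempt fails here).
function simulate(policy: Policy, failures: string[]): Outcome {
  let wait = policy.initialIntervalSec;
  let totalWaitSec = 0;
  for (let attempt = 1; attempt <= failures.length; attempt++) {
    if (policy.nonRetryableErrorTypes.includes(failures[attempt - 1])) {
      return { attempts: attempt, totalWaitSec, reason: "non-retryable" };
    }
    if (attempt >= policy.maximumAttempts) {
      return { attempts: attempt, totalWaitSec, reason: "max-attempts" };
    }
    totalWaitSec += wait; // wait before the next attempt
    wait *= policy.backoffCoefficient;
  }
  throw new Error("provide at least maximumAttempts failure entries");
}

const flakyPolicy: Policy = {
  maximumAttempts: 5,
  initialIntervalSec: 1,
  backoffCoefficient: 2,
  nonRetryableErrorTypes: ["ValidationError"],
};

// Every attempt times out: 5 attempts, waits of 1 + 2 + 4 + 8 = 15 seconds.
console.log(simulate(flakyPolicy, new Array<string>(5).fill("TimeoutError")));
// A ValidationError on attempt 2 stops retries immediately (1 second waited).
console.log(simulate(flakyPolicy, ["TimeoutError", "ValidationError"]));
```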
Timeouts (Per Activity or Workflow)
Temporal lets you set precise timeouts for:
- Start-to-Close Timeout: max time for one activity run
- Schedule-to-Start Timeout: max time in queue before being picked up
- Workflow Run Timeout: max time for a single run of the workflow
Example: Timeout configuration for Activity
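import { proxyActivities } from '@temporalio/workflow';
import type * as activities from './activities'; // assumes activities.ts exports slowApiCall
const { slowApiCall } = proxyActivities<typeof activities>({
startToCloseTimeout: '15s', // fail this attempt if it runs longer than 15s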
});
| If slowApiCall hangs, Temporal fails the attempt once it has run for 15 seconds and retries it according to the retry policy.
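A per-attempt timeout can be approximated locally with `Promise.race`, which is a handy way to reason about what a start-to-close timeout means for a single attempt. This is a local analogy with hypothetical helper names (`withTimeout`, `fakeApiCall`); Temporal tracks the real timeout through the Temporal Service. Note that in this sketch the race only stops the caller from waiting: the slow function itself is not interrupted.

```typescript
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Rejects with TimeoutError if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  // Clear the timer on either outcome so it does not keep the process alive.
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    },
  );
}

// Hypothetical stand-in for slowApiCall: resolves after `ms` milliseconds.
function fakeApiCall(ms: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve("response"), ms));
}

withTimeout(fakeApiCall(10), 100).then((r) => console.log(r)); // response
withTimeout(fakeApiCall(300), 50).catch((e: Error) => console.log(e.name)); // TimeoutError
```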
When Should You Use Temporal?
Temporal is powerful, but not for everything. It shines when reliability, orchestration, and state management matter over long periods – from minutes to months.
Let’s be real: if your task could be done with a cron job and setTimeout, Temporal is probably overkill.
Ideal Applications
- Payment retries: Handle third-party failures, retry with backoff, and keep track of state over days, all without writing persistence logic.
- User onboarding flows: Multi-step workflows (email, setup, API integrations) that span days and depend on user actions or external services.
- Data pipelines, cross-system data syncing, and ETL: Trigger-heavy flows that fetch, transform, load, retry, and notify.
- File processing: Users upload a file → chunking → parsing → storing → notifying. If a step fails, you want it to resume, not restart.
- Scheduled tasks with business logic: Need to do X exactly 3 days after Y? Temporal timers survive restarts, unlike setTimeout, cron, or job queues.
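Why a durable timer survives restarts while `setTimeout` does not comes down to what gets stored: an in-memory `setTimeout` holds only a relative delay inside one running process, whereas a durable timer records an absolute fire time that any process can read back after a restart. The sketch below shows that idea with hypothetical names (`TimerRecord`, `scheduleAfter`, `remainingMs`). In an actual Temporal workflow you would call `sleep('3 days')` from `@temporalio/workflow` and let Temporal persist the timer for you.

```typescript
// What a durable timer needs to persist: an absolute time, not a relative delay.
interface TimerRecord {
  id: string;
  fireAtMs: number; // epoch milliseconds when the timer should fire
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Called when event Y happens; in a real system this record would be saved to storage.
function scheduleAfter(id: string, eventAtMs: number, delayMs: number): TimerRecord {
  return { id, fireAtMs: eventAtMs + delayMs };
}

// Called after any restart: how long is left, or 0 if the timer is already due.
function remainingMs(timer: TimerRecord, nowMs: number): number {
  return Math.max(0, timer.fireAtMs - nowMs);
}

const signupAt = Date.UTC(2024, 0, 1); // hypothetical event Y: 1 Jan 2024, 00:00 UTC
const timer = scheduleAfter("follow-up-email", signupAt, 3 * DAY_MS);

// Suppose the process restarts 1 day after signup: 2 days remain.
console.log(remainingMs(timer, signupAt + DAY_MS) / DAY_MS); // 2
```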
When NOT to Use Temporal
- Ultra-low-latency APIs: Temporal adds slight latency. If your service needs sub-50ms responses, keep it in-memory.
- Simple stateless tasks: Sending one email or writing to a DB once? Temporal’s setup is overkill. Stick with your existing queue or serverless function.
- High-throughput streaming: Temporal is optimized for durable orchestration, not streaming. Use Kafka for real-time logs and events.
- Very short-lived experiments or scripts: If it is throwaway code or a hacky cron job, keep it simple. Use Temporal when the code needs to survive, scale, and evolve.
TL;DR: Temporal.io for Long-Running Jobs
- Temporal provides a code-first, fault-tolerant, and stateful way to manage long-running workflows with built-in retries, timeouts, and observability.
- Use it for workflows like payment retries, user onboarding, file processing, or multi-step orchestrations, especially where failure handling and durability matter.
- Avoid it for ultra-fast, simple, or stateless tasks; it is not a lightweight job runner.
- With Temporal, you write plain code. It handles the messy parts: persistence, retries, state recovery, and fault tolerance so you don’t have to.
References and Further Reading
- Official Docs: https://docs.temporal.io/
- Project Based Tutorials: https://learn.temporal.io/tutorials/typescript/