You Built a GenServer. Now Make It Fast, Observable, and Bulletproof.

Introduction

Remember the last time you shipped a shiny new GenServer to production? It passed every unit test, handled your happy-path demo traffic, and looked rock-solid on paper. Then real users showed up. Latency spikes, CPU climbs, and suddenly the BEAM scheduler view in :observer looks like a Christmas tree. I’ve been there - and I’ve learned that building a GenServer is the easy part; making it fast, observable, and bulletproof is where the real work starts.

In my previous article I walked through a TDD approach to testing GenServers. This follow-up is the field manual I wish I had when I first pushed one of those servers to production. We’ll build a mental model of how GenServers really cost CPU cycles, then apply a toolbox of performance and observability techniques you can drop into your code today.

By the end you’ll know how to:

  1. Read the BEAM’s “cost model” - mailbox size, reductions, scheduler run queues - so you can spot trouble early.
  2. Refactor hot paths so callbacks never block schedulers.
  3. Push read-heavy state to ETS / persistent_term with a clear view of the consistency trade-offs.
  4. Add cheap, composable Telemetry so dashboards light up before pagers do.
  5. Choose when to graduate from a single GenServer to GenStage, Broadway, or full-blown distributed sharding.

Let’s dive in.


1. The Real GenServer Cost Model (mental model)

A GenServer is just a process with a mailbox, but the devil is in the scheduler details. The BEAM VM runs N schedulers - one per CPU core by default - and each scheduler works through a run queue of processes. Key things to watch:

  • Mailbox size - Process.info(pid, :message_queue_len) tells you how many messages are waiting. A consistently growing queue is a red flag: an overloaded mailbox delays the server’s own replies, inflating end-to-end latency for every caller waiting on it.
  • Reductions - every BEAM operation costs reductions, and a process is preempted once it spends its slice (roughly 4,000 reductions); a heavy callback keeps re-entering the run queue while its own mailbox backs up, and non-yielding NIFs can tie up a scheduler outright.
  • Scheduler migrations - when a process hogs a scheduler for too long it triggers load balancing and the VM may migrate it to a different scheduler core. This context switch can lead to CPU cache misses as the process’s data is no longer in the local L1/L2 cache, introducing latency.
  • Sync vs. async - GenServer.call/3 blocks the caller, which gives you back-pressure for free but couples the two lifecycles; cast/2 returns immediately, which decouples them but lets the mailbox grow unchecked.

Tools I keep on my belt:

:observer.start()                            # live view of run queues, processes, and memory
:recon.proc_count(:message_queue_len, 10)    # recon: top 10 processes by mailbox size
# plus a library like telemetry_metrics_statsd to consume telemetry events

Spend five minutes watching these metrics during load and your optimisation story usually writes itself.
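
A quick spot check from IEx ties these together. A minimal sketch - MyApp.HotServer is a placeholder for whichever process you suspect, and the output numbers are illustrative:

pid = Process.whereis(MyApp.HotServer)

Process.info(pid, [:message_queue_len, :reductions])
# => [message_queue_len: 4231, reductions: 90817264]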


2. Performance & Throughput Techniques

2.1 Keep Callbacks Non-Blocking

If a callback waits on disk, network, or heavy CPU, your entire GenServer stalls. The key is to move blocking work out of the GenServer’s main loop. The Task module provides several patterns for this.

For “fire-and-forget” work where the caller doesn’t need a result, Task.start/1 offloads the work into a separate, unlinked process. The GenServer can immediately process the next message.

def handle_cast({:track_event, event}, state) do
  # Task.start/1 spawns an unlinked, unsupervised process: a crash there
  # won't take the GenServer down, but it also won't be retried or reported.
  Task.start(fn -> Analytics.track(event) end)
  {:noreply, state}
end

When a result is needed but you can’t block the GenServer, a common pattern is to start a task, defer the reply with {:noreply, state}, and answer later with GenServer.reply/2 when the task’s result arrives. The caller still sees an ordinary synchronous call, but the GenServer stays free while the work runs. (Returning the Task struct to the caller doesn’t work: Task.await/2 must be called by the process that started the task.)

# In the GenServer (state.pending starts as %{} in init/1)
def handle_call({:compute, input}, from, state) do
  task = Task.async(fn -> heavy_math(input) end)
  # Don't reply yet; remember who asked, keyed by the task's monitor ref.
  {:noreply, %{state | pending: Map.put(state.pending, task.ref, from)}}
end

# The task's result lands in this GenServer's mailbox; forward it to the caller.
def handle_info({ref, result}, state) when is_map_key(state.pending, ref) do
  Process.demonitor(ref, [:flush])
  {from, pending} = Map.pop(state.pending, ref)
  GenServer.reply(from, {:ok, result})
  {:noreply, %{state | pending: pending}}
end

# In a client module
def compute(server, input) do
  GenServer.call(server, {:compute, input}, 30_000) # always use a timeout!
end

For background jobs you want isolated from the GenServer yet still visible to the supervision tree (clean shutdown, crash logging, optional restarts), run them under a Task.Supervisor as independent, supervised processes.
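
A minimal sketch, assuming a Task.Supervisor named MyApp.TaskSupervisor has been added to the application’s supervision tree (the name is illustrative):

# In your application's supervision tree:
children = [
  {Task.Supervisor, name: MyApp.TaskSupervisor}
]

# In the GenServer:
def handle_cast({:track_event, event}, state) do
  # Runs in its own supervised process; a crash is logged by the
  # supervisor but never touches this GenServer.
  Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
    Analytics.track(event)
  end)

  {:noreply, state}
end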

The Goal: Keep your handle_call and handle_cast callbacks consistently fast (a good budget is <1ms). When profiling reveals a slow callback, delegate the work using one of these patterns.

2.2 Post-Init Heavy Work with handle_continue

Boot time matters when your GenServer sits inside a supervision tree - a slow init/1 delays the whole app. Load large datasets after the process is up:

def init(_opts) do
  # Return immediately so the supervisor can move on; warm the cache next.
  {:ok, %{cache: nil}, {:continue, :warm_cache}}
end

def handle_continue(:warm_cache, state) do
  cache = load_big_table()
  {:noreply, %{state | cache: cache}}
end

Your supervision tree comes online quickly, and handle_continue/2 is guaranteed to run before any other message is processed, so callers never observe a half-initialised state.

2.3 Externalize Read-Heavy State (ETS / persistent_term)

A GenServer serialises access to its state: every read is a message through one mailbox, handled one at a time. For highly contended data, moving state to :ets or :persistent_term can unlock massive read concurrency. But this power comes with sharp trade-offs.

ETS tables, especially with read_concurrency: true, offer fast, parallel reads. However, this comes at a cost:

  • Write contention: Without write_concurrency, every write takes an exclusive table lock (and many designs additionally funnel writes through the owning process to keep them ordered). Setting write_concurrency: true - or :auto on OTP 25+ - lets different objects in the same table be written and read in parallel, at the cost of extra memory and slower sequential access and concurrent reads.
  • Consistency: Individual ETS operations are atomic and isolated, but there are no multi-key transactions. A reader that touches several related keys, or traverses the table, can observe part of a concurrent multi-key update and miss the rest.
  • Ownership: The table’s lifecycle is tied to the owner process. If the owner dies, the table vanishes, so create it from a stable, supervised process (or configure an :heir).

For truly static data that is read often and written almost never, :persistent_term is a powerful alternative. Reads are virtually free - no message passing, no copying onto the reader’s heap, no GC impact. The catch is on the write side: updating or erasing an existing key forces the VM to scan the heaps of all processes (effectively a global GC pass), and that cost grows with the number of live processes. Reserve it for data that is set once at application boot or updated very rarely, such as during a maintenance window.
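
A minimal sketch of both patterns - the table, key, and module names are made up for illustration:

# Read-heavy cache: a public ETS table owned by a long-lived, supervised process.
:ets.new(:feature_flags, [:named_table, :set, :public, read_concurrency: true])
:ets.insert(:feature_flags, {:dark_mode, true})
:ets.lookup(:feature_flags, :dark_mode)              # => [{:dark_mode, true}]

# Effectively-static data: write once at boot, read from anywhere without copying.
:persistent_term.put({MyApp.Config, :rate_limits}, %{default: 100})
:persistent_term.get({MyApp.Config, :rate_limits})   # => %{default: 100}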

Verdict: Use these tools surgically. Profile your application, understand the read/write ratio, and always measure the performance impact of both reads and writes before committing to this pattern.

2.4 Batching & Coalescing Patterns

Sometimes the cheapest optimisation is to do less. Accumulate writes and flush every X milliseconds:

def init(_opts) do
  schedule_flush()
  {:ok, %{buffer: []}}
end

def handle_cast({:track, metric}, state) do
  {:noreply, %{state | buffer: [metric | state.buffer]}}
end

def handle_info(:flush, state) do
  schedule_flush()
  # Buffer is most-recent-first; reverse it here if downstream needs arrival order.
  flush(state.buffer)
  {:noreply, %{state | buffer: []}}
end

defp schedule_flush do
  Process.send_after(self(), :flush, 1_000)
end

Used sparingly, batching smooths traffic spikes without complex back-pressure logic.

2.5 Back-Pressure & Demand Control

If producers outpace your GenServer, queues explode. Options:

  1. Bounded mailbox - check the queue length and reject or shed load past a threshold, as in the sketch below.
  2. Timeouts on call/3 - force callers to handle slowness instead of waiting forever.

  # A hypothetical limit; tune it to your workload.
  @max_queue_len 1_000

  @impl true
  def handle_call({:process, _item}, _from, state) do
    # Check the mailbox size first.
    case Process.info(self(), :message_queue_len) do
      {:message_queue_len, len} when len > @max_queue_len ->
        # "Reject" the call because the server is overloaded.
        {:reply, {:error, :overloaded}, state}

      _ ->
        # Mailbox is not full, process the request.
        # ... do actual work ...
        {:reply, :ok, %{state | processed: state.processed + 1}}
    end
  end
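
Option 2 lives on the caller side: pass an explicit timeout to call/3 and handle the exit instead of waiting forever. A minimal sketch - the 5-second budget is an arbitrary choice:

def process(server, item) do
  GenServer.call(server, {:process, item}, 5_000)
catch
  # call/3 exits on timeout; turn that into a value the caller can act on.
  :exit, {:timeout, _} -> {:error, :timeout}
end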

Consider moving to GenStage or Broadway when:

  • You need standardized, pull-based back-pressure across a multi-stage data processing pipeline.
  • Your workload naturally fits a consumer-producer model (e.g., consuming from SQS).
  • You need concurrent processing of events while preserving order within a partition.

Migration can be incremental. You can embed a GenStage producer inside an existing GenServer and fan out from there.
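
As a sketch of what that first step can look like, here is a minimal GenStage producer that keeps a cast-shaped API so existing callers don’t change. It assumes the gen_stage dependency, and the module and function names are illustrative:

defmodule MyApp.WorkProducer do
  use GenStage

  def start_link(opts), do: GenStage.start_link(__MODULE__, opts, name: __MODULE__)

  # Same shape as a GenServer.cast-based API, so callers stay untouched.
  def enqueue(event), do: GenStage.cast(__MODULE__, {:enqueue, event})

  @impl true
  def init(_opts), do: {:producer, {:queue.new(), 0}}

  @impl true
  def handle_cast({:enqueue, event}, {queue, pending_demand}) do
    dispatch(:queue.in(event, queue), pending_demand)
  end

  @impl true
  def handle_demand(demand, {queue, pending_demand}) do
    dispatch(queue, pending_demand + demand)
  end

  # Emit as many events as current demand allows; remember any unmet demand.
  defp dispatch(queue, demand) do
    {events, queue, demand} = take(queue, demand, [])
    {:noreply, events, {queue, demand}}
  end

  defp take(queue, 0, acc), do: {Enum.reverse(acc), queue, 0}

  defp take(queue, demand, acc) do
    case :queue.out(queue) do
      {{:value, event}, queue} -> take(queue, demand - 1, [event | acc])
      {:empty, queue} -> {Enum.reverse(acc), queue, demand}
    end
  end
end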

2.6 Sharding Hot Keys

One GenServer → one mailbox. Hot keys will hit the limit. Partition with a Registry:

# 16 shards; each customer_id maps deterministically to one shard.
# Note: if `customer_id` is user-controllable, hash collisions could
# still concentrate traffic on one hot shard.
shard = :erlang.phash2(customer_id, 16)
name = {:via, Registry, {MyApp.ShardRegistry, shard}}
# Match {:error, {:already_started, pid}} too if the shard may already be running.
{:ok, pid} = MyShardSupervisor.start_child(name)

Or reach for a consistent-hashing library like hash_ring when shards need to be added or removed at runtime.

3. Observability & Instrumentation

You can’t fix what you can’t see. The Elixir ecosystem has standardised on :telemetry events - libraries like Phoenix and Ecto emit them, and your GenServers should too:

:telemetry.execute(
  [:my_app, :genserver, :callback, :stop],
  %{duration: duration},
  %{module: __MODULE__, callback: :handle_call}
)
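
Rather than hand-rolling timers, :telemetry.span/3 wraps a function and emits matching :start, :stop, and :exception events, with the duration measured for you on the :stop event. A sketch, where do_work/2 is a placeholder for whatever the callback actually does:

def handle_call({:compute, input}, _from, state) do
  result =
    :telemetry.span(
      [:my_app, :genserver, :callback],
      %{module: __MODULE__, callback: :handle_call},
      fn ->
        result = do_work(input, state)
        # Return {result, stop_metadata}; the span adds the duration measurement.
        {result, %{module: __MODULE__, callback: :handle_call}}
      end
    )

  {:reply, result, state}
end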

Pipe these events into a library like PromEx to expose them to Grafana or Datadog. Add tracing (OpenTelemetry) around external calls to stitch latency graphs end-to-end. Set budgets (SLOs) and alert on 95th percentile, not averages.
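
For example, with Telemetry.Metrics the :stop event above can be declared as a dashboard-ready metric; the reporter wiring (PromEx, telemetry_metrics_statsd) is left out, and the module name is illustrative:

defmodule MyApp.Telemetry do
  import Telemetry.Metrics

  def metrics do
    [
      # Callback duration broken down by module and callback; the reporter
      # turns this into the percentiles you alert on.
      summary("my_app.genserver.callback.stop.duration",
        unit: {:native, :millisecond},
        tags: [:module, :callback]
      )
    ]
  end
end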


4. Conclusion

A GenServer is a beautiful abstraction, but it hides sharp edges. With a clear mental model and a small set of techniques - non-blocking callbacks, state externalisation, batching, back-pressure, and solid instrumentation - you can take that weekend prototype and run it under serious production load.

Every optimization is a trade-off. Always profile your application to identify true bottlenecks before adding complexity. Instrument first, optimise second.

Next up in this series: distributed GenServers and cluster-wide coordination - we’ll tackle hand-off, global registries, and truly elastic scaling. Stay tuned.

Key Takeaways

  • Non-blocking callbacks keep schedulers healthy.
  • Move highly contended reads to ETS or :persistent_term, but understand the trade-offs.
  • Instrument first, optimise second.
  • Use supervision and back-pressure patterns to stay resilient.

Resources & Further Reading

  • Official OTP docs - gen_server, :erlang.process_info/2.
  • Fred Hébert et al. - “Adopting Erlang”, chapters on operating and monitoring.
  • Saša Jurić - “Elixir in Action” sections on performance.
  • Francesco Cesarini & Steve Vinoski - “Designing for Scalability with Erlang/OTP”.


