From Python to Elixir for AI Agents — Alessandro Usseglio Viretta

A few months ago, I converted most of my AI code from Python to Elixir. I already knew Elixir: I came back to it after growing frustrated with Python's async/await, and I have not looked back. I'd like to explain the frustration, because I think it is shared more than people admit.

When you build AI agents, you are orchestrating concurrent, long-running, failure-prone processes: LLM calls that take several seconds, external APIs that time out, background workers that must retry cleanly when they fail and streaming responses to a browser while simultaneously processing webhooks. In Python, threading all of this with async/await requires constant vigilance. You must mark every function that touches async code. You must track which event loop is running. One synchronous call in the wrong place silently starves the entire loop. The bugs are subtle and hard to reproduce.

I admit I may have failed to master Python's async model properly, but I suspect the difficulty is not entirely mine. Coming back to Elixir, the relief was immediate. Not because Elixir makes concurrency easy in some hand-wavy sense, but because the runtime was built for this from the start. Elixir runs on the BEAM, the Erlang virtual machine, which uses preemptive scheduling. The runtime divides CPU time fairly between processes. No single process can accidentally starve others. There is no event loop to block, no function-level annotations. You write ordinary sequential code, and the VM handles the rest.

When I run an LLM call inside an Oban worker in Silex, my platform underlying Aleik, an AI email-based ghostwriting agency, that call blocks the worker process for as long as it needs to: seconds, sometimes longer. Nothing else suffers, other workers keep running, the Phoenix endpoint keeps serving requests, the PubSub system keeps pushing real-time updates to the browser, the supervision tree watches all of it. This is not a workaround, it is the design. The AI-agent-generated codebase documentation for Silex puts it plainly: "The LLM call blocks the Oban worker process but that is intentional — Oban workers are designed for long-running work, and synchronous flow gives clean retry semantics."

Clean retry semantics. In a world where LLM calls fail intermittently, time out unexpectedly, or return malformed JSON, that is not a nice-to-have. It is the difference between a reliable system and a fragile one. Silex is a framework for email-based AI agents driven by dynamic system prompts created with my LLM behaviour specification methodology. Aleik, the AI ghostwriting agency running on top of it, handles inbound emails from subscribers, runs agentic tool-calling conversations, drafts personalised content, manages a human review queue, and dispatches delivery via AWS SES.

The architecture follows directly from how Elixir works. Email arrives via AWS SES/SNS. A webhook controller hands it to an Oban worker. The worker parses the email, runs an LLM input guard, finds or creates a conversation thread, and dispatches to a handler. The handler runs a tool-calling loop that may invoke tools like select_document, author_content, or save_writing_preferences, each triggering further LLM calls, database writes, or background jobs. In parallel, a relay pipeline fans out to separate Oban jobs with independent retry logic. A Human-in-the-Loop system assigns reviewers to threads via round-robin and routes drafts for editing before delivery. A campaign scheduler dispatches personalised emails based on inferred subscriber preferences.

All of this runs in a single Elixir application. No Celery, no Redis, no Lambda. The BEAM is the infrastructure. I kept waiting for the moment where the concurrency model would become a liability. It never came.

Elixir also turns out to be a remarkably good language for a coding agent to write.

José Valim published on Dashbit a piece recently on exactly this, citing a Tencent study that tested how well AI models solve problems written in different programming languages. Elixir scored 97.5%, the highest of any language tested. Claude Opus achieved 80.3% on Elixir versus 74.9% on C# and 72.5% on Kotlin.

The reasons come down to three properties of the language itself:

Data is immutable, so functions have explicit inputs and outputs with no hidden side effects. The pipe operator makes transformation chains readable as a sequence. Elixir supports executable documentation examples that run as part of the test suite. Well-maintained libraries use them, which means HexDocs is more reliably accurate than ecosystems where documentation and code drift apart silently. The APIs have not thrashed. A 2018 tutorial is still largely recognisable today.

My experience building Silex with Claude Code (and, more recently, Kimi) bears this out. The Human-in-the-Loop reviewer system (round-robin assignment, inbox routing, subject line parsing, delivery policy resolution) was implemented almost entirely by the coding agent. The code was clean, correctly typed, and tested. I was not fighting async patterns or explaining event loop semantics. The model understood what needed to happen because the language gave it the structure to reason clearly.

The same properties that make Elixir good for building AI agents make it easy for a coding agent to write it. That is not a coincidence. Explicit data flow and immutability reduce cognitive load for humans and models alike.

The honest caveat: Elixir is not Python.

If you need model training or inference at scale, you will reach for Python. The ML tooling exists (Nx, Bumblebee) but it is nowhere near mature. The functional programming model is a genuine adjustment, and OTP's process model takes time to internalise. These are not concepts you absorb in a weekend.

But for building agents that orchestrate LLM calls, rather than training models, you do not need Python's ML libraries. You need reliability and clean retry semantics. You need a language that lets you write the messy, long-running coordination logic surrounding LLM calls without the async/await model turning it into a special kind of hell.

Most teams are building on Python because that is where the AI tooling is. As agents grow more complex, operating over email, webhooks, scheduled jobs, and human review queues, the limitations of cooperative async will become more painful.

I moved from Python to Elixir not because of a benchmark. I moved because I was building something real and the tools were failing me. Preemptive scheduling and the supervision tree are not academic properties. They are what let me ship a production agent system as a solo developer. The Tencent data is a nice confirmation. The production system is the argument.