
Real-Life Jarvis
Most AI assistants live in a terminal or a chat window. You type, it responds, and that's about it. But what if you could just... talk to it? Like, pick up the phone and have a real conversation with your agent?
That’s exactly what I set up after installing OpenClaw on my Mac Mini and connecting it to an ElevenLabs voice agent and Twilio. Once you see it working, it’s hard to go back to interacting with AI through a terminal or chat window. In this post, I’ll walk through how the system works and how you can set it up yourself.
The Idea
The core concept is pretty simple. OpenClaw handles the thinking side of things: reasoning, memory, tool use, and autonomous workflows. ElevenLabs handles everything voice: speech-to-text, text-to-speech, conversation management, and even telephony.
The two systems talk to each other over an OpenAI-style chat completions API, so the integration is clean and doesn't require anything custom on either end.
A simple way to think about it is ElevenLabs = ears and mouth, while OpenClaw = the brain.
Step 1: Run OpenClaw Locally
To start, you need OpenClaw up and running locally with an OpenAI-compatible endpoint exposed at:
http://localhost:3000/v1/chat/completions
ElevenLabs uses the standard Chat Completions API format when it sends requests, so your server needs to speak that language. A basic request looks like this:
{
  "model": "openclaw-agent",
  "messages": [
    {
      "role": "user",
      "content": "What tasks are running?"
    }
  ]
}
OpenClaw processes that and returns a response, which ElevenLabs will convert to speech.
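Before wiring up ElevenLabs, it's worth sanity-checking that your local endpoint speaks this format. Here's a small Python sketch that sends the same request shape shown above; the `openclaw-agent` model name comes from the example, and the response parsing assumes a standard Chat Completions reply:

```python
import json
import urllib.request

OPENCLAW_URL = "http://localhost:3000/v1/chat/completions"

def build_chat_request(user_text: str) -> dict:
    # Same Chat Completions shape ElevenLabs will send.
    return {
        "model": "openclaw-agent",
        "messages": [{"role": "user", "content": user_text}],
    }

def ask_openclaw(user_text: str) -> str:
    payload = json.dumps(build_chat_request(user_text)).encode()
    req = urllib.request.Request(
        OPENCLAW_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard Chat Completions responses nest the reply here.
    return body["choices"][0]["message"]["content"]
```

If `ask_openclaw("What tasks are running?")` prints a sensible answer, the endpoint is ready for ElevenLabs.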
Step 2: Expose Your Local Server with ngrok
ElevenLabs runs in the cloud, so it has no way to reach your localhost directly. The fix is ngrok, which tunnels your local port to a public URL.
Run this in your terminal:
ngrok http 3000
You'll get back something like:
https://7a82-123-45-67.ngrok-free.app
That turns your local endpoint into:
https://7a82-123-45-67.ngrok-free.app/v1/chat/completions
Now ElevenLabs can reach your agent from anywhere.
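One easy mistake here is pasting the ngrok URL with a trailing slash and ending up with a `//v1/...` path. A tiny helper (just an illustration, not part of OpenClaw or ngrok) makes the join foolproof:

```python
def chat_endpoint(ngrok_base: str) -> str:
    # Join the ngrok base URL with the Chat Completions path,
    # stripping any trailing slash from copy-paste.
    return ngrok_base.rstrip("/") + "/v1/chat/completions"
```

Either form of the base URL now produces the same endpoint.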
Step 3: Create an ElevenLabs Voice Agent
Head over to the ElevenLabs dashboard and set up a new agent:
- Go to Agents
- Click Create Agent
- Choose Blank Agent
- Fill in the basics: name, system prompt / personality, voice, and language
This agent becomes the voice interface for OpenClaw. Everything it hears gets forwarded to your local server.
Step 4 (Optional): Cloning Your Voice in ElevenLabs
One of the most interesting parts of building a voice agent is making it sound like you. ElevenLabs makes this possible through voice cloning. Instead of your AI assistant speaking in a generic voice, it can speak in a voice that closely matches your own. In my case, I trained the voice using about an hour of audio pulled from videos on my YouTube channel. That gave the model enough data to capture my tone, pacing, and speaking style.
This step is completely optional, and I understand why cloning your own voice might turn some people away.
Step 5: Connecting the ElevenLabs Agent to OpenClaw
Now it's time to point the agent to your OpenClaw instance.
Inside the ElevenLabs dashboard:
- Open your Agent
- Go to LLM Configuration
- Select Custom LLM
- Set the endpoint to your ngrok URL
Example:
https://YOUR-NGROK-URL/v1/chat/completions
This tells ElevenLabs where to send every message from the conversation.
Adding the OpenClaw Auth Token
One detail that’s easy to miss is authentication.
If your OpenClaw instance is protected, ElevenLabs must include your OpenClaw auth token in the request headers when it calls the API.
Inside the Custom LLM configuration, add a header like this:
Authorization: Bearer YOUR_OPENCLAW_TOKEN
This allows ElevenLabs to successfully call the OpenClaw /v1/chat/completions endpoint.
Without this header, ElevenLabs may reach your ngrok URL but OpenClaw will reject the request because it isn’t authenticated.
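An easy way to debug this is to replay the exact request ElevenLabs will make, including the auth header, from your own machine. This Python sketch assumes the token lives in an environment variable (`OPENCLAW_TOKEN` is a placeholder name; store it however you prefer):

```python
import json
import os
import urllib.request

def openclaw_headers() -> dict:
    # OPENCLAW_TOKEN is a placeholder env var name for this example.
    token = os.environ.get("OPENCLAW_TOKEN", "YOUR_OPENCLAW_TOKEN")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }

def replay_request(endpoint: str, user_text: str) -> dict:
    # Send the same authenticated request ElevenLabs will send.
    payload = json.dumps({
        "model": "openclaw-agent",
        "messages": [{"role": "user", "content": user_text}],
    }).encode()
    req = urllib.request.Request(endpoint, data=payload, headers=openclaw_headers())
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

If this succeeds against your ngrok URL but ElevenLabs still fails, the problem is in the dashboard header configuration rather than OpenClaw.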
Step 6: Test It Out
ElevenLabs has a built-in test interface inside the dashboard. Click Test Agent, speak into your mic, and watch the whole pipeline fire in real time.
If everything is wired up correctly, you'll hear your OpenClaw agent talking back to you within a second or two. That moment when it first responds out loud is genuinely kind of wild.
Step 7: Connecting Twilio to Your ElevenLabs Agent
Once your voice agent is working inside ElevenLabs, the next step is making it reachable through a real phone number. This is where Twilio comes in.
Twilio handles incoming phone calls and routes the audio to ElevenLabs, which then processes the conversation and sends spoken responses back to the caller.
Step 1: Create a Twilio Account
First, you need a Twilio account.
- Go to twilio.com
- Create an account
- Purchase a Twilio phone number
This phone number will be the number people call to reach your AI agent.
Inside the Twilio dashboard you’ll find two important credentials:
- Account SID
- Auth Token
You’ll need both of these, along with your phone number, when connecting Twilio to ElevenLabs.
Step 2: Import Your Twilio Number into ElevenLabs
Next, switch over to the ElevenLabs dashboard.
- Open Telephony
- Go to Phone Numbers
- Click Import Number
- Choose Import from Twilio
You’ll then be asked to enter:
- Phone Number
- Twilio Account SID
- Twilio Auth Token
Step 3: Assign the Phone Number to Your Agent
After importing the number, you’ll see it appear in the Phone Numbers section.
The next step is to assign it to your voice agent.
- Click the imported phone number
- Find the Agent Assignment dropdown
- Select your agent
Once this is done, all incoming calls to that number will automatically be routed to your agent.
Step 4: Test the Integration
Now the fun part.
Pick up your phone and call the Twilio number you just connected.
When the call connects:
- Twilio receives the call
- The call is routed to your ElevenLabs agent
- ElevenLabs converts your speech to text
- The request is sent to OpenClaw
- The response is converted to speech and played back to the caller
The result is a fully functional AI phone assistant.
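The hand-offs above can be sketched as a simple pipeline. The real speech-to-text, agent, and text-to-speech stages live inside ElevenLabs and OpenClaw; these stub functions are placeholders that just show how each stage feeds the next:

```python
# Stub sketch of the call pipeline: each function stands in for a real service.

def speech_to_text(audio: bytes) -> str:
    return audio.decode()  # stand-in for ElevenLabs STT

def ask_openclaw(text: str) -> str:
    return f"OpenClaw heard: {text}"  # stand-in for the /v1/chat/completions call

def text_to_speech(text: str) -> bytes:
    return text.encode()  # stand-in for ElevenLabs TTS

def handle_call(audio: bytes) -> bytes:
    # Twilio -> ElevenLabs STT -> OpenClaw -> ElevenLabs TTS -> caller
    return text_to_speech(ask_openclaw(speech_to_text(audio)))
```

Every component in the chain is swappable: a different telephony provider, a different voice stack, or a different agent backend, as long as the interfaces between stages stay the same.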
Final Thoughts
It's pretty wild to hear an AI talking to you in your own voice. One thing I wish was better is the latency when having a real-time conversation with the agent. When conversing with ChatGPT's voice agent, responses feel almost instantaneous, but with this setup, they can take 5+ seconds. Nonetheless, this was a pretty fun process with some interesting use cases.