
Real-Life Jarvis
Most AI assistants live in a terminal or a chat window. You type, it responds, and that's about it. But what if you could just... talk to it? Like, pick up the phone and have a real conversation with your agent?
That’s exactly what I set up after installing OpenClaw on my Mac Mini and connecting it to an ElevenLabs voice agent and Twilio. Once you see it working, it’s hard to go back to interacting with AI through a terminal or chat window. In this post, I’ll walk through how the system works and how you can set it up yourself.
The Idea
The core concept is pretty simple. OpenClaw handles the thinking side of things: reasoning, memory, tool use, and autonomous workflows. ElevenLabs handles everything voice: speech-to-text, text-to-speech, conversation management, and even telephony.
The two systems talk to each other over an OpenAI-style chat completions API, so the integration is clean and doesn't require anything custom on either end.
A simple way to think about it is ElevenLabs = ears and mouth, while OpenClaw = the brain.
Step 1: Run OpenClaw Locally
To start, you need OpenClaw up and running locally with an OpenAI-compatible endpoint exposed at:
http://localhost:3000/v1/chat/completions
ElevenLabs uses the standard Chat Completions API format when it sends requests, so your server needs to speak that language. A basic request looks like this:
{
  "model": "openclaw-agent",
  "messages": [
    {
      "role": "user",
      "content": "What tasks are running?"
    }
  ]
}
OpenClaw processes that and returns a response, which ElevenLabs will convert to speech.
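Before wiring up ElevenLabs, it's worth sanity-checking that your local endpoint speaks this format. Here's a small Python sketch that sends the same request shape shown above; the `openclaw-agent` model name comes from the example, and the response parsing assumes a standard Chat Completions reply:

```python
import json
import urllib.request

OPENCLAW_URL = "http://localhost:3000/v1/chat/completions"

def build_chat_request(user_text: str) -> dict:
    # Same Chat Completions shape ElevenLabs will send.
    return {
        "model": "openclaw-agent",
        "messages": [{"role": "user", "content": user_text}],
    }

def ask_openclaw(user_text: str) -> str:
    payload = json.dumps(build_chat_request(user_text)).encode()
    req = urllib.request.Request(
        OPENCLAW_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard Chat Completions responses nest the reply here.
    return body["choices"][0]["message"]["content"]
```

If `ask_openclaw("What tasks are running?")` prints a sensible answer, the endpoint is ready for ElevenLabs.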
Step 2: Expose Your Local Server with ngrok
ElevenLabs runs in the cloud, so it has no way to reach your localhost directly. The fix is ngrok, which tunnels your local port to a public URL.
Run this in your terminal:
ngrok http 3000
You'll get back something like:
https://7a82-123-45-67.ngrok-free.app
That turns your local endpoint into:
https://7a82-123-45-67.ngrok-free.app/v1/chat/completions
Now ElevenLabs can reach your agent from anywhere.
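One easy mistake here is pasting the ngrok URL with a trailing slash and ending up with a `//v1/...` path. A tiny helper (just an illustration, not part of OpenClaw or ngrok) makes the join foolproof:

```python
def chat_endpoint(ngrok_base: str) -> str:
    # Join the ngrok base URL with the Chat Completions path,
    # stripping any trailing slash from copy-paste.
    return ngrok_base.rstrip("/") + "/v1/chat/completions"
```

Either form of the base URL now produces the same endpoint.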
Step 3: Create an ElevenLabs Voice Agent
Head over to the ElevenLabs dashboard and set up a new agent:
- Go to Agents
- Click Create Agent
- Choose Blank Agent
- Fill in the basics: name, system prompt / personality, voice, and language
This agent becomes the voice interface for OpenClaw. Everything it hears gets forwarded to your local server.
Step 4 (Optional): Cloning Your Voice in ElevenLabs
One of the most interesting parts of building a voice agent is making it sound like you. ElevenLabs makes this possible through voice cloning. Instead of your AI assistant speaking in a generic voice, it can speak in a voice that closely matches your own. In my case, I trained the voice using about an hour of audio pulled from videos on my YouTube channel. That gave the model enough data to capture my tone, pacing, and speaking style.
This step is completely optional, and I understand why cloning your own voice might turn some people away.
Step 5: Connecting the ElevenLabs Agent to OpenClaw
Now it's time to point the agent to your OpenClaw instance.
Inside the ElevenLabs dashboard:
- Open your Agent
- Go to LLM Configuration
- Select Custom LLM
- Set the endpoint to your ngrok URL
Example:
https://YOUR-NGROK-URL/v1/chat/completions
This tells ElevenLabs where to send every message from the conversation.
Adding the OpenClaw Auth Token
One detail that’s easy to miss is authentication.
If your OpenClaw instance is protected, ElevenLabs must include your OpenClaw auth token in the request headers when it calls the API.
Inside the Custom LLM configuration, add a header like this:
Authorization: Bearer YOUR_OPENCLAW_TOKEN
This allows ElevenLabs to successfully call the OpenClaw /v1/chat/completions endpoint.
Without this header, ElevenLabs may reach your ngrok URL but OpenClaw will reject the request because it isn’t authenticated.
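An easy way to debug this is to replay the exact request ElevenLabs will make, including the auth header, from your own machine. This Python sketch assumes the token lives in an environment variable (`OPENCLAW_TOKEN` is a placeholder name; store it however you prefer):

```python
import json
import os
import urllib.request

def openclaw_headers() -> dict:
    # OPENCLAW_TOKEN is a placeholder env var name for this example.
    token = os.environ.get("OPENCLAW_TOKEN", "YOUR_OPENCLAW_TOKEN")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }

def replay_request(endpoint: str, user_text: str) -> dict:
    # Send the same authenticated request ElevenLabs will send.
    payload = json.dumps({
        "model": "openclaw-agent",
        "messages": [{"role": "user", "content": user_text}],
    }).encode()
    req = urllib.request.Request(endpoint, data=payload, headers=openclaw_headers())
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

If this succeeds against your ngrok URL but ElevenLabs still fails, the problem is in the dashboard header configuration rather than OpenClaw.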
Step 6: Test It Out
ElevenLabs has a built-in test interface inside the dashboard. Click Test Agent, speak into your mic, and watch the whole pipeline fire in real time.
If everything is wired up correctly, you'll hear your OpenClaw agent talking back to you within a second or two. That moment when it first responds out loud is genuinely kind of wild.
Step 7: Connecting Twilio to Your ElevenLabs Agent
Once your voice agent is working inside ElevenLabs, the next step is making it reachable through a real phone number. This is where Twilio comes in.
Twilio handles incoming phone calls and routes the audio to ElevenLabs, which then processes the conversation and sends spoken responses back to the caller.
Step 1: Create a Twilio Account
First, you need a Twilio account.
- Go to twilio.com
- Create an account
- Purchase a Twilio phone number
This phone number will be the number people call to reach your AI agent.
Inside the Twilio dashboard you’ll find two important credentials:
- Account SID
- Auth Token
You’ll need both of these, along with your phone number, when connecting Twilio to ElevenLabs.
Step 2: Import Your Twilio Number into ElevenLabs
Next, switch over to the ElevenLabs dashboard.
- Open Telephony
- Go to Phone Numbers
- Click Import Number
- Choose Import from Twilio
You’ll then be asked to enter:
- Phone Number
- Twilio Account SID
- Twilio Auth Token
Step 3: Assign the Phone Number to Your Agent
After importing the number, you’ll see it appear in the Phone Numbers section.
The next step is to assign it to your voice agent.
- Click the imported phone number
- Find the Agent Assignment dropdown
- Select your agent
Once this is done, all incoming calls to that number will automatically be routed to your agent.
Step 4: Test the Integration
Now the fun part.
Pick up your phone and call the Twilio number you just connected.
When the call connects:
- Twilio receives the call
- The call is routed to your ElevenLabs agent
- ElevenLabs converts your speech to text
- The request is sent to OpenClaw
- The response is converted to speech and played back to the caller
The result is a fully functional AI phone assistant.
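The hand-offs above can be sketched as a simple pipeline. The real speech-to-text, agent, and text-to-speech stages live inside ElevenLabs and OpenClaw; these stub functions are placeholders that just show how each stage feeds the next:

```python
# Stub sketch of the call pipeline: each function stands in for a real service.

def speech_to_text(audio: bytes) -> str:
    return audio.decode()  # stand-in for ElevenLabs STT

def ask_openclaw(text: str) -> str:
    return f"OpenClaw heard: {text}"  # stand-in for the /v1/chat/completions call

def text_to_speech(text: str) -> bytes:
    return text.encode()  # stand-in for ElevenLabs TTS

def handle_call(audio: bytes) -> bytes:
    # Twilio -> ElevenLabs STT -> OpenClaw -> ElevenLabs TTS -> caller
    return text_to_speech(ask_openclaw(speech_to_text(audio)))
```

Every component in the chain is swappable: a different telephony provider, a different voice stack, or a different agent backend, as long as the interfaces between stages stay the same.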
Final Thoughts
It's pretty wild to hear an AI talking to you in your own voice. One thing I wish was better is the latency when having a real-time conversation with the agent. When conversing with ChatGPT's voice agent, responses feel almost instantaneous, but with this setup, they can take 5+ seconds. Nonetheless, this was a pretty fun process with some interesting use cases.