Use Ollama Over SSH With a Real Agent Loop
Every "Ollama over SSH" guide ends at the port-forward. That's the easy part. The interesting part is tying the tunnel to an agent that can actually run tool calls on the remote.
The standard recipe (recap)
You've got a beefy home box running Ollama. You want to use it from a laptop. Classic port-forward:
ssh -L 11434:localhost:11434 you@homebox
Now localhost:11434 on your laptop talks to Ollama on the server. You can use it with any OpenAI-compatible client.
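To sanity-check the tunnel, hit Ollama's OpenAI-compatible endpoint on the forwarded port. A minimal stdlib sketch (model name and message are placeholders; swap in whatever you've pulled):

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:11434"  # the tunneled port from the ssh -L above

def build_chat_request(model: str, messages: list[dict]) -> Request:
    """Build a request against Ollama's OpenAI-compatible chat endpoint."""
    body = json.dumps({"model": model, "messages": messages, "stream": False}).encode()
    return Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# With the tunnel up, this round-trips through the laptop's loopback:
# req = build_chat_request("qwen2.5-coder:7b", [{"role": "user", "content": "hi"}])
# reply = json.load(urlopen(req))
# print(reply["choices"][0]["message"]["content"])
```

If this answers, the forward is working and any OpenAI-compatible client pointed at localhost:11434 will too.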
That's where most posts stop.
What the recipe is missing
If you're using this for coding, chat isn't enough — you want tool calls. The model should be able to:
- Run a shell command and read stdout.
- Read and edit files.
- Grep a directory.
- Hit a web page and summarise it.
And crucially, those tool calls should happen on the remote box — the one with your code — not on your laptop. A port-forward alone only moves the inference traffic.
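To make the distinction concrete, here's a sketch of the dispatch step an agent loop performs: the model emits a structured tool call, and the executor runs it on the remote over SSH rather than on the laptop. This is an illustration, not Tron's actual internals — `run_remote`, the host name, and the single `execute_command` tool are all hypothetical:

```python
import json
import subprocess

def run_remote(cmd: str, host: str = "homebox") -> str:
    """Run a shell command on the remote box over SSH.
    (Hypothetical helper; Tron wires its executor differently.)"""
    out = subprocess.run(["ssh", host, cmd], capture_output=True, text=True, timeout=60)
    return out.stdout + out.stderr

def dispatch(tool_call: dict, runner=run_remote) -> str:
    """Route a model-emitted tool call to the remote executor."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "execute_command":
        return runner(args["command"])
    raise ValueError(f"unknown tool: {name}")
```

The point of the pluggable `runner` is exactly the article's point: swap `run_remote` for a local executor and the same agent suddenly operates on the wrong machine.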
Setup that gets you the full loop
Two options, depending on where you want the UI.
Option A: run Tron on the remote, open it in a browser
Simplest. Install Tron on the same box as Ollama. Tron's agent loop calls Ollama at localhost:11434, and its tool calls run on that same box. You open http://homebox:3888 (or a Tailscale URL, or a Cloudflare tunnel) from your laptop.
# On the server
git clone https://github.com/Shadowhusky/Tron.git
cd Tron
npm install
npm run build:web
npm run start:web
In Settings > AI: provider Ollama, base URL http://localhost:11434, model qwen2.5-coder:7b (or whatever you've pulled). The agent runs. Shell, files, web search — all on the server.
Option B: run Tron locally, SSH to the remote via Tron's SSH adapter
Install Tron on your laptop. Add an SSH profile for your server.
When you start a session, Tron opens an SSH shell — but crucially, its file ops and the execute_command tool automatically fall back to shell commands executed over the same SSH session. So the agent's tool calls still land on the remote box.
For the LLM side, port-forward Ollama and point Tron at localhost:11434:
ssh -L 11434:localhost:11434 you@homebox
Now inference is fast (local loopback once it reaches the server), and the agent runs tools on the SSH'd host. Two channels: the port-forward carries inference traffic, and Tron's SSH session carries the tool calls.
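If you'd rather not manage two separate SSH connections, OpenSSH's ControlMaster can multiplex the port-forward and Tron's interactive session over one TCP connection. A sketch for ~/.ssh/config — the hostname and user are placeholders:

```
Host homebox
    HostName homebox.example.com
    User you
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
    LocalForward 11434 localhost:11434
```

With this, the first `ssh homebox` brings up the forward and becomes the master; later sessions (including Tron's) piggyback on it instead of opening new connections.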
Picking a model for tool calling over slow links
A few notes from running this in practice:
- qwen2.5-coder:7b — the smallest model I'd trust for tool-call loops. Decent at picking the right tool, occasionally wanders.
- qwen2.5-coder:14b / 32b — meaningfully better at multi-step debugging. Worth it if you have the VRAM.
- llama3.1:8b — fine for "run this one command and explain the output" but weaker at deciding what command to run next.
- Avoid models without structured-tool training. They'll produce text that looks like a plan instead of actual tool calls and the agent will stall.
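You can probe that last failure mode directly: when called with a `tools` list, Ollama's /api/chat returns structured calls under `message.tool_calls`, while an untrained model answers in prose. A small check (response shapes below follow Ollama's chat API; the helper itself is a sketch):

```python
def made_tool_call(response: dict) -> bool:
    """True if the model emitted a structured tool call rather than prose.

    `response` is the JSON body from Ollama's /api/chat when called
    with a `tools` list."""
    message = response.get("message", {})
    return bool(message.get("tool_calls"))

# A tool-trained model returns something like:
structured = {"message": {"role": "assistant", "tool_calls": [
    {"function": {"name": "execute_command",
                  "arguments": {"command": "ls"}}}]}}

# A model without tool training "plans" in prose — the stall case:
prose = {"message": {"role": "assistant",
                     "content": "First, I would run ls to list the files..."}}
```

An agent loop can use this check to bail out with an error instead of spinning on text that never becomes an action.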
Latency realities
If Ollama is on your home box over residential upstream, expect ~1–3s first-token latency per tool call. Tron's auto-compaction (old tool outputs get summarised at 90% context) helps keep round-trips cheap. Running Ollama in the same region as your dev server, or on the same machine, makes a large difference.
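The compaction idea is simple even if Tron's implementation isn't shown here. A minimal sketch, assuming a crude four-characters-per-token estimate and a made-up context size — real token counting and summarisation would be model-specific:

```python
CONTEXT_TOKENS = 8192   # assumed context window
COMPACT_AT = 0.9        # compact when the transcript hits 90% of context

def estimate_tokens(messages: list[dict]) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return sum(len(m.get("content", "")) for m in messages) // 4

def compact(messages: list[dict], keep_last: int = 4) -> list[dict]:
    """Replace old tool outputs with one-line stubs once near the limit."""
    if estimate_tokens(messages) < CONTEXT_TOKENS * COMPACT_AT:
        return messages
    compacted = []
    for i, m in enumerate(messages):
        is_old = i < len(messages) - keep_last
        if is_old and m.get("role") == "tool":
            stub = m["content"][:80].replace("\n", " ")
            compacted.append({"role": "tool", "content": f"[compacted] {stub}..."})
        else:
            compacted.append(m)
    return compacted
```

The payoff over a slow link: once old tool outputs shrink to stubs, every subsequent request ships a much smaller prompt upstream, so the per-tool-call round-trip stays cheap.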
Why bother instead of just using the cloud?
The usual answers apply: your prompts and code never leave your hardware, no rate limits, no per-token cost, offline works. Less obvious: when the agent is executing tool calls on the same box as the model, you cut out the round-trip over your home's upstream for every action the agent takes. The bottleneck becomes raw GPU throughput, not your ISP.
Try the loop
Install, point at Ollama, watch it execute.