April 24, 2026 · 6 min read

Use Ollama Over SSH With a Real Agent Loop

Every "Ollama over SSH" guide ends at the port-forward. That's the easy part. The interesting part is tying the tunnel to an agent that can actually run tool calls on the remote.

The standard recipe (recap)

You've got a beefy home box running Ollama. You want to use it from a laptop. Classic port-forward:

ssh -L 11434:localhost:11434 you@homebox

Now localhost:11434 on your laptop talks to Ollama on the server. You could use it with any OpenAI-compatible client. That's where most posts stop.
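If you run the tunnel often, the forward can live in your SSH config instead of a one-liner you have to remember. A sketch (the hostname and user are placeholders):

```
# ~/.ssh/config — persistent version of the one-liner above
Host homebox
    HostName homebox.example.com   # placeholder; use your server's address
    User you
    LocalForward 11434 localhost:11434
```

After that, a plain `ssh homebox` brings the forward up with it.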

What the recipe is missing

If you're using this for coding, chat isn't enough: you want tool calls. The model should be able to:

- run shell commands
- read and write files
- search the web

And crucially, those tool calls should happen on the remote box, the one with your code, not on your laptop. A port-forward alone only moves the inference traffic.
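To see why, look at what actually comes back over the wire: the API returns a tool call as JSON, and nothing runs until your client executes it, on whichever box the client chooses. A minimal sketch (the JSON shape and hostname are illustrative, not exact Ollama output):

```shell
# The model only *returns* a tool call as JSON; nothing runs until the
# client executes it, and the client picks the box it runs on.
# The shape below is illustrative, not exact Ollama output.
tool_call='{"function":{"name":"execute_command","arguments":{"command":"ls src"}}}'

# Pull the command field out (a real client would use a JSON parser):
cmd=$(printf '%s' "$tool_call" | sed -E 's/.*"command":"([^"]*)".*/\1/')

# Executing locally runs on the laptop, the wrong box for remote code:
echo "local:  bash -c '$cmd'"
# Executing over SSH lands where the code lives:
echo "remote: ssh you@homebox '$cmd'"
```

The forward gets the JSON to your laptop; something still has to route the execution back to the server.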

Setup that gets you the full loop

Two options, depending on where you want the UI.

Option A: run Tron on the remote, open it in a browser

Simplest. Install Tron on the same box as Ollama. Tron's agent loop calls Ollama at localhost:11434, and its tool calls run on that same box. You open http://homebox:3888 (or a Tailscale URL, or a Cloudflare tunnel) from your laptop.

# On the server
git clone https://github.com/Shadowhusky/Tron.git
cd Tron
npm install
npm run build:web
npm run start:web

In Settings > AI: provider Ollama, base URL http://localhost:11434, model qwen2.5-coder:7b (or whatever you've pulled). The agent runs. Shell, files, web search — all on the server.

Option B: run Tron locally, SSH to the remote via Tron's SSH adapter

Install Tron on your laptop. Add an SSH profile for your server. When you start a session, Tron opens an SSH shell, and, crucially, its file operations and its execute_command tool fall back to shell commands executed over that same SSH session. So the agent's tool calls still land on the remote box.
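The fallback is easy to picture: a file op gets rewritten into a shell command addressed to the remote host. A dry-run sketch (the function name and host are made up, not Tron's API):

```shell
# Dry-run sketch: a "read file" tool op becomes a shell command sent to
# the remote host. Here we print the command instead of running it.
remote_read() {
  local host="$1" path="$2"
  echo ssh "$host" "cat $path"
}
remote_read you@homebox /srv/project/README.md
```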

For the LLM side, port-forward Ollama and point Tron at localhost:11434:

ssh -L 11434:localhost:11434 you@homebox

Now inference is fast once traffic reaches the server (Ollama sits on loopback there), and the agent runs tools on the SSH'd host. Two SSH connections, one box.
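If keeping a foreground tunnel open is a nuisance, it can be backgrounded and probed. A sketch (the hostname is a placeholder; the ssh line is shown as a comment since it needs your real server):

```shell
# Background the tunnel: -f backgrounds after auth, -N skips the remote
# shell, since all we want is the forward.
#   ssh -fN -L 11434:localhost:11434 you@homebox
# Then probe the forwarded port with bash's built-in /dev/tcp:
if (exec 3<>/dev/tcp/localhost/11434) 2>/dev/null; then
  status="tunnel up"
else
  status="tunnel down"
fi
echo "$status"
```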

Picking a model for tool calling over slow links

A few notes from running this in practice:

Latency realities

If Ollama is on your home box over residential upstream, expect ~1–3s first-token latency per tool call. Tron's auto-compaction (old tool outputs get summarised at 90% context) helps keep round-trips cheap. Running Ollama in the same region as your dev server, or on the same machine, makes a large difference.
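The compaction trigger is just a threshold check, which is worth picturing when you're budgeting round-trips. A toy sketch with illustrative numbers (not Tron's actual internals):

```shell
# Toy model of the 90%-context compaction trigger described above.
# Numbers are illustrative; this is not Tron's implementation.
context_window=32768
used_tokens=30000
threshold=$(( context_window * 90 / 100 ))   # 90% of the window
if [ "$used_tokens" -ge "$threshold" ]; then
  decision="compact"   # summarise old tool outputs
else
  decision="keep"      # still have headroom
fi
echo "$decision: $(( context_window - used_tokens )) tokens left of $context_window"
```

Every token summarised away is a token that never has to cross your upstream again on the next turn.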

Why bother instead of just using the cloud?

The usual answers apply: your prompts and code never leave your hardware, no rate limits, no per-token cost, offline works. Less obvious: when the agent is executing tool calls on the same box as the model, you cut out the round-trip over your home's upstream for every action the agent takes. The bottleneck becomes raw GPU throughput, not your ISP.

Try the loop

Install, point at Ollama, watch it execute.