Intro

I recently wrote about gptel, an Emacs package for interacting with LLMs via cloud APIs. It’s brilliant, but every query costs money and requires an internet connection. For quick tasks like code explanations, refactoring suggestions, or drafting text, those API calls add up.

Enter Ollama and Ellama. Ollama lets you run LLMs locally on your machine, and Ellama provides a clean Emacs interface to interact with them. No API keys, no costs, no internet required. If you’re working on sensitive data or just want to experiment without watching your API spend, this is worth exploring.

Quick Start Guide

Install Ollama

First, get Ollama running on your system. On Linux:

curl -fsSL https://ollama.ai/install.sh | sh

Once installed, pull a model. I started with llama3.2, but you can browse available models at ollama.ai/library:

ollama pull llama3.2

Test that it works:

ollama run llama3.2

If you get a prompt, you’re good. Exit with /bye.
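
Ollama also runs an HTTP server on localhost:11434, and that is what Ellama will talk to. If you want to double-check it from inside Emacs before going further, here’s a quick sanity check, assuming the default port; Ollama’s /api/tags endpoint lists the models you’ve pulled:

(with-current-buffer
    (url-retrieve-synchronously "http://localhost:11434/api/tags")
  ;; Skip the HTTP response headers and return the JSON body; you should
  ;; see "llama3.2" somewhere in the list of models.
  (goto-char (point-min))
  (re-search-forward "\r?\n\r?\n")
  (buffer-substring (point) (point-max)))

Evaluate it in *scratch*; if you get JSON back, the server side is ready.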

Install Ellama in Emacs

Ellama is available on MELPA. Add this to your Emacs config:

(use-package ellama
  :ensure t
  :init
  (require 'llm-ollama)
  (setq ellama-provider
        (make-llm-ollama
         :chat-model "llama3.2"
         :embedding-model "llama3.2"))
  :bind ("C-c e" . ellama-transient-main-menu))

The key bits:

  • llm-ollama comes from the llm package (installed as an Ellama dependency) and is the backend that connects to your local Ollama instance
  • chat-model and embedding-model should match the model you pulled
  • The keybinding C-c e opens the Ellama transient menu (adjust to your preference)

Restart Emacs or evaluate the config.
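
One optional extra: if you end up pulling more than one model (a small one for quick questions and a bigger one for code, say), Ellama can hold several named providers and let you switch between them with M-x ellama-provider-select. A minimal sketch to sit alongside the config above; the second model name is purely illustrative, so substitute whatever you’ve actually pulled:

(setq ellama-providers
      `(("llama3.2" . ,(make-llm-ollama
                        :chat-model "llama3.2"
                        :embedding-model "llama3.2"))
        ;; "qwen2.5-coder" is just an example; any model from
        ;; `ollama pull' works here.
        ("qwen2.5-coder" . ,(make-llm-ollama
                             :chat-model "qwen2.5-coder"
                             :embedding-model "qwen2.5-coder"))))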

Basic Usage

Open the Ellama menu with C-c e (or M-x ellama-transient-main-menu). You’ll see options like:

  • Chat: Start a conversation in a dedicated buffer
  • Ask about: Query the LLM about selected text
  • Code review: Get feedback on code in the region
  • Improve writing: Refine prose
  • Summarize: Condense text

For example, select some code, hit C-c e, choose “Ask about”, and type your question. The response appears in a new buffer, and you can iterate from there.
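
The transient menu is just a front end for ordinary commands, so if you use a handful of them constantly you can bind those commands directly. A small sketch, using a C-c l prefix so it doesn’t clash with the C-c e menu binding above; the command names here (ellama-ask-about, ellama-code-review, ellama-summarize) match current Ellama releases, but confirm with M-x if yours differs:

(with-eval-after-load 'ellama
  ;; Direct bindings for the commands I reach for most; adjust keys to taste.
  (global-set-key (kbd "C-c l a") #'ellama-ask-about)   ; ask about the region
  (global-set-key (kbd "C-c l r") #'ellama-code-review) ; review code in the region
  (global-set-key (kbd "C-c l s") #'ellama-summarize))  ; condense selected text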

Thoughts on Workflows

After a few days with this setup, here’s what I’ve found:

Speed: Local inference is fast enough for most tasks, especially on decent hardware. Not instant, but perfectly usable. Your mileage will vary depending on your CPU/GPU and the model size.

Quality: Smaller models like llama3.2 are good for straightforward tasks - code explanations, simple refactoring, quick drafts. For complex reasoning or long-form generation, you’ll notice the gap compared to larger cloud models. But for 80% of my queries, it’s more than adequate.

Privacy: Working on proprietary code or sensitive data? This is a big win. Nothing leaves your machine.

Cost: Zero. Run as many queries as you want.

Offline: Works without internet. Useful on planes, trains, or when the network is flaky.

Integration: Ellama integrates nicely with Emacs workflows. Select text, ask a question, get an answer. No context switching, no copy-paste to a web interface.

The biggest limitation is model quality. Local models are improving rapidly, but they’re not GPT-4 or Claude Opus. For deep technical work or complex writing, I still reach for gptel and a cloud API. But for quick tasks where “good enough” is actually good enough, Ellama has become my default.

Final Thoughts

If you’re already living in Emacs and want to experiment with LLMs without the cost or privacy concerns of cloud APIs, Ollama and Ellama are worth setting up. It took me maybe 15 minutes to get everything running, and it’s already proven useful for daily tasks.

Start with a smaller model, see if it fits your workflow, and scale up from there. And if you’re working on anything sensitive, the privacy alone might be reason enough to give this a shot.