GNU ELPA - llm

llm

Description: Interface to pluggable llm backends
Latest: llm-0.31.3.tar (.sig), 2026-Jul-23, 530 KiB
Maintainer: Andrew Hyatt <ahyatt@gmail.com>
Website: https://github.com/ahyatt/llm
Browse ELPA's repository: CGit or Gitweb
All Dependencies: plz (.tar), plz-event-source (.tar), plz-media-type (.tar), compat (.tar)

To install this package from Emacs, use package-install or list-packages.

Full description

1. Introduction

This library provides an interface for interacting with Large Language Models (LLMs). It allows elisp code to use LLMs while also giving end-users the choice to select their preferred LLM. This is particularly beneficial when working with LLMs since various high-quality models exist, some of which have paid API access, while others are locally installed and free but offer medium quality. Applications using LLMs can utilize this library to ensure compatibility regardless of whether the user has a local LLM or is paying for API access.

This library abstracts several kinds of features:

Chat functionality: the ability to query the LLM and get a response, and continue to take turns writing to the LLM and receiving responses. The library supports synchronous, asynchronous, and streaming responses.
Chat with images and other kinds of media input is also supported, so that the user can input images and discuss them with the LLM.
Tool use is supported, for having the LLM call elisp functions that it chooses, with arguments it provides.
Embeddings: Send text and receive a vector that encodes the semantic meaning of the underlying text. Can be used in a search system to find similar passages.
Prompt construction: Create a prompt to give to an LLM from one or more sources of data.

Certain functionalities might not be available in some LLMs. Any such unsupported functionality will raise a 'not-implemented signal, or it may fail in some other way. Clients are recommended to check llm-capabilities when trying to do something beyond basic text chat.
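
As a minimal sketch of such a check, a client might attach tools only when the provider advertises the tool-use capability. The helper name my-app-make-prompt is hypothetical; llm-capabilities, llm-make-chat-prompt and the tool-use symbol are the ones described later in this document.

```elisp
(require 'llm)

(defun my-app-make-prompt (provider text tools)
  "Return a chat prompt for TEXT, attaching TOOLS only if PROVIDER supports them.
`my-app-make-prompt' is a hypothetical helper, not part of the llm package."
  (if (memq 'tool-use (llm-capabilities provider))
      (llm-make-chat-prompt text :tools tools)
    ;; Fall back to plain text chat when tool use is unavailable.
    (llm-make-chat-prompt text)))
```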

2. Packages using `llm`

There are a few packages using llm (please inform us or open a PR to add anything here):

ellama, a package providing a host of useful ways to use LLMs to chat and transform text.
magit-gptcommit, a package providing autogenerated commit messages for use with magit.
ekg, a sqlite-backed notetaking application that optionally interfaces with LLMs for note similarity and text generation in response to notes.

3. Setting up providers

Users of an application that uses this package should not need to install it themselves. The llm package should be installed as a dependency when you install the package that uses it. However, you do need to require the llm module and set up the provider you will be using. Typically, applications will have a variable you can set. For example, let's say there's a package called "llm-refactoring", which has a variable llm-refactoring-provider. You would set it up like so:

(use-package llm-refactoring
  :init
  (require 'llm-openai)
  (setq llm-refactoring-provider (make-llm-openai :key my-openai-key)))

Here my-openai-key would be a variable you set up before with your OpenAI key. Or, just substitute the key itself as a string. It's important to remember never to check your key into a public repository such as GitHub, because your key must be kept private. Anyone with your key can use the API, and you will be charged.

You can also use a function as a key, so you can store your key in a secure place and retrieve it via a function. For example, you could add a line to ~/.authinfo.gpg:

machine llm.openai password <key>

And then set up your provider like:

(setq llm-refactoring-provider (make-llm-openai :key (plist-get (car (auth-source-search :host "llm.openai")) :secret)))
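
If you would rather defer the lookup until the key is actually needed, one option is to pass a function explicitly. This is only a sketch: it assumes the :key function is called with no arguments. auth-source-pick-first-password comes from Emacs's auth-source library, and llm-refactoring-provider is the hypothetical variable from the example above.

```elisp
(require 'auth-source)
(require 'llm-openai)

(setq llm-refactoring-provider
      (make-llm-openai
       ;; The lambda delays reading ~/.authinfo.gpg until the key is used.
       ;; Assumption: the provider calls this function with no arguments.
       :key (lambda ()
              (auth-source-pick-first-password :host "llm.openai"))))
```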

All of the providers (except for llm-fake) can also take default parameters that will be used if they are not specified in the prompt. These are the same parameters as appear in the prompt, but prefixed with default-chat-. So, for example, if you find that you like Ollama to be less creative than the default, you can create your provider like:

(make-llm-ollama :embedding-model "mistral:latest" :chat-model "mistral:latest" :default-chat-temperature 0.1)

For embedding users: if you store the embeddings, you must set the embedding model. Even though there's no way for the llm package to tell whether you are storing them, if the default model changes, you may find yourself storing incompatible embeddings.

3.1. Open AI

You can set up with make-llm-openai, with the following parameters:

:key, the Open AI key that you get when you sign up to use Open AI's APIs. Remember to keep this private. This is non-optional.
:chat-model: A model name from the list of Open AI's model names. Keep in mind some of these are not available to everyone. This is optional, and will default to a reasonable model.
:embedding-model: A model name from list of Open AI's embedding model names. This is optional, and will default to a reasonable model.

3.2. Open AI Compatible

There are many Open AI compatible APIs and proxies of Open AI. You can set up one with make-llm-openai-compatible, with the following parameters:

:url, the URL leading up to the command ("embeddings" or "chat/completions"). So, for example, "https://api.openai.com/v1/" is the URL to use Open AI (although if you wanted to do that, just use make-llm-openai instead).
:chat-model: The chat model that is supported by the provider. Some providers don't need a model to be set, but still require it in the API, so we default to "unset".
:embedding-model: An embedding model name that is supported by the provider. This is also defaulted to "unset".

3.3. OpenRouter

OpenRouter is a popular router for different models with an Open AI compatible API. This is defined in the llm-openai module. Configure it with the following parameters:

:chat-model: The chat model, or list of chat models, prefixed by provider. If this is a list, it will enable OpenRouter's ability to do fallbacks.
:embedding-model: An embedding model name that is supported by the provider. This is also defaulted to "unset".

3.4. Azure's Open AI

Microsoft Azure has an Open AI integration, although it doesn't support everything Open AI does, such as tool use. You can set it up with make-llm-azure, with the following parameters:

:url, the endpoint URL, such as "https://docs-test-001.openai.azure.com/".
:key, the Azure key for Azure OpenAI service.
:chat-model, the chat model, which must be deployed in Azure.
:embedding-model, the embedding model, which must be deployed in Azure.

3.5. GitHub Models

GitHub now has its own platform for interacting with AI models. For a list of models check the marketplace. You can set it up with make-llm-github, with the following parameters:

:key, a GitHub token or an Azure AI production key.
:chat-model, the chat model, which can be any of the ones you have access to (currently o1 is restricted).
:embedding-model, the embedding model, which is easier to find through a filter.

3.6. Gemini (not via Google Cloud)

This is Google's AI model. You can get an API key via their page on Google AI Studio. Set this up with make-llm-gemini, with the following parameters:

:key, the Google AI key that you get from Google AI Studio.
:chat-model, the model name, from the list of models. This is optional and will default to the text Gemini model.
:embedding-model: the model name, which is optional, and will default to a reasonable embedding model.

3.7. Vertex (Gemini via Google Cloud)

This is mostly for those who want to use Google Cloud specifically; most users should use Gemini instead, which is easier to set up.

You can set up with make-llm-vertex, with the following parameters:

:project: Your project number from Google Cloud that has Vertex API enabled.
:chat-model: A model name from the list of Vertex's model names. This is optional, and will default to a reasonable model.
:embedding-model: A model name from the list of Vertex's embedding model names. This is optional, and will default to a reasonable model.

In addition to the provider, which you may want multiple of (for example, to charge against different projects), there are customizable variables:

llm-vertex-gcloud-binary: The binary to use for generating the API key.
llm-vertex-gcloud-region: The gcloud region to use. It's good to set this to a region near where you are for best latency. Defaults to "us-central1".

If you haven't already, you must run the following command before using this:
```
gcloud beta services identity create --service=aiplatform.googleapis.com --project=PROJECT_ID
```
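
Putting the pieces above together, a configuration might look like the following sketch. The feature name llm-vertex, the variable my-vertex-provider, the placeholder project number and the chosen region are assumptions made for illustration.

```elisp
;; Assumption: the Vertex provider lives in a module named `llm-vertex'.
(require 'llm-vertex)

;; Customizable variable described above; "europe-west1" is only an example.
(setq llm-vertex-gcloud-region "europe-west1")

(defvar my-vertex-provider
  ;; Replace the placeholder with your own Google Cloud project number.
  (make-llm-vertex :project "YOUR-PROJECT-NUMBER")
  "Hypothetical variable holding a Vertex provider.")
```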

3.8. Claude

Claude is Anthropic's large language model. It does not support embeddings. You can set it up with the following parameters:

:key: The API key you get from Claude's settings page. This is required.
:chat-model: One of the Claude models.

3.9. Ollama

Ollama is a way to run large language models locally. There are many different models you can use with it, and some of them support tool use. You set it up with the following parameters:

:scheme: The scheme (http/https) for the connection to ollama. This defaults to "http".
:host: The host that ollama is run on. This is optional and will default to localhost.
:port: The port that ollama is run on. This is optional and will default to the default ollama port.
:chat-model: The model name to use for chat. This is not optional for chat use, since there is no default.
:embedding-model: The model name to use for embeddings. Only some models can be used for embeddings. This is not optional for embedding use, since there is no default.

3.10. Ollama (authed)

This is a variant of the Ollama provider, which is set up with the same parameters plus:

:key: The authentication key of the provider.

The key is used to send a standard Authorization header.

3.11. Deepseek

DeepSeek is a company that offers high-quality reasoning and chat models. This provider connects to their server. It is also possible to run their model locally as a free model via Ollama. To use the service, you can set it up with the following parameters:

:key: The API key you get from the DeepSeek API key page. This is required.
:chat-model: One of the models from their model list.

3.12. GPT4All

GPT4All is a way to run large language models locally. To use it with the llm package, you must click "Enable API Server" in the settings. It does not offer embeddings or streaming functionality, though, so Ollama might be a better fit for users who are not already set up with local models. You can set it up with the following parameters:

:host: The host that GPT4All is run on. This is optional and will default to localhost.
:port: The port that GPT4All is run on. This is optional and will default to the default GPT4All port.
:chat-model: The model name to use for chat. This is not optional for chat use, since there is no default.

3.13. llama.cpp

llama.cpp is a way to run large language models locally. To use it with the llm package, you need to start the server (with the "--embedding" flag if you plan on using embeddings). The server must be started with a model, so it is not possible to switch models until the server is restarted to use the new model. As such, model is not a parameter to the provider, since the model choice is already set once the server starts.

There is a deprecated provider, but it is no longer needed: llama.cpp is Open AI compatible, so the Open AI Compatible provider should work.
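
As a sketch of that setup, the following assumes a llama.cpp server listening on localhost port 8080 and serving its Open AI compatible API under /v1/; adjust the URL to match how you started the server. It also assumes make-llm-openai-compatible is provided by the llm-openai module. The variable name is hypothetical.

```elisp
;; Assumption: `make-llm-openai-compatible' is defined in `llm-openai'.
(require 'llm-openai)

(defvar my-llama-cpp-provider
  ;; Assumed local address; change host and port to match your server.
  (make-llm-openai-compatible :url "http://localhost:8080/v1/")
  "Hypothetical variable holding a provider for a local llama.cpp server.")
```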

3.14. Fake

This is a client that makes no calls, and is just there for testing and debugging. Mostly this is of use to programmatic clients of the llm package, but end users can also use it to understand what will be sent to the LLMs. It has the following parameters:

:output-to-buffer: if non-nil, the buffer or buffer name to append the request sent to the LLM to.
:chat-action-func: a function that will be called to provide a string or symbol and message cons which are used to raise an error.
:embedding-action-func: a function that will be called to provide a vector or symbol and message cons which are used to raise an error.
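
For instance, a test might build a fake provider that logs requests and answers with a canned string. This is a sketch based on the parameter descriptions above; the feature name llm-fake, the variable name and the buffer name are assumptions.

```elisp
(require 'llm)
;; Assumption: the fake provider lives in a module named `llm-fake'.
(require 'llm-fake)

(defvar my-fake-provider
  (make-llm-fake
   ;; Requests are appended to this buffer for inspection.
   :output-to-buffer "*my-llm-requests*"
   ;; Providing a string here supplies the chat response.
   :chat-action-func (lambda () "A canned reply"))
  "Hypothetical fake provider for tests.")

;; No network call is made; the request is recorded in the buffer above.
(llm-chat my-fake-provider (llm-make-chat-prompt "Hello there"))
```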

4. Models

When picking a chat or embedding model, anything can be used, as long as the service thinks it is valid. However, models vary in context size and capabilities. The llm-prompt module, and any client, can depend on the context size of the model via llm-chat-token-limit. Similarly, some models have different capabilities, exposed in llm-capabilities. The llm-models module defines a list of popular models, but this isn't a comprehensive list. Adding a model is fairly easy. For example, here is how to add the Mistral model (which is already included, though):

(require 'llm-models)
(llm-models-add
 :name "Mistral" :symbol 'mistral
 :capabilities '(generation tool-use free-software)
 :context-length 8192
 :regex "mistral")

The :regex needs to uniquely identify the model passed in from a provider's chat or embedding model.

Once this is done, the model will be recognized to have the given context length and capabilities.

The llm package does not attempt to enforce capabilities: if settings are set that cannot be used by the model or provider, they are ignored if possible. For example, not every model supports reasoning, but if the user chooses a model without it, and the client sets a reasoning level, this won't result in an error. It is the responsibility of the client to enforce options via the list of capabilities that they consider essential.

5. Model advice for users

This package attempts to support the latest models. Often, providers will have slightly different API use for newer models that are incompatible with older models. We do not attempt to maintain all features for all models, due to complexity issues. We try to always have a recent model as the default chat model for all providers. If you choose a model yourself, then you should keep it updated as it gets new versions.

6. `llm` and the use of non-free LLMs

The llm package is part of GNU Emacs by being part of GNU ELPA. Unfortunately, the most popular LLMs in use are non-free, which is not what GNU software should be promoting by inclusion. On the other hand, by use of the llm package, the user can make sure that any client that codes against it will work with free models that come along. It's likely that sophisticated free LLMs will emerge, although it's unclear right now what free software means with respect to LLMs. Because of this tradeoff, we have decided to warn the user when using non-free LLMs (which is every LLM supported right now except the fake one). You can turn this off the same way you turn off any other warning, by clicking on the left arrow next to the warning when it comes up. Alternatively, you can set llm-warn-on-nonfree to nil. This can be set via customization as well.

To build upon the example from before:

(use-package llm-refactoring
  :init
  (require 'llm-openai)
  (setq llm-refactoring-provider (make-llm-openai :key my-openai-key)
        llm-warn-on-nonfree nil))

7. Programmatic use

Client applications should require the llm package, and code against it. Most functions are generic, and take a struct representing a provider as the first argument. The client code, or the user themselves, can then require the specific module, such as llm-openai, and create a provider with a function such as (make-llm-openai :key user-api-key). The client application will use this provider to call all the generic functions.

For all callbacks, the callback will be executed in the buffer the function was first called from. If the buffer has been killed, it will be executed in a temporary buffer instead.

7.1. Main functions

llm-chat provider prompt multi-output: With the user-chosen provider and an llm-chat-prompt structure (created by llm-make-chat-prompt), send that prompt to the LLM and wait for the string output.
llm-chat-async provider prompt response-callback error-callback multi-output: Same as llm-chat, but executes in the background. Takes a response-callback which will be called with the text response. The error-callback will be called in case of error, with the error symbol and an error message.
llm-chat-streaming provider prompt partial-callback response-callback error-callback multi-output: Similar to llm-chat-async, but requests a streaming response. As the response is built up, partial-callback is called with all the text retrieved up to the current point. Finally, response-callback is called with the complete text.
llm-embedding provider string: With the user-chosen provider, send a string and get an embedding, which is a large vector of floating point values. The embedding represents the semantic meaning of the string, and the vector can be compared against other vectors, where smaller distances between the vectors represent greater semantic similarity.
llm-embedding-async provider string vector-callback error-callback: Same as llm-embedding but this is processed asynchronously. vector-callback is called with the vector embedding, and, in case of error, error-callback is called with the same arguments as in llm-chat-async.
llm-batch-embeddings provider strings: same as llm-embedding, but takes in a list of strings, and returns a list of vectors whose order corresponds to the ordering of the strings.
llm-batch-embeddings-async provider strings vectors-callback error-callback: same as llm-embedding-async, but takes in a list of strings, and returns a list of vectors whose order corresponds to the ordering of the strings.
llm-count-tokens provider string: Count how many tokens are in string. This may vary by provider, because some providers implement an API for this, but the count is typically about the same. This gives an estimate if the provider has no API support.
llm-cancel-request request: Cancels the given request, if possible. The request object is the return value of async and streaming functions.
llm-name provider: Provides a short name of the model or provider, suitable for showing to users.
llm-models provider: Return a list of all the available model names for the provider. This could be either embedding or chat models. You can use llm-models-match to filter on models that have a certain capability (as long as they are in llm-models).
llm-chat-token-limit provider: Gets the token limit for the chat model. This isn't possible for some backends like llama.cpp, in which the model isn't selected or known by this library.
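
To compare two embeddings, a common measure is cosine similarity, where values closer to 1.0 indicate greater similarity (the corresponding distance is one minus this value). The function below is a self-contained sketch that is not part of the llm package; its name is hypothetical and it assumes both embeddings are vectors of the same length.

```elisp
(require 'cl-lib)

(defun my-cosine-similarity (a b)
  "Return the cosine similarity of vectors A and B.
Values near 1.0 mean the vectors point in nearly the same direction."
  (let ((dot 0.0) (norm-a 0.0) (norm-b 0.0))
    ;; Accumulate the dot product and both squared norms in one pass.
    (cl-loop for x across a
             for y across b
             do (setq dot (+ dot (* x y))
                      norm-a (+ norm-a (* x x))
                      norm-b (+ norm-b (* y y))))
    (/ dot (* (sqrt norm-a) (sqrt norm-b)))))

;; With a real provider (not run here), the inputs would come from
;; `llm-embedding', for example:
;;   (my-cosine-similarity (llm-embedding provider "cat")
;;                         (llm-embedding provider "kitten"))
```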

And the following helper functions:
- llm-make-chat-prompt text &key context examples tools temperature max-tokens response-format reasoning non-standard-params: This is how you make prompts. text can be a string (the user input to the llm chatbot), or a list representing a series of back-and-forth exchanges, of odd number, with the last element of the list representing the user's latest input. This supports inputting context (also commonly called a system prompt, although it isn't guaranteed to replace the actual system prompt), examples, and other important elements, all detailed in the docstring for this function. response-format can be 'json, to force JSON output, or a JSON schema (see below), but the prompt also needs to mention and ideally go into detail about what kind of JSON response is desired. Providers with the json-response capability support JSON output, and it will be ignored if unsupported. reasoning can be 'none, 'light, 'medium or 'maximum to control how much thinking the LLM can do (if it is a reasoning model). The non-standard-params let you specify other options that might vary per-provider, and for this, the correctness is up to the client.
- llm-chat-prompt-to-text prompt: From a prompt, return a string representation. This is not usually suitable for passing to LLMs, but for debugging purposes.
- llm-chat-streaming-to-point provider prompt buffer point finish-callback: Same basic arguments as llm-chat-streaming, but will stream to point in buffer.
- llm-chat-prompt-append-response prompt response role: Append a new response (from the user, usually) to the prompt. The role is optional, and defaults to 'user.
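
As an illustration of how these functions fit together, the sketch below builds a prompt with a context and a temperature and sends it in the background. The helper name my-app-ask-async and the strings are hypothetical; the callback arguments follow the descriptions of llm-chat-async above.

```elisp
(require 'llm)

(defun my-app-ask-async (provider question)
  "Ask PROVIDER the QUESTION in the background and report the answer.
Return the request object, which can be passed to `llm-cancel-request'."
  (llm-chat-async
   provider
   (llm-make-chat-prompt
    question
    :context "You are a concise assistant for Emacs users."
    :temperature 0.2)
   ;; Called with the text response on success.
   (lambda (response)
     (message "LLM answered: %s" response))
   ;; Called with the error symbol and an error message on failure.
   (lambda (err-type err-msg)
     (message "LLM request failed (%s): %s" err-type err-msg))))
```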

7.1.1. Return and multi-output

The default return value is text except for when tools are called, in which case it is a record of the return values of the tools called.

Models can potentially return many types of information, though, so the multi-output option was added to the llm-chat calls so that the single return value can instead be a plist that represents the various possible values. In the case of llm-chat, this plist is returned; in llm-chat-async, it is passed to the success function. In llm-chat-streaming, it is passed to the success function, and each partial update will be a plist, with no guarantee that the same keys will always be present.

The possible plist keys are:

:text, for the main textual output.
:reasoning, for reasoning output, when the model provides it.
:tool-uses, the tools that the llm identified to be called, as a list of plists, with :name and :args values.
:tool-results, the results of calling the tools.
:input-tokens, the number of input tokens, when available.
:output-tokens, the number of output tokens, when available.

The input and output token counts should be consistently available per-provider, but not all providers support output tokens, and some don't support either.
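
Since any of these keys may be absent, client code should read them defensively. The following self-contained sketch (the function name is hypothetical) formats a multi-output plist into a short string, tolerating missing token counts.

```elisp
(defun my-app-describe-output (output)
  "Return a short description of OUTPUT, a multi-output plist.
Keys that are absent are simply skipped."
  (let ((text (plist-get output :text))
        (in (plist-get output :input-tokens))
        (out (plist-get output :output-tokens)))
    (concat (or text "")
            ;; Only mention tokens when at least one count is present.
            (when (or in out)
              (format " [tokens in: %s, out: %s]"
                      (or in "?") (or out "?"))))))

;; With a real provider (not run here):
;;   (my-app-describe-output (llm-chat provider prompt t))
```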

7.1.2. JSON schema

By using the response-format argument to llm-make-chat-prompt, you can ask the LLM to return items according to a specified JSON schema, based on the JSON Schema Spec. Not everything is supported, but the most commonly used parts are. To specify the JSON schema, we use a plist-based approach. JSON objects are defined with (:type object :properties (:<var1> <schema1> :<var2> <schema2> ... :<varn> <scheman>) :required (<req var1> ... <req varn>)). Arrays are defined with (:type array :items <schema>). Enums are defined with (:enum [<val1> <val2> <val3>]). You can also request integers, strings, and other types defined by the JSON Schema Spec, by just having (:type <type>). LLMs often require the top-level schema to be an object, and often that all properties on the top-level object be required.

Some examples:

(llm-chat my-provider
          (llm-make-chat-prompt
           "How many countries are there?  Return the result as JSON."
           :response-format
           '(:type object :properties (:num (:type "integer")) :required ["num"])))

(llm-chat my-provider
          (llm-make-chat-prompt
           "Which editor is hard to quit?  Return the result as JSON."
           :response-format
           '(:type object
             :properties (:editor (:enum ["emacs" "vi" "vscode"])
                          :authors (:type "array" :items (:type "string")))
             :required ["editor" "authors"])))
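
To use such a result, the client still has to decode it. The sketch below assumes the response arrives as a JSON string matching the first schema above, and uses Emacs's built-in json-parse-string; the function name is hypothetical.

```elisp
(defun my-app-parse-count (response)
  "Return the \"num\" field of RESPONSE, a JSON object encoded as a string."
  ;; With :object-type 'plist, JSON keys become keywords such as :num.
  (plist-get (json-parse-string response :object-type 'plist) :num))

;; With a real provider (not run here), RESPONSE would be the value
;; returned by the first `llm-chat' call above.
```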

7.2. Logging

Interactions with the llm package can be logged by setting llm-log to a non-nil value. This should be done only when developing. The log can be found in the *llm log* buffer.

7.3. How to handle conversations

Conversations can take place by repeatedly calling llm-chat and its variants. The prompt should be constructed with llm-make-chat-prompt. For a conversation, the entire prompt must be kept as a variable, because the llm-chat-prompt-interactions slot will be getting changed by the chat functions to store the conversation. For some providers, this will store the history directly in llm-chat-prompt-interactions, but other LLMs have an opaque conversation history. For that reason, the correct way to handle a conversation is to repeatedly call llm-chat or variants with the same prompt structure, kept in a variable, and after each time, add the new user text with llm-chat-prompt-append-response. The following is an example:

(defvar-local llm-chat-streaming-prompt nil)
(defun start-or-continue-conversation (text)
  "Called when the user has input TEXT as the next input."
  (if llm-chat-streaming-prompt
      (llm-chat-prompt-append-response llm-chat-streaming-prompt text)
    (setq llm-chat-streaming-prompt (llm-make-chat-prompt text)))
  ;; Send the prompt on every turn, not only the first one.
  ;; `provider' is assumed to be bound to the user's chosen provider.
  (llm-chat-streaming-to-point provider llm-chat-streaming-prompt (current-buffer) (point-max) (lambda ())))

7.4. Caution about `llm-chat-prompt-interactions`

The interactions in a prompt may be modified by conversation or by the conversion of the context and examples to what the LLM understands. Different providers require different things from the interactions. Some can handle system prompts, some cannot. Some require alternating user and assistant chat interactions, others can handle anything. It's important that clients keep to behaviors that work on all providers. Do not attempt to read or manipulate llm-chat-prompt-interactions after initially setting it up for the first time, because you are likely to make changes that only work for some providers. Similarly, don't directly create a prompt with make-llm-chat-prompt, because it is easy to create something that wouldn't work for all providers.

7.5. Error handling

The llm package defines several error symbols that can be signaled during operations. These errors follow a hierarchy, allowing you to catch errors at different levels of specificity.

7.5.1. Error hierarchy

All LLM-related errors inherit from llm-error:

llm-error: The base error for all LLM operations.
- llm-invalid-argument: Signaled when an invalid argument is provided to an LLM function.
- llm-not-supported: Signaled when a requested operation or feature is not supported by the provider or model.
- llm-provider-error: The base error for provider-related issues.
  - llm-provider-unconfigured: Signaled when the provider is not configured correctly (e.g., missing API key).
- llm-request-error: The base error for request failures.
  - llm-request-timeout: Signaled when a request times out.
  - llm-request-authentication-error: Signaled when authentication fails (e.g., invalid API key).
  - llm-request-bad-request: Signaled when the request was invalid (e.g., bad format).
- llm-tool-call-error: The base error for all tool calling errors.
  - llm-tool-unknown-tool: Signaled when an LLM attempts to call a tool that was not provided in the prompt's tools list.
  - llm-tool-unknown-argument: Signaled when an LLM calls a tool with an argument that is not defined in the tool's argument specification.
  - llm-tool-missing-argument: Signaled when an LLM calls a tool but omits a required (non-optional) argument.

7.5.2. Error data

Most errors return a list with a string as their data (the same as the standard error).

Some errors carry structured data instead. When the following errors are signaled, they include a data plist with additional information:

llm-tool-unknown-tool: (:tool TOOL-NAME)
- TOOL-NAME: The name of the tool the LLM attempted to call.
llm-tool-unknown-argument: (:tool TOOL-NAME :arg ARG-KEY)
- TOOL-NAME: The name of the tool being called.
- ARG-KEY: The argument key that was not recognized.
llm-tool-missing-argument: (:tool TOOL-NAME :arg ARG-SPEC)
- TOOL-NAME: The name of the tool being called.
- ARG-SPEC: The full argument specification plist for the missing required argument.

7.5.3. Example

(condition-case err
    (llm-chat my-provider my-prompt)
  (llm-tool-unknown-tool
   (message "Unknown tool requested: %s" (plist-get (cdr err) :tool)))
  (llm-tool-call-error
   (message "Tool call error: %s" err))
  (llm-invalid-argument
   (message "Invalid argument: %s" (error-message-string err)))
  (llm-provider-error
   (message "Provider error: %s" (error-message-string err)))
  (llm-error
   (message "LLM error: %s" err)))

7.6. Tool use

Tool use is a way to give the LLM a list of functions it can call, and have it call the functions for you. The standard interaction has the following steps:

The client sends the LLM a prompt with tools it can use.
The LLM may return which tools to use, and with what arguments, or text as normal.
If the LLM has decided to use one or more tools, those tools' functions should be called, and their results sent back to the LLM. This could be the final step, depending on whether any follow-on is needed.
The LLM will return with a text response based on the initial prompt and the results of the tool use.
The client can now continue the conversation.

This basic structure is useful because it can guarantee a well-structured output (if the LLM does decide to use the tool). Not every LLM can handle tool use, and those that do not will ignore the tools entirely. The function llm-capabilities will return a list with tool-use in it if the LLM supports tool use. Because not all providers support tool use when streaming, streaming-tool-use indicates the ability to use tools in llm-chat-streaming. However, even for LLMs that handle tool use, there is sometimes a difference in capabilities, for example in the ability to handle nested argument types. Client programs are therefore advised, for now, to keep tool arguments to simple types.

The way to call tools is to attach a list of tools to the tools slot in the prompt. This is a list of llm-tool structs, which is a tool that is an elisp function, with a name, a description, and a list of arguments. The docstrings give an explanation of the format. An example is:

(llm-chat-async
 my-llm-provider
 (llm-make-chat-prompt
  "What is the capital of France?"
  :tools
  (list (llm-make-tool
         :function
         (lambda (callback country)
           ;; In this example function the assumption is that the callback will
           ;; be called after processing the result is complete, and receives
           ;; the result of the processing as an argument.
           (funcall callback (my-capital-retriever country)))
         :name "capital_of_country"
         :description "Get the capital of a country."
         :args '((:name "country"
                        :description "The country whose capital to look up."
                        :type string
                        ;; Default is nil. This will error if the argument is
                        ;; not provided.
                        :optional nil))
         :async t)))
 #'identity  ;; No need to process the result in this example.
 (lambda (_ err)
   (error "Error on getting capital: %s" err)))

Note that tools have the same arguments and structure as the tool definitions in GPTel.

The various chat APIs will execute the functions defined in the tools slot with the arguments supplied by the LLM. Instead of returning (or passing to a callback) a string, the chat functions will return a list of tool names and return values. This is not technically an alist because the same tool might be used several times, so the same car can appear more than once.

After the tool is called, the client could use the result, but if you want to proceed with the conversation, or get a textual response that accompanies the tool call, you should just send the prompt back with no modifications. This is because the LLM gives the tool use to perform, and then expects to get back the results of that tool use. The tools were already executed at the end of the call that returned the tools used, which also stores the result of that execution in the prompt. This is why it should be sent back without further modifications.

Tools will be called with vectors for array results, nil for false boolean results, and plists for objects.

When tools are called, the result in multi-output mode will look like the following:

(:tool-uses ((:name "capital_of_country" :args (("country" . "France" ))))
            :tool-results (("capital_of_country" . "Paris")))

The tool uses here come from the LLM, whereas the tool results are the result of the elisp function that is executed as part of the tool use.

Without multi-output the result will be just the tool results.
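
Because the same tool name can appear more than once in the results, assoc or alist-get would only find the first match. A small self-contained sketch (the function name is hypothetical) that collects every result for a given tool:

```elisp
(require 'cl-lib)

(defun my-app-tool-results-for (name results)
  "Return every value in RESULTS produced by the tool called NAME.
RESULTS is a list of (TOOL-NAME . VALUE) pairs, where names may repeat."
  (cl-loop for (tool . value) in results
           when (equal tool name)
           collect value))

;; Example with the result shape shown above:
;;   (my-app-tool-results-for "capital_of_country"
;;                            '(("capital_of_country" . "Paris")))
```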

Be aware that there is no guarantee that the tool will be called correctly. While the LLMs mostly get this right, they are trained on JavaScript functions, so imitating JavaScript names is recommended. So, "write_email" is a better name for a function than "write-email".
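If your tools wrap existing elisp functions, the name can be derived mechanically. This is a small sketch of one way to do that; the helper is hypothetical and not provided by the library.

```elisp
(defun my-sketch-tool-name (symbol-or-string)
  "Return a snake_case tool name derived from SYMBOL-OR-STRING."
  (replace-regexp-in-string
   "-" "_"
   (if (symbolp symbol-or-string)
       (symbol-name symbol-or-string)
     symbol-or-string)))

(my-sketch-tool-name 'write-email) ; => "write_email"
```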

Examples can be found in llm-tester. There is also a utility that generates tool definitions from existing elisp functions, in utilities/elisp-to-tool.el. Tool use can be controlled by the :tool-options parameter of llm-make-chat-prompt, which takes an llm-tool-options struct. This can be set to force or forbid tool calling, or to force a specific tool to be called. This is useful when the tools stay constant over a conversation but how they are used needs to change. Ollama does not currently support this.

7.7. Media input

Media can be used in llm-chat and related functions. To use media, you can use llm-multipart in llm-make-chat-prompt, and pass it an Emacs image or an llm-media object for other kinds of media. Besides images, some models support video and audio. For video and audio, the user should be steered towards the correct models to use, because most models do not support video and audio.

7.8. Advanced prompt creation

The llm-prompt module provides helper functions to create prompts that can incorporate data from your application. In particular, this should be very useful for applications that need a lot of context.

A prompt defined with llm-prompt is a template, with placeholders that the module will fill in. Here's an example of a prompt definition, from the ekg package:

(llm-defprompt ekg-llm-fill-prompt
  "The user has written a note, and would like you to append to it,
to make it more useful.  This is important: only output your
additions, and do not repeat anything in the user's note.  Write
as a third party adding information to a note, so do not use the
first person.

First, I'll give you information about the note, then similar
other notes that user has written, in JSON.  Finally, I'll give
you instructions.  The user's note will be your input, all the
rest, including this, is just context for it.  The notes given
are to be used as background material, which can be referenced in
your answer.

The user's note uses tags: {{tags}}.  The notes with the same
tags, listed here in reverse date order: {{tag-notes:10}}

These are similar notes in general, which may have duplicates
from the ones above: {{similar-notes:1}}

This ends the section on useful notes as a background for the
note in question.

Your instructions on what content to add to the note:

{{instructions}}
")

When the prompt is filled, it is filled in the context of a provider, which has a known context size (via llm-chat-token-limit). Care is taken not to overfill the context, which is checked via llm-count-tokens as the prompt is filled. We usually do not want to fill the whole context, but instead leave room for the chat and subsequent turns. The variable llm-prompt-default-max-pct controls how much of the context window we want to fill. The token estimate is quick but inaccurate, so staying below the maximum context size also guards against a miscount that would cause the LLM call to fail with too many tokens. If you also want a hard limit that does not depend on the context window size, you can use llm-prompt-default-max-tokens. The smaller of the two limits is used.
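The budget calculation can be sketched as follows. This is an illustration rather than the module's code: the function name is hypothetical, the percentage is assumed to be expressed out of 100, and the numbers in the calls are made up.

```elisp
(defun my-sketch-prompt-budget (context-tokens max-pct max-tokens)
  "Return the number of tokens a filled prompt may use.
CONTEXT-TOKENS is the provider's context size, MAX-PCT the percentage
of it we are willing to fill (assumed to be out of 100), and
MAX-TOKENS an optional hard cap, where nil means no cap."
  (let ((by-pct (floor (* context-tokens (/ max-pct 100.0)))))
    (if max-tokens
        (min by-pct max-tokens)
      by-pct)))

(my-sketch-prompt-budget 8192 50 nil)    ; => 4096
(my-sketch-prompt-budget 200000 50 4000) ; => 4000
```

With a small context window the percentage is the binding limit; with a very large one the hard cap takes over.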

Variables are enclosed in double curly braces, like this: {{instructions}}. They can just be the variable, or they can also denote a number of tickets, like so: {{tag-notes:10}}. Tickets should be thought of like lottery tickets, where the prize is a single round of context filling for the variable. So the variable tag-notes gets 10 tickets for a drawing. Any variable whose tickets are unspecified (unless it is just a single value, which is explained below) gets a number of tickets equal to the total number of specified tickets. So if you have two variables, one with 1 ticket and one with 10 tickets, the second will be filled 10 times more often than the first. If you have two variables, one with 1 ticket and one unspecified, the unspecified one gets 1 ticket, so each has an even chance of being filled. If no variable has tickets specified, each gets an equal chance. If you have one variable, it could have any number of tickets, but the result would be the same, since it would win every round. This algorithm is the contribution of David Petrou.
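The ticket rules can be sketched in a few lines of elisp. This is an illustration of the scheme as described here, not the module's implementation: the function names are hypothetical, and giving every variable 1 ticket when none are specified is an assumption (any equal count would behave the same).

```elisp
(require 'cl-lib)

(defun my-sketch-assign-tickets (vars)
  "Give every variable in VARS a ticket count.
VARS is an alist of (NAME . TICKETS), where TICKETS may be nil.
Unspecified variables get the sum of the specified tickets, or 1 each
when nothing is specified."
  (let* ((specified (cl-reduce #'+ (delq nil (mapcar #'cdr vars))))
         (default (if (> specified 0) specified 1)))
    (mapcar (lambda (var) (cons (car var) (or (cdr var) default)))
            vars)))

(defun my-sketch-draw (tickets)
  "Pick one variable name from TICKETS, an alist of (NAME . COUNT).
Each variable wins with probability proportional to its count."
  (let ((n (random (cl-reduce #'+ (mapcar #'cdr tickets))))
        winner)
    (dolist (entry tickets winner)
      (unless winner
        (if (< n (cdr entry))
            (setq winner (car entry))
          ;; Skip past this variable's share of the tickets.
          (setq n (- n (cdr entry))))))))

(my-sketch-assign-tickets '(("a" . 1) ("b")))
;; => (("a" . 1) ("b" . 1))

;; One round of the lottery; "tag-notes" wins 10 times out of 11 on average.
(my-sketch-draw '(("tag-notes" . 10) ("similar-notes" . 1)))
```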

The above is true of variables that are to be filled with a sequence of possible values. A lot of LLM context filling is like this. In the above example, {{similar-notes}} is a retrieval based on a similarity score. It fills items from most similar to least similar, and if left to run it would eventually return almost everything the ekg app stores. We want to retrieve only as much as needed. Because of this, the llm-prompt module takes in generators to supply each variable. However, a plain list is also acceptable, as is a single value. A single value does not enter the ticket system; it is filled in before any tickets are used.

Values supplied in either the list or the generator can be the values themselves, or conses. If a cons, the car is the value to fill in, and the cdr is the place to put the new value, front or back. front is the default: new values are appended to the end. back adds new values to the start of the filled text for the variable instead.

So, to illustrate with this example, here's how the prompt will be filled:

1. The {{tags}} and {{instructions}} variables are filled first. This happens before the context size is checked, so the module assumes that these values are small and will not blow up the context.
2. Check the context size we want to use (llm-prompt-default-max-pct multiplied by llm-chat-token-limit) and exit if exceeded.
3. Run a lottery with all tickets and choose one of the remaining variables to fill.
4. If the variable won't make the text too large, fill the variable with one entry retrieved from the supplied generator; otherwise ignore it. These values are not conses, so each one is appended to the end of the generated text for its variable (so a new value generated for tags is appended after the other generated tags but before the subsequent "and" in the text).
5. Go to step 2.
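The loop in the steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the module's code: it skips the prefilling of single values, uses plain lists instead of generators, measures the budget in characters instead of tokens, and takes the lottery as a PICK-FN argument. All names are hypothetical.

```elisp
(require 'cl-lib)

(defun my-sketch-fill-loop (sources budget pick-fn)
  "Fill variables from SOURCES until BUDGET characters are used.
SOURCES is an alist of (NAME . LIST-OF-STRINGS).  PICK-FN receives
the names that still have entries and returns one of them; a real
implementation would run the ticket lottery here.  Return an alist
of (NAME . FILLED-STRINGS), with strings in fill order."
  (let ((remaining (copy-tree sources))
        (filled (mapcar (lambda (s) (list (car s))) sources))
        (used 0))
    ;; Check the budget, then keep going while entries remain.
    (while (and (< used budget)
                (cl-some #'cdr remaining))
      (let* ((candidates (mapcar #'car (cl-remove-if-not #'cdr remaining)))
             (winner (funcall pick-fn candidates))
             (cell (assoc winner remaining))
             (entry (pop (cdr cell))))
        ;; Fill only if the entry fits; otherwise ignore it.
        (when (<= (+ used (length entry)) budget)
          (setq used (+ used (length entry)))
          (let ((out (assoc winner filled)))
            ;; Values are appended to the end of the variable's text.
            (setcdr out (append (cdr out) (list entry)))))))
    filled))

;; Always picking the first candidate makes the run deterministic.
(my-sketch-fill-loop '(("a" "xx" "yy") ("b" "zzz")) 5 #'car)
;; => (("a" "xx" "yy") ("b"))
```

In the example, "zzz" is drawn last but would exceed the budget of 5 characters, so it is dropped and "b" stays empty.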

The prompt can be filled in two ways: with a predefined prompt template (llm-defprompt and llm-prompt-fill), or with a prompt template that is passed in directly (llm-prompt-fill-text).

(llm-defprompt my-prompt "My name is {{name}} and I'm here to say {{messages}}")

(llm-prompt-fill 'my-prompt my-llm-provider :name "Pat" :messages #'my-message-retriever)

(iter-defun my-message-retriever ()
  "Return the messages I like to say."
  (my-message-reset-messages)
  (while (my-has-next-message)
    (iter-yield (my-get-next-message))))

Alternatively, you can just fill it directly:

(llm-prompt-fill-text "Hi, I'm {{name}} and I'm here to say {{messages}}"
                      my-llm-provider
                      :name "John" :messages #'my-message-retriever)

As you can see in the examples, the variable values are passed in with matching keys.
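To show how keys line up with placeholders, here is a minimal sketch that substitutes single values only. It is not the module's implementation: it ignores generators, lists, tickets, and the token budget, and the function name is hypothetical.

```elisp
(defun my-sketch-fill-single-values (template &rest keys)
  "Replace {{name}} placeholders in TEMPLATE with values from KEYS.
KEYS is a plist such as (:name \"Pat\").  A ticket suffix such as
{{name:10}} is accepted and ignored.  Placeholders without a matching
key are left untouched."
  (replace-regexp-in-string
   "{{\\([^}:]+\\)\\(?::[0-9]+\\)?}}"
   (lambda (match)
     ;; Turn the placeholder name into the keyword used in KEYS.
     (let ((value (plist-get keys (intern (concat ":" (match-string 1 match))))))
       (if value (format "%s" value) match)))
   template t t))

(my-sketch-fill-single-values
 "My name is {{name}} and I'm here to say {{messages}}"
 :name "Pat")
;; => "My name is Pat and I'm here to say {{messages}}"
```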

8. Contributions

If you are interested in creating a provider, please send a pull request, or open a bug. This library is part of GNU ELPA, so any major provider that we include in this module needs to be written by someone with FSF papers. However, you can always write a module and put it on a different package archive, such as MELPA.

Old versions

llm-0.31.2.tar.lz	2026-Jul-10	138 KiB
llm-0.31.1.tar.lz	2026-Jun-24	137 KiB
llm-0.31.0.tar.lz	2026-May-23	91.5 KiB
llm-0.30.3.tar.lz	2026-May-01	89.5 KiB
llm-0.30.1.tar.lz	2026-Apr-08	88.1 KiB
llm-0.30.0.tar.lz	2026-Apr-06	87.9 KiB
llm-0.29.0.tar.lz	2026-Feb-16	86.8 KiB
llm-0.28.5.tar.lz	2026-Jan-07	84.2 KiB
llm-0.28.0.tar.lz	2025-Dec-06	83.8 KiB
llm-0.27.3.tar.lz	2025-Oct-17	82.7 KiB
llm-0.26.1.tar.lz	2025-Jun-05	81.2 KiB
llm-0.19.1.tar.lz	2024-Nov-29	56.3 KiB
llm-0.9.1.tar.lz	2024-Feb-04	35.6 KiB
llm-0.8.0.tar.lz	2023-Dec-30	34.2 KiB
llm-0.7.0.tar.lz	2023-Dec-18	31.9 KiB
llm-0.6.0.tar.lz	2023-Dec-09	31.6 KiB
llm-0.5.2.tar.lz	2023-Nov-05	30.2 KiB
llm-0.4.0.tar.lz	2023-Oct-14	26.1 KiB
llm-0.3.0.tar.lz	2023-Oct-02	24.3 KiB
llm-0.2.tar.lz	2023-Sep-30	22.0 KiB

News

1. Version 0.31.3

Added Kimi K3.
Added Gemini 3.6 Flash and 3.5 Flash Lite, and removed temperature settings for these models.
Fix some incorrect model identification issues.

2. Version 0.31.2

Fixed Deepseek streaming tool calling, by İsa Mert Gürbüz.

3. Version 0.31.1

Fixed Audio API support for Open AI-compatible providers and Ollama providers by Sergey Kostyaev.
Added Claude Opus 4.8, StepFun 3.7 Flash, and Claude Fable 5
Don't send json format when unsupported; add json-response to more models that support it.
Fix OpenAI compatible names

4. Version 0.31.0

Switch Open AI to Responses API, for more functionality
Keep reasoning information between turns for improved performance in tool calling for Open AI and Claude (Gemini does this already).
Fix error with logged tool calls
Fix behavior when tool calling has an error; we no longer call the success result as well
Fix error with Claude tool use requests
Fix error with Claude 4.5 Haiku and reasoning support
Tweaked Claude name and capabilities to not always support reasoning
Added Gemini 3.5, Qwen 3.7

5. Version 0.30.3

Fix for OpenRouter breakage

6. Version 0.30.2

Fix json encoding error caused by utf-8 strings for Open AI and Ollama
Add Claude Opus 4.7, Kimi K2.6, Qwen 3.6, Chat GPT 5.5, Mistral Medium 3.5, XiaoMi 2.5, and Deepseek V4.
Improved support for Open AI streaming for tool calls, by Renato Ferreira.
Add reasoning controls to DeepSeek and OpenAI
Fix reasoning for Claude Opus 4.7 and Chat GPT.
Changed default Claude model to 4.6 Sonnet
Fix text extraction for Claude when using reasoning
Return token counts when streaming
Return token counts for DeepSeek

7. Version 0.30.1

Fix lack of reasoning response when doing tool calls
Added support for Open AI compatible reasoning_content and reasoning blocks for streaming

8. Version 0.30.0

Add :input-tokens and :output-tokens to multioutput result.
Fixed inability of zero-arg tools to be called
Added OpenRouter as a top-level model type
Add support for Open AI compatible reasoning_content and reasoning blocks
Added Qwen 3.5, LFM2 and LFM 2.5 Thinking
Added Gemini 3.1 Pro, Gemini 3.1 Flash Lite
Added Chat GPT 5.4, with extra context
Added StepFun 3.5 Flash
Added Gemma 4
Added Claude Sonnet 4.6

9. Version 0.29.0

Check for tool use mismatches and define new errors for them
Normalize false values in tool args or tool call results
Add Claude Opus 4.6
Fix bug running two async calls in parallel
Set Gemini default to 3.0 pro
Added Kimi k2.5, GLM-5, and Qwen 3 Coder Next
Increased the default context length for unknown models to be more up to date
Allow Ollama authed keys to be functions

10. Version 0.28.5

Improved the tool calling docs
Fix for running tools in the original buffer with streaming

11. Version 0.28.4

Removed bad interactions made in Ollama tool calls
Fixed Ollama tool calling requests
Fixed Ollama reasoning, whose API has changed
Added gpt-oss, supported low/medium/high reasoning with Ollama
Run tools in the original buffer

12. Version 0.28.3

Fixed breakage in Ollama streaming tool calling
Fixed incorrect Ollama streaming tool use capability reporting
Add Gemini 3 Flash

13. Version 0.28.2

Add Chat GPT post 5.0 series models, such as 5.1 and 5.2

14. Version 0.28.1

Fix error on empty Claude responses

15. Version 0.28.0

Add tool calling options, for forbidding or forcing tool choice.
Fix bug (or perhaps breaking change) in Ollama tool use.
Add Gemini 3 model, update Gemini code to pass thought signatures
Add json-response capability to Claude 4.5 and 4.1 Opus models
Set Sonnet 4.5 as the default Claude model
Fix outdated max output settings in Claude
Add Claude Opus 4.5

16. Version 0.27.3

Add reasoning output for Gemini.
Add Claude 4.5 Sonnet and Haiku to supported models, fix model matching for other Claude models.

… …
