Building AI Agents with Local LLMs: Using smolagents with LM Studio
Introduction
The world of AI agents has been dominated by cloud-based models like GPT-4o and Claude Sonnet, but there’s a growing movement toward running these capabilities locally. In this post, I’ll show you how to create powerful AI agents using the smolagents
library with local Large Language Models (LLMs) running on LM Studio. This approach offers several advantages: enhanced privacy, reduced costs, lower latency, and the ability to work offline.
I’ve created a GitHub repository with complete examples that you can use as a starting point for your own projects.
What is smolagents?
smolagents is a lightweight Python library for creating AI agents that can use tools to accomplish tasks. It’s designed to be simple, flexible, and compatible with various LLM backends, including local models. The library provides several agent types:
- CodeAgent: The default agent that writes and executes Python code to solve tasks
- ToolCallingAgent: A general-purpose agent that uses tools via function calling (JSON-like blobs)
- MultiStepAgent: A base class for agents that solve tasks in multiple steps
For most use cases with local LLMs, either the CodeAgent
or ToolCallingAgent
is recommended, depending on whether you want the agent to execute Python code.
What is LM Studio?
LM Studio is a desktop application that allows you to download, run, and chat with local LLMs on your computer. It provides a user-friendly interface for managing models and offers an OpenAI-compatible API server, making it easy to integrate with libraries like smolagents.
Prerequisites
Before we begin, you’ll need:
- Python 3.8+
- smolagents with OpenAI support (pip install 'smolagents[openai]')
- requests (pip install requests)
- LM Studio installed on your computer
Setting Up LM Studio
- Download and install LM Studio from lmstudio.ai
- Launch LM Studio and download a model of your choice. For the best experience with function calling (which is essential for tools), I recommend:
- Gemma 3 27B Instruct (gemma-3-27b-it) - On par with Gemini 1.5 Pro
- Gemma 3 12B Instruct (gemma-3-12b-it) - Better than older Gemma 2 27B models
- Load your chosen model in LM Studio
- Click on the “Local Server” tab
- Click “Start Server”
- The server should now be running at
http://localhost:1234
Testing the Connection
Before diving into agent creation, let’s verify that we can connect to LM Studio’s API. I’ve created a simple test script that you can run:
import requests
import json

def test_lm_studio_connection():
    """
    Test if the connection to LM Studio's API is working properly.
    """
    print("Testing connection to LM Studio API...")

    # LM Studio API endpoint
    api_url = "http://localhost:1234/v1/chat/completions"

    # Simple test message
    payload = {
        "model": "local-model",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, are you working?"}
        ],
        "temperature": 0.7,
        "max_tokens": 50
    }

    try:
        # Send a request to the API
        response = requests.post(
            api_url,
            headers={"Content-Type": "application/json"},
            data=json.dumps(payload),
            timeout=10
        )

        # Check if the request was successful
        if response.status_code == 200:
            result = response.json()
            assistant_message = result["choices"][0]["message"]["content"]
            print("\n✅ Connection successful!")
            print("\nLM Studio API response:")
            print(f"Assistant: {assistant_message}")
            return True
        else:
            print(f"\n❌ Error: Received status code {response.status_code}")
            print(f"Response: {response.text}")
            return False

    except requests.exceptions.ConnectionError:
        print("\n❌ Connection Error: Could not connect to LM Studio API.")
        print("Make sure LM Studio is running and the server is started.")
        print("Check that the server is running on http://localhost:1234")
        return False
    except Exception as e:
        print(f"\n❌ Error: {str(e)}")
        return False

if __name__ == "__main__":
    test_lm_studio_connection()
Save this as test_lm_studio_connection.py
and run it. If everything is set up correctly, you should see a success message and a response from the model.
Creating a Basic Agent
Now that we’ve confirmed our connection to LM Studio, let’s create a simple agent using smolagents. Here’s a basic example:
from smolagents import OpenAIServerModel, ToolCallingAgent

# Configure the model to use LM Studio's local API endpoint
model = OpenAIServerModel(
    model_id="local-model",  # This can be any name; LM Studio will use whatever model you have loaded
    api_base="http://localhost:1234/v1",  # Default LM Studio API endpoint
    api_key="not-needed",  # LM Studio doesn't require an API key by default
)

# Create a simple agent using the local model
agent = ToolCallingAgent(
    name="LocalLLMAgent",
    model=model,
    tools=[],  # Empty list of tools
    # You can also add the default toolbox with add_base_tools=True
)

# Example conversation with the agent
response = agent.run("Hello! Can you tell me what you are and how you're running?")
print(f"Agent response: {response}")
Save this as local_llm_example.py
and run it. You should see a response from your local LLM.
Adding Tools to Your Agent
The real power of smolagents
comes from its ability to use tools. Let’s create a coding assistant agent with a tool for searching Python documentation:
from smolagents import OpenAIServerModel, CodeAgent, tool

# Configure the model to use LM Studio's local API endpoint
model = OpenAIServerModel(
    model_id="local-model",
    api_base="http://localhost:1234/v1",
    api_key="not-needed",
)

# Define a tool that the agent can use to search for Python documentation
@tool
def search_python_docs(query: str) -> str:
    """
    Search for Python documentation based on the query.
    This is a simplified example - in a real application, you might
    use a web API or local documentation.

    Args:
        query: The search query

    Returns:
        str: Information about the Python feature
    """
    # This is a mock implementation
    python_docs = {
        "list": "Lists are used to store multiple items in a single variable. Lists are ordered, changeable, and allow duplicate values.",
        "dict": "Dictionaries are used to store data values in key:value pairs. A dictionary is a collection which is ordered, changeable, and does not allow duplicate keys.",
        "set": "Sets are used to store multiple items in a single variable. Set is one of 4 built-in data types in Python used to store collections of data.",
        "function": "A function is a block of code which only runs when it is called. You can pass data, known as parameters, into a function.",
    }
    for key, value in python_docs.items():
        if key in query.lower():
            return value
    return "No specific information found for that query. Try asking about lists, dictionaries, sets, or functions."

# Example usage of the agent
def chat_with_coding_assistant():
    print("Python Coding Assistant (powered by your local LLM)")
    print("Type 'exit' to end the conversation\n")

    # Create the agent once and reuse it for every turn
    coding_assistant = CodeAgent(
        name="PythonCodingAssistant",
        model=model,
        tools=[search_python_docs],
    )

    # Store conversation history manually, since each run() starts from the task we pass it
    conversation_history = []

    while True:
        user_input = input("You: ")
        if user_input.lower() == 'exit':
            print("Goodbye!")
            break

        # Add user input to conversation history
        conversation_history.append(f"User: {user_input}")

        # Include conversation history in the prompt
        full_prompt = "\n".join(conversation_history) + "\n\nPlease respond to the latest message."

        response = coding_assistant.run(full_prompt)
        print(f"Assistant: {response}")

        # Add assistant response to conversation history
        conversation_history.append(f"Assistant: {response}")

        # You can inspect the agent's logs to see what happened
        # print(coding_assistant.logs)

if __name__ == "__main__":
    print("Reminder: Ensure LM Studio is running with a model loaded on http://localhost:1234\n")
    chat_with_coding_assistant()
Save this as coding_assistant_agent.py
and run it. You can now chat with your coding assistant and ask questions about Python.
Using the Default Toolbox
smolagents comes with a default toolbox that you can add to your agent:
from smolagents import OpenAIServerModel, CodeAgent

model = OpenAIServerModel(
    model_id="local-model",
    api_base="http://localhost:1234/v1",
    api_key="not-needed",
)

# Create an agent with the default toolbox
agent = CodeAgent(
    name="AssistantWithDefaultTools",
    model=model,
    tools=[],  # Your custom tools would go here
    add_base_tools=True,  # This adds DuckDuckGo search, Python interpreter, and Transcriber
)

response = agent.run("What is the current weather in London?")
print(f"Agent response: {response}")
The default toolbox includes:
- DuckDuckGo web search
- Python code interpreter (for ToolCallingAgent)
- Transcriber (speech-to-text using Whisper-Turbo)
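Instead of add_base_tools=True, you can also instantiate only the default tools you want and pass them explicitly. Here is a minimal sketch, assuming the DuckDuckGoSearchTool and PythonInterpreterTool classes exported by smolagents; a ToolCallingAgent is used because a CodeAgent already executes Python itself:

from smolagents import OpenAIServerModel, ToolCallingAgent, DuckDuckGoSearchTool, PythonInterpreterTool

model = OpenAIServerModel(
    model_id="local-model",
    api_base="http://localhost:1234/v1",
    api_key="not-needed",
)

# Pick only the default tools you actually need
agent = ToolCallingAgent(
    tools=[DuckDuckGoSearchTool(), PythonInterpreterTool()],
    model=model,
)

response = agent.run("Search for the release date of Python 3.12 and compute how many years ago that was.")
print(response)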
Key Concepts to Understand
When working with smolagents
and local LLMs, there are a few important concepts to keep in mind:
1. Tool Creation
There are two ways to create tools:
Using the @tool decorator:
@tool
def my_tool(param: str) -> str:
    """
    Tool description.

    Args:
        param: Parameter description

    Returns:
        str: Return value description
    """
    # Tool implementation goes here
    result = f"Processed: {param}"
    return result
Creating a Tool subclass:
This gives you more flexibility, especially for initializing heavy class attributes.
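As a minimal sketch of the subclass approach (assuming the standard smolagents Tool attributes name, description, inputs, and output_type, plus a forward() method; the class name and lookup logic here are just placeholders):

from smolagents import Tool

class PythonDocsTool(Tool):
    # Metadata the agent reads to decide when and how to call the tool
    name = "python_docs"
    description = "Looks up a short explanation of a Python language feature."
    inputs = {
        "query": {
            "type": "string",
            "description": "The Python feature to look up, e.g. 'list' or 'dict'",
        }
    }
    output_type = "string"

    def __init__(self, docs=None, **kwargs):
        super().__init__(**kwargs)
        # Heavy setup (loading an index, reading files, etc.) belongs here
        self.docs = docs or {"list": "Lists store ordered, changeable collections of items."}

    def forward(self, query: str) -> str:
        for key, value in self.docs.items():
            if key in query.lower():
                return value
        return "No information found for that query."

# The instance is passed to an agent just like a decorated tool:
# agent = CodeAgent(tools=[PythonDocsTool()], model=model)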
2. Conversation State
To create a conversational interface, you can:
- Store conversation history manually and include it in each prompt
- Use reset=False when calling agent.run() to continue the conversation from the agent’s existing memory (see the snippet below)
- Create a new agent for each interaction if needed
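Here is a minimal sketch of the reset=False option, reusing the LM Studio model configuration from earlier; every call after the first keeps the agent’s accumulated memory:

from smolagents import OpenAIServerModel, ToolCallingAgent

model = OpenAIServerModel(
    model_id="local-model",
    api_base="http://localhost:1234/v1",
    api_key="not-needed",
)

agent = ToolCallingAgent(tools=[], model=model)

# First turn starts with a fresh memory
agent.run("My favourite programming language is Python. Please remember that.")

# reset=False continues from the previous turn's memory,
# so the agent can refer back to what was already said
followup = agent.run("What is my favourite programming language?", reset=False)
print(followup)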
3. Inspecting Agent Runs
After running an agent, you can inspect what happened:
- agent.logs stores fine-grained logs of the agent’s run
- agent.write_memory_to_messages() writes the agent’s memory as a list of chat messages (see the snippet below)
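For example, after any agent.run() call you could dump both views (a rough sketch; the exact structure of the entries depends on your smolagents version):

# Fine-grained record of every step the agent took
for entry in agent.logs:
    print(entry)

# The same memory replayed as a list of chat messages
for message in agent.write_memory_to_messages():
    print(message)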
4. Multi-Agent Systems
You can create hierarchical multi-agent systems where one agent manages others:
from smolagents import CodeAgent, OpenAIServerModel, DuckDuckGoSearchTool

model = OpenAIServerModel(
    model_id="local-model",
    api_base="http://localhost:1234/v1",
    api_key="not-needed",
)

web_agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
    name="web_search",
    description="Runs web searches for you. Give it your query as an argument.",
)

manager_agent = CodeAgent(
    tools=[],
    model=model,
    managed_agents=[web_agent],
)

manager_agent.run("Who is the CEO of Hugging Face?")
5. Interactive UI
You can use GradioUI to interact with your agent:
from smolagents import CodeAgent, OpenAIServerModel, GradioUI

model = OpenAIServerModel(
    model_id="local-model",
    api_base="http://localhost:1234/v1",
    api_key="not-needed",
)

agent = CodeAgent(tools=[], model=model, add_base_tools=True)
GradioUI(agent).launch()
6. Model Performance
The quality of responses depends on the model you have loaded in LM Studio. For the best experience:
- Use larger models (12B+ parameters) for better results
- Choose models that support function calling (like Gemma 3)
- Be aware that some models may not support all features
Common Issues and Troubleshooting
- Connection Error: Make sure the LM Studio server is running and that the port matches your configuration (a quick check script follows this list)
- Poor Responses: Try loading a different/larger model in LM Studio
- Tool Execution Issues: Some local models may not support function calling properly. Try a different model or simplify your prompts.
- Import Errors: If you get an error about missing ‘openai’, install the OpenAI dependency:
pip install 'smolagents[openai]'
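For the connection and model issues above, a quick sanity check is to ask LM Studio’s OpenAI-compatible server which models it is currently serving (a sketch, assuming the default port):

import requests

# List the models LM Studio currently exposes on its local server
response = requests.get("http://localhost:1234/v1/models", timeout=5)
response.raise_for_status()
for model_info in response.json().get("data", []):
    print(model_info["id"])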
Sharing Your Agent
Once you’ve configured your agent to your needs, you can share it to the Hugging Face Hub:
agent.push_to_hub("your-username/your-agent-name")
And to load an agent from the Hub:
agent.from_hub("your-username/your-agent-name", trust_remote_code=True)
Conclusion
Running AI agents with local LLMs offers a compelling alternative to cloud-based solutions. With smolagents
and LM Studio, you can create powerful agents that respect your privacy, work offline, and don’t incur ongoing API costs.
The examples I’ve shared are just the beginning. You can extend these agents with custom tools, create multi-agent systems, add interactive UIs, and even share your agents with others. The full code for these examples, along with additional resources, is available in my GitHub repository.
As local LLMs continue to improve, the gap between cloud and local capabilities will narrow, making this approach increasingly viable for a wide range of applications. I encourage you to experiment with different models and tools to find the combination that works best for your specific needs.
Happy building!