Run MLX-compatible Hugging Face models locally on Apple silicon with Pydantic AI.
Two backends are provided:
- LM Studio backend: an OpenAI-compatible server that can also use mlx-lm; the model runs in a separate background process.
- mlx-lm backend: direct integration with Apple's mlx-lm library; the model runs inside your application process (experimental support).
STILL IN DEVELOPMENT, NOT RECOMMENDED FOR PRODUCTION USE YET.
Contributions are welcome!
Status and roadmap:
- LM Studio backend: should be fully supported
- Streaming text support for the mlx-lm backend
- Tool calling support for the mlx-lm backend
As of January 2025, Apple's MLX appears to be more performant on Apple silicon than llama.cpp (used by Ollama).
Install with uv:

```sh
uv add pydantic-ai-mlx
```
Usage with the LM Studio backend:

```python
from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage
from pydantic_ai_lm_studio import LMStudioModel

model = LMStudioModel(model_name="mlx-community/Qwen2.5-7B-Instruct-4bit")  # supports tool calling
agent = Agent(model, system_prompt="You are a chatbot.")


async def stream_response(user_prompt: str, message_history: list[ModelMessage]):
    async with agent.run_stream(user_prompt, message_history=message_history) as result:
        async for message in result.stream():
            yield message
```
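Because the LM Studio backend supports tool calling, tools can be registered on the agent in the usual Pydantic AI way. Below is a minimal sketch that reuses the `agent` from the example above; the `current_utc_time` tool is purely illustrative and not part of this package.

```python
from datetime import datetime, timezone


@agent.tool_plain
def current_utc_time() -> str:
    """Illustrative tool: return the current UTC time as an ISO 8601 string."""
    # The model can request this tool during a run; Pydantic AI executes it
    # and feeds the result back to the model via the LM Studio backend.
    return datetime.now(timezone.utc).isoformat()
```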
Usage with the mlx-lm backend (experimental):

```python
from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage
from pydantic_ai_mlx_lm import MLXModel

model = MLXModel(model_name="mlx-community/Llama-3.2-3B-Instruct-4bit")
# See https://github.com/ml-explore/mlx-examples/blob/main/llms/README.md#supported-models
# and https://huggingface.co/mlx-community for available models.
agent = Agent(model, system_prompt="You are a chatbot.")


async def stream_response(user_prompt: str, message_history: list[ModelMessage]):
    async with agent.run_stream(user_prompt, message_history=message_history) as result:
        async for message in result.stream():
            yield message
```
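A minimal sketch of how the `stream_response` generator above could be consumed from an async entry point (the prompt text and empty history are just placeholders):

```python
import asyncio


async def main() -> None:
    history: list[ModelMessage] = []
    async for text in stream_response("Hello, who are you?", history):
        # result.stream() yields the response text accumulated so far,
        # so each iteration prints the partial response up to that point.
        print(text)


asyncio.run(main())
```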