@Wauplin
Created May 6, 2025 16:28
Python-based HF MCP server
"""
WARNING: This is an experimental implementation. Expect rough edges while using it.
-------------------------------------------------
Defines a FastMCP server that exposes the Hugging Face Hub API as a set of tools.
In practice, all public methods from `HfApi` are exposed as tools, except for the ones dealing with files:
- `create_commit`
- `hf_hub_download`
- `preupload_lfs_files`
- `snapshot_download`
- `upload_file`
- `upload_folder`
- `upload_large_folder`
In addition, a `read_modelcard` tool is added to read the model card of a model on the Hugging Face Hub. The model card
is downloaded on the fly but not cached locally. If the file is too large (> 1 MB), an error is raised.
## How to use
Use the MCP client of your choice to connect to the server.
You must pass the `HF_TOKEN` environment variable to the server.
Here is an example using `Agent` from the `@huggingface/mcp-client` package:
```ts
const agent = new Agent({
  provider: "nebius",
  model: "Qwen/Qwen2.5-72B-Instruct",
  apiKey: process.env.HF_TOKEN,
  servers: [
    {
      command: "python",
      args: ["hf_mcp.py"],
      env: {
        HF_TOKEN: process.env.HF_TOKEN ?? "",
      },
    },
  ],
});
```
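Alternatively, here is a minimal sketch using the stdio client from the official `mcp` Python SDK (not part of this
gist; it assumes the server script is saved as `hf_mcp.py`):
```python
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Spawn the server as a subprocess and communicate over stdio.
    server = StdioServerParameters(
        command="python",
        args=["hf_mcp.py"],
        env={"HF_TOKEN": os.environ.get("HF_TOKEN", "")},
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])


asyncio.run(main())
```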
## How it works
Methods from `HfApi` are registered as tools in the FastMCP server. A `ctx` parameter is added to each method's
signature to access the request context. The `token` parameter is removed from the method signatures and docstrings,
since authentication is handled once in the context.
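For illustration, here is the same `inspect`-based signature rewrite applied to a toy method (a standalone sketch, not
code from this server):
```python
import inspect

class Api:
    def list_things(self, repo_id: str, token: str | None = None) -> list:
        ...

sig = inspect.signature(Api.list_things)
params = list(sig.parameters.values())[1:]          # drop `self`
params = [p for p in params if p.name != "token"]   # drop `token`
print(sig.replace(parameters=params))               # (repo_id: str) -> list
```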
"""
import functools
import inspect
import re
import typing
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from dataclasses import dataclass
import requests
from mcp.server.fastmcp import Context, FastMCP
from mcp.server.session import ServerSession
from huggingface_hub import HfApi, constants
from huggingface_hub.hf_api import * # noqa: F403 # needed for tools parameter resolution
# Matches the 12-space-indented `token (...)` argument block in `HfApi` docstrings, lazily up to
# (and capturing) the start of the next argument so the substitution can keep it.
REMOVE_TOKEN_RE = re.compile(
    r"""
    \n\s{12}token\s\(
    .*?
    (\n\s{12}[a-z])
    """,
    flags=re.VERBOSE | re.DOTALL | re.IGNORECASE | re.MULTILINE,
)
SKIPPED_METHODS = [
    "create_commit",
    "hf_hub_download",
    "preupload_lfs_files",
    "run_as_future",
    "snapshot_download",
    "upload_file",
    "upload_folder",
    "upload_large_folder",
]
# special params: repo_type, revision, token
@dataclass
class AppContext:
    api: HfApi
@asynccontextmanager
async def app_lifespan(server: FastMCP) -> AsyncIterator[AppContext]:
    """Manage application lifecycle with type-safe context."""
    # `HfApi` picks up the `HF_TOKEN` environment variable automatically for authentication.
    yield AppContext(api=HfApi(library_name="huggingface-hub-mcp", library_version="0.0.1"))
mcp = FastMCP("Hugging Face Hub", lifespan=app_lifespan)
def register_hf_tool(api_name: str) -> None:
    api_method = getattr(HfApi, api_name)
    sig = inspect.signature(api_method)
    params = list(sig.parameters.values())

    # Remove `self` from the original method signature
    if params[0].name == "self":
        params = params[1:]

    # Tweak input parameters
    new_params = (
        # Add the `ctx` parameter
        [
            inspect.Parameter(
                "ctx", inspect.Parameter.POSITIONAL_OR_KEYWORD, annotation=Context[ServerSession, AppContext]
            )
        ]
        # Remove the `token` parameter (handled in context)
        + [param for param in params if param.name != "token"]
    )
    new_sig = sig.replace(parameters=new_params)

    @functools.wraps(api_method)
    def wrapper(*args, **kwargs):
        bound_args = new_sig.bind(*args, **kwargs)
        bound_args.apply_defaults()
        ctx = bound_args.arguments.pop("ctx")
        api = ctx.request_context.lifespan_context.api
        output = getattr(api, api_name)(**bound_args.arguments)
        # If output is a generator, consume it into a list
        if isinstance(output, typing.Generator):
            output = list(output)
            print(f"Found generator of length {len(output)}")
        return str(output)

    wrapper.__signature__ = new_sig
    new_doc = api_method.__doc__
    if new_doc:
        # Use a raw string so `\1` is a backreference, not the control character `\x01`
        new_doc = REMOVE_TOKEN_RE.sub(r"\1", new_doc)
    wrapper.__doc__ = new_doc
    mcp.add_tool(wrapper)
@mcp.tool()
def read_modelcard(ctx: Context[ServerSession, AppContext], repo_id: str) -> str:
    """Read the model card of a model on the Hugging Face Hub.

    If the file is too large (> 1 MB), an error is raised.

    Args:
        repo_id: The ID of the repository.
    """
    api = ctx.request_context.lifespan_context.api
    headers = api._build_hf_headers()

    # Build the file URL for the repo's README.md
    url = constants.HUGGINGFACE_CO_URL_TEMPLATE.format(repo_id=repo_id, revision="main", filename="README.md")

    # Check the size before downloading
    response = requests.head(url, headers=headers)
    response.raise_for_status()
    size = int(response.headers["content-length"])
    if size > 1_000_000:
        raise ValueError(f"Model card for repo {repo_id} is too large to be read as text: {size} bytes")

    # Download the file (on the fly, not cached locally)
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.content.decode("utf-8", errors="ignore")
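# Example (hypothetical client-side invocation of the tool above, not part of this server):
#   await session.call_tool("read_modelcard", {"repo_id": "Qwen/Qwen3-30B-A3B"})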
# Register tools: every public, non-async, callable method from HfApi
for name in sorted(dir(HfApi)):
    if name.startswith("_"):
        continue
    if name in SKIPPED_METHODS:
        continue
    method = getattr(HfApi, name)
    if inspect.iscoroutinefunction(method):
        continue
    if not callable(method):
        continue
    print(f"Registering {name}...")
    register_hf_tool(name)
if __name__ == "__main__":
    # Run the server
    mcp.run()
@Wauplin (Author) commented May 6, 2025
I used `packages/mcp-client/cli.ts` to test it, replacing the `SERVERS` section with:

```ts
const SERVERS: StdioServerParameters[] = [
	{
		// Python HF-MCP
		command: "python",
		args: ["hf_mcp.py"],
		env: {
			HF_TOKEN: process.env.HF_TOKEN ?? "",
		},
	},
];
```

**Example 1**

**Prompt:**

what are the most recent models released by Freepik, what are they built for and how do they compare on benchmarks? Read their model card to get the information.

**Answer:**

(...)
### Summary
- **Most Recent Model**: `Freepik/nsfw_image_detector` (Created on 2025-04-10)
  - **Intended Use**: NSFW content classification with four categories (neutral, low, medium, high).
  - **Performance**: Outperforms other models in both global and AI-generated content accuracy.
  - **Usage**: Available via pip and Hugging Face Transformers pipeline.
- **Second Most Recent Model**: `Freepik/F-Lite-Texture` (Created on 2025-04-23)
  - **Intended Use**: Text-to-image generation with a focus on better textures and aesthetics.
  - **Performance**: Poor with short prompts and less effective for vector graphics.
  - **Usage**: Available through interactive demos and the `diffusers` library.

Would you like more information on any of these models or their usage?

**Example 2**

**Prompt:**

Among the top 50 trending models on the Hub, which ones don't have any downloads?

**Answer:**

(...)
These are the models that don't have any downloads among the top 50 trending models on the Hugging Face Hub.

Here is the list for your reference:

- **lodestones/Chroma**: 0 downloads
- **stepfun-ai/Step1X-Edit**: 0 downloads
- **sand-ai/MAGI-1**: 0 downloads

Since we have identified the models with zero downloads, the task is complete.

**Example 3**

**Prompt:**

(related to https://huggingface.co/Qwen/Qwen3-30B-A3B/discussions/18)

can you summarize the discussion around "Qwen3 is great, but could be better." in Qwen/Qwen3-30B-A3B community tab?

**Answer:**

(...)
### Summary of the Discussion "Qwen3 is great, but could be better" in `Qwen/Qwen3-30B-A3B`

#### **Initial Post by phil111**
- **Title:** Qwen3 is great, but could be better.
- **Content:** 
  - Phil praises Qwen3 for its superior performance in coding, math, reasoning, and general STEM knowledge compared to other similarly sized models.
  - However, he highlights significant weaknesses in creative tasks and popular knowledge, such as poem writing and knowledge of music, movies, and other aspects of pop culture.
  - He suggests fusing the model's weights with a small database of core facts about popular information to improve its performance and reduce hallucinations.
  - Phil believes this would benefit the model across all supported languages and make it a more general-purpose AI suitable for a broader audience.

#### **Responses:**
1. **CHNtentes:**
   - Notes that Phil often raises issues about new models lacking pop culture knowledge.
   - Agrees that repetition is necessary when you feel you have something important to say.

2. **nlpguy:**
   - Supports Phil's perspective, suggesting that the issue is not just about failing specific tests but a broader shift in behavior among new models.
   - Emphasizes the importance of not overfitting tests and maintaining a balance in training.

3. **phil111:**
   - Responds to CHNtentes, explaining that the issue is broader than just overfitting.
   - Provides examples of how Qwen3 excels in STEM tasks but fails in creative and pop culture tasks, even worse than smaller models.
   - Stresses his desire for broad adoption of open-source AI models and the need to address these imbalances.

4. **ZiggyS:**
   - Dismisses the importance of pop culture knowledge in AI models.
   - Suggests that AI should focus on providing new, reasonable output rather than reciting existing knowledge.
   - Criticizes the idea of including such information as wasteful.

5. **CHNtentes:**
   - Agrees with ZiggyS that different users have different needs.
   - Points out that Qwen3 is a general-purpose model, not specifically targeted at coders.
   - Suggests that the poor performance in factual knowledge might be due to issues with expert routing or the inclusion of 100 languages.

6. **phil111:**
   - Clarifies that Qwen3 is a 30B model, not a 3B model as ZiggyS suggested.
   - Reiterates that the Qwen3 family performs poorly in broad knowledge tasks despite having a large parameter count and excelling in STEM.
   - Emphasizes the importance of a balanced model for broad adoption.

7. **ZiggyS:**
   - Defends the model makers' decision to target STEM and coding.
   - Criticizes the "entitlement attitudes" and unwarranted bashing of models for not meeting every user's needs.
   - Suggests that users who don't like the direction can make their own models or use existing ones that fit their needs.

8. **phil111:**
   - Responds to ZiggyS, pointing out that the general population prefers balanced models like GPT4o and Gemini 2.5.
   - Suggests that Chinese models have broad abilities and knowledge in Chinese, implying that the focus on Western STEM scores might be deliberate.
   - Reaffirms his desire for Alibaba's success and the need to avoid overfitting STEM at the expense of other areas.

#### **Conclusion:**
- The discussion revolves around the perceived imbalance in Qwen3's capabilities, with Phil advocating for a more general-purpose model that includes broad popular knowledge and creative abilities.
- Other participants, like ZiggyS, argue for the importance of targeted models and the value of STEM performance.
- The debate highlights the different priorities and use cases of AI models among the community.

Since we have summarized the discussion, the task is complete.

@evalstate commented
I've added an `INCLUDED_TOOLS` option as well in my copy:

```python
# Specify which tools to include. Leave empty to include all available tools.
INCLUDED_TOOLS = []
...
...

# Register tools
for name in sorted(dir(HfApi)):
    if name.startswith("_"):
        continue
    if name in SKIPPED_METHODS:
        continue
    # Skip if not in INCLUDED_TOOLS (if INCLUDED_TOOLS is not empty)
    if INCLUDED_TOOLS and name not in INCLUDED_TOOLS:
        continue
```
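For example, an allow-list restricted to a few read-only tools could look like this (a hypothetical selection; the names are real `HfApi` methods):

```python
# Only these HfApi methods are registered; everything else is skipped.
INCLUDED_TOOLS = ["list_models", "model_info", "list_datasets", "dataset_info"]
```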

@ddiddi commented May 13, 2025

Nicely done!
