This report details transferring image data via the Model Context Protocol (MCP) from a Google Apps Script server to a Python/Gemini client, extending the protocol's capabilities for multimodal applications beyond text.
Following up on my previous report, "Building Model Context Protocol (MCP) Server with Google Apps Script" (Ref), which detailed the transfer of text data between the MCP server and client, this new report focuses on extending the protocol to handle image data. It introduces a practical method for transferring image data efficiently from the Google Apps Script-based MCP server to an MCP client. In this implementation, the MCP client was built using Python and integrated with the Gemini model, allowing for the processing and utilization of the transferred image data alongside text, thereby enabling more complex, multimodal applications within the MCP framework.
The flow of this approach is simple: the MCP client communicates with the MCP server over HTTP requests, as follows.
For the steps to deploy the MCP server, please read my previous report. Ref
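Before looking at the server script, the client-to-server exchange above can be sketched as plain JSON-RPC 2.0 requests POSTed to the Web Apps URL. This is only an illustrative sketch: the URL is a placeholder, and the payload shapes follow the MCP specification's `initialize`, `tools/list`, and `tools/call` methods.

```python
import json

# Placeholder; replace with your deployed Web Apps URL.
WEB_APPS_URL = "https://script.google.com/macros/s/###/exec?accessKey=sample"


def jsonrpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body for the MCP server."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        body["params"] = params
    return body


# The three requests the client issues, in order:
init_req = jsonrpc_request("initialize", {"protocolVersion": "2024-11-05"}, 1)
list_req = jsonrpc_request("tools/list", None, 2)
call_req = jsonrpc_request(
    "tools/call",
    {"name": "get_image", "arguments": {"filename": "CherryBlossom"}},
    3,
)

print(json.dumps(call_req, indent=2))
```

Each of these bodies would be sent as the POST payload to `WEB_APPS_URL`; the `mcp-remote` bridge used later in this report handles this transport for you.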
The script in this report is as follows.
/**
 * This function retrieves an image by searching Google Drive.
 *
 * This function is run by "tools/call".
 * "tools/call": The function name is required to be the same as the name declared at "tools/list".
 */
function get_image(args) {
  const { filename } = args;
  let result;
  try {
    const files = DriveApp.searchFiles(
      `title contains '${filename}' and mimeType contains 'image' and trashed=false`
    );
    if (files.hasNext()) {
      const file = files.next();
      result = {
        content: [
          {
            type: "text",
            text: `Actual filename on Google Drive is ${file.getName()}.`,
          },
          {
            type: "image",
            data: Utilities.base64Encode(file.getBlob().getBytes()),
            mimeType: file.getMimeType(),
          },
        ],
        isError: false,
      };
    } else {
      result = {
        content: [{ type: "text", text: `There is no file of "${filename}".` }],
        isError: true,
      };
    }
  } catch (err) {
    result = { content: [{ type: "text", text: err.message }], isError: true };
  }
  return { jsonrpc: "2.0", result };
}
/**
 * Please set and modify the following JSON to your situation.
 * The key is the method from the MCP client.
 * The value is the object for returning to the MCP client.
 * ID is automatically set in the script.
 * The specification of this can be seen in the official document.
 * Ref: https://modelcontextprotocol.io/specification/2025-03-26
 */
function getserverResponse_() {
  return {
    /**
     * Response to "initialize"
     */
    initialize: {
      jsonrpc: "2.0",
      result: {
        protocolVersion: "2024-11-05", // or "2025-03-26"
        capabilities: {
          experimental: {},
          prompts: {
            listChanged: false,
          },
          resources: {
            subscribe: false,
            listChanged: false,
          },
          tools: {
            listChanged: false,
          },
        },
        serverInfo: {
          name: "sample server from MCPApp",
          version: "1.0.0",
        },
      },
    },

    /**
     * Response to "tools/list"
     */
    "tools/list": {
      jsonrpc: "2.0",
      result: {
        tools: [
          {
            name: "get_image", // <--- It is required to create a function of the same name as this.
            description: "Get image from Google Drive.",
            inputSchema: {
              type: "object",
              properties: {
                filename: {
                  description: "Get image of this filename from Google Drive.",
                  type: "string",
                },
              },
              required: ["filename"],
            },
          },
        ],
      },
    },
  };
}
/**
 * "tools/call": The function name is required to be the same as the name declared at "tools/list".
 * "resources/read": The function name is required to be the same as the uri declared at "resources/list".
 */
function getFunctions_() {
  return { "tools/call": { get_image } };
}
/**
 * This function is automatically run when the MCP client accesses Web Apps.
 */
function doPost(eventObject) {
  const object = {
    eventObject,
    serverResponse: getserverResponse_(),
    functions: getFunctions_(),
  };
  return new MCPApp.mcpApp({ accessKey: "sample" }).server(object);
}
In this sample, the following data is returned from the MCP server to the MCP client.
{
  content: [
    {
      type: "text",
      text: `Actual filename on Google Drive is ${file.getName()}.`,
    },
    {
      type: "image",
      data: Utilities.base64Encode(file.getBlob().getBytes()),
      mimeType: file.getMimeType(),
    },
  ],
  isError: false,
}
Both the text data and the image data are transferred; the image data is sent as base64-encoded data. Also, in this script, the access key sample is used. After deploying Web Apps, please copy your Web Apps URL. This URL is used by the MCP client.
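To make the shape of this response concrete, here is a minimal sketch of how a client could unpack the `content` array, using a tiny hand-made payload (a fake 3-byte image, not a real PNG) in place of an actual Drive file:

```python
import base64

# A stand-in for the "result" object returned by the get_image tool;
# the image bytes here are a fake 3-byte payload, not a real image file.
result = {
    "content": [
        {
            "type": "text",
            "text": "Actual filename on Google Drive is CherryBlossom.png.",
        },
        {
            "type": "image",
            "data": base64.b64encode(b"\x89PN").decode("ascii"),
            "mimeType": "image/png",
        },
    ],
    "isError": False,
}

texts, images = [], []
for part in result["content"]:
    if part["type"] == "text":
        texts.append(part["text"])
    elif part["type"] == "image":
        # Reverse the server-side Utilities.base64Encode()
        images.append((part["mimeType"], base64.b64decode(part["data"])))

print(texts[0])
print(images[0][0])  # image/png
```

The `mimeType` travels with the bytes, so the client can decide how to handle each image part without guessing the format.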
In this report, Gemini is used with Python. The original script is from the official page. Ref I modified this script to handle the transferred image. Before you run this script, please set your API key, and replace https://script.google.com/macros/s/###/exec?accessKey=sample with your Web Apps URL.
import asyncio
import base64
import io
import os

from google import genai
from google.genai import types
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from PIL import Image

client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

# Create server parameters for stdio connection
server_params = StdioServerParameters(
    command="npx",  # Executable
    args=[
        "mcp-remote",
        "https://script.google.com/macros/s/###/exec?accessKey=sample",
    ],  # MCP server built by Web Apps with Google Apps Script
    env=None,  # Optional environment variables
)


async def run():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Prompt asking Gemini to retrieve an image from Google Drive.
            prompt = 'Get an image of "CherryBlossom" from Google Drive.'

            # Initialize the connection between client and server
            await session.initialize()

            # Get tools from MCP session and convert to Gemini Tool objects
            mcp_tools = await session.list_tools()
            tools = [
                types.Tool(
                    function_declarations=[
                        {
                            "name": tool.name,
                            "description": tool.description,
                            "parameters": {
                                k: v
                                for k, v in tool.inputSchema.items()
                                if k not in ["additionalProperties", "$schema"]
                            },
                        }
                    ]
                )
                for tool in mcp_tools.tools
            ]

            # Send request to the model with MCP function declarations
            response = client.models.generate_content(
                model="gemini-2.0-flash",
                contents=prompt,
                config=types.GenerateContentConfig(
                    temperature=0,
                    tools=tools,
                ),
            )

            # Check for a function call
            if response.candidates[0].content.parts[0].function_call:
                function_call = response.candidates[0].content.parts[0].function_call
                print(function_call)

                # Call the MCP server with the predicted tool
                result = await session.call_tool(
                    function_call.name, arguments=function_call.args
                )
                for e in result.content:
                    if e.type == "text":
                        print(e.text)
                    if e.type == "image":
                        image_data = base64.b64decode(e.data)
                        image = Image.open(io.BytesIO(image_data))
                        image.show()
            else:
                print("No function call found in the response.")
                print(response.text)


# Start the asyncio event loop and run the main function
asyncio.run(run())
When this MCP client is run, the following result is obtained. You can see that both the text data and the image data are retrieved. The sample image had been generated with Gemini in advance for testing.
- This report details transferring image data via MCP from a Google Apps Script server to a Python/Gemini client.
- It extends the Model Context Protocol to handle image data for multimodal applications.
- The server uses Google Apps Script to retrieve images from Google Drive and encode them in base64.
- The Python client with Gemini interacts with the server and processes the received base64 image data.
- This setup enables processing image data alongside text within the MCP framework using Gemini.