This report details transferring image data via the Model Context Protocol (MCP) from a Google Apps Script server to a Python/Gemini client, extending the protocol's capabilities for multimodal applications beyond text.
Following up on my previous report, "Building Model Context Protocol (MCP) Server with Google Apps Script" (Ref), which detailed the transfer of text data between the MCP server and client, this new report focuses on extending the protocol to handle image data. It introduces a practical method for transferring image data efficiently from the Google Apps Script-based MCP server to an MCP client. In this implementation, the MCP client was built using Python and integrated with the Gemini model, allowing for the processing and utilization of the transferred image data alongside text, thereby enabling more complex, multimodal applications within the MCP framework.
The flow of this approach is simple: the MCP client communicates with the MCP server over HTTP requests, as follows.
For the steps to deploy the MCP server, please read my previous report. Ref
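Before looking at the server script, the client-to-server exchange above can be sketched as plain JSON-RPC 2.0 requests POSTed to the Web Apps URL. This is only an illustrative sketch: the URL is a placeholder, and the payload shapes follow the MCP specification's `initialize`, `tools/list`, and `tools/call` methods.

```python
import json

# Placeholder; replace with your deployed Web Apps URL.
WEB_APPS_URL = "https://script.google.com/macros/s/###/exec?accessKey=sample"


def jsonrpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body for the MCP server."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        body["params"] = params
    return body


# The three requests the client issues, in order:
init_req = jsonrpc_request("initialize", {"protocolVersion": "2024-11-05"}, 1)
list_req = jsonrpc_request("tools/list", None, 2)
call_req = jsonrpc_request(
    "tools/call",
    {"name": "get_image", "arguments": {"filename": "CherryBlossom"}},
    3,
)

print(json.dumps(call_req, indent=2))
```

Each of these bodies would be sent as the POST payload to `WEB_APPS_URL`; the `mcp-remote` bridge used later in this report handles this transport for you.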
The script in this report is as follows.
/**
 * This function retrieves an image by searching Google Drive.
 *
 * This function is run by "tools/call".
 * "tools/call": The function name is required to be the same as the name declared at "tools/list".
 */
function get_image(args) {
  const { filename } = args;
  let result;
  try {
    const files = DriveApp.searchFiles(
      `title contains '${filename}' and mimeType contains 'image' and trashed=false`
    );
    if (files.hasNext()) {
      const file = files.next();
      result = {
        content: [
          {
            type: "text",
            text: `Actual filename on Google Drive is ${file.getName()}.`,
          },
          {
            type: "image",
            data: Utilities.base64Encode(file.getBlob().getBytes()),
            mimeType: file.getMimeType(),
          },
        ],
        isError: false,
      };
    } else {
      result = {
        content: [{ type: "text", text: `There is no file of "${filename}".` }],
        isError: true,
      };
    }
  } catch (err) {
    result = { content: [{ type: "text", text: err.message }], isError: true };
  }
  return { jsonrpc: "2.0", result };
}
/**
 * Please set and modify the following JSON to your situation.
 * The key is the method from the MCP client.
 * The value is the object for returning to the MCP client.
 * ID is automatically set in the script.
 * The specification of this can be seen in the official document.
 * Ref: https://modelcontextprotocol.io/specification/2025-03-26
 */
function getserverResponse_() {
  return {
    /**
     * Response to "initialize"
     */
    initialize: {
      jsonrpc: "2.0",
      result: {
        protocolVersion: "2024-11-05", // or "2025-03-26"
        capabilities: {
          experimental: {},
          prompts: {
            listChanged: false,
          },
          resources: {
            subscribe: false,
            listChanged: false,
          },
          tools: {
            listChanged: false,
          },
        },
        serverInfo: {
          name: "sample server from MCPApp",
          version: "1.0.0",
        },
      },
    },

    /**
     * Response to "tools/list"
     */
    "tools/list": {
      jsonrpc: "2.0",
      result: {
        tools: [
          {
            name: "get_image", // <--- It is required to create a function of the same name as this.
            description: "Get image from Google Drive.",
            inputSchema: {
              type: "object",
              properties: {
                filename: {
                  description: "Get image of this filename from Google Drive.",
                  type: "string",
                },
              },
              required: ["filename"],
            },
          },
        ],
      },
    },
  };
}
/**
 * "tools/call": The function name is required to be the same as the name declared at "tools/list".
 * "resources/read": The function name is required to be the same as the uri declared at "resources/list".
 */
function getFunctions_() {
  return { "tools/call": { get_image } };
}
/**
 * This function is automatically run when the MCP client accesses Web Apps.
 */
function doPost(eventObject) {
  const object = {
    eventObject,
    serverResponse: getserverResponse_(),
    functions: getFunctions_(),
  };
  return new MCPApp.mcpApp({ accessKey: "sample" }).server(object);
}
In this sample, the following data is returned from the MCP server to the MCP client.
{
  content: [
    {
      type: "text",
      text: `Actual filename on Google Drive is ${file.getName()}.`,
    },
    {
      type: "image",
      data: Utilities.base64Encode(file.getBlob().getBytes()),
      mimeType: file.getMimeType(),
    },
  ],
  isError: false,
}
Both the text data and the image data are transferred; the image data is sent as base64-encoded data. Also, in this script, the access key sample is used. After deploying Web Apps, please copy your Web Apps URL. This URL is used by the MCP client.
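To make the shape of this response concrete, here is a minimal sketch of how a client could unpack the `content` array, using a tiny hand-made payload (a fake 3-byte image, not a real PNG) in place of an actual Drive file:

```python
import base64

# A stand-in for the "result" object returned by the get_image tool;
# the image bytes here are a fake 3-byte payload, not a real image file.
result = {
    "content": [
        {
            "type": "text",
            "text": "Actual filename on Google Drive is CherryBlossom.png.",
        },
        {
            "type": "image",
            "data": base64.b64encode(b"\x89PN").decode("ascii"),
            "mimeType": "image/png",
        },
    ],
    "isError": False,
}

texts, images = [], []
for part in result["content"]:
    if part["type"] == "text":
        texts.append(part["text"])
    elif part["type"] == "image":
        # Reverse the server-side Utilities.base64Encode()
        images.append((part["mimeType"], base64.b64decode(part["data"])))

print(texts[0])
print(images[0][0])  # image/png
```

The `mimeType` travels with the bytes, so the client can decide how to handle each image part without guessing the format.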
In this report, Gemini is used with Python. The original script is from the official page. Ref I modified this script to handle the transferred image. Before you run this script, please set your API key, and replace https://script.google.com/macros/s/###/exec?accessKey=sample with your Web Apps URL.
import asyncio
import base64
import io
import os

from google import genai
from google.genai import types
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from PIL import Image

client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

# Create server parameters for stdio connection
server_params = StdioServerParameters(
    command="npx",  # Executable
    args=[
        "mcp-remote",
        "https://script.google.com/macros/s/###/exec?accessKey=sample",
    ],  # MCP server built by Web Apps with Google Apps Script
    env=None,  # Optional environment variables
)


async def run():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Prompt asking Gemini to retrieve an image from Google Drive.
            prompt = 'Get an image of "CherryBlossom" from Google Drive.'

            # Initialize the connection between client and server
            await session.initialize()

            # Get tools from MCP session and convert to Gemini Tool objects
            mcp_tools = await session.list_tools()
            tools = [
                types.Tool(
                    function_declarations=[
                        {
                            "name": tool.name,
                            "description": tool.description,
                            "parameters": {
                                k: v
                                for k, v in tool.inputSchema.items()
                                if k not in ["additionalProperties", "$schema"]
                            },
                        }
                    ]
                )
                for tool in mcp_tools.tools
            ]

            # Send request to the model with MCP function declarations
            response = client.models.generate_content(
                model="gemini-2.0-flash",
                contents=prompt,
                config=types.GenerateContentConfig(
                    temperature=0,
                    tools=tools,
                ),
            )

            # Check for a function call
            if response.candidates[0].content.parts[0].function_call:
                function_call = response.candidates[0].content.parts[0].function_call
                print(function_call)

                # Call the MCP server with the predicted tool
                result = await session.call_tool(
                    function_call.name, arguments=function_call.args
                )
                for e in result.content:
                    if e.type == "text":
                        print(e.text)
                    if e.type == "image":
                        image_data = base64.b64decode(e.data)
                        image = Image.open(io.BytesIO(image_data))
                        image.show()
            else:
                print("No function call found in the response.")
                print(response.text)


# Start the asyncio event loop and run the main function
asyncio.run(run())
When this MCP client is run, the following result is obtained. You can see that both the text data and the image data are retrieved. The sample image had been generated with Gemini in advance for testing.
- This report details transferring image data via MCP from a Google Apps Script server to a Python/Gemini client.
- It extends the Model Context Protocol to handle image data for multimodal applications.
- The server uses Google Apps Script to retrieve images from Google Drive and encode them in base64.
- The Python client with Gemini interacts with the server and processes the received base64 image data.
- This setup enables processing image data alongside text within the MCP framework using Gemini.