MCP Introduction

With the rise in popularity of AI, people are becoming familiar with using AI assistants. These assistants follow a specific pattern: users provide input to a large language model, which then generates content. Both input and output can be multimodal, including text, images, audio, and video. Of course, each model supports different input and output modalities, with text being the most fundamental type.

In many cases, when we use large language models to generate content, we rely on the model's own knowledge. After training, the model already contains a wealth of knowledge: generally, the more training material used and the larger the resulting model, the more knowledge it contains.

For example, if we ask a large language model, "What is the distance from the Earth to the Moon?", the model will likely give the correct answer because this knowledge is already embedded within the model.

In other cases, however, the model lacks the knowledge necessary to generate content because that knowledge was not part of its training material. This situation is unavoidable: there are at least two types of knowledge that cannot be incorporated into a model.

  • The first type is knowledge created after the model was trained. Each model has a training cutoff date, meaning it only includes data from before that date. For example, among OpenAI's GPT models, GPT-4.1 has a cutoff date of June 1, 2024, and GPT-4o has a cutoff date of October 1, 2023.

  • The second type is private data held by businesses and individuals. Commonly used models are trained on public data and do not include such private data.

Without these two types of knowledge, large language models cannot produce accurate results for certain inputs.

Let's take a practical example: you are the employee responsible for maintaining your company's official social media account. Your boss asks you to write an article highlighting the company's recent sales performance, and you plan to use an AI assistant for help. Large language models are good at generating and rewriting content, but here they lack a crucial piece of data: the company's sales figures. No model currently on the market includes this data. How can this problem be solved?

Data or knowledge can be provided in two ways. One approach is to embed the knowledge into the model itself, essentially by fine-tuning it. The other is to let the model use knowledge supplied as external context.

With fine-tuning, additional knowledge is fed to the model as training material. This is not a simple task: although technological advances have greatly reduced the cost and difficulty of fine-tuning, it remains time-consuming and expensive. Furthermore, some data is highly time-sensitive and thus unsuitable for fine-tuning.

Compared to fine-tuning, supplying contextual knowledge is relatively quick and simple. There are two ways to do it.

  • The first is to incorporate the knowledge into the input sent to the large language model, i.e., into the prompt. When that knowledge is retrieved from an external source, this is commonly called retrieval-augmented generation (RAG).
  • The second is to provide tools to the large language model. Tools can supply data or perform actions. The application declares each tool's name, description, and input parameter schema, and supplies its implementation. When the model decides a tool is needed, it infers the tool's name and concrete argument values; the application then calls the tool and returns the result to the model, which continues generating based on that result (see the sketch after this list).
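
The following is a minimal sketch of this tool-calling loop in TypeScript. The descriptor format, the get_sales tool, and the getSales helper are all illustrative assumptions, not tied to any particular model API:

```ts
// What the application advertises to the model: a name, a description,
// and a JSON Schema describing the input parameters.
const salesTool = {
  name: "get_sales",
  description: "Return total sales for a given quarter, e.g. 2024-Q3",
  inputSchema: {
    type: "object",
    properties: { quarter: { type: "string" } },
    required: ["quarter"],
  },
};

// The concrete implementation lives in the application, not in the model.
// A real version would query an internal database; this one returns a stub.
async function getSales(quarter: string): Promise<string> {
  return JSON.stringify({ quarter, totalSales: 1_234_567 });
}

// The model replies with the tool name and argument values it inferred;
// the application executes the matching tool and feeds the result back
// to the model for the next round of generation.
async function handleToolCall(call: {
  name: string;
  arguments: { quarter: string };
}): Promise<string> {
  if (call.name === salesTool.name) {
    return getSales(call.arguments.quarter);
  }
  throw new Error(`Unknown tool: ${call.name}`);
}
```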

Let's return to the article-writing example introduced earlier.

  • The first approach is to send the sales data directly to the model as part of the prompt. The user obtains the data externally and adds it to the prompt by hand (as sketched after this list).
  • The second approach is to develop a tool that retrieves the sales data and make this tool available to the model. During generation, the model uses the tool to obtain the data. The user no longer needs to supply the knowledge manually, but the tool requires additional development effort.
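
Here is a sketch of the first approach; the sales figures below are a made-up placeholder standing in for data the user pasted in:

```ts
// The user fetches (or copies) the sales data themselves and embeds it
// directly in the prompt text; no tool is involved.
const salesData = "2024-Q3 revenue: $1.2M, up 18% quarter over quarter";

const prompt = [
  "Write a short social media article highlighting our company's",
  "recent sales performance. Use only the figures below:",
  salesData,
].join("\n");

// `prompt` is then sent to the model as ordinary input.
console.log(prompt);
```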

Having introduced the relevant background, let's return to the topic of MCP. MCP stands for Model Context Protocol.

  • Model refers to the AI model.
  • Context refers to contextual information, which is the external knowledge mentioned earlier.
  • Protocol indicates that MCP is a protocol.

Therefore, MCP is a protocol for providing external knowledge to AI models.

The key letter in MCP is the P for protocol, indicating that MCP is a standard, interoperable protocol. Ways of providing external knowledge to AI models existed before MCP; RAG and tool invocation are not new. The problem was that there was no standard way to provide and consume external knowledge. For example, the provision and use of tools were tightly coupled to the programming language and framework used by the application: an application written in Java could not simply use tools written in JavaScript.

The emergence of MCP solves this problem. The solution is simple and not novel: a standard protocol that splits the provision and use of contextual knowledge into two roles, server and client. The MCP server is responsible for providing contextual knowledge in the form of prompt templates, resources, and tools. The MCP client interacts with the server to consume that knowledge: fetching prompts, reading resource contents, and invoking tools. Server and client communicate over a JSON-RPC based protocol. This decouples the providers of contextual knowledge from its consumers, and the most direct benefit of that decoupling is easier sharing and reuse.
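
To make this concrete, here is a sketch of the JSON-RPC messages exchanged when a client discovers and invokes a tool, following the tools/list and tools/call method names from the MCP specification; the get_sales tool and its field values are illustrative:

```ts
// Client -> server: ask which tools the server provides.
const listToolsRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/list",
};

// Server -> client: the advertised tools with their input schemas.
const listToolsResponse = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    tools: [
      {
        name: "get_sales",
        description: "Return total sales for a given quarter",
        inputSchema: {
          type: "object",
          properties: { quarter: { type: "string" } },
          required: ["quarter"],
        },
      },
    ],
  },
};

// Client -> server: invoke the tool with arguments the model inferred.
const callToolRequest = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: { name: "get_sales", arguments: { quarter: "2024-Q3" } },
};
```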

For an AI application, integrating a single MCP client is enough to use contextual knowledge from any number of MCP servers, with no restriction on how those servers are implemented. For example, an AI application developed in Java, once it integrates a Java MCP client, can use contextual knowledge from an MCP server written in JavaScript or Python.
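
As an illustration, here is a minimal sketch assuming the official TypeScript MCP SDK (@modelcontextprotocol/sdk); the sales_server.py command is a hypothetical Python MCP server launched over stdio:

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the server as a subprocess and talk to it over stdin/stdout.
// The client neither knows nor cares that the server is written in Python.
const transport = new StdioClientTransport({
  command: "python",
  args: ["sales_server.py"], // hypothetical server script
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Discover the server's tools, then invoke one of them.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

const result = await client.callTool({
  name: "get_sales",
  arguments: { quarter: "2024-Q3" },
});
console.log(result);
```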

The greatest value of MCP lies in making contextual knowledge easy to share and reuse, and a large number of reusable MCP servers are already available. Given this importance, understanding MCP has become indispensable for modern AI application development.