Text-to-SQL Course March Update

· 2 min read

In this post, I'll talk about changes made to the implementation of the Text-to-SQL course.

Table Metadata Store

In the current implementation, table metadata is stored as the content of a document in the vector store. This metadata serves two purposes: it is included in the prompt sent to the LLM for SQL statement generation, and it is used for similarity search to find tables related to the user's query. These two usage scenarios are quite different. In the new implementation, table metadata is stored in a Postgres database table. After this separation, the content of a document is free to store other data. I added a new lecture about this change.
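As a rough sketch of this separation (the table layout, column names, and helper below are my own illustration, not the course's actual schema), the metadata could live in a dedicated Postgres table, with a small helper that renders one row into prompt text:

```python
# Hypothetical Postgres table for storing table metadata, replacing the
# metadata that previously lived as vector-store document content.
CREATE_TABLE_METADATA = """
CREATE TABLE IF NOT EXISTS table_metadata (
    table_name  text PRIMARY KEY,
    description text NOT NULL,
    columns     jsonb NOT NULL  -- e.g. [{"name": "id", "type": "integer", "comment": "..."}]
);
"""

def render_metadata(table_name: str, description: str, columns: list[dict]) -> str:
    """Format one metadata row as text for the SQL-generation prompt."""
    lines = [f"Table {table_name}: {description}"]
    for col in columns:
        lines.append(f"  - {col['name']} ({col['type']}): {col.get('comment', '')}")
    return "\n".join(lines)
```

With the metadata in its own table, the vector-store document content can hold whatever text works best for similarity search, independent of what the prompt needs.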

In a previous lecture, I talked about the context size of the prompt sent to an LLM. One concern was the cost of calling AI services. Models like DeepSeek are now very cheap to use, and AI services also use prompt caching to further reduce the cost, so cost is no longer a big issue. For small to medium-sized databases, including the metadata of all tables in the prompt is a simple but effective approach. We don't need to use a vector store to search for related tables.
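A minimal sketch of this simpler approach (function and variable names here are my own, not from the course): concatenate every table's metadata into the prompt instead of retrieving a subset via similarity search:

```python
def build_sql_prompt(all_table_metadata: list[str], question: str) -> str:
    """Build a SQL-generation prompt that includes metadata of ALL tables.

    For a small to medium-sized database this fits comfortably in the
    model's context window, so no vector-store retrieval step is needed.
    """
    schema_section = "\n\n".join(all_table_metadata)
    return (
        "You are a SQL assistant. Given the database schema below, "
        "write a single SQL statement that answers the user's question.\n\n"
        f"Schema:\n{schema_section}\n\n"
        f"Question: {question}\nSQL:"
    )

prompt = build_sql_prompt(
    ["Table film: films available for rent", "Table actor: film actors"],
    "How many films are there?",
)
```

Because the schema section is identical for every request, prompt caching on the provider side also works well with this layout.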

Sample Database

The sample database is also updated. Pagila, a PostgreSQL port of the MySQL Sakila database, is now used as the example. This database has multiple related tables, which makes it a better example to demonstrate Text-to-SQL capabilities.
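To illustrate the kind of multi-table query Pagila exercises, here is a tiny in-memory SQLite stand-in for three of its tables (`actor`, `film`, `film_actor`) running the sort of join an LLM would need to generate; the rows are made-up demo data:

```python
import sqlite3

# In-memory stand-in for a small slice of the Pagila schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
INSERT INTO actor VALUES (1, 'PENELOPE', 'GUINESS');
INSERT INTO film VALUES (1, 'ACADEMY DINOSAUR'), (2, 'ACE GOLDFINGER');
INSERT INTO film_actor VALUES (1, 1), (1, 2);
""")

# A question like "Which films has Penelope Guiness appeared in?" requires
# joining three tables -- exactly what a multi-table sample database tests.
rows = conn.execute("""
    SELECT f.title
    FROM actor a
    JOIN film_actor fa ON fa.actor_id = a.actor_id
    JOIN film f ON f.film_id = fa.film_id
    WHERE a.first_name = 'PENELOPE' AND a.last_name = 'GUINESS'
    ORDER BY f.title;
""").fetchall()
titles = [t for (t,) in rows]  # -> ['ACADEMY DINOSAUR', 'ACE GOLDFINGER']
```

A single-table sample database never forces the model to reason about joins like this, which is why the richer schema is a better demonstration.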

Target Driven Generation

For some scenarios, we have a clear requirement for the format of the query result. For example, when developing the backend service for a REST API, we need to query a database to get the data, and the format of the query result must match the contract defined by the API. So we can instruct the LLM to return a SQL statement that produces results matching that format.
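One way to express this idea (the function and the JSON contract shape below are my own sketch, not the course's code) is to embed the expected result schema directly in the instruction:

```python
import json

def build_target_driven_prompt(table_metadata: str, question: str,
                               result_schema: list[dict]) -> str:
    """Ask the LLM for a SQL statement whose result columns match a
    predefined contract, e.g. the response schema of a REST API."""
    schema_json = json.dumps(result_schema, indent=2)
    return (
        "Write a SQL statement that answers the question below.\n"
        "The result set MUST have exactly these columns, in this order:\n"
        f"{schema_json}\n\n"
        f"Schema:\n{table_metadata}\n\n"
        f"Question: {question}\nSQL:"
    )

# Hypothetical API contract: each row exposes film_title and rental_count.
contract = [
    {"name": "film_title", "type": "string"},
    {"name": "rental_count", "type": "integer"},
]
prompt = build_target_driven_prompt(
    "Table film: films available for rent",
    "Top 5 most rented films",
    contract,
)
```

Pinning column names and order in the instruction makes the generated SQL directly consumable by code that serializes rows into the API response.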

I added a new lecture about generating SQL statements that match a predefined schema.

That's all for the updates. The source code has been updated. You can get a copy of the source code from the fifth lecture.