Building a Text-to-SQL App: A Weekend MVP Story
Or: How I learned that schema retrieval is way harder than SQL generation
The Idea
Last weekend, I decided to scratch an itch that’s been bugging me for a while: building an AI app from scratch.
For the past two years, I’ve used no-code AI tools like Lovable. They’re fast, but they’re also black boxes to me. As a mechanical engineer turned analyst, not being able to open the hood and tinker with the internals felt wrong. So I’ve been wanting to learn how to code properly.
I understand the theory—how apps work, how AI works. This weekend was about learning implementation.
I picked a project I could actually evaluate: an app that converts plain English into SQL queries. With several years of fintech SQL experience, I’d know immediately if it was generating garbage.
I'd actually written about this idea before in “Talk to your Data. Watch it Talk Back”, exploring the concept theoretically. This weekend was about putting theory into practice.
Goal: Type “What was the total spend in April?” → Get answer immediately.
The AI worked. But it also hallucinated—a lot. Made up table names, invented columns, generated queries for databases that didn’t exist. The interesting part wasn’t that it happened, but figuring out why and how to prevent it.
The Stack (Speed Over Perfection)
I went with tools that work well with Python:
Streamlit for the UI
Python for backend logic
SQLite for the database (no setup, perfect for testing)
Google Gemini as the LLM (free tier + surprisingly good at structured output)
This wasn’t about building the perfect tech stack. It was about shipping something that works.
How It Works (The Simple Version)
The flow is straightforward:
User types a question in natural language
App sends the question + database schema to Gemini
Gemini generates SQL
Execute the query on SQLite
Display results back to the user
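The five steps above can be sketched in a few lines of Python. The Gemini call is stubbed out here so the example runs offline; in the real app it would be a call to the Gemini API, and the function names are illustrative, not the app's actual ones.

```python
import sqlite3

def generate_sql(question: str, schema: str) -> str:
    # Placeholder for steps 2-3: send question + schema to Gemini, get SQL back
    return "SELECT SUM(amt) FROM CC_trans"

def answer(question: str, conn: sqlite3.Connection, schema: str):
    sql = generate_sql(question, schema)   # steps 2-3: LLM generates SQL
    rows = conn.execute(sql).fetchall()    # step 4: execute on SQLite
    return rows                            # step 5: display to the user

# Tiny in-memory database to exercise the flow end to end
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CC_trans (amt REAL)")
conn.executemany("INSERT INTO CC_trans VALUES (?)", [(4.5,), (45.0,)])
print(answer("What was the total spend?", conn, "Table: CC_trans ..."))
# -> [(49.5,)]
```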
Getting to Working Prototype
I'm still learning to code, so it took me way longer than expected. I repeatedly had to look up how to create virtual environments, debug import errors, and figure out basic Python file structure.
But I broke the problem down into pieces: database first, then schema, then UI. Here's how each piece came together.
SQLite: Starting Smaller Than Small
SQLite was completely new to me. I’d heard of it, knew it was “lightweight,” but had never actually used it.
My first version was almost embarrassingly simple: I hard coded 5 transactions directly into the database.
Transaction 1: Starbucks, $4.50, Coffee
Transaction 2: Shell, $45.00, Gas
Transaction 3: Amazon, $67.23, Shopping
Transaction 4: Whole Foods, $34.12, Groceries
Transaction 5: Netflix, $15.99, Entertainment
That’s it. Five rows. But it was enough to test the core logic: could Gemini generate SQL that actually worked on my database?
It worked.
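Seeding those five rows takes only a few lines of sqlite3. This is a sketch, not my original script: it uses ":memory:" so the example is self-contained, whereas the real version wrote to a file, and the column names mirror the ones used later.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; the real app used a .db file
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CC_trans (
        merchant TEXT,
        amt REAL,
        category TEXT
    )
""")
rows = [
    ("Starbucks", 4.50, "Coffee"),
    ("Shell", 45.00, "Gas"),
    ("Amazon", 67.23, "Shopping"),
    ("Whole Foods", 34.12, "Groceries"),
    ("Netflix", 15.99, "Entertainment"),
]
conn.executemany("INSERT INTO CC_trans VALUES (?, ?, ?)", rows)
conn.commit()
```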
Once I had proof of concept with my toy dataset, I learned how to load a CSV into SQLite and then upgraded to the Kaggle credit card transactions dataset with 1.2 million rows. This was about testing the real-world conditions I'm used to. Would the app still work when:
Column names got messy (real data always has inconsistencies)
Query responses returned thousands of rows instead of five
The database actually took time to execute queries
Turns out, it mostly worked. The SQL generation didn’t care about dataset size. The problems showed up elsewhere (more on that in the Schema Retrieval section).
I created a separate sql.py file that handles database creation and loading the CSV data. Clean separation—one file for database setup, another for everything else.
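A rough sketch of what that setup file does, using pandas to load the CSV into SQLite in one shot. The file and table names here are assumptions, and `index=True` is how a unique row identifier column can end up in the table.

```python
import sqlite3
import pandas as pd

def build_database(csv_path: str = "credit_card_transactions.csv",
                   db_path: str = "transactions.db") -> None:
    """One-shot setup: read the Kaggle CSV, write it into SQLite."""
    df = pd.read_csv(csv_path)
    with sqlite3.connect(db_path) as conn:
        # Replace the table on every run so the script stays re-runnable;
        # index=True writes the row index as an "index" column
        df.to_sql("CC_trans", conn, if_exists="replace", index=True)

if __name__ == "__main__":
    build_database()
```

Run once, then forget about it, as described above.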
The Schema Retrieval Problem (Where I Lost My Weekend)
Here's what nobody tells you about building text-to-SQL systems: when LLMs ask for "context," they don't mean "give me a general idea and I'll figure it out."
They mean: "Give me all the information, structured in an easy-to-read format, and maybe—maybe—I can help you." The difference between "general idea" and "precise schema" is the difference between hallucination and working SQL.
Attempt #1: Just Send the Question
My first approach was naive. I sent user questions directly to Gemini with a basic prompt and hoped for the best.
Result: Gemini confidently generated SQL for tables that didn’t exist. It would use purchases when my table was called CC_trans. It made up column names. Pure hallucination.
Attempt #2: Dump Everything
Fine, I thought. I’ll just send the entire metadata file from Kaggle to Gemini every time.
Result: Too much context and not in the right format. The LLM got confused by irrelevant tables and generated overly complex queries. Plus, for larger databases, this would blow through token limits fast.
Attempt #3: Surgical Precision
The solution that actually worked: label all relevant tables and their column types with a description.
Figuring out the right format took about 80% of my development time.
I hard coded everything directly in the prompt; fastest way to get the MVP working. I looked into automating metadata creation, but quickly realized that was a rabbit hole and not critical for validating the core idea.
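Hard-coding the schema in the prompt looks roughly like this. The schema string is abbreviated here and the variable names are illustrative; the full column-description format is shown next.

```python
# Abbreviated schema string; the real prompt spells out every column
SCHEMA = """Table: CC_trans
-amt: The transaction amount in dollars (Numeric).
-category: Category of the transaction (e.g., entertainment, food, gas).
"""

def build_prompt(question: str) -> str:
    """Assemble instructions + schema + question into one prompt string."""
    return (
        "You are a SQL generator. Always use SQLite syntax, never PostgreSQL.\n"
        "Return only the SQL query, with no markdown and no explanation.\n\n"
        f"Schema:\n{SCHEMA}\n"
        f"Question: {question}\nSQL:"
    )

print(build_prompt("Total spend for April?"))
```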
Sample of the format I used in the prompt for column descriptions:
Table: CC_trans
-index: Unique row identifier.
-trans_date_trans_time: The timestamp of the transaction (Date and Time).
-cc_num: The credit card number used.
-merchant: The name of the merchant or vendor.
-category: Category of the transaction (e.g., entertainment, food, gas).
-amt: The transaction amount in dollars (Numeric).
-first: The first name of the cardholder.
-last: The last name of the cardholder.
-gender: Gender of the cardholder (M/F).
-street: street name for cardholder.
-city: city cardholder resides in.
-state: state city belongs to.
-zip: zipcode of cardholder address.
Streamlit: From Zero to UI in an Afternoon
At work, my team had used Streamlit in an earlier project, so I knew in theory why it was a good choice for an MVP. This was my first time using it to build a UI myself.
The Streamlit documentation is genuinely good. Combined with a few YouTube tutorials on basic layout and input handling, I had a working UI in a few hours. Not beautiful, but functional:
Text input for user questions
Button to submit
Display area for results
Error messages when things broke
The beauty of Streamlit is that it handles all the web stuff I didn’t want to think about. No HTML, no CSS, no JavaScript. Just Python functions that magically become a web interface.
The Code Structure (Kept Simple on Purpose)
My entire application is two Python files:
sql.py - Database setup
Creates the SQLite database
Loads CSV data from Kaggle
That’s it. Run once, then forget about it.
app.py - Everything else
Streamlit UI elements (text input, buttons, display)
API call to Google Gemini with the query, prompt and schema
Clean the SQL response (Gemini sometimes wraps it in markdown code blocks)
Execute SQL against the database
Clean the results (format tables, handle errors)
Send formatted response back to user
The flow in app.py is linear: Input → LLM → Clean → Execute → Clean → Display
No fancy architecture. No classes. No design patterns. Just functions that call other functions.
Could I have structured this better? Absolutely. Should I have added proper error handling, logging, and validation? Probably. But this was about learning by building, not building the perfect system. I learned the most while debugging.
The debugging was mostly:
“Why is Gemini returning markdown formatting when I need raw SQL?”
“Why does SQLite hate my date format?”
“How do I display a pandas dataframe in Streamlit without it looking terrible?”
It’s Alive!!
Example 1:
Here's what the user sees:
Types: "Total spend for April?"
Behind the scenes: App grabs schema (hard coded in prompt for MVP) → sends to Gemini along with natural language ask → gets SQL back
Generated SQL:
SELECT SUM(amt) FROM CC_trans WHERE strftime('%m', trans_date_trans_time) = '04'
app.py then executes the SQL generated by the LLM on the SQLite database.
Result displays in <2 seconds:
Value: 9,452,834.36
Column: SUM(amt) (the column name comes back from SQLite along with the result)
The app uses pandas to format the SQL results into a readable table for Streamlit, which keeps everything clean even with raw column names like SUM(amt).
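The execute-and-format step can be sketched like this: pandas reads the query result straight from SQLite, which gives Streamlit a ready-to-render table. Function and file names are assumptions.

```python
import sqlite3
import pandas as pd

def run_query(sql: str, db_path: str = "transactions.db") -> pd.DataFrame:
    """Execute LLM-generated SQL and return the result as a DataFrame."""
    with sqlite3.connect(db_path) as conn:
        # pandas keeps the column names (e.g. SUM(amt)) from the query
        return pd.read_sql_query(sql, conn)

# In app.py the result would then be shown with st.dataframe(run_query(sql))
```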
Example 2:
A more complex example, where the LLM has to find the top 5 spenders in California in May. It took ~3s, most of which was SQLite processing the query, not LLM response time.
SQL generated →
SELECT first, last, SUM(amt) AS total_spent
FROM CC_trans WHERE state = 'CA'
AND STRFTIME('%m', trans_date_trans_time) = '05'
GROUP BY first, last
ORDER BY total_spent DESC
LIMIT 5
Example 3 (fail)
Input - Who spent the most in April and who spent the most in March?
SQL Generated →
SELECT 'April' AS Month, first, last, SUM(amt) AS total_spent
FROM CC_trans
WHERE STRFTIME('%m', trans_date_trans_time) = '04'
GROUP BY first, last
ORDER BY total_spent DESC
LIMIT 1
UNION ALL
SELECT 'March' AS Month, first, last, SUM(amt) AS total_spent
FROM CC_trans
WHERE STRFTIME('%m', trans_date_trans_time) = '03'
GROUP BY first, last
ORDER BY total_spent DESC
LIMIT 1;
**Error:** `near "UNION": syntax error`
SQLite doesn't allow ORDER BY and LIMIT within individual SELECT statements when using UNION—they must either come after the entire UNION or be wrapped in subqueries. Gemini understood the natural language structure (solve for April, solve for March, combine results) but generated invalid SQL syntax.
The LLM got the logic right but missed the syntax rules. This is where validation becomes critical—the query looks reasonable at first glance.
Expected Query
SELECT * FROM
(
SELECT 'April' AS Month, first, last, SUM(amt) AS total_spent
FROM CC_trans
WHERE STRFTIME('%m', trans_date_trans_time) = '04'
GROUP BY first, last
ORDER BY total_spent DESC
LIMIT 1
)
UNION ALL
SELECT * FROM
(
SELECT 'March' AS Month, first, last, SUM(amt) AS total_spent
FROM CC_trans
WHERE STRFTIME('%m', trans_date_trans_time) = '03'
GROUP BY first, last
ORDER BY total_spent DESC
LIMIT 1
);
The Validation Problem (Still Unsolved)
Even with good schema context, Gemini occasionally generates invalid SQL. The problems fall into two categories:
1. Wrong table/column names - This happens with complex queries. Gemini sometimes uses the exact wording from the user's question instead of mapping it to actual schema columns. The user asks about "spending," Gemini uses "spending" in the query, but the actual column is "amt."
2. Syntax errors - Sometimes generates PostgreSQL syntax when it should be SQLite. Different databases have different dialects, and LLMs don't always remember which one you're using.
Right now, my "solution" is embarrassingly simple: try/except blocks, display errors to users, and explicit prompt rules ("Always use SQLite syntax, never PostgreSQL").
Does it work 100% of the time? No. But it catches most issues, and users can rephrase when something breaks.
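That embarrassingly simple fallback fits in one small helper. This is a sketch under assumed names, with the Streamlit error display noted in a comment rather than called directly:

```python
import sqlite3

def safe_execute(sql: str, conn: sqlite3.Connection):
    """Run LLM-generated SQL; return (rows, error) instead of raising."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as e:
        # In app.py this would surface as st.error(f"Query failed: {e}"),
        # prompting the user to rephrase
        return None, str(e)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CC_trans (amt REAL)")

# A hallucinated column name, caught instead of crashing the app
rows, err = safe_execute("SELECT SUM(spending) FROM CC_trans", conn)
print(err)
```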
Lessons Learned Building AI-Powered MVPs
1. Language ≠ Intelligence
You can’t just throw a problem at an LLM and expect it to figure everything out. You need to structure the input, provide context, and validate the output.
2. Start Simpler Than You Think
I wanted to handle complex 3-table joins on day one. I should have started with single-table SELECT queries and built up.
3. Ship the MVP, Even If It’s Rough
This app has obvious flaws. The validation is weak. The error handling is basic. The UI is bare-bones.
But it works. And I learned more by shipping it than I would have by endlessly refining it.
Try It Yourself
The hardest part of text-to-SQL isn’t picking an LLM or choosing a database. It’s figuring out how to represent your schema in a way that gives the model exactly enough context to succeed.
Start there. Everything else is easier.
Building something similar?
I’d love to hear your experience. Connect with me on LinkedIn or drop a comment below.

