Ask Questions to your data with LLM— Part 2 database

Praveen Govindaraj
4 min readJul 8, 2023

Data is like an untapped gold mine. With the right tools, you can extract valuable insights from this mine. Today, we’re diving deeper into the mine by exploring the practical implementation of database queries preparation via simple english questions.

We’re going to employ powerful packages like openai and langchain to interact with databases. No matter if you're a seasoned developer or just starting your Python journey, this article is for you!

Before we get started, if you’re new to Python or need a refresher, you might want to check out Part 1 of this series where we introduced the basics.

Installation

To start off, we need to install the openai and langchain packages. These can be installed via pip, Python's package manager.

Installation

To start off, we need to install the openai and langchain packages. These can be installed via pip, Python's package manager.

!pip install openai -q
!pip install langchain -q

Initialisation

The next step is to initialise our database. In this example, we’re using a SQLite database file named chinook.db.Download this datatabase file from this link Database model

import os
from langchain import OpenAI,SQLDatabase,SQLDatabaseChain
os.environ['OPENAI_API_KEY'] = '___ENTER_YOUR_API_KEY______'
db_file = 'sqlite:///chinook.db'
db = SQLDatabase.from_uri(db_file)

Now, let’s initialise the language model and database chain with OpenAI and SQLDatabaseChain.

llm = OpenAI(temperature=0)
db_chain = SQLDatabaseChain(llm=llm,database=db,verbose=True)

Querying Your Database

With our setup complete, we can start asking our database questions. Here’s an example database model for your reference:

Database model

sqlite tutorial database — chinook

Now, let’s dive into the questions:

Question 1: How many employees are there in the employees table?

question_1 = db_chain.run("How many employees in employees tables")
print(question_1)

> Entering new chain… How many employees in employees tables SQLQuery:SELECT COUNT(*) FROM employees; SQLResult: [(8,)] Answer:There are 8 employees in the employees table. > Finished chain. There are 8 employees in the employees table.

Question 2: How many customers are there by country?

question_2 = db_chain.run("How many customers by country")
print(question_2)

> Entering new chain… How many customers by country SQLQuery:SELECT COUNT(*) AS “Number of Customers”, Country FROM customers GROUP BY Country ORDER BY “Number of Customers” DESC LIMIT 5; SQLResult: [(13, ‘USA’), (8, ‘Canada’), (5, ‘France’), (5, ‘Brazil’), (4, ‘Germany’)] Answer:There are 13 customers from the USA, 8 customers from Canada, 5 customers from France, 5 customers from Brazil, and 4 customers from Germany.

> Finished chain. There are 13 customers from the USA, 8 customers from Canada, 5 customers from France, 5 customers from Brazil, and 4 customers from Germany.

Question 3: Who are the top 3 customers who made maximum purchases based on the unit price of items?

question_3 = db_chain.run("what are the top 3 customers who did maximum purchase based on the unit price of items ")
print(question_3)

> Entering new chain… what are the top 3 customers who did maximum purchase based on the unit price of items SQLQuery:SELECT “FirstName”, “LastName”, SUM(“UnitPrice”) AS “TotalPurchase” FROM customers INNER JOIN invoices ON customers.”CustomerId” = invoices.”CustomerId” INNER JOIN invoice_items ON invoices.”InvoiceId” = invoice_items.”InvoiceId” GROUP BY customers.”CustomerId” ORDER BY “TotalPurchase” DESC LIMIT 3; SQLResult: [(‘Helena’, ‘Holý’, 49.620000000000005), (‘Richard’, ‘Cunningham’, 47.62000000000002), (‘Luis’, ‘Rojas’, 46.62000000000002)] Answer:The top 3 customers who did maximum purchase based on the unit price of items are Helena Holý, Richard Cunningham, and Luis Rojas.

> Finished chain. The top 3 customers who did maximum purchase based on the unit price of items are Helena Holý, Richard Cunningham, and Luis Rojas.

Question 4: Who are the top 3 customers based on the listening time of tracks by hours?

question_4 = db_chain.run("who are the top 3 customers based on the listening time of tracks by hours & share the results in hours ")
print(question_4)

> Entering new chain… who are the top 3 customers based on the liserning time of tracks by hours & share the results in hours SQLQuery:SELECT c.FirstName, c.LastName, SUM(t.Milliseconds) / 3600000 AS ‘ListeningTime’ FROM customers c INNER JOIN invoices i ON c.CustomerId = i.CustomerId INNER JOIN invoice_items ii ON i.InvoiceId = ii.InvoiceId INNER JOIN tracks t ON ii.TrackId = t.TrackId GROUP BY c.CustomerId ORDER BY ListeningTime DESC LIMIT 3; SQLResult: [(‘Helena’, ‘Holý’, 10), (‘Richard’, ‘Cunningham’, 9), (‘Luis’, ‘Rojas’, 8)] Answer:The top 3 customers based on the listening time of tracks are Helena Holý (10 hours), Richard Cunningham (9 hours), and Luis Rojas (8 hours).

> Finished chain. The top 3 customers based on the listening time of tracks are Helena Holý (10 hours), Richard Cunningham (9 hours), and Luis Rojas (8 hours).

Take away

And there you have it — a simple way to ask questions to your database with simple english! As you can see, the process is quite straightforward, even for complex queries.

Next time, we’ll explore more advanced queries and how to use this knowledge to drive business insights. So, stay tuned and keep exploring!

Saying Goodbye 👋 to SQL

To reproduce & get the feel , here you can go with this Colab Python notebook

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

No responses yet

What are your thoughts?