Ask Questions to your data with LLM— Part 2 database
Data is like an untapped gold mine. With the right tools, you can extract valuable insights from this mine. Today, we’re diving deeper into the mine by exploring the practical implementation of database queries preparation via simple english questions.
We’re going to employ powerful packages like openai
and langchain
to interact with databases. No matter if you're a seasoned developer or just starting your Python journey, this article is for you!
Before we get started, if you’re new to Python or need a refresher, you might want to check out Part 1 of this series where we introduced the basics.
Installation
To start off, we need to install the openai
and langchain
packages. These can be installed via pip, Python's package manager.
Installation
To start off, we need to install the openai
and langchain
packages. These can be installed via pip, Python's package manager.
!pip install openai -q
!pip install langchain -q
Initialisation
The next step is to initialise our database. In this example, we’re using a SQLite database file named chinook.db
.Download this datatabase file from this link Database model
import os
from langchain import OpenAI,SQLDatabase,SQLDatabaseChain
os.environ['OPENAI_API_KEY'] = '___ENTER_YOUR_API_KEY______'
db_file = 'sqlite:///chinook.db'
db = SQLDatabase.from_uri(db_file)
Now, let’s initialise the language model and database chain with OpenAI
and SQLDatabaseChain
.
llm = OpenAI(temperature=0)
db_chain = SQLDatabaseChain(llm=llm,database=db,verbose=True)
Querying Your Database
With our setup complete, we can start asking our database questions. Here’s an example database model for your reference:
Now, let’s dive into the questions:
Question 1: How many employees are there in the employees table?
question_1 = db_chain.run("How many employees in employees tables")
print(question_1)
> Entering new chain… How many employees in employees tables SQLQuery:SELECT COUNT(*) FROM employees; SQLResult: [(8,)] Answer:There are 8 employees in the employees table. > Finished chain. There are 8 employees in the employees table.
Question 2: How many customers are there by country?
question_2 = db_chain.run("How many customers by country")
print(question_2)
> Entering new chain… How many customers by country SQLQuery:SELECT COUNT(*) AS “Number of Customers”, Country FROM customers GROUP BY Country ORDER BY “Number of Customers” DESC LIMIT 5; SQLResult: [(13, ‘USA’), (8, ‘Canada’), (5, ‘France’), (5, ‘Brazil’), (4, ‘Germany’)] Answer:There are 13 customers from the USA, 8 customers from Canada, 5 customers from France, 5 customers from Brazil, and 4 customers from Germany.
> Finished chain. There are 13 customers from the USA, 8 customers from Canada, 5 customers from France, 5 customers from Brazil, and 4 customers from Germany.
Question 3: Who are the top 3 customers who made maximum purchases based on the unit price of items?
question_3 = db_chain.run("what are the top 3 customers who did maximum purchase based on the unit price of items ")
print(question_3)
> Entering new chain… what are the top 3 customers who did maximum purchase based on the unit price of items SQLQuery:SELECT “FirstName”, “LastName”, SUM(“UnitPrice”) AS “TotalPurchase” FROM customers INNER JOIN invoices ON customers.”CustomerId” = invoices.”CustomerId” INNER JOIN invoice_items ON invoices.”InvoiceId” = invoice_items.”InvoiceId” GROUP BY customers.”CustomerId” ORDER BY “TotalPurchase” DESC LIMIT 3; SQLResult: [(‘Helena’, ‘Holý’, 49.620000000000005), (‘Richard’, ‘Cunningham’, 47.62000000000002), (‘Luis’, ‘Rojas’, 46.62000000000002)] Answer:The top 3 customers who did maximum purchase based on the unit price of items are Helena Holý, Richard Cunningham, and Luis Rojas.
> Finished chain. The top 3 customers who did maximum purchase based on the unit price of items are Helena Holý, Richard Cunningham, and Luis Rojas.
Question 4: Who are the top 3 customers based on the listening time of tracks by hours?
question_4 = db_chain.run("who are the top 3 customers based on the listening time of tracks by hours & share the results in hours ")
print(question_4)
> Entering new chain… who are the top 3 customers based on the liserning time of tracks by hours & share the results in hours SQLQuery:SELECT c.FirstName, c.LastName, SUM(t.Milliseconds) / 3600000 AS ‘ListeningTime’ FROM customers c INNER JOIN invoices i ON c.CustomerId = i.CustomerId INNER JOIN invoice_items ii ON i.InvoiceId = ii.InvoiceId INNER JOIN tracks t ON ii.TrackId = t.TrackId GROUP BY c.CustomerId ORDER BY ListeningTime DESC LIMIT 3; SQLResult: [(‘Helena’, ‘Holý’, 10), (‘Richard’, ‘Cunningham’, 9), (‘Luis’, ‘Rojas’, 8)] Answer:The top 3 customers based on the listening time of tracks are Helena Holý (10 hours), Richard Cunningham (9 hours), and Luis Rojas (8 hours).
> Finished chain. The top 3 customers based on the listening time of tracks are Helena Holý (10 hours), Richard Cunningham (9 hours), and Luis Rojas (8 hours).
Take away
And there you have it — a simple way to ask questions to your database with simple english! As you can see, the process is quite straightforward, even for complex queries.
Next time, we’ll explore more advanced queries and how to use this knowledge to drive business insights. So, stay tuned and keep exploring!
Saying Goodbye 👋 to SQL
To reproduce & get the feel , here you can go with this Colab Python notebook