SQLite Notes

About SQLite

SQLite is an effective solution for data too large for RAM but small enough to fit on a local hard drive.
SQLite is serverless unlike client/server databases such as MySQL, PostgreSQL, SQL Server, and Oracle. SQLite does not need to be “installed” before it is used. There is no “setup” procedure.
A SQLite database is a single ordinary disk file that can be located anywhere in the directory hierarchy.
SQLite is a distinct SQL implementation. SQL code you find online to accomplish a task may not work in SQLite. For example, SQLite has no DATETIME datatype.

Installing SQLite

Windows

Go to https://www.sqlite.org/download.html and download the 64-bit DLL and the command-line tools bundle.
Extract files and copy into a new folder. Suggestion: copy into a folder you create called “SQLite” at the top level of the C drive or root directory.
Add SQLite directory to path: (Win 10) Control Panel > User Accounts > User Accounts, go to “Change my environment variables”, click Path, click Edit, Click New, add C:\SQLite entry, OK, OK.

Macs

Macs should already have SQLite installed.

Running SQLite from command line

Open terminal or cmd.exe and enter sqlite3 followed by name of database you wish to open.
Running sqlite3 by itself creates a transient in-memory database.
The Introduction to SQLite (Jan 2023) course by Brian Jepson on LinkedIn Learning teaches SQLite using the command line.
SQLite provides special commands to sqlite3 called “dot-commands”: https://www.sqlite.org/cli.html
.tables shows all tables in current database
.exit exits SQLite

SQLite GUI interfaces

There are programs available that provide a GUI to make it easier to browse database tables, write SQL syntax, etc.
Two that I have used:
- SQLiteStudio
- DB Browser for SQLite
Both install SQLite when installed. However they tend to use older versions of SQLite.
DB Browser for SQLite is similar to RStudio in that you can write SQL syntax and click Ctrl + Enter to submit and execute code.
The Learning SQL Programming (March 2021) course by Scott Simpson on LinkedIn Learning teaches SQLite using DB Browser for SQLite.
The SQL Essential Training (Dec 2019) course by Bill Weinman on LinkedIn Learning teaches SQLite using SQLiteStudio.

Creating a database

Using command line: navigate to where you want to create database and enter sqlite3 name_of_database.db
Using DB Browser for SQLite: File…New Database…, browse to where you want to create database, enter name, and hit Enter.

Next you need to add tables to the database.

Creating tables

By import, no existing table

If you have data in a CSV file and no existing table, you can do the following from the sqlite3 command line:

sqlite> .mode csv
sqlite> .import file.csv table_name

This will import the contents of the CSV file into a new table called “table_name”. This seems to import everything as TEXT.

To edit table schema using GUIs:

DB Browser for SQLite: Click on table, click Modify Table button, and edit properties. Can modify multiple variables at once.
SQLiteStudio: go to Table, go to Structure tab, doubleclick on variable, and edit properties. Have to do one variable at a time.

Create empty table

You can also define a table and then populate it. Text preceded with -- below are comments.

Example of creating an empty table:

CREATE TABLE t1
  (id  INTEGER,   -- variable name followed by storage type
    y   NUMERIC,
    x1  INTEGER,
    x2  REAL,   
    x3  TEXT);

Can use .schema command at SQLite command line to view and verify table definition. Then you can import a CSV file into the table using the steps above.

Repeat for multiple tables as needed.

SQLite has 5 data types: INTERGER, TEXT, NUMERIC, REAL, BLOB. See: https://www.sqlite.org/datatype3.html

Submitting queries

Queries can be submitted one at a time from SQLite command line or using the GUIs.
Multiple queries can be written/saved as txt in a .sql file.
In DB Browser for SQLite: go to Execute SQL tab, write query, and hit Ctrl + Enter
In SQLiteStudio: Open SQL Editor (Tools…Open SQL Editor), write query, and hit F9
The DB Browser for SQLite editor provides code completion, which is very handy.
DB Browser for SQLite makes it easy to export a query result to CSV. Click on “Save the Results View” button and select “Export to CSV”

Connect to SQLite db - R

The RSQLite package provides an interface to SQLite databases. Per the RSQLite vignette: “RSQLite is a DBI-compatible interface which means you primarily use functions defined in the DBI package, so you should always start by loading DBI, not RSQLite.”

Here’s an example of accessing a SQLite database called “homes” that contains real estate data from Albemarle County in Virginia, USA. We query the parcels table for records where LotSize is greater than 500 acres. The query is issued as a text string using the dbGetQuery() function. The result is saved as d, a data frame, which has 56 rows and 32 variables.

library(DBI)
mydb <- dbConnect(RSQLite::SQLite(), "homes.db")
d <- dbGetQuery(mydb, 'select * from parcels where LotSize > 500;')
dim(d)

[1] 56 32

class(d)

[1] "data.frame"

It’s good practice to close the database connection when finished. “This closes the connection, discards all pending work, and frees resources (e.g., memory, sockets).” (see ?dbDisconnect)

dbDisconnect(mydb)

Connect to SQLite db - Python

The sqlite3 module provides an interface to SQLite databases.

Below we replicate the example from the R section above using the pandas function read_sql_query(). The result is a pandas data frame with 56 rows and 32 variables.

import pandas as pd
import sqlite3
con = sqlite3.connect("homes.db")
df = pd.read_sql_query("select * from parcels where LotSize > 500;", con)
df.shape

(56, 32)

It’s good practice to close the database connection when finished.

con.close()

Example queries/SQL code

These examples come from LinkedIn courses.

Introduction to SQLite

chinook <- dbConnect(RSQLite::SQLite(), "courses/intro_to_sqlite/Chinook_Learning_Sqlite.sqlite")

Select 10 records from Invoice table, only show first 10.

SELECT * FROM Invoice LIMIT 10;

Displaying records 1 - 10
InvoiceId	CustomerId	InvoiceDate	BillingAddress	BillingCity	BillingState	BillingCountry	BillingPostalCode	Total
1	2	2023-01-01 00:00:00	Theodor-Heuss-Straße 34	Stuttgart	NA	Germany	70174	1.98
2	4	2023-01-02 00:00:00	Ullevålsveien 14	Oslo	NA	Norway	0171	3.96
3	8	2023-01-03 00:00:00	Grétrystraat 63	Brussels	NA	Belgium	1000	5.94
4	14	2023-01-06 00:00:00	8210 111 ST NW	Edmonton	AB	Canada	T6G 2C7	8.91
5	23	2023-01-11 00:00:00	69 Salem Street	Boston	MA	USA	2113	13.86
6	37	2023-01-19 00:00:00	Berger Straße 10	Frankfurt	NA	Germany	60316	0.99
7	38	2023-02-01 00:00:00	Barbarossastraße 19	Berlin	NA	Germany	10779	1.98
8	40	2023-02-01 00:00:00	8, Rue Hanovre	Paris	NA	France	75002	1.98
9	42	2023-02-02 00:00:00	9, Place Louis Barthou	Bordeaux	NA	France	33000	3.96
10	46	2023-02-03 00:00:00	3 Chatham Street	Dublin	Dublin	Ireland	NA	5.94

To output query result to csv file using sqlite3 command line, run the following:

.mode csv
.header on
.output filename.csv

Output of next query is sent to filename.csv

To restore default output:

.output stdout
.mode list

To see file from sqlite3 command line:

.shell filename.csv

Examples of using WHERE clause

SELECT * FROM Genre WHERE name LIKE 'S%';

3 records
GenreId	Name
10	Soundtrack
18	Science Fiction
20	Sci Fi & Fantasy

SELECT * FROM Invoice WHERE Total > 10;

Displaying records 1 - 10
InvoiceId	CustomerId	InvoiceDate	BillingAddress	BillingCity	BillingState	BillingCountry	BillingPostalCode	Total
5	23	2023-01-11 00:00:00	69 Salem Street	Boston	MA	USA	2113	13.86
12	2	2023-02-11 00:00:00	Theodor-Heuss-Straße 34	Stuttgart	NA	Germany	70174	13.86
19	40	2023-03-14 00:00:00	8, Rue Hanovre	Paris	NA	France	75002	13.86
26	19	2023-04-14 00:00:00	1 Infinite Loop	Cupertino	CA	USA	95014	13.86
33	57	2023-05-15 00:00:00	Calle Lira, 198	Santiago	NA	Chile	NA	13.86
40	36	2023-06-15 00:00:00	Tauentzienstraße 8	Berlin	NA	Germany	10789	13.86
47	15	2023-07-16 00:00:00	700 W Pender Street	Vancouver	BC	Canada	V6C 1G8	13.86
54	53	2023-08-16 00:00:00	113 Lupus St	London	NA	United Kingdom	SW1V 3EN	13.86
61	32	2023-09-16 00:00:00	696 Osborne Street	Winnipeg	MB	Canada	R3L 2B9	13.86
68	11	2023-10-17 00:00:00	Av. Paulista, 2022	São Paulo	SP	Brazil	01310-200	13.86

SELECT *
FROM Invoice
WHERE Total > 10
AND InvoiceDate BETWEEN '2024-01-01' AND '2024-03-31';

6 records
InvoiceId	CustomerId	InvoiceDate	BillingAddress	BillingCity	BillingState	BillingCountry	BillingPostalCode	Total
250	55	2024-01-01 00:00:00	421 Bourke Street	Sidney	NSW	Australia	2010	13.86
257	34	2024-02-01 00:00:00	Rua da Assunção 53	Lisbon	NA	Portugal	NA	13.86
264	13	2024-03-03 00:00:00	Qe 7 Bloco G	Brasília	DF	Brazil	71020-677	13.86
334	39	2024-01-07 00:00:00	4, Rue Milton	Paris	NA	France	75009	13.86
341	18	2024-02-07 00:00:00	627 Broadway	New York	NY	USA	10012-2612	13.86
348	56	2024-03-10 00:00:00	307 Macacha Güemes	Buenos Aires	NA	Argentina	1106	13.86

SELECT * FROM Genre WHERE LOWER(Name) = 'rock';

1 records
GenreId	Name
1	Rock

SELECT * FROM Genre WHERE LOWER(Name) LIKE '%rock%';

2 records
GenreId	Name
1	Rock
5	Rock And Roll

Examples of queries in aggregate.

Sum of Total, returned as OverallTotal.

SELECT SUM(Total) AS OverallTotal
FROM Invoice;

1 records
OverallTotal
2328.6

Sum of Total by InvoiceDate.

SELECT InvoiceDate, SUM(Total) AS DailyTotal
FROM Invoice
GROUP BY InvoiceDate;

Displaying records 1 - 10
InvoiceDate	DailyTotal
2023-01-01 00:00:00	1.98
2023-01-02 00:00:00	4.95
2023-01-03 00:00:00	5.94
2023-01-06 00:00:00	8.91
2023-01-08 00:00:00	3.96
2023-01-09 00:00:00	3.96
2023-01-10 00:00:00	6.94
2023-01-11 00:00:00	13.86
2023-01-13 00:00:00	17.91
2023-01-15 00:00:00	3.96

Sum of Total by InvoiceDate, sorted by InvoiceDate.

SELECT InvoiceDate, SUM(Total) AS DailyTotal
FROM Invoice
GROUP BY InvoiceDate
ORDER BY InvoiceDate;

Displaying records 1 - 10
InvoiceDate	DailyTotal
2023-01-01 00:00:00	1.98
2023-01-02 00:00:00	4.95
2023-01-03 00:00:00	5.94
2023-01-06 00:00:00	8.91
2023-01-08 00:00:00	3.96
2023-01-09 00:00:00	3.96
2023-01-10 00:00:00	6.94
2023-01-11 00:00:00	13.86
2023-01-13 00:00:00	17.91
2023-01-15 00:00:00	3.96

Sum of Total by InvoiceDate, sorted by InvoiceDate in descending order.

SELECT InvoiceDate, SUM(Total) AS DailyTotal
FROM Invoice
GROUP BY InvoiceDate
ORDER BY SUM(Total) DESC;

Displaying records 1 - 10
InvoiceDate	DailyTotal
2024-11-13 00:00:00	25.86
2024-08-05 00:00:00	23.86
2023-04-28 00:00:00	21.86
2023-02-18 00:00:00	21.86
2023-05-29 00:00:00	18.86
2023-01-18 00:00:00	18.86
2023-01-13 00:00:00	17.91
2024-10-06 00:00:00	16.86
2024-09-05 00:00:00	16.86
2023-06-29 00:00:00	15.86

Sum of Total by InvoiceDate having total > 10, sorted by InvoiceDate in descending order.

SELECT InvoiceDate, SUM(Total) AS DailyTotal
FROM Invoice
GROUP BY InvoiceDate
HAVING SUM(Total) > 10
ORDER BY SUM(Total) DESC;

Displaying records 1 - 10
InvoiceDate	DailyTotal
2024-11-13 00:00:00	25.86
2024-08-05 00:00:00	23.86
2023-04-28 00:00:00	21.86
2023-02-18 00:00:00	21.86
2023-05-29 00:00:00	18.86
2023-01-18 00:00:00	18.86
2023-01-13 00:00:00	17.91
2024-10-06 00:00:00	16.86
2024-09-05 00:00:00	16.86
2023-06-29 00:00:00	15.86

Sum of Total and count of Invoices by InvoiceDate and Customer ID

SELECT InvoiceDate,
       CustomerId,
       SUM(Total) AS DailyTotal,
       COUNT(InvoiceId) AS InvoiceCount
FROM Invoice
GROUP BY InvoiceDate, CustomerId;

Displaying records 1 - 10
InvoiceDate	CustomerId	DailyTotal	InvoiceCount
2023-01-01 00:00:00	2	1.98	1
2023-01-02 00:00:00	4	3.96	1
2023-01-02 00:00:00	26	0.99	1
2023-01-03 00:00:00	8	5.94	1
2023-01-06 00:00:00	14	8.91	1
2023-01-08 00:00:00	43	1.98	1
2023-01-08 00:00:00	45	1.98	1
2023-01-09 00:00:00	47	3.96	1
2023-01-10 00:00:00	51	6.94	1
2023-01-11 00:00:00	23	13.86	1

Examples of JOINS

Inner join: Get the count of invoice IDs for every customer who has a company name associated with their record in the customer table. The WHERE company is NOT NULL clause takes care of only selecting customers with company names.

SELECT Company, COUNT(InvoiceID)
FROM Customer
JOIN Invoice ON Invoice.CustomerId = Customer.CustomerId
WHERE Company IS NOT NULL
GROUP BY Company;

Displaying records 1 - 10
Company	COUNT(InvoiceID)
Apple Inc.	7
Banco do Brasil S.A.	7
Embraer - Empresa Brasileira de Aeronáutica S.A.	7
Google Inc.	7
JetBrains s.r.o.	7
Microsoft Corporation	7
Riotur	7
Rogers Canada	7
Telus	7
Woodstock Discos	7

Left Join: Get results even if customer ID values fall outside of the intersection. So in the case of a customer that has no matching invoice records, such as LinkedIn, the invoice columns are represented as NULLS, and so the COUNT invoice ID calculation returns zero.

SELECT Company, COUNT(InvoiceID)
FROM Customer
LEFT OUTER JOIN Invoice ON Invoice.CustomerId = Customer.CustomerId
WHERE Company IS NOT NULL
GROUP BY Company;

Displaying records 1 - 10
Company	COUNT(InvoiceID)
Apple Inc.	7
Banco do Brasil S.A.	7
Embraer - Empresa Brasileira de Aeronáutica S.A.	7
Google Inc.	7
JetBrains s.r.o.	7
LinkedIn	0
Microsoft Corporation	7
Riotur	7
Rogers Canada	7
Telus	7

Examples of basic functions

SELECT ROUND(1.2345678, 3);

1 records
ROUND(1.2345678, 3)
1.235

SELECT INSTR(BillingAddress, ' ') FROM Invoice LIMIT 5;

5 records
INSTR(BillingAddress, ’ ’)
21
14
13
5
3

SELECT SUBSTR(BillingAddress, 0, INSTR(BillingAddress, ' '))
FROM Invoice LIMIT 5;

5 records
SUBSTR(BillingAddress, 0, INSTR(BillingAddress, ’ ’))
Theodor-Heuss-Straße
Ullevålsveien
Grétrystraat
8210
69

SELECT FORMAT('$%f', SUM(Total)) FROM Invoice;

1 records
FORMAT(‘$%f’, SUM(Total))
$2328.600000

SELECT FORMAT('$%.2f', SUM(Total)) FROM Invoice;

1 records
FORMAT(‘$%.2f’, SUM(Total))
$2328.60

SELECT InvoiceDate, strftime('%Y-%m-%d', InvoiceDate)
FROM Invoice LIMIT 10;

Displaying records 1 - 10
InvoiceDate	strftime(‘%Y-%m-%d’, InvoiceDate)
2023-01-01 00:00:00	2023-01-01
2023-01-02 00:00:00	2023-01-02
2023-01-03 00:00:00	2023-01-03
2023-01-06 00:00:00	2023-01-06
2023-01-11 00:00:00	2023-01-11
2023-01-19 00:00:00	2023-01-19
2023-02-01 00:00:00	2023-02-01
2023-02-01 00:00:00	2023-02-01
2023-02-02 00:00:00	2023-02-02
2023-02-03 00:00:00	2023-02-03

Example of aggregate functions

Count number of rows.

SELECT COUNT(*) FROM Invoice;

1 records
COUNT(*)
412

Count number of non Null values for a given column.

SELECT COUNT(BillingPostalCode) FROM Invoice;

1 records
COUNT(BillingPostalCode)
384

Number of distinct postal codes in Invoice table

SELECT COUNT(DISTINCT BillingPostalCode) FROM Invoice;

1 records
COUNT(DISTINCT BillingPostalCode)
55

Number of customers in each country.

SELECT Country,
       COUNT(DISTINCT CustomerId) AS num
FROM Customer
GROUP BY Country;

Displaying records 1 - 10
Country	num
NA	1
Argentina	1
Australia	1
Austria	1
Belgium	1
Brazil	5
Canada	8
Chile	1
Czech Republic	2
Denmark	1

Average of previouq query

SELECT AVG(num)
FROM (SELECT Country,
             COUNT(DISTINCT CustomerId) AS num
      FROM Customer
      GROUP BY Country
);

1 records
AVG(num)
2.4

Get min and max as well.

SELECT AVG(num) AS avg_num,
       MIN(num) AS min_num,
       MAX(num) AS max_num
FROM (SELECT Country,
             COUNT(DISTINCT CustomerId) AS num
      FROM Customer
      GROUP BY Country
);

1 records
avg_num	min_num	max_num
2.4	1	13

Another handy aggregate function is group_concat. You can use this to produce a comma separated list of aggregate values.

SELECT Country,
       GROUP_CONCAT(DISTINCT State) AS States
FROM Customer
GROUP BY Country;

Displaying records 1 - 10
Country	States
NA	NA
Argentina	NA
Australia	NSW
Austria	NA
Belgium	NA
Brazil	SP,RJ,DF
Canada	QC,AB,BC,ON,NS,MB,NT
Chile	NA
Czech Republic	NA
Denmark	NA

Learning SQL Programming

quizdata <- dbConnect(RSQLite::SQLite(), "courses/learning_sql_programming/quizdata.db")

More examples of JOIN

Let’s join up our quiz participants’ names and state abbreviations with the full state name from the states table.

SELECT people.first_name, people.last_name, people.state_code, states.state_name 
FROM people 
JOIN states 
ON people.state_code=states.state_abbrev;

Displaying records 1 - 10
first_name	last_name	state_code	state_name
Janice	Howell	CA	California
Wanda	Alvarez	CA	California
Laura	Olson	CA	California
Jack	Garcia	NY	New York
Ryan	Rice	DE	Delaware
Christine	Wood	MI	Michigan
Dennis	Banks	NY	New York
Paula	Montgomery	AL	Alabama
Jerry	Ferguson	IL	Illinois
Darren	Black	FL	Florida

We can switch up the order that we’re joining the tables too, and we’ll see that we get a different order of records. I’ll switch the states and people tables here around the join keyword.

SELECT people.first_name, people.last_name, people.state_code, states.state_name 
FROM states 
JOIN people 
ON people.state_code=states.state_abbrev;

Displaying records 1 - 10
first_name	last_name	state_code	state_name
Alice	Roberts	AL	Alabama
Ann	Holmes	AL	Alabama
Anna	Foster	AL	Alabama
Arthur	Carroll	AL	Alabama
Brenda	Green	AL	Alabama
Brian	Berry	AL	Alabama
Charles	Diaz	AL	Alabama
Charles	Torres	AL	Alabama
Earl	Snyder	AL	Alabama
Edward	Hart	AL	Alabama

Now that the states table is on the left and the people table is on the right, let’s switch this to a left join so we get all of the records from the states table instead of just the ones that match. (No quiz participants from WY.)

SELECT people.first_name, people.last_name, people.state_code, states.state_name 
FROM states 
LEFT JOIN people 
ON people.state_code=states.state_abbrev;

Displaying records 1 - 10
first_name	last_name	state_code	state_name
Alice	Roberts	AL	Alabama
Ann	Holmes	AL	Alabama
Anna	Foster	AL	Alabama
Arthur	Carroll	AL	Alabama
Brenda	Green	AL	Alabama
Brian	Berry	AL	Alabama
Charles	Diaz	AL	Alabama
Charles	Torres	AL	Alabama
Earl	Snyder	AL	Alabama
Edward	Hart	AL	Alabama

Let’s write a query to match up the state abbreviations and the people table with those in the states table so we can see what else is missing. I don’t need all these rows, so I’ll use the distinct function to throw out duplicates and reduce the size of our result.

SELECT DISTINCT(people.state_code), states.state_name 
FROM states 
LEFT JOIN people ON people.state_code=states.state_abbrev
ORDER BY people.state_code;

Displaying records 1 - 10
state_code	state_name
NA	Maine
NA	Rhode Island
NA	Wyoming
AK	Alaska
AL	Alabama
AR	Arkansas
AZ	Arizona
CA	California
CO	Colorado
CT	Connecticut

Example of grouping results

How many times each first name appears in the table.

SELECT first_name, COUNT(first_name) 
FROM people 
GROUP BY first_name;

Displaying records 1 - 10
first_name	COUNT(first_name)
Aaron	3
Adam	6
Alan	2
Albert	1
Alice	5
Amy	7
Andrea	4
Andrew	8
Angela	6
Ann	6

How many participants come from each state.

SELECT state_code, COUNT(state_code) 
FROM people 
GROUP BY state_code;

Displaying records 1 - 10
state_code	COUNT(state_code)
AK	1
AL	24
AR	6
AZ	19
CA	127
CO	28
CT	14
DE	2
FL	93
GA	27

How many people each state got each score.

SELECT state_code, quiz_points, COUNT(quiz_points)
FROM people
GROUP BY state_code, quiz_points

Displaying records 1 - 10
state_code	quiz_points	COUNT(quiz_points)
AK	93	1
AL	74	1
AL	76	2
AL	79	2
AL	81	2
AL	82	1
AL	83	2
AL	85	3
AL	86	2
AL	87	1

Example of compound select

Sometimes we need to use more than one select statement to get the information we want. It’s often the case that we want to set up a query that relies on the result of another query. Using a secondary select statement inside of another query is called a sub-query or sub-select, and it helps us to narrow down or zero in on a specific set of data to use in that larger query.

Get a listing of all the people in our database who achieved the highest score on our quiz.

SELECT first_name, last_name, quiz_points 
FROM people 
WHERE quiz_points=(SELECT MAX(quiz_points) FROM people);

6 records
first_name	last_name	quiz_points
Scott	Thompson	100
Catherine	Rodriguez	100
Kathryn	Welch	100
Terry	Fernandez	100
Earl	Green	100
Julia	Lee	100

Find all people from Minnesota without knowing state abbreviation.

SELECT * FROM people 
WHERE state_code=(SELECT state_abbrev FROM states WHERE state_name='Minnesota');

Displaying records 1 - 10
id_number	first_name	last_name	city	state_code	shirt_or_hat	quiz_points	team	signup	age	company
78	Roy	Rivera	Minneapolis	MN	shirt	81	Baffled Badgers	2021-01-03	47	Hayes, Greenfelder and Schulist
150	Joan	Watkins	Minneapolis	MN	hat	85	Baffled Badgers	2021-01-05	23	Morissette Ltd.
253	Darren	Phillips	Minneapolis	MN	hat	85	Cosmic Cobras	2021-01-08	53	NA
288	Antonio	Hart	Young America	MN	shirt	85	Baffled Badgers	2021-01-09	30	Feest Inc
415	Louis	Cox	Young America	MN	hat	90	Cosmic Cobras	2021-01-13	45	Mitchell-Nolan
452	Chris	Castillo	Duluth	MN	shirt	76	Angry Ants	2021-01-14	25	NA
458	Ashley	Fisher	Saint Paul	MN	shirt	79	Angry Ants	2021-01-15	38	Keeling Ltd.
543	Harry	Welch	Rochester	MN	hat	92	Baffled Badgers	2021-01-17	27	Wehner Group
570	Martin	Richardson	Minneapolis	MN	hat	87	Baffled Badgers	2021-01-18	24	Ortiz, Lubowitz and Brekke
576	Cynthia	Carroll	Monticello	MN	hat	82	Baffled Badgers	2021-01-18	24	Stracke Ltd.

SQL Essential Training

This course uses three databases:

album
test
world

album <- dbConnect(RSQLite::SQLite(),
                   "courses/sql_essential_training/album.db")
test <- dbConnect(RSQLite::SQLite(), 
                  "courses/sql_essential_training/test.db")
world <- dbConnect(RSQLite::SQLite(),
                   "courses/sql_essential_training/world.db")

Quick SELECT example.

SELECT Name, LifeExpectancy AS "Life Expectancy" 
FROM Country ORDER BY Name;

Displaying records 1 - 10
Name	Life Expectancy
Afghanistan	45.9
Albania	71.6
Algeria	69.7
American Samoa	75.1
Andorra	83.5
Angola	38.3
Anguilla	76.1
Antarctica	NA
Antigua and Barbuda	70.5
Argentina	75.1

Get second five after first five

SELECT Name, Continent, Region 
FROM Country 
WHERE Continent = 'Europe'   -- which rows
ORDER BY Name LIMIT 5 OFFSET 5;  -- only 5, second 5

5 records
Name	Continent	Region
Bosnia and Herzegovina	Europe	Southern Europe
Bulgaria	Europe	Eastern Europe
Croatia	Europe	Southern Europe
Czech Republic	Europe	Eastern Europe
Denmark	Europe	Nordic Countries

Counting Rows

COUNT(*) means count all the rows. Count number of rows where pop > 100,000,000 and Continent = ‘Europe’

SELECT COUNT(*) FROM Country WHERE Population > 100000000 AND Continent = 'Europe' ;

1 records
COUNT(*)
1

COUNT(LifeExpectancy) means count rows where the column has data.

SELECT COUNT(LifeExpectancy) FROM Country;

1 records
COUNT(LifeExpectancy)
222

Countries with pop < 100000 or pop is NULL, listed in descending order by population.

SELECT Name, Continent, Population FROM Country 
  WHERE Population < 100000 OR Population IS NULL ORDER BY Population DESC;

Displaying records 1 - 10
Name	Continent	Population
Tonga	Oceania	99000
Grenada	North America	94000
Virgin Islands, U.S.	North America	93000
Kiribati	Oceania	83000
Andorra	Europe	78000
Northern Mariana Islands	Oceania	78000
Seychelles	Africa	77000
Dominica	North America	71000
American Samoa	Oceania	68000
Antigua and Barbuda	North America	68000

Countries with pop < 100000 and Continent = ‘Oceania’, listed in descending order by population.

SELECT Name, Continent, Population FROM Country 
  WHERE Population < 100000 AND Continent = 'Oceania' ORDER BY Population DESC;

Displaying records 1 - 10
Name	Continent	Population
Tonga	Oceania	99000
Kiribati	Oceania	83000
Northern Mariana Islands	Oceania	78000
American Samoa	Oceania	68000
Marshall Islands	Oceania	64000
Cook Islands	Oceania	20000
Palau	Oceania	19000
Wallis and Futuna	Oceania	15000
Nauru	Oceania	12000
Tuvalu	Oceania	12000

LIKE operator: percent sign is a wildcard; find Names that have anything anything before or after ‘island’

SELECT Name, Continent, Population FROM Country 
  WHERE Name LIKE '%island%' ORDER BY Name;

Displaying records 1 - 10
Name	Continent	Population
Bouvet Island	Antarctica	NA
Cayman Islands	North America	38000
Christmas Island	Oceania	2500
Cocos (Keeling) Islands	Oceania	600
Cook Islands	Oceania	20000
Falkland Islands	South America	2000
Faroe Islands	Europe	43000
Fiji Islands	Oceania	817000
Heard Island and McDonald Islands	Antarctica	NA
Marshall Islands	Oceania	64000

LIKE operator: all countries with ‘a’ as second letter

SELECT Name, Continent, Population FROM Country 
  WHERE Name LIKE '_a%' ORDER BY Name;

Displaying records 1 - 10
Name	Continent	Population
Bahamas	North America	307000
Bahrain	Asia	617000
Bangladesh	Asia	129155000
Barbados	North America	270000
Cambodia	Asia	11168000
Cameroon	Africa	15085000
Canada	North America	31147000
Cape Verde	Africa	428000
Cayman Islands	North America	38000
East Timor	Asia	885000

IN operator: all countries in Asia or Europe

SELECT Name, Continent, Population FROM Country 
 WHERE Continent IN ( 'Europe', 'Asia' ) ORDER BY Name;

Displaying records 1 - 10
Name	Continent	Population
Afghanistan	Asia	22720000
Albania	Europe	3401200
Andorra	Europe	78000
Armenia	Asia	3520000
Austria	Europe	8091800
Azerbaijan	Asia	7734000
Bahrain	Asia	617000
Bangladesh	Asia	129155000
Belarus	Europe	10236000
Belgium	Europe	10239000

SELECT DISTINCT: Using the SELECT DISTINCT statement, you will get only unique results.

SELECT DISTINCT Continent FROM Country;

7 records
Continent
Asia
Europe
North America
Africa
Oceania
South America
Antarctica

Ordering output. Order by Continent in descending order, then within Continent show Region in ascending order, and within Region show Name in ascending order.

SELECT Name, Continent, Region 
FROM Country 
ORDER BY Continent DESC, Region, Name;

Displaying records 1 - 10
Name	Continent	Region
Argentina	South America	South America
Bolivia	South America	South America
Brazil	South America	South America
Chile	South America	South America
Colombia	South America	South America
Ecuador	South America	South America
Falkland Islands	South America	South America
French Guiana	South America	South America
Guyana	South America	South America
Paraguay	South America	South America

Example of a junction table.

SELECT c.name AS Cust, c.zip, i.name AS Item, i.description, s.quantity AS Quan, s.price AS Price
  FROM sale AS s
  JOIN item AS i ON s.item_id = i.id
  JOIN customer AS c ON s.customer_id = c.id
  ORDER BY Cust, Item
;

5 records
Cust	zip	Item	description	Quan	Price
Bill Smith	98765	Box of 64 Pixels	64 RGB pixels in a decorative box	1	2995
Bob Smith	98765	Bar Code	Unused. In original packaging.	2	999
Mary Smith	98765	Box of 64 Pixels	64 RGB pixels in a decorative box	3	2995
Mary Smith	98765	Box of 64 Pixels	64 RGB pixels in a decorative box	1	2995
Mary Smith	98765	Sense of Humor	Especially dry. Imported from England.	1	1995

Example of aggregate functions. Count of rows by Region.

SELECT Region, COUNT(*)
  FROM Country
  GROUP BY Region
;

Displaying records 1 - 10
Region	COUNT(*)
Antarctica	5
Australia and New Zealand	5
Baltic Countries	3
British Islands	2
Caribbean	24
Central Africa	9
Central America	8
Eastern Africa	20
Eastern Asia	8
Eastern Europe	10

SELECT Region, COUNT(*) AS Count  -- with alias
  FROM Country
  GROUP BY Region
  ORDER BY Count DESC, Region  -- count in descending order
;

Displaying records 1 - 10
Region	Count
Caribbean	24
Eastern Africa	20
Middle East	18
Western Africa	17
Southern Europe	15
South America	14
Southern and Central Asia	14
Southeast Asia	11
Eastern Europe	10
Polynesia	10

-- number of tracks for each album
SELECT a.title AS Album, COUNT(t.track_number) as Tracks
  FROM track AS t
  JOIN album AS a
    ON a.id = t.album_id
  GROUP BY a.id
  ORDER BY Tracks DESC, Album
;

7 records
Album	Tracks
Rubber Soul	14
Birds of Fire	10
Two Men with the Blues	10
Apostrophe	9
Hendrix in the West	9
Live And	6
Kind of Blue	5

Just albums with 10 more tracks.

-- HAVING is for aggregate data
SELECT a.title AS Album, COUNT(t.track_number) as Tracks
  FROM track AS t
  JOIN album AS a
    ON a.id = t.album_id
  GROUP BY a.id
  HAVING Tracks >= 10
  ORDER BY Tracks DESC, Album
;

3 records
Album	Tracks
Rubber Soul	14
Birds of Fire	10
Two Men with the Blues	10

Albums with 10 or more tracks by the Beatles.

-- WHERE is for non-aggregate data
SELECT a.title AS Album, COUNT(t.track_number) as Tracks
  FROM track AS t
  JOIN album AS a
    ON a.id = t.album_id
  WHERE a.artist = 'The Beatles'
  GROUP BY a.id
  HAVING Tracks >= 10
  ORDER BY Tracks DESC, Album
  ;

1 records
Album	Tracks
Rubber Soul	14

More aggregate functions.

SELECT Region, AVG(Population) 
FROM Country 
GROUP BY Region;

Displaying records 1 - 10
Region	AVG(Population)
Antarctica	NA
Australia and New Zealand	4550620
Baltic Countries	2520633
British Islands	31699250
Caribbean	1589167
Central Africa	10628000
Central America	16902625
Eastern Africa	12999947
Eastern Asia	188416000
Eastern Europe	30702600

SELECT Region, MIN(Population), MAX(Population) 
  FROM Country 
  GROUP BY Region;

Displaying records 1 - 10
Region	MIN(Population)	MAX(Population)
Antarctica	NA	NA
Australia and New Zealand	600	18886000
Baltic Countries	1439200	3698500
British Islands	3775100	59623400
Caribbean	8000	11201000
Central Africa	147000	51654000
Central America	241000	98881000
Eastern Africa	77000	62565000
Eastern Asia	473000	1277558000
Eastern Europe	4380000	146934000

Count of distinct heads of states.

SELECT COUNT(DISTINCT HeadOfState) FROM Country;

1 records
COUNT(DISTINCT HeadOfState)
179

Example of a subselect.

Which albums have tracks that are less than 90 seconds? Only shows album IDs.

SELECT DISTINCT album_id FROM track WHERE duration <= 90;

2 records
album_id
11
13

Use above select statement as a subselect. Note the use of parentheses and the IN operator.

SELECT * FROM album
  WHERE id IN (SELECT DISTINCT album_id FROM track WHERE duration <= 90)
;

2 records
id	title	artist	label	released
11	Hendrix in the West	Jimi Hendrix	Polydor	1972-01-00
13	Birds of Fire	Mahavishnu Orchestra	Columbia	1973-03-00

SELECT a.title AS album, a.artist, t.track_number AS seq, t.title, t.duration AS secs
  FROM album AS a
  JOIN (
    SELECT DISTINCT album_id, track_number, duration, title
      FROM track
      WHERE duration <= 90
  ) AS t
    ON t.album_id = a.id
  ORDER BY a.title, t.track_number
;

2 records
album	artist	seq	title	secs
Birds of Fire	Mahavishnu Orchestra	4	Sapphire Bullets of Pure Love	24
Hendrix in the West	Jimi Hendrix	6	Sgt. Pepper’s Lonely Hearts Club Band	76