About SQLite

  • SQLite is an effective solution for data too large for RAM but small enough to fit on a local hard drive.
  • SQLite is serverless unlike client/server databases such as MySQL, PostgreSQL, SQL Server, and Oracle. SQLite does not need to be “installed” before it is used. There is no “setup” procedure.
  • A SQLite database is a single ordinary disk file that can be located anywhere in the directory hierarchy.
  • SQLite is a distinct SQL implementation. SQL code you find online to accomplish a task may not work in SQLite. For example, SQLite has no DATETIME datatype.

Installing SQLite

Windows

  1. Go to https://www.sqlite.org/download.html and download the 64-bit DLL and the command-line tools bundle.
  2. Extract files and copy into a new folder. Suggestion: copy into a folder you create called “SQLite” at the top level of the C drive or root directory.
  3. Add SQLite directory to path: (Win 10) Control Panel > User Accounts > User Accounts, go to “Change my environment variables”, click Path, click Edit, Click New, add C:\SQLite entry, OK, OK.

Macs

Macs should already have SQLite installed.

Running SQLite from command line

  • Open terminal or cmd.exe and enter sqlite3 followed by name of database you wish to open.
  • Running sqlite3 by itself creates a transient in-memory database.
  • The Introduction to SQLite (Jan 2023) course by Brian Jepson on LinkedIn Learning teaches SQLite using the command line.
  • SQLite provides special commands to sqlite3 called “dot-commands”: https://www.sqlite.org/cli.html
  • .tables shows all tables in current database
  • .exit exits SQLite

SQLite GUI interfaces

  • There are programs available that provide a GUI to make it easier to browse database tables, write SQL syntax, etc.
  • Two that I have used:
  • Both install SQLite when installed. However they tend to use older versions of SQLite.
  • DB Browser for SQLite is similar to RStudio in that you can write SQL syntax and click Ctrl + Enter to submit and execute code.
  • The Learning SQL Programming (March 2021) course by Scott Simpson on LinkedIn Learning teaches SQLite using DB Browser for SQLite.
  • The SQL Essential Training (Dec 2019) course by Bill Weinman on LinkedIn Learning teaches SQLite using SQLiteStudio.

Creating a database

  • Using command line: navigate to where you want to create database and enter sqlite3 name_of_database.db
  • Using DB Browser for SQLite: File…New Database…, browse to where you want to create database, enter name, and hit Enter.

Next you need to add tables to the database.

Creating tables

By import, no existing table

If you have data in a CSV file and no existing table, you can do the following from the sqlite3 command line:

sqlite> .mode csv
sqlite> .import file.csv table_name

This will import the contents of the CSV file into a new table called “table_name”. This seems to import everything as TEXT.

To edit table schema using GUIs:

  • DB Browser for SQLite: Click on table, click Modify Table button, and edit properties. Can modify multiple variables at once.
  • SQLiteStudio: go to Table, go to Structure tab, doubleclick on variable, and edit properties. Have to do one variable at a time.

Create empty table

You can also define a table and then populate it. Text preceded with -- below are comments.

Example of creating an empty table:

CREATE TABLE t1
  (id  INTEGER,   -- variable name followed by storage type
    y   NUMERIC,
    x1  INTEGER,
    x2  REAL,   
    x3  TEXT);

Can use .schema command at SQLite command line to view and verify table definition. Then you can import a CSV file into the table using the steps above.

Repeat for multiple tables as needed.

SQLite has 5 data types: INTERGER, TEXT, NUMERIC, REAL, BLOB. See: https://www.sqlite.org/datatype3.html

Submitting queries

  • Queries can be submitted one at a time from SQLite command line or using the GUIs.
  • Multiple queries can be written/saved as txt in a .sql file.
  • In DB Browser for SQLite: go to Execute SQL tab, write query, and hit Ctrl + Enter
  • In SQLiteStudio: Open SQL Editor (Tools…Open SQL Editor), write query, and hit F9
  • The DB Browser for SQLite editor provides code completion, which is very handy.
  • DB Browser for SQLite makes it easy to export a query result to CSV. Click on “Save the Results View” button and select “Export to CSV”

Connect to SQLite db - R

The RSQLite package provides an interface to SQLite databases. Per the RSQLite vignette: “RSQLite is a DBI-compatible interface which means you primarily use functions defined in the DBI package, so you should always start by loading DBI, not RSQLite.”

Here’s an example of accessing a SQLite database called “homes” that contains real estate data from Albemarle County in Virginia, USA. We query the parcels table for records where LotSize is greater than 500 acres. The query is issued as a text string using the dbGetQuery() function. The result is saved as d, a data frame, which has 56 rows and 32 variables.

library(DBI)
mydb <- dbConnect(RSQLite::SQLite(), "homes.db")
d <- dbGetQuery(mydb, 'select * from parcels where LotSize > 500;')
dim(d)
[1] 56 32
class(d)
[1] "data.frame"

It’s good practice to close the database connection when finished. “This closes the connection, discards all pending work, and frees resources (e.g., memory, sockets).” (see ?dbDisconnect)

dbDisconnect(mydb)

Connect to SQLite db - Python

The sqlite3 module provides an interface to SQLite databases.

Below we replicate the example from the R section above using the pandas function read_sql_query(). The result is a pandas data frame with 56 rows and 32 variables.

import pandas as pd
import sqlite3
con = sqlite3.connect("homes.db")
df = pd.read_sql_query("select * from parcels where LotSize > 500;", con)
df.shape
(56, 32)

It’s good practice to close the database connection when finished.

con.close()

Example queries/SQL code

These examples come from LinkedIn courses.

Introduction to SQLite

chinook <- dbConnect(RSQLite::SQLite(), "courses/intro_to_sqlite/Chinook_Learning_Sqlite.sqlite")

Select 10 records from Invoice table, only show first 10.

SELECT * FROM Invoice LIMIT 10;
Displaying records 1 - 10
InvoiceId CustomerId InvoiceDate BillingAddress BillingCity BillingState BillingCountry BillingPostalCode Total
1 2 2023-01-01 00:00:00 Theodor-Heuss-Straße 34 Stuttgart NA Germany 70174 1.98
2 4 2023-01-02 00:00:00 Ullevålsveien 14 Oslo NA Norway 0171 3.96
3 8 2023-01-03 00:00:00 Grétrystraat 63 Brussels NA Belgium 1000 5.94
4 14 2023-01-06 00:00:00 8210 111 ST NW Edmonton AB Canada T6G 2C7 8.91
5 23 2023-01-11 00:00:00 69 Salem Street Boston MA USA 2113 13.86
6 37 2023-01-19 00:00:00 Berger Straße 10 Frankfurt NA Germany 60316 0.99
7 38 2023-02-01 00:00:00 Barbarossastraße 19 Berlin NA Germany 10779 1.98
8 40 2023-02-01 00:00:00 8, Rue Hanovre Paris NA France 75002 1.98
9 42 2023-02-02 00:00:00 9, Place Louis Barthou Bordeaux NA France 33000 3.96
10 46 2023-02-03 00:00:00 3 Chatham Street Dublin Dublin Ireland NA 5.94

To output query result to csv file using sqlite3 command line, run the following:

.mode csv
.header on
.output filename.csv 

Output of next query is sent to filename.csv

To restore default output:

.output stdout
.mode list

To see file from sqlite3 command line:

.shell filename.csv

Examples of using WHERE clause

SELECT * FROM Genre WHERE name LIKE 'S%';
3 records
GenreId Name
10 Soundtrack
18 Science Fiction
20 Sci Fi & Fantasy
SELECT * FROM Invoice WHERE Total > 10;
Displaying records 1 - 10
InvoiceId CustomerId InvoiceDate BillingAddress BillingCity BillingState BillingCountry BillingPostalCode Total
5 23 2023-01-11 00:00:00 69 Salem Street Boston MA USA 2113 13.86
12 2 2023-02-11 00:00:00 Theodor-Heuss-Straße 34 Stuttgart NA Germany 70174 13.86
19 40 2023-03-14 00:00:00 8, Rue Hanovre Paris NA France 75002 13.86
26 19 2023-04-14 00:00:00 1 Infinite Loop Cupertino CA USA 95014 13.86
33 57 2023-05-15 00:00:00 Calle Lira, 198 Santiago NA Chile NA 13.86
40 36 2023-06-15 00:00:00 Tauentzienstraße 8 Berlin NA Germany 10789 13.86
47 15 2023-07-16 00:00:00 700 W Pender Street Vancouver BC Canada V6C 1G8 13.86
54 53 2023-08-16 00:00:00 113 Lupus St London NA United Kingdom SW1V 3EN 13.86
61 32 2023-09-16 00:00:00 696 Osborne Street Winnipeg MB Canada R3L 2B9 13.86
68 11 2023-10-17 00:00:00 Av. Paulista, 2022 São Paulo SP Brazil 01310-200 13.86
SELECT *
FROM Invoice
WHERE Total > 10
AND InvoiceDate BETWEEN '2024-01-01' AND '2024-03-31';
6 records
InvoiceId CustomerId InvoiceDate BillingAddress BillingCity BillingState BillingCountry BillingPostalCode Total
250 55 2024-01-01 00:00:00 421 Bourke Street Sidney NSW Australia 2010 13.86
257 34 2024-02-01 00:00:00 Rua da Assunção 53 Lisbon NA Portugal NA 13.86
264 13 2024-03-03 00:00:00 Qe 7 Bloco G Brasília DF Brazil 71020-677 13.86
334 39 2024-01-07 00:00:00 4, Rue Milton Paris NA France 75009 13.86
341 18 2024-02-07 00:00:00 627 Broadway New York NY USA 10012-2612 13.86
348 56 2024-03-10 00:00:00 307 Macacha Güemes Buenos Aires NA Argentina 1106 13.86
SELECT * FROM Genre WHERE LOWER(Name) = 'rock';
1 records
GenreId Name
1 Rock
SELECT * FROM Genre WHERE LOWER(Name) LIKE '%rock%';
2 records
GenreId Name
1 Rock
5 Rock And Roll

Examples of queries in aggregate.

See also: https://sqlite.org/lang_aggfunc.html

Sum of Total, returned as OverallTotal.

SELECT SUM(Total) AS OverallTotal
FROM Invoice;
1 records
OverallTotal
2328.6

Sum of Total by InvoiceDate.

SELECT InvoiceDate, SUM(Total) AS DailyTotal
FROM Invoice
GROUP BY InvoiceDate;
Displaying records 1 - 10
InvoiceDate DailyTotal
2023-01-01 00:00:00 1.98
2023-01-02 00:00:00 4.95
2023-01-03 00:00:00 5.94
2023-01-06 00:00:00 8.91
2023-01-08 00:00:00 3.96
2023-01-09 00:00:00 3.96
2023-01-10 00:00:00 6.94
2023-01-11 00:00:00 13.86
2023-01-13 00:00:00 17.91
2023-01-15 00:00:00 3.96

Sum of Total by InvoiceDate, sorted by InvoiceDate.

SELECT InvoiceDate, SUM(Total) AS DailyTotal
FROM Invoice
GROUP BY InvoiceDate
ORDER BY InvoiceDate;
Displaying records 1 - 10
InvoiceDate DailyTotal
2023-01-01 00:00:00 1.98
2023-01-02 00:00:00 4.95
2023-01-03 00:00:00 5.94
2023-01-06 00:00:00 8.91
2023-01-08 00:00:00 3.96
2023-01-09 00:00:00 3.96
2023-01-10 00:00:00 6.94
2023-01-11 00:00:00 13.86
2023-01-13 00:00:00 17.91
2023-01-15 00:00:00 3.96

Sum of Total by InvoiceDate, sorted by InvoiceDate in descending order.

SELECT InvoiceDate, SUM(Total) AS DailyTotal
FROM Invoice
GROUP BY InvoiceDate
ORDER BY SUM(Total) DESC;
Displaying records 1 - 10
InvoiceDate DailyTotal
2024-11-13 00:00:00 25.86
2024-08-05 00:00:00 23.86
2023-04-28 00:00:00 21.86
2023-02-18 00:00:00 21.86
2023-05-29 00:00:00 18.86
2023-01-18 00:00:00 18.86
2023-01-13 00:00:00 17.91
2024-10-06 00:00:00 16.86
2024-09-05 00:00:00 16.86
2023-06-29 00:00:00 15.86

Sum of Total by InvoiceDate having total > 10, sorted by InvoiceDate in descending order.

SELECT InvoiceDate, SUM(Total) AS DailyTotal
FROM Invoice
GROUP BY InvoiceDate
HAVING SUM(Total) > 10
ORDER BY SUM(Total) DESC;
Displaying records 1 - 10
InvoiceDate DailyTotal
2024-11-13 00:00:00 25.86
2024-08-05 00:00:00 23.86
2023-04-28 00:00:00 21.86
2023-02-18 00:00:00 21.86
2023-05-29 00:00:00 18.86
2023-01-18 00:00:00 18.86
2023-01-13 00:00:00 17.91
2024-10-06 00:00:00 16.86
2024-09-05 00:00:00 16.86
2023-06-29 00:00:00 15.86

Sum of Total and count of Invoices by InvoiceDate and Customer ID

SELECT InvoiceDate,
       CustomerId,
       SUM(Total) AS DailyTotal,
       COUNT(InvoiceId) AS InvoiceCount
FROM Invoice
GROUP BY InvoiceDate, CustomerId;
Displaying records 1 - 10
InvoiceDate CustomerId DailyTotal InvoiceCount
2023-01-01 00:00:00 2 1.98 1
2023-01-02 00:00:00 4 3.96 1
2023-01-02 00:00:00 26 0.99 1
2023-01-03 00:00:00 8 5.94 1
2023-01-06 00:00:00 14 8.91 1
2023-01-08 00:00:00 43 1.98 1
2023-01-08 00:00:00 45 1.98 1
2023-01-09 00:00:00 47 3.96 1
2023-01-10 00:00:00 51 6.94 1
2023-01-11 00:00:00 23 13.86 1

Examples of JOINS

Inner join: Get the count of invoice IDs for every customer who has a company name associated with their record in the customer table. The WHERE company is NOT NULL clause takes care of only selecting customers with company names.

SELECT Company, COUNT(InvoiceID)
FROM Customer
JOIN Invoice ON Invoice.CustomerId = Customer.CustomerId
WHERE Company IS NOT NULL
GROUP BY Company;
Displaying records 1 - 10
Company COUNT(InvoiceID)
Apple Inc. 7
Banco do Brasil S.A. 7
Embraer - Empresa Brasileira de Aeronáutica S.A. 7
Google Inc. 7
JetBrains s.r.o. 7
Microsoft Corporation 7
Riotur 7
Rogers Canada 7
Telus 7
Woodstock Discos 7

Left Join: Get results even if customer ID values fall outside of the intersection. So in the case of a customer that has no matching invoice records, such as LinkedIn, the invoice columns are represented as NULLS, and so the COUNT invoice ID calculation returns zero.

SELECT Company, COUNT(InvoiceID)
FROM Customer
LEFT OUTER JOIN Invoice ON Invoice.CustomerId = Customer.CustomerId
WHERE Company IS NOT NULL
GROUP BY Company;
Displaying records 1 - 10
Company COUNT(InvoiceID)
Apple Inc. 7
Banco do Brasil S.A. 7
Embraer - Empresa Brasileira de Aeronáutica S.A. 7
Google Inc. 7
JetBrains s.r.o. 7
LinkedIn 0
Microsoft Corporation 7
Riotur 7
Rogers Canada 7
Telus 7

Examples of basic functions

SELECT ROUND(1.2345678, 3);
1 records
ROUND(1.2345678, 3)
1.235
SELECT INSTR(BillingAddress, ' ') FROM Invoice LIMIT 5;
5 records
INSTR(BillingAddress, ’ ’)
21
14
13
5
3
SELECT SUBSTR(BillingAddress, 0, INSTR(BillingAddress, ' '))
FROM Invoice LIMIT 5;
5 records
SUBSTR(BillingAddress, 0, INSTR(BillingAddress, ’ ’))
Theodor-Heuss-Straße
Ullevålsveien
Grétrystraat
8210
69
SELECT FORMAT('$%f', SUM(Total)) FROM Invoice;
1 records
FORMAT(‘$%f’, SUM(Total))
$2328.600000
SELECT FORMAT('$%.2f', SUM(Total)) FROM Invoice;
1 records
FORMAT(‘$%.2f’, SUM(Total))
$2328.60
SELECT InvoiceDate, strftime('%Y-%m-%d', InvoiceDate)
FROM Invoice LIMIT 10;
Displaying records 1 - 10
InvoiceDate strftime(‘%Y-%m-%d’, InvoiceDate)
2023-01-01 00:00:00 2023-01-01
2023-01-02 00:00:00 2023-01-02
2023-01-03 00:00:00 2023-01-03
2023-01-06 00:00:00 2023-01-06
2023-01-11 00:00:00 2023-01-11
2023-01-19 00:00:00 2023-01-19
2023-02-01 00:00:00 2023-02-01
2023-02-01 00:00:00 2023-02-01
2023-02-02 00:00:00 2023-02-02
2023-02-03 00:00:00 2023-02-03

Example of aggregate functions

Count number of rows.

SELECT COUNT(*) FROM Invoice;
1 records
COUNT(*)
412

Count number of non Null values for a given column.

SELECT COUNT(BillingPostalCode) FROM Invoice;
1 records
COUNT(BillingPostalCode)
384

Number of distinct postal codes in Invoice table

SELECT COUNT(DISTINCT BillingPostalCode) FROM Invoice;
1 records
COUNT(DISTINCT BillingPostalCode)
55

Number of customers in each country.

SELECT Country,
       COUNT(DISTINCT CustomerId) AS num
FROM Customer
GROUP BY Country;
Displaying records 1 - 10
Country num
NA 1
Argentina 1
Australia 1
Austria 1
Belgium 1
Brazil 5
Canada 8
Chile 1
Czech Republic 2
Denmark 1

Average of previouq query

SELECT AVG(num)
FROM (SELECT Country,
             COUNT(DISTINCT CustomerId) AS num
      FROM Customer
      GROUP BY Country
);
1 records
AVG(num)
2.4

Get min and max as well.

SELECT AVG(num) AS avg_num,
       MIN(num) AS min_num,
       MAX(num) AS max_num
FROM (SELECT Country,
             COUNT(DISTINCT CustomerId) AS num
      FROM Customer
      GROUP BY Country
);
1 records
avg_num min_num max_num
2.4 1 13

Another handy aggregate function is group_concat. You can use this to produce a comma separated list of aggregate values.

SELECT Country,
       GROUP_CONCAT(DISTINCT State) AS States
FROM Customer
GROUP BY Country;
Displaying records 1 - 10
Country States
NA NA
Argentina NA
Australia NSW
Austria NA
Belgium NA
Brazil SP,RJ,DF
Canada QC,AB,BC,ON,NS,MB,NT
Chile NA
Czech Republic NA
Denmark NA

Learning SQL Programming

quizdata <- dbConnect(RSQLite::SQLite(), "courses/learning_sql_programming/quizdata.db")

More examples of JOIN

Let’s join up our quiz participants’ names and state abbreviations with the full state name from the states table.

SELECT people.first_name, people.last_name, people.state_code, states.state_name 
FROM people 
JOIN states 
ON people.state_code=states.state_abbrev;
Displaying records 1 - 10
first_name last_name state_code state_name
Janice Howell CA California
Wanda Alvarez CA California
Laura Olson CA California
Jack Garcia NY New York
Ryan Rice DE Delaware
Christine Wood MI Michigan
Dennis Banks NY New York
Paula Montgomery AL Alabama
Jerry Ferguson IL Illinois
Darren Black FL Florida

We can switch up the order that we’re joining the tables too, and we’ll see that we get a different order of records. I’ll switch the states and people tables here around the join keyword.

SELECT people.first_name, people.last_name, people.state_code, states.state_name 
FROM states 
JOIN people 
ON people.state_code=states.state_abbrev;
Displaying records 1 - 10
first_name last_name state_code state_name
Alice Roberts AL Alabama
Ann Holmes AL Alabama
Anna Foster AL Alabama
Arthur Carroll AL Alabama
Brenda Green AL Alabama
Brian Berry AL Alabama
Charles Diaz AL Alabama
Charles Torres AL Alabama
Earl Snyder AL Alabama
Edward Hart AL Alabama

Now that the states table is on the left and the people table is on the right, let’s switch this to a left join so we get all of the records from the states table instead of just the ones that match. (No quiz participants from WY.)

SELECT people.first_name, people.last_name, people.state_code, states.state_name 
FROM states 
LEFT JOIN people 
ON people.state_code=states.state_abbrev;
Displaying records 1 - 10
first_name last_name state_code state_name
Alice Roberts AL Alabama
Ann Holmes AL Alabama
Anna Foster AL Alabama
Arthur Carroll AL Alabama
Brenda Green AL Alabama
Brian Berry AL Alabama
Charles Diaz AL Alabama
Charles Torres AL Alabama
Earl Snyder AL Alabama
Edward Hart AL Alabama

Let’s write a query to match up the state abbreviations and the people table with those in the states table so we can see what else is missing. I don’t need all these rows, so I’ll use the distinct function to throw out duplicates and reduce the size of our result.

SELECT DISTINCT(people.state_code), states.state_name 
FROM states 
LEFT JOIN people ON people.state_code=states.state_abbrev
ORDER BY people.state_code;
Displaying records 1 - 10
state_code state_name
NA Maine
NA Rhode Island
NA Wyoming
AK Alaska
AL Alabama
AR Arkansas
AZ Arizona
CA California
CO Colorado
CT Connecticut

Example of grouping results

How many times each first name appears in the table.

SELECT first_name, COUNT(first_name) 
FROM people 
GROUP BY first_name;
Displaying records 1 - 10
first_name COUNT(first_name)
Aaron 3
Adam 6
Alan 2
Albert 1
Alice 5
Amy 7
Andrea 4
Andrew 8
Angela 6
Ann 6

How many participants come from each state.

SELECT state_code, COUNT(state_code) 
FROM people 
GROUP BY state_code;
Displaying records 1 - 10
state_code COUNT(state_code)
AK 1
AL 24
AR 6
AZ 19
CA 127
CO 28
CT 14
DE 2
FL 93
GA 27

How many people each state got each score.

SELECT state_code, quiz_points, COUNT(quiz_points)
FROM people
GROUP BY state_code, quiz_points
Displaying records 1 - 10
state_code quiz_points COUNT(quiz_points)
AK 93 1
AL 74 1
AL 76 2
AL 79 2
AL 81 2
AL 82 1
AL 83 2
AL 85 3
AL 86 2
AL 87 1

Example of compound select

Sometimes we need to use more than one select statement to get the information we want. It’s often the case that we want to set up a query that relies on the result of another query. Using a secondary select statement inside of another query is called a sub-query or sub-select, and it helps us to narrow down or zero in on a specific set of data to use in that larger query.

Get a listing of all the people in our database who achieved the highest score on our quiz.

SELECT first_name, last_name, quiz_points 
FROM people 
WHERE quiz_points=(SELECT MAX(quiz_points) FROM people);
6 records
first_name last_name quiz_points
Scott Thompson 100
Catherine Rodriguez 100
Kathryn Welch 100
Terry Fernandez 100
Earl Green 100
Julia Lee 100

Find all people from Minnesota without knowing state abbreviation.

SELECT * FROM people 
WHERE state_code=(SELECT state_abbrev FROM states WHERE state_name='Minnesota');
Displaying records 1 - 10
id_number first_name last_name city state_code shirt_or_hat quiz_points team signup age company
78 Roy Rivera Minneapolis MN shirt 81 Baffled Badgers 2021-01-03 47 Hayes, Greenfelder and Schulist
150 Joan Watkins Minneapolis MN hat 85 Baffled Badgers 2021-01-05 23 Morissette Ltd.
253 Darren Phillips Minneapolis MN hat 85 Cosmic Cobras 2021-01-08 53 NA
288 Antonio Hart Young America MN shirt 85 Baffled Badgers 2021-01-09 30 Feest Inc
415 Louis Cox Young America MN hat 90 Cosmic Cobras 2021-01-13 45 Mitchell-Nolan
452 Chris Castillo Duluth MN shirt 76 Angry Ants 2021-01-14 25 NA
458 Ashley Fisher Saint Paul MN shirt 79 Angry Ants 2021-01-15 38 Keeling Ltd.
543 Harry Welch Rochester MN hat 92 Baffled Badgers 2021-01-17 27 Wehner Group
570 Martin Richardson Minneapolis MN hat 87 Baffled Badgers 2021-01-18 24 Ortiz, Lubowitz and Brekke
576 Cynthia Carroll Monticello MN hat 82 Baffled Badgers 2021-01-18 24 Stracke Ltd.

SQL Essential Training

This course uses three databases:

  1. album
  2. test
  3. world
album <- dbConnect(RSQLite::SQLite(),
                   "courses/sql_essential_training/album.db")
test <- dbConnect(RSQLite::SQLite(), 
                  "courses/sql_essential_training/test.db")
world <- dbConnect(RSQLite::SQLite(),
                   "courses/sql_essential_training/world.db")

Quick SELECT example.

SELECT Name, LifeExpectancy AS "Life Expectancy" 
FROM Country ORDER BY Name;
Displaying records 1 - 10
Name Life Expectancy
Afghanistan 45.9
Albania 71.6
Algeria 69.7
American Samoa 75.1
Andorra 83.5
Angola 38.3
Anguilla 76.1
Antarctica NA
Antigua and Barbuda 70.5
Argentina 75.1

Get second five after first five

SELECT Name, Continent, Region 
FROM Country 
WHERE Continent = 'Europe'   -- which rows
ORDER BY Name LIMIT 5 OFFSET 5;  -- only 5, second 5
5 records
Name Continent Region
Bosnia and Herzegovina Europe Southern Europe
Bulgaria Europe Eastern Europe
Croatia Europe Southern Europe
Czech Republic Europe Eastern Europe
Denmark Europe Nordic Countries

Counting Rows

COUNT(*) means count all the rows. Count number of rows where pop > 100,000,000 and Continent = ‘Europe’

SELECT COUNT(*) FROM Country WHERE Population > 100000000 AND Continent = 'Europe' ;
1 records
COUNT(*)
1

COUNT(LifeExpectancy) means count rows where the column has data.

SELECT COUNT(LifeExpectancy) FROM Country;
1 records
COUNT(LifeExpectancy)
222

Countries with pop < 100000 or pop is NULL, listed in descending order by population.

SELECT Name, Continent, Population FROM Country 
  WHERE Population < 100000 OR Population IS NULL ORDER BY Population DESC;
Displaying records 1 - 10
Name Continent Population
Tonga Oceania 99000
Grenada North America 94000
Virgin Islands, U.S. North America 93000
Kiribati Oceania 83000
Andorra Europe 78000
Northern Mariana Islands Oceania 78000
Seychelles Africa 77000
Dominica North America 71000
American Samoa Oceania 68000
Antigua and Barbuda North America 68000

Countries with pop < 100000 and Continent = ‘Oceania’, listed in descending order by population.

SELECT Name, Continent, Population FROM Country 
  WHERE Population < 100000 AND Continent = 'Oceania' ORDER BY Population DESC;
Displaying records 1 - 10
Name Continent Population
Tonga Oceania 99000
Kiribati Oceania 83000
Northern Mariana Islands Oceania 78000
American Samoa Oceania 68000
Marshall Islands Oceania 64000
Cook Islands Oceania 20000
Palau Oceania 19000
Wallis and Futuna Oceania 15000
Nauru Oceania 12000
Tuvalu Oceania 12000

LIKE operator: percent sign is a wildcard; find Names that have anything anything before or after ‘island’

SELECT Name, Continent, Population FROM Country 
  WHERE Name LIKE '%island%' ORDER BY Name;
Displaying records 1 - 10
Name Continent Population
Bouvet Island Antarctica NA
Cayman Islands North America 38000
Christmas Island Oceania 2500
Cocos (Keeling) Islands Oceania 600
Cook Islands Oceania 20000
Falkland Islands South America 2000
Faroe Islands Europe 43000
Fiji Islands Oceania 817000
Heard Island and McDonald Islands Antarctica NA
Marshall Islands Oceania 64000

LIKE operator: all countries with ‘a’ as second letter

SELECT Name, Continent, Population FROM Country 
  WHERE Name LIKE '_a%' ORDER BY Name;
Displaying records 1 - 10
Name Continent Population
Bahamas North America 307000
Bahrain Asia 617000
Bangladesh Asia 129155000
Barbados North America 270000
Cambodia Asia 11168000
Cameroon Africa 15085000
Canada North America 31147000
Cape Verde Africa 428000
Cayman Islands North America 38000
East Timor Asia 885000

IN operator: all countries in Asia or Europe

SELECT Name, Continent, Population FROM Country 
 WHERE Continent IN ( 'Europe', 'Asia' ) ORDER BY Name;
Displaying records 1 - 10
Name Continent Population
Afghanistan Asia 22720000
Albania Europe 3401200
Andorra Europe 78000
Armenia Asia 3520000
Austria Europe 8091800
Azerbaijan Asia 7734000
Bahrain Asia 617000
Bangladesh Asia 129155000
Belarus Europe 10236000
Belgium Europe 10239000

SELECT DISTINCT: Using the SELECT DISTINCT statement, you will get only unique results.

SELECT DISTINCT Continent FROM Country;
7 records
Continent
Asia
Europe
North America
Africa
Oceania
South America
Antarctica

Ordering output. Order by Continent in descending order, then within Continent show Region in ascending order, and within Region show Name in ascending order.

SELECT Name, Continent, Region 
FROM Country 
ORDER BY Continent DESC, Region, Name;
Displaying records 1 - 10
Name Continent Region
Argentina South America South America
Bolivia South America South America
Brazil South America South America
Chile South America South America
Colombia South America South America
Ecuador South America South America
Falkland Islands South America South America
French Guiana South America South America
Guyana South America South America
Paraguay South America South America

Example of a junction table.

SELECT c.name AS Cust, c.zip, i.name AS Item, i.description, s.quantity AS Quan, s.price AS Price
  FROM sale AS s
  JOIN item AS i ON s.item_id = i.id
  JOIN customer AS c ON s.customer_id = c.id
  ORDER BY Cust, Item
;
5 records
Cust zip Item description Quan Price
Bill Smith 98765 Box of 64 Pixels 64 RGB pixels in a decorative box 1 2995
Bob Smith 98765 Bar Code Unused. In original packaging. 2 999
Mary Smith 98765 Box of 64 Pixels 64 RGB pixels in a decorative box 3 2995
Mary Smith 98765 Box of 64 Pixels 64 RGB pixels in a decorative box 1 2995
Mary Smith 98765 Sense of Humor Especially dry. Imported from England. 1 1995

Example of aggregate functions. Count of rows by Region.

SELECT Region, COUNT(*)
  FROM Country
  GROUP BY Region
;
Displaying records 1 - 10
Region COUNT(*)
Antarctica 5
Australia and New Zealand 5
Baltic Countries 3
British Islands 2
Caribbean 24
Central Africa 9
Central America 8
Eastern Africa 20
Eastern Asia 8
Eastern Europe 10
SELECT Region, COUNT(*) AS Count  -- with alias
  FROM Country
  GROUP BY Region
  ORDER BY Count DESC, Region  -- count in descending order
;
Displaying records 1 - 10
Region Count
Caribbean 24
Eastern Africa 20
Middle East 18
Western Africa 17
Southern Europe 15
South America 14
Southern and Central Asia 14
Southeast Asia 11
Eastern Europe 10
Polynesia 10
-- number of tracks for each album
SELECT a.title AS Album, COUNT(t.track_number) as Tracks
  FROM track AS t
  JOIN album AS a
    ON a.id = t.album_id
  GROUP BY a.id
  ORDER BY Tracks DESC, Album
;
7 records
Album Tracks
Rubber Soul 14
Birds of Fire 10
Two Men with the Blues 10
Apostrophe 9
Hendrix in the West 9
Live And 6
Kind of Blue 5

Just albums with 10 more tracks.

-- HAVING is for aggregate data
SELECT a.title AS Album, COUNT(t.track_number) as Tracks
  FROM track AS t
  JOIN album AS a
    ON a.id = t.album_id
  GROUP BY a.id
  HAVING Tracks >= 10
  ORDER BY Tracks DESC, Album
;
3 records
Album Tracks
Rubber Soul 14
Birds of Fire 10
Two Men with the Blues 10

Albums with 10 or more tracks by the Beatles.

-- WHERE is for non-aggregate data
SELECT a.title AS Album, COUNT(t.track_number) as Tracks
  FROM track AS t
  JOIN album AS a
    ON a.id = t.album_id
  WHERE a.artist = 'The Beatles'
  GROUP BY a.id
  HAVING Tracks >= 10
  ORDER BY Tracks DESC, Album
  ;
1 records
Album Tracks
Rubber Soul 14

More aggregate functions.

SELECT Region, AVG(Population) 
FROM Country 
GROUP BY Region;
Displaying records 1 - 10
Region AVG(Population)
Antarctica NA
Australia and New Zealand 4550620
Baltic Countries 2520633
British Islands 31699250
Caribbean 1589167
Central Africa 10628000
Central America 16902625
Eastern Africa 12999947
Eastern Asia 188416000
Eastern Europe 30702600
SELECT Region, MIN(Population), MAX(Population) 
  FROM Country 
  GROUP BY Region;
Displaying records 1 - 10
Region MIN(Population) MAX(Population)
Antarctica NA NA
Australia and New Zealand 600 18886000
Baltic Countries 1439200 3698500
British Islands 3775100 59623400
Caribbean 8000 11201000
Central Africa 147000 51654000
Central America 241000 98881000
Eastern Africa 77000 62565000
Eastern Asia 473000 1277558000
Eastern Europe 4380000 146934000

Count of distinct heads of states.

SELECT COUNT(DISTINCT HeadOfState) FROM Country;
1 records
COUNT(DISTINCT HeadOfState)
179

Example of a subselect.

Which albums have tracks that are less than 90 seconds? Only shows album IDs.

SELECT DISTINCT album_id FROM track WHERE duration <= 90;
2 records
album_id
11
13

Use above select statement as a subselect. Note the use of parentheses and the IN operator.

SELECT * FROM album
  WHERE id IN (SELECT DISTINCT album_id FROM track WHERE duration <= 90)
;
2 records
id title artist label released
11 Hendrix in the West Jimi Hendrix Polydor 1972-01-00
13 Birds of Fire Mahavishnu Orchestra Columbia 1973-03-00
SELECT a.title AS album, a.artist, t.track_number AS seq, t.title, t.duration AS secs
  FROM album AS a
  JOIN (
    SELECT DISTINCT album_id, track_number, duration, title
      FROM track
      WHERE duration <= 90
  ) AS t
    ON t.album_id = a.id
  ORDER BY a.title, t.track_number
;
2 records
album artist seq title secs
Birds of Fire Mahavishnu Orchestra 4 Sapphire Bullets of Pure Love 24
Hendrix in the West Jimi Hendrix 6 Sgt. Pepper’s Lonely Hearts Club Band 76