SQL
SQL (Structured Query Language) is a language for interacting with relational databases in an RDBMS. This page was created with reference to Ben Forta's book on SQL.
Terms
- A table is a structured list of data of a specific, consistent type.
- A schema describes table layout (and other information about the database).
- A table consists of columns and rows.
- Each column is a single field with an associated data type.
- Each row is a record in a table.
- Every row in a table should have some column (or set of columns) that uniquely identified it. This is called a primary key. Primary keys must be unique and shouldn't be reused.
SELECT
The SELECT
statement retrieves information from one or more tables. At
minimum, you must specify what you want to select and from where you want to
select it.
-- Retrieve a single column. SELECT first_name FROM People; -- Retrieve multiple columns. SELECT id, first_name, last_name FROM People; -- Retrieve all columns. SELECT * FROM People;
Unique Values
Use the DISTINCT
keyword to find unique rows. Note that the keyword applies
to all columns, not just the one it precedes.
SELECT DISTINCT first_name FROM People;
Limit Results
Use the LIMIT
keyword to control the number of rows returned. Note that SQL
implementations differ here.
SELECT first_name FROM People LIMIT 5; -- Start from row 5. SELECT first_name FROM People LIMIT 5 OFFSET 5;
Sorting
Use the ORDER BY
clause to sort data retrieved using a SELECT
. Be sure
that the ORDER BY
clause is the last clause in your SELECT
statement.
SELECT first_name FROM People ORDER BY first_name; -- Sort by multiple column names. SELECT id FROM People ORDER BY first_name, last_name; -- Sort in descending order (Z to A). SELECT email FROM People ORDER BY first_name DESC;
Filtering
Use the WHERE
clause to filter data retrieved by the SELECT
statement. The WHERE
clause is specified right after the table name. AND
,
OR
, and NOT
may be used in where clauses. SQL (like most languages)
processes AND
operators before OR
operators. Use parenthesis to
explicitly group related operators.
SELECT first_name, last_name FROM People WHERE age = 21; SELECT first_name FROM People WHERE last_name = 'Penson' AND age <= 30; SELECT first_name FROM People WHERE last_name = 'Penson' OR last_name = 'Bland'; SELECT first_name FROM People WHERE NOT last_name = 'Penson';
The IN
operator is used to specify a range of conditions, any of which can
be matched.
SELECT last_name FROM People WHERE zip IN ('20005', '20036') ORDER BY last_name;
Various operators are allowed. Note that single quotes should be used for string literals.
Operator | Description |
---|---|
= |
Equality |
<> or != |
Non-equality |
< |
Less than |
<= |
Less than or equal to |
!< |
Not less than |
> |
Greater than |
>= |
Greater than or equal to |
!> |
Not greater than |
BETWEEN <a> AND <b> |
Between two specified values |
IS NULL |
Is a NULL value. |
Wildcards
To use wildcards in search clauses, the LIKE
operator must be
used. Wildcard searching can only be used with text fields. The %
sign
matches any number of occurrences of any character. The underscore (_
)
matches any single character.
SELECT email FROM People WHERE email LIKE '%@gmail.com'; -- Find People (babies) with single digit heights. SELECT height FROM People WHERE height LIKE '_ inches';
Calculated Fields
A calculated field is created on-the-fly within a SQL SELECT
statement. Concatenate strings with the ||
operator. SQL supports
mathematical operators like +
, -
, *
, and /
.
SELECT last_name || ', ' || first_name FROM People; SELECT item_price * quantity FROM Orders;
An alias is an alternate name for a field or value. Aliases are assigned
with the AS
keyword.
SELECT last_name || ', ' || first_name AS name FROM People ORDER BY name;
SQL also provides a variety of functions for manipulating values. Unfortunately, functions tend to be very DBMS specific. Unlike SQL statements, SQL functions are not portable. Be sure to look at documentation for your specific SQL implementation!
Aggregation
SQL supports five efficient aggregation functions.
Function | Description |
---|---|
AVG() | Return a column's average value |
COUNT() | Return the number of rows in a column |
MAX() | Return a column's highest value |
MIN() | Return a column's lowest value |
SUM() | Return the sum of a column's values |
The COUNT
function can be used in a few different ways.
-- Count the number of rows in a table. SELECT COUNT(*) FROM People; -- Count the number of rows that have values in a specific column. SELECT COUNT(email) FROM People; -- Count unique values. SELECT COUNT(DISTINCT age) FROM People;
Grouping
GROUP BY
divides data into logical sets so that you can perform aggregate
calculations on each group. Note that groups may be nested by specifying
multiple columns in the GROUP BY
clause. Filter groups with the HAVING
clause. HAVING
filters groups, WHERE
filters rows.
SELECT age, COUNT(*) AS num_people FROM People GROUP BY age; SELECT zip_code, COUNT(*) AS num_people FROM People GROUP BY age HAVING COUNT(*) > 2;
Subqueries
Subqueries are queries embedded into other queries. SQL imposes no limit on
the number of subqueries that can be nested. Note that subquery SELECT
statements can only retrieve a single column. Attempting to retrieve multiple
columns will return an error.
SELECT last_name, first_name FROM People WHERE id IN (SELECT customer_id FROM Orders WHERE product_id = 32);
Joins are usually more performant than subqueries.
Joins
Relational tables are designed so that information is split into multiple
tables, one for each data type. The tables are related to each other through
common values. A join is a mechanism used to associate tables within a
SELECT
statement.
Note that you must use the fully qualified column name (table and column separated by a period) whenever there is a possible ambiguity about which column you are referring to.
Inner Join
The common inner join (or equijoin) is a join based on the testing of equality between two tables.
-- Implicit inner join. SELECT vendor_name, product_name FROM Vendors, Products WHERE Vendors.id = Products.vendor_id; -- Explicit inner join (preferred). SELECT vendor_name, product_name FROM Vendors INNER JOIN Products ON Vendors.id = Products.vendor_id;
Outer Join
Suppose, you're hoping to find the number of products sold by each vendor. Some of these counts might be zero. The outer join lets you include rows with no mutual relation.
SELECT vendor_name, COUNT(*) FROM Vendors LEFT OUTER JOIN Products ON Vendors.id = Products.vendor_id
The RIGHT
or LEFT
keyword must be used to specify the table from which
to include all rows. Some SQL implementations also provide a FULL
keyword
that retrieves all rows from both tables.
Combine Results
A UNION
is composed of two or more SELECT
statements, each separated by
the keyword UNION
. Each query in a UNION
must contain the same columns,
expressions, or aggregate functions.
SELECT first_name, last_name FROM People WHERE state IN ('MA', 'IL') UNION SELECT first_name, last_name FROM People WHERE age = 50;
Note that this specific query could have been performed with multiple WHERE
clauses instead.
INSERT
INSERT
adds rows to a database table. Complete or partial rows may be
inserted.
-- Insert a complete row. INSERT INTO People VALUES ('Elliot', 'Penson', 26, 'DC', 'elliotpenson@gmail.com', NULL); -- Insert a complete row with explicit columns (highly preferred). INSERT INTO People(first_name, last_name, age, state, email, favorite_color) VALUES ('Elliot', 'Penson', 26, 'DC', 'elliotpenson@gmail.com', NULL); -- Insert a partial row. INSERT INTO People(first_name, last_name) VALUES ('Elliot', 'Penson');
UPDATE
The UPDATE
statement modifies data in a table. The format includes the table
name, column names/values, and a filter condition. Omitting the filter
condition will update all rows in the table.
-- Update a column. UPDATE People SET age = 27 WHERE first_name = 'Elliot'; -- Update multiple columns. UPDATE People SET state = 'CA', email = 'elliot@company.com' WHERE first_name = 'Elliot';
DELETE
Remove rows from a table with DELETE
. Be careful to include a WHERE
clause!
-- Delete all Toms. DELETE FROM People WHERE first_name = 'Tom'; -- Delete all rows! DELETE FROM People;
CREATE TABLE
Use the CREATE TABLE
statement to create a table. Specify the name and a
series of column definitions.
CREATE TABLE People ( id INT NOT NULL PRIMARY KEY, first_name TEXT NOT NULL, last_name TEXT NOT NULL, age INT NOT NULL DEFAULT 0, state CHAR(2) NOT NULL, email TEXT NULL, birthdate DATETIME NOT NULL DEFAULT NOW() );
NULL
columns permit NULL
values, NOT NULL
columns do not accept rows
with no value. NULL
. The DEFAULT
keyword is used to specify default
values.
Standard SQL types include INT
, SMALLINT
, REAL
, FLOAT
, CHAR(N)
,
TEXT
, DATE
, TIME
, TIMESTAMP
. Note that some implementations (like
PostgresSQL) provide other types.
Constraints
Relational databases store related data in multiple tables. DBMSs enforce referential integrity by imposing constraints on database tables. Constraints are rules that govern how database data is inserted or manipulated.
Primary Keys
A primary key is a special constraint used to ensure that values in a
column (or set of columns) are unique and never change. No two rows may have
the same primary key value. Use the PRIMARY KEY
keyword in a table
definition to define a primary key.
Foreign Keys
A foreign key is a column whose values must be listed in the primary key of
another table. Foreign keys are essential for ensuring referential
integrity. Use the REFERENCES
keyword to define a foreign key.
CREATE TABLE Orders ( order_number INT NOT NULL PRIMARY KEY, order_date DATETIME NOT NULL, customer_id INT NOT NULL REFERENCES People(id) );
Unique Constraints
Unique constraints are used to ensure that all data in a column is
unique. Unique constraints are similar to primary keys. In contrast, though,
a table can contain multiple constraints, but only one primary key. Unique
constraints can also be modified or updated. Use the UNIQUE
keyword to
define a unique constraint.
Check Constraints
Check constraints ensure that data in a column (or set of columns) meet a set
of criteria. Common criteria include minimum or maximum values, a range, and
specific values. Use the CHECK
keyword to define a check constraint.
CREATE TABLE Order ( order_number INT NOT NULL PRIMARY KEY, product_id CHAR(10) NOT NULL, quantity INT NOT NULL CHECK (quantity > 0), price MONEY NOT NULL );
ALTER TABLE
-- Add a column. ALTER TABLE People ADD city TEXT; -- Remove a column. ALTER TABLE People DROP COLUMN city;
DROP TABLE
DROP TABLE People;
Transactions
Transaction processing is used to maintain database integrity by ensuring that batches of SQL operations execute completely or not at all. If no error occurs, the entire set of statements is committed. If an error occurs, a rollback can restore the database to a known and safe state. Note that individual statements are implicitly a single transaction.
Unfortunately, transaction processing is implemented differently in each
DBMS. In PostgreSQL, a transaction is set up by surrounding the SQL commands
with BEGIN
and COMMIT
. A ROLLBACK
will cancel the updates.
-- PostgreSQL transaction. BEGIN; ... COMMIT;
Savepoints allow you to selectively discard parts of a transaction while committing the rest.
BEGIN; ... SAVEPOINT my_savepoint; ... ROLLBACK TO my_savepoint; ... COMMIT;
Indexes
Searching for specific column values can often be inefficient. The DBMS might have to read every row in the table looking for matches. Indexes are a copy of selected columns organized logically to improve the speed of searching and sorting. These data structures can improve lookup from linear to logarithmic or even constant time.
While indexes improve the performance of retrieval operations, they degrade the performance of data insertion, modification, and deletion. Index data can also take up significant storage space.
CREATE INDEX last_name_index ON People (last_name); DROP INDEX last_name_index;
PostgreSQL
PostgreSQL (or Postgres) is an open-source SQL implementation. See the documentation for more information.
On macOS, PostgreSQL can be installed via homebrew (i.e. brew install
postgres
). PostgreSQL uses a client/server model. Run brew services start
postgresql
to begin the server. Create a new database with createdb
<db-name>
, delete a database with dropdb <db-name>
.
psql
is the PostgreSQL terminal client. This application allows one to
interactively enter, edit, and execute SQL commands. The command takes the
form psql <db-name>
.
Meta-Command | Description |
---|---|
\list | List all databases. |
\connect <database> | Connect to a database. |
\dt | Display all tables in the current database. |
\dn | List schemas (namespaces). |
\d <table> | Describe a table (show columns) |
\q | Quit psql |
Tricks and Examples
Remove Duplicates
Suppose we have a People
table with Id
and Email
columns. The following
command will delete all duplicate email entries keeping entries based on the
smallest ID.
DELETE * FROM People p1 INNER JOIN People p2 WHERE p1.Email = p2.Email AND p1.Id > p2.Id;