SQL (Structured Query Language) is a language for interacting with relational databases in an RDBMS. This page was created with reference to Ben Forta's book on SQL.
- A table is a structured list of data of a specific, consistent type.
- A schema describes table layout (and other information about the database).
- A table consists of columns and rows.
- Each column is a single field with an associated data type.
- Each row is a record in a table.
- Every row in a table should have some column (or set of columns) that uniquely identified it. This is called a primary key. Primary keys must be unique and shouldn't be reused.
SELECT statement retrieves information from one or more tables. At
minimum, you must specify what you want to select and from where you want to
-- Retrieve a single column. SELECT first_name FROM People; -- Retrieve multiple columns. SELECT id, first_name, last_name FROM People; -- Retrieve all columns. SELECT * FROM People;
DISTINCT keyword to find unique rows. Note that the keyword applies
to all columns, not just the one it precedes.
SELECT DISTINCT first_name FROM People;
LIMIT keyword to control the number of rows returned. Note that SQL
implementations differ here.
SELECT first_name FROM People LIMIT 5; -- Start from row 5. SELECT first_name FROM People LIMIT 5 OFFSET 5;
ORDER BY clause to sort data retrieved using a
SELECT. Be sure
ORDER BY clause is the last clause in your
SELECT first_name FROM People ORDER BY first_name; -- Sort by multiple column names. SELECT id FROM People ORDER BY first_name, last_name; -- Sort in descending order (Z to A). SELECT email FROM People ORDER BY first_name DESC;
WHERE clause to filter data retrieved by the
WHERE clause is specified right after the table name.
NOT may be used in where clauses. SQL (like most languages)
AND operators before
OR operators. Use parenthesis to
explicitly group related operators.
SELECT first_name, last_name FROM People WHERE age = 21; SELECT first_name FROM People WHERE last_name = 'Penson' AND age <= 30; SELECT first_name FROM People WHERE last_name = 'Penson' OR last_name = 'Bland'; SELECT first_name FROM People WHERE NOT last_name = 'Penson';
IN operator is used to specify a range of conditions, any of which can
SELECT last_name FROM People WHERE zip IN ('20005', '20036') ORDER BY last_name;
Various operators are allowed. Note that single quotes should be used for string literals.
||Less than or equal to|
||Not less than|
||Greater than or equal to|
||Not greater than|
||Between two specified values|
To use wildcards in search clauses, the
LIKE operator must be
used. Wildcard searching can only be used with text fields. The
matches any number of occurrences of any character. The underscore (
matches any single character.
SELECT email FROM People WHERE email LIKE '%@gmail.com'; -- Find People (babies) with single digit heights. SELECT height FROM People WHERE height LIKE '_ inches';
A calculated field is created on-the-fly within a SQL
statement. Concatenate strings with the
|| operator. SQL supports
mathematical operators like
SELECT last_name || ', ' || first_name FROM People; SELECT item_price * quantity FROM Orders;
An alias is an alternate name for a field or value. Aliases are assigned
SELECT last_name || ', ' || first_name AS name FROM People ORDER BY name;
SQL also provides a variety of functions for manipulating values. Unfortunately, functions tend to be very DBMS specific. Unlike SQL statements, SQL functions are not portable. Be sure to look at documentation for your specific SQL implementation!
SQL supports five efficient aggregation functions.
|AVG()||Return a column's average value|
|COUNT()||Return the number of rows in a column|
|MAX()||Return a column's highest value|
|MIN()||Return a column's lowest value|
|SUM()||Return the sum of a column's values|
COUNT function can be used in a few different ways.
-- Count the number of rows in a table. SELECT COUNT(*) FROM People; -- Count the number of rows that have values in a specific column. SELECT COUNT(email) FROM People; -- Count unique values. SELECT COUNT(DISTINCT age) FROM People;
GROUP BY divides data into logical sets so that you can perform aggregate
calculations on each group. Note that groups may be nested by specifying
multiple columns in the
GROUP BY clause. Filter groups with the
HAVING filters groups,
WHERE filters rows.
SELECT age, COUNT(*) AS num_people FROM People GROUP BY age; SELECT zip_code, COUNT(*) AS num_people FROM People GROUP BY age HAVING COUNT(*) > 2;
Subqueries are queries embedded into other queries. SQL imposes no limit on
the number of subqueries that can be nested. Note that subquery
statements can only retrieve a single column. Attempting to retrieve multiple
columns will return an error.
SELECT last_name, first_name FROM People WHERE id IN (SELECT customer_id FROM Orders WHERE product_id = 32);
Joins are usually more performant than subqueries.
Relational tables are designed so that information is split into multiple
tables, one for each data type. The tables are related to each other through
common values. A join is a mechanism used to associate tables within a
Note that you must use the fully qualified column name (table and column separated by a period) whenever there is a possible ambiguity about which column you are referring to.
The common inner join (or equijoin) is a join based on the testing of equality between two tables.
-- Implicit inner join. SELECT vendor_name, product_name FROM Vendors, Products WHERE Vendors.id = Products.vendor_id; -- Explicit inner join (preferred). SELECT vendor_name, product_name FROM Vendors INNER JOIN Products ON Vendors.id = Products.vendor_id;
Suppose, you're hoping to find the number of products sold by each vendor. Some of these counts might be zero. The outer join lets you include rows with no mutual relation.
SELECT vendor_name, COUNT(*) FROM Vendors LEFT OUTER JOIN Products ON Vendors.id = Products.vendor_id
LEFT keyword must be used to specify the table from which
to include all rows. Some SQL implementations also provide a
that retrieves all rows from both tables.
UNION is composed of two or more
SELECT statements, each separated by
UNION. Each query in a
UNION must contain the same columns,
expressions, or aggregate functions.
SELECT first_name, last_name FROM People WHERE state IN ('MA', 'IL') UNION SELECT first_name, last_name FROM People WHERE age = 50;
Note that this specific query could have been performed with multiple
INSERT adds rows to a database table. Complete or partial rows may be
-- Insert a complete row. INSERT INTO People VALUES ('Elliot', 'Penson', 26, 'DC', 'firstname.lastname@example.org', NULL); -- Insert a complete row with explicit columns (highly preferred). INSERT INTO People(first_name, last_name, age, state, email, favorite_color) VALUES ('Elliot', 'Penson', 26, 'DC', 'email@example.com', NULL); -- Insert a partial row. INSERT INTO People(first_name, last_name) VALUES ('Elliot', 'Penson');
UPDATE statement modifies data in a table. The format includes the table
name, column names/values, and a filter condition. Omitting the filter
condition will update all rows in the table.
-- Update a column. UPDATE People SET age = 27 WHERE first_name = 'Elliot'; -- Update multiple columns. UPDATE People SET state = 'CA', email = 'firstname.lastname@example.org' WHERE first_name = 'Elliot';
Remove rows from a table with
DELETE. Be careful to include a
-- Delete all Toms. DELETE FROM People WHERE first_name = 'Tom'; -- Delete all rows! DELETE FROM People;
CREATE TABLE statement to create a table. Specify the name and a
series of column definitions.
CREATE TABLE People ( id INT NOT NULL PRIMARY KEY, first_name TEXT NOT NULL, last_name TEXT NOT NULL, age INT NOT NULL DEFAULT 0, state CHAR(2) NOT NULL, email TEXT NULL, birthdate DATETIME NOT NULL DEFAULT NOW() );
NULL columns permit
NOT NULL columns do not accept rows
with no value.
DEFAULT keyword is used to specify default
Standard SQL types include
TIMESTAMP. Note that some implementations (like
PostgresSQL) provide other types.
Relational databases store related data in multiple tables. DBMSs enforce referential integrity by imposing constraints on database tables. Constraints are rules that govern how database data is inserted or manipulated.
A primary key is a special constraint used to ensure that values in a
column (or set of columns) are unique and never change. No two rows may have
the same primary key value. Use the
PRIMARY KEY keyword in a table
definition to define a primary key.
A foreign key is a column whose values must be listed in the primary key of
another table. Foreign keys are essential for ensuring referential
integrity. Use the
REFERENCES keyword to define a foreign key.
CREATE TABLE Orders ( order_number INT NOT NULL PRIMARY KEY, order_date DATETIME NOT NULL, customer_id INT NOT NULL REFERENCES People(id) );
Unique constraints are used to ensure that all data in a column is
unique. Unique constraints are similar to primary keys. In contrast, though,
a table can contain multiple constraints, but only one primary key. Unique
constraints can also be modified or updated. Use the
UNIQUE keyword to
define a unique constraint.
Check constraints ensure that data in a column (or set of columns) meet a set
of criteria. Common criteria include minimum or maximum values, a range, and
specific values. Use the
CHECK keyword to define a check constraint.
CREATE TABLE Order ( order_number INT NOT NULL PRIMARY KEY, product_id CHAR(10) NOT NULL, quantity INT NOT NULL CHECK (quantity > 0), price MONEY NOT NULL );
-- Add a column. ALTER TABLE People ADD city TEXT; -- Remove a column. ALTER TABLE People DROP COLUMN city;
DROP TABLE People;
Transaction processing is used to maintain database integrity by ensuring that batches of SQL operations execute completely or not at all. If no error occurs, the entire set of statements is committed. If an error occurs, a rollback can restore the database to a known and safe state. Note that individual statements are implicitly a single transaction.
Unfortunately, transaction processing is implemented differently in each
DBMS. In PostgreSQL, a transaction is set up by surrounding the SQL commands
ROLLBACK will cancel the updates.
-- PostgreSQL transaction. BEGIN; ... COMMIT;
Savepoints allow you to selectively discard parts of a transaction while committing the rest.
BEGIN; ... SAVEPOINT my_savepoint; ... ROLLBACK TO my_savepoint; ... COMMIT;
Searching for specific column values can often be inefficient. The DBMS might have to read every row in the table looking for matches. Indexes are a copy of selected columns organized logically to improve the speed of searching and sorting. These data structures can improve lookup from linear to logarithmic or even constant time.
While indexes improve the performance of retrieval operations, they degrade the performance of data insertion, modification, and deletion. Index data can also take up significant storage space.
CREATE INDEX last_name_index ON People (last_name); DROP INDEX last_name_index;
PostgreSQL (or Postgres) is an open-source SQL implementation. See the documentation for more information.
On macOS, PostgreSQL can be installed via homebrew (i.e.
postgres). PostgreSQL uses a client/server model. Run
brew services start
postgresql to begin the server. Create a new database with
<db-name>, delete a database with
psql is the PostgreSQL terminal client. This application allows one to
interactively enter, edit, and execute SQL commands. The command takes the
|\list||List all databases.|
|\connect <database>||Connect to a database.|
|\dt||Display all tables in the current database.|
|\dn||List schemas (namespaces).|
|\d <table>||Describe a table (show columns)|