# JOINs
Relational databases store data across multiple tables to enforce structure and minimize redundancy. Effective querying of this distributed data necessitates the use of `JOIN` operations. The `JOIN` clause provides a mechanism to consolidate data from two or more tables into a single, cohesive result set.
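When a query selects columns from more than one table, qualify each column with its table name and a dot (`.`), for example `table1.column1`, `table2.column2`, and separate the columns with commas in the `SELECT` list. Qualification is only mandatory when a column name exists in more than one of the joined tables, but it keeps multi-table queries unambiguous.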
This document covers the fundamental `JOIN` types essential for application development:
* `CROSS JOIN`
* `INNER JOIN`
* `LEFT JOIN`
* `RIGHT JOIN`
To illustrate these concepts, establish the following database and tables:
1. Create a database named `demo_joins`:
```sql
CREATE DATABASE demo_joins;
```
2. Switch to the newly created database:
```sql
USE demo_joins;
```
3. Define a `users` table with `id` and `username` columns:
```sql
CREATE TABLE users
(
id INT PRIMARY KEY AUTO_INCREMENT,
username VARCHAR(255) NOT NULL
);
```
4. Define a `posts` table with `id`, `user_id`, and `title` columns. The `user_id` column links posts to users, establishing a one-to-many relationship (one user can have many posts).
```sql
CREATE TABLE posts
(
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT,
title VARCHAR(255) NOT NULL
);
```
Populate these tables with sample data:
```sql
INSERT INTO users
( username )
VALUES
('shinobi'),
('javateamsix'),
('tony'),
('greisi');
```
```sql
INSERT INTO posts
( user_id, title )
VALUES
(1, 'Hello World!'),
(2, 'Getting started with SQL'),
(3, 'SQL is awesome'),
(2, 'MySQL is up!'),
(1, 'SQL - structured query language');
```
With the database and data prepared, explore the application of different join types.
## CROSS JOIN
A `CROSS JOIN` combines every row from the first table with every row from the second table. This generates a Cartesian product of the two tables, without requiring any join condition.
Executing an unqualified `CROSS JOIN` on large tables yields a substantial result set, rarely applicable in typical application logic. Its primary use is often conceptual or in specific scenarios requiring a complete pairing of records from two sources.
Consider joining the `users` and `posts` tables using `CROSS JOIN`:
```sql
SELECT * FROM users CROSS JOIN posts;
```
The output lists every user paired with every post:
| id | username | id | user_id | title |
|:---|:------------|:---|:--------|:-----------------------------|
| 4 | greisi | 1 | 1 | Hello World! |
| 3 | tony | 1 | 1 | Hello World! |
| 2 | javateamsix | 1 | 1 | Hello World! |
| 1 | shinobi | 1 | 1 | Hello World! |
| 4 | greisi | 2 | 2 | Getting started with SQL |
| 3 | tony | 2 | 2 | Getting started with SQL |
| 2 | javateamsix | 2 | 2 | Getting started with SQL |
| 1 | shinobi | 2 | 2 | Getting started with SQL |
| 4 | greisi | 3 | 3 | SQL is awesome |
| 3 | tony | 3 | 3 | SQL is awesome |
| 2 | javateamsix | 3 | 3 | SQL is awesome |
| 1 | shinobi | 3 | 3 | SQL is awesome |
| 4 | greisi | 4 | 2 | MySQL is up! |
| 3 | tony | 4 | 2 | MySQL is up! |
| 2 | javateamsix | 4 | 2 | MySQL is up! |
| 1 | shinobi | 4 | 2 | MySQL is up! |
| 4 | greisi | 5 | 1 | SQL - structured query language |
| 3 | tony | 5 | 1 | SQL - structured query language |
| 2 | javateamsix | 5 | 1 | SQL - structured query language |
| 1 | shinobi | 5 | 1 | SQL - structured query language |
While simple in syntax, an unqualified `CROSS JOIN` generates a row count equal to the product of the row counts of the joined tables. For tables with many records, this operation is computationally expensive and yields a large, often unmanageable, result set. Practical join operations typically employ conditions to relate rows between tables.
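The row-count claim can be checked directly against the sample data. The following is a minimal sketch, assuming the `users` and `posts` tables contain exactly the rows inserted above; the column aliases are illustrative names.

```sql
-- Compare the individual row counts with the size of the Cartesian product.
SELECT
    (SELECT COUNT(*) FROM users) AS user_count,
    (SELECT COUNT(*) FROM posts) AS post_count,
    (SELECT COUNT(*) FROM users CROSS JOIN posts) AS cross_join_count;
```

With 4 users and 5 posts, `cross_join_count` should be 20, which matches the number of rows in the table above.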
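In MySQL, `JOIN`, `CROSS JOIN`, and `INNER JOIN` are syntactic equivalents, so an `INNER JOIN` written without an `ON` clause also produces a Cartesian product. Standard SQL and several other dialects (PostgreSQL, for example) require a join condition for `INNER JOIN`.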
## INNER JOIN
An `INNER JOIN` retrieves rows from both tables where a specified join condition is met. This is the most common type of join, used to combine records that have matching values in a defined relationship column.
Although SQL does not require it for an `INNER JOIN`, defining relationships with primary and foreign keys is standard database design practice and makes join conditions obvious. In this example, `users.id` is the primary key and `posts.user_id` plays the role of a foreign key referencing the `users` table, although the `CREATE TABLE` statement above does not declare a `FOREIGN KEY` constraint.
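A relationship of this kind can also be declared so that the database enforces it. The following is a minimal sketch, assuming MySQL with the InnoDB storage engine; the table name `posts_with_fk` is hypothetical and is not used elsewhere in this document. The examples here keep `posts` unconstrained, because the `RIGHT JOIN` section deliberately inserts a post whose `user_id` has no matching user.

```sql
-- Hypothetical variant of the posts table with an enforced relationship.
CREATE TABLE posts_with_fk
(
    id INT PRIMARY KEY AUTO_INCREMENT,
    user_id INT,
    title VARCHAR(255) NOT NULL,
    -- Rejects any non-NULL user_id that does not exist in users.id.
    FOREIGN KEY (user_id) REFERENCES users (id)
);
```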
To retrieve users and their associated posts, matching based on the `user_id` relationship:
```sql
SELECT *
FROM users
INNER JOIN posts
ON users.id = posts.user_id;
```
This query operates as follows:
* `SELECT * FROM users`: Selects all columns from the `users` table.
* `INNER JOIN posts`: Specifies the intention to join with the `posts` table.
* `ON users.id = posts.user_id`: Defines the join condition, matching rows where the `id` from `users` equals the `user_id` from `posts`.
The result set includes only the rows where a user `id` matches a post `user_id`:
| id | username | id | user_id | title |
|:---|:------------|:---|:--------|:-----------------------------|
| 1 | shinobi | 1 | 1 | Hello World! |
| 2 | javateamsix | 2 | 2 | Getting started with SQL |
| 3 | tony | 3 | 3 | SQL is awesome |
| 2 | javateamsix | 4 | 2 | MySQL is up! |
| 1 | shinobi | 5 | 1 | SQL - structured query language |
Note that the user 'greisi', who has no associated posts in the `posts` table, is not included in this result. `INNER JOIN` explicitly excludes rows with no matches in the joined table.
In MySQL, the `INNER` keyword is optional for an `INNER JOIN`, meaning `JOIN` is equivalent.
```sql
SELECT *
FROM users
JOIN posts
ON users.id = posts.user_id;
```
Key takeaways for `INNER JOIN`:
* Requires a join condition specified by `ON`.
* Only returns rows where the condition is met in *both* tables.
* Discards rows that have no match in the other table. Rows whose join key is `NULL` are also dropped, because `NULL` never compares equal to anything.
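The queries above use `SELECT *`, which returns two columns named `id`. The following sketch shows the same join with table-qualified columns instead; the aliases `u` and `p` and the `ORDER BY` clause are illustrative choices, not part of the original example.

```sql
-- Select only the columns of interest, qualified by table alias.
SELECT u.username, p.title
FROM users AS u
INNER JOIN posts AS p
    ON u.id = p.user_id
ORDER BY u.username, p.id;
```

Qualifying each column makes it clear which table a value comes from and avoids ambiguous-column errors when both tables share a column name such as `id`.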
### Types of INNER JOIN
Relational algebra defines specialized forms of `INNER JOIN` based on the join condition:
1. **Theta Join (θ)**: Joins rows from tables based on an arbitrary condition (θ). The condition can use any comparison operator (`<`, `>`, `=`, `<=`, `>=`, `!=`).
Notation: R₁ ⋈<sub>θ</sub> R₂
Example: Select mobile and laptop models where the mobile price is less than the laptop price.
```sql
SELECT mobile.model, laptop.model
FROM mobile, laptop
WHERE mobile.price < laptop.price;
```
2. **Equijoin**: A specific type of Theta Join where the join condition exclusively uses the equality operator (`=`).
Example: Select mobile and laptop models where prices are equal.
```sql
SELECT mobile.model, laptop.model
FROM mobile, laptop
WHERE mobile.price = laptop.price;
```
3. **Natural Join (⋈)**: Joins tables implicitly on all columns that share the same name in both tables. No `ON` clause is used; the database system finds the common column names automatically and compares them for equality. If the tables share no column names, a `NATURAL JOIN` degenerates into a Cartesian product, so at least one common column must exist for the join to be meaningful.
Example: Assume `mobile` and `laptop` tables both have a `price` column.
```sql
SELECT * FROM mobile NATURAL JOIN laptop;
```
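This joins `mobile` and `laptop` on every column name they share. If `price` is the only shared name, the join compares `price` alone; if both tables also have a `model` column, as in the earlier examples, `model` is compared as well.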
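The theta join and equijoin examples above use the comma-separated `FROM` list with a `WHERE` condition. The same joins can be written with explicit join syntax, which keeps the join condition separate from any row filters. This is a sketch only: it assumes `mobile` and `laptop` tables with `model` and `price` columns, which this document never creates.

```sql
-- Theta join from item 1, written with an explicit JOIN ... ON clause.
SELECT mobile.model, laptop.model
FROM mobile
INNER JOIN laptop
    ON mobile.price < laptop.price;

-- Equijoin on price only, naming the shared column explicitly with USING.
SELECT mobile.model, laptop.model
FROM mobile
INNER JOIN laptop USING (price);
```

Unlike `NATURAL JOIN`, the `USING (price)` form compares only the listed column, so a later schema change that introduces another shared column name does not silently alter the join.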
## LEFT JOIN
A `LEFT JOIN` (or `LEFT OUTER JOIN`) returns all rows from the *left* table (the first table mentioned) and the matched rows from the *right* table. If a row from the left table has no matching row in the right table according to the join condition, columns from the right table in the result set will contain `NULL` values.
This is crucial for retrieving all records from one table, even if corresponding entries do not exist in another. For example, to list all users and any posts they might have, including users with no posts:
```sql
SELECT *
FROM users
LEFT JOIN posts
ON users.id = posts.user_id;
```
The output includes all users, and for those without matching posts, the post-related columns are `NULL`.
| id | username | id | user_id | title |
|:---|:------------|:-----|:--------|:-------------------------------|
| 1 | shinobi | 1 | 1 | Hello World! |
| 2 | javateamsix | 2 | 2 | Getting started with SQL |
| 3 | tony | 3 | 3 | SQL is awesome |
| 2 | javateamsix | 4 | 2 | MySQL is up! |
| 1 | shinobi | 5 | 1 | SQL - structured query language|
| 4 | greisi | NULL | NULL | NULL |
The user 'greisi' appears in the result set because they are in the left table (`users`), even though they have no matching record in the right table (`posts`).
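Because unmatched left rows carry `NULL` in every right-table column, a `LEFT JOIN` can also be used to find rows that have no counterpart at all. The following is a minimal sketch of that pattern against the sample tables; testing `posts.id` is a deliberate choice, since a primary key is never `NULL` in a real matched row.

```sql
-- Users that have no posts: keep only the rows where the right side is NULL.
SELECT users.id, users.username
FROM users
LEFT JOIN posts
    ON users.id = posts.user_id
WHERE posts.id IS NULL;
```

With the sample data, this should return only the user 'greisi'.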
## RIGHT JOIN
A `RIGHT JOIN` (or `RIGHT OUTER JOIN`) is the inverse of `LEFT JOIN`. It returns all rows from the *right* table and the matched rows from the *left* table. If a row from the right table has no match in the left table based on the join condition, columns from the left table in the result set will be `NULL`.
Consider a post entry without a corresponding user. Add such a post:
```sql
INSERT INTO posts
( user_id, title )
VALUES
(123, 'No user post!');
```
User ID `123` does not exist in the `users` table.
A `LEFT JOIN` with `users` as the left table would exclude this 'No user post!' entry. A `RIGHT JOIN`, however, includes it, because `posts` is the right table:
```sql
SELECT *
FROM users
RIGHT JOIN posts
ON users.id = posts.user_id;
```
The output includes all posts, even the one without a matching user. The 'greisi' user, lacking posts, is not present.
| id | username | id | user_id | title |
|:-----|:------------|:---|:--------|:-------------------------------|
| 1 | shinobi | 1 | 1 | Hello World! |
| 2 | javateamsix | 2 | 2 | Getting started with SQL |
| 3 | tony | 3 | 3 | SQL is awesome |
| 2 | javateamsix | 4 | 2 | MySQL is up! |
| 1 | shinobi | 5 | 1 | SQL - structured query language|
| NULL | NULL | 6 | 123 | No user post! |
Joins can be combined with `WHERE` clauses to filter results *after* the join operation. For example, restricting the previous `RIGHT JOIN` results to only show data related to username 'shinobi':
```sql
SELECT *
FROM users
RIGHT JOIN posts
ON users.id = posts.user_id
WHERE username = 'shinobi';
```
Output shows only the rows where the `username` is 'shinobi' and a match was found in the join. The row for 'No user post!' is excluded because its `username` is `NULL`, failing the `WHERE` condition.
| id | username | id | user_id | title |
|:---|:---------|:---|:--------|:-------------------------------|
| 1 | shinobi | 1 | 1 | Hello World! |
| 1 | shinobi | 5 | 1 | SQL - structured query language|
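The same post-join filtering can isolate the unmatched rows instead. The following sketch, which assumes the 'No user post!' row inserted above is present, lists posts that have no corresponding user by keeping only rows where the left-table key is `NULL`.

```sql
-- Posts whose user_id does not match any user.
SELECT posts.id, posts.user_id, posts.title
FROM users
RIGHT JOIN posts
    ON users.id = posts.user_id
WHERE users.id IS NULL;
```

With the sample data, this should return only the 'No user post!' row.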
## The Impact of Conditions: JOIN vs. WHERE
The placement of conditions significantly alters query results, particularly in `LEFT` and `RIGHT` joins. Conditions in the `ON` clause filter data *during* the join, while conditions in the `WHERE` clause filter data *after* the join is complete.
Consider retrieving posts with "SQL" in their title, along with user data:
Placing the condition in the `WHERE` clause:
```sql
SELECT users.*, posts.*
FROM users
LEFT JOIN posts
ON posts.user_id = users.id
WHERE posts.title LIKE '%SQL%';
```
This performs the `LEFT JOIN` first (including all users, even those without matching posts) and then filters the joined rows, keeping only those where `posts.title` contains "SQL". A user without posts cannot survive this filter: their `posts.title` is `NULL`, and `NULL LIKE '%SQL%'` never evaluates to true. The `WHERE` condition therefore makes this `LEFT JOIN` behave like an `INNER JOIN`.
| id | username | id | user_id | title |
|:---|:------------|:---|:--------|:-----------------------------|
| 2 | javateamsix | 2 | 2 | Getting started with SQL |
| 3 | tony | 3 | 3 | SQL is awesome |
| 2 | javateamsix | 4 | 2 | MySQL is up! |
| 1 | shinobi | 5 | 1 | SQL - structured query language |
Shifting the condition to the `ON` clause:
```sql
SELECT users.*, posts.*
FROM users
LEFT JOIN posts
ON posts.user_id = users.id
AND posts.title LIKE '%SQL%';
```
This query incorporates the `posts.title` condition into the join process itself. The `LEFT JOIN` proceeds, including all users. However, for the *right* side (posts), it only matches posts where the `user_id` corresponds *AND* the `title` contains "SQL". If a user has posts but none contain "SQL", those post columns will be `NULL` in the result for that user.
| id | username | id | user_id | title |
|:---|:------------|:-----|:--------|:-------------------------------|
| 1 | shinobi | 5 | 1 | SQL - structured query language|
| 2 | javateamsix | 4 | 2 | MySQL is up! |
| 2 | javateamsix | 2 | 2 | Getting started with SQL |
| 3 | tony | 3 | 3 | SQL is awesome |
| 4 | greisi | NULL | NULL | NULL |
Notice the difference: with the condition in the `ON` clause, user 'greisi' still appears (with `NULL` post data), because the `LEFT JOIN` guarantees that every row from the left table (`users`) is included. Post records appear only if they satisfy the full `ON` condition.
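In the sample data, every user who has posts has at least one post containing "SQL", so the case of a user whose posts all fail the `ON` condition does not show up above. The following sketch makes it visible by switching to a different, purely illustrative pattern (`'%World%'`), which only the 'Hello World!' post matches.

```sql
-- Only shinobi's 'Hello World!' post satisfies the extra ON condition.
SELECT users.username, posts.title
FROM users
LEFT JOIN posts
    ON posts.user_id = users.id
    AND posts.title LIKE '%World%'
ORDER BY users.id;
```

With the sample data, 'shinobi' should be paired with 'Hello World!', while 'javateamsix', 'tony', and 'greisi' each appear once with a `NULL` title. Moving the same condition to a `WHERE` clause would leave only the 'shinobi' row.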
## Equivalence of RIGHT and LEFT JOINs
`RIGHT JOIN` and `LEFT JOIN` operations are functionally interchangeable. Any query written with a `RIGHT JOIN` can be rewritten as a `LEFT JOIN` by simply swapping the order of the tables in the `FROM` clause.
This `LEFT JOIN`:
```sql
SELECT users.*, posts.*
FROM posts
LEFT JOIN users
ON posts.user_id = users.id;
```
Is equivalent to this `RIGHT JOIN`:
```sql
SELECT users.*, posts.*
FROM users
RIGHT JOIN posts
ON posts.user_id = users.id;
```
Both queries produce the same result set, illustrating that the direction of the "outer" join is determined by the order of the tables and the `LEFT` or `RIGHT` keyword.
## Conclusion
Mastering `JOIN` operations is foundational for effective data retrieval and manipulation in relational databases. Each join type serves a distinct purpose in combining data based on defined relationships between tables. Understanding their behavior, particularly concerning matching records and the treatment of non-matching rows (`NULL` values), is critical.
Experimentation is key to solidifying comprehension. Practice writing queries with each `JOIN` type, varying the join conditions and observing the resulting data sets.
For comprehensive technical specifications, refer to the official database documentation for your specific SQL dialect.