databases.softwareshinobi.com/landing/docs/SQL-101/011-join.md
Software Shinobi caa5bbb983
2025-06-19 13:04:08 -04:00


JOINs

Relational databases store data across multiple tables to enforce structure and minimize redundancy. Effective querying of this distributed data necessitates the use of JOIN operations. The JOIN clause provides a mechanism to consolidate data from two or more tables into a single, cohesive result set.

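When a query reads from more than one table, qualify each column with its table name and a dot (for example, table1.column1, table2.column2) and separate the columns with commas in the SELECT list. Qualification is mandatory only when a column name exists in more than one of the joined tables, such as id in the tables defined below, but applying it consistently keeps queries unambiguous.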

This document covers the fundamental JOIN types essential for application development:

  • CROSS JOIN
  • INNER JOIN
  • LEFT JOIN
  • RIGHT JOIN

To illustrate these concepts, establish the following database and tables:

  1. Create a database named demo_joins:
CREATE DATABASE demo_joins;
  2. Switch to the newly created database:
USE demo_joins;
  3. Define a users table with id and username columns:
CREATE TABLE users
(
    id INT PRIMARY KEY AUTO_INCREMENT,
    username VARCHAR(255) NOT NULL
);
  4. Define a posts table with id, user_id, and title columns. The user_id column links posts to users, establishing a one-to-many relationship (one user can have many posts).
CREATE TABLE posts
(
    id INT PRIMARY KEY AUTO_INCREMENT,
    user_id INT,
    title VARCHAR(255) NOT NULL
);

Populate these tables with sample data:

INSERT INTO users
( username )
VALUES
('shinobi'),
('javateamsix'),
('tony'),
('greisi');
INSERT INTO posts
( user_id, title )
VALUES
(1, 'Hello World!'),
(2, 'Getting started with SQL'),
(3, 'SQL is awesome'),
(2, 'MySQL is up!'),
(1, 'SQL - structured query language');

With the database and data prepared, explore the application of different join types.
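The table.column qualification described earlier can be tried against this sample data. The sketch below is a stand-in under stated assumptions, not part of the MySQL setup: it recreates the two tables in an in-memory SQLite database through Python's sqlite3 module (so INTEGER, TEXT, and AUTOINCREMENT replace the MySQL types), then shows that a qualified SELECT list works while a bare id is rejected because both tables have a column of that name.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL demo_joins database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER, title TEXT NOT NULL);
    INSERT INTO users (username) VALUES ('shinobi'), ('javateamsix'), ('tony'), ('greisi');
    INSERT INTO posts (user_id, title) VALUES
        (1, 'Hello World!'),
        (2, 'Getting started with SQL'),
        (3, 'SQL is awesome'),
        (2, 'MySQL is up!'),
        (1, 'SQL - structured query language');
""")

# Qualified column names make it explicit which table each column comes from.
qualified = conn.execute(
    "SELECT users.username, posts.title "
    "FROM users JOIN posts ON users.id = posts.user_id "
    "ORDER BY posts.id"
).fetchall()
for username, title in qualified:
    print(username, "|", title)

# An unqualified id could mean users.id or posts.id, so the query is rejected.
try:
    conn.execute("SELECT id FROM users JOIN posts ON users.id = posts.user_id")
    ambiguous_rejected = False
except sqlite3.OperationalError as exc:
    ambiguous_rejected = True
    print("rejected:", exc)
```

MySQL rejects the same unqualified query with its own ambiguous-column error; the exact message text differs between the two engines.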

CROSS JOIN

A CROSS JOIN combines every row from the first table with every row from the second table. This generates a Cartesian product of the two tables, without requiring any join condition.

Executing an unqualified CROSS JOIN on large tables yields a substantial result set, rarely applicable in typical application logic. Its primary use is often conceptual or in specific scenarios requiring a complete pairing of records from two sources.

Consider joining the users and posts tables using CROSS JOIN:

SELECT * FROM users CROSS JOIN posts;

The output lists every user paired with every post:

| id | username | id | user_id | title |
|:---|:------------|:---|:--------|:--------------------------------|
| 4 | greisi | 1 | 1 | Hello World! |
| 3 | tony | 1 | 1 | Hello World! |
| 2 | javateamsix | 1 | 1 | Hello World! |
| 1 | shinobi | 1 | 1 | Hello World! |
| 4 | greisi | 2 | 2 | Getting started with SQL |
| 3 | tony | 2 | 2 | Getting started with SQL |
| 2 | javateamsix | 2 | 2 | Getting started with SQL |
| 1 | shinobi | 2 | 2 | Getting started with SQL |
| 4 | greisi | 3 | 3 | SQL is awesome |
| 3 | tony | 3 | 3 | SQL is awesome |
| 2 | javateamsix | 3 | 3 | SQL is awesome |
| 1 | shinobi | 3 | 3 | SQL is awesome |
| 4 | greisi | 4 | 2 | MySQL is up! |
| 3 | tony | 4 | 2 | MySQL is up! |
| 2 | javateamsix | 4 | 2 | MySQL is up! |
| 1 | shinobi | 4 | 2 | MySQL is up! |
| 4 | greisi | 5 | 1 | SQL - structured query language |
| 3 | tony | 5 | 1 | SQL - structured query language |
| 2 | javateamsix | 5 | 1 | SQL - structured query language |
| 1 | shinobi | 5 | 1 | SQL - structured query language |

While simple in syntax, an unqualified CROSS JOIN generates a row count equal to the product of the row counts of the joined tables: here, 4 users × 5 posts = 20 rows. For tables with many records, this operation is computationally expensive and yields a large, often unmanageable, result set. Practical join operations typically employ conditions to relate rows between tables.

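In MySQL, CROSS JOIN, INNER JOIN, and JOIN are syntactic equivalents, so CROSS JOIN without an ON clause returns the same result as INNER JOIN without one. Standard SQL is stricter: INNER JOIN takes a join condition, and CROSS JOIN does not.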
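The row-count arithmetic of a Cartesian product can be checked directly. The following sketch assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for the MySQL tables above, with SQLite column types substituted for the MySQL ones.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL demo_joins database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER, title TEXT NOT NULL);
    INSERT INTO users (username) VALUES ('shinobi'), ('javateamsix'), ('tony'), ('greisi');
    INSERT INTO posts (user_id, title) VALUES
        (1, 'Hello World!'),
        (2, 'Getting started with SQL'),
        (3, 'SQL is awesome'),
        (2, 'MySQL is up!'),
        (1, 'SQL - structured query language');
""")

user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
post_count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]

# Every user is paired with every post, regardless of posts.user_id.
pairs = conn.execute(
    "SELECT users.username, posts.title FROM users CROSS JOIN posts"
).fetchall()

print(user_count, "users x", post_count, "posts =", len(pairs), "rows")
```

Because the result grows multiplicatively, two tables of 10,000 rows each would already produce 100,000,000 rows.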

INNER JOIN

An INNER JOIN retrieves rows from both tables where a specified join condition is met. This is the most common type of join, used to combine records that have matching values in a defined relationship column.

A join condition can compare any columns, but defining relationships through primary and foreign keys is standard database design practice and gives joins a natural pair of columns to match on. In this example, users.id is the primary key and posts.user_id plays the role of a foreign key referencing the users table. The demo schema does not declare a FOREIGN KEY constraint, so the relationship is a convention here rather than something MySQL enforces.

To retrieve users and their associated posts, matching based on the user_id relationship:

SELECT *
FROM users
INNER JOIN posts
ON users.id = posts.user_id;

This query operates as follows:

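  • SELECT * FROM users: Starts from the users table as the left side of the join. Because of the *, the result contains every column of the joined row, meaning all users columns followed by all posts columns.
  • INNER JOIN posts: Specifies the intention to join with the posts table.
  • ON users.id = posts.user_id: Defines the join condition, matching rows where the id from users equals the user_id from posts.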

The result set includes only the rows where a user id matches a post user_id:

| id | username | id | user_id | title |
|:---|:------------|:---|:--------|:--------------------------------|
| 1 | shinobi | 1 | 1 | Hello World! |
| 2 | javateamsix | 2 | 2 | Getting started with SQL |
| 3 | tony | 3 | 3 | SQL is awesome |
| 2 | javateamsix | 4 | 2 | MySQL is up! |
| 1 | shinobi | 5 | 1 | SQL - structured query language |

Note that the user 'greisi', who has no associated posts in the posts table, is not included in this result. INNER JOIN explicitly excludes rows with no matches in the joined table.

In MySQL, the INNER keyword is optional for an INNER JOIN, meaning JOIN is equivalent.

SELECT *
FROM users
JOIN posts
ON users.id = posts.user_id;

Key takeaways for INNER JOIN:

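  • Takes a join condition, normally specified by ON. MySQL accepts INNER JOIN without one, but the result is then the Cartesian product described under CROSS JOIN.
  • Only returns rows where the condition is met in both tables.
  • Discards rows that find no match in the other table. Rows whose join key is NULL are discarded as well, because comparing NULL with = never evaluates to true.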
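These takeaways can be verified with a small experiment. The sketch below assumes an in-memory SQLite database (Python's sqlite3 module) in place of MySQL, and adds one hypothetical post, 'Draft without author', whose user_id is NULL; that row is not part of the sample data and exists only to show how a NULL join key behaves.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL demo_joins database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER, title TEXT NOT NULL);
    INSERT INTO users (username) VALUES ('shinobi'), ('javateamsix'), ('tony'), ('greisi');
    INSERT INTO posts (user_id, title) VALUES
        (1, 'Hello World!'),
        (2, 'Getting started with SQL'),
        (3, 'SQL is awesome'),
        (2, 'MySQL is up!'),
        (1, 'SQL - structured query language'),
        (NULL, 'Draft without author');  -- hypothetical row with a NULL join key
""")

inner_rows = conn.execute(
    "SELECT users.username, posts.title "
    "FROM users INNER JOIN posts ON users.id = posts.user_id "
    "ORDER BY posts.id"
).fetchall()

usernames = {username for username, _ in inner_rows}
titles = {title for _, title in inner_rows}

print(len(inner_rows), "matched rows")
print("greisi present:", "greisi" in usernames)
print("NULL-key post present:", "Draft without author" in titles)
```

Both the user without posts and the post without a user drop out, since an INNER JOIN keeps only pairs that satisfy the ON condition.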

Types of INNER JOIN

Relational algebra defines specialized forms of INNER JOIN based on the join condition:

  1. Theta Join (θ): Joins rows from two tables based on an arbitrary condition θ. The condition can use any comparison operator (<, >, =, <=, >=, !=). Notation: R₁ ⋈θ R₂

Example: Select mobile and laptop models where the mobile price is less than the laptop price. The query uses the older comma-join syntax with the condition in WHERE; FROM mobile JOIN laptop ON mobile.price < laptop.price is the explicit equivalent.

SELECT mobile.model, laptop.model
FROM mobile, laptop
WHERE mobile.price < laptop.price;
  2. Equijoin: A specific type of Theta Join where the join condition exclusively uses the equality operator (=). The users and posts INNER JOIN shown earlier is an equijoin.

Example: Select mobile and laptop models where prices are equal.

SELECT mobile.model, laptop.model
FROM mobile, laptop
WHERE mobile.price = laptop.price;
  3. Natural Join (⋈): An equijoin performed implicitly on every column name the two tables share. No ON clause is written; the database system finds the common columns by name and returns each of them once in the result. The shared columns need comparable values, not identical declared types. If the tables share no column name, the result degenerates to the Cartesian product rather than an error.

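Example: Assume price is the only column name that the mobile and laptop tables have in common.

SELECT * FROM mobile NATURAL JOIN laptop;

This joins mobile and laptop on their common price column. If both tables also contain a model column, as the theta join and equijoin examples assume, NATURAL JOIN matches on model and price together; use JOIN ... USING (price) or an explicit ON clause to join on price alone.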
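The three join forms above can be compared side by side. The sketch below is illustrative only: the mobile and laptop tables, their model and price columns, and the four sample rows are assumptions, since the text never defines them, and an in-memory SQLite database (Python's sqlite3 module) stands in for MySQL. Both tables deliberately share two column names to show how NATURAL JOIN reacts.

```python
import sqlite3

# Assumption: hypothetical mobile/laptop tables and data, SQLite in place of MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mobile (model TEXT NOT NULL, price INTEGER NOT NULL);
    CREATE TABLE laptop (model TEXT NOT NULL, price INTEGER NOT NULL);
    INSERT INTO mobile (model, price) VALUES ('Phone A', 500), ('Phone B', 900);
    INSERT INTO laptop (model, price) VALUES ('Laptop X', 900), ('Laptop Y', 1200);
""")

# Theta join: any comparison operator, written with explicit JOIN ... ON.
theta = conn.execute(
    "SELECT mobile.model, laptop.model "
    "FROM mobile JOIN laptop ON mobile.price < laptop.price"
).fetchall()

# Equijoin: the same shape, restricted to the equality operator.
equi = conn.execute(
    "SELECT mobile.model, laptop.model "
    "FROM mobile JOIN laptop ON mobile.price = laptop.price"
).fetchall()

# Natural join: matches on ALL shared column names (model AND price here).
natural = conn.execute("SELECT * FROM mobile NATURAL JOIN laptop").fetchall()

# USING names the join column explicitly, so only price is compared.
using_price = conn.execute(
    "SELECT mobile.model, laptop.model FROM mobile JOIN laptop USING (price)"
).fetchall()

print("theta:", len(theta), "rows")
print("equi:", equi)
print("natural:", natural)
print("using price:", using_price)
```

With this data the natural join returns nothing, because no phone shares both a model name and a price with a laptop. This sensitivity to incidental column names is a common reason to prefer USING or ON in application code.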

LEFT JOIN

A LEFT JOIN (or LEFT OUTER JOIN) returns all rows from the left table (the first table mentioned) and the matched rows from the right table. If a row from the left table has no matching row in the right table according to the join condition, columns from the right table in the result set will contain NULL values.

This is crucial for retrieving all records from one table, even if corresponding entries do not exist in another. For example, to list all users and any posts they might have, including users with no posts:

SELECT *
FROM users
LEFT JOIN posts
ON users.id = posts.user_id;

The output includes all users, and for those without matching posts, the post-related columns are NULL.

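| id | username | id | user_id | title |
|:---|:------------|:-----|:--------|:--------------------------------|
| 1 | shinobi | 1 | 1 | Hello World! |
| 2 | javateamsix | 2 | 2 | Getting started with SQL |
| 3 | tony | 3 | 3 | SQL is awesome |
| 2 | javateamsix | 4 | 2 | MySQL is up! |
| 1 | shinobi | 5 | 1 | SQL - structured query language |
| 4 | greisi | NULL | NULL | NULL |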

The user 'greisi' appears in the result set because they are in the left table (users), even though they have no matching record in the right table (posts).
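Two common uses of the NULL-filled rows are finding left-table rows with no match and counting matches per left-table row. The sketch below shows both under the usual assumption for these examples: an in-memory SQLite database (Python's sqlite3 module) stands in for MySQL, with SQLite column types substituted.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL demo_joins database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER, title TEXT NOT NULL);
    INSERT INTO users (username) VALUES ('shinobi'), ('javateamsix'), ('tony'), ('greisi');
    INSERT INTO posts (user_id, title) VALUES
        (1, 'Hello World!'),
        (2, 'Getting started with SQL'),
        (3, 'SQL is awesome'),
        (2, 'MySQL is up!'),
        (1, 'SQL - structured query language');
""")

# Users with no posts: keep only the rows the LEFT JOIN padded with NULLs.
without_posts = conn.execute(
    "SELECT users.username "
    "FROM users LEFT JOIN posts ON users.id = posts.user_id "
    "WHERE posts.id IS NULL"
).fetchall()

# Posts per user: COUNT(posts.id) skips NULLs, so users without posts get 0.
post_counts = conn.execute(
    "SELECT users.username, COUNT(posts.id) "
    "FROM users LEFT JOIN posts ON users.id = posts.user_id "
    "GROUP BY users.id, users.username "
    "ORDER BY users.id"
).fetchall()

print(without_posts)
print(post_counts)
```

An INNER JOIN could not produce either result, since it would discard 'greisi' before the WHERE or GROUP BY step runs.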

RIGHT JOIN

A RIGHT JOIN (or RIGHT OUTER JOIN) is the inverse of LEFT JOIN. It returns all rows from the right table and the matched rows from the left table. If a row from the right table has no match in the left table based on the join condition, columns from the left table in the result set will be NULL.

Consider a post entry without a corresponding user. Add such a post:

INSERT INTO posts
( user_id, title )
VALUES
(123, 'No user post!');

User ID 123 does not exist in the users table.

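The earlier LEFT JOIN, with users as the left table, would exclude this 'No user post!' entry because it preserves every user rather than every post. A RIGHT JOIN preserves every row of the right table, posts, so the entry is included: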

SELECT *
FROM users
RIGHT JOIN posts
ON users.id = posts.user_id;

The output includes all posts, even the one without a matching user. The 'greisi' user, lacking posts, is not present.

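| id | username | id | user_id | title |
|:-----|:------------|:---|:--------|:--------------------------------|
| 1 | shinobi | 1 | 1 | Hello World! |
| 2 | javateamsix | 2 | 2 | Getting started with SQL |
| 3 | tony | 3 | 3 | SQL is awesome |
| 2 | javateamsix | 4 | 2 | MySQL is up! |
| 1 | shinobi | 5 | 1 | SQL - structured query language |
| NULL | NULL | 6 | 123 | No user post! |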

Joins can be combined with WHERE clauses to filter results after the join operation. For example, restricting the previous RIGHT JOIN results to only show data related to username 'shinobi':

SELECT *
FROM users
RIGHT JOIN posts
ON users.id = posts.user_id
WHERE username = 'shinobi';

Output shows only the rows where the username is 'shinobi' and a match was found in the join. The row for 'No user post!' is excluded because its username is NULL, failing the WHERE condition.

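| id | username | id | user_id | title |
|:---|:---------|:---|:--------|:--------------------------------|
| 1 | shinobi | 1 | 1 | Hello World! |
| 1 | shinobi | 5 | 1 | SQL - structured query language |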

The Impact of Conditions: JOIN vs. WHERE

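The placement of a condition significantly alters query results, particularly in LEFT and RIGHT joins. A condition in the ON clause decides which rows count as a match while the join is built, so unmatched rows from the preserved table are still returned with NULL values. A condition in the WHERE clause is applied after the join is complete and can remove those NULL-extended rows.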

Consider retrieving posts with "SQL" in their title, along with user data:

Placing the condition in the WHERE clause:

SELECT users.*, posts.*
FROM users
LEFT JOIN posts
ON posts.user_id = users.id
WHERE posts.title LIKE '%SQL%';

This performs the LEFT JOIN first (including all users, even those without matching posts), and then filters the result set to rows where posts.title contains "SQL". Users without posts are removed by this filter: their posts.title is NULL, and NULL LIKE '%SQL%' never evaluates to true. In effect, a WHERE condition on the right table makes the LEFT JOIN behave like an INNER JOIN.

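| id | username | id | user_id | title |
|:---|:------------|:---|:--------|:--------------------------------|
| 2 | javateamsix | 2 | 2 | Getting started with SQL |
| 3 | tony | 3 | 3 | SQL is awesome |
| 2 | javateamsix | 4 | 2 | MySQL is up! |
| 1 | shinobi | 5 | 1 | SQL - structured query language |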

Shifting the condition to the ON clause:

SELECT users.*, posts.*
FROM users
LEFT JOIN posts
ON posts.user_id = users.id
AND posts.title LIKE '%SQL%';

This query incorporates the posts.title condition into the join process itself. The LEFT JOIN proceeds, including all users. However, for the right side (posts), it only matches posts where the user_id corresponds AND the title contains "SQL". If a user has posts but none contain "SQL", those post columns will be NULL in the result for that user.

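| id | username | id | user_id | title |
|:---|:------------|:-----|:--------|:--------------------------------|
| 1 | shinobi | 5 | 1 | SQL - structured query language |
| 2 | javateamsix | 4 | 2 | MySQL is up! |
| 2 | javateamsix | 2 | 2 | Getting started with SQL |
| 3 | tony | 3 | 3 | SQL is awesome |
| 4 | greisi | NULL | NULL | NULL |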

Notice the difference: with the condition in the ON clause, user 'greisi' is still returned (with NULL post data), because the LEFT JOIN guarantees that every row from the left table (users) is included. Post rows appear only when they satisfy the full ON condition, which is why shinobi's 'Hello World!' post is absent from both results.
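The two placements can be run back to back to confirm the row counts. This is a sketch under the same assumption as the other examples here: an in-memory SQLite database (Python's sqlite3 module) replaces MySQL, and only the five original posts are loaded, which does not change the outcome because the orphan post never matches a user.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL demo_joins database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER, title TEXT NOT NULL);
    INSERT INTO users (username) VALUES ('shinobi'), ('javateamsix'), ('tony'), ('greisi');
    INSERT INTO posts (user_id, title) VALUES
        (1, 'Hello World!'),
        (2, 'Getting started with SQL'),
        (3, 'SQL is awesome'),
        (2, 'MySQL is up!'),
        (1, 'SQL - structured query language');
""")

# Condition in WHERE: applied after the join, so NULL-extended rows are removed.
where_rows = conn.execute(
    "SELECT users.username, posts.title "
    "FROM users LEFT JOIN posts ON posts.user_id = users.id "
    "WHERE posts.title LIKE '%SQL%'"
).fetchall()

# Condition in ON: part of the match test, so every user is still returned.
on_rows = conn.execute(
    "SELECT users.username, posts.title "
    "FROM users LEFT JOIN posts "
    "ON posts.user_id = users.id AND posts.title LIKE '%SQL%'"
).fetchall()

print("WHERE placement:", len(where_rows), "rows")
print("ON placement:", len(on_rows), "rows")
print("greisi kept by ON placement:", ("greisi", None) in on_rows)
```

A practical rule that follows: in an outer join, conditions on the preserved table usually belong in WHERE, while conditions on the optional table belong in ON unless dropping unmatched rows is the intent.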

Equivalence of RIGHT and LEFT JOINs

RIGHT JOIN and LEFT JOIN operations are functionally interchangeable. Any query written with a RIGHT JOIN can be rewritten as a LEFT JOIN by swapping the order of the two tables and replacing the RIGHT keyword with LEFT.

This LEFT JOIN:

SELECT users.*, posts.*
FROM posts
LEFT JOIN users
ON posts.user_id = users.id;

Is equivalent to this RIGHT JOIN:

SELECT users.*, posts.*
FROM users
RIGHT JOIN posts
ON posts.user_id = users.id;

Both queries produce the same result set, illustrating that the direction of the "outer" join is determined by the order of the tables and the LEFT or RIGHT keyword.
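The equivalence can be checked by running both forms and comparing the rows. The sketch below assumes an in-memory SQLite database (Python's sqlite3 module) instead of MySQL. SQLite only gained RIGHT JOIN in version 3.39.0, so the RIGHT JOIN form is executed only when the linked library is new enough; on older builds just the LEFT JOIN form runs.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL demo_joins database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER, title TEXT NOT NULL);
    INSERT INTO users (username) VALUES ('shinobi'), ('javateamsix'), ('tony'), ('greisi');
    INSERT INTO posts (user_id, title) VALUES
        (1, 'Hello World!'),
        (2, 'Getting started with SQL'),
        (3, 'SQL is awesome'),
        (2, 'MySQL is up!'),
        (1, 'SQL - structured query language'),
        (123, 'No user post!');
""")

# posts is the preserved table in both forms; ORDER BY makes rows comparable.
left_form = conn.execute(
    "SELECT users.username, posts.title "
    "FROM posts LEFT JOIN users ON posts.user_id = users.id "
    "ORDER BY posts.id"
).fetchall()

right_form = None  # stays None when the SQLite library lacks RIGHT JOIN
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right_form = conn.execute(
        "SELECT users.username, posts.title "
        "FROM users RIGHT JOIN posts ON posts.user_id = users.id "
        "ORDER BY posts.id"
    ).fetchall()

print(left_form[-1])
if right_form is not None:
    print("identical:", left_form == right_form)
```

Many teams standardize on LEFT JOIN for this reason: a single direction reads more consistently, and nothing expressible with RIGHT JOIN is lost.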

Conclusion

Mastering JOIN operations is foundational for effective data retrieval and manipulation in relational databases. Each join type serves a distinct purpose in combining data based on defined relationships between tables. Understanding their behavior, particularly concerning matching records and the treatment of non-matching rows (NULL values), is critical.

Experimentation is key to solidifying comprehension. Practice writing queries with each JOIN type, varying the join conditions and observing the resulting data sets.

For comprehensive technical specifications, refer to the official database documentation for your specific SQL dialect.