TLDR: To count related records (and get a zero when there are none) use a LEFT JOIN. To count related records that match a certain criteria (and get a zero when there are none) use a CASE statement in the SELECT fields.
I had a table of school terms and a table of enrolments. Here’s a super simplified example using an INNER JOIN:
SELECT term.year, enrol.id
FROM term INNER JOIN enrol on term.term_id = enrol.term_id
| year | id |
|---|---|
| … | … |
| 2021 | 69423 |
| 2021 | 694170 |
| 2023 | 69423 |
| 2023 | 69584 |
| 2024 | 69456 |
To count the enrolments per year was a simple matter of adding a COUNT, changing it to a LEFT JOIN and adding the GROUP BY:
SELECT term.year, COUNT(enrol.id)
FROM term LEFT JOIN enrol on term.term_id = enrol.term_id
GROUP BY term.year
ORDER BY term.year DESC
| year | |
|---|---|
| 2024 | 1 |
| 2023 | 2 |
| 2022 | 0 |
| 2021 | 2 |
| … | … |
Note the count of 0 in 2022, that’s what I want! But then when I tried to add a WHERE clause to only get the enrolments where there was a possible flaw in the data, I stopped getting a zero count for the missing years:
SELECT term.year, COUNT(enrol.id)
FROM term LEFT JOIN enrol on term.term_id = enrol.term_id
WHERE enrol.raw_mark <> enrol.final_mark
GROUP BY term.year
ORDER BY term.year DESC
| year | |
|---|---|
| 2023 | 1 |
If I understand correctly, the WHERE clause is removing the rows (including the rows that only contain a term.year) before they get counted.
The solution (thanks to my DB guru friend TC) is to move the logic from the WHERE clause up into the SELECT. Also, now that there are blanks instead of nulls in the second column, we can go back to a regular INNER JOIN and still get the zero counts:
SELECT term.year,
COUNT(CASE WHEN enrol.raw_mark <> enrol.final_mark THEN 1 END)
FROM term INNER JOIN enrol on term.term_id = enrol.term_id
GROUP BY term.year
ORDER BY term.year DESC
| year | |
|---|---|
| 2024 | 0 |
| 2023 | 1 |
| 2022 | 0 |
| 2021 | 0 |
| … | … |