Friday, March 9, 2012

Exact relational division

Joe Celko's article at www.dbazine.com/ofinterest/oi-articles/celko1 talks
about relational division and exact relational division.
There's more background in the article, but the second example, for exact
relational division, has a hangar with some airplanes, and a bunch of
pilots, and you want to know which pilots can fly all the planes in the
hangar and no other planes.
The article says this:
-- start of quote from article
SELECT PS1.pilot
FROM PilotSkills AS PS1
LEFT OUTER JOIN
Hangar AS H1
ON PS1.plane = H1.plane
GROUP BY PS1.pilot
HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar)
AND COUNT(H1.plane) = (SELECT COUNT(plane) FROM Hangar);
This says that a pilot must have the same number of certificates as there
planes in the hangar and these certificates all match to a plane in the
hangar, not something else. The "something else" is shown by a created NULL
from the LEFT OUTER JOIN.
Please do not make the mistake of trying to reduce the HAVING clause with a
little algebra to:
HAVING COUNT(PS1.plane) = COUNT(H1.plane)
because it does not work; it will tell you that the hangar has (n) planes
in it and the pilot is certified for (n) planes, but not that those two
sets of planes are equal to each other.
-- end of quote from article
I didn't quite understand this until I realized that "Count(H1.Plane)" is
the number of planes in the hangar that the pilot can fly, while "SELECT
COUNT(plane) FROM Hangar" is the number of planes in the hangar.
I wanted to rant about why so many SQL'ers insist on using table aliases
for short, one-word table names; the query seems to be easier to read
without the table aliases, as follows:
SELECT PilotSkills.pilot
FROM PilotSkills
LEFT OUTER JOIN Hangar
ON PilotSkills.plane = Hangar.plane
GROUP BY PilotSkills.pilot
HAVING COUNT(PilotSkills.plane) = (SELECT COUNT(plane) FROM Hangar)
AND COUNT(Hangar.plane) = (SELECT COUNT(plane) FROM Hangar);
Putting in table aliases (for most queries) seems to confuse the situation
and make the query harder to read, since we have to make mental
substitutions, plus there are now four table names (the original two, plus
the two aliases).
But without the aliases, it's maybe little trickier to see that the left
side of the last clause, "Count(Hangar.plane)", is not the count of planes
in the hangar, it's still the count of planes in the hangar that the pilot
(each pilot) can fly. It's the count of the planes that remain after the
join -- the Having is applied after the join and uses the (grouped) results
of the join.
It took me a little while to understand all this, and looking at the
results of the left outer join without the "group by" or "having" clases
helped a lot. Interesting.
David WalkerI personally believe that when I spend a lot of time trying to figure
something out, the value of what I discover doing that by far supersedes the
value of knowledge acquired quickly.
I've read that same article a few ws ago and let me say this: the value
went through the roof. :)
ML|||Putting Aliases on tables once seemed a cosmetic operation to most SQL
practitioners (except for self-joins, for those Aliases were always
necessary.)
In the newer SQL tables alias are necessary in a number of places. Derived
tables, PIVOT, CROSS APPLY...
BTW, the third edition of Celko's Smarties has just come out.
Cheers, Mark Bosley
"DWalker" <none@.none.com> wrote in message
news:eRKbaIWtFHA.3848@.TK2MSFTNGP10.phx.gbl...
> Joe Celko's article at www.dbazine.com/ofinterest/oi-articles/celko1 talks
> about relational division and exact relational division.
> There's more background in the article, but the second example, for exact
> relational division, has a hangar with some airplanes, and a bunch of
> pilots, and you want to know which pilots can fly all the planes in the
> hangar and no other planes.
> The article says this:
> -- start of quote from article
> SELECT PS1.pilot
> FROM PilotSkills AS PS1
> LEFT OUTER JOIN
> Hangar AS H1
> ON PS1.plane = H1.plane
> GROUP BY PS1.pilot
> HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar)
> AND COUNT(H1.plane) = (SELECT COUNT(plane) FROM Hangar);
> This says that a pilot must have the same number of certificates as there
> planes in the hangar and these certificates all match to a plane in the
> hangar, not something else. The "something else" is shown by a created
> NULL
> from the LEFT OUTER JOIN.
> Please do not make the mistake of trying to reduce the HAVING clause with
> a
> little algebra to:
> HAVING COUNT(PS1.plane) = COUNT(H1.plane)
> because it does not work; it will tell you that the hangar has (n) planes
> in it and the pilot is certified for (n) planes, but not that those two
> sets of planes are equal to each other.
> -- end of quote from article
> I didn't quite understand this until I realized that "Count(H1.Plane)" is
> the number of planes in the hangar that the pilot can fly, while "SELECT
> COUNT(plane) FROM Hangar" is the number of planes in the hangar.
> I wanted to rant about why so many SQL'ers insist on using table aliases
> for short, one-word table names; the query seems to be easier to read
> without the table aliases, as follows:
> SELECT PilotSkills.pilot
> FROM PilotSkills
> LEFT OUTER JOIN Hangar
> ON PilotSkills.plane = Hangar.plane
> GROUP BY PilotSkills.pilot
> HAVING COUNT(PilotSkills.plane) = (SELECT COUNT(plane) FROM Hangar)
> AND COUNT(Hangar.plane) = (SELECT COUNT(plane) FROM Hangar);
> Putting in table aliases (for most queries) seems to confuse the situation
> and make the query harder to read, since we have to make mental
> substitutions, plus there are now four table names (the original two, plus
> the two aliases).
> But without the aliases, it's maybe little trickier to see that the left
> side of the last clause, "Count(Hangar.plane)", is not the count of planes
> in the hangar, it's still the count of planes in the hangar that the pilot
> (each pilot) can fly. It's the count of the planes that remain after the
> join -- the Having is applied after the join and uses the (grouped)
> results
> of the join.
> It took me a little while to understand all this, and looking at the
> results of the left outer join without the "group by" or "having" clases
> helped a lot. Interesting.
> David Walker|||"examnotes" <ML@.discussions.microsoft.com> wrote in
news:761C03F8-D67B-4C29-82AD-9B1EE3D365CC@.microsoft.com:

> I personally believe that when I spend a lot of time trying to figure
> something out, the value of what I discover doing that by far
> supersedes the value of knowledge acquired quickly.
> I've read that same article a few ws ago and let me say this: the
> value went through the roof. :)
>
> ML
Yep!
David|||"Mark Bosley" <mark.nspam@.lightcc.com> wrote in
news:#bCQwpXtFHA.3500@.TK2MSFTNGP09.phx.gbl:

> Putting Aliases on tables once seemed a cosmetic operation to most SQL
> practitioners (except for self-joins, for those Aliases were always
> necessary.)
> In the newer SQL tables alias are necessary in a number of places.
> Derived tables, PIVOT, CROSS APPLY...
> BTW, the third edition of Celko's Smarties has just come out.
> Cheers, Mark Bosley
>
Yes, aliases are necessary for self-joins and derived tables especially,
but when they are not necessary they just confuse me. The cosmetics of
"titles as t" escapes me completely; "titles" is so much more meaningful
than "t". And, the number of rememberable items in my brain can overflow!
:-)
David|||"DWalker" <none@.none.com> wrote in message
news:Odj23wXtFHA.1440@.TK2MSFTNGP10.phx.gbl...
> "Mark Bosley" <mark.nspam@.lightcc.com> wrote in
> news:#bCQwpXtFHA.3500@.TK2MSFTNGP09.phx.gbl:
> Yes, aliases are necessary for self-joins and derived tables especially,
> but when they are not necessary they just confuse me. The cosmetics of
> "titles as t" escapes me completely; "titles" is so much more meaningful
> than "t". And, the number of rememberable items in my brain can overflow!
> :-)
> David
ERROR MESSAGE
Brain Memory Overflow
Break for the wend and reboot Monday morning|||A lot of the time, they jsut make the names shorter. But when you
start nesting SELECTs, you can get some surprises about which scope a
column name falls within if you leave them off.
And while we are plugging SFS III, I would like to mention that I also
have several other fine SQL books whose royalties allow me to eat and
live indoors.|||"--CELKO--" <jcelko212@.earthlink.net> wrote in
news:1126298168.472203.189180@.g43g2000cwa.googlegroups.com:

> A lot of the time, they jsut make the names shorter. But when you
> start nesting SELECTs, you can get some surprises about which scope a
> column name falls within if you leave them off.
> And while we are plugging SFS III, I would like to mention that I also
> have several other fine SQL books whose royalties allow me to eat and
> live indoors.
>
Well, "t" is shorter than "titles", but less meaningful! Short is not
always an advantage. I can type "titles" pretty fast, and read it even
faster.
I think I have one of your books at home. Here at the office, I have
Henderson's Guru's Guide to Transact-SQL, which you wrote the foreward
to, and one by Ben-Gan & Moreau, and one by Kalen Delaney, and so on...
:-)
David

No comments:

Post a Comment