» SQL Formatting
With all the abstraction layers between code & database, and now with my recent CakePHP adventures: The need for writing SQL statements deminishes. But over the years I've developed my own way of formatting them. And before I'll have to wave queries goodbye forever ;) I just want to put it out there.
Code spends more time being read then being written. I think naturally this is true for queries as well. So it might help if we teach ourselves some guidelines as how to nicely format them.
We can use a couple of instruments inside SQL to format our queries.
- SQL whitespace is ignored by MySQL's parser, so let's use it to make our lives easier, right?
- Dummy conditions have no effect in SQL statements. Things like: WHERE 1=1.
Ok let's just begin with some of my SQL layout habits.
All clauses get their newline
This will provide a clear seperation of different parts of the query, making it more easy to read & comprehend. And it will enable you to better maintain the query cause you can jump between different clauses bij just pressing up & down.
Before
SELECT * FROM `book` ORDER BY `title`
After
SELECT * FROM `book` ORDER BY `title`
N.b.: Whitespace is ignored by MySQL's parser, but not by MySQL's query cache on versions <5 , thanks to foobar for pointing that out. So if you want query cache to pay off.. Pick one format & stick with it ;)
Accompanied fieldnames get their own newline
This will allow you to very easily (or even dynamically) add or remove certain fields from the query.
Before
SELECT `id`, `title`, `rating` FROM `book` ORDER BY `title`
After
SELECT `id`, `title`, `rating` FROM `book` ORDER BY `title`
Lonely fieldnames stay on the same line
This will keep the query more or less compact and avoid that even the simplest query takes up 10 lines.
Before
SELECT `id`, `title` FROM `book` ORDER BY `title`
After
SELECT `id`, `title` FROM `book` ORDER BY `title`
Where 1 = 1
So now that we've played with whitespace a bit, there are other things that have no effect in SQL as well. Like 1=1. This short expression can really help us define conditions in a more uniform way.
Let's look at an example.
Before
SELECT `id` FROM `book` WHERE `published` = 'yes' AND `rating` > 5
Hm, too bad, published = 'yes' is formatted differently from rating > 5 because it doens't have the AND word. We'd have to account for that every time we change the conditions, or if we were generating this query automatically: 'Always prefix with AND... UNLESS it's the first condition'. Bad for layout. Bad for automation.
After
SELECT `id` FROM `book` WHERE 1 = 1 AND `published` = 'yes' AND `rating` > 5
One other big advantage of writing your conditions in such a uniform format, is that it becomes really easy to temporarily turn off conditions with MySQL Comments:
SELECT `id` FROM `book` WHERE 1 = 1 -- AND `published` = 'yes' AND `rating` > 5
..without breaking SQL syntax. Cause remember that if I had done this with the before query, I would have gotten a syntax error.
You can also make them conditional in your programming language:
$res = $db->queryS(" SELECT `id` FROM `book` WHERE 1 = 1 " . ($unPublished ? "" : "AND `published` = 'yes'") . " AND `rating` > 5 ");
Needles to say: dynamically built queries (using while or for loops), are more easily built because you don't have to worry about the syntax. It's the same for every element.
The performance hit of 1 = 1
While this dummy syntax provides developers with some great comfort, of course we have to make sure this addition doesn't come at a price. MySQL guru Erwin Bleeker benchmarked on multiple occasions with 1 bilion queries (no cache), to find that the results only differed by one hundreds of a second on average. We even had a result in which the 1 = 1 was faster :)
This is how he benchmarked:
mysql> select benchmark(1000000000, (select SQL_NO_CACHE 1 from employees WHERE 1=1 limit 1));
+-----------------------------------------------------------------------------------+
| benchmark(1000000000, (select SQL_NO_CACHE 1 from medewerkers WHERE 1=1 limit 1)) |
+-----------------------------------------------------------------------------------+
| 0 |
+-----------------------------------------------------------------------------------+
1 row in set (24.84 sec)
mysql> select benchmark(1000000000, (select SQL_NO_CACHE 1 from employees limit 1));
+-------------------------------------------------------------------------+
| benchmark(1000000000, (select SQL_NO_CACHE 1 from medewerkers limit 1)) |
+-------------------------------------------------------------------------+
| 0 |
+-------------------------------------------------------------------------+
1 row in set (24.83 sec)
Where 1 != 1
As you may have guessed, our dummy condition also works for OR queries. Just negate the dummy condition: 1 != 1, look:
SELECT `id` FROM `book` WHERE 1 != 1 OR `published` = 'yes' OR `rating` > 5
Joins
Let me just show you how I go about this:
SELECT `author`.`id`, `author`.`name`, `author`.`birthday`, COUNT(`book`.`id`) AS book_pub_cnt FROM `author` LEFT JOIN `book` ON (1 = 1 AND `book`.`author_id` = `author`.`id` AND `book`.`published` = 'yes' ) WHERE 1 = 1 AND `author`.`alive` = 'yes' GROUP BY `author`.`id` ORDER BY `author`.`name`
Furthermore
You may have noticed that I:
Use backquotes to enclose all database entities
That's just good habit, this way if you ever have an ambiguous fieldname ('active', or 'status', could be interpretted as statements), your database will know that you mean the fieldname or table, and not the statement.
Use single quotes to enclose strings
So you can enclose the entire query in double quotes, and be able to use variables from PHP without concatenation. On the other hand: You should really use prepared statements, and in my eyes concatination is better than let PHP automatically substitute your vars. But hey, if you need to choose anyway, might as well be single quotes.
Use spaces instead of tabs
I do this in PHP & other languages as well. It allows for consistent layout in all possible editors & views. There's an interesting post about it here.
Conventions in General
Sometimes conventions rely more on taste than reason. In that case look at it this way:
It's irrelevant if people drive on the right or left side of the road. As long as they all do the same.
Now I'm not implying there will be fatal accidents if your fellow programmers use different styles. But it will definitely contribute to the success for your project if you are able to agree on some standards.
Stay up to date
You can track my blog
articles and
comments. You may also find my
bookmarks interesting. Or
Follow me on Twitter
Like this article?
|
Then Digg it! Or use another bookmark button below to show your support & help me spread the word. |
tags: sql, coding standards, mysql, database
category: Programming
read: 7,729 times






tagcloud
#5. tnt2br on 25 June 2009
#4. Kevin on 03 April 2009
#3. feedreader on 31 March 2009
Suggestions for future articles:
more about joins
more about transactions
... [more] normalizing an old projects tables and updating your queries with joins and transactions for working with the normalized tables
converting complex join queries to a popular database abstraction class (pdo? mysqli? other?)
if any of that was worded incorrectly.... I'm a noob!
#2. Kevin on 04 March 2009
#1. foobar on 04 March 2009
wrong for mysql <5:
http://www.mysqlperformanceblog.com/2008/03/20/mysql-query-cache-whitespace-and-comments/
... [more]
"Dummy conditions are also ignored in SQL statements. Things like WHERE 1=1."
you should differentiate between "ignored" and "have no effect"... 1=1 is parsed (like all the whitescapes) and executed!