» SQL Formatting

On twitter

With all the abstraction layers between code & database, and now with my recent CakePHP adventures: The need for writing SQL statements deminishes. But over the years I've developed my own way of formatting them. And before I'll have to wave queries goodbye forever ;) I just want to put it out there.

Code spends more time being read then being written. I think naturally this is true for queries as well. So it might help if we teach ourselves some guidelines as how to nicely format them.

We can use a couple of instruments inside SQL to format our queries.

  • SQL whitespace is ignored by MySQL's parser, so let's use it to make our lives easier, right?
  • Dummy conditions have no effect in SQL statements. Things like: WHERE 1=1.

Ok let's just begin with some of my SQL layout habits.

All clauses get their newline

This will provide a clear seperation of different parts of the query, making it more easy to read & comprehend. And it will enable you to better maintain the query cause you can jump between different clauses bij just pressing up & down. 

Before

SELECT * FROM `book` ORDER BY `title`

After

SELECT *
FROM `book`
ORDER BY `title`

N.b.: Whitespace is ignored by MySQL's parser, but not by MySQL's query cache on versions <5 , thanks to foobar for pointing that out. So if you want query cache to pay off.. Pick one format & stick with it ;)

Accompanied fieldnames get their own newline

This will allow you to very easily (or even dynamically) add or remove certain fields from the query.

Before

SELECT `id`, `title`, `rating`
FROM `book`
ORDER BY `title`

After

SELECT
    `id`,
    `title`,
    `rating`
FROM `book`
ORDER BY `title`

Lonely fieldnames stay on the same line

This will keep the query more or less compact and avoid that even the simplest query takes up 10 lines.

Before

SELECT
    `id`,
    `title`
FROM 
    `book`
ORDER BY 
    `title`

After

SELECT 
    `id`,
    `title`
FROM `book`
ORDER BY `title`

Where 1 = 1

So now that we've played with whitespace a bit, there are other things that have no effect in SQL as well. Like 1=1. This short expression can really help us define conditions in a more uniform way.

Let's look at an example.

Before

SELECT `id`
FROM `book`
WHERE 
    `published` = 'yes'
    AND `rating` > 5

Hm, too bad, published = 'yes' is formatted differently from rating > 5 because it doens't have the AND word. We'd have to account for that every time we change the conditions, or if we were generating this query automatically: 'Always prefix with AND... UNLESS it's the first condition'. Bad for layout. Bad for automation.

After

SELECT `id`
FROM `book`
WHERE 1 = 1
    AND `published` = 'yes'
    AND `rating` > 5

One other big advantage of writing your conditions in such a uniform format, is that it becomes really easy to temporarily turn off conditions with MySQL Comments:

SELECT `id`
FROM `book`
WHERE 1 = 1
    -- AND `published` = 'yes'
    AND `rating` > 5

..without breaking SQL syntax. Cause remember that if I had done this with the before query, I would have gotten a syntax error.

You can also make them conditional in your programming language:

$res = $db->queryS("
    SELECT `id`
    FROM `book`
    WHERE 1 = 1
        " . ($unPublished ? "" : "AND `published` = 'yes'") . "
        AND `rating` > 5
");

Needles to say: dynamically built queries (using while or for loops), are more easily built because you don't have to worry about the syntax. It's the same for every element.

The performance hit of 1 = 1

While this dummy syntax provides developers with some great comfort, of course we have to make sure this addition doesn't come at a price. MySQL guru Erwin Bleeker benchmarked on multiple occasions with 1 bilion queries (no cache), to find that the results only differed by one hundreds of a second on average. We even had a result in which the 1 = 1 was faster :)

This is how he benchmarked:

mysql> select benchmark(1000000000, (select SQL_NO_CACHE 1 from employees WHERE 1=1 limit 1));
+-----------------------------------------------------------------------------------+
| benchmark(1000000000, (select SQL_NO_CACHE 1 from medewerkers WHERE 1=1 limit 1)) |
+-----------------------------------------------------------------------------------+
| 0 |
+-----------------------------------------------------------------------------------+
1 row in set (24.84 sec)

mysql> select benchmark(1000000000, (select SQL_NO_CACHE 1 from employees limit 1));
+-------------------------------------------------------------------------+
| benchmark(1000000000, (select SQL_NO_CACHE 1 from medewerkers limit 1)) |
+-------------------------------------------------------------------------+
| 0 |
+-------------------------------------------------------------------------+
1 row in set (24.83 sec)

Where 1 != 1

As you may have guessed, our dummy condition also works for OR queries. Just negate the dummy condition: 1 != 1, look:

SELECT `id`
FROM `book`
WHERE 1 != 1
    OR `published` = 'yes'
    OR `rating` > 5

Joins

Let me just show you how I go about this:

SELECT
    `author`.`id`,
    `author`.`name`,
    `author`.`birthday`,
    COUNT(`book`.`id`) AS book_pub_cnt
FROM `author`
LEFT JOIN `book` ON (1 = 1 
    AND `book`.`author_id` = `author`.`id`
    AND `book`.`published` = 'yes'
) 
WHERE 1 = 1
    AND `author`.`alive` = 'yes'
GROUP BY `author`.`id`
ORDER BY `author`.`name`

Furthermore

You may have noticed that I:

Use backquotes to enclose all database entities

That's just good habit, this way if you ever have an ambiguous fieldname ('active', or 'status', could be interpretted as statements), your database will know that you mean the fieldname or table, and not the statement.

Use single quotes to enclose strings

So you can enclose the entire query in double quotes, and be able to use variables from PHP without concatenation. On the other hand: You should really use prepared statements, and in my eyes concatination is better than let PHP automatically substitute your vars. But hey, if you need to choose anyway, might as well be single quotes.

Use spaces instead of tabs

I do this in PHP & other languages as well. It allows for consistent layout in all possible editors & views. There's an interesting post about it here.

Conventions in General

Sometimes conventions rely more on taste than reason. In that case look at it this way:

It's irrelevant if people drive on the right or left side of the road. As long as they all do the same.

Now I'm not implying there will be fatal accidents if your fellow programmers use different styles. But it will definitely contribute to the success for your project if you are able to agree on some standards.

Stay up to date

You can track my blog rss articles and rss comments. You may also find my rss bookmarks interesting. Or twitter Follow me on Twitter


Like this Article?

Your money is no good here, but
you can boost morale by spreading the word! : )


tags: sql, coding standards, mysql, database
category: Programming
read: 10,676 times

Add comment

(required, shown)(required, not shown)for syntax highlighting

[CODE="Javascript"]
your_code_here();
[/CODE]

Replace "Javascript"
with "php", "text", etc.
code (to make sure you are not a spammer)

 Track replies: rss feed comments feed

Comments

#5. tnt2br on 25 June 2009

Gravatar.com: tnt2brVlw. Very good article! Thanks !

#4. Kevin on 03 April 2009

Twitter.com: kvz@ feedreader: Thanks I'll think about it!

#3. feedreader on 31 March 2009

Gravatar.com: feedreaderGreat article. I'll definitely consider this going forward (except the spaces over tabs).

Suggestions for future articles:
more about joins
more about transactions
... [more] normalizing an old projects tables and updating your queries with joins and transactions for working with the normalized tables
converting complex join queries to a popular database abstraction class (pdo? mysqli? other?)

if any of that was worded incorrectly.... I'm a noob!

#2. Kevin on 04 March 2009

Twitter.com: kvz@ foobar: Thanks for pointing that out. I've corrected the article.

#1. foobar on 04 March 2009

Gravatar.com: foobar"SQL whitespace is generally ignored in SQL statements and queries, so let's use it to make our lives easier, right?"

wrong for mysql <5:

http://www.mysqlperformanceblog.com/2008/03/20/mysql-query-cache-whitespace-and-comments/
... [more]
"Dummy conditions are also ignored in SQL statements. Things like WHERE 1=1."

you should differentiate between "ignored" and "have no effect"... 1=1 is parsed (like all the whitescapes) and executed!