fREWdiculous!
13 Jul
So a couple perl giants I have already heard of responded to my previous post regarding NULL’s in the database.
NULL means “this piece of information exists but is unknown to us”. Follow this simple rule when deciding whether to allow things to be NULL or not and you’re basically sorted – and the standard SQL logic will suddenly work with you rather than against.
Until you do a LEFT JOIN and discover that it uses NULLs for “doesn’t exist” in there … but anyway …
–mst
I’ve a blog entry about this. Basically, NULLs can lead to queries which are logically impossible to get correct answers for. They’re rare, but I’ve hit them on larger queries and they’re a nightmare to debug.
There’s also the problem of what a NULL is supposed to represent. Is the data unknown? Is it not applicable? Is it something else entirely? I often see NULL values in a databsae where people have tried to overload the meaning of NULL and it’s done on an ad hoc basis. For example, consider a “salary” field in a database. Why would it be NULL? Are they unemployed? Are they a volunteer? Do you simply not know it? Are they hourly and therefore not salaried? A NULL value could potenitally have four different meanings.
–Ovid
I personally think that they both make good points. I lean the direction of mst, which is that NULL’s are ok, but all they mean is that you don’t know that piece of information. Treating them as more information than that is probably a bad idea. Normally I’d just make a bit field to represent other information about the field, like why it’s NULL or something like that. In general fields should only be NULL when they are optional, which should probably be rare.
Although, Ovid links to an article (from the article he wrote) that advocates the removal of *all* NULL’s which I think is relatively extreme. But it resonates with the coder inside of me. The same coder who thinks it’s a good idea to make a new class for everything and do everything with method dispatching instead of if-else’s. I’d like to point out that this part of me has never won out against pragmatism, but I’m sure it will happen someday.
Anyway, I present to you two options from the luminaries above. I find both of the options very attractive and I will probably take mst’s route in general, but I think that the link Ovid gave is surprisingly compelling. It would make the data very consistent, but the cost would be lots of JOIN’s, tables, and classes representing those tables.
The answer may be some place in the middle; I don’t know. One way or the other, ponder the path of your feet; then all your ways will be sure. No one ever got to be a good programmer by blindly following some random blogger.
3 Responses for "NULL Conclusion"
Didn’t get to your previous post before you wrote this one, but I agree with your assessment here: NULLs should be pretty rare anyway, as values are usually required.
The problem comes down to what NULLs actually mean, as Ovid says. Technically, the mean “the value is unknown.” This is distinct from, “the value is known, and it’s a non-value.” From a semantic point of view, how your application treats these two interpretations may be the same, in which case you have no problems with NULLs. I find that such is usually the case.
In some situations, however, the distinction becomes important, and then you need to do something about it. The nice thing about an empty string in a text column is that it encompasses the distinction beautifully (an empty string is a known value), while the truthiness is the same (in Perl) for both an empty string and a NULL. I only wish that SQL had a way to represent the distinction in non-text types. But long term, as folks moved toward the sixth normal form, NULLs and their inherent ambiguity will be, in practice, eliminated.
—Theory
It’s worth pointing out that the link I presented shows a technique which has some significant drawbacks. First, decomposition into sixth normal form can be excellent (in the “correctness” sense), but in practice, it’s not going to be performant for larger systems (yeah, yeah, I know about premature whatever
and it’s certainly going to impose a significant cognitive load on the programmer as they have to do a heck of a lot more joins unless a clean view is layed over the top.
Second, the technique benefits from distributed foreign keys, something I’m not aware of any database supporting. I suppose you could write triggers or stored procedures to handle this, but if you wanted them to be performant over large tables, I suspect you’d have to create and manage your own indices. (That being said, I’d kill for distributed FKs in our work app).
So I’d suggest that you never, never use NULLs. Until you have to. Then just use ‘em sparingly.
What Ovid said. Unfortunately, NULLs are a necessary evil given todays’s architectures. Do what you can to avoid them, and be very specific in your interpretation of them when you have to have them.
Fortunately, a large percentage of columns are text types, in which case you have the beauty of the empty string so that you can avoid NULLs in pretty much any text column.
—Theory
Leave a reply