24 November 2007

Good Database Design Part 6: Surrogate Keys

I routinely use surrogate keys. Migrating one integer value instead of a wider compound key has numerous benefits. It provides nice consistency across the physical model, by and large saves more space than it costs and reduces I/O when compared to migrating compound keys; especially in a well-normalized model. Additionally, they simplify understanding of a model and query joins. Yet another benefit of surrogate keys is that they trivialize identification of new rows from existing rows (covered in Part 7: Stored Procedures).

Most DBMS's provide a simplified way of accomplishing this (as well they should since it made its way into the SQL:2003 standard); MySQL uses an AUTO_INCREMENT property, Oracle has sequences (less simple than others because sequences are not associated with a field, so often automated using triggers), PostgreSQL uses a SERIAL datatype, where DB2, SQL Server and Sybase use an IDENTITY property.

Oracle and Microsoft often show examples of this ID column methodology. The important thing to remember is that this does not excuse you from defining your logical keys. They may not need to be physically enforced, but they should at least be documented. It is important to realize that your surrogate keys are only at the physically level. Logically, they do not exist. Logically, you are still responsible for primary and candidate keys.

The type is a part of SQL:2003 which reflects the community’s general acceptance of surrogate keys. Still dodgy ground for some DBAs, most recognize the value of this approach.

I already touched naming conventions, but consider that renaming columns is by and large unnecessary (there are exceptions) and ‘id’ is often a reserved word. Using the name format of ‘TableName_UID’ (or ‘TableNameID’) shows where surrogate keys are from and easily shows which tables are related.