One of the major problems identified above is data
duplication. To implement a system that reduces it,
we must be able to uniquely identify a record (or row, as it is
known in client/server parlance). We do this using a Primary Key
(PK). A Primary Key is defined as "a column, or set of columns,
whose values uniquely identify every row in the table".
To continue our example on the Palm, we may make a primary key
out of the FirstName and LastName columns. By doing so, we would
be prevented from adding a new record bearing the same first and
last name as an existing user and we could always guarantee finding
the correct record by using a combination of the first name and
last name. This would let us change the address on
both the Palm and the desktop, and have HotSync handle
that update correctly: that is, update the record without
duplicating it.
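
The behavior described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite stands in for the Palm and desktop databases, and the table name Address and the Street column are hypothetical. Only FirstName and LastName come from the example above.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Address (
        FirstName TEXT NOT NULL,
        LastName  TEXT NOT NULL,
        Street    TEXT,               -- hypothetical column
        PRIMARY KEY (FirstName, LastName)
    )
    """
)
conn.execute("INSERT INTO Address VALUES ('John', 'Smith', '1 Main St')")

# A second row with the same first and last name violates the key.
try:
    conn.execute("INSERT INTO Address VALUES ('John', 'Smith', '9 Elm Ave')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The key locates exactly one row, so an address change is an update
# of that row rather than a new, duplicate record.
cur = conn.execute(
    "UPDATE Address SET Street = ? WHERE FirstName = ? AND LastName = ?",
    ("9 Elm Ave", "John", "Smith"),
)
rows_updated = cur.rowcount
row_count = conn.execute("SELECT COUNT(*) FROM Address").fetchone()[0]
```

Here the duplicate insert is refused, and the update touches a single row while the table still holds one record.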
In practice, creating a key based on multiple columns
(especially string based columns) produces a result that is slow
and inflexible. Speed suffers because PKs are indexed (to allow
for faster searching): an index built on strings is larger, and
the computer compares strings less efficiently than numbers.
The result is inflexible because it is entirely possible that you
have two friends who bear the same first and last names.
Recall the definition of a PK. We simply need to be able to
uniquely identify a record. The most efficient way to do this is to
use a single integer value, as this is both fast and space
efficient. The speed comes from the fact that an integer usually
matches the native word size of the platform (Java, which fixes its
int type at 32 bits, is the exception) and is handled natively by
the computer's CPU. The space efficiency comes from the fact that
a 32-bit integer requires only 4 bytes to store the value.
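
As a rough sketch of the single-integer approach, again using Python's sqlite3 module as a stand-in database: the table name Address and the columns AddressID and Street are hypothetical names chosen for illustration. Note that SQLite stores integers in a variable number of bytes, so this shows the keying technique rather than the 4-byte storage cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Address (
        AddressID INTEGER PRIMARY KEY,   -- hypothetical integer key
        FirstName TEXT NOT NULL,
        LastName  TEXT NOT NULL,
        Street    TEXT
    )
    """
)

# Two friends with the same name can now coexist, because the key
# no longer depends on the name columns.
people = [
    (1, "John", "Smith", "1 Main St"),
    (2, "John", "Smith", "22 Oak Rd"),
]
conn.executemany("INSERT INTO Address VALUES (?, ?, ?, ?)", people)

# Updating by the integer key changes exactly the intended record.
conn.execute("UPDATE Address SET Street = ? WHERE AddressID = ?", ("9 Elm Ave", 2))
streets = dict(conn.execute("SELECT AddressID, Street FROM Address"))
```

Both rows are accepted, and the update changes only the row whose key is 2.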
Another reason the integer is effective is that an unsigned 32-bit
integer can store any value between 0 and 4,294,967,295 (just over
4 billion). It is extremely unlikely that you will ever exceed this
number of rows in one table.
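
The size and range figures can be checked with a short sketch that uses Python's standard struct module. It assumes an unsigned 32-bit integer (format code "I"); the variable names are illustrative only.

```python
import struct

# Largest value an unsigned 32-bit integer can hold.
MAX_UINT32 = 2**32 - 1

# Packing the value as an unsigned 32-bit integer takes 4 bytes.
packed = struct.pack("<I", MAX_UINT32)
size_in_bytes = len(packed)

# One more than the maximum no longer fits in 4 bytes.
try:
    struct.pack("<I", MAX_UINT32 + 1)
    overflow_rejected = False
except struct.error:
    overflow_rejected = True
```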
A final reason to use integer PKs stems from the requirements of
the PK pool (covered later). For now, suffice it to say that Asta
SkySync requires that any table intended for synchronization or
replication MUST have an integer-based primary key.