I agree with all that you say.
To me there seems to be 3 classes of databases (systems).
1) the real time operational systems
These systems only store state data for a system at the current point n time.
2) Data warehouse systems,
These system only hold old data and are used to mine for trends, etc..
3) Effective dated system.
Due to auditing/validation requirements a lot of our systems today fall into this
category. These systems need to keep both "history" and "current" data.
There are two basic class for these types of system. Either the history
data is kept with the current data or the history data is kept in a separate
set of tables/files (e.g. a "log" table/file - either application or DBMS provided).
A design consideration is to keep the history data with the current data
in the same table or in a separate set of tables.
Keeping history and current data in physically separate elements (either
application or DBMS system provided) results in the need for processes
to inventory, track, verify, the two sets of data. to prove that the system
data is valid.
Keeping the two classes of data in the same table data eliminates the need to prove that
external history data is valid for the current separate table data.
The separate table/file method is the easiest to implement from a programming
view point. Using this implementation moves the design outside of the database.
Keeping the history and current data in the same physical table is a major
change in the design of the environment. In this situation, a "row" is replaced
by a "version set" of rows, of which only one is the current value.
One method to achieve this "version set" is use a "change date", column/field.
Each row in the version set has a different "change date" value. The date
value can be used to order the individual rows in history order.
At least one system that I know, has used this concept to its fullest extent.
This system uses only the "change date" to identify its data. The change
date value is allowed to range over all possible values (past, current, and future).
It is/was used n a HR system, where the HR staff would schedule raises, etc.
before they actually occur. Then when the future time becomes current, the
raise automatically occurs.
This idea carried out across an entire system multiplies the complexity of
of the SQL statements, and results (in some situations involving mutiple
table joins) in long execution times, and very large quantities of data
to be processed for a relatively small result set.
If "future" events can eliminated from the version set, then things become
easier.
By eliminating FUTURE events you need a mechanism to separate
the history rows from the current rows within the version set.
For this you suggest using a "delete" flag.
In this environment, the design consideration then moves to the delete flag
and its effect on the "primary key"; its use in indexes and relationships.
The initial use of a soft "delete" flag, only used "two" value "deleted" or
not "deleted".
To continue to use this type of delete flag, requires that a new primary key
be obtained for each new version of the row in a version set. You cannot
have two rows in the same table with the same primary key.
I suggest changing a "soft delete" flag to a "soft active" flag. The use
of the "flag" is to identify the active row out of all of the rows in the
version set. Each row in all of the versions sets that make up the
table has the same value. And this same value applies to all of
the tables that make up the application system.
When version of the row becomes apart of the history, its active
flag is set to a value other than the "active" value, and a new version
is inserted with active status value The value of the previous active
is set to different from all of the other values for the row in the
version set. There are many ways to choose how these other
non-active values are determined. (time, sequence number, etc..).
The result of this design is that the primary key becomes a union
(concatenation), of the original primary key and the "active" flag.
Implementing this type of design becomes simple. The tables are
accessed through a view that has a "where clause" contains a
condition on active flag column for the unique "active value".
Concerning stored procedures, use views, triggers and stored procedures to effectively implement the "effective dated" system so that the end user sees the view of
the data as they need. The predominant view used in most systems will be the "current
point in time" view. Use these tools to implement the effective data hiding.
Hope this helps someone who has to deal with these types of design choices.
Use this if it is helpful.
Discussion on:
Message 19 of 41

































