
Database management: Tuning with Oracle replication

Oracle replication is often necessary for large enterprises with dispersed processing requirements. However, replication requires a thorough understanding of the complexities such an environment introduces to a database system.

Oracle replication has been around for quite some time and has become a mature, feature-rich environment to satisfy widely dispersed processing requirements. Replication was first introduced as a way to allow Oracle tables or subsets of tables to be available locally on widely separated database servers. This was accomplished via the use of snapshots (point-in-time copies) of required tables that were copied from a master server to one or more remote slave servers. The snapshot technique was particularly effective for relatively static tables that did not require frequent refresh operations to be kept in sync with the master tables. Read-only applications benefited from the use of snapshots since wide-area network transmission time was eliminated, significantly improving performance.

Snapshots are now more commonly known as materialized views, and while the creation of remote materialized views of master tables is still a common use of replication, the technology has matured significantly, supporting a much broader spectrum of database objects. I'll discuss the snapshot method and then explore more advanced techniques.

A number of factors influence decisions concerning Oracle table replication. Table size and the volatility of the table data are of particular importance (Figure A). Smaller static tables are ideal candidates for snapshot replication for remote read-only applications, whereas larger, dynamic master tables with many inserts, updates, and deletes would require frequent refreshes that consume large amounts of system and network resources. Snapshots are not a good solution for these large, dynamic master tables, so more advanced techniques (discussed later) should be used.

Figure A
Replication alternatives based on table size and volatility

When considering system and network performance, we need to be concerned with the size of the snapshots we create and how frequently they are created or refreshed. As indicated in Figure A, we can re-create a snapshot or perform a full refresh whenever required; we can schedule periodic refreshes; or we can use a trigger to propagate changes from a master table to the slave snapshots. Use the following general rules to determine which methodology is most appropriate.

Small static tables
When a table is small and contains relatively static data, it's often simpler to drop and then re-create the snapshot than it is to use the REFRESH COMPLETE option. A simple script invoked via cron could be created to perform the drop and re-creation at predetermined intervals.
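A minimal sketch of such a script, assuming a snapshot named cust_snap and a database link named hq (all object names here are illustrative assumptions, not part of any particular installation):

```sql
-- recreate_snapshot.sql -- invoked from cron, for example:
--   0 2 * * 0  sqlplus -s repadmin/password @recreate_snapshot.sql
-- All object names below are illustrative.

drop snapshot cust_snap;

create snapshot cust_snap
as
   select cust_nbr, cust_name
   from   customer@hq           -- database link to the master site
   where  department = 'NC';
```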

An alternative to creating a snapshot is to use distributed SQL to create the replicated table directly on the slave database. Notice how a database link is utilized in the following CTAS (CREATE TABLE AS SELECT) example to create a subset of the master emp table from the headquarters database (the table and link names are illustrative):

create table emp_nc
as
   select *
   from   emp@hq
   where  department = 'NC';

Small dynamic tables
For small tables, we could invoke an update trigger to perform a refresh. However, since the table is small, the snapshot log will probably not contain many changes, so propagating them to the snapshot at scheduled intervals is usually sufficient. Here's an example of a REFRESH FAST specification (object and link names are illustrative) that propagates changes every hour:

create snapshot cust_snap
refresh fast
   start with sysdate
   next sysdate + 1/24
as
   select cust_nbr, cust_name
   from   customer@hq
   where  department = 'NC';

Large static tables
For larger tables with static data content, we can significantly lengthen the refresh interval. The following example (object and link names are illustrative) performs a REFRESH COMPLETE on the first Sunday after each quarter begins:

create snapshot cust_snap
refresh complete
   start with sysdate
   next next_day(add_months(trunc(sysdate,'Q'),3),'SUNDAY')
as
   select cust_nbr, cust_name
   from   customer@hq
   where  department = 'NC';

Large dynamic tables
Dropping and re-creating very large tables is not a good option because of the system and network resources consumed. The same holds true for using the REFRESH COMPLETE option; both options would take too much time. Let’s explore a better option to use for these tables.

Multimaster replication
Oracle now supports multiple master tables as a feature of advanced replication. Modifications to any of the master tables from any of the sites are propagated to other masters either synchronously or asynchronously. As indicated in Figure B, Oracle’s advanced replication feature, using multimaster replication and synchronous updates, is probably a good choice for the replication of large, dynamic tables that can be updated from multiple locations. Using this technology, table updates are propagated as they occur, eliminating the need to refresh snapshots across the network.

Figure B
Replication of large, volatile tables using multimaster

If you support databases that require updating by users from diverse locations, you'll find that implementing multimaster replication significantly improves end-user response time. Additionally, multimaster replication provides load balancing and failover capabilities.

Oracle’s multimaster replication utilizes peer-to-peer replication techniques to synchronize all of the master tables in the network, regardless of where they are. Changes to a table at any master site are propagated to other master sites either synchronously or asynchronously.
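Master groups are configured through Oracle's dbms_repcat package. The following is a compressed sketch of the basic steps, run from the replication administrator account; the group, schema, object, and site names are illustrative assumptions:

```sql
-- All names below are illustrative.
-- 1. Create the master replication group (activity is suspended).
execute dbms_repcat.create_master_repgroup(gname => 'SCOTT_MG');

-- 2. Add the table to the group.
execute dbms_repcat.create_master_repobject( -
        gname => 'SCOTT_MG', -
        type  => 'TABLE',    -
        sname => 'SCOTT',    -
        oname => 'EMP');

-- 3. Generate the triggers and packages that capture and apply changes.
execute dbms_repcat.generate_replication_support( -
        sname => 'SCOTT', -
        oname => 'EMP',   -
        type  => 'TABLE');

-- 4. Add the other master sites.
execute dbms_repcat.add_master_database( -
        gname  => 'SCOTT_MG', -
        master => 'nyc.example.com');

-- 5. Resume replication activity for the group.
execute dbms_repcat.resume_master_activity(gname => 'SCOTT_MG');
```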

These benefits do not come without a price. Configuring multimaster replication is a sophisticated process, and the complexity is compounded by the need to implement conflict resolution, especially when changes are propagated asynchronously. Larger installations can spend hundreds of hours configuring replication and may require a dedicated DBA to manage the environment. Regardless, most installations find that the benefits are well worth the extra effort.

In addition to table replication, which is far more capable than read-only snapshot replication, multimaster replication supports the replication of additional database objects, including:
  • Indexes
  • Index types
  • Packages / Package Bodies
  • Procedures / Functions
  • Synonyms
  • Tables
  • Triggers
  • User-Defined Operators, Types, and Type Bodies
  • Views and Object Views

The ability to replicate stored procedures, for instance, allows the DBA to roll out code changes as easily as table data changes. This is particularly useful in shops where all application code is encapsulated within stored procedures.
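A stored procedure can be added to a master group much as a table is, again via dbms_repcat. A sketch, with illustrative group, schema, and procedure names:

```sql
-- Replicate an existing procedure to all master sites in the group.
execute dbms_repcat.create_master_repobject( -
        gname => 'SCOTT_MG',  -
        type  => 'PROCEDURE', -
        sname => 'SCOTT',     -
        oname => 'RAISE_SALARY');

-- Generate the wrapper that propagates calls to the other masters.
execute dbms_repcat.generate_replication_support( -
        sname => 'SCOTT',        -
        oname => 'RAISE_SALARY', -
        type  => 'PROCEDURE');
```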

Multimaster replication can be thought of as a synchronized set of updatable snapshots. In this context, updatable means the snapshot is created with the FOR UPDATE clause. In the example below (the snapshot name and database link are illustrative), the snapshot propagates its updates back to the master table:

create snapshot customer_snap
refresh fast
   start with sysdate
   next sysdate + 1/24
for update
enable query rewrite
as
   select * from customer@master_site;

Conflict resolution
Conflicts can occur whenever users at diverse sites attempt to change the same row of a replicated table. While conflicts are avoided using synchronous replication, they can be a problem when asynchronous replication is implemented.
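The propagation mode is chosen per master site when the site is added to the group. Here's a sketch, assuming an illustrative group and site name, that requests synchronous propagation and thereby avoids conflicts at the cost of requiring every master site to be available for each update:

```sql
-- Names are illustrative; the default propagation_mode is 'ASYNCHRONOUS'.
execute dbms_repcat.add_master_database( -
        gname  => 'SCOTT_MG',        -
        master => 'nyc.example.com', -
        propagation_mode => 'SYNCHRONOUS');
```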

The most common conflicts you’ll encounter with multimaster replication are:
  • Delete conflict—Occurs when a transaction from one site deletes a row that a transaction from another site updates.
  • Uniqueness conflict—Results from an attempt by two different sites to insert rows using identical primary keys. Oracle provides three methods for resolving these conflicts:
    - Append Site Name To Duplicate Value
    - Append Sequence To Duplicate Value
    - Discard Duplicate Value
  • Update conflict—This is caused by simultaneous update transactions from different sites on the same row.

With update conflicts, your only option is to write conflict resolution routines, dealing with each conflict independently. However, Oracle provides several prewritten methods to help resolve the other conflicts. Some examples are:
  • Additive and Average—When dealing with replicated numerics, the Additive method adds the incoming change to the existing value using the formula (current value = current value + (new value - old value)). The Average method averages the conflicting values using the formula (current value = (current value + new value) / 2).
  • Earliest Timestamp—This method propagates the earliest update (chronologically speaking) and is the opposite of the Latest Timestamp method.
  • Latest Timestamp—This method propagates the latest update received, chronologically. This approach can result in situations where one user’s update gets overlaid by a more recent update.
  • Minimum and Maximum—These methods may be used when advanced replication detects a conflict within a column group. The specified method (minimum or maximum) is called and compares the new value from the originating site with the current value from the destination site for a designated column in the column group. You must designate the comparison column when you select the minimum or maximum conflict resolution method.
  • Priority Group—This method allows you to assign a priority level to each possible value of a particular column. The update carrying the highest-priority value wins.
  • Site Priority—Using this method, all master sites are not created equal. Site priority is useful if one site is deemed more likely to have the most accurate information.
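Resolution methods are attached to a column group, so the affected columns must first be collected into one. A sketch using dbms_repcat.make_column_group, with illustrative schema and column names:

```sql
-- Group the columns that conflict resolution will manage.
-- Names are illustrative; update_ts would be a DATE column
-- maintained by a trigger at each site.
execute dbms_repcat.make_column_group( -
        sname => 'SCOTT', -
        oname => 'EMP',   -
        column_group => 'EMP_COLGRP', -
        list_of_column_names => 'sal,comm,update_ts');
```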

The example below demonstrates how conflict resolution methods are specified. Here we execute dbms_repcat.add_update_resolution, telling Oracle to use the LATEST TIMESTAMP method for updates to the EMP table. Note that the parameter column must be a DATE column maintained at each site, typically by a trigger; the name UPDATE_TS used here is illustrative:

execute dbms_repcat.add_update_resolution( -
        sname => 'SCOTT',               -
        oname => 'EMP',                 -
        column_group => 'EMP_COLGRP',   -
        sequence_no => 1,               -
        method => 'LATEST TIMESTAMP',   -
        parameter_column_name => 'UPDATE_TS');

Maximize performance
I've introduced you to the benefits and complexity of implementing advanced replication. I've also provided some general rules for selecting appropriate replication methods for maximizing performance while satisfying remote access requirements.
