Oracle has a long history of introducing new releases with huge fanfare. When Oracle9i was introduced, Oracle Corporation offered one million dollars to any customer who did not find that Oracle was the fastest database on the market. As an Oracle professional, it took me many hours to translate the marketing hyperbole into a definition of the real features included in the database. Hopefully, this overview and a follow-up article detailing enhancements to the database will shed some light on what these new features will mean for you.
Beginning with Oracle8, Oracle made a commitment to supporting data warehouse types of applications. In fact, the major new features in Oracle8 were strongly centered on database scalability and Oracle’s ability to store many terabytes for the user community. Oracle is continuing this commitment to online analytical processing (OLAP) and data warehouse tools in Oracle9i. Let’s take a look at some of the new data warehouse features.
Oracle Structured Query Language (SQL) has changed significantly over the past four years. Oracle Corporation has been struggling to find a balance between making SQL adhere to the ANSI standard and providing robust built-in functions to improve Oracle SQL performance. Oracle 9i has deliberately chosen to vary from the ANSI standard for SQL and provide a wealth of built-in functions that can improve the performance of the Oracle database.
Oracle has also improved the responsiveness of SQL by providing Java-based interfaces directly to their OLAP engine for data warehousing. This makes SQL more accessible to external portals within the Internet world.
In keeping with this commitment to extending Oracle SQL, Oracle has provided some exciting new features for the analysis of large volumes of data:
- LEAD and LAG functions—These SQL extensions enable complex time-series analysis of data warehouse information and allows for the easy aggregation of rolling time periods.
- Ranking functions—Oracle9i SQL includes ranking functions that provide support for common OLAP rankings, such as the top 10, bottom 10, top 10 percent, and bottom 10 percent.
- Advanced grouping functions—Oracle9i now provides categorization functions that group values into buckets, such as age groups or income brackets.
- Statistical functions—Oracle 9i offers statistical functions, including support for correlation, standard deviation, linear regressions, and other common calculations.
In addition to these SQL enhancements, Oracle9i has improved its OLAP engine and offers tools that simplify the complex task of loading Oracle data warehouses.
Extract Transform & Load (ETL) tool
ETL functionality has been a long-awaited feature of Oracle data warehousing. One of the most confounding problems for DBAs managing an Oracle data warehouse is providing clean and accurate summarized data from the operational data store and getting it into the data warehouse in a timely fashion.
Prior to the introduction of ETL software, the extract and loading of Oracle information tended to be very time-consuming and cumbersome. Customized programs needed to be written for the extract. Transformation summarization and aggregation was done manually. Loading information into the target data warehouse would often require the dedicated efforts of a full-time DBA.
Many of the ETL functions Oracle9i uses were integrated through the acquisition of Carleton Software data warehouse tools. The purpose of the ETL software is to assist in the loading of data from the operational data stored into the data warehouse. The functionality includes:
- Extract—The extract phase of ETL software consists of a set of Oracle SQL queries that pull the detailed information from the operational data store.
- Transform—The transformation phase generally involves aggregating highly detailed data that was taken from the operational database. Oracle9i claims to have tools that will assist in the automatic aggregation of information so that it can be summarized for easy loading into the data warehouse engine. The transformation phase sometimes involves collecting external metadata and cleansing operational data.
- Load—The load functions of the ETL relates to the actual loading of the information after it is been transformed into the Oracle data warehouse. Prior to ETL, the Oracle DBA would have to write customized SQL*Loader programs to handle this task.
E-commerce sites have been demanding a tool that will allow the Web site to survey consumer behavior and make product recommendations based upon the customers prior searching, viewing, and purchasing behavior. Oracle9i introduces a personalization feature that simplifies the tracking of consumer behavior by collecting individual information for all consumers who visit the Web site. This data takes the form of WebServer statistics. The information on pages viewed and searches entered can be used to target advertising to the consumer.
Obviously, the recommendation engine is aimed at retail systems where consumers actually sign on and survey a broad range of products and services. With the recommendation engine, Oracle is attempting to add business intelligence to the commerce software. The personalization features will allow the data warehouse engine to automatically survey past usage trends and make recommendations for purchasesbased upon the end users’ prior behavior.
Oracle Parallel Server (OPS) has been around for four years. Now, Oracle has taken the general idea of OPS and improved upon it, offering a new product, which they call Cache Fusion. That term describes the ability of OPS databases to allow multiple instances to share data buffer storage between the instances while still accessing a single Oracle database.
The architecture of Oracle Parallel Server was originally designed for massive parallel processors (MPPs). The idea behind OPS is that a single database can have many separate instances, each with its own System Global Area (SGA) and memory regions, accessing a single database.
An OPS configuration requires that the instances be able to share the data buffer cache. To allow the instances to communicate, Oracle created the Integrated Distributed Lock Manager (IDLM), which transfers data buffer blocks between instances. This is the Cache Fusion component. In OPS, data blocks are quickly transferred between instances, thereby fusing the multiple data buffers into a single cache. The concept of Cache Fusion is essentially advertising the fact that Oracle's IDLM can rapidly “ping” data blocks between the many instances, allowing for these instances to share data buffers in an efficient way.
Oracle has always provided good security within the database, and it’s extending this functionality to incorporate distributed environments. Oracle has enhanced its Virtual Private Database (VPD) tool, which controls access to Oracle databases across multiple Web servers. Oracle also enhanced its encryption methodologies, including separate authentication servers to prevent unauthorized access to a Web-centric database.
Java and XML support
The enhancement for Java and XML support services in Oracle9i further demonstrates Oracle’s focus on Internet databases. Without going into a great level of detail about JavaBeans and XML communications, these tools enable distributed Oracle databases on the Web to communicate with each other using a standard protocol. Oracle has made a commitment to extending its Java class structures and to provide an easy-to-use XML interface.
Whereas a traditional database requires you to call information using Net8 database links, a database can communicate with any other database on the Web using XML. All that’s needed to establish communication are the IP address, port number, and the XML protocols for communicating with the remote database. This is an exciting new feature of Oracle9i and promises to make database communication of commerce systems a commonplace activity.
The Oracle9i data warehouse suite offers data mining tools. Data mining is a general term that refers to examining large volumes of data for statistically significant correlations. The data-mining product that Oracle9i offers is an enhancement of the Darwin data-mining product, which Oracle purchased in 1999.
Data mining is the most advanced of all of the Oracle facilities. Even in the simplest form, data mining tends to be quite analytical and time-consuming on the target database. Many data mining tools also have a tremendous amount of overhead because of their need to survey billions of bytes of information in an attempt to tease out statistically significant correlations between information in the data warehouse.
It remains to be seen how well the Oracle data warehouse community will accept this data-mining tool. Oracle9i users may instead choose one of the many third-party data warehouse tools that can be easily attached to the Oracle database.