Monday, October 5, 2009

Oracle's Exadata 2: Analytic and Operational Database in a single machine


More than a week ago I wrote a post on Oracle-Sun hardware in light of the Exadata 2 announcement. The announcement did not change the view I expressed before the Sun acquisition and right after it: Oracle is not a hardware vendor.

This post is more technical than the previous posts mentioned above.


Storage Hierarchy

The schematic illustration above depicts the Storage Hierarchy. It includes internal memory as well as external storage.

The top of the hierarchy includes small, fast and expensive storage types, so the amount of storage of those types is limited. Sometimes the amount is also limited by technological constraints. As we proceed down the hierarchy the storage becomes cheaper and larger, but the time required to access it grows longer.

The illustration does not show size ratios; it shows only the size order, from smaller to bigger. It also does not differentiate between the slow and slower storage types included in the bottom box, whose size and speed may vary significantly.

Software systems such as paging systems for internal memory and archiving systems for external storage aim at optimizing the use of the storage hierarchy based on the Least Recently Used (LRU) principle: the least recently used content is moved to lower and cheaper storage, on the assumption that an element that has not been used for a long time probably will not be used again soon.
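As an illustration, here is a minimal sketch of an LRU cache in Python. The class name and capacity are hypothetical, chosen only for this example; real paging and archiving systems implement the same idea at the operating system or storage level.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal sketch of a Least Recently Used cache.

    On overflow, the entry touched longest ago is evicted,
    mirroring how a paging system demotes cold pages to slower storage.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None                       # miss: would fetch from slower storage
        self.entries.move_to_end(key)         # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
```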

Another principle is used for optimizing the movement of elements between storage hierarchy levels: Locality of Reference. There is Locality of Reference if, after using data, the system is likely to use the same data again, or data located close to the original data in memory. For example, if a banking system moves customer data from disk to memory to execute an online query, it is reasonable that a related transaction by the same customer will follow the query and probably use at least part of the data used by the query. Finding the data in a higher memory level instead of on disk will accelerate the transaction.

On the other hand, using a relatively small storage level such as a cache with no Locality of Reference could result in a large number of element movements between faster and slower levels of the storage hierarchy. These movements, called Thrashing, will diminish the benefits of a cache or other fast high-level storage.
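A small simulation makes both points concrete. It reuses the LRUCache sketch above and compares two access patterns: a Transactional-style workload that keeps returning to a small hot set of keys, and a scan-style workload that touches each key exactly once. The sizes are arbitrary assumptions for illustration.

```python
import random

def hit_rate(cache_size, accesses):
    """Fraction of accesses served from an LRU cache of the given size."""
    cache = LRUCache(cache_size)    # LRUCache from the sketch above
    hits = 0
    for key in accesses:
        if cache.get(key) is not None:
            hits += 1
        else:
            cache.put(key, "data")  # miss: load from slower storage
    return hits / len(accesses)

# OLTP-like workload: repeated access to a small "hot" set of customers
oltp = [random.randrange(100) for _ in range(10000)]

# Scan-like workload: each of 10,000 distinct keys is touched exactly once
scan = list(range(10000))

print("OLTP hit rate:", hit_rate(1000, oltp))  # close to 1.0 after warm-up
print("Scan hit rate:", hit_rate(1000, scan))  # 0.0 - every access misses
```

With Locality of Reference the cache absorbs almost every read; without it, every access falls through to the slower level, which is exactly the thrashing scenario described above.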


Analytical Data Warehouse/ BI systems vs. Operational Systems

There are significant differences between Data Warehouse (DWH)/Business Intelligence (BI) systems and Operational Transactional systems. Analyzing these differences is beyond the scope of this post; for an analysis covering the technical as well as cultural differences between DWH/BI systems and Transactional systems, read one of Ralph Kimball's books. However, the following bullets briefly describe the major differences related to the Storage Hierarchy:

  • Read/Write Ratio – DWH/BI systems are almost read-only, while Transactional systems read, write and update; based on my experience I would expect 20% to 30% of a Transactional system's IO operations to be Write/Update operations. The processing of DWH/BI systems is composed mainly of Read operations for Reporting and Analysis; Write operations for DWH databases are executed separately, in an ETL process.
  • Locality of Reference – unlike Transactional systems, DWH/BI systems do not exhibit Locality of Reference, because they process large, disjoint sets of data.


The DWH/BI DBMS market is different from the overall DBMS market, which mostly reflects the more frequently used Transactional systems.

Actually there are only four major players in the DWH/BI DBMS market: the three leading DBMS vendors (Oracle, IBM and Microsoft) and Teradata.

Microsoft's product is not a good fit for very large DWHs, so in the high end three vendors compete with three different approaches.

IBM recommends that many customers, especially those with Mainframe-based Data Warehouses, use a single large database for both Transactional systems and the Data Warehouse. The advantage of this approach is avoiding complex ETL processes. The approach relies on the massive processing and especially IO capacity of Mainframes.

In the Oracle world the classic approach of separate databases for the DWH/BI and the Transactional system is common, even when the Oracle database is used for the Transactional systems. The DWH/BI database processing has typically been executed on standard servers under UNIX, Linux or Windows operating systems.

Teradata, without any presence in the Transactional DBMS market, has no choice: the classic separation between Operational and Analytic databases is the only possible model.

Teradata's solution is unique: a combined hardware and software database machine. The hardware is based on standard Intel processors capable of parallel processing, so its scalability is relatively high, especially for BI processing of very large databases.


Exadata 1 and Exadata 2

The introduction of the Exadata 1 special-purpose database machine by HP and Oracle less than a year ago was a model change: Oracle adopted a model similar to Teradata's in order to compete in the high-end market.

The machine was based on standard Intel processors and was designed for Data Warehouse applications.

Exadata 2 is based on Sun's hardware instead of HP's, as mentioned in a previous post. Its performance is significantly better than Exadata 1's. According to Oracle it addresses both the Transactional market and the DWH/BI market.

It is probably the first special-purpose database hardware that addresses the Online Transaction Processing (OLTP) market.

Exadata 2 is also based on standard Intel processors. A significant enhancement is the addition of another storage level as part of the machine: Flash Disk. The Flash Disk could significantly enhance OLTP performance, but due to the lack of Locality of Reference it may offer no improvement for DWH/BI application performance, or may even affect it negatively.
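A back-of-the-envelope model shows why the same flash tier helps one workload and not the other. The latency figures below are assumed orders of magnitude for illustration, not Exadata specifications:

```python
# Assumed latencies, for illustration only (not Exadata specifications)
FLASH_READ_US = 100    # flash read, microseconds
DISK_READ_US = 5000    # mechanical disk read, microseconds

def avg_read_latency_us(flash_hit_rate):
    """Average read latency when a fraction of reads is served from flash."""
    return (flash_hit_rate * FLASH_READ_US
            + (1 - flash_hit_rate) * DISK_READ_US)

print(avg_read_latency_us(0.95))  # OLTP-like locality: ~345 us per read
print(avg_read_latency_us(0.0))   # scan-like access: 5000 us, flash adds nothing
```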

Exadata 1 could be a Teradata-like model for DWH/BI. Is Exadata 2 similar to the single shared database for OLTP and DWH/BI that IBM proposes in some cases? I am not sure it is, but in any case it is a machine aimed at the high-end market, and it could be expensive overkill for applications with lower scalability requirements.
