I have tremendous respect for Michael Stonebraker. He is a visionary, and what I like most about him is his drive and passion to commercialize academic concepts. ACM recently published his article “My Top 10 Assertions About Data Warehouses.” If you haven’t read it, I encourage you to do so.
“Please note that I have a financial interest in several database companies, and may be biased in a number of different ways.”
“Star and Snowflake schemas are clean, simple, easy to parallelize, and usually result in very high-performance database management system (DBMS) applications.”
“However, you will often come up with a design having a large number of attributes in the fact table; 40 attributes are routine and 200 are not uncommon. Current data warehouse administrators usually stand on their heads to make “fat” fact tables perform on current relational database management systems (RDBMSs).”
- A schema with 200 attributes, fact tables, and complex joins is not simple. What exactly is simple about it?
- Efficient parallelization of a query depends on many factors beyond the schema: how the data is stored and partitioned, the performance of the database engine, and the hardware configuration, to name a few.
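To illustrate the partitioning point, here is a minimal sketch of my own, not anything from the article. It assumes a hypothetical 1,000-row fact table spread across four workers, and it treats the best-case speedup of a parallel scan as total rows divided by the rows on the busiest worker. The table, the skew, and the function names are all made up for illustration.

```python
def partition_sizes(keys, num_workers):
    """Count how many rows each worker gets when rows are assigned by key modulo."""
    sizes = [0] * num_workers
    for key in keys:
        sizes[key % num_workers] += 1
    return sizes


def ideal_speedup(sizes):
    """Best-case parallel speedup: total work divided by the slowest worker's work."""
    return sum(sizes) / max(sizes)


# Hypothetical fact table: 900 of 1,000 rows belong to one "hot" customer (id 0).
customer_ids = [0] * 900 + list(range(1, 101))
row_ids = list(range(len(customer_ids)))

# Same schema, same data, two different partitioning keys.
by_customer = partition_sizes(customer_ids, 4)
by_row_id = partition_sizes(row_ids, 4)

print("partitioned by customer:", by_customer, ideal_speedup(by_customer))
print("partitioned by row id:  ", by_row_id, ideal_speedup(by_row_id))
```

The schema is identical in both cases. Partitioning on the skewed customer key leaves one worker with almost all of the rows, so the scan is barely faster than running it serially, while partitioning on the row id spreads the work evenly.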
“If you are a data warehouse designer and come up with something other than a snowflake schema, you should probably rethink your design.”
“Since fact tables are getting fatter over time as business analysts want access to more and more information, this architectural difference will become increasingly significant. Even when “skinny” fact tables occur or where many attributes are read, a column store is still likely to be advantageous because of its superior compression ability.”
“For these reasons, over time, column stores will clearly win”
“Note that almost all traditional RDBMSs are row stores, including Oracle, SQLServer, Postgres, MySQL, and DB2.”
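A back-of-the-envelope sketch helps make the row-versus-column argument concrete. Everything here is an assumption of mine for illustration: fixed-width 8-byte values, one million rows, the 200-attribute "fat" fact table from the quote, and a query that touches only three columns. It ignores indexes, caching, page layout, and per-row overhead, and the run-length encoder is a toy, not how any particular product compresses data.

```python
from itertools import groupby

NUM_ROWS = 1_000_000
NUM_ATTRIBUTES = 200   # "fat" fact table, per the figure quoted above
BYTES_PER_VALUE = 8    # assumed fixed-width values, for simplicity


def bytes_scanned_row_store(columns_needed):
    """A row store reads whole rows, however few columns the query uses."""
    return NUM_ROWS * NUM_ATTRIBUTES * BYTES_PER_VALUE


def bytes_scanned_column_store(columns_needed):
    """A column store reads only the columns the query references."""
    return NUM_ROWS * columns_needed * BYTES_PER_VALUE


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


row_bytes = bytes_scanned_row_store(columns_needed=3)
column_bytes = bytes_scanned_column_store(columns_needed=3)
print(f"row store:    {row_bytes:,} bytes")
print(f"column store: {column_bytes:,} bytes")

# Values stored together by column tend to repeat, which compresses well.
region_column = ["EMEA"] * 4 + ["APAC"] * 3 + ["AMER"] * 5
print(run_length_encode(region_column))
```

Under these assumptions the column layout scans 3/200 of the bytes the row layout does, and a sorted, low-cardinality column shrinks to a handful of (value, count) pairs. Real systems will differ, but the direction of the effect is the one the article describes.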
“It will take a long time before main memory or flash memory becomes cheap enough to handle most warehouse problems.”
“As such, non-disk technology should only be considered for temporary tables, very “hot” data elements, or very small data warehouses.”
“In other words, look for “no knobs” as the only way to cut down DBA costs.”
(Cross-posted @ cloud computing)