“Bob: So, how do I query the database? IT guy: It’s not a database. It’s a Key-Value store. . . . You write a distributed map-reduce function in Erlang. Bob: Did you just tell me to go **** myself? IT guy: I believe I did, Bob.”
—Fault Tolerance cartoon, @jrecursive, 2009
Relational databases have been around for a while, but it was the emergence of web technologies that pushed them to center stage. From their rise around 1995 to their peak around 2005, they remained the stable and more or less undisputed centerpiece of the web revolution. Behind the scenes, however, things were churning, especially with the arrival of Web 2.0 and the need for massive processing capability for big data. This was the age of Amazon, arguably the largest retail operator of the time, with a huge web presence. While Web 1.0 was a collection of statically linked pages, Web 2.0 was all about dynamic content and the need to search and index those pages with transactional capabilities.
Amazon, in its early days, used the Common Gateway Interface (CGI) to facilitate user interaction. CGI allowed an HTTP request to invoke a script rather than display a static HTML page. Scripts written in Perl were used to access the database and generate pages on the fly. As technology progressed, CGI gave way to frameworks such as Java’s J2EE and ASP.NET, along with PHP (which followed the CGI model). Despite these advances, the basic pattern of data access, retrieval, and rendering of dynamic pages remained unchanged.
At this juncture, scaling was not a big issue: a bottleneck at the web server layer could easily be fixed by piling on more web servers to meet the rising demands in traffic. Fixing a bottleneck at the database layer, however, was not so simple. In the early days, these issues too were addressed much like the web server fix: by upgrading to the latest and greatest hardware, operating systems, and databases.
With the crash of the internet bubble, two realities came into play:
- Spending indefinitely on the latest and greatest hardware to scale up was no longer economically viable.
- Startups came into being, and they needed a more realistic solution: one that could scale from a pint-sized infrastructure to serving a global market as the companies grew.
Effective Schema Design
- The schema plays an important role in tuning joins.
- Choose data types for all primary and unique keys carefully, and remain consistent throughout the schema.
- Avoid a primary key column such as VARCHAR2(100): such a key is expensive because the large value must be carried into every referencing table, and because Oracle must compare long strings when performing a join.
- Create indexes where necessary, and where access patterns justify it, create composite (multi-column) indexes.
- Normalize your tables so that there is not a lot of redundancy.
- Put the necessary constraints on the tables so that only valid data is stored.
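The guidelines above can be sketched in a small, runnable example. The chapter’s advice is Oracle-flavored, so this sketch uses SQLite via Python’s `sqlite3` module purely as a stand-in; the table and index names (`customers`, `orders`, `idx_orders_cust_date`) are hypothetical illustrations, not from the text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

# Compact INTEGER surrogate keys keep the join columns small; the natural
# key (email) stays UNIQUE but is not carried into referencing tables.
con.execute("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    )""")
con.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL    NOT NULL CHECK (total >= 0),  -- constraint keeps data valid
        placed_at   TEXT    NOT NULL
    )""")

# A composite (multi-column) index matching a common access path:
# "orders for a given customer, by date".
con.execute("CREATE INDEX idx_orders_cust_date ON orders(customer_id, placed_at)")

con.execute("INSERT INTO customers (email) VALUES ('alice@example.com')")
con.execute("INSERT INTO orders (customer_id, total, placed_at) "
            "VALUES (1, 42.0, '2024-01-01')")
```

With the foreign key and CHECK constraints in place, an order referencing a nonexistent customer or carrying a negative total is rejected by the database itself rather than slipping through to application code.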
Invalid Optimizer Statistics
- Statistics are used by the optimizer to come up with a good execution plan
- Sometimes optimizer statistics might be stale or missing
The statistics may not always be accurate, because a table may have grown from a few hundred thousand rows to a few million. Therefore, statistics should be collected periodically and made part of the routine database maintenance.
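In Oracle, statistics are gathered with the `DBMS_STATS` package; as a small runnable illustration of the same idea, SQLite collects optimizer statistics with the `ANALYZE` command and stores them in the `sqlite_stat1` table. The table and index names below are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
con.execute("CREATE INDEX idx_orders_cust ON orders(customer_id)")
con.executemany("INSERT INTO orders (customer_id) VALUES (?)",
                [(i % 10,) for i in range(1000)])

# Collect optimizer statistics; without this, the planner only guesses
# at table sizes and index selectivity.
con.execute("ANALYZE")
stats = con.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
```

Rerunning `ANALYZE` after significant growth (the few-hundred-thousand-to-millions scenario above) refreshes these numbers, which is exactly the maintenance routine the text recommends.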
A hint is an instruction passed to the optimizer through comments in an SQL statement. Hints enable you to make decisions normally made automatically by the optimizer.
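An Oracle hint is embedded in a comment, e.g. `SELECT /*+ FULL(employees) */ ...` to force a full table scan. SQLite has no comment hints, but as a rough runnable analogue its `NOT INDEXED` clause likewise overrides the access path the optimizer would normally choose on its own. The `employees` table and `idx_emp_dept` index here are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, dept TEXT)")
con.execute("CREATE INDEX idx_emp_dept ON employees(dept)")
con.executemany("INSERT INTO employees (dept) VALUES (?)",
                [("sales",), ("hr",)] * 50)

def plan(sql):
    # EXPLAIN QUERY PLAN reports which access path the planner chose
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Left alone, the planner picks the index on dept ...
default_plan = plan("SELECT * FROM employees WHERE dept = 'hr'")
# ... while the NOT INDEXED directive forces a full table scan instead.
hinted_plan = plan("SELECT * FROM employees NOT INDEXED WHERE dept = 'hr'")
```

As with Oracle hints, such overrides are a way to second-guess the optimizer when you know something it does not, such as when its statistics are stale.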