ENTRIES TAGGED "database"
Did Google just prove the industry wrong? Early thoughts on the Spanner database.
In case you missed it, Google Research published another one of “those” significant research papers — a paper, like the BigTable paper from 2006, with ramifications for the entire industry (the BigTable paper was one of the opening volleys in the NoSQL movement).
Google’s new paper is about a distributed relational database called Spanner, a follow-up to a presentation from earlier in the year about a new database for AdWords called F1. If you recall, that presentation revealed Google’s migration of AdWords from MySQL to a new database that supported SQL and hierarchical schemas — two ideas that buck the industry’s trend away from relational databases.
This new database, Spanner, is unlike anything we’ve seen: a database that embraces ACID, SQL, and transactions, and that can be distributed across thousands of nodes spanning multiple data centers in multiple regions. The paper dwells on two main features that define it:
- Schematized Semi-relational Tables — A hierarchical approach to grouping tables that allows Spanner to co-locate related data into directories that can be easily stored, replicated, locked, and managed on what Google calls spanservers. They have a modified SQL syntax that allows for the data to be interleaved, and the paper mentions some changes to support columns encoded with Protobufs.
- “Reification of Clock Uncertainty” — This is the real emphasis of the paper. The missing link in relational database scalability was strong coordination backed by a serious attempt to minimize time uncertainty. In Google’s new global-scale database, the variable that matters is epsilon: time uncertainty. By building a system in which distributed transactions are bounded only by network distance (measured in milliseconds) and time uncertainty (epsilon), Google has achieved very low overhead — 14ms introduced by Spanner in this paper for data centers at 1ms network distance — for read-write (RW) transactions that span the U.S. East Coast and U.S. West Coast (data centers separated by around 2ms of network time).
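The "reification of clock uncertainty" idea can be sketched in a few lines. This is a toy model, not Spanner's actual TrueTime implementation: it assumes a fixed uncertainty bound `EPSILON` (the real system derives epsilon from GPS and atomic-clock hardware), and the names `tt_now` and `commit_wait` are illustrative. The point it shows is that a clock API returning an *interval* rather than an instant lets a transaction "wait out" the uncertainty, so its commit timestamp is guaranteed to be in the past on every node before results become visible.

```python
import time
from dataclasses import dataclass

# Assumed uncertainty bound in seconds (real epsilon is measured, not fixed).
EPSILON = 0.007

@dataclass
class TTInterval:
    """A clock reading as an interval: true time lies within it."""
    earliest: float
    latest: float

def tt_now() -> TTInterval:
    # The local clock could be off by up to EPSILON in either direction,
    # so report an interval instead of a single instant.
    t = time.time()
    return TTInterval(t - EPSILON, t + EPSILON)

def commit_wait(commit_ts: float) -> None:
    # Block until commit_ts is definitely in the past everywhere:
    # even the earliest possible "true time" has moved beyond it.
    while tt_now().earliest <= commit_ts:
        time.sleep(0.001)

# A read-write transaction picks a timestamp no earlier than "now",
# then waits out the uncertainty before making its writes visible.
interval = tt_now()
s = interval.latest
commit_wait(s)
```

The cost of this scheme is exactly what the paper emphasizes: every RW transaction pays a commit wait proportional to epsilon, which is why driving time uncertainty down is what makes a globally distributed, externally consistent database practical.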
Data is getting heavier relative to the networks that carry it around the data center.
Imagine a future where large clusters of like machines dynamically adapt between programming paradigms depending on a combination of the resident data and the required processing.
Oracle's NoSQL Database is more than a product. It's also an acknowledgement.
Oracle's announcement of a NoSQL product isn't just a validation of key-value stores, but of the entire discussion of database architecture.
Cloudera CEO Mike Olson on Hadoop's architecture and its data applications.
Hadoop gets a lot of buzz in database circles, but some folks are still hazy about what it is and how it works. In this interview, Cloudera CEO and Strata speaker Mike Olson discusses Hadoop's background and its current utility.
The founder of Drawn to Scale explains how his database platform does simple things quickly.
Bradford Stephens, founder of Drawn to Scale, discusses big data systems that work in "user time."
Parsing the progress of open government data requires new tools and reliable information sources.
Data journalists now have huge volumes of accessible government data, but a recent panel discussion reveals that cultural roadblocks and "dirty" data still need to be addressed.
Bypass the SQL parser to use MySQL's raw speed
The HandlerSocket plugin for MySQL bypasses the query parser to deliver excellent NoSQL performance, rivaling that of memcache.
RethinkDB uses SSDs to their full advantage
Today's databases are designed for the spinning platter of the hard disk. As SSDs begin to enter data centers, it's time for a database that takes advantage of the new technology.
Martin Hall explains how Karmasphere is integrating Hadoop into enterprises.
You don't have to throw away existing investments in skills and tools to use Hadoop for big data, as Karmasphere's Martin Hall explains.