These days, even the smallest enterprises have amassed terabytes, if not petabytes, of stored data. Services like YouTube and Twitter are said to generate terabytes of new data every day. There is no doubt about it: we are in the age of "big data". So the question arises: how do we scan these vast heaps of information without the performance degradation seen on relational database servers (mostly a consequence of their need to preserve a relational structure, or to attain what is known as normalization)?
Most business cases that require querying big data stores fall into one of two groups. Some entail sentiment analysis of some kind, usually for marketing purposes. Others concern time-sensitive or real-time processes, where the data must be understood immediately to maximize its value, or to have any value at all. However, for those looking to harness big data, setting up a platform that processes and analyzes data across server clusters, while continuously streaming massive volumes of distinct data types, can be a very costly and time-consuming undertaking. Luckily, as part of its recently named Cloud Platform, Google offers BigQuery, a "pay-as-you-use" solution for those looking to get their feet wet querying complex unstructured data sets without the hassle of managing a cumbersome distributed system.
Google BigQuery is essentially an on-demand big data storage and querying service. You can store as much data as you feel necessary and pay only for what you use (subject to certain storage limits). Furthermore, you can scale to hundreds of terabytes of data with no additional management needed. Users manage their data stores through a web-based interface, through an HTTP REST API, or from the command line. This includes running SQL-like queries that not only query columnar data structures but also join related tables, just as one might do with a traditional SQL database engine.
In terms of integration, BigQuery provides an easy method for exporting data. Beyond that, users can share their data within Google spreadsheets or through dashboards built on Google App Engine, all while controlling access through access control lists (ACLs). Multiple layers of security are provided, as outlined in BigQuery's Terms of Service, which guarantee to "adhere to reasonable security standards no less protective than the security standards at facilities where Google processes and stores its own information", with complete redundancy throughout Google's many datacenters. Pricing is relatively straightforward: you only need to concern yourself with two factors, querying and storage.
As stated on the BigQuery Pricing webpage, "BigQuery uses a columnar data structure, which means that for a given query, you are only charged for data processed in each column, not the entire table." Be mindful that charges are rounded up to the nearest MB, with a minimum of 1MB of data processed per query. For an entry-level (non-premier) account, storage currently runs at $0.12 per GB for up to 2TB monthly. Querying is charged per GB processed, at $0.035 (yes, that is three and a half cents) per gigabyte, with a limit of 20,000 queries per day and a limit of 20TB of processed data per day. Moreover, the first 100GB of data processed per month is free!
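To make those numbers concrete, here is a minimal Python sketch of a monthly cost estimate built only from the rates quoted above. The function names (`billed_mb`, `monthly_cost`) are hypothetical, and two details are assumptions rather than anything the pricing page states: that 1GB is counted as 1024MB, and that the free 100GB is subtracted from the month's total after per-query rounding. Treat it as a back-of-the-envelope aid, not as Google's billing logic.

```python
import math

# Rates as quoted in this article for an entry-level account; they may change.
STORAGE_USD_PER_GB = 0.12
QUERY_USD_PER_GB = 0.035
FREE_QUERY_GB_PER_MONTH = 100
BYTES_PER_MB = 1024 * 1024  # assumption: binary units
MB_PER_GB = 1024            # assumption: binary units


def billed_mb(bytes_processed):
    """Round one query's processed bytes up to whole MB, with a 1MB minimum."""
    return max(1, math.ceil(bytes_processed / BYTES_PER_MB))


def monthly_cost(stored_gb, query_bytes):
    """Estimate a month's bill in USD.

    stored_gb   -- GB held in storage for the month
    query_bytes -- list with the bytes processed by each query that month
    """
    total_query_gb = sum(billed_mb(b) for b in query_bytes) / MB_PER_GB
    # Assumption: the free tier comes off the monthly total, not each query.
    billable_gb = max(0.0, total_query_gb - FREE_QUERY_GB_PER_MONTH)
    storage = stored_gb * STORAGE_USD_PER_GB
    querying = billable_gb * QUERY_USD_PER_GB
    return round(storage + querying, 2)
```

Under these assumptions, storing 500GB and running a single query that processes 200GB would come to about $63.50 for the month: $60.00 for storage plus $3.50 for the 100GB of query processing beyond the free allowance.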
BigQuery has client libraries for multiple languages, such as Python and Java, so developers familiar with Google App Engine should have no problem jumping right in. However, if you are a newbie, or simply need to sign up for the service, you can reference BigQuery's Quick Start Guide. For more complex or really "big" applications, see the Developer's Guide.
Ian is a manager of business intelligence/analytics for a small-cap, NYSE-traded energy company. He also writes freelance about business and technology, and consults with SMBs on Internet marketing strategy.