Skip to content

Google BigQuery

Google BigQuery is an enterprise data warehouse that allows to store and query massive datasets by enabling super-fast SQL queries using the processing power of Google's infrastructure.

Before using the power of Google BigQuery within ChartFactor, we need to set up a Google Cloud account and a project within it. We'll also need to create a client id for that project in order to authenticate in our cloud account using Oauth2.

To use the Google BigQuery provider we need to include the google api client and the provider:

1
2
<script src="https://apis.google.com/js/client.js"></script>
<script src="./CFT-google-bigquery-provider.min.js"></script>
1
2
3
4
5
6
7
8
9
// define providers
var providers = [{
    name:'Google BQ',
    provider:'google-bigquery',
    projectId: 'my-project-id',
    clientId: 'oauthClientId',
    otherProjects: [], //Optional
    scope: []  // Optional
}]

In the above configuration, we need to specify the project and client ids obtained from the Google Cloud account. The property otherProjects is a list of projects that we could also have access to like public datasets for example, and is optional. By default the public project bigquery-public-data will be always included.

The scope configuration is also optional. This is required only if we have configured a special access to one of the Google services or APIs that is not included in the default scope provided by ChartFactor for the Google BigQuery provider. The scopes already included are:

1
2
3
'https://www.googleapis.com/auth/bigquery',
'https://www.googleapis.com/auth/drive.readonly',
'https://www.googleapis.com/auth/cloud-platform.read-only'

Then, as in any other provider, use the setProviders() method of ChartFactor to set your data provider definitions:

1
cf.setProviders(providers);

This assumes you have a running instance with Dremio.

Google BigQuery handles the sources in a hierarchical format and it can be appreciated in how it builds queries. We must use that same way to specify the source name in the aql:

1
2
3
4
cf.provider('Google BQ')
.source('bigquery-public-data:austin_crime.crime')
.groupby(...)
...
In general the format used is the following: project-id:dataset:table

RAW DATA QUERY

There are two ways to query raw data from Google BigQuery by using the Raw Data Table.

The first way, which is the default one, uses the Tabledata API. By using it, we'll have all the rows rendered (with the proper server pagination) in the Table as usual. The limitation is that this way doesn't support sorting or filtering.

The second way builds an actual SQL query that is sent to the server. This variant does support filtering and sorting, but its limitation resides in that this only work if the amount of data we're going to query is not too big. Otherwise it will throw an Resources exceeded during execution error. In order to enable this way we need to set the option bigqueryTableApi to false in the Raw Data Table aql:

1
2
3
4
5
    ...
    .element('chart')
    .graph('Raw Data Table')
    .set('bigqueryTableApi', false)
    .execute()

Supported Aggregations Out-Of-The-Box

SUM

1
    var metric = cf.Metric("amount","sum");

AVG

1
    var metric = cf.Metric("amount","avg");

MIN

1
    var metric = cf.Metric("amount","min");

MAX

1
    var metric = cf.Metric("amount","max");

COUNT DISTINCT

1
    var metric = cf.Metric("my_attribute","unique");

Dependencies

  • Google Api javascript client