Skip to content

Google BigQuery

Google BigQuery is an enterprise data warehouse that allows to store and query massive datasets by enabling super-fast SQL queries using the processing power of Google's infrastructure.

Setup

Before using the power of Google BigQuery within ChartFactor, you need to set up a Google Cloud account and a project within it. You will also need to create a OAuth client ID for that project in order to authenticate in your cloud account using Oauth2.

To connecto to Google BigQuery from your browser, include in your HTML page the following:

  • The google api client library
  • The ChartFactor BigQuery provider library

Example below:

1
2
<script src="https://apis.google.com/js/client.js"></script>
<script src="./cft-google-bigquery-provider.min.js"></script>

Then, declare your provider in your Javascript code as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
// define providers
var providers = [{
    name:'Google BQ',
    provider:'google-bigquery',
    projectId: 'my-project-id',
    clientId: 'oauthClientId', // Browser environment
    scope: [],  // Optional
    file: "../bigquery-private-key.json", // Non browser environment
    otherProjects: [] //Optional
}]

The description of the properties above is below:

name

This is the name you'd like to use to identify your provider. It is a standard property for all providers.

provider

This is the provider type, in this case google-bigquery

projectId

This is the project ID in your Google Cloud account

clientId

This is the OAuth 2.0 Client ID of your application when the BigQuery provider is running on the browser to enable OAuth authentication

scope

This property is optional. It is needed only if we have configured a special access to one of the Google services or APIs that is not included in the default scope provided by ChartFactor for the Google BigQuery provider. The scopes already included are:

1
2
3
'https://www.googleapis.com/auth/bigquery',
'https://www.googleapis.com/auth/drive.readonly',
'https://www.googleapis.com/auth/cloud-platform.read-only'

file

This property is only applicable when deploying the BigQuery provider on the server-side in a Node.js environment. It is the location of the private key JSON file containing the security information necessary to establish a connection to BigQuery. An example of a private key JSON file is below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
{
  "type": "service_account",
  "project_id": "project-id",
  "private_key_id": "1c934j3l2l7l5j6h3j5h67h5h3h9j3b5b6h8k1k3",
  "private_key": "private_key",
  "client_email": "bigquery@project-id.iam.gserviceaccount.com",
  "client_id": "client_id",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/bigquery%40project-id.iam.gserviceaccount.com"
}

otherProjects

This is a list of projects that you would also like to access, such as projects containing public datasets. This property is optional. By default the public project bigquery-public-data will always be included.

Using your data provider

Now that you have your providers array initialized (previous section) with your BigQuery provider, use the ChartFactor setProviders() method to set your data provider definition:

1
cf.setProviders(providers);

Google BigQuery uses a hierarchical naming convention to identify its projects, datasets, and tables. You must use this same naming convention to specify table names in the aql:

1
2
3
4
cf.provider('Google BQ')
.source('bigquery-public-data:austin_crime.crime')
.groupby(...)
...
In general the format used is the following: project-id:dataset.table

Nested structures

Google BigQuery allows objects (nested and repeated data) to be stored by using fields of type RECORD. You can learn more about nested structures here.

ChartFactor supports the interaction with these structure by flattening them transparently so they can be used as any other field:

1
2
3
4
5
6
let repoUrl = cf.Attribute('repository.url')
                .limit(10)
                .sort('asc', 'repository.url')
cf.provider('Google BQ')
.source('bigquery-public-data:samples:github_nested')
...

Response timeout

In some occasions, the Google BigQuery's engine may take time to process and execute queries, especially if the dataset is large and the queries have not been cached.

By default, when a request is sent by the BigQuery provider to the data engine, it will wait for a 60,000 milliseconds (a full minute) to receive the response. After that time it will throw a Timeout error for the query. This value can be modified to a greater one in the custom metadata:

1
2
3
4
5
6
7
8
9
// define providers
var providers = [{
    name:'Google BQ',
    provider:'google-bigquery',
    ...
    metadata: {
      timeoutMs: 120000
    }
}]

In the example above we are setting timeoutMs with a waiting time of 2 mins for the response to come back. The value should always be set in milliseconds.

Supported Aggregations Out-Of-The-Box

SUM

1
    var metric = cf.Metric("amount","sum");

AVG

1
    var metric = cf.Metric("amount","avg");

MIN

1
    var metric = cf.Metric("amount","min");

MAX

1
    var metric = cf.Metric("amount","max");

COUNT DISTINCT

1
    var metric = cf.Metric("my_attribute","unique");

PERCENTILES

1
    var metric = cf.Metric('commission', 'percentiles');

GEOHASH

Please refer to Geo Queries for more information.

Dependencies

  • Google Api javascript client