
Google BigQuery

Google BigQuery is an enterprise data warehouse that lets you store and query massive datasets, enabling super-fast SQL queries using the processing power of Google's infrastructure.

Setup

Before using the power of Google BigQuery within ChartFactor, you need to set up a Google Cloud account and a project within it. You will also need to create an OAuth client ID for that project in order to authenticate against your cloud account using OAuth2.

To connect to Google BigQuery from your browser, include the following in your HTML page:

  • The Google API client library
  • The ChartFactor BigQuery provider library

Example below:

<script src="https://accounts.google.com/gsi/client" async defer></script>
<script src="./cft-google-bigquery-provider.min.js"></script>

Then, declare your provider in your JavaScript code as follows:

// define providers
var providers = [{
    name:'Google BQ',
    provider:'google-bigquery',
    projectId: 'my-project-id',
    clientId: 'oauthClientId', // Browser environment
    scope: [],  // Optional
    file: "../bigquery-private-key.json", // Non-browser environment
    otherProjects: [], //Optional
    autoRefreshToken: true, // Optional, defaults to false
    enableCache: true, // Optional, defaults to false
    cacheTimeout: '15m 30s', // 15 minutes and 30 seconds; optional, defaults to "1d" (1 day)
}]

Each of the properties above is described below.

name

This is the name you'd like to use to identify your provider. It is a standard property for all providers.

provider

This is the provider type, in this case google-bigquery.

projectId

This is the project ID in your Google Cloud account where your BigQuery data is stored. This is required to access your datasets and tables.

clientId

Defaults to ChartFactor’s Client ID if not provided. This is a unique application identifier assigned by Google Cloud. You should obtain your own Client ID when deploying your ChartFactor web apps on your own domain. To obtain your own Client ID, follow these steps:

  1. Go to the Google Cloud Console.
  2. Navigate to "APIs & Services" > "Credentials".
  3. Click on "Create Credentials" and select "OAuth client ID".
  4. Configure the consent screen if you haven't already (this is required for OAuth credentials).
  5. Choose the application type "Web Application" and provide the necessary details.
  6. Click "Create". The Client ID will be displayed on the screen and can also be found in the "Credentials" section under "OAuth 2.0 Client IDs".

The Client ID is applicable when running on the browser to enable the OAuth2 authentication. If you are deploying the BigQuery provider on the server side, you need to set a different property with your service account private key as specified below.

scope

An array of strings defining the permissions your application needs, in the form of OAuth 2.0 scopes. This property is optional. The following scope is already included:

'https://www.googleapis.com/auth/bigquery.readonly'

Use this property to specify additional scopes only if your application requires access to Google services or APIs that are not in the default scope.
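
For example, below is a minimal sketch of a provider requesting an additional scope, assuming your application also needs to read BigQuery tables backed by Google Drive files (the project and client IDs shown are placeholders):

var providers = [{
    name: 'Google BQ',
    provider: 'google-bigquery',
    projectId: 'my-project-id',
    clientId: 'oauthClientId',
    // The default bigquery.readonly scope is still included automatically
    scope: ['https://www.googleapis.com/auth/drive.readonly']
}]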

file

This property is only applicable when deploying the BigQuery provider on the server-side in a Node.js environment. It is the location of the private key JSON file containing the security information necessary to establish a connection to BigQuery. An example of a private key JSON file is below:

{
  "type": "service_account",
  "project_id": "project-id",
  "private_key_id": "1c934j3l2l7l5j6h3j5h67h5h3h9j3b5b6h8k1k3",
  "private_key": "private_key",
  "client_email": "bigquery@project-id.iam.gserviceaccount.com",
  "client_id": "client_id",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/bigquery%40project-id.iam.gserviceaccount.com"
}

otherProjects

This is a list of additional projects that you would also like to access, such as projects containing public datasets. This property is optional. By default, the public project bigquery-public-data is always included.
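
For example, a sketch adding a hypothetical second project (the default bigquery-public-data project does not need to be listed):

var providers = [{
    name: 'Google BQ',
    provider: 'google-bigquery',
    projectId: 'my-project-id',
    clientId: 'oauthClientId',
    otherProjects: ['my-analytics-project'] // hypothetical project ID
}]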

autoRefreshToken

This property is applicable only when deploying the BigQuery provider on the browser side. It is a boolean flag that indicates whether the OAuth2 token should be automatically refreshed when it expires. This is useful for long-running applications or sessions where you want to avoid manual re-authentication. Defaults to false.

When set to true, the provider handles token refresh automatically. If the user is already authenticated, it triggers only a popup with a loader, ensuring that your application remains authenticated without requiring user intervention.

enableCache

A boolean flag to enable or disable query caching. When enabled, ChartFactor will cache query results to improve performance and reduce the number of queries sent to BigQuery. Defaults to false.

cacheTimeout

A string specifying how long cached results are retained since creation. Applicable when enableCache is true.

The syntax consists of one or more duration segments in the form "<number><unit>" separated by spaces. Each segment is an integer immediately followed by a unit identifier (no spaces). The default value is 1d (i.e. 1 day).

Supported units:

  • s = seconds
  • m = minutes
  • h = hours
  • d = days
  • w = weeks

Examples

  • "12h"
  • "1m 2s"
  • "1h 30m 15s"

Constraints

  • Must be positive.
  • Cannot exceed 150 days.
  • Segments must be in decreasing order of magnitude (e.g. "1h 30m", not "30m 1h").

auth

The auth property is an optional configuration object that allows defining custom authentication logic for the Google BigQuery provider in browser environments. This is useful when you want to handle token retrieval and token refreshing manually, such as when implementing a redirect-based OAuth flow (to avoid popup blockers) or retrieving tokens from a backend service.

Prerequisites: Google Cloud Console Configuration

Before implementing custom authentication, ensure your Google Cloud project is correctly configured:

  • OAuth Consent Screen: Configure the OAuth consent screen in the Google Cloud Console if you haven't done so. This is required to create OAuth 2.0 credentials.
  • Create Client ID: Create an OAuth 2.0 Client ID for a Web application.
  • Authorized JavaScript Origins: Add the origin of your application (e.g., http://localhost:3000 or https://your-app.com).
  • Authorized Redirect URIs: If you are implementing a redirect flow, you must add the full redirect URI where Google will send the user back after authentication (e.g., http://localhost:3000/oauth2callback).

Token Storage Behavior

The provider uses the localStorage key google_oauth_token to store and retrieve the access token.

  • Automatic Check: Before calling your custom getToken function, the provider checks google_oauth_token in localStorage. If a valid (non-expired) token exists, it uses it directly and skips calling getToken.
  • Manual Storage: If you are implementing a redirect flow or an external authentication manager, you can manually store the token in google_oauth_token (as a JSON string containing access_token and an expires_at timestamp). This allows the provider to pick up the token immediately upon initialization, as sketched below.
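
A minimal sketch of such manual storage, assuming you already obtained an access token and its lifetime in seconds from your own OAuth flow (both values below are placeholders):

// Hypothetical values obtained from your own OAuth flow
const accessToken = 'ya29.hypothetical-token'; // the Google OAuth 2.0 access token
const expiresInSeconds = 3600;                 // token lifetime reported by Google

// Store the token under the key the provider reads on initialization
localStorage.setItem('google_oauth_token', JSON.stringify({
    access_token: accessToken,
    expires_at: Date.now() + expiresInSeconds * 1000 // absolute expiry in milliseconds
}));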

Requirements

The auth object must contain two functions:

  • getToken(config): A function that returns a Promise resolving to the initial token. Called only if no valid token is found in storage.
  • refreshToken(config): A function that returns a Promise resolving to a refreshed token.

Parameters

Both functions receive a config object containing the following properties from the provider configuration:

  • clientId
  • projectId
  • otherProjects
  • scopes
  • autoRefreshToken

Return Value

The Promise returned by both getToken and refreshToken must resolve to an object containing:

  • access_token (string): The valid Google OAuth 2.0 access token.
  • expires_in (number): The duration in seconds until the token expires.

Examples

Scenario 1: Fetching Token from a Backend Service

const provider = {
    name: 'bigquery',
    provider: 'google-bigquery',
    projectId: 'your-project-id',
    clientId: 'your-client-id',
    auth: {
        getToken: async (config) => {
            // Fetch token from your backend API
            const response = await fetch('/api/get-google-token');
            const data = await response.json();
            return {
                access_token: data.token,
                expires_in: data.expiresInSeconds
            };
        },
        refreshToken: async (config) => {
            // Fetch refreshed token from your backend API
            const response = await fetch('/api/refresh-google-token');
            const data = await response.json();
            return {
                access_token: data.token,
                expires_in: data.expiresInSeconds
            };
        }
    }
};

Scenario 2: Redirect-based OAuth Flow

In this scenario, getToken initiates the redirect. A separate callback handler (at the redirect_uri) is responsible for parsing the token from the URL and saving it to localStorage using the key 'google_oauth_token'. When the user is redirected back to the app, the provider automatically detects the token in storage.

const provider = {
    name: 'bigquery',
    provider: 'google-bigquery',
    projectId: 'your-project-id',
    clientId: 'your-client-id',
    auth: {
        getToken: (config) => {
            // 1. Define your redirect URI (must match Google Cloud Console)
            const redirectUri = window.location.origin + '/oauth2callback';

            // 2. Construct the OAuth URL
            const params = new URLSearchParams({
                client_id: config.clientId,
                redirect_uri: redirectUri,
                response_type: 'token',
                scope: 'https://www.googleapis.com/auth/bigquery.readonly',
                prompt: 'select_account'
            });

            // 3. Save current location to return to later
            localStorage.setItem('google_oauth_return_url', window.location.href);

            // 4. Redirect to Google
            window.location.href = `https://accounts.google.com/o/oauth2/v2/auth?${params.toString()}`;

            // Return a non-resolving promise to pause execution during redirect
            return new Promise(() => {});
        },
        refreshToken: (config) => {
            // For implicit flow, you can try a silent redirect or simply redirect again
            const redirectUri = window.location.origin + '/oauth2callback';
            const params = new URLSearchParams({
                client_id: config.clientId,
                redirect_uri: redirectUri,
                response_type: 'token',
                scope: 'https://www.googleapis.com/auth/bigquery.readonly',
                prompt: 'none' // Try silent auth
            });
            window.location.href = `https://accounts.google.com/o/oauth2/v2/auth?${params.toString()}`;
            return new Promise(() => {});
        }
    }
};

Note on Callback Handling: For Scenario 2, your /oauth2callback route/component should:

  1. Parse the access_token and expires_in from the URL hash.
  2. Create a token object: { access_token: "...", expires_at: Date.now() + expires_in * 1000 }.
  3. Save it to localStorage: localStorage.setItem('google_oauth_token', JSON.stringify(tokenObject)).
  4. Redirect the user back to the application. You might choose to redirect to the URL stored in google_oauth_return_url (as shown in the example) to return the user to their previous location, or simply redirect to a default page depending on your requirements.
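
A minimal sketch of such a callback handler, assuming a plain JavaScript page served at /oauth2callback (the fallback to '/' is an illustrative choice):

// Parse the implicit-flow token from the URL hash
const hashParams = new URLSearchParams(window.location.hash.substring(1));
const accessToken = hashParams.get('access_token');
const expiresIn = Number(hashParams.get('expires_in'));

if (accessToken) {
    // Store the token under the key the provider expects
    localStorage.setItem('google_oauth_token', JSON.stringify({
        access_token: accessToken,
        expires_at: Date.now() + expiresIn * 1000
    }));
}

// Return the user to where they started, or fall back to the app root
const returnUrl = localStorage.getItem('google_oauth_return_url') || '/';
localStorage.removeItem('google_oauth_return_url');
window.location.replace(returnUrl);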

Using your data provider

Now that you have your providers array initialized (previous section) with your BigQuery provider, use the ChartFactor setProviders() method to set your data provider definition:

cf.setProviders(providers);

Google BigQuery uses a hierarchical naming convention to identify its projects, datasets, and tables. You must use this same naming convention to specify table names in the AQL:

cf.provider('Google BQ')
.source('bigquery-public-data:austin_crime.crime')
.groupby(...)
...

In general, the format is the following: project-id:dataset.table
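
For example, a table named transactions in a hypothetical sales dataset within your own project would be referenced as:

cf.provider('Google BQ')
.source('my-project-id:sales.transactions')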

Nested structures

Google BigQuery allows objects (nested and repeated data) to be stored by using fields of type RECORD. You can learn more about nested structures here.

ChartFactor supports the interaction with these structures by flattening them transparently so they can be used like any other field:

let repoUrl = cf.Attribute('repository.url')
                .limit(10)
                .sort('asc', 'repository.url')
cf.provider('Google BQ')
.source('bigquery-public-data:samples.github_nested')
...

Response timeout

On some occasions, Google BigQuery's engine may take time to process and execute queries, especially if the dataset is large and the queries have not been cached.

By default, when a request is sent by the BigQuery provider to the data engine, it will wait 60,000 milliseconds (a full minute) for the response. After that time, it will throw a Timeout error for the query. This value can be increased in the custom metadata:

// define providers
var providers = [{
    name:'Google BQ',
    provider:'google-bigquery',
    ...
    metadata: {
      timeoutMs: 120000
    }
}]

In the example above, we are setting timeoutMs to wait 2 minutes for the response to come back. The value should always be set in milliseconds.

Partitioning and Clustering Support

Google BigQuery tables can be partitioned and clustered to improve query performance and efficiency. To take advantage of these optimizations, queries must include the partitioned or clustered column in their WHERE clause.

In some cases, the value of this column is derived from a hash or computed using one or more functions. Within ChartFactor, this behavior can be configured using Custom Metadata by adding a routing property to the corresponding data source.

The routing Property

The routing property accepts an array of objects, where each object must define the following fields:

  • field: The name of the field used in the filter.
  • partitionField: The name of the column used for partitioning or clustering in BigQuery.
  • Either function or query: Defines how to transform or resolve the filter value.

For query-based routing, provide a SQL expression in the query property that uses the special variable {filter_value} to represent the value of the filter applied to the field property. An example is presented below:

// define providers
var providers = [{
    name:'Google BQ',
    provider:'google-bigquery',
    ...
    metadata: {
      "your-project:your_dataset.your_table": {
        routing: [
            {
                field: 'company',
                partitionField: 'company_hash',
                query: "MOD(ABS(FARM_FINGERPRINT('{filter_value}')), 10)"
            }
        ]
      }
    }
}]

For function-based routing, the function property must be a function that takes as input the value of the filter applied to the field property and returns the computed value for the partitionField. An example is presented below:

// define providers
var providers = [{
    name:'Google BQ',
    provider:'google-bigquery',
    ...
    metadata: {
      "your-project:your_dataset.your_table": {
        routing: [
            {
                field: 'company',
                partitionField: 'company_hash',
                function: (value) => { // the value of the filter when a column filter is applied
                    // assuming the same string-hashing method was used when ingesting data
                    let hash = 0;
                    for (let i = 0; i < value.length; i++) {
                        const char = value.charCodeAt(i);
                        hash = ((hash << 5) - hash) + char;
                        hash = hash & hash; // Convert to 32bit integer
                    }
                    return hash;
                }
            }
        ]
      }
    }
}]

Note

If you provide both function and query, the query property will take precedence.

Geo-Based Routing

The routing property also supports geo-based partitioning, allowing you to optimize geospatial queries on large datasets. This is commonly used for partitioning tables by geohash values, but you can use custom functions to partition by any computed value based on geographic coordinates.

For geo-based routing, the field property is an object rather than a string (as in non-geo routing). The field object should include the lat and lon keys, which map to the latitude and longitude columns in your table.

Note

Geo routing does not support query-based routing. Only function and builtin routing methods are supported for geospatial data.

Using Built-in Functions

The builtin property currently supports the encodeGeohash out-of-the-box function, described below.

Parameters

In addition to the filter values passed implicitly to the encodeGeohash function, you can specify the following parameters:

  • precision (number, optional – default: 2): Defines the geohash precision level used to encode the spatial area. Higher precision results in smaller, more granular geohash cells.

  • maxGeohashes (number, optional – default: 20): Sets the maximum number of geohash values the function is allowed to return. If the computed list exceeds this limit, the function falls back to returning an empty list to indicate that all partitions must be scanned.

Description

encodeGeohash generates a list of geohash partition keys that cover a user-specified geographic area. The function receives POLYGON and DISTANCE filters as input, computes the spatial envelope defined by these filters, and encodes that area into geohashes at the chosen precision.

These geohashes are then used to limit queries to only the relevant partitions, improving performance by avoiding unnecessary scans. If the number of resulting geohashes exceeds the maxGeohashes threshold, the function returns an empty array. This signals that the area is too large or too fragmented to efficiently partition using geohashes, and the system should instead query all available partitions.

Example definition

// define providers
var providers = [{
    name:'Google BQ',
    provider:'google-bigquery',
    ...
    metadata: {
      "your-project:your_dataset.your_table": {
        routing: [
            {
                field: {
                    lat: "pickup_latitude",
                    lon: "pickup_longitude"
                },
                partitionField: "pickup_geohash2",
                builtin: "encodeGeohash",
                params: {
                    precision: 2,
                    maxGeohashes: 20
                }
            }
        ]
      }
    }
}]

Example result

When you apply a geospatial filter in ChartFactor:

let shapeFilter = cf.Filter(["pickup_longitude", "pickup_latitude"])
    .label("shape")
    .type("POLYGON")
    .operation("IN")
    .value([
        [-87.69057, 41.90846], 
        [-87.69136, 41.89542], 
        [-87.66525, 41.89736], 
        [-87.65756, 41.90552], 
        [-87.69057, 41.90846]
    ]);

With the encodeGeohash built-in function configured, the routing mechanism will automatically add the corresponding partition condition to the SQL query:

AND pickup_geohash2 IN ("dp", "dq", "dr", ...)

Using Custom Function for Geo Routing

For more flexibility, you can provide a custom function that receives the filter value and returns the computed partition value(s). This allows you to use any partitioning scheme based on geographic coordinates, not just geohashes:

// define providers
var providers = [{
    name:'Google BQ',
    provider:'google-bigquery',
    ...
    metadata: {
      "your-project:your_dataset.your_table": {
        routing: [
            {
                field: {
                    lat: "pickup_latitude",
                    lon: "pickup_longitude"
                },
                partitionField: "pickup_geohash2",
                function: (filterValue) => {
                    // Custom logic to handle the filter value
                    const convertedValue = ["dp", "dq", "dr"];
                    // Can return any value type (e.g. string, array of strings) 
                    // based on your partitioning scheme
                    return convertedValue;
                }
            }
        ]
      }
    }
}]

Filter Value Formats

The filterValue argument passed to the custom function varies depending on the type of geo filter applied:

POLYGON Filter (Shape or Bounds)

An array of coordinate arrays, where each inner array contains [longitude, latitude]:

[
    [-87.69057, 41.90846],  // First point
    [-87.69136, 41.89542],  // Second point
    [-87.66525, 41.89736],  // Third point
    [-87.65756, 41.90552],  // Fourth point
    [-87.69057, 41.90846]   // Closing point (same as first)
]

BOUNDING_BOX Filter

An object containing the four corner coordinates of the bounding box:

{
    topLeft: {
        lat: 34.20313854420557,
        lng: -88.11088638874446
    },
    bottomRight: {
        lat: 31.456162096357232,
        lng: -78.35756318590973
    },
    topRight: {
        lat: 34.20313854420557,
        lng: -78.35756318590973
    },
    bottomLeft: {
        lat: 31.456162096357232,
        lng: -88.11088638874446
    }
}

DISTANCE Filter (Radius)

An array containing a single object with center coordinates and distance in kilometers:

[
    {
        center: [
            -87.68548,  // longitude
            41.89929    // latitude
        ],
        distance: 24    // radius in kilometers
    }
]
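
Putting these formats together, below is a sketch of a custom routing function that distinguishes the three filter shapes before computing partition values. The computeGeohashes helper is hypothetical; substitute your own partitioning logic:

// Hypothetical helper: replace with your own logic for encoding
// the covered area into partition values (e.g. geohash prefixes)
const computeGeohashes = (points, radiusKm) => ['dp', 'dq'];

const routing = [{
    field: { lat: 'pickup_latitude', lon: 'pickup_longitude' },
    partitionField: 'pickup_geohash2',
    function: (filterValue) => {
        // BOUNDING_BOX: a plain object with the four corner coordinates
        if (!Array.isArray(filterValue)) {
            const { topLeft, bottomRight } = filterValue;
            return computeGeohashes([
                [topLeft.lng, topLeft.lat],
                [bottomRight.lng, bottomRight.lat]
            ]);
        }
        // DISTANCE: an array with a single { center, distance } object
        if (filterValue[0] && filterValue[0].center) {
            return computeGeohashes([filterValue[0].center], filterValue[0].distance);
        }
        // POLYGON: an array of [longitude, latitude] pairs
        return computeGeohashes(filterValue);
    }
}];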

Supported Aggregations Out-Of-The-Box

SUM

    var metric = cf.Metric("amount","sum");

AVG

    var metric = cf.Metric("amount","avg");

MIN

    var metric = cf.Metric("amount","min");

MAX

    var metric = cf.Metric("amount","max");

COUNT DISTINCT

    var metric = cf.Metric("my_attribute","unique");

APPROXIMATE COUNT DISTINCT

For large datasets, the approximate count distinct aggregation function unique_approx can be used to improve performance when exact counts are not required.

    var metric = cf.Metric("my_attribute","unique_approx");

PERCENTILES

    var metric = cf.Metric('commission', 'percentiles');

GEOHASH

Please refer to Geo Queries for more information.

Dependencies

  • Google API JavaScript client