Introduction to Data Providers

A data provider is the interface between ChartFactor and the source of your data (e.g., Spark, Elasticsearch, or any other analytics engine). It translates ChartFactor AQL options into the engine's query language and runs the query. It then translates the engine's response back into the ChartFactor data structures needed to populate visualizations. Each data provider is written for the specific data engine it supports, so if you plan to query several analytics engines, you need one provider for each.

ChartFactor implements an extensible, modular architecture for data providers. Out of the box, ChartFactor includes several data providers as JavaScript modules that must be included with a script tag in your HTML page.

The first step when creating your data application is to include the provider you’re going to use in the application's HTML file.

 <script src="./Data-provider.min.js"></script>

This script tag can be included before or after the “CFToolkit” library, because the two are not strictly dependent on each other. The provider’s library creates a standalone global object that ChartFactor uses only once you define the provider.

The next step is to define your data providers in your JavaScript code. A data provider is described by a simple JSON object that includes the following properties:

name: The name you want to use for your data provider within your data application

provider: The data provider type, which also happens to be the name of the global object. Examples are sparksql, google-bigquery, and elasticsearch.

Additional properties exist, depending on the specific data provider.

Finally, use the global cf object to inform ChartFactor of your providers. Example:

    var providers = [{
        name:'ElasticSearch',
        provider:'elasticsearch', 
        url:'https://chartfactor.com:9200'
    }]

    cf.setProviders(providers);
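
Because every provider definition needs at least name and provider, it can help to check the array before passing it to cf.setProviders. The following is a minimal sketch, not part of the ChartFactor API: validateProviders is a hypothetical helper, and the rule that names must be unique within an application is an assumption made here for illustration.

```javascript
// Hypothetical helper (not part of ChartFactor): checks the two documented
// properties on each definition before the array reaches cf.setProviders().
function validateProviders(providers) {
    const seen = new Set();
    for (const p of providers) {
        if (typeof p.name !== 'string' || p.name === '') {
            throw new Error('Each provider needs a non-empty "name"');
        }
        if (typeof p.provider !== 'string' || p.provider === '') {
            throw new Error('Provider "' + p.name + '" needs a "provider" type');
        }
        // Assumption: names are expected to be unique within one application.
        if (seen.has(p.name)) {
            throw new Error('Duplicate provider name: ' + p.name);
        }
        seen.add(p.name);
    }
    return providers;
}

const providers = validateProviders([
    { name: 'ElasticSearch', provider: 'elasticsearch', url: 'https://chartfactor.com:9200' }
]);

// In a page where the ChartFactor toolkit is loaded:
// cf.setProviders(providers);
```

The helper returns the same array it receives, so it can wrap the literal directly without changing the rest of the setup code.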

`request` function

All data providers expose the same asynchronous provider.request(action, payload?) API. It returns a promise that resolves with provider-specific metadata once the request completes.

action: A string that identifies the metadata operation.
payload: An optional string or object used by the action (for example, a fully qualified table name).

A simple pattern you can reuse looks like this:

const provider = cf.getProviderByConfig({ name: 'My Google BigQuery' });

provider.request('sources')
    .then((response) => console.log('Metadata:', response))
    .catch((error) => console.error('Metadata request failed', error));

The promise rejects if the provider cannot reach the engine or if the action is not supported, so handle the error path when you surface the metadata in your application.
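
One way to handle the error path consistently is to wrap request so callers always receive a result object instead of a rejection. This is only a sketch: requestMetadata and the { ok, data, error } shape are hypothetical names chosen for this example, and the only ChartFactor call it relies on is the request(action, payload) method described above.

```javascript
// Hypothetical helper, not part of the ChartFactor API.
// `provider` is any object exposing request(action, payload) as described above.
async function requestMetadata(provider, action, payload) {
    try {
        const data = await provider.request(action, payload);
        return { ok: true, data: data };
    } catch (error) {
        // Reached when the engine is unreachable or the action is unsupported.
        return { ok: false, error: error };
    }
}

// Usage in a page where ChartFactor is loaded:
// const provider = cf.getProviderByConfig({ name: 'My Google BigQuery' });
// requestMetadata(provider, 'sources').then((result) => {
//     if (result.ok) { console.log('Metadata:', result.data); }
//     else { console.warn('Metadata unavailable:', result.error); }
// });
```

Returning a result object keeps the UI code free of try/catch blocks, at the cost of requiring every caller to check ok before reading data.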

The supported actions are described below:

source

source registers a single resource (project, dataset, table, index, and so on) in the provider metadata cache and returns its details. The payload is the resource identifier accepted by the underlying engine.

const provider = cf.getProviderByConfig({ name: 'My Google BigQuery' });

// Table metadata (includes `objectFields` describing its columns)
await provider.request('source', 'bigquery-public-data:austin_311.311_service_requests');

// Dataset metadata
await provider.request('source', 'bigquery-public-data:austin_311');

// Project metadata
await provider.request('source', 'bigquery-public-data');

For providers with strict schemas (Redshift, Databricks, BigQuery, etc.), calling source is the fastest way to prime the metadata cache before building visuals.

sources

sources returns the top-level collection for the provider: projects in BigQuery, catalogs in Databricks, schemas in Redshift, tables in SparkSQL, indexes in Elasticsearch, and so on.

const provider = cf.getProviderByConfig({ name: 'My Databricks' });

const catalogs = await provider.request('sources');
console.log('Available catalogs:', catalogs);

The structure of the returned list varies by provider, but each entry contains at least an identifier you can feed into subsequent source or source-by-id requests.
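
As a rough sketch of chaining the two actions, the helper below lists the top-level entries and then registers each one with source. It assumes that the sources response is an array, and because the entry structure varies by provider, the caller supplies a getId function that extracts the identifier. Both registerTopLevel and getId are hypothetical names, not ChartFactor APIs.

```javascript
// Hypothetical helper, not part of the ChartFactor API.
// Assumption: request('sources') resolves with an array of entries.
// getId(entry) must return the identifier for that provider's entry shape.
async function registerTopLevel(provider, getId) {
    const entries = await provider.request('sources');
    // Register every entry; allSettled keeps going if one of them fails.
    const results = await Promise.allSettled(
        entries.map((entry) => provider.request('source', getId(entry)))
    );
    return entries.map((entry, i) => ({
        id: getId(entry),
        registered: results[i].status === 'fulfilled'
    }));
}

// Usage in a page where ChartFactor is loaded (the `name` field is a guess;
// inspect the actual response of your provider first):
// const provider = cf.getProviderByConfig({ name: 'My Databricks' });
// registerTopLevel(provider, (entry) => entry.name).then(console.log);
```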

source-by-id

source-by-id drills one level deeper by returning the children of a previously registered resource. The payload must be the identifier that the provider has already cached (usually after a source call).

const provider = cf.getProviderByConfig({ name: 'My Redshift' });

// Register the schema first to ensure it exists in the provider cache
await provider.request('source', 'public');

const tables = await provider.request('source-by-id', 'public');
console.log('Tables in public schema:', tables);

Use this action when you need the list of tables inside a schema or datasets inside a project. If the resource is missing from the cache, the promise will reject.
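
If you cannot be sure the resource is already cached, one option is to try source-by-id first and fall back to registering the resource before a single retry. The sketch below assumes that a rejection may indicate a cache miss; the provider does not necessarily distinguish that case from other failures, so any second failure is simply propagated. childrenOf is a hypothetical helper name.

```javascript
// Hypothetical helper, not part of the ChartFactor API.
async function childrenOf(provider, id) {
    try {
        return await provider.request('source-by-id', id);
    } catch (firstError) {
        // Possibly a cache miss: register the resource, then retry once.
        // If either call fails, the rejection propagates to the caller.
        await provider.request('source', id);
        return provider.request('source-by-id', id);
    }
}

// Usage in a page where ChartFactor is loaded:
// const provider = cf.getProviderByConfig({ name: 'My Redshift' });
// childrenOf(provider, 'public').then((tables) => console.log(tables));
```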

Note

Out-of-the-box data providers are described individually in the following sections.