Skip to content

Querying Data

The ChartFactor Toolkit can be used to query data without rendering any visualization. This data could be either row-level or aggregated data. This is useful when the data needs to be rendered outside ChartFactor visual components, ie: drop-down selectors, custom tables, menus, etc. Querying data is done in pretty much the same way as rendering visualizations. Actually, every time a visual is rendered, the data injected to that visual is available in the same way.  This means we can have our visual and also the data to be used anywhere else.

To query data without rendering a visualization you need to do the following:

  • The element() function needs to be invoked since it is required. The element passed as parameter may exist or not in the DOM, preferably it does not exist to avoid any conflicts. It is only used to identify the query.
  • After the execute() method, which returns a JavaScript promise, the .then() function must be added in order to get the queried data.
1
2
3
4
5
6
7
8
9
cf.provider('Elasticsearch')
    .source('ticket_sales')
    .groupby(...groups)
    .metrics(...metrics)
    .element('dummy')
    .execute()
    .then(function(result){
        console.log(result.data)
    })

The result object obtained in the callback passed to .then() will have the following format:  

1
2
3
4
5
result = {
    'definition': {...},
    'data': [...],
    'visualization': undefined
}

In the result object, data is an array with the queried information in the format used by ChartFactor.  The other two attributes are useful only when rendering a visualization.

Query types

The following are the basic types of queries supported:

Top-n and Bottom-n queries

Top-n and bottom-n queries use attribute definitions that include sort and limit.  The example below is basically saying: group the venuestate attribute and bring the top 10 based on the sum of their commission. Also, within each venuestate, group their venue cities and bring the top 10 based on the sum of their commission.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
   // define metrics
    var metric = cf.Metric('commission', 'sum');
    var metrics = [metric];

    // define attributes to group by
    var group = cf.Attribute('venuestate')
                        .limit(10)
                        .sort('desc', metric);
    var group2 = cf.Attribute('venuecity.keyword')
                        .limit(10)
                        .sort('desc', metric);

    // set up the chart
    var myChart = cf.provider('Elastic')
                    .source('ticket_sales')
                    .groupby(group, group2)
                    .metrics(...metrics)
                    .element('query_1')
                    .set('legend','right')

    // execute it
    myChart.execute();

Simple multigroup queries

Simple multigroup queries use similar syntax as the one used to render Pivot visualizations. Use .rows() to specify the fields you'd like to group by. Use .limit() to limit the results. If a limit is not specified, the query is defaulted to 1000 and a warning is printed in the javascript console. The following is an example of a simple group-by query, aggregating the sum of life expectancy, grouped by region and country, and limited to 10 results:

1
2
3
4
5
6
7
8
9
cf.provider('SparkSQL')
    .source('life_expectancy_region')
    .metrics(cf.Metric('Life_expectancy', 'sum'))
    .rows('Region', 'Country')
    .limit(10)
    .element('query_2')
    .execute().then(result => {
        console.log(result.data.length === 10);
    });

Metric aggregation without grouping

The following is an example of a min and max metric aggregation without grouping.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
/* Configuration code for this widget */
cf.provider("Elasticsearch")
    .source('chicago_taxi_trips')
    .metrics(
        cf.Metric("fare","min"),
        cf.Metric("fare","max")
    )
    .element('query_3')
    .execute().then(result => {
        console.log(result.data[0].current.metrics.fare);
    });

Time min/max aggregation

To obtain time min/max aggregation, use the .timeField(timeField) function to provide your time field and set timeRangeVisual to true. After executing the example below, you can obtain the min and max of the trip_start_timestamp field in the chicago_taxi_trips dataset.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
const timeField = cf.Attribute('trip_start_timestamp').func('DAY');

cf.provider("Elasticsearch")
    .source("chicago_taxi_trips")
    .timeField(timeField)
    .set('timeRangeVisual', true)
    .element('query_4')
    .execute().then(response => {
        // Minimun
        console.log(response.data[0].min)
        // Maximun
        console.log(response.data[0].max)
    })

Detail queries

Detail queries or raw-level queries are used to obtain non-aggregated information. This type of query is used by the Raw Data Table visualization or the Geo Map visualization for example. To perform these queries, use .fields() to specify the columns that you would like to include in the results. Example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
cf.provider('SparkSQL')
    .source('life_expectancy_region')
    .fields('Region', 'Country')
    .limit(10)
    .set('sort', [
        {'Region': 'asc'}
    ])
    .element('query_5')
    .execute().then(result => {
        console.log(result.data.length === 10);
    });

You can use the .fields() method without arguments to bring all the available fields from the data source. Also notice the use of .set('sort') to specify the sort field and order for the query.

In addition, the offset option can be used to indicate the offset of the query. Let's say that after getting the first 10 rows (row 1 to row 10) from the query above, we need the next 10 (row 11 to row 20). In that case, we would use the offset option as shown below.

1
2
3
4
5
cf.provider('SparkSQL')
  ...
  .set('offset', 10)
  ...
  .execute()

GeoQueries

GeoQueries are important when the data source contains geographical data types, allowing to perform geographic-specific queries.

Geohash queries

Geohash is a public domain geocoding system, which encodes a geographic location into a short string of letters and digits.

In ChartFactor, you can perform Geohash aggregations by specifying the "location" field when using data engines that support geo-point data types such as Elasticsearch. For SQL engines such as BigQuery, specify the latitude and longitude columns using an array.

Additionaly, you need to specify the precision which can be a number from 1 to 8. The Wikipedia article Geohash Characters and Precision describes these concepts well.

The following is a code example for a geohash aggregation where the location field is "pickup_location" and the precision is 6. This works when using Elasticsearch which supports the geo-point data type that encapsulates latitude and longitude information:

1
2
3
4
5
6
    cf.provider('Elasticsearch')
        .source('chicago_taxi_trips')
        .location('pickup_location') // geo point field
        .precision(6)
        .element('dummy')
        .execute();

The following is a code example for a geohash aggregation using BigQuery. In this case location takes an array with latitude and longitude fields:

1
2
3
4
5
6
    cf.provider("BigQuery")
        .source('bigquery-public-data:new_york_311.311_service_requests')
        .location(['latitude', 'longitude'])
        .precision(5)
        .element('dummy')
        .execute();
The response similar to the one below:

1
2
3
4
5
6
7
    [
        {
        'geohash': 'dp3wmb',
        'count': 1064
        },
        ...
    ]

You can convert the alphanumeric geohash value to a point or an area (bounds) with the latlon-geohash library or whatever format that supports this standard.

Refer to the Providers Capabilities Summary document for the list of providers that support Geohash queries.

Geo Bounding Box Filtering

Geo Bounding Box Filtering is useful when you want to filter the results by the visible area on your Geo Map viewport. See the example below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
    cf.provider('Elasticsearch')
    .source('chicago_taxi_trips')
    .precision(8)
    .location('pickup_location') // Change to an array if using BigQuery
    .filter(cf.Filter('pickup_district').operation('IN').value(['West Side']))
    .metrics(cf.Metric('fare', 'sum'), cf.Metric())
    .bbox({
        topLeft: {
            lat: 41.8990,
            lng: -87.6448
        },
        bottomRight: {
            lat: 41.8836,
            lng: -87.6050
        }})
    .element('dummy')
    .execute();

Same as Geohash queries, refer to the Providers Capabilities Summary document for the list of providers that support Geo Bounding Box Filtering.

Geo Drawing Filtering

Geo Drawing filtering is useful when you want to filter the data by a specific polygon or distance.

Warning

  • Geo Drawing Filtering is an experimental feature. Its documentation and configuration properties may change within minor and major releases.
  • This feature is currently supported only for the Elasticsearch 7.x data provider versions v7.16 and above.
Filtering by a polygon

The example below filters the data by a polygon described by cf.Filter().

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
    cf.provider("cxairlines")
    .source("cell_towers_usa")
    .groupby(cf.Attribute("operator")
        .limit(25)
        .sort("desc", cf.Metric()))
    .metrics(cf.Metric("count"))
    .filter(cf.Filter("location").label("shape")
        .type("POLYGON").operation("IN")
        .value([[-87.64781, 41.89929], [-87.64638, 41.87404], [-87.60244, 41.87618], [-87.62799, 41.90445], [-87.64781, 41.89929]]))
    .element("dummy")
    .execute();
Filtering by distance

The example below filters the data by the distance described by cf.Filter(). The "distance" value is in kilometers.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
    cf.provider("cxairlines")
    .source("cell_towers_usa")
    .groupby(cf.Attribute("operator")
        .limit(25)
        .sort("desc", cf.Metric()))
    .metrics(cf.Metric("count"))
    .filter(cf.Filter("location").label("distance")
        .type("DISTANCE").operation("IN")
        .value([{ "center": [-87.6227, 41.8886], "distance": 1.3 }])        
    )
    .element("dummy")
    .execute();

Refer to the Providers Capabilities Summary document for the list of providers that support Geohash queries.

After executing your query

The Interaction Manager (IM), when used in your application, it automatically updates and re-executes your queries as users add and remove filters. For example, you may be rendering drop-downs, custom tables, and menues that need to be updated as users interact with your application. If this is the case, there is nothing else to do.

There are cases however when the IM should not re-execute your queries as users add and remove filters. You have two options:

Remove your query

Use the Aktive's remove() function or the remove(elementId) of the ChartFactor object to remove your query(s) after executing them. The example below illustrates the Aktive's remove() function.

1
2
3
4
5
6
7
export const removeAktives = () => {
    // obtain all the queries I want to remove
    const aktivesToRemove = cf.getTopLevelVisualizations()
                            .filter(obj => obj._elementId.startsWith("query_for_"));
    // call the remove() function to remove them
    aktivesToRemove.map(x => x.remove());
}

Configure a rule

You can tell the IM that your query should not receive any filters by using the IM's receive rule. You can configure this and other IM rules as needed.