Skip to content

Multiple Queries

Sometimes we need to query data that is going to be used to construct filters for another query, and this data can be or not in the same source, or even in the same provider.

There are two ways to do this multiple queries.

Using nested queries

In the Querying Data section we show how to perform queries to obtain data without rendering a visual. These queries return promises, so if we nest another query within the .then(), it will be executed as soon as the first query is resolved. So we format our data and then pass whatever we need to the nested one.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
cf.provider('SparkSQL')
    .source('ticket_sales')
    .element('query1')
    .groupby(...groups)
    .metrics(...metrics)
    .execute()
    .then(function(result){
    var filters = getFiltersFromData(result.data)
    cf.provider('Elastic')
        .source('events')
        .filters(...filters)
        .bars()
        .element('chart')
        .execute()
        //.then() and repeat the process
        // and we have to query again
})

This works well and is pretty simple to do. There are a few cons of this approach however.

The first one is, if for example we have more than 2 queries, the code may start looking a little bit more complicated and we can fall into something known as Callback Hell.

Another drawback is that the previous example will create 2 query objects, and the more queries we nest, the more queries will be created. We could re-use the query, but we may want to go back at some point to the original query and we’ll have to rebuild it again and we may loose some track of the state of it.

Using ChartFactor multi query builder

This way uses ChartFactor chaining method to build the queries without any nesting at all. It also provides other advantages.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
var callbackQuery1 = function(result) {
    var filters = getFiltersFromData(result.data)
    result.filters = filters
    return result
}

// Start of 2nd query
cf.create('Query1')
    .provider('SparkSQL')
    .source('ticket_sales')
    .groupby(...groups)
    .metrics(...metrics)
    .processWith(callbackQuery1)

    // Start of 2nd query
    .create('Query2')
    .provider('Elastic')
    .source('events')
    .groupby(...groups)
    .metrics(...metrics)

    // Finally we can visualize
    .bars()
    .element('chart')
    .execute()

    // Or we could just execute it to obtain
    // the data:
    .element('dummy') 
    .execute()
    .then(function(data){
    console.log(data)
    })  
})

This method is cleaner than the first one. It only creates one query object, and it can be re-used completely or partially (use only Query1 or Query2).

Let’s break down how it works:

The first thing here is the .create() method that we saw how it works here. In this case we pass a name to identify the query that follows. We use ‘Query1’.

Then we tell the query what provider, source and query configurations to use (groups, metrics, rows or any other).

Finally for the first query we use .processWith() which takes a function as parameter. This function is called data handler and it acts as a callback. Here the data obtained from Query1 is exposed and it allows to do any operation with it and then pass it down to the next query. In this case we see how it build filters, then returns the same result with those filters appended. We’ll see these handlers in details later in this article.

Then, the whole process is repeated for Query2.

There are a couple of things to point here. The first one is that .element() is defined only once, and the reason is as we said, it’s only one query object. The second is that Query2 doesn’t have a data handler. It actually has one by default. This only happens with the last query of the chain. For any other previous query a handler must be defined. The last one is that the data obtained in line 31 (if we decide to do this instead of visualize) will be the result of the 2nd query only. We’ll see how to obtain the data from all queries instead.

Data handlers

This multi-query method uses a cascade or pipe effect, where data from the first query is passed down to the second, and from this to a third and so on.

This pipe effect is done through the data handlers. The most basic structure of a handler is this one:

1
var myHandler = function(result){ return result }

The parameter result can have the following format:

1
2
3
4
5
var result = {
    data: { "Query1":[...], "Query2": [...] },
    filters: [...],
    keep: true
}

data is an object where the keys are the names passed in the .create() method, and they will contain the data obtained from that specific query. This is filled automatically after each query is executed, but is mutable. This means that if we modify the data it contains, that will be available for the next query. Each array will contain the data in the format explained here.

filters is where we can pass filters we construct from previous results, as we do in the example. Filters is an stackable array. So the last query will apply all filters that it finds in it. We’ll have to manually delete the filters we don’t want to apply in the next query from it.

keep is the flag that indicates if the last query should keep or not all the data from all the previous queries. IMPORTANT: This option should be used ONLY if we are obtaining the data instead of visualizing it, so it’s only useful in the handler of the last query. For our example it could be:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
var callbackQuery2 = function(result){
    result.keep = true;
    return result;
}

// And then
// Start of 2nd query
.create('Query2')
...
.processWith(callbackQuery2)
.element('dummy')
.execute()
.then(function(result){
   // result.data will have the data from Query1 and
  // Query2
   console.log(result.data)
})

Re-using the query

Multi-queries objects can be re-queried totally or partially. This means that we can obtain it as we explain here , then modify whatever we need and query again the whole thing or only part of it.

Let’s say that we want to query again only the last query (Query2) with a different set of filters, but use the same provider, source, groups, metrics and options if any.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
var multiQuery = cf.getVisualization('dummy')

multiQuery
        .useOnly('Query2')
        .filters(newFilters)
        .execute()
        .then(function (result) {
            console.log(result.data);
        });

})

With the use of .useOnly() we tell the object to silence any other query configuration it has, apply the changes to the specified and execute it. After the execution all queries will be un-silenced again. So we’ll have to use .useOnly(‘Query2’) again if we want the same, otherwise the whole object will be executed.