Skip to content

Custom Queries

ChartFactor performs most of its aggregations and analytics over tables and views that exist in their respective data provider.

Sometimes however, we may want to analize data that is not available on a single table but that needs to be joined over different tables and views.

One way to accomplish this is of course to create a table or view of the resulting join directly in the data engine, outside ChartFactor.

Multiple Queries is sometimes a good fit to do this, specially when we only need one visualization to render the final data and the joining of the data is straightforward.

Yet another solution available in ChartFactor is custom queries. They allow users to define SQL queries and name them. ChartFactor then treats these queries as any other table or view where users can perform aggregations to visualize their data.

Providers support

Please refer to the providers capabilities table for custom query support across providers.

Defining the custom query

Note

If you are using ChartFactor Studio, refer to its Custom Query documentation where we show the UI to create and maintain custom queries. This section is only applicable when using ChartFactor Toolkit in standalone applications.

Declare your query, which is basically a string representing a SQL sentence:

1
2
3
4
5
const myQuery = `
  SELECT customer.name, customer.occupation, customer.debt, customer_adress.city
  FROM customer
  LEFT JOIN customer_adress ON customer.address_id = customer_adress.id
`

Once you have your query defined, you need to pass it to the provider. This can be done in two ways. The first one and most recomended is in the provider's custom metadata:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
var providers = [ {
        name:'SparkSQL',
        provider:'sparksql',
        url:'https://mysparkserver',
        metadata: {
            'my_custom_query': {
                'query': myQuery
            }
        }
    }
]

cf.setProviders(providers);

myChart = cf.provider('Elasticsearch')....execute();

In the above example we're using my_custom_query as the source name and we're passing the myQuery variable to the query property.

The second way to pass custom queries to the provider is after the provider has been created, specifically after the cf.setProviders() method has been called. At this point you can obtain the provider and inject the custom query directly:

1
2
3
const provider = cf.getProviderByConfig({ name: 'SparkSQL' })

provider.addSourceMetaData('my_custom_query', { query: myQuery });

Using your custom query

After the query has been registered, it behaves as any other source. You can create visualizations using the name you gave to your query as the source:

1
2
3
4
5
6
7
cf.provider('SparkSQL')
  .source('my_custom_query')
  .groupby(cf.Attribute('city').limit(10).sort('asc', 'city'))
  .metric(cf.Metric('debt', 'sum'))
  .graph('Bars')
  .element('id')
  .execute()

Some considerations

The ability to create custom queries is very powerful and useful and allows a higher level of flexibility. However, there are some things that we need to take in cosiderations before using them.

Performance

This is a very important point. When the custom query takes a long time to execute, returns a large amount of data, and on top of that, it is used on multiple visualizations in a dashboard, then the recommended way is to create a materialized view.

The reason is that most data engines don't do partial caching and therefore different aggregations on the same custom query will be seen as totally different queries by the data engine.

Query name

The name of the custom query can be any name as long as they only include alphanumeric characters and the underscore "_".

Query SQL

Comments

SQL comments start with double dashes "--". You can use them in your query as long as they are in a new line:

1
2
3
4
5
-- We only need these fields
SELECT f1, f2, f3
FROM source
-- only this month
WHERE month = "April"

Avoid inline comments:

1
2
3
SELECT f1, f2, f3 -- We only need these fields
FROM source
WHERE month = "April" -- only this month

Field names

Make sure that columns obtained from SQL functions are translated to simple names by using aliases with the sql AS statement. For example:

1
2
3
SELECT name, SUM(debts) AS debts_sum, AVG(score) AS score_avg, COUNT(*) AS count
FROM users
LIMIT 100

The result of this query should look like this:

name debts_sum score_avg count
User1 200.00 3.5 31