Skip to content

Derived Fields

Chartfactor allows to create custom or derived fields to be used as any other field that exist in the data source. The purpouse of these fields is to create new values that can be calculated from existing values in the datasource

Derived fields are divided in aggregate level and row level. Both types are defined at provider's definition time, specifically in the metadata, so they are available to be used as a typical metric or attribute when we define the visualization's aql.

These fields depend on one or more fields that exist in the data source to derive results from them.

Aggregate level

Aggregate level fields are calculated on the client side, which means that the data will be processed to fit the calculation required by the user. Knowing this, it is important to mention that this type of fields should be used in low cardinality queries, otherwise it may compromise the performance.

Let's see some examples.

Derived Attribute

Let's say that we have a field in the datasource that is called channelGrouping and it represents the source of traffic that a web site receives. We want to group by that field but we would like to use a different nomenclature for each value to be visualized. In this case we will define the fields as follow:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
let channelAlias = {
    'name': 'channel_alias',
    'label': 'Channel Group Alias',
    'type': 'ATTRIBUTE',
    'dependencies': ['channelGrouping'],
    'function':  (channel) => {
        let alias = {
            'Affiliates' : 'AF'
            'Direct' : 'DIR',
            'Display' : 'DIS',
            'Organic Search' : 'OS',
            'Paid Search' : 'PS'
            'Referral' : 'REF'
            'Social' : 'SOC'
        }
        return alias[channel];
    }
}

The previous definition represents the new field. Here we specify different properties. Except for the label, the rest is required. The type must be according to the type of the value that the function returns (ATTRIBUTE, INTEGER...)

The dependency property is an array. It supports one dependency of type attribute which has to be an existing field in the datasource, it can not be a derived field. It also supports dependencies of type metric.

The function is what modifies the value. It takes as a parameter the value of the dependency (or dependencies if they are metrics) and returns one single value. So for example if we use this field with a limit of 10, the function will be executed 10 times for each value obtained from the query.

After defining the derived field, we have to include it in the metadata. More about custom metadata here.

1
2
3
4
5
6
7
providers[0].metadata = {
    'source_name' :{
        'fields': {
            'channel_alias': channelAlias,
        }, 
    },
}

Warning

Make sure the name property of your derived field matches exaclty with the name of the field in the source metadata. Otherwise the derived field may not work correctly. For example, in the Channel Alias example above, the name property of the derived field is channel_alias. Therefore, the key for the field name used in the source metadata definition should also be channel_alias.

Finally we can use the custom field as any other:

1
2
3
let group = cf.Attribute('channel_alias')
                .limit(10)
                .sort('desc', 'channel_alias')

The above AQL will group the new value alphabetically. Let's say that we now want to apply a custom order to the results.

To do so, we can apply "weights" to each value with the property order:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
let channelAlias = {
    'name': 'channel_alias',
    'label': 'Channel Group Alias',
    'type': 'ATTRIBUTE',
    'dependencies': ['channelGrouping'],
    'function':  (channel) => {
        let alias = {
            'Direct' : { value: 'DIR', order: 1 },
            'Display' : { value: 'DIS', order: 2 },
            'Organic Search' : { value: 'OS', order: 3 },
            'Paid Search' : { value: 'PS', order: 4 },
            'Affiliates' : { value: 'AF', order: 5 },
            'Referral' : { value: 'REF', order: 6 },
            'Social' : { value: 'SOC', order: 7 }
        }
        return alias[channel];
    }
}

And that's it. Chartfactor will recongnize the custom sort and it will use it as long as we use the custom field as the sort parameter in the .sort() function.

It is important to notice that this is the only case where the function returns an object.

Filtering derived attributes

By default, derived attributes can't be used as filters. They require an extra property called reverse that will take care of translating a derived value back to the original one. So for example for channel_alias we could have something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
let channelAlias = {
    'name': 'channel_alias',
    'label': 'Channel Group Alias',
    'type': 'ATTRIBUTE',
    'dependencies': ['channelGrouping'],
    'reverse': (derivedChannel) => {
        let original = {
            'AF': 'Affiliates',
            'DIR': 'Direct',
            'DIS': 'Display',
            'OS': 'Organic Search',
            'PS': 'Paid Search',
            'REF': 'Referral',
            'SOC': 'Social'
        }
        return original[derivedChannel];
    },
    'function':  (channel) => {
        let alias = {
            'Affiliates' : 'AF'
            'Direct' : 'DIR',
            'Display' : 'DIS',
            'Organic Search' : 'OS',
            'Paid Search' : 'PS'
            'Referral' : 'REF'
            'Social' : 'SOC'
        }
        return alias[channel];
    }
}

The reverse function expects the derived value as parameter. In other words, we must reverse any transformation done to the original value.

If the reverse function is not defined, the ChartFactor will warn about it and no filter will be applied.

Restriction in the sort criteria

Derived fields can not be sorted by derived metrics, but only by themselves or by any other primitive metric field.

Derived Metric

For this example we have the statistics of several web pages and we have for each the amount of visits and bounces they receive. We want then to know what is the bounce rate which can be obtained dividing these 2 values.

1
2
3
4
5
6
7
8
9
const dep1 = cf.Metric('totals.visits', 'sum');
const dep2 = cf.Metric('totals.bounces', 'sum');
const bounceRate= {
    'name': 'bounce_rate',
    'label': 'Bounce Rate',
    'type': 'NUMBER',
    'dependencies': [dep1, dep2],
    'function':  (visits, bounces) =>   bounces / visits * 100
}

Notice that we have used two dependencies (totals.visits and totals.bounces). This means that we can use one or many dependencies of type metric (integers, floats, doubles...)

Dependencies must be metric objects so they can include the operation. If the dependency or dependencies are passed as string values, the operation will be defaulted to sum and ChartFactor will warn about it in the console.

After defining the metric, we include it in the metadata:

1
2
3
4
5
6
7
8
providers[0].metadata = {
    'source_name' :{
        'fields': {
            'channel_alias': channelAlias,
            'bounce_rate': bounceRate,
        }, 
    },
}
And then we use it as any other metric, except for the difference that no function (sum, avg, max... ) is specified:

1
2
3
4
5
6
let metrics = [
    cf.Metric('bounce_rate'),
    cf.Metric('count'),
    cf.Metric('totals.hits', 'sum'),
    ...
]

As shown in the example above, derived metrics won't take functions as a second parameter and they will be ignored if they are specified. The reason is that every derived metric will use derived as a function.

If another function is required, a new derived metric must be created with the dependency using that function.

Note

Histogram and box plot visualizations do not support derived metrics.

Row level

Row level derived fields are calculated in the data engine. This means that they are translated into the query language used by the engine. We can apply filters using this fields since they will also be translated before sending them to the server.

For now there are two functions allowed for these type of fields: toDate and extract

Not all providers support these fields. Check the capabilities table here

toDate

The toDate function is useful when dates and times are stored in the data source as a different type. This could be a string, an integer (unix time or epoch), etc.

In these cases we cannot use Trends for example over these fields. We need to transform them into a valid time first:

1
2
3
4
5
6
7
8
9
let eventDate = {
    'name': 'event_date',
    'label': 'Event Date',
    'type': 'TIME',
    'function': 'toDate',
    'timestampGranularity': 'DAY',
    'dependencies': ['date'],
    'params': ['yyyyMMdd']    
}
In the example above, the field date in the dependency is stored as a string. A value for this field could be "20170630". Note that the property params tells ChartFactor the format used at storage level for that field. The format syntax depends on the data provider being used. For example, for BigQuery, the format parameter for the example above would be '%Y%m%d'.

Then we add the field to the metadata just as the rest:

1
2
3
4
5
6
7
8
providers[0].metadata = {
    'source_name' :{
        'fields': {
            ...
            'event_date': eventDate,
        }, 
    },
}

And then we use it:

1
2
3
4
let group = cf.Attribute('event_date')
                .func('MONTH')
                .limit(10)
                .sort('asc', 'event_date')

If date is an integer, for example a unix time, then no parameter is needed except for Google BigQuery. Usually they are stored in seconds, but Google BigQuery also allows it in milliseconds and even microseconds. So we will have something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
let eventDate = {
    'name': 'event_date',
    'label': 'Event Date',
    'type': 'TIME',
    'function': 'toDate',
    'timestampGranularity': 'DAY',
    'dependencies': ['date'],
    // Only if is Google Bigquery and when not stored in seconds:
    'params': ['MILLISECONDS']    
}

The rest will be the same.

extract

The extract is a shorcut to the sql function with the same name: It extracts a time part from a time field. For example day, month, year, hour ... etc. This depends on the provider being used. Refer to the documentation of the EXTRACT sql function for Spark, Dremio and Google BigQuery to know the date parts formats that can be used for each.

The function can be applied over any time field whether it is in date, datetime, timestamp, string or integer format. So for the previous example, we can extract the month for the field date even if it is in string format:

1
2
3
4
5
6
7
8
let dateMonth = {
    'name': 'date_month',
    'label': 'Date Month',
    'type': 'ATTRIBUTE',
    'function': 'extract',
    'dependencies': ['date'],
    'params': ['MONTH']
}

Note that the type of the field is 'ATTRIBUTE' since it returns a number representing in this case the month.