Databricks¶
The Databricks data provider allows ChartFactor to interact with Databricks.
Setup¶
To use the Databricks data provider you need include its library:
1 | |
The Provider JSON object requires a name, the provider type (e.g. 'databricks'), host, warehouse id, and token parameters. Example:
1 2 3 4 5 6 7 8 9 | |
1 2 3 4 | |
Supported provider properties:
host: The Server Hostname valuewarehouseId: The Databricks warehouse id valuetoken: The Databricks personal access token valueheaders: The HTTP headers to be sent to the Databricks server, if the Authorization header is provided it will be overwritten by the token valuemetadata: Please refer to the Custom Metadata section for a description on how to configure custom metadata
Finally, use the setProviders() method of ChartFactor to set your data provider definitions. Example:
1 | |
Partitioning and Clustering Support¶
Databricks tables can be partitioned and clustered (Liquid Clustering) to improve query performance and efficiency. To take advantage of these optimizations, queries must include the partitioned or clustered column in their WHERE clause.
In some cases, the value of this column is derived from a hash or computed using one or more functions. Within ChartFactor, this behavior can be configured using Custom Metadata by adding a routing property to the corresponding data source.
The routing Property¶
The routing property accepts an array of objects, where each object must define the following fields:
field: The name of the field used in the filter.partitionField: The name of the column used for partitioning or clustering in Databricks.- Either
functionorquery: Defines how to transform or resolve the filter value.
For query based routing, you should provide a SQL expression in the query property that uses the special variable {filter_value} to represent the value of the filter applied on the field property. An example is presented below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | |
For function based routing, the function property must receive a function that takes as input the value of the filter applied on the field property and returns the computed value for the partitionField. An example is presented below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | |
Note
If you provide both function and query, the query property will take precedence.
Supported Aggregations Out-Of-The-Box¶
SUM¶
1 | |
AVG¶
1 | |
MIN¶
1 | |
MAX¶
1 | |
PERCENTILES¶
1 | |
COUNT DISTINCT¶
1 | |
APPROXIMATE COUNT DISTINCT¶
For large datasets, the approximate count distinct aggregation function unique_approx can be used to improve performance when exact counts are not required.
1 | |
Dependencies¶
- None