Druid complex Lookup using other dimensions, with native query or Golang

Eli Segev
3 min readApr 16, 2022

Lookups in Apache Druid are a way to replace values by keys in datasources, without needing to actually change anything in the datasources. This is extremely useful for repeatable data. Common examples are replacing a country-code with the country name, language-code with the language name, and so on.

For me, a real world example was the need to replace meta-data that a customer might change on distant occasions, without having to update all the respective records in the datasource (because there might be a great amount of them, and it is not efficient to do it every customer update).

For the ability to update the meta-data on customer demand, it was convenient to create a lookup name for each customer. Then, in the lookup, it was convenient that the keys would be a combination of an ID (which is a dimension of its own) and the meta-data-key.
Example:

Datasource: items
-----------------
guid | customer_id | name | view_count
-------------------------------------
guid1 | custA | page1 | 12
guid2 | custA | page2 | 8
guid3 | custB | page1 | 15
Lookup 1: custA_metadata
------------------------
key | value
------------------------
guid1_background | dark
guid1_language | en-US
guid2_background | light
guid2_language | he-IL
Lookup 2: custB_metadata
------------------------
key | value
------------------------
guid3_background | dark
...

This guide will demonstrate how to create and fetch this structure in native Druid queries and also with Go (Golang).

Creating the lookups

To create the lookups, send the following POST request:

{
"__default": {
"custA_metadata": {
"version": "1",
"lookupExtractorFactory": {
"type": "map",
"map": {
"guid1_background": "dark",
"guid1_language": "en-US",
"guid2_background": "light",
"guid2_language": "he-IL"
}
}
}
}
}

Notice the first json node is__default , which is the default tier (a group of lookups across Druid servers).
If you decide to name the tier differently, you will need to update your Druid druid.lookup.lookupTier configuration to that alternate name (read more).

Now you have a lookup in place. You can repeat this for the other lookup names.

Verify that the lookup was successfully created

It takes a little time for lookups to propagate across Druid servers. One might need to retry the verification before it returns a successful response.

Use the following to verify that the lookup update was successful:

GET http://localhost:8888/druid/coordinator/v1/lookups/config

Use the following to verify that the lookup was propagated successfully:

GET http://localhost:8082/druid/listen/v1/lookups

Notice the ports are different, intentionally.

GroupBy aggregation native query

Because the keys in our lookup rely on another dimension (the guid), we can’t use a simple lookup-dimension. Before that, we need to “extract” the guid, and then apply “variations” to it. For this we can utilize the Cascade extraction and apply the lookup extraction-function to it.

{
"queryType": "groupBy",
"dataSource": {
"type": "table",
"name": "items"
},
"intervals": [
"2022-04-16T00:00:00Z/2022-04-16T23:59:59.999999999Z"
],
"dimensions": [
{
"type": "extraction",
"dimension": "name",
"extractionFn": {
"type": "stringFormat",
"format": "%s"
}
},
{
"type": "extraction",
"dimension": "customer_id",
"extractionFn": {
"type": "stringFormat",
"format": "%s"
}
},
{
"type": "extraction",
"dimension": "guid",
"outputName": "background",
"extractionFn": {
"type": "cascade",
"extractionFns": [
{
"type": "stringFormat",
"format": "%s_background"
},
{
"type": "registeredLookup",
"lookup": "custA_metadata",
"retainMissingValue": true
}
]
}
}
],
"filter": {
"type": "and",
"fields": [
{
"type": "in",
"dimension": "guid",
"values": [
"en-US"
],
"extractionFn": {
"type": "cascade",
"extractionFns": [
{
"type": "stringFormat",
"format": "%s_language"
},
{
"type": "registeredLookup",
"lookup": "custA_metadata",
"retainMissingValue": true
}
]
}
}
]
},
"granularity": "all",
"aggregations": [],
"postAggregations": []
}

How does it work, both in the dimensions and the filters? We declare our object with type “dimension” or “in” and request to get the guid dimension. Then we apply the cascade extraction function, and then apply two functions: the first will transform the guid to the key in the lookup (e.g. <guid>_background ) and the second will use the transformation result and get it from the lookup. I use retainMissingValue: true but of course you can drop it if it suits you.

Notice that the dimension has an outputName , which will be the name of the looked-up-dimension in the response.
And the filter has a values:[...] , which contains the values to filter by in the lookup.

Thank you Avi Rogalsky for this.

GroupBy fetch with Go

As explained above, we need to utilize the Cascade extraction.

With the grafadruid/go-druid Go library, we can use these:

dimensions := []builder.Dimension{
dimension.NewExtraction().
SetDimension("guid").
SetOutputName("background").
SetExtractionFn(extractionfn.NewCascade().
SetExtractionFns([]builder.ExtractionFn{
extractionfn.NewStringFormat().SetFormat("%s_background"),
extractionfn.NewRegisteredLookup().SetLookup("custA_metadata").
SetRetainMissingValue(true),
}))}
filters := filter.NewAnd().SetFields([]builder.Filter{
filter.NewIn().SetDimension("guid").
SetValues("en-US").
SetExtractionFn(extractionfn.NewCascade().
SetExtractionFns([]builder.ExtractionFn{
extractionfn.NewStringFormat().SetFormat("%s_language"),
extractionfn.NewRegisteredLookup().SetLookup("custA_metadata"),
}))})
q := query.NewGroupBy().
SetDimensions(dimensions).
SetFilter(filters).
...

That’s it.
You should be set now with using these complex lookups.

--

--