Problem
We're trying to figure out how to combine data from a pre-aggregated cube with data from a cube that has no pre-aggregations. One cube holds a large volume of data that we'd want accelerated by pre-aggregations, while the other is relatively small and updates frequently, so we'd want it to always return the most up-to-date data.
Related Cube.js schema
cube('events', {
  sql_table: 'public.events',
  data_source: 'default',

  dimensions: {
    event_id: {
      sql: 'event_id',
      primary_key: true,
      public: false,
      type: 'string',
    },
    timestamp: {
      sql: 'timestamp',
      type: 'time',
    },
    event: {
      sql: 'event',
      type: 'string',
    },
    user_id: {
      sql: 'user_id',
      type: 'string',
    },
    // ...
  },

  measures: {
    clicks: {
      type: 'count',
      filters: [{ sql: `${CUBE}.event = 'click'` }],
    },
    // ...
  },

  joins: {
    users: {
      sql: `${users}.user_id = ${events}.user_id`,
      relationship: 'one_to_many',
    },
  },

  pre_aggregations: {
    daily_events: {
      dimensions: [CUBE.user_id, CUBE.event],
      measures: [CUBE.clicks],
      time_dimension: CUBE.timestamp,
      granularity: 'day',
      partition_granularity: 'month',
    },
  },
});
cube('users', {
  sql_table: 'public.users',
  data_source: 'default',

  dimensions: {
    user_id: {
      sql: 'user_id',
      primary_key: true,
    },
    name: {
      sql: 'name',
      type: 'string',
    },
    // ...
  },
});
In this example, we'd want to query the number of clicks for each user and have the events side accelerated by the pre-aggregation.
However, in its current form, the pre-aggregation is never hit for the event data.
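For reference, a query along these lines should correspond to the generated SQL shown further down. This is a sketch reconstructed from that SQL, not a copy of our actual request: the `query` variable name and the date range are placeholders, since the real bounds are passed as bound parameters.

```javascript
// Sketch of the Cube query (REST API query format) behind the generated SQL.
// The date range values are placeholders, not the ones we actually used.
const query = {
  measures: ['events.clicks'],
  dimensions: ['users.name'],
  timeDimensions: [
    {
      dimension: 'events.timestamp',
      // No granularity: the time dimension is only used as a filter here.
      dateRange: ['2024-01-01', '2024-01-31'],
    },
  ],
};

console.log(JSON.stringify(query, null, 2));
```

Note that `users.name` comes from the `users` cube, while the `daily_events` pre-aggregation only contains members of `events`.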
Related Cube.js generated SQL
SELECT
  "users".name "users__name",
  count(
    distinct CASE
      WHEN ("events".event = 'click') THEN "events".event_id
    END
  ) "events__clicks"
FROM
  public.events AS "events"
  LEFT JOIN public.users AS "users" ON "users".user_id = "events".user_id
WHERE
  (
    "events".timestamp >= $1::timestamptz
    AND "events".timestamp <= $2::timestamptz
  )
GROUP BY
  1
ORDER BY
  2 DESC
LIMIT
  10000
Any help would be hugely appreciated.