How to combine data from cube pre-agg with data from a cube without a pre-agg? #9635

Open
@benskz

Problem

We're trying to work out how to combine data from a cube that has a pre-aggregation with data from a cube that has none. One cube holds a large volume of data that we'd like accelerated by pre-aggregations; the other is relatively small and updates frequently, so we want it to always return the most up-to-date data.

Related Cube.js schema

cube('events', {
  sql_table: 'public.events',

  data_source: 'default',

  dimensions: {
    event_id: {
      sql: 'event_id',
      primary_key: true,
      public: false,
      type: 'string',
    },

    timestamp: {
      sql: 'timestamp',
      type: 'time',
    },

    event: {
      sql: 'event',
      type: 'string',
    },

    user_id: {
      sql: 'user_id',
      type: 'string',
    },

    // ...
  },

  measures: {
    clicks: {
      type: 'count',
      filters: [{ sql: `${CUBE}.event = 'click'` }],
    },

    // ...
  },

  joins: {
    users: {
      sql: `${users}.user_id = ${events}.user_id`,
      relationship: 'one_to_many',
    },
  },

  pre_aggregations: {
    daily_events: {
      dimensions: [CUBE.user_id, CUBE.event],
      measures: [CUBE.clicks],
      time_dimension: CUBE.timestamp,
      granularity: 'day',
      partition_granularity: 'month',
    },
  },
});

cube('users', {
  sql_table: 'public.users',

  data_source: 'default',

  dimensions: {
    user_id: {
      sql: 'user_id',
      primary_key: true,
      type: 'string',
    },

    name: {
      sql: 'name',
      type: 'string',
    },

    // ...
  },
});

In this example, we'd want to query the number of clicks for each user and have the events side accelerated by the pre-aggregation.
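
For illustration, the query we have in mind looks roughly like the sketch below, written in Cube's REST API query format. The member names come from the schema above and the shape mirrors the generated SQL; the date range is a placeholder assumption, not our real value.

```javascript
// Sketch of the query: clicks per user over a time range.
// Member names are taken from the schema above; the date range
// is a placeholder chosen only for illustration.
const query = {
  measures: ['events.clicks'],
  dimensions: ['users.name'],
  timeDimensions: [
    {
      dimension: 'events.timestamp',
      dateRange: ['2025-01-01', '2025-01-31'], // placeholder range
    },
  ],
  // Mirrors the ORDER BY ... DESC and LIMIT in the generated SQL.
  order: { 'events.clicks': 'desc' },
  limit: 10000,
};

console.log(JSON.stringify(query, null, 2));
```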

However, in its current form, the pre-aggregation is never hit for the event data.
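
Our working guess at the cause is that the query references `users.name`, which is not one of the members of `daily_events`. The sketch below shows that check in a deliberately simplified form. It is not Cube's actual matching code: `missingMembers` is a hypothetical helper, and the member lists are copied by hand from the schema above.

```javascript
// Members covered by the daily_events rollup, copied by hand from the schema.
const preAggMembers = new Set([
  'events.user_id',
  'events.event',
  'events.clicks',
  'events.timestamp',
]);

// Members referenced by the query we run (clicks per user name over time).
const query = {
  measures: ['events.clicks'],
  dimensions: ['users.name'],
  timeDimensions: [{ dimension: 'events.timestamp' }],
};

// Hypothetical helper: lists query members the rollup does not contain.
// A simplified stand-in for pre-aggregation matching, not Cube's real logic.
function missingMembers(q, members) {
  const referenced = [
    ...q.measures,
    ...q.dimensions,
    ...q.timeDimensions.map((td) => td.dimension),
  ];
  return referenced.filter((name) => !members.has(name));
}

const missing = missingMembers(query, preAggMembers);
console.log(missing);
```

If that guess is right, the open question is how to cover `users.name` without pre-aggregating the `users` cube itself.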

Related Cube.js generated SQL

SELECT
  "users".name "users__name",
  count(
    distinct CASE
      WHEN ("events".event = 'click') THEN "events".event_id
    END
  ) "events__clicks"
FROM
  public.events AS "events"
  LEFT JOIN public.users AS "users" ON "users".user_id = "events".user_id
WHERE
  (
    "events".timestamp >= $1::timestamptz
    AND "events".timestamp <= $2::timestamptz
  )
GROUP BY
  1
ORDER BY
  2 DESC
LIMIT
  10000

Any help would be hugely appreciated.
