Update bucket aggregations documentation #3

Status: Open. Wants to merge 2 commits into `main`.
57 changes: 52 additions & 5 deletions _aggregations/bucket/index.md
@@ -10,19 +10,19 @@ redirect_from:
- /query-dsl/aggregations/bucket/
- /aggregations/bucket-agg/
---

# Bucket aggregations

Bucket aggregations categorize sets of documents as buckets. The type of bucket aggregation determines the bucket for a given document.
Bucket aggregations categorize sets of documents into buckets. The type of bucket aggregation determines which bucket a given document belongs to.

You can use bucket aggregations to implement faceted navigation (usually placed as a sidebar on a search result landing page) to help your users filter the results.
You can use bucket aggregations to implement faceted navigation (usually placed as a sidebar on a search result landing page) to help your users filter results.

## Supported bucket aggregations

OpenSearch supports the following bucket aggregations:

- [Adjacency matrix]({{site.url}}{{site.baseurl}}/aggregations/bucket/adjacency-matrix/)
- [Children]({{site.url}}{{site.baseurl}}/aggregations/bucket/children)
- [Auto-interval date histogram]({{site.url}}{{site.baseurl}}/aggregations/bucket/auto-interval-date-histogram/)
- [Children]({{site.url}}{{site.baseurl}}/aggregations/bucket/children/)
- [Date histogram]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-histogram/)
- [Date range]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-range/)
- [Diversified sampler]({{site.url}}{{site.baseurl}}/aggregations/bucket/diversified-sampler/)
@@ -43,4 +43,51 @@ OpenSearch supports the following bucket aggregations:
- [Sampler]({{site.url}}{{site.baseurl}}/aggregations/bucket/sampler/)
- [Significant terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/significant-terms/)
- [Significant text]({{site.url}}{{site.baseurl}}/aggregations/bucket/significant-text/)
- [Terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/terms/)

## Common parameters

Most bucket aggregations support the following common parameters:

- `field`: The field to aggregate on.
- `script`: A script to generate values to aggregate on.
- `missing`: A value to use for documents that are missing the field.

Refer to the documentation for each specific aggregation type for details on additional parameters.
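
As a rough illustration of how the common parameters fit into a request body, the following Python sketch builds a `terms` aggregation that sets `field` and, optionally, `missing`. The helper name `terms_agg`, the aggregation name `genres`, and the `genre` field are assumptions made for this example; the top-level `"size": 0` simply suppresses search hits so that only aggregation results are returned.

```python
import json


def terms_agg(name, field, missing=None):
    """Build a search body with one terms aggregation (hypothetical helper).

    `missing` is only added when a value is supplied, so documents without
    the field are ignored by default.
    """
    terms = {"field": field}
    if missing is not None:
        terms["missing"] = missing
    # "size": 0 at the top level skips search hits and returns only aggregations.
    return {"size": 0, "aggs": {name: {"terms": terms}}}


# Example field and bucket names are illustrative, not from a real index.
body = terms_agg("genres", "genre", missing="unknown")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request with any HTTP client.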

## Nesting aggregations

You can nest bucket or metric aggregations inside a bucket aggregation to create more complex analyses. Metric aggregations cannot contain sub-aggregations. For example:

```json
GET my-index/_search
{
  "aggs": {
    "genres": {
      "terms": {
        "field": "genre"
      },
      "aggs": {
        "avg_rating": {
          "avg": {
            "field": "rating"
          }
        }
      }
    }
  }
}
```

This example first buckets documents by genre, then calculates the average rating for each genre bucket.
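
The response to a nested request like this one places each sub-aggregation result inside its parent bucket. The following Python sketch shows one way to read it, assuming the aggregation names `genres` and `avg_rating` from the example; the `sample_response` dictionary is hand-written for illustration and is not output from a real cluster.

```python
def avg_rating_by_genre(response):
    """Map each genre bucket key to the value of its nested avg_rating metric."""
    buckets = response["aggregations"]["genres"]["buckets"]
    return {bucket["key"]: bucket["avg_rating"]["value"] for bucket in buckets}


# Hand-written sample that mirrors the shape of a terms + avg response.
sample_response = {
    "aggregations": {
        "genres": {
            "buckets": [
                {"key": "drama", "doc_count": 3, "avg_rating": {"value": 4.0}},
                {"key": "comedy", "doc_count": 2, "avg_rating": {"value": 3.5}},
            ]
        }
    }
}

print(avg_rating_by_genre(sample_response))
```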

## Performance considerations

Bucket aggregations can be resource intensive, especially on high-cardinality fields or when nesting multiple aggregations. Consider the following to optimize performance:

- Use `size` to limit the number of buckets returned.
- Apply a `filter` aggregation first to reduce the document set.
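- For high-cardinality fields, consider paging through buckets with a `composite` aggregation or limiting the documents considered with a `sampler` aggregation. Note that `significant_terms` answers a different question than `terms` and is not a cheaper substitute for it.
- Profile slow aggregations by adding `"profile": true` to the search request, and adjust the request based on the reported timings.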

For more details on each bucket aggregation type, refer to its specific documentation page.
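
To make the `size` and `filter` tips from the performance list concrete, here is a minimal Python sketch that wraps a `terms` aggregation in a `filter` aggregation. The helper name `filtered_terms_request`, the aggregation names `filtered` and `top_values`, and the `rating` and `genre` fields are all assumptions chosen for the example.

```python
import json


def filtered_terms_request(filter_query, field, size=10):
    """Build a search body whose terms aggregation only sees filtered documents."""
    return {
        "size": 0,  # return aggregation results only, no search hits
        "aggs": {
            "filtered": {
                # The filter aggregation narrows the document set first.
                "filter": filter_query,
                "aggs": {
                    # `size` caps the number of buckets returned.
                    "top_values": {"terms": {"field": field, "size": size}}
                },
            }
        },
    }


# Illustrative filter: only documents with a rating of 4 or higher.
body = filtered_terms_request({"range": {"rating": {"gte": 4}}}, "genre", size=5)
print(json.dumps(body, indent=2))
```

Filtering in the top-level `query` instead has a similar effect on the aggregation, but it also changes which documents are returned as hits.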
61 changes: 61 additions & 0 deletions _search/search-query-logging.md
@@ -0,0 +1,61 @@
# Search Query Logging

Search query logging allows you to capture and analyze information about search queries executed on your OpenSearch cluster. This feature can be useful for monitoring, debugging, and optimizing search performance.

## Enabling Search Query Logging

Search query logging is disabled by default. To enable it:

1. Set the `opensearch.search.query.logging.enabled` setting to `true`:

```yaml
opensearch.search.query.logging.enabled: true
```

2. Restart your OpenSearch nodes for the setting to take effect.

## Configuring Logging Settings

You can configure the following settings to control search query logging behavior:

```yaml
opensearch.search.query.logging:
  min_time_to_log: 1s      # Minimum query time to log
  log_slow_queries: true   # Log slow queries exceeding min_time_to_log
```

- `min_time_to_log`: The execution time a query must exceed in order to be logged. Default is `1s`.
- `log_slow_queries`: Whether to log queries that exceed the `min_time_to_log` threshold. Default is `true`.

## Log Output

When search query logging is enabled, entries are written to the OpenSearch log file. A typical log entry looks like the following:

```
[2023-05-15T10:30:15,123][INFO][o.o.s.SearchQueryLogger] Search query executed: indices=[my-index], query=[{"match":{"title":"OpenSearch"}}], took=[1200ms], hits=[100], total_hits=[1000]
```

In addition to the log level and logger name, the log entry contains the following information:
- Timestamp
- Indices searched
- Query body
- Query execution time
- Number of hits returned
- Total number of matching documents
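
The fields listed above can be extracted with a regular expression. The following Python sketch is derived only from the sample entry shown in this section, so the pattern, the function name `parse_log_line`, and the output keys are assumptions; adjust them if the actual log layout differs.

```python
import json
import re

# Pattern derived only from the sample entry; a real log layout may differ.
LOG_PATTERN = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\[(?P<level>[^\]]+)\]\[(?P<logger>[^\]]+)\] "
    r"Search query executed: indices=\[(?P<indices>[^\]]*)\], "
    r"query=\[(?P<query>.*)\], took=\[(?P<took_ms>\d+)ms\], "
    r"hits=\[(?P<hits>\d+)\], total_hits=\[(?P<total_hits>\d+)\]$"
)


def parse_log_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "timestamp": fields["timestamp"],
        "level": fields["level"],
        "indices": [n.strip() for n in fields["indices"].split(",") if n.strip()],
        # Assumes the query body is valid JSON; json.loads raises otherwise.
        "query": json.loads(fields["query"]),
        "took_ms": int(fields["took_ms"]),
        "hits": int(fields["hits"]),
        "total_hits": int(fields["total_hits"]),
    }


SAMPLE = (
    '[2023-05-15T10:30:15,123][INFO][o.o.s.SearchQueryLogger] '
    'Search query executed: indices=[my-index], '
    'query=[{"match":{"title":"OpenSearch"}}], took=[1200ms], '
    'hits=[100], total_hits=[1000]'
)
entry = parse_log_line(SAMPLE)
print(entry["took_ms"], entry["indices"], entry["query"])
```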

## Using Search Query Logs

Search query logs can be used for:

- Identifying slow queries that may need optimization
- Analyzing popular search terms and patterns
- Debugging unexpected search results
- Monitoring overall search performance

You can use log analysis tools or the OpenSearch stack itself to analyze and visualize the logged query data.
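
As a small illustration of this kind of analysis, the following Python sketch ranks slow queries and counts searches per index. It assumes the log entries have already been parsed into dictionaries with `indices` and `took_ms` keys; that shape, the function name `summarize`, and the sample data are hypothetical.

```python
from collections import Counter


def summarize(entries, slow_ms=1000):
    """Return (slow entries sorted slowest first, search count per index)."""
    slow = sorted(
        (e for e in entries if e["took_ms"] >= slow_ms),
        key=lambda e: e["took_ms"],
        reverse=True,
    )
    per_index = Counter(index for e in entries for index in e["indices"])
    return slow, per_index


# Hand-written sample entries, not real log data.
entries = [
    {"indices": ["my-index"], "took_ms": 1200},
    {"indices": ["my-index", "other-index"], "took_ms": 300},
    {"indices": ["other-index"], "took_ms": 2500},
]

slow, per_index = summarize(entries)
for e in slow:
    print(e["took_ms"], e["indices"])
print(dict(per_index))
```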

## Considerations

- Enabling search query logging may have a small performance impact, especially for high-volume search workloads.
- Be mindful of any sensitive information that may be contained in search queries when storing or analyzing logs.
- Adjust the `min_time_to_log` setting as needed to capture an appropriate level of detail without generating excessive log volume.