Axiom is optimized for storing and querying timestamped event data, but certain ingest and query practices can degrade performance and increase cost. This page explains common pitfalls and how to avoid them so your Axiom queries stay fast and efficient.
| Practice | Severity | Impact |
|---|---|---|
| Mixing unrelated data in datasets | Critical | Combining unrelated data inflates the schema and slows queries |
| Excessive backfilling (large gap between `_time` and `_sysTime`) | Critical | Creates overlapping blocks, breaking time-based indexing |
| Large number of fields in a dataset | High | Very high dimensionality slows down query performance |
| Failing to use `_time` | High | No efficient time-based filtering |
| Overly wide queries (`project *`) | High | Returns large amounts of unneeded data |
| Mixed data types in the same field | Moderate | Reduces compression, complicates queries |
| Using regex when simpler filters suffice | Moderate | More CPU-heavy scanning |
| Overusing runtime JSON parsing (`parse_json`) | Moderate | CPU overhead, no indexing on nested fields |
| Virtual fields for simple transformations | Low | Extra overhead for trivial conversions |
| Poor filter order in queries | Low | Suboptimal scanning of data |
## Mixing unrelated data in datasets

A “kitchen-sink” dataset is one in which events from multiple, unrelated applications or services get lumped together, often resulting in:

- Inconsistent data types: some services log `user_id` as a string, while others store it as a number in the same `user_id` field.
- Sparse fields: a field can be `null` for one service or typed differently for another.

These issues reduce compression efficiency and force Axiom to scan more data than necessary.
Keep logically distinct data in separate datasets. For example, keep `k8s_logs` separate from `web_traffic`.
## Excessive backfilling: `_time` vs. `_sysTime` gaps

Axiom’s `_time` index is critical for query performance. Ideally, incoming events for a block lie in a closely bounded time range. However, backfilling large amounts of historical data after the fact (especially out of chronological order) creates wide time overlaps in blocks. If `_time` is far from `_sysTime` (the time the event was ingested), the effectiveness of Axiom’s time index is weakened.
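To gauge how far backfilled events drift from ingest time, you can compare the two timestamps directly. A minimal sketch in APL; the dataset name is illustrative:

```kusto
// Large gaps between _sysTime and _time indicate heavily backfilled data
['sample-dataset']
| extend ingest_delay = _sysTime - _time
| summarize max(ingest_delay) by bin(_time, 1h)
```

If the maximum delay in some buckets is hours or days, those time ranges were likely backfilled out of order.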
Future improvements: Axiom’s roadmap includes an initiative that aims to mitigate the impact of poorly clustered time data by performing incremental time-based compaction. Until then, avoid mixing large historical ranges with live ingest whenever possible.
## Large number of fields in a dataset

Datasets with very high dimensionality (more than several thousand fields) suffer slow query performance. Axiom stores event data in a tuned format, and very wide schemas reduce how efficiently it can be scanned. Keeping the number of fields in a dataset below a few thousand helps you achieve the best performance in Axiom.
## Use the `_time` field for event timestamps

Axiom’s core optimizations rely on `_time` for indexing and time-based queries. If you store event timestamps in a different field (for example, `timestamp` or `created_at`) and use that field in time filters, Axiom’s time-based optimizations are not leveraged.

- Map timestamps to `_time`: configure your ingest pipelines so that Axiom sets `_time` to the actual event timestamp. If your data arrives with `created_at`, rename it to `_time` at ingest.
- Filter on `_time`: use `where _time >= ... and _time <= ...` or the built-in time range selectors in the query UI.

## Mixed data types in the same field

A single field sometimes stores different data types across events (for instance, strings in some events and integers in others). This is typically a side effect of “kitchen-sink” ingestion or inconsistent parsing logic in your code.
Mixed types reduce compression efficiency and complicate queries (extra `tostring()` calls, etc.). Normalize the type of each field at ingest whenever possible.

## Overly wide queries (`project *`)

By default, Axiom’s query engine projects all fields (`project *`) for each matching event. This can return large amounts of unneeded data, especially in wide datasets with many fields.
- Use `project` or `project-keep` to specify exactly which fields you need.
- Use `project-away` if you only need to exclude a few fields, for instance when you need 90% of the fields but want to drop the largest ones.
- Limit your results: if you only need a sample of events for debugging, use a lower `limit` value (such as 10) instead of the default 1000.
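The projection guidance above can be sketched in APL; the dataset and field names are illustrative:

```kusto
// Keep only the fields you need
['sample-http-logs']
| where _time >= ago(1h)
| project _time, status, uri

// Or drop only the largest fields and keep the rest
['sample-http-logs']
| where _time >= ago(1h)
| project-away request_body, response_body
| limit 10
```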
## Using regex when simpler filters suffice

Regular expressions (the `matches regex` operator) can be powerful, but they are also expensive to evaluate, especially on large datasets. Use direct string filters instead of a regular expression whenever a simple comparison expresses the same condition.
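As an illustrative sketch (the dataset and field names are assumptions), the same filter can often be written without a regular expression:

```kusto
// Slower: regular-expression scan
['sample-http-logs']
| where uri matches regex "^/api/users$"

// Faster: direct string comparison
['sample-http-logs']
| where uri == "/api/users"
```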
Use `search` for substring search. To find `foobar` in all fields, use `search "foobar"`. Because `search` matches text in every field, a more efficient solution for a specific field is `where my_field contains_cs "foobar"`, where the `cs` suffix stands for case-sensitive.
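A short sketch of both forms (the dataset and field names are illustrative):

```kusto
// Searches every field for the substring
['sample-http-logs']
| search "foobar"

// More efficient: search a single field, case-sensitively
['sample-http-logs']
| where uri contains_cs "foobar"
```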
## Overusing runtime JSON parsing (`parse_json`)

Some ingestion pipelines place large JSON payloads into a string field, deferring parsing until query time with `parse_json()`. This is both CPU-intensive and slower than columnar operations.
- Ingest as map fields: Axiom’s map column type can store object fields column by column, preserving structure and optimizing for nested queries. This allows indexing of specific nested keys.
- Extract top-level fields where possible: if a certain nested field is frequently used for filtering or grouping, consider promoting it to its own top-level column for faster scanning and filtering.
- Avoid `parse_json()` in queries: if your JSON cannot be flattened entirely, ingest it into a map field, then query subfields directly.
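For example, a nested key of a map field can be filtered directly at query time, with no `parse_json()` call. This sketch assumes a map field named `attributes`; the dataset name is also illustrative:

```kusto
// Query a nested key of a map column directly; no runtime JSON parsing
['sample-dataset']
| where ['attributes']['status'] == "error"
```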
## Virtual fields for simple transformations

You can create virtual fields (for example, `extend converted = toint(some_field)`) to transform data at query time. While sometimes necessary, every additional virtual field imposes overhead, and using `extend` for trivial or frequently repeated operations adds up.

Avoid unnecessary casting: if a field must be an integer, handle the conversion at ingest time.
For example, instead of casting a field with `tostring()` and filtering on the result, filter on the field directly with a string literal; the filter automatically matches string values in mixed columns.
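A sketch of the two approaches (the dataset and field names are illustrative):

```kusto
// Avoid: a virtual field created just for a cast
['sample-logs']
| extend id_str = tostring(user_id)
| where id_str == "1234"

// Prefer: filter directly; string values in mixed columns still match
['sample-logs']
| where user_id == "1234"
```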
Reserve virtual fields for truly dynamic or derived logic. If you frequently need a computed value, store it at ingest or keep the transformation minimal.
## Poor filter order in queries

Axiom’s query engine does not currently reorder your `where` clauses optimally, so the sequence of filters in your query can matter.
Put the most selective filters first. For example, if `user_id == 1234` discards most rows, apply it before `log_level == "ERROR"`.
Profile your filters: experiment to see which conditions discard the most rows, and place those first.
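A sketch of the ordering described above (the dataset and field names are illustrative):

```kusto
// Most selective filter first
['sample-logs']
| where user_id == 1234          // discards most rows
| where log_level == "ERROR"    // applied to the remainder
```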