ESQL: planning perf improvements over many fields #124395
Labels: :Analytics/ES|QL, >enhancement, Team:Analytics
Description
We've noticed that planning shows up in the profiler when dealing with huge mappings (10k-100k+ fields).
Overall, the goal is to add conditional checks that prevent code from executing over such large numbers of objects, ideally by avoiding iteration in the first place.
This meta issue contains a list of (potential) improvements for performance in this scenario, broken down into two main buckets:
Avoiding execution
rules working on expressions should perform basic assertions to check whether the logic needs to be applied at all, such as checking the size of the collection or the attributes.
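A minimal sketch of such a guard (class and method names here are hypothetical, not the actual ES|QL rule API): check a cheap precondition before any traversal, so a rule over 100k+ attributes returns immediately when there is nothing to do.

```java
import java.util.List;

// Hypothetical sketch: skip rule logic when there is nothing to act on,
// instead of unconditionally iterating over (potentially 100k+) attributes.
public class EarlyExitRule {
    // Stand-in for a rule applied to a plan node's attribute list.
    public static List<String> applyRule(List<String> attributes) {
        // Cheap precondition check before any expensive traversal.
        if (attributes.isEmpty()) {
            return attributes; // nothing to do, avoid iteration entirely
        }
        // ... the actual rule logic would transform the attributes here ...
        return List.copyOf(attributes);
    }
}
```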
avoid creating new AttributeMap/Set instances by making the collection immutable again so it can be safely passed around (wrap the add/removeIf/delete methods through a utility class so they are available only on newly created objects).
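One possible shape for this, assuming a builder-style utility (the class names are illustrative, not the real AttributeSet API): mutators live only on a builder that freshly created objects hold, and the frozen set can then be shared without defensive copies.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the "immutable again" idea: funnel add/remove through
// a builder, then freeze the set so it can be passed around safely.
public class ImmutableAttributes {
    public static final class Builder {
        private final Set<String> attrs = new LinkedHashSet<>();

        public Builder add(String attr) { attrs.add(attr); return this; }
        public Builder remove(String attr) { attrs.remove(attr); return this; }

        // Freeze: after this, the set is read-only and needs no copying.
        public Set<String> build() {
            return Collections.unmodifiableSet(new LinkedHashSet<>(attrs));
        }
    }
}
```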
avoid collection copying unless needed
Node#transformChildren and QueryPlan#doTransformExpression create a new array sized to the children every time. This should be done lazily and potentially differently (e.g. via clone()).
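A sketch of the lazy variant (simplified to a list of strings, not the actual Node types): allocate the copy only once a transformed child actually differs, and return the original list untouched when nothing changed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: copy the children lazily, only once a transformed
// child differs, instead of allocating a new array on every call.
public class LazyTransform {
    public static List<String> transformChildren(List<String> children, UnaryOperator<String> rule) {
        List<String> copy = null; // no allocation until a change is seen
        for (int i = 0; i < children.size(); i++) {
            String oldChild = children.get(i);
            String newChild = rule.apply(oldChild);
            if (newChild != oldChild && copy == null) {
                copy = new ArrayList<>(children); // first change: copy once
            }
            if (copy != null) {
                copy.set(i, newChild);
            }
        }
        return copy == null ? children : copy; // identity when unchanged
    }
}
```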
double check array sorting - Analyzer#278/279
ProjectAwayColumns always creates an output set clone - 44/83/84
Optimized execution of existing code
LogicalVerifier#verify
PruneColumns
PropagateUnmappedFields
PropagateEvalFoldables
stop using super inside TypedAttribute/NamedExpression/Attribute/FieldAttribute equals
Currently the equals methods delegate to their parent, which helps with code reuse but leads to suboptimal equality checks, since the children of the node are compared before the attributes. It is better to compare all the node's own properties first and delegate to the collection as a last resort.
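A minimal sketch of that ordering (a stand-in node class, not the real attribute hierarchy): cheap, node-local fields first, the potentially huge children collection last, so most unequal nodes are rejected without walking their children.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: compare cheap node-local fields before falling back to
// the (potentially huge) children collection, instead of delegating to
// super.equals(...) which walks the children up front.
public class OrderedEquals {
    final String name;
    final long id;
    final List<OrderedEquals> children;

    OrderedEquals(String name, long id, List<OrderedEquals> children) {
        this.name = name;
        this.id = id;
        this.children = children;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        OrderedEquals other = (OrderedEquals) o;
        return id == other.id                   // cheapest check first
            && name.equals(other.name)          // then other node-local fields
            && children.equals(other.children); // children as a last resort
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id, children);
    }
}
```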
use collection hashing before performing attributes equality
to avoid comparing large collections element by element, use a hash comparison first and only iterate over the collection when the hashes match.
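One way to sketch this (the wrapper class is illustrative): cache the collection's hash once, then use it as a cheap rejection test, since two collections with different hashes cannot be equal.

```java
import java.util.List;

// Hypothetical sketch: a cached hash short-circuits equality, so the
// element-by-element walk only runs when the hashes already match.
public final class HashedAttributes {
    private final List<String> attrs;
    private final int cachedHash;

    public HashedAttributes(List<String> attrs) {
        this.attrs = attrs;
        this.cachedHash = attrs.hashCode(); // computed once, reused on every compare
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof HashedAttributes)) return false;
        HashedAttributes other = (HashedAttributes) o;
        if (cachedHash != other.cachedHash) return false; // cheap reject, no iteration
        return attrs.equals(other.attrs); // full walk only when hashes match
    }

    @Override
    public int hashCode() {
        return cachedHash;
    }
}
```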
optimize Node#forEachProperty
reorder the condition so the cheap type check runs before the potentially expensive children.contains(prop):
prop != children && children.contains(prop) == false && typeToken.isInstance(prop)
--> prop != children && typeToken.isInstance(prop) && children.contains(prop) == false
look into removing/replacing children.contains(prop) inside Node#forEachProperty
A set (LinkedHashSet) would work better and preserve order; however, it would prevent a child from appearing more than once. This can be an issue in projections with duplicate fields (keep a, a, a).
optimize NameId#hashCode to avoid array boxing (use Long.hashCode(id) instead)
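For reference, the difference between the two hashing paths (method names here are illustrative): Objects.hash(id) boxes the long into a Long and allocates a varargs Object[] on every call, while Long.hashCode(id) hashes the primitive directly with no allocation.

```java
// Sketch of the NameId#hashCode fix: hash the primitive directly instead of
// going through the boxing/varargs path of Objects.hash(...).
public class NameIdHash {
    public static int boxedHash(long id) {
        return java.util.Objects.hash(id); // allocates an Object[] and a Long per call
    }

    public static int primitiveHash(long id) {
        return Long.hashCode(id); // (int) (id ^ (id >>> 32)), no allocation
    }
}
```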
replace the Java Stream API with regular for-loops
Though concise, stream(), collect(), reduce() & co. are slower than their equivalent for-each loops and pollute the stack trace. They have an edge in parallel processing, which, depending on the data size, could yield better results; however, that is not the case here.
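As an illustration of the rewrite (the filter/map pipeline is a made-up example, not code from the planner), the two methods below are equivalent; the loop avoids the Stream pipeline's per-call overhead and keeps stack traces flat on hot planning paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a stream-to-loop rewrite with identical behavior.
public class StreamToLoop {
    public static List<String> withStream(List<String> names) {
        return names.stream()
            .filter(n -> !n.isEmpty())
            .map(String::toUpperCase)
            .collect(Collectors.toList());
    }

    public static List<String> withLoop(List<String> names) {
        List<String> result = new ArrayList<>(names.size());
        for (String n : names) { // same semantics, no Stream machinery
            if (!n.isEmpty()) {
                result.add(n.toUpperCase());
            }
        }
        return result;
    }
}
```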