githubEdit

Query Optimization

The straight forward way to execute a query is using query.ForEachEntity(). This implementation is super compact and provide a direct result during development. E.g.

// query creation
var query = store.Query<EntityName>();

// query execution
query.ForEachEntity((ref EntityName name, Entity entity) => {
    ....
});

This execution type creates a delegate (lambda) and the callback for every result add additional execution costs.

There are various ways to improve query execution significant. Query creation remains unchanged for all variants by using the same Query<>() instance.

    var query = store.Query<EntityName>();

Only the code related for query execution is different and require additional boilerplate code. This approach enables:

  • Query optimization can be postponed. E.g. a specific query turns out to be a bottleneck.

  • All optimization variants require only local code changes.

Query optimizations can be achieved by using various query technics like:

Query optimization examples are part of the unit tests see: Tests/ECS/Examplesarrow-up-right

Info In general friflo ECS provides in many areas only a single API to implement a specific behavior. This avoids confusion to select a one of multiple API variants if available. Query optimization and Batch operations are an exception to this rule. The reason is to provide provide room for significant performance gains if needed.

Query Generator

🚀 New feature in v3.6.0 - preview available: New feature: Query Generatorarrow-up-right.

The goal of the Query Generator is to generate highly efficient query code and reduce the amount of boiler plate to a minimum. The generator is a C# Source Generatorarrow-up-right to enable seamless development.

A super efficient way to execute queries is Enumerate Query Chunks. But writing them by hand is awkward. The query generator takes over this task for every method annotated with [Query]. It automatically generates an additional method suffixed with Query. So given:

The code generator creates / updates the method:

This method calls MovePosition(ref Position position) for every matching entity. Now you can call the generated method to execute the query with:

Explanation The same functionality with a manually written ForEach() is:

The behavior of both approaches is the same.

Comparison

ForEach()
Query Generator

ArchetypeQuery creation

store.Query<>() creates new ArchetypeQuery instance

creates and caches the instances internally

Memory allocations

creates always an ArchetypeQuery and a ForEach delegate

an ArchetypeQuery only at the first call

Execution performance

calling the ForEach delegate is expensive

[Query] method is called directly - can be inlined

How to use To bring the Query Generator alive you need to add Friflo.Engine.ECS.Generators to your csproj. This dependency is only used by the compiler. It is not a runtime dependency.

The parameter values of the annotated [Query] method like MovePosition() depend on their type.

Parameter
Value

Component types

the component values in the EntityStore of a matching entity.

Entity entity

the current iterated entity.

Other types

are set by the values passed to generated ...Query() method call.

Boosted Query

🔥 Update - Introduced new query approach using Friflo.Engine.ECS.Boostarrow-up-right.

Optimization: Use query.Each() and a struct implementing Execute(...) for maximum query performance.

This query approach is the most performant approach - except Query vectorization / SIMD. A unique feature of Friflo.Engine.ECS - it uses no unsafe code. This enables running the dll in trusted environments. For maximum performance unsafe code is required to elide bounds checks.

Instead of processing components of a query with query.ForEachEntity(...) the MoveEach struct below process components in its Execute() method. The processing of all query components is performed by query.Each(new MoveEach()).

The performance gain compared with query.ForEachEntity(...) is ~3x. The method query.Each() requires adding the dependency Friflo.Engine.ECS.Boostarrow-up-right.

Enumerate Query Chunks

Optimization: Replace a query.ForEachEntity(( ... ) => lambda) by two foreach loops. This approach avoids the more expensive lambda calls.

Also as described in the intro enumeration of a query result is fundamental for an ECS. Components are returned as Chunkarrow-up-right's and are suitable for Vectorization - SIMDarrow-up-right.

🔥 Update - Added example code for slower iteration alternative. The alternative should be used only for small result set with less than 10 entities.

Parallel Query Job

Optimization: Execute a query on multiple CPU cores in parallel.

To minimize execution time for large queries a QueryJobarrow-up-right can be used. It provides the same functionality as the foreach loop in example above but runs on multiple cores in parallel. E.g.

To enable running a query job a ParallelJobRunnerarrow-up-right is required. The runner can be assigned to the EntityStore or directly to the QueryJob. A ParallelJobRunner instance is thread-safe and can / should be used for multiple / all query jobs.

In case of structural changes inside ForEach((...) => {...}) use CommandBuffer.Syncedarrow-up-right to record the changes. These changes are adding / removing components, tags or child entities and the creation / deletion of entities.

Note: CommandBuffer is not thread safe. CommandBuffer.Synced is thread safe. After RunParallel() returns these changes can be applied to the EntityStore by calling CommandBuffer.Playback().

Re-use QueryJob

The QueryJob returned by query.ForEach(...) is intended to be re-used. This ensures that subsequent calls to queryJob.RunParallel() do not perform allocations on GC heap.

Recommendation A parallel query achieves notable performance gains in case using only arithmetic computations like * / + - sin(), cos(), ... in the loop.

In case of using a CommandBuffer and and applying massive entity changes the single threaded version is typically faster. The reason is that entity changes applied to a CommandBuffer requires heavy random memory access. If doing this on multiple threads the CPU cores are competing with access to memory heap and CPU caches.

Query Vectorization / SIMD

Optimization: Utilize SIMD of your CPU to execute a query. SIMD architectures: SSE, AVX, AVX2, AVX-512, AdvSIMD, Neon, ...

The most efficient way to speedup query execution is vectorization. Vectorization is similar to loop unrolling - aka loop unwinding - but with hardware support. Its efficiency is superior to multi threading as it requires only a single thread to achieve the same performance gain. So other threads can still keep running without competing for CPU resources.

Note: Vectorization can be combined with multi threading to speedup execution even more. In case of a system with high memory bandwidth the speedup is speedup(SIMD) * speedup(multi threading). If SIMD or multi threading alone already reaches this bandwidth bottleneck their combination provide no performance gain.

The API provide a few methods to convert chunk components into System.Runtime.Intrinsics - Vectorsarrow-up-right. E.g. AsSpan256<> and StepSpan256. See all methods at the Chunk - APIarrow-up-right. The Span retrieved from a chunk component has padding components at the end to enable vectorization without a scalar remainder loop.

The following examples shows how to increment all MyComponent.value's by 1.

Last updated