Optimization
Examples in this section target performance optimization. The same functionality can be realized by using the features described in the general examples. Performance gains are achieved by SIMD, multi-threading / parallelization, batching or bulk operations.
Optimization examples are part of the unit tests, see: Tests/ECS/Examples
Entity Queries
Boosted Query
🔥 Update - Introduced new query approach using Friflo.Engine.ECS.Boost.
Optimization: Use query.Each() and a struct implementing Execute(...) for maximum query performance.
This is the fastest query approach apart from Query vectorization / SIMD. A distinguishing feature of Friflo.Engine.ECS is that it uses no unsafe code, which enables running the dll in trusted environments. Maximum performance, however, requires unsafe code to elide bounds checks.
Instead of processing components of a query with query.ForEachEntity(...), the MoveEach struct below processes components in its Execute() method.
The processing of all query components is performed by query.Each(new MoveEach()).
The performance gain compared with query.ForEachEntity(...) is ~3x.
The method query.Each() requires adding the dependency Friflo.Engine.ECS.Boost.
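The boosted query described above might look roughly like the following sketch. It assumes that both Friflo.Engine.ECS and Friflo.Engine.ECS.Boost are referenced and that the Boost package exposes an IEach<T1, T2> interface with an Execute(ref T1, ref T2) method; Velocity is a hypothetical component defined only for this sketch. Signatures should be checked against the installed library version.

```csharp
using System.Numerics;
using Friflo.Engine.ECS;

// Hypothetical component for this sketch. Position is a component shipped with the library.
public struct Velocity : IComponent { public Vector3 value; }

// Assumed shape of the Boost interface: Execute() is called once per matching entity.
struct MoveEach : IEach<Position, Velocity>
{
    public void Execute(ref Position position, ref Velocity velocity) {
        position.value += velocity.value;
    }
}

public static class BoostedQueryExample
{
    public static void Run()
    {
        var store = new EntityStore();
        for (int n = 0; n < 100; n++) {
            store.CreateEntity(new Position(n, 0, 0), new Velocity { value = new Vector3(0, n, 0) });
        }
        var query = store.Query<Position, Velocity>();
        query.Each(new MoveEach()); // requires the Friflo.Engine.ECS.Boost dependency
    }
}
```

Using a struct instead of a lambda gives the runtime a concrete Execute() method to call per entity, which is presumably where the gain over query.ForEachEntity(...) comes from.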
Enumerate Query Chunks
Optimization: Replace a query.ForEachEntity(( ... ) => lambda) by two foreach loops.
This approach avoids the more expensive lambda calls.
Also, as described in the intro, enumeration of a query result is fundamental for an ECS. Components are returned as Chunks and are suitable for Vectorization - SIMD.
🔥 Update - Added example code for a slower iteration alternative. The alternative should be used only for small result sets with fewer than 10 entities.
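A minimal sketch of the two-loop enumeration and of the slower alternative is shown here. MyComponent is a hypothetical component defined for illustration, and the sketch assumes that query.Chunks yields tuples that deconstruct into a component chunk and the chunk's entities.

```csharp
using System;
using Friflo.Engine.ECS;

// Hypothetical component for this sketch.
public struct MyComponent : IComponent { public int value; }

public static class EnumerateQueryChunksExample
{
    public static void Run()
    {
        var store = new EntityStore();
        for (int n = 0; n < 3; n++) {
            store.CreateEntity(new MyComponent { value = n + 42 });
        }
        var query = store.Query<MyComponent>();

        // Outer loop: one iteration per chunk. Inner loop: one iteration per entity in the chunk.
        foreach (var (components, entities) in query.Chunks)
        {
            for (int n = 0; n < entities.Length; n++) {
                Console.WriteLine(components[n].value);
            }
        }

        // Slower alternative: a component lookup per entity. Suitable only for small result sets.
        foreach (var entity in query.Entities) {
            Console.WriteLine(entity.GetComponent<MyComponent>().value);
        }
    }
}
```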
Parallel Query Job
Optimization: Execute a query on multiple CPU cores in parallel.
To minimize execution time for large queries a QueryJob can be used. It provides the same functionality as the foreach loop in the example above but runs on multiple cores in parallel.
To enable running a query job a ParallelJobRunner is required.
The runner can be assigned to the EntityStore or directly to the QueryJob.
A ParallelJobRunner instance is thread-safe and can / should be used for multiple / all query jobs.
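A parallel query job could be set up along the following lines. The sketch assigns the runner to the EntityStore; MyComponent is a hypothetical component, and the thread count passed to ParallelJobRunner is an arbitrary choice for illustration.

```csharp
using System;
using Friflo.Engine.ECS;

// Hypothetical component for this sketch.
public struct MyComponent : IComponent { public int value; }

public static class ParallelQueryJobExample
{
    public static void Run()
    {
        // One runner can be shared by all query jobs of the application.
        var runner = new ParallelJobRunner(Environment.ProcessorCount);
        var store  = new EntityStore { JobRunner = runner };
        for (int n = 0; n < 10_000; n++) {
            store.CreateEntity(new MyComponent());
        }
        var query = store.Query<MyComponent>();

        // The lambda is invoked per chunk section, potentially on different worker threads.
        var queryJob = query.ForEach((myComponents, entities) =>
        {
            for (int n = 0; n < entities.Length; n++) {
                myComponents[n].value += 10;
            }
        });
        queryJob.RunParallel();
        runner.Dispose();
    }
}
```

The lambda body touches only the components of its own chunk section, so no synchronization is needed for the arithmetic itself.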
In case of structural changes inside ForEach((...) => {...}) use CommandBuffer.Synced to record the changes.
These changes are adding / removing components, tags or child entities and the creation / deletion of entities.
Note: CommandBuffer is not thread-safe. CommandBuffer.Synced is thread-safe.
After RunParallel() returns these changes can be applied to the EntityStore by calling CommandBuffer.Playback().
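Recording structural changes from a parallel job might be sketched as follows. This is written under several assumptions that should be verified against the library: that the object returned by CommandBuffer.Synced offers the same AddTag<T>(int entityId) method as CommandBuffer, and that indexing the chunk entities returns an entity id. MyComponent and MyTag1 are hypothetical types.

```csharp
using System;
using Friflo.Engine.ECS;

// Hypothetical component and tag for this sketch.
public struct MyComponent : IComponent { public int value; }
public struct MyTag1 : ITag { }

public static class ParallelCommandBufferExample
{
    public static void Run()
    {
        var runner = new ParallelJobRunner(Environment.ProcessorCount);
        var store  = new EntityStore { JobRunner = runner };
        for (int n = 0; n < 10_000; n++) {
            store.CreateEntity(new MyComponent { value = n });
        }
        var commandBuffer = store.GetCommandBuffer();
        var synced        = commandBuffer.Synced; // thread-safe recorder (assumed API)

        var query = store.Query<MyComponent>();
        var job   = query.ForEach((components, entities) =>
        {
            for (int n = 0; n < entities.Length; n++) {
                if (components[n].value % 2 == 0) {
                    // Record the change only; the entity is not modified here.
                    synced.AddTag<MyTag1>(entities[n]);
                }
            }
        });
        job.RunParallel();

        // All worker threads have finished: apply the recorded changes.
        commandBuffer.Playback();
        runner.Dispose();
    }
}
```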
Recommendation: A parallel query achieves notable performance gains when the loop uses only arithmetic computations like * / + - sin(), cos(), ...
When using a CommandBuffer and applying massive entity changes, the single-threaded version is typically faster.
The reason is that entity changes applied to a CommandBuffer require heavy random memory access.
When this is done on multiple threads, the CPU cores compete for access to the memory heap and CPU caches.
Query Vectorization - SIMD
Optimization: Utilize SIMD of your CPU to execute a query.
SIMD architectures: SSE, AVX, AVX2, AVX-512, AdvSIMD, Neon, ...
The most efficient way to speed up query execution is vectorization. Vectorization is similar to loop unrolling - aka loop unwinding - but with hardware support. Its efficiency is superior to multi-threading as it requires only a single thread to achieve the same performance gain. So other threads can still keep running without competing for CPU resources.
Note: Vectorization can be combined with multi-threading to speed up execution even more. In case of a system with high memory bandwidth the speedup is speedup(SIMD) * speedup(multi-threading). If SIMD or multi-threading alone already reaches this bandwidth bottleneck, their combination provides no performance gain.
The API provides a few methods to convert chunk components into System.Runtime.Intrinsics - Vectors.
E.g. AsSpan256<> and StepSpan256. See all methods at the Chunk - API.
The Span retrieved from a chunk component has padding components at the end to enable vectorization without a scalar remainder loop.
The following example shows how to increment all MyComponent.value's by 1.
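A sketch of such an increment using 256-bit vectors is given here. It assumes .NET 7 or later for the generic Vector256 helpers and a hypothetical MyComponent consisting of a single int field, so that the chunk can be viewed as a span of int.

```csharp
using System.Runtime.Intrinsics;
using Friflo.Engine.ECS;

// Hypothetical component: a single int, so 8 components fit into one 256-bit vector.
public struct MyComponent : IComponent { public int value; }

public static class QueryVectorizationExample
{
    public static void Run()
    {
        var store = new EntityStore();
        for (int n = 0; n < 10_000; n++) {
            store.CreateEntity(new MyComponent());
        }
        var query = store.Query<MyComponent>();
        foreach (var (component, entities) in query.Chunks)
        {
            var add    = Vector256.Create<int>(1);   // vector of 8 ints, all set to 1
            var values = component.AsSpan256<int>(); // length is padded to a multiple of 8
            var step   = component.StepSpan256;      // number of components per vector step
            for (int n = 0; n < values.Length; n += step) {
                var slice  = values.Slice(n, step);
                var result = Vector256.Create<int>(slice) + add; // adds 1 to 8 values at once
                result.CopyTo(slice);
            }
        }
    }
}
```

Because the span is padded, the loop needs no scalar remainder pass for the last few components.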
EventFilter
Optimization: Process multiple entity events in a loop instead of individual event handlers.
An alternative way to process entity changes - see section Event - is to use EventFilters.
EventFilters can be used on their own or within a query like in the example below.
All events that need to be filtered - like added/removed components/tags - can be added to the EventFilter.
E.g. ComponentAdded<Position>() or TagAdded<MyTag1>().
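Using an EventFilter within a query might look like the sketch below. It assumes that event recording has to be enabled on the store via EventRecorder.Enabled and that the query offers HasEvent(int entityId); both should be checked against the library. MyTag1 is a hypothetical tag.

```csharp
using System;
using Friflo.Engine.ECS;

// Hypothetical tag for this sketch.
public struct MyTag1 : ITag { }

public static class EventFilterExample
{
    public static void Run()
    {
        var store = new EntityStore();
        store.EventRecorder.Enabled = true; // assumed prerequisite for EventFilter

        store.CreateEntity(new Position(), Tags.Get<MyTag1>());

        var query = store.Query();
        query.EventFilter.ComponentAdded<Position>();
        query.EventFilter.TagAdded<MyTag1>();

        // Process all matching events in one loop instead of individual event handlers.
        foreach (var entity in store.Entities)
        {
            bool hasEvent = query.HasEvent(entity.Id);
            Console.WriteLine($"{entity} - hasEvent: {hasEvent}");
        }
    }
}
```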
Batching
Batch - Create Entity
🔥 Update - New example to use simpler / more performant approach to create entities.
Optimization: Minimize structural changes when creating entities.
Entities can be created with multiple components and tags in a single step. This can be done by one of the EntityStoreExtensions CreateEntity<T1, ... , Tn> overloads.
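Creating entities with several components and a tag in one call could be sketched as follows. MyTag1 is a hypothetical tag; EntityName and Position are components shipped with the library.

```csharp
using System;
using Friflo.Engine.ECS;

// Hypothetical tag for this sketch.
public struct MyTag1 : ITag { }

public static class CreateEntityExample
{
    public static void Run()
    {
        var store = new EntityStore();
        for (int n = 0; n < 10; n++) {
            // Components and tags are set in a single step, so the entity is
            // placed in its final archetype without intermediate structural changes.
            store.CreateEntity(new EntityName("test"), new Position(), Tags.Get<MyTag1>());
        }
        var taggedEntities = store.Query().AllTags(Tags.Get<MyTag1>());
        Console.WriteLine(taggedEntities);
    }
}
```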
Bulk - Create entities
🔥 Update - New example for bulk creation of entities.
Optimization: Create multiple entities with the same set of components / tags in a single step.
Entities can be created one by one with store.CreateEntity().
To create multiple entities with the same set of components and tags use archetype.CreateEntities(int count).
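A bulk creation might be sketched like this, assuming the archetype is obtained with store.GetArchetype(...) from a set of component types and tags. MyTag1 is a hypothetical tag and the entity count is arbitrary.

```csharp
using System;
using Friflo.Engine.ECS;

// Hypothetical tag for this sketch.
public struct MyTag1 : ITag { }

public static class CreateEntitiesExample
{
    public static void Run()
    {
        var store     = new EntityStore();
        // The archetype describes the exact set of components and tags of the new entities.
        var archetype = store.GetArchetype(ComponentTypes.Get<Position, Scale3>(), Tags.Get<MyTag1>());
        var entities  = archetype.CreateEntities(100_000);
        Console.WriteLine(entities.Count);
    }
}
```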
Batch - Operations
🔥 Update - Add example to batch add, remove and get components and tags.
Optimization: Minimize structural changes when adding / removing multiple components or tags.
Components can be added / removed one by one to / from an entity with entity.AddComponent() / entity.RemoveComponent().
Every operation may cause a structural change which is an expensive operation.
To execute these operations in a single step use the EntityExtensions overloads. This approach also reduces the amount of code to perform these operations.
When accessing multiple components of the same entity, use entity.Data instead of multiple entity.GetComponent<>() calls.
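The batch add, get and remove operations could be sketched as follows. The sketch assumes the EntityExtensions overloads entity.Add(...) and entity.Remove<...>(...) accept several components plus tags, and that entity.Data offers Get<T>() and Tags. MyTag1 is a hypothetical tag.

```csharp
using System;
using Friflo.Engine.ECS;

// Hypothetical tag for this sketch.
public struct MyTag1 : ITag { }

public static class BatchOperationsExample
{
    public static void Run()
    {
        var store  = new EntityStore();
        var entity = store.CreateEntity();

        // Batch add: at most one structural change for all components and tags.
        entity.Add(
            new Position(1, 2, 3),
            new Scale3(4, 5, 6),
            new EntityName("test"),
            Tags.Get<MyTag1>());

        // Batch get: a single lookup of the entity, then direct component access.
        var data  = entity.Data;
        var pos   = data.Get<Position>();
        var scale = data.Get<Scale3>();
        var name  = data.Get<EntityName>();
        var tags  = data.Tags;
        Console.WriteLine($"({pos}), ({scale}), ({name}), {tags}");

        // Batch remove: again at most one structural change.
        entity.Remove<Position, Scale3, EntityName>(Tags.Get<MyTag1>());
        Console.WriteLine(entity);
    }
}
```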
Batch - Entity
Optimization: Minimize structural changes when adding and removing components or tags to / from a single entity.
Note: An EntityBatch should only be used when adding AND removing components / tags to / from an entity at the same time.
If only adding OR removing components / tags use the Add() / Remove() overloads shown above.
When adding/removing components or tags to/from a single entity it will be moved to a new archetype. This is also called a structural change and in comparison to other methods a more costly operation. Every component / tag change will cause a structural change.
In case of multiple changes on a single entity use an EntityBatch to apply all changes at once. Using this approach only a single or no structural change will be executed.
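An EntityBatch that adds and removes in one step might look like this sketch. It assumes the fluent methods Add(), Remove<T>(), AddTag<T>() and Apply() on the batch returned by entity.Batch(); MyTag1 is a hypothetical tag.

```csharp
using System;
using Friflo.Engine.ECS;

// Hypothetical tag for this sketch.
public struct MyTag1 : ITag { }

public static class EntityBatchExample
{
    public static void Run()
    {
        var store  = new EntityStore();
        var entity = store.CreateEntity(new Scale3(1, 1, 1));

        // All changes are collected first and applied with Apply(),
        // so the entity moves to its target archetype only once.
        entity.Batch()
            .Add(new Position(1, 2, 3))
            .Remove<Scale3>()
            .AddTag<MyTag1>()
            .Apply();

        Console.WriteLine(entity);
    }
}
```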
EntityBatch - Query
Optimization: Minimize structural changes when adding / removing components or tags to / from multiple entities.
If you need to add/remove components or tags to/from entities returned by a query, use a bulk operation. It is the most efficient way to execute these types of changes.
This can be done by either using ApplyBatch() or a common foreach () loop as shown below.
To prevent unnecessary allocations the application should cache and reuse the list instance for future batches.
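Both variants could be sketched as follows, assuming a standalone EntityBatch can be created with new EntityBatch() and applied with ApplyBatch() on a set of entities or with ApplyTo() on a single entity. MyTag1 is a hypothetical tag.

```csharp
using System;
using Friflo.Engine.ECS;

// Hypothetical tag for this sketch.
public struct MyTag1 : ITag { }

public static class BulkBatchExample
{
    public static void Run()
    {
        var store = new EntityStore();
        for (int n = 0; n < 1000; n++) {
            store.CreateEntity();
        }
        // The batch is created once and can be reused for many entities.
        var batch = new EntityBatch();
        batch.Add(new Position(1, 2, 3)).AddTag<MyTag1>();

        // Variant 1: apply the batch to all entities in one call.
        store.Entities.ApplyBatch(batch);

        // Variant 2: a common foreach loop with the same effect.
        foreach (var entity in store.Entities) {
            batch.ApplyTo(entity);
        }

        var query = store.Query<Position>().AllTags(Tags.Get<MyTag1>());
        Console.WriteLine(query);
    }
}
```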
EntityBatch - EntityList
An EntityList is a container of entities added to the list.
Single entities are added using Add(). AddTree() adds an entity and all its children including their children etc.
A bulk operation can be applied to all entities in the list as shown in the example below.
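Applying a batch to an EntityList might be sketched like this. The sketch assumes the constructor EntityList(EntityStore) and the methods AddTree() and ApplyBatch(); the size of the entity tree is arbitrary.

```csharp
using System;
using Friflo.Engine.ECS;

public static class EntityListExample
{
    public static void Run()
    {
        var store = new EntityStore();
        var root  = store.CreateEntity();
        for (int n = 0; n < 10; n++) {
            var child = store.CreateEntity();
            root.AddChild(child);
            // Add two children to each child.
            child.AddChild(store.CreateEntity());
            child.AddChild(store.CreateEntity());
        }
        var list = new EntityList(store);
        // Adds root and, recursively, all its children to the list.
        list.AddTree(root);
        Console.WriteLine($"list - Count: {list.Count}");

        var batch = new EntityBatch();
        batch.Add(new Position());
        list.ApplyBatch(batch);

        var query = store.Query<Position>();
        Console.WriteLine(query);
    }
}
```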
CommandBuffer
Optimization: Required when adding / removing components or tags in a Parallel Query Job.
A CommandBuffer is used to record changes on multiple entities. E.g. AddComponent().
These changes are applied to entities when calling Playback().
Recording commands with a CommandBuffer instance can be done on any thread.
Playback() must be called on the main thread.
Available commands are in the CommandBuffer - API.
This enables recording entity changes in multi-threaded applications using entity systems / queries.
In this case enumerations of query results run on multiple worker threads.
Within these enumerations entity changes are recorded with a CommandBuffer.
After a query thread has finished these changes are executed with Playback() on the main thread.
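Recording and playing back commands could be sketched as follows. The sketch assumes the buffer is obtained with store.GetCommandBuffer() and that CreateEntity() on the buffer returns the id of the entity to be created. MyTag1 is a hypothetical tag.

```csharp
using System;
using Friflo.Engine.ECS;

// Hypothetical tag for this sketch.
public struct MyTag1 : ITag { }

public static class CommandBufferExample
{
    public static void Run()
    {
        var store   = new EntityStore();
        var entity1 = store.CreateEntity(new Position());
        var entity2 = store.CreateEntity();

        CommandBuffer cb = store.GetCommandBuffer();

        // Record changes. Nothing is applied to the store yet.
        var newEntityId = cb.CreateEntity();
        cb.DeleteEntity(entity2.Id);
        cb.AddComponent(newEntityId, new EntityName("new entity"));
        cb.RemoveComponent<Position>(entity1.Id);
        cb.AddComponent(entity1.Id, new EntityName("changed entity"));
        cb.AddTag<MyTag1>(entity1.Id);

        // Apply all recorded changes on the main thread.
        cb.Playback();

        var entity3 = store.GetEntityById(newEntityId);
        Console.WriteLine(entity1);
        Console.WriteLine(entity3);
    }
}
```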