Performance
Revision as of 16:34, 17 October 2017
This document discusses performance from a content creation perspective. It seeks to enumerate the major factors which reduce game frame rate and reference the major strategies in avoiding problems.
Performance Concerns
There are a number of discrete metrics which should be considered when evaluating performance:
- The time taken to render each frame. This is the inverse of frame rate, and is a more useful metric than frame rate itself.
- Timing variations between frames. Longer frames may cause visible hitches. Both longer and shorter frames can cause mis-predictions which are nearly as bad.
- Latency between basic user input (such as a camera movement) and the result appearing on-screen.
- Latency between complex user input (such as a database search, moving a spline, or placing a new locomotive) and the result being finalized.
- Loading time when entering a session, or when teleporting from place to place in a session.
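The first two metrics can be derived from a simple list of frame timestamps. The following is a minimal sketch, not part of Trainz: the function names and the `hitch_factor` threshold are assumptions chosen for illustration.

```python
from statistics import mean, median

def frame_times_ms(timestamps_s):
    """Convert frame presentation timestamps (seconds) into per-frame durations (ms)."""
    return [(b - a) * 1000.0 for a, b in zip(timestamps_s, timestamps_s[1:])]

def summarize(frame_ms, hitch_factor=2.0):
    """Summarise frame timings.

    hitch_factor is a hypothetical threshold: a frame is flagged as a hitch
    when it takes more than hitch_factor times the median frame time.
    """
    typical = median(frame_ms)
    average = mean(frame_ms)
    hitches = [i for i, t in enumerate(frame_ms) if t > hitch_factor * typical]
    return {
        "median_ms": typical,
        "mean_ms": average,
        "mean_fps": 1000.0 / average,
        "hitches": hitches,
    }
```

Frame time is easier to reason about than frame rate because it adds up linearly: dropping from 60fps to 50fps costs about 3.3ms per frame, while dropping from 30fps to 25fps costs about 6.7ms, even though both look like "5 to 10 fps lost".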
The Low Level
In order to render a single frame, Trainz must perform the following steps in order:
1. Perform a single frame update for all frame-locked game threads. The important threads from a content creation perspective are the Game Main Thread and the PhysX Thread. Most other threads are not frame-locked, or are not likely to hurt frame performance.
2. Process GPU buffer uploads, culling, and sorting.
3. Render the shadow buffer (if enabled).
4. Render the reflection buffer (if enabled).
5. Render the main scene.
6. Vsync and display the result.
Several of these steps are overlapped between frames; for example once the game threads have completed one frame and passed the frame to the renderer, they may start on the next frame. This overlap reduces frame time but does not reduce simple input latency.
Because these steps are performed sequentially, a stall at any stage will delay the entire frame. To reach a steady 60fps, the entire process must complete within about 16.7ms (1000ms / 60) with no exceptions.
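The relationship between stage times, frame time, and input latency can be sketched as follows. This is an idealised model under stated assumptions: the stage timings are invented, and `overlapped_frame_ms` assumes every stage overlaps perfectly with its neighbours, which is more optimistic than the partial overlap described above.

```python
def frame_budget_ms(target_fps):
    """Time available per frame at the given frame rate."""
    return 1000.0 / target_fps

def sequential_frame_ms(stage_ms):
    """No overlap: a frame finishes every stage before the next frame starts."""
    return sum(stage_ms)

def overlapped_frame_ms(stage_ms):
    """Idealised overlap: each stage works on a different frame at the same time,
    so the steady-state interval between frames is set by the slowest stage."""
    return max(stage_ms)

def input_latency_ms(stage_ms):
    """A single frame still passes through every stage in order, so overlap
    does not shorten the path from input to display."""
    return sum(stage_ms)

# Hypothetical timings (ms): game update, upload/cull/sort, shadows, reflections, main scene.
EXAMPLE_STAGES = [6.0, 3.0, 2.0, 2.0, 8.0]
```

With the example timings, the stages sum to 21ms, which misses the 60fps budget when run strictly in sequence, while the idealised overlapped interval is 8ms. The input latency is 21ms in both cases.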
Multithreading
Trainz is heavily multithreaded. This is used in the following ways:
- Tasks which frequently wait for external resources (e.g. the HDD or the GPU) are run on their own threads. This allows the CPU to move on to other tasks while the thread is waiting for the external resource to respond.
- Tasks which need to be performed per-frame and which are relatively expensive are run on their own threads where feasible. This allows a multi-core CPU to tackle multiple tasks simultaneously rather than sequentially, or (where sequential operation is required) to overlap tasks for one frame with tasks for the next frame.
- Tasks which do not need to be performed within a single frame are run on separate threads. This allows them to take more time than available to a single render frame, without delaying the render frame.
- When operating multiple game windows, the CPU tasks for each window are run on independent threads.
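The first pattern in the list above, moving a task that waits on an external resource onto its own thread, can be illustrated in general terms. This is a generic Python sketch and says nothing about how Trainz implements it internally; `slow_load` and `run_frames` are hypothetical names, and the sleeps stand in for real work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_load(name, delay_s=0.05):
    """Stand-in for a task that waits on an external resource such as a disk."""
    time.sleep(delay_s)
    return f"{name} loaded"

def run_frames(frame_count, executor, asset_name):
    """Run a frame loop while the load proceeds on a worker thread.

    Returns (frames that ran before the load finished, load result).
    """
    future = executor.submit(slow_load, asset_name)
    frames_before_done = 0
    for _ in range(frame_count):
        if not future.done():
            frames_before_done += 1
        time.sleep(0.001)  # stand-in for per-frame work on the frame loop
    # result() blocks only if the load has not finished by the end of the loop.
    return frames_before_done, future.result()
```

The frame loop keeps running while the load waits, which is the point of the pattern: the wait costs wall-clock time on the worker, not frame time on the loop.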
It is important to understand that TrainzScript threads are not native system threads. They all run on the "Game Main Thread".
Game Main Thread
Each game window has a single "Game Main Thread". This is the master thread which coordinates and controls the game environment. Any systems which are not "thread safe" are run on this thread. The in-game user interface is run on this thread. If this thread is blocked, the game is effectively frozen (even though certain aspects of rendering may continue, and the native user interface may still be interactive). For this reason it is essential to avoid any long pauses on this thread.
If the game main thread becomes blocked briefly, rendering will pause and a frame hitch will be noticeable. Longer blockages may result in the renderer decoupling from the game thread temporarily, in which case the renderer and other threads may continue to operate but any operations controlled by the game main thread will remain unresponsive.
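The effect of script threads sharing the Game Main Thread can be illustrated with cooperative tasks. The sketch below uses Python generators as an analogy only; it is not TrainzScript, and `script_thread` and `run_main_thread_frame` are hypothetical names.

```python
import time

def script_thread(name, steps, work_s, log):
    """A cooperative 'thread': does one slice of work, then yields to the scheduler."""
    for i in range(steps):
        time.sleep(work_s)  # stand-in for script work done on the main thread
        log.append((name, i))
        yield               # comparable to a script sleeping until its next turn

def run_main_thread_frame(threads):
    """Advance every cooperative thread by one slice on the calling thread.

    Returns (frame duration in ms, list of threads that are still running).
    """
    start = time.perf_counter()
    alive = []
    for t in threads:
        try:
            next(t)
            alive.append(t)
        except StopIteration:
            pass  # this thread has finished
    return (time.perf_counter() - start) * 1000.0, alive
```

Because every slice runs one after another on the same native thread, a single slow slice lengthens the whole frame: one task that works for 30ms makes that frame take at least 30ms, no matter how cheap the other tasks are.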
Per-frame Costs
Per-frame costs are those that are paid for every rendered frame in normal operation. This specifically refers to CPU costs, and includes:
- Train Physics.
  - Cost scales with the number of vehicles in motion.
- Playing of animations.
  - Cost scales with the number of animations playing and the number of bones per animation.
- Scene edits for visibility changes or Level of Detail changes.
  - Cost scales with the rate of change.
  - Really a collection of one-off costs, but for a given scenery density and rate of motion, this can be approximated as a per-frame cost.
- Script overheads.
  - This includes any scripted items which perform regular sleep or tick callbacks.
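To make "cost scales with the rate of change" concrete for Level of Detail changes, the sketch below counts LOD switches per frame as a camera moves past a row of objects. It is a toy one-dimensional model: the distance thresholds and function names are assumptions, not values used by Trainz.

```python
def lod_for_distance(distance, thresholds=(50.0, 150.0, 400.0)):
    """Return a LOD index: 0 is the most detailed, len(thresholds) the least.

    The thresholds (in metres) are hypothetical.
    """
    for lod, limit in enumerate(thresholds):
        if distance < limit:
            return lod
    return len(thresholds)

def lod_changes_per_frame(object_positions, camera_start, camera_speed, frames):
    """Move the camera along a line and count LOD switches in each frame."""
    cam = camera_start
    prev = [lod_for_distance(abs(p - cam)) for p in object_positions]
    changes = []
    for _ in range(frames):
        cam += camera_speed  # distance moved per frame
        cur = [lod_for_distance(abs(p - cam)) for p in object_positions]
        changes.append(sum(1 for a, b in zip(prev, cur) if a != b))
        prev = cur
    return changes
```

For the same scenery, a faster camera crosses more LOD boundaries per frame, so the scene-edit cost per frame rises with both scenery density and rate of motion.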
One-off Costs
One-off costs are those that are paid in response to a specific trigger. These might occur semi-frequently but do not occur every frame or on a fixed timer. This specifically refers to CPU costs, and includes:
- Updating signals.
  - This only occurs when something within the signal's range of influence actually changes.
  - Of course, any moving vehicle counts as a change.
  - It's important not to extend the signal's range of influence unnecessarily. Don't TrackSearch further than you need to determine the signal state.
- Render Origin updates.
  - Cost scales with the number of objects in the scene.
- One-off script actions.
- Game responses to mouse clicks, key presses, etc.
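The advice about limiting a signal's search range can be illustrated with a bounded walk along a simplified track. This sketch is not the TrainzScript TrackSearch API: the track is modelled as a plain list of `(length, occupied)` tuples, and `search_track` is a hypothetical helper.

```python
def search_track(segments, start_index, max_distance):
    """Walk forward from start_index and stop once max_distance has been covered.

    segments: list of (length_m, occupied) tuples, a simplified stand-in for track.
    Returns (number of segments visited, indices of occupied segments found).
    """
    travelled = 0.0
    visited = 0
    occupied = []
    for i in range(start_index, len(segments)):
        if travelled >= max_distance:
            break  # nothing beyond this point can affect the result we need
        length, is_occupied = segments[i]
        visited += 1
        if is_occupied:
            occupied.append(i)
        travelled += length
    return visited, occupied
```

The work done is proportional to the number of segments visited, so a search limit that is ten times longer than necessary does roughly ten times the work on every update, and also makes the signal react to changes that cannot affect its state.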
Loading
Loading occurs both on module startup and also while streaming. Related costs include:
- Object initialization during tile streaming.
  - Asset scripts should respond to Init() calls as rapidly as possible. Performing TrackSearch operations, linear searches, etc. at this time is not appropriate. Use a persistent script library to store state instead.
- Loading texture data to the GPU.
  - This occurs when a new asset is streamed in.
  - If texture streaming is disabled, there is an immediate large hit.
  - If texture streaming is enabled, there is a smaller initial hit, but this may be followed up with additional impacts as texture LOD changes.
  - The cost scales with the number of textures loaded and the size (in bytes) of each texture.
- Loading meshes to the GPU.
  - This occurs when a new asset is streamed in.
  - This occurs when new procedural track geometry is generated.
  - This occurs when mesh stitching is updated.
  - This occurs per frame as PFX are updated.
  - The cost scales with the number of meshes loaded and the number of vertices.
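Since texture upload cost scales with size in bytes, it helps to see how quickly that size grows with resolution. The sketch below assumes an uncompressed texture at 4 bytes per pixel with a full mipmap chain; real assets commonly use compressed formats, so the absolute numbers are illustrative only, and `texture_bytes` is a hypothetical helper.

```python
def texture_bytes(width, height, bytes_per_pixel=4, mipmaps=True):
    """Estimate the size of an uncompressed texture, optionally with a full mip chain."""
    total = 0
    w, h = width, height
    while True:
        total += w * h * bytes_per_pixel
        if not mipmaps or (w == 1 and h == 1):
            break
        # Each mip level halves both dimensions, down to 1x1.
        w, h = max(1, w // 2), max(1, h // 2)
    return total
```

Under these assumptions a 1024x1024 texture is 4 MiB without mipmaps and about a third larger with them, and doubling the resolution to 2048x2048 quadruples the size.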
PhysX Thread
The PhysX thread is responsible for simulating the PhysX scene, which is used for physical effect simulation with the exception of Train Physics. This thread is responsible for:
- Editing the PhysX scene as objects are streamed in and out.
- Testing for PhysX collisions. In Trainz, this mainly amounts to PFX-vs-scenery collision testing.
  - Costs scale with the number of PFX buffers, the number of particles, and the number and complexity of geometry within proximity to the buffers.
  - Buffers may be larger than expected due to render coalescing.
- Updating the PFX motion in general.
  - Costs scale with the number of particles.
- Render Origin updates.
GPU
This includes both performance costs on the GPU itself, and driver overheads involved with loading commands and data to the GPU:
- Loading texture data to the GPU.
  - See above.
  - If the hardware is VRAM-starved, textures may also be paged at the driver level, or may be read across the PCI bus, both massively decreasing performance.
- Loading mesh data to the GPU.
  - See above.
- Loading uniform data to the GPU.
  - Playing animations require uniform buffer updates per frame.
- Submitting draw calls to the GPU.
  - Cost varies per number of draw calls rendered.
  - CPU cost is significant on most hardware and most current render APIs.
  - Texture atlas techniques, mesh stitching, and hardware instancing can be used to help mitigate this cost.
- Vertex shading (and Geometry shading, etc.)
  - Cost varies per number of vertices rendered.
  - Effective Level of Detail can be used to help mitigate this cost.
- Fragment shading.
  - Different materials have different fragment costs.
    - Parallax materials near the camera have the highest cost.
    - Deeper parallax depth has higher cost.
  - Avoid unnecessary overdraw to help mitigate this cost.
  - Avoid double-sided materials to help mitigate this cost.
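The draw call point can be illustrated by counting calls for a scene with and without instancing. This is an idealised sketch: it assumes every object sharing the same mesh and material can be drawn in a single instanced call, which real renderers only approximate, and the function names are hypothetical.

```python
from collections import Counter

def draw_calls_unbatched(objects):
    """One draw call per object."""
    return len(objects)

def draw_calls_instanced(objects):
    """One draw call per unique (mesh, material) pair, assuming ideal instancing.

    objects: list of (mesh, material) tuples.
    Returns (draw call count, instances per batch).
    """
    batches = Counter(objects)
    return len(batches), dict(batches)
```

For example, 100 identical trees and 3 identical houses need 103 draw calls when submitted individually but only 2 under this model. This is also why sharing materials between meshes, for instance through a texture atlas, matters: objects with different materials cannot share a batch.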
Settings
TBD.
Profiling
TBD.
- Trainz Profiler
- Event Profiler
- Pause
- In-game performance stats
- Asset Preview