Performance Statistics of the DDG Terrain engine

Equipment used and Abbreviations

The primary computer used for performance measurements is a PIII 500 MHz Xeon with 256 MB of memory and a GeForce1 with 32 MB of texture memory.
TSize = Size of the terrain along one side (s*s*2 = number of triangles, e.g. 1024 = about 2 million).
BSize = Size of the terrain blocks (usually 32 or 64).
R = Rendering on; algorithm-only mode otherwise.
A = Algorithm-only mode, no rendering to OpenGL.
Tri = Target number of triangles.
FPS = Frames per second.
M1 = PIII 500 MHz Xeon, 256 MB.
M2 = PII 450 MHz Dual Xeon, 256 MB.
M3 = PIII 550 MHz Dual Xeon, 256 MB.
M4 = AMD K2 500 MHz, 96 MB.
M5 = PIII 866 MHz, 256 MB.
VGA = 640x480x32.
VGA16 = 640x480x16.
VGA32 = 640x480x32.
SVGA = 800x600.
P2 = Permedia 2 with 8 MB at 1024x768.
GF = ASUS AGP-V6600 GeForce 32MB Deluxe.
No accelerator listed = software mode.
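
As a quick check of the TSize rule above, the sketch below computes the full-resolution triangle count (s*s*2) and the share of it that a given Tri target represents. The function names are made up for this illustration and are not part of the DDG engine.

```python
def full_resolution_triangles(tsize: int) -> int:
    """Triangle count of a tsize x tsize terrain, using the s*s*2 rule."""
    return tsize * tsize * 2


def budget_fraction(tri: int, tsize: int) -> float:
    """Share of the full-resolution mesh covered by a Tri target."""
    return tri / full_resolution_triangles(tsize)


if __name__ == "__main__":
    for tsize in (256, 1024, 2048):
        print(tsize, full_resolution_triangles(tsize))
    # A 16000 triangle target on a 256 terrain, as a percentage.
    print(round(100 * budget_fraction(16000, 256), 1))
```

For TSize 1024 this gives 2,097,152 triangles, which is the "2 million" quoted in the abbreviation list.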

Calculation speed (and rendering speed)

Performance is measured by playing back a prerecorded flight path of ~1000 frames. The flight path moves through the terrain at high speed, forcing a lot of calculation; near the end of the path the viewpoint changes to overlook almost the entire terrain, forcing maximum visibility. The flight path file used is cam.pth.
A high-resolution timer is used to calculate the elapsed time per frame. Video syncing is disabled so that the test runs independently of the monitor refresh rate. The timing does not include load and shutdown time.
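
The measurement procedure described above can be sketched roughly as follows. This is not the engine's own benchmark code: `benchmark`, `step` and the synthetic frame list are hypothetical stand-ins, the cam.pth format is not reproduced here, and averaging as total frames over total time is an assumption about how the FPS figures were derived.

```python
import time


def benchmark(frames, step):
    """Replay every recorded frame once and time each one.

    frames: iterable of recorded camera states (hypothetical stand-in
            for the contents of cam.pth).
    step:   callable that runs the per-frame work for one camera state.
    Returns (average_fps, per_frame_seconds).
    """
    frames = list(frames)          # loading happens before timing starts
    if not frames:
        raise ValueError("flight path contains no frames")
    per_frame = []
    for frame in frames:
        start = time.perf_counter()            # high-resolution timer
        step(frame)
        per_frame.append(time.perf_counter() - start)
    total = sum(per_frame)                     # excludes load and shutdown
    average_fps = len(per_frame) / total if total > 0 else float("inf")
    return average_fps, per_frame


if __name__ == "__main__":
    # Dummy workload standing in for the terrain algorithm.
    fps, times = benchmark(range(1000), lambda f: sum(range(200)))
    print(f"{fps:.1f} FPS over {len(times)} frames")
```

Timing only the body of the loop keeps load and shutdown cost out of the result, which matches the description above.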

Summary

The V6 algorithm (not rendering) now achieves about 580 FPS for 16000 triangles, even for large terrains. This algorithm has not yet been released as source code but is used in the demo.
Some background speeds with M1:
Theoretical max, running the demo with everything disabled including terrain: 450 FPS
Standing still and rendering with vertex buffer, no calculations, 16000 triangles in 800x600x16: 140 FPS
Rendering terrain while moving at full speed along cam.pth with 16000 triangles: 100 FPS
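
The background speeds are easier to compare as time per frame. The small sketch below converts the three figures listed above; the dictionary labels are shorthand chosen for this example, and treating the differences as separate costs is only a rough reading, since the stages are not guaranteed to add up linearly.

```python
def frame_time_ms(fps: float) -> float:
    """Convert a frame rate into milliseconds spent per frame."""
    return 1000.0 / fps


# Background speeds on M1, taken from the list above.
BACKGROUND_FPS = {
    "empty demo": 450.0,
    "standing still, rendering": 140.0,
    "moving along cam.pth": 100.0,
}

if __name__ == "__main__":
    for label, fps in BACKGROUND_FPS.items():
        print(f"{label}: {frame_time_ms(fps):.2f} ms per frame")
```

Read this way, the empty demo costs a little over 2 ms per frame, rendering 16000 triangles while standing still about 7 ms, and the full moving case 10 ms.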

I am glad I have kept a record of my progress; it's interesting to see the performance of my original implementation, which was getting 10 FPS for 1000 triangles and using 2.3 MB of pure terrain data to do it. My current implementation gets 140 FPS for 16000 triangles and uses 170K. That is 224 times the triangle throughput at roughly 1/14th of the memory cost. This shows two things: how bad the original implementation was, and how much you can optimize if you keep thinking of new ways to solve a problem.
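
The comparison above can be reproduced with a few lines of arithmetic. The sketch below multiplies FPS by triangle count to get triangles per second, and takes the memory figures from the 256x256 column of the Memory Usage table ("Orig" and "smaller hulls"); the helper name is made up for this example.

```python
def triangles_per_second(fps: float, triangles: int) -> float:
    """Throughput of one configuration: frames per second times triangles."""
    return fps * triangles


ORIGINAL = triangles_per_second(10, 1000)      # original implementation
CURRENT = triangles_per_second(140, 16000)     # current implementation

# Bytes for a 256x256 terrain, from the Memory Usage table.
ORIGINAL_BYTES = 2_349_764
CURRENT_BYTES = 168_544

if __name__ == "__main__":
    print("speedup:", CURRENT / ORIGINAL)
    print("memory ratio:", round(ORIGINAL_BYTES / CURRENT_BYTES, 1))
```

The throughput ratio comes out at exactly 224, and the original data set is about 14 times larger than the current one.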

Performance Record

Date    OS   Machine  Window  Accel    TSize  BSize  R/A  Tri    FPS
Rendering no movement, no fog/sky etc. One texture pass.
Apr18   W98  M1       SVGA16  GF       256    64     A    16000  140.0
Algorithm, no rendering.
Mar16   W98  M1       SVGA16  GF       256    64     A    16000  121.9
Algorithm + vertex setup, no rendering.
Mar16   W98  M1       SVGA16  GF       256    64     A    16000  127.9
Render terrain standing still (no algorithm).
Mar16   W98  M1       SVGA16  GF       256    64     R    16000  112/122
Parallelize rendering and vertex setup. [Fixed some bad bugs]
Mar16   W98  M1       SVGA16  GF       256    64     R    16000  87.9
Standard terrain, algorithm only.
Feb1    W98  M1       SVGA16  GF       256    64     A    16000  579.5
Standard terrain, algorithm + optimized vertex setup.
Feb1    W98  M1       SVGA16  GF       256    64     A    16000  209.3
Without clip plane, near/far + detail textures?
Feb1    W98  M1       SVGA16  GF       256    64     R    16000  84.0
Without clip plane, near/far + detail textures, with NVIDIA array range extension.
Feb1    W98  M1       SVGA16  GF       256    64     R    16000  94.2
Without clip plane, near/far + detail textures, from 40 to 82...?
Feb1    W98  M1       SVGA16  GF       256    64     R    16000  82.5
With clip plane, near/far + detail textures.
Feb1    W98  M1       SVGA16  GF       256    64     R    16000  25.7
Havasupai, global texture, and reduced memory.
Feb1    W98  M1       SVGA16  GF       1000   64     R    12000  30.4
Feb1    W98  M1       SVGA16  GF       1000   64     R    8000   40.4
Jan28   W98  M1       SVGA16  GF       1000   64     R    16000  20.3
Version 6 Havasupai, after block optimization.
Jan13   W98  M1       SVGA16  GF       1000   64     R    10000  40.4
Misc changes... testing against large terrain.
Sep17   W98  M1       SVGA16  GF       2048   64     A    8000   308.0
Sep17   W98  M1       SVGA16  GF       2048   64     R    8000   70.4
Cut memory consumption by 60%.
Aug17   W98  M1       SVGA16  GF       256    32     A    8000   310.6
Aug17   W98  M1       SVGA16  GF       256    32     R    8000   35.4 (What happened?)
Perform half the work and display double the triangles.
Aug5    W98  M1       SVGA16  GF       256    32     A    8000   318.6
Aug5    W98  M1       SVGA16  GF       256    32     R    8000   93.7
Only calculate AABBoxes.
Aug5    W98  M1       SVGA16  GF       256    32     A    4000   170.6
Aug5    W98  M1       SVGA16  GF       256    32     R    4000   95.0
Aug5    W98  M1       SVGA16  GF       256    32     A    2000   275.6
Aug5    W98  M1       SVGA16  GF       256    32     R    2000   104.0
Improved error calculation.
Aug1    W98  M1       SVGA16  GF       256    32     A    4000   170.6
Aug1    W98  M1       SVGA16  GF       256    32     R    4000   94.0
Generic extract planes method (could be optimized) used for CS.
Jul25   W98  M1       SVGA16  GF       256    32     A    4000   116.6
Don't alloc memory for Min/Delta of leaf nodes.
Jul15   W98  M1       SVGA16  GF       256    32     A    4000   188.6
Jul15   W98  M1       SVGA16  GF       256    32     R    4000   93.6
Inherit visibility for leaf nodes.
Jul11   W98  M1       SVGA16  GF       256    32     A    4000   174.2
Jul11   W98  M1       SVGA16  GF       256    32     R    4000   93.5
Implemented MRU cache, reduced # vertices transmitted ~4x.
Jul4    W98  M1       SVGA16  GF       256    32     R    4000   88.1
Disabled vsync for OpenGL.
Jul3    W98  M1       SVGA16  GF       256    32     R    4000   79.2
Avoid chains and traverse bintree implicitly.
Jul3    W98  M1       SVGA16  GF       256    32     A    4000   164.2
Avoid calls into PriorityCalc, cache row/col data. [Is this good data]
Jun14   W98  M1       SVGA16  GF       256    32     A    4000   190.2
2D clipping vectors for level view. [No effect]
Jun1    W98  M1       SVGA16  GF       256    32     A    4000   167.8
2 Plane clipping for level view.
Jun1    W98  M1       SVGA16  GF       256    32     A    4000   167.8
Jun1    NT5  M5       SVGA16  TNT2     256    32     A    4000   281.8
Priority of leaf is always 0 && bug fixes.
Jun1    W98  M1       SVGA16  GF       256    32     A    4000   153.0
Jun1    W98  M1       SVGA16  GF       256    32     R    4000   58.0
Function optimization + reduced # of levels of indirection.
May17   W98  M1       SVGA16  GF       256    32     A    4000   162.2
May17   W98  M1       SVGA16  GF       256    32     R    4000   58.2
Curved earth rendering.
May16   W98  M1       SVGA16  GF       256    32     R    4000   48.2
May16   W98  M1       SVGA32  GF       256    32     R    4000   48.5
FOV = 120.
May16   W98  M1       SVGA16  GF       256    32     R    4000   48.6
Cached world min/max/height.
May11   W98  M1       SVGA16  GF       256    32     A    3000   162.6
Cached priority factor + SOA for main data + 1024 entry clipped cache.
May11   W98  M1       SVGA16  GF       256    32     A    4000   141.5
May11   W98  M1       SVGA16  GF       256    32     R    4000   57.9
May11   W98  M1       SVGA16  GF       256    32     A    3000   166.5
May11   W98  M1       SVGA16  GF       256    32     R    3000   73.2
Back to converted shorts but lost some perf due to better cache spread.
May10   W98  M1       SVGA16  GF       256    32     A    3000   163.4
May10   W98  M1       SVGA16  GF       256    32     A    4000   137.8
May10   W98  M1       SVGA16  GF       256    32     R    3000   70.3
Use floats instead of converted shorts for height coord.
May10   W98  M1       SVGA16  GF       256    32     A    3000   153.2
May10   W98  M1       SVGA16  GF       256    32     A    4000   128.8
May10   W98  M1       SVGA16  GF       256    32     R    3000   69.3
Optimize visibility function.
May10   W98  M1       SVGA16  GF       256    32     A    3000   175.8
May10   W98  M1       SVGA16  GF       256    32     A    4000   146.8
May10   W98  M1       SVGA16  GF       256    32     R    3000   72.6
New architecture with correct detail, previous versions had a bug.
May 7   W98  M1       SVGA16  GF       256    32     A    3000   120.8
May 7   W98  M1       SVGA16  GF       256    32     R    3000   55.7
Without fast cache and texture synthesis.
Feb23   W98  M1       SVGA16  GF       256    64     R    3000   75.9
Feb23   W98  M1       SVGA16  GF       256    64     A    3000   124.1
Merge visibility and reset pass + fast triangle cache.
Feb23   W98  M1       SVGA16  GF       256    64     R    3000   73.1
Feb23   W98  M1       SVGA16  GF       256    64     A    3000   124.7
Incremental calculation of total number of vis triangles.
Feb19   W98  M1       SVGA16  GF       256    64     R    3000   72.7
Feb19   W98  M1       SVGA16  GF       256    64     A    3000   120.6
Avoid insertion of leaf nodes into split queue.
Feb19   W98  M1       SVGA16  GF       256    64     R    3000   65.5
Feb19   W98  M1       SVGA16  GF       256    64     A    3000   107.0
Some visibility optimizations before switch to world view culling.
Feb10   W98  M1       SVGA16  GF       256    64     R    3000   59.4
Feb10   W98  M1       VGA16   GF       256    64     R    3000   56.1
Feb10   W98  M1       SVGA16  GF       256    64     R    1000   118.4
Feb10   W98  M1       SVGA16  GF       256    64     A    3000   97.0
Feb10   W98  M1       SVGA16  GF       256    64     A    1000   179.8
Using no recursion.
Feb6    W98  M1       SVGA16  GF       256    64     R    3000   60.5
Feb15   W98  M1       SVGA16  GF       256    64     A    3000   94.1
Using RGBA buffer, rendering nothing but white triangles, standing still.
Feb6    W98  M1       SVGA16  GF       256    64     R    3000   230.7
Using RGBA buffer, rendering nothing, standing still, 3.68 drivers.
Feb6    W98  M1       SVGA16  GF       256    64     R    3000   340.7
Perf1 (Far clip 500, Progdist -10)
Jan20   W98  M1       SVGA32  GF       256    64     R    3000   61.7
Perf1 (Far clip 150, Progdist -10)
Jan20   W98  M1       SVGA32  GF       256    64     R    3000   61.0
Working Merge queue and progressive calculation.
Jan5    W98  M1       SVGA32  GF       256    64     R    3000   58.0
Jan5    W98  M1       SVGA32  GF       256    64     R    4000   49.6
Rendering nothing, only blanking screen, standing still.
Jan1    W98  M1       SVGA32  GF       256    64     R    3000   220
Mesh only, no texture/light, standing still.
Jan1    W98  M1       SVGA32  GF       256    64     R    3000   150
Mesh only in wire frame, standing still.
Jan1    W98  M1       SVGA32  GF       256    64     R    3000   8
Full detail, standing still.
Dec31   W98  M1       VGA32   GF       256    64     R    3000   101
Dec31   W98  M1       SVGA32  GF       256    64     R    3000   85
Dec31   W98  M1       XVGA32  GF       256    64     R    3000   45
Split Queue only.
Dec14   W98  M1       VGA16   P2       256    64     R    3000   17.3
Dec14   W98  M1       VGA16   P2       256    64     R    1500   29.3
Optimized memory usage.
Dec4    W98  M1       VGA16   P2       256    64     R    3000   27.3
Dec2    W98  M1       VGA16   P2       256    64     R    2000   32.3
Dec2    W98  M1       VGA16   P2       256    64     R    1500   37.3
Dec2    W98  M1       VGA16   P2       256    64     R    1000   43.3
Dec2    W98  M1       VGA16   P2       256    64     R    500    52.3
Dec2    W98  M1       VGA16   P2       256    64     A    1000   151.3
Optimized priority function/removed ddgQueue.
Nov24   W98  M1       VGA32   P2       256    64     R    1500   29.7
Nov24   W98  M1       VGA16   P2       256    64     R    1500   43.4
Jan4    W98  M1       VGA16   GF       256    64     R    1500   82.3
Jan4    W98  M1       VGA16   GF       256    64     R    2500   57.3
Jan4    W98  M1       VGA16   GF       256    64     R    3000   51.6
NVIDIA GeForce P550 XEON.
Nov3    W98  M4       VGA32   GeForce  256    64     R    3000   31.4
Nov3    W98  M4       VGA16   GeForce  256    64     A    3000   44.5
NVIDIA TNT2 P550 XEON.
Nov3    NT4  M3       VGA?    TNT2     256    64     R    1000   91.9
Nov3    NT4  M3       VGA?    TNT2     256    64     A    1000   154.4
Improved vertex arrays (really sharing vertices).
Sep12   W98  M1       VGA32   P2       256    64     R    1000   25.4
Sep12   W98  M1       VGA16   P2       256    64     R    1000   35.4
Progressive rendering.
Sep12   W98  M1       VGA16   P2       256    64     A    1000   113.8
Sep12   W98  M1       VGA16   P2       256    64     R    1000   40.8
No progressive rendering.
Sep12   W98  M1       VGA16   P2       256    64     A    1000   105.0
Sep12   W98  M1       VGA16   P2       256    64     R    1000   38.9
Optimized SplayTree with simplified SplayKey.
Sep08   W98  M1       VGA32   P2       256    64     A    1000   104.0
Sep08   W98  M1       VGA32   P2       256    64     R    1000   22.0
Sep08   W98  M1       VGA16   P2       256    64     R    1000   35.3
Sep08   W98  M1       VGA16   P2       256    64     R    3000   22.6
Sep08   W98  M1       VGA32   P2       256    64     R    3000   17.9
Sep08   W98  M1       VGA32   -        256    64     R    1000   1
Sep08   W98  M1       VGA16   -        256    64     R    1000   5
Integrated SplayNode caches and Vertex Arrays.
Sep05   W98  M1       VGA32   P2       256    64     R    1000   25.7
Render Near to Far.
Sep05   W98  M1       VGA32   P2       256    64     R    1000   23.4
New clock.
Sep02   W98  M1       VGA32   P2       256    64     R    1000   24.7
New clock.
Sep02   W98  M1       VGA16   P2       256    64     A    1000   104.0
Aug31   NT4  M2       VGA16   P2       256    64     R    1000   29.6
Aug31   NT4  M2       VGA16   P2       256    64     A    1000   97.3
Aug31   NT4  M2       VGA16   P2       256    64     R    1000   31.3
Aug15   W98  M1       VGA32   P2       1024   64     A    3000   18.8
Aug15   W98  M1       VGA32   P2       1024   64     R    3000   12.8
Aug15   W98  M1       VGA32   P2       1024   64     A    1000   17.8
Aug04   W98  M1       VGA32   P2       256    64     R    1000   18.2
Jul26   NT5  M1       VGA32   -        256    64     R    1000   4.2
Jul26   NT5  M1       VGA32   -        256    64     A    1000   63.1
Jul26   NT4  M2       VGA     P2       256    256    R    1000   31.4
Jul26   NT4  M2       VGA     P2       256    256    A    1000   62.8
Jul26   NT4  M2       VGA     P2       256    128    R    1000   31.4
Jul26   NT4  M2       VGA     P2       256    128    A    1000   62.8
Jul26   NT4  M2       VGA     P2       256    64     R    1000   31.4
Jul26   NT4  M2       VGA     P2       256    64     A    1000   62.8
Jul24   NT5  M1       VGA     -        256    64     R    1000   4.2
Jul24   NT5  M1       VGA     -        256    64     A    1000   58.0
Jul14   NT5  M1       VGA     -        256    64     R    1000   2.8
Jul14   NT5  M1       VGA     -        256    64     A    1000   10.6
Jul14   NT4  M2       VGA     P2       256    64     R    1000   6.7
Jul14   NT4  M2       VGA     P2       256    64     A    1000   ??.?
Jul14   NT5  M1       VGA     -        256    64     R    500    3.5
Jul14   NT5  M1       VGA     -        256    64     A    500    23.0
Jul07   NT5  M1       VGA     -        256    128    R    1000   2.8
Jul07   NT5  M1       VGA     -        256    128    A    1000   9.7
Jun13   NT5  M1       VGA     P2       256    128    R    ?500   17.0
Jun13   NT5  M1       VGA     P2       256    128    A    ?500   18.0

Memory Usage

Besides the fixed overhead for rendering buffers, the following memory requirements scale with terrain size:

Record

Total memory used without texture, in bytes.
Mode              256x256      512x512      1024x1024    2048x2048    4096x4096    Date
Orig              2,349,764    9,042,804    37,391,060   -            -            Nov 99
New MTri          2,026,964    6,752,004    26,441,060   -            -            Dec 99
New Normals       2,505,236    4,916,996    16,741,732   -            -            Dec 99
Reuse bufidx      2,084,236    4,515,196    14,896,732   -            -            Dec 99
DelayBit          2,010,436    4,113,396    13,051,732   -            -            Dec 99
V4 with hulls     2,359,296    -            -            -            -            May 00
V5 with error     1,772,612    3,817,120    12,612,856   47,795,872   188,527,864  Aug 00
V5 no redundancy  1,293,108    2,649,360    6,708,072    22,942,992   87,882,600   Aug 00
Shared caches     1,207,184    1,953,512    4,938,896    16,880,360   64,646,288   Sep 25
V6 real blocks    259,200      1,036,800    4,147,200    16,588,800   66,355,200   Jan 17
Smaller hulls     168,544      674,176      2,696,704    10,786,816   43,147,264   Jan 28
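
One way to read the memory record is as bytes per height sample, that is, the total divided by TSize*TSize. The sketch below does this for two rows of the table, assuming the totals are in bytes; the constant and function names are chosen for this example only.

```python
SIZES = (256, 512, 1024, 2048, 4096)

# Rows copied from the memory record above.
V5_WITH_ERROR = (1_772_612, 3_817_120, 12_612_856, 47_795_872, 188_527_864)
SMALLER_HULLS = (168_544, 674_176, 2_696_704, 10_786_816, 43_147_264)


def bytes_per_sample(total_bytes: int, size: int) -> float:
    """Memory cost per height sample of a size x size terrain."""
    return total_bytes / (size * size)


if __name__ == "__main__":
    for name, row in (("V5 with error", V5_WITH_ERROR),
                      ("smaller hulls", SMALLER_HULLS)):
        costs = [round(bytes_per_sample(b, s), 2) for b, s in zip(row, SIZES)]
        print(name, costs)
```

The "smaller hulls" row grows by exactly 4x each time the terrain side doubles, so its cost per sample stays constant at about 2.57 bytes. The "V5 with error" row falls from about 27 to about 11 bytes per sample as the terrain grows, which is what a size-independent overhead on top of a per-sample cost would look like.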