Digital Dawn Graphics Toolkit

Performance Statistics of the DDG Terrain engine

Equipment used and Abbreviations

The primary computer used for performance measurements is a PIII 500 MHz Xeon with 256 MB of memory and a GeForce1 with 32 MB of texture memory.
TSize = Size of the terrain along one side, s (s*s*2 = number of triangles, e.g. 1024 = 2 million).
BSize = Size of the terrain blocks (usually 32 or 64).
R = Rendering on; algorithm-only mode otherwise.
A = Algorithm-only mode; no rendering to OpenGL.
Tri = Target number of triangles.
FPS = Frames per second.
M1 = PIII 500 MHz Xeon, 256 MB.
M2 = PII 450 MHz Dual Xeon, 256 MB.
M3 = PIII 550 MHz Dual Xeon, 256 MB.
M4 = AMD K2 500 MHz, 96 MB.
M5 = PIII 866 MHz, 256 MB.
VGA = 640x480x32.
VGA16 = 640x480x16.
VGA32 = 640x480x32.
SVGA = 800x600
P2 = Permedia 2 with 8 MB at 1024x768
GF = ASUS AGP-V6600 GeForce 32MB Deluxe
Software mode otherwise (Accel column left blank).
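
As a quick check of the TSize definition above, the following minimal sketch applies the s*s*2 rule to a few terrain sizes. The function name is hypothetical and not part of the DDG API.

```python
def full_triangle_count(tsize: int) -> int:
    """Triangles in a fully tessellated square terrain of side `tsize`.

    Applies the s*s*2 rule from the abbreviation list (two triangles
    per grid cell). Hypothetical helper, not a DDG function.
    """
    return tsize * tsize * 2


for s in (256, 1024, 2048):
    print(f"TSize {s}: {full_triangle_count(s)} triangles")
```

For TSize 256 this gives 131072 triangles, so a 16000-triangle target keeps roughly 12% of the full mesh.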

Calculation speed (and rendering speed)

Performance is measured by playing back a prerecorded flight path of ~1000 frames. The path moves through the terrain at high speed, forcing a lot of calculation; near the end, the viewpoint changes to overlook almost the entire terrain, forcing maximum visibility. The flight path file used is cam.pth.
A high-resolution timer is used to calculate the elapsed time per frame. Video sync is disabled so that the test runs independently of the monitor refresh rate. The timing does not include load and shutdown time.
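
The measurement procedure described above can be sketched roughly as follows. This is only an illustration of the idea, not the DDG implementation: `frames` stands in for the recorded camera positions and `render_frame` for one engine update, and both names are assumptions. Only per-frame work is timed, so load and shutdown are excluded.

```python
import time


def measure_fps(frames, render_frame, timer=time.perf_counter):
    """Play back `frames` and return the average frames per second.

    `frames` and `render_frame` are placeholders for the recorded
    flight path and one engine update; they are not DDG functions.
    Only the time spent inside `render_frame` is accumulated.
    """
    elapsed = 0.0
    count = 0
    for frame in frames:
        start = timer()              # high-resolution timestamp
        render_frame(frame)
        elapsed += timer() - start   # per-frame elapsed time
        count += 1
    if count == 0 or elapsed <= 0.0:
        return 0.0
    return count / elapsed


# Dummy workload standing in for a ~1000-frame flight path.
fps = measure_fps(range(1000), lambda frame: sum(range(200)))
print(f"{fps:.1f} FPS")
```

Passing the timer in as a parameter keeps the sketch easy to check with a fake clock.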

Summary

The V6 algorithm (without rendering) now achieves about 580 FPS for 16000 triangles, even for large terrains. This algorithm has not yet been released as source code but is used in the demo.
Some background speeds with M1:

Theoretical max: running demo with everything disabled, including terrain	450 FPS
Standing still and rendering with vertex buffer, no calculations; 16000 triangles in 800x600x16	140 FPS
Rendering terrain while moving at full speed along cam.pth with 16000 triangles	100 FPS

I am glad I have kept a record of my progress; it's interesting to see the performance of my original implementation, which was getting 10 FPS for 1000 triangles and using 2.3 MB of pure terrain data to do it. My current implementation gets 140 FPS for 16000 triangles and uses 170 KB. Measured in triangles per second, that is 224 times faster at roughly 1/14th of the memory cost. This shows two things: how bad the original implementation was, and how much you can optimize if you keep thinking of new ways to solve a problem.
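
The 224x figure follows from comparing triangle throughput (triangles per frame times frames per second) rather than raw FPS. A short worked check, using only the numbers quoted above (the helper name is made up for illustration):

```python
def triangles_per_second(triangles: int, fps: float) -> float:
    """Triangle throughput: triangles drawn per frame times frames per second."""
    return triangles * fps


original = triangles_per_second(1000, 10.0)    # first implementation
current = triangles_per_second(16000, 140.0)   # current implementation

speedup = current / original
print(f"original: {original:.0f} tri/s")
print(f"current:  {current:.0f} tri/s")
print(f"speedup:  {speedup:.0f}x")
```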

Performance Record

Date	OS	Machine	Window	Accel	TSize	BSize	R	Tri	FPS
Rendering no movement, no fog/sky etc. One texture pass.
Apr18	W98	M1	SVGA16	GF	256	64	A	16000	140.0
Algorithm no rendering
Mar16	W98	M1	SVGA16	GF	256	64	A	16000	121.9
Algorithm + vertex setup, no rendering.
Mar16	W98	M1	SVGA16	GF	256	64	A	16000	127.9
Render terrain standing still (No Algorithm).
Mar16	W98	M1	SVGA16	GF	256	64	R	16000	112/122
Parallelize rendering and vertex setup. [Fixed some bad bugs]
Mar16	W98	M1	SVGA16	GF	256	64	R	16000	87.9
Standard terrain, algorithm only.
Feb1	W98	M1	SVGA16	GF	256	64	A	16000	579.5
Standard terrain, algorithm + optimized vertex setup.
Feb1	W98	M1	SVGA16	GF	256	64	A	16000	209.3
Without clip plane, near/far + detail textures?
Feb1	W98	M1	SVGA16	GF	256	64	R	16000	84.0
Without clip plane, near/far + detail textures, with NVIDIA array range extension
Feb1	W98	M1	SVGA16	GF	256	64	R	16000	94.2
Without clip plane, near/far + detail textures, from 40 to 82...?
Feb1	W98	M1	SVGA16	GF	256	64	R	16000	82.5
With clip plane, near/far + detail textures.
Feb1	W98	M1	SVGA16	GF	256	64	R	16000	25.7
Havasupai, global texture, and reduced memory.
Feb1	W98	M1	SVGA16	GF	1000	64	R	12000	30.4
Feb1	W98	M1	SVGA16	GF	1000	64	R	8000	40.4
Jan28	W98	M1	SVGA16	GF	1000	64	R	16000	20.3
Version 6 Havasupai, after block optimization.
Jan13	W98	M1	SVGA16	GF	1000	64	R	10000	40.4
Misc changes... testing against large terrain.
Sep17	W98	M1	SVGA16	GF	2048	64	A	8000	308.0
Sep17	W98	M1	SVGA16	GF	2048	64	R	8000	70.4
Cut memory consumption by 60%.
Aug17	W98	M1	SVGA16	GF	256	32	A	8000	310.6
Aug17	W98	M1	SVGA16	GF	256	32	R	8000	35.4 (What happened?)
Perform half the work and display double the triangles.
Aug5	W98	M1	SVGA16	GF	256	32	A	8000	318.6
Aug5	W98	M1	SVGA16	GF	256	32	R	8000	93.7
Only calculate AABBoxes.
Aug5	W98	M1	SVGA16	GF	256	32	A	4000	170.6
Aug5	W98	M1	SVGA16	GF	256	32	R	4000	95.0
Aug5	W98	M1	SVGA16	GF	256	32	A	2000	275.6
Aug5	W98	M1	SVGA16	GF	256	32	R	2000	104.0
Improved error calculation.
Aug1	W98	M1	SVGA16	GF	256	32	A	4000	170.6
Aug1	W98	M1	SVGA16	GF	256	32	R	4000	94.0
Generic extract planes method (could be optimized) used for CS.
Jul25	W98	M1	SVGA16	GF	256	32	A	4000	116.6
Don't alloc memory for Min/Delta of leaf nodes.
Jul15	W98	M1	SVGA16	GF	256	32	A	4000	188.6
Jul15	W98	M1	SVGA16	GF	256	32	R	4000	93.6
Inherit visibility for leaf nodes.
Jul11	W98	M1	SVGA16	GF	256	32	A	4000	174.2
Jul11	W98	M1	SVGA16	GF	256	32	R	4000	93.5
Implemented MRU cache, reduced # vertices transmitted ~4x.
Jul4	W98	M1	SVGA16	GF	256	32	R	4000	88.1
Disabled vsync for opengl.
Jul3	W98	M1	SVGA16	GF	256	32	R	4000	79.2
Avoid chains and traverse bintree implicitly.
Jul3	W98	M1	SVGA16	GF	256	32	A	4000	164.2
Avoid calls into PriorityCalc, cache row/col data. [Is this good data?]
Jun14	W98	M1	SVGA16	GF	256	32	A	4000	190.2
2D clipping vectors for level view. [No effect]
Jun1	W98	M1	SVGA16	GF	256	32	A	4000	167.8
2 Plane clipping for level view.
Jun1	W98	M1	SVGA16	GF	256	32	A	4000	167.8
Jun1	NT5	M5	SVGA16	TNT2	256	32	A	4000	281.8
Priority of leaf is always 0 && bug fixes.
Jun1	W98	M1	SVGA16	GF	256	32	A	4000	153.0
Jun1	W98	M1	SVGA16	GF	256	32	R	4000	58.0
Function optimization + Reduced # of levels of indirection
May17	W98	M1	SVGA16	GF	256	32	A	4000	162.2
May17	W98	M1	SVGA16	GF	256	32	R	4000	58.2
Curved earth rendering.
May16	W98	M1	SVGA16	GF	256	32	R	4000	48.2
May16	W98	M1	SVGA32	GF	256	32	R	4000	48.5
FOV = 120.
May16	W98	M1	SVGA16	GF	256	32	R	4000	48.6
Cached world min/max/height.
May11	W98	M1	SVGA16	GF	256	32	A	3000	162.6
Cached priority factor + SOA for main data + 1024 entry clipped cache.
May11	W98	M1	SVGA16	GF	256	32	A	4000	141.5
May11	W98	M1	SVGA16	GF	256	32	R	4000	57.9
May11	W98	M1	SVGA16	GF	256	32	A	3000	166.5
May11	W98	M1	SVGA16	GF	256	32	R	3000	73.2
Back to converted shorts but lost some perf due to better cache spread.
May10	W98	M1	SVGA16	GF	256	32	A	3000	163.4
May10	W98	M1	SVGA16	GF	256	32	A	4000	137.8
May10	W98	M1	SVGA16	GF	256	32	R	3000	70.3
Use floats instead of converted shorts for height coord.
May10	W98	M1	SVGA16	GF	256	32	A	3000	153.2
May10	W98	M1	SVGA16	GF	256	32	A	4000	128.8
May10	W98	M1	SVGA16	GF	256	32	R	3000	69.3
Optimize visibility function.
May10	W98	M1	SVGA16	GF	256	32	A	3000	175.8
May10	W98	M1	SVGA16	GF	256	32	A	4000	146.8
May10	W98	M1	SVGA16	GF	256	32	R	3000	72.6
New architecture with correct detail, previous versions had a bug.
May 7	W98	M1	SVGA16	GF	256	32	A	3000	120.8
May 7	W98	M1	SVGA16	GF	256	32	R	3000	55.7
Without fastcache and texture synthesis.
Feb23	W98	M1	SVGA16	GF	256	64	R	3000	75.9
Feb23	W98	M1	SVGA16	GF	256	64	A	3000	124.1
Merge visibility and reset pass + fast triangle cache.
Feb23	W98	M1	SVGA16	GF	256	64	R	3000	73.1
Feb23	W98	M1	SVGA16	GF	256	64	A	3000	124.7
Incremental calculation of total number of vis triangles.
Feb19	W98	M1	SVGA16	GF	256	64	R	3000	72.7
Feb19	W98	M1	SVGA16	GF	256	64	A	3000	120.6
Avoid insertion of leaf nodes into split queue.
Feb19	W98	M1	SVGA16	GF	256	64	R	3000	65.5
Feb19	W98	M1	SVGA16	GF	256	64	A	3000	107.0
Some visibility optimizations before switch to world view culling.
Feb10	W98	M1	SVGA16	GF	256	64	R	3000	59.4
Feb10	W98	M1	VGA16	GF	256	64	R	3000	56.1
Feb10	W98	M1	SVGA16	GF	256	64	R	1000	118.4
Feb10	W98	M1	SVGA16	GF	256	64	A	3000	97.0
Feb10	W98	M1	SVGA16	GF	256	64	A	1000	179.8
Using no recursion.
Feb6	W98	M1	SVGA16	GF	256	64	R	3000	60.5
Feb15	W98	M1	SVGA16	GF	256	64	A	3000	94.1
Using RGBA buffer, rendering nothing but white triangles, standing still
Feb6	W98	M1	SVGA16	GF	256	64	R	3000	230.7
Using RGBA buffer, rendering nothing standing still 3.68 drivers
Feb6	W98	M1	SVGA16	GF	256	64	R	3000	340.7
Perf1 (Far clip 500, Progdist -10)
Jan20	W98	M1	SVGA32	GF	256	64	R	3000	61.7
Perf1 (Far clip 150, Progdist -10)
Jan20	W98	M1	SVGA32	GF	256	64	R	3000	61.0
Working Merge queue and progressive calculation.
Jan5	W98	M1	SVGA32	GF	256	64	R	3000	58.0
Jan5	W98	M1	SVGA32	GF	256	64	R	4000	49.6
Rendering nothing, only blanking screen standing still.
Jan1	W98	M1	SVGA32	GF	256	64	R	3000	220
Mesh only, No texture/light standing still.
Jan1	W98	M1	SVGA32	GF	256	64	R	3000	150
Mesh only in wire frame standing still.
Jan1	W98	M1	SVGA32	GF	256	64	R	3000	8
Full detail standing still.
Dec31	W98	M1	VGA32	GF	256	64	R	3000	101
Dec31	W98	M1	SVGA32	GF	256	64	R	3000	85
Dec31	W98	M1	XVGA32	GF	256	64	R	3000	45
Split Queue only.
Dec14	W98	M1	VGA16	P2	256	64	R	3000	17.3
Dec14	W98	M1	VGA16	P2	256	64	R	1500	29.3
Optimized memory usage.
Dec4	W98	M1	VGA16	P2	256	64	R	3000	27.3
Dec2	W98	M1	VGA16	P2	256	64	R	2000	32.3
Dec2	W98	M1	VGA16	P2	256	64	R	1500	37.3
Dec2	W98	M1	VGA16	P2	256	64	R	1000	43.3
Dec2	W98	M1	VGA16	P2	256	64	R	500	52.3
Dec2	W98	M1	VGA16	P2	256	64	A	1000	151.3
Optimized priority function/removed ddgQueue.
Nov24	W98	M1	VGA32	P2	256	64	R	1500	29.7
Nov24	W98	M1	VGA16	P2	256	64	R	1500	43.4
Jan4	W98	M1	VGA16	GF	256	64	R	1500	82.3
Jan4	W98	M1	VGA16	GF	256	64	R	2500	57.3
Jan4	W98	M1	VGA16	GF	256	64	R	3000	51.6
NVIDIA GeForce P550 XEON.
Nov3	W98	M4	VGA32	GeForce	256	64	R	3000	31.4
Nov3	W98	M4	VGA16	GeForce	256	64	A	3000	44.5
NVIDIA TNT2 P550 XEON.
Nov3	NT4	M3	VGA?	TNT2	256	64	R	1000	91.9
Nov3	NT4	M3	VGA?	TNT2	256	64	A	1000	154.4
Improved vertex arrays (really sharing vertices).
Sep12	W98	M1	VGA32	P2	256	64	R	1000	25.4
Sep12	W98	M1	VGA16	P2	256	64	R	1000	35.4
Progressive rendering.
Sep12	W98	M1	VGA16	P2	256	64	A	1000	113.8
Sep12	W98	M1	VGA16	P2	256	64	R	1000	40.8
No Progressive rendering.
Sep12	W98	M1	VGA16	P2	256	64	A	1000	105.0
Sep12	W98	M1	VGA16	P2	256	64	R	1000	38.9
Optimized SplayTree with simplified SplayKey.
Sep08	W98	M1	VGA32	P2	256	64	A	1000	104.0
Sep08	W98	M1	VGA32	P2	256	64	R	1000	22.0
Sep08	W98	M1	VGA16	P2	256	64	R	1000	35.3
Sep08	W98	M1	VGA16	P2	256	64	R	3000	22.6
Sep08	W98	M1	VGA32	P2	256	64	R	3000	17.9
Sep08	W98	M1	VGA32		256	64	R	1000	1
Sep08	W98	M1	VGA16		256	64	R	1000	5
Integrated SplayNode caches and Vertex Arrays
Sep05	W98	M1	VGA32	P2	256	64	R	1000	25.7
Render Near to Far
Sep05	W98	M1	VGA32	P2	256	64	R	1000	23.4
New clock
Sep02	W98	M1	VGA32	P2	256	64	R	1000	24.7
New clock
Sep02	W98	M1	VGA16	P2	256	64	A	1000	104.0
Aug31	NT4	M2	VGA16	P2	256	64	R	1000	29.6
Aug31	NT4	M2	VGA16	P2	256	64	A	1000	97.3
Aug31	NT4	M2	VGA16	P2	256	64	R	1000	31.3
Aug15	W98	M1	VGA32	P2	1024	64	A	3000	18.8
Aug15	W98	M1	VGA32	P2	1024	64	R	3000	12.8
Aug15	W98	M1	VGA32	P2	1024	64	A	1000	17.8
Aug04	W98	M1	VGA32	P2	256	64	R	1000	18.2
Jul26	NT5	M1	VGA32		256	64	R	1000	4.2
Jul26	NT5	M1	VGA32		256	64	A	1000	63.1
Jul26	NT4	M2	VGA	P2	256	256	R	1000	31.4
Jul26	NT4	M2	VGA	P2	256	256	A	1000	62.8
Jul26	NT4	M2	VGA	P2	256	128	R	1000	31.4
Jul26	NT4	M2	VGA	P2	256	128	A	1000	62.8
Jul26	NT4	M2	VGA	P2	256	64	R	1000	31.4
Jul26	NT4	M2	VGA	P2	256	64	A	1000	62.8
Jul24	NT5	M1	VGA		256	64	R	1000	4.2
Jul24	NT5	M1	VGA		256	64	A	1000	58.0
Jul14	NT5	M1	VGA		256	64	R	1000	2.8
Jul14	NT5	M1	VGA		256	64	A	1000	10.6
Jul14	NT4	M2	VGA	P2	256	64	R	1000	6.7
Jul14	NT4	M2	VGA	P2	256	64	A	1000	??.?
Jul14	NT5	M1	VGA		256	64	R	500	3.5
Jul14	NT5	M1	VGA		256	64	A	500	23.0
Jul07	NT5	M1	VGA		256	128	R	1000	2.8
Jul07	NT5	M1	VGA		256	128	A	1000	9.7
Jun13	NT5	M1	VGA	P2	256	128	R	?500	17.0
Jun13	NT5	M1	VGA	P2	256	128	A	?500	18.0

Memory Usage

Besides the fixed overhead for rendering buffers, the following memory requirements scale with terrain size:

Height samples: 2 bytes * rows * cols.
Min/Max hulls: 2 bytes * rows * cols * 1/8.
Total memory usage per data point is 2.25 bytes (2 + 2/8).
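
The formula above can be evaluated for the usual terrain sizes with a small sketch like the one below. The function name is hypothetical, and the sketch reproduces only the two scaling terms as stated; the recorded totals in the table that follows are somewhat higher (for example 168 544 bytes for 256x256).

```python
def terrain_bytes(rows: int, cols: int) -> int:
    """Estimated terrain data size in bytes from the two scaling terms.

    Hypothetical helper, not a DDG function.
    """
    heights = 2 * rows * cols        # 2 bytes per height sample
    hulls = 2 * rows * cols // 8     # min/max hulls at 1/8 of the samples
    return heights + hulls


for size in (256, 512, 1024, 2048, 4096):
    total = terrain_bytes(size, size)
    per_point = total / (size * size)
    print(f"{size}x{size}: {total} bytes ({per_point} bytes per point)")
```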

Record

Total memory used without texture.

Mode	256x256	512x512	1024x1024	2048x2048	4096x4096	Date
Orig	2 349 764	9 042 804	37 391 060			Nov 99
New MTri	2 026 964	6 752 004	26 441 060			Dec 99
New Normals	2 505 236	4 916 996	16 741 732			Dec 99
Reuse bufidx	2 084 236	4 515 196	14 896 732			Dec 99
DelayBit	2 010 436	4 113 396	13 051 732			Dec 99
V4 with hulls	2 359 296					May 00
V5 with error	1 772 612	3 817 120	12 612 856	47 795 872	188 527 864	Aug 00
V5 no redundancy	1 293 108	2 649 360	6 708 072	22 942 992	87 882 600	Aug 00
Shared caches	1 207 184	1 953 512	4 938 896	16 880 360	64 646 288	Sep 25
V6 real blocks	259 200	1 036 800	4 147 200	16 588 800	66 355 200	Jan 17
smaller hulls	168 544	674 176	2 696 704	10 786 816	43 147 264	Jan 28