Octree Terrain Engine
As it’s yet again been some time since I’ve last posted anything, I’ve decided to post some info on my Octree terrain engine that I’m submitting sometime later this week as an assessment piece. Below I’ve copied and pasted directly from a document that I’m submitting as part of a detailed analysis of my development cycle.
I’ll see if I can upload a demonstration executable along with its source code later when I revise the code a bit more, as right now it’s a bit of a mess with comment blocks everywhere.
GAM204 Octree Technical Analysis
By Johan Rensenbrink
As I’ve gone a far bit over the word count I’ve split the major topics into sub-sections, so feel free to only read whatever seems interesting, or whatever you need to. As I’ve revised my octree project after creating this document I may have repeated myself a little bit, but I’ve tried to make sure that if this happens it’s no more than ~50 words of repetition.
Planning Stage – What to make?
Although performing a physical simulation was something that seemed quite interesting to me, I had decided to do some form of optimization technique as I personally thought that it will most likely be more useful and relevant for further programming in QANTM, since there aren’t frameworks or packages that do these things for us, unlike physics.
Narrowing that down, that brought me to a range of different choices. Quadtrees, octrees, portals and so forth. I was mainly interested in the area of quadtrees and octrees, as I had predicted that portals would be somewhat overwhelming with having the addition of occlusion culling. I decided to start simple, and have a go at quadtrees instead, with the idea that once I figured out the basics of that to move on with an octree.
Creating the Octree I have now – In big steps
I started the project where I left off with the previous assessment. I did not have any form of optimization, and in this case a quadtree wouldn’t be very useful without some kind of frustum culling. Fortunately, I discovered it was surprisingly easy to implement such a thing with DirectX helper functions such as D3DXPlaneDotCoord and had no issue implementing a few different forms of collision functions involving points, spheres and boxes. For the octree itself I ended up using a sphere and the eight corners of each leaf node to minimize false-positives.
My decision to upgrade to an octree actually came when I started to draw the quadtree outlines using with debug lines. I set two fixed Y values to make it easier to interpret it as a box instead of infinite lines which seems confusing in a way. Here I realised that an octree was literally just adding one dimension, and increasing the children from four to eight (quad -> oct, whoah!). Within minutes I had a something working already.
Debug features everywhere!
For debugging, I have a range of options, such as ‘halving’ the view frustum that makes it possible to see terrain patches around the edges of the screen dissapear as well as walking on the terrain to simulate a practical scenario. The most useful debug feature I had was using a console window and directly feeding that with std::cout from the good old days of ‘Hello World’.
Terrain Dependency and Issues with the Terrain
My octree ended up being very dependent on the terrain class. Unfortunately that isn’t particularly flexible but it was perfectly sufficient for the purpose of this demonstration. The terrain object itself would instantiate and manage its own octree. The reason for this was to easily import settings from the terrain without having to change this as I was trying a lot of small changes to fix things and improve things.
One issue that turned up, and wasn’t really suprising to find was the matter of the terrain causing splits between nodes. Rather than implementing a system that would have to connect each patch and probably leading to somewhat of a headache-inducing night of programming, I decided to give each patch an overlap of one terrain unit size. This way, everything connects wonderfully and even with ~800 nodes, aka 256 triangles max of a 30,000 triangle terrain I only gain around 1,000 triangles which seems very nice. Gaining a 30th in mesh complexity does not seem like it would ever create a performance issue even in complex situations (20fps – 0.67fps = who cares?).
Terrain splitting, argh!
As an additional feature I had a look at a simple LOD interface. Over distance from the camera, the terrain would become ‘simpler’. This was working great within minutes but brought me to a new issue. Terrain patch borders would have unequal heights. One solution I read about was adding ‘skirts’, but as my quadtree was using index buffers instead of vertex buffers for each patch this was not possible. After an hour or so I had my terrain grid working with borders, only to realise the same problem would occur with the borders and the inner sections so I decided to skip the idea of LODs for the time being. If I was to continue trying to solve the issue I would possibly end up switching from index buffers per tree patch to vertex buffers, either to create cheap ‘skirts’ or otherwise somehow better synchronize the height of the patch borders to account for the lack of vertices in neighbouring patches, which seems like the best solution (flattening two vertices for each one vertex of the ‘lower’ quality neighbour patch)
Index Buffers and Vertex Buffers
One more thing I wanted to look into was optimising my octree as one of the last tasks. I had a look at making both a large index buffer and vertex buffer for the entire terrain, with each patch simply adding indices to the index buffer during the render function. The idea was to reduce the draw calls from potentially hundreds down to a single call, without sacrificing the whole point of the quadtree culling out leaf nodes that can’t be seen. A major downside was that the index buffer would have to be locked/unlocked each render call rather than only once, therefore I was somewhat unsure if this would still be an improvement over multiple draw calls. In the end I ended up doing this which improved performance and incredible amount (4FPS -> 200FPS rendering ~800 nodes)
During the initial stages of creating a quadtree (before the octree itself) was deciding how to split the terrain up. The initial idea was to use a large index buffer for the terrain and a vertex buffer for each patch after researching this on the internet. I ended up inverting this and having a large index buffer with a vertex buffer for each patch as it made it easier to control small changes I made to the terrain and it ended up working fine. However, using index buffers for each patch did prevent the idea of a ‘skirt’ for the LOD I was talking about earlier. Although I haven’t done this, I think the index buffer does potentially solve connecting ‘lower’ terrain leaf nodes to ‘higher’ ones if I did not make the terrain height-independent.
Issues During the Final Stages
There were two issues that remained within the final version of my octree terrain engine. Essentially this was due to an issue with generating each terrain node but despite several attempts of resolving this problem I just couldn’t find the source that caused it. The triangle count of the rendered terrain increases depending on how many grids are visible at once. For example, with only one node, there are 30,258 triangles being rendered. With 729 nodes (from splitting above 256 triangles per node) there are 72,826 triangles, more than double than the original terrain.
The second issue, which is most likely linked but not entirely caused by the first one, is performance. When there is one node for the entire terrain, I have 180FPS displaying the entire terrain. With 729 nodes, I have only 4FPS. It’s extremely unlikely that rendering 60,000 triangles could cause this big of a performance hit, seeing as most modern games easily render over a million per frame.
Fortunately, I was able to resolve both of these issues. The first issue was caused by incorrect calculation of node boundary boxes, which was extremely odd as they rendered correctly despite some very wrong values. Now with 729 nodes I have roughly 31,000 triangles. The extra triangles are create by overlapping the borders of nodes to compensate for connection issues (gaps otherwise form).
The second issue was resolved by removing the index buffer for each patch and having the quadtree manage one vertex buffer and one index buffer. Octree nodes would now only contain index data without their buffers. This has led to a huge performance increase, with 280FPS (100FPS more than displaying nothing with individual index buffers despite having nothing to draw!) and around 200FPS with the entire terrain visible.