Tengoku Engine: Optimizing: Global Collision System

Over the weekend, after working on the Studio application a bit, I got a little obsessed over the global collision system. Just a little bit of background; this is the system for partitioning the game's space into a grid of sectors and so culling down the number of objects it needs to test for overlap, using simple hitboxes. Consider it the broad collision phase. Also, this code was written for an engine from a few years ago, when I didn't use formal object orientation, or polymorphism, or components. As a result it was very fast - on a 1.7ghz machine, it could do 5000x5000 object tests at 60fps (and drawing ALL 5000 of them using Allegro, with CPU to spare).

After a first run of adaptation for Tengoku, the thing's performance was dismal; it couldn't do 5000x5000 at all, on a 3.3 ghz computer. (The old version also couldn't do objects bigger than 256x256, but the recent fixing of that is totally not the reason for the slowdown.) So, I felt forced to start optimizing to get it usable ... and here's what I did:

Sped up LIST:ADDTO; all it really needed to do was store the given thing into the last cell and increment the length field.
Sped up LIST:EACH ; for some reason using an older, longer version of the routine sped things up just itterating objects by around 40%.
Extended ENTITY (which affects memory usage globally ... so I guess that now I'm specializing for 2D...)
Cached absolute rectangles for each object.
Added BOX@ and used that instead of @BOX. (a dumbly named word) which leads to an assumption that the HITRECT attributes point to BOXes (which kind of defeats the purpose of IRECTANGLE ...)
Sped up WORDMAP:LOOKUP by turning it into a CODE word; 2 hours. :(

All that work and eventually, performance was significantly improved. 1500x1500 objects in 7ms, when at first it couldn't do that many at all (it led to a spiral of death). It's still not good enough, in my opinion. I keep thinking something is wrong. I might do an more intense investigation into exactly what is happening; how many objects each object actually checks, how long each check takes...

But anyway, for the last thing I wanted to write about, I learned something amazing, and somewhat troubling. That the following piece of code turns a loop that took 1.5 ms into 6.5 ms!!!!

: ASDF DROP ;
: FDSA ['] ASDF MYLIST EACH ;

Why? The only conclusion I can take from this is that modern, prefetching, superscalar processors hate, detest, and abhore very short subroutines. Granted most of my functions aren't that short but some get pretty close! It worries me because sometimes it has a good reason, such as the virtualizing mechanism that lets any component be treated as it's owner, where you have routines like this:

: POS EDATA >_POS ;

Which comes out to a CALL and an ADD instruction. Seems to me that this MUST MUST MUST be inlined. I'll try that out when I get home.

Tengoku Engine

Project Updates

Monday, December 6, 2010

Optimizing: Global Collision System

No comments:

Post a Comment