After a first run of adaptation for Tengoku, the thing's performance was dismal; it couldn't do 5000x5000 at all, on a 3.3 ghz computer. (The old version also couldn't do objects bigger than 256x256, but the recent fixing of that is totally not the reason for the slowdown.) So, I felt forced to start optimizing to get it usable ... and here's what I did:
- Sped up LIST:ADDTO; all it really needed to do was store the given thing into the last cell and increment the length field.
- Sped up LIST:EACH ; for some reason using an older, longer version of the routine sped things up just itterating objects by around 40%.
- Extended ENTITY (which affects memory usage globally ... so I guess that now I'm specializing for 2D...)
- Cached absolute rectangles for each object.
- Added BOX@ and used that instead of @BOX. (a dumbly named word) which leads to an assumption that the HITRECT attributes point to BOXes (which kind of defeats the purpose of IRECTANGLE ...)
- Sped up WORDMAP:LOOKUP by turning it into a CODE word; 2 hours. :(
All that work and eventually, performance was significantly improved. 1500x1500 objects in 7ms, when at first it couldn't do that many at all (it led to a spiral of death). It's still not good enough, in my opinion. I keep thinking something is wrong. I might do an more intense investigation into exactly what is happening; how many objects each object actually checks, how long each check takes...
But anyway, for the last thing I wanted to write about, I learned something amazing, and somewhat troubling. That the following piece of code turns a loop that took 1.5 ms into 6.5 ms!!!!
: ASDF DROP ;
: FDSA ['] ASDF MYLIST EACH ;
: FDSA ['] ASDF MYLIST EACH ;
Why? The only conclusion I can take from this is that modern, prefetching, superscalar processors hate, detest, and abhore very short subroutines. Granted most of my functions aren't that short but some get pretty close! It worries me because sometimes it has a good reason, such as the virtualizing mechanism that lets any component be treated as it's owner, where you have routines like this:
: POS EDATA >_POS ;
Which comes out to a CALL and an ADD instruction. Seems to me that this MUST MUST MUST be inlined. I'll try that out when I get home.
No comments:
Post a Comment