ICEngine and Terrain Generation - June 10, 2012 by Copey


My new engine is finally in a decent enough position that I can show something. :D


I’ve got most of the core systems set up: texture loading and management, render-system-agnostic shader management and static model rendering. Now that that’s working I figured I’d deviate from the “plan” and make something fun, so I’ve added terrain generation to the engine. Terrain of any size can be generated using Simplex Noise, which produces some really nice looking results. I’m planning on adding some other effects in the next couple of weeks too. I’ve also added per-fragment Blinn-Phong lighting to make the terrain look a little more interesting, though I may have overdone it on the terrain’s shininess…
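For a rough idea of how noise-driven terrain can work, here is a minimal C++ sketch that fills a heightmap by summing several octaves of 2D simplex noise. It is not ICEngine's code. SimplexNoise2D and generateHeightmap are hypothetical names. The noise follows the structure of Stefan Gustavson's public-domain reference implementation, but the permutation table is shuffled from a seed with std::mt19937 rather than using Ken Perlin's fixed table.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// 2D simplex noise, structured after Gustavson's reference implementation.
// Assumption: the permutation table is a seeded shuffle of 0..255.
class SimplexNoise2D {
public:
    explicit SimplexNoise2D(std::uint32_t seed) {
        std::vector<int> p(256);
        std::iota(p.begin(), p.end(), 0);
        std::shuffle(p.begin(), p.end(), std::mt19937(seed));
        for (int i = 0; i < 512; ++i) perm_[i] = p[i & 255];  // doubled to avoid index wrapping
    }

    // Returns a value roughly in [-1, 1].
    double noise(double xin, double yin) const {
        const double F2 = 0.5 * (std::sqrt(3.0) - 1.0);
        const double G2 = (3.0 - std::sqrt(3.0)) / 6.0;
        // Skew the input to find which simplex cell we are in.
        const double s = (xin + yin) * F2;
        const int i = static_cast<int>(std::floor(xin + s));
        const int j = static_cast<int>(std::floor(yin + s));
        // Unskew to get the offset from the cell origin.
        const double t = (i + j) * G2;
        const double x0 = xin - (i - t), y0 = yin - (j - t);
        // Lower or upper triangle of the cell?
        const int i1 = x0 > y0 ? 1 : 0, j1 = x0 > y0 ? 0 : 1;
        const double x1 = x0 - i1 + G2, y1 = y0 - j1 + G2;
        const double x2 = x0 - 1.0 + 2.0 * G2, y2 = y0 - 1.0 + 2.0 * G2;
        const int ii = i & 255, jj = j & 255;
        const double n0 = corner(perm_[ii + perm_[jj]], x0, y0);
        const double n1 = corner(perm_[ii + i1 + perm_[jj + j1]], x1, y1);
        const double n2 = corner(perm_[ii + 1 + perm_[jj + 1]], x2, y2);
        return 70.0 * (n0 + n1 + n2);  // scale factor from the reference implementation
    }

private:
    // Contribution of one corner: radial falloff times a pseudo-random gradient.
    static double corner(int hash, double x, double y) {
        static const double grad[12][2] = {{1, 1}, {-1, 1}, {1, -1}, {-1, -1}, {1, 0}, {-1, 0},
                                           {1, 0}, {-1, 0}, {0, 1}, {0, -1}, {0, 1}, {0, -1}};
        double t = 0.5 - x * x - y * y;
        if (t < 0.0) return 0.0;
        t *= t;
        const double* g = grad[hash % 12];
        return t * t * (g[0] * x + g[1] * y);
    }

    int perm_[512];
};

// Fills a width x depth heightmap (row-major) with several octaves of noise,
// each at twice the frequency and half the amplitude of the previous one.
std::vector<float> generateHeightmap(int width, int depth, double frequency,
                                     int octaves, float maxHeight, std::uint32_t seed) {
    assert(width > 0 && depth > 0 && octaves > 0);
    const SimplexNoise2D noise(seed);
    std::vector<float> heights(static_cast<std::size_t>(width) * depth);
    for (int z = 0; z < depth; ++z) {
        for (int x = 0; x < width; ++x) {
            double sum = 0.0, amplitude = 1.0, freq = frequency, total = 0.0;
            for (int o = 0; o < octaves; ++o) {
                sum += amplitude * noise.noise(x * freq, z * freq);
                total += amplitude;
                amplitude *= 0.5;
                freq *= 2.0;
            }
            // Normalise by the summed amplitudes, then scale to world units.
            heights[static_cast<std::size_t>(z) * width + x] =
                static_cast<float>(sum / total) * maxHeight;
        }
    }
    return heights;
}
```

Each vertex of a width x depth terrain grid would then take its height from the matching entry. The exact output of std::shuffle is implementation-specific, so a given seed only reproduces the same terrain when built with the same standard library.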


Anyway, short post – I mostly just wanted to post a screenshot. Also you can check the engine itself out on BitBucket.

Mercurial and ICEngine source. - May 5, 2012 by Copey


For the past several years I’ve been using SVN without any complaints. I’d heard plenty of people going on about Git and Mercurial being superior, but I’d never got around to actually trying them out myself. Now that I’ve started working on ICEngine, however, I need a decent, fast repository host, and I can’t seem to find one for SVN. So now seemed like the perfect time to try out one of the others.


I settled on Mercurial since it sounded simpler. So far I’m liking it, though I’m not really seeing huge benefits over SVN. I guess I’d have to be working on a team project to truly appreciate it.


Anyways, I’ve got the ICEngine source publicly hosted so feel free to check out the source code :) The engine is included as a sub-repository in the skeleton project which can be found at https://bitbucket.org/AzCopey/icengineskeleton.

ICEngine - April 28, 2012 by Copey


I’m abandoning my Windows Phone 7 game!


I had been really enjoying building it. The Windows Phone development tools are fantastic and development in XNA is insanely easy, yet I’ve just not been able to pick the project back up because of a nagging feeling: I’m all too aware that the code base is not going to be reusable, since Windows Phone 8 will be going native. For ages I’ve wanted a nice code base that I could use across multiple projects, and one of the main goals of this project was to end up with a nice reusable engine. That is not going to happen with this project, so I think it’s time to shelve it.


Limiting myself to one platform and one technology is also just not as much fun. With a C++ engine I could easily incorporate new and interesting technologies; for example, I’ve really been wanting to do something with Google’s Native Client. And so ICEngine was born.


ICEngine is a small C++ engine. It is being designed from the outset to handle multiple platforms, though only Windows will be supported initially. Once I’ve got the engine up and running I’m planning on adding Native Client support so that anything I build can easily be demonstrated on the website. Although I’ve only been working on it for a week or so, progress is coming along nicely! As soon as I have anything shiny to show I’ll make sure to post about it.

Absences and Updates - August 24, 2011 by Copey


Due to some long hours at Tag and some more important things going on at home, I’ve not updated the site or really worked on my Windows Phone 7 game for a couple of months. The project has not been shelved, however! I’m hoping to get back to it in full swing soon, and quite a lot has changed since the last post. This includes enemy AI/pathfinding, the start of a frontend, and the realisation that my GUI system is a bit crap and needs to be fixed up before any more frontend goes in :-P.


For the time being I’ve updated the portfolio section of my website quite a bit to reflect everything that’s happened in the last year. Unfortunately there are several things I can’t really talk about yet, but these will be added in the near future! Anyway, that’s all for the moment. Hopefully I’ll manage to get back to my game soon and get the posts rolling again! :)

Particle Effects - May 8, 2011 by Copey


This week I’ve been working on a particle system for my engine. As my artistic skills are somewhat lacking, I’ve been planning on using a lot of particle effects to try and make the game look good. I’m hoping that good looking particle effects for everything from explosions to shooting stars in the background will stop the game from looking bland. Because of this, a versatile and efficient particle system is fairly important to the project.


The first thing I considered was that, to support such a wide variety of effects, the system needed to be as customisable as possible. I decided upon a system where each particle effect has multiple stages. The first stage is always the “emission” stage, where a particle is created. There can be several types of emitter, for example a Box Emitter, in which a particle is created at a random position within a specified box in 3D space, and custom emitters can easily be added. Each stage after this is referred to as a “Phase”. A phase is built out of multiple “affectors”, each of which acts upon a different property of the particle. For example, Velocity Affectors change the position of the particle by its velocity, Scale Affectors change the scale of the particle, and of course there are many others. In addition, custom affectors can easily be defined, allowing for very specific particle behaviours.
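The emitter/phase/affector split might look something like the following minimal C++ sketch. It is an illustration under assumptions rather than the engine's code: Particle, BoxEmitter, VelocityAffector, Phase and ParticleEffect are hypothetical names, each phase simply lasts a fixed duration, and the velocity affector uses one fixed velocity instead of a per-particle random range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <random>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

struct Particle {
    Vec3 position{0.0f, 0.0f, 0.0f};
    float scale = 1.0f;
    float phaseAge = 0.0f;   // time spent in the current phase
    std::size_t phase = 0;   // index of the phase the particle is in
    bool alive = true;
};

// Emission stage: decides where a new particle starts.
class Emitter {
public:
    virtual ~Emitter() = default;
    virtual void emit(Particle& p, std::mt19937& rng) = 0;
};

// Spawns particles at a uniformly random point inside an axis-aligned box.
class BoxEmitter : public Emitter {
public:
    BoxEmitter(Vec3 min, Vec3 max) : min_(min), max_(max) {}
    void emit(Particle& p, std::mt19937& rng) override {
        auto pick = [&rng](float lo, float hi) {
            assert(lo <= hi);
            return lo == hi ? lo : std::uniform_real_distribution<float>(lo, hi)(rng);
        };
        p.position = {pick(min_.x, max_.x), pick(min_.y, max_.y), pick(min_.z, max_.z)};
    }
private:
    Vec3 min_, max_;
};

// An affector changes one property of a particle every update.
class Affector {
public:
    virtual ~Affector() = default;
    virtual void apply(Particle& p, float dt) = 0;
};

// Moves the particle at a fixed velocity (simplified: no per-particle random range).
class VelocityAffector : public Affector {
public:
    explicit VelocityAffector(Vec3 velocity) : v_(velocity) {}
    void apply(Particle& p, float dt) override {
        p.position.x += v_.x * dt;
        p.position.y += v_.y * dt;
        p.position.z += v_.z * dt;
    }
private:
    Vec3 v_;
};

struct Phase {
    float duration;                                    // seconds spent in this phase
    std::vector<std::unique_ptr<Affector>> affectors;
};

class ParticleEffect {
public:
    ParticleEffect(std::unique_ptr<Emitter> emitter, std::size_t maxParticles, std::uint32_t seed)
        : emitter_(std::move(emitter)), maxParticles_(maxParticles), rng_(seed) {
        particles_.reserve(maxParticles);  // allocate once, up front
    }
    void addPhase(Phase phase) { phases_.push_back(std::move(phase)); }
    void spawn() {
        if (particles_.size() >= maxParticles_) return;
        Particle p;
        emitter_->emit(p, rng_);
        particles_.push_back(p);
    }
    void update(float dt) {
        for (Particle& p : particles_) {
            if (!p.alive || p.phase >= phases_.size()) { p.alive = false; continue; }
            Phase& phase = phases_[p.phase];
            for (auto& affector : phase.affectors) affector->apply(p, dt);
            p.phaseAge += dt;
            if (p.phaseAge >= phase.duration) {  // move to the next phase, or die after the last
                p.phaseAge = 0.0f;
                p.alive = ++p.phase < phases_.size();
            }
        }
    }
    const std::vector<Particle>& particles() const { return particles_; }
private:
    std::unique_ptr<Emitter> emitter_;
    std::size_t maxParticles_;
    std::mt19937 rng_;
    std::vector<Phase> phases_;
    std::vector<Particle> particles_;
};
```

Scale affectors, custom emitters and custom affectors slot in the same way, by deriving from Emitter or Affector. Dead particles are never recycled in this sketch; a real pool would reuse their slots.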


Of course a versatile particle system is only good if it’s efficient, so I implemented a batch renderer for the particles. This uses the cheap billboarding method of building each quad’s vertices from the “up” and “right” vectors pulled from the view matrix. When I ran it, however, I was only managing to get about 50 particles on screen before heavy slowdown set in. This took a good amount of hair pulling to solve. In the end it turns out that DynamicVertexBuffers on anything other than PC (so this applies to Xbox as well as WP7) are very inefficient. I had been using DynamicVertexBuffers for any data that changed frame by frame, thinking that this was logically the most efficient way to do it. Instead, SetData() was causing a large bottleneck. It seems that sending the data each frame using DrawUserIndexedPrimitives is significantly more efficient. Where I had been getting 50-100 particles at 13-14 frames per second (while rendering the rest of my scene, of course), I am now getting 1500-2000 at 23-25 FPS. A significant improvement :D.
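The billboarding and batching described above can be sketched in C++ as follows. The sketch assumes a row-vector view matrix laid out like XNA's, where the camera's right and up axes sit in the first and second columns of the upper 3x3. appendBillboard, ParticleVertex and Mat4 are hypothetical names. Every particle appends one quad to shared vertex and index arrays, so the whole batch can be submitted once per frame, for example through DrawUserIndexedPrimitives as in the post.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };  // m[row][col], row-vector convention (assumed, as in XNA)

struct ParticleVertex { Vec3 position; float u, v; };

// Append one camera-facing quad to the batch. Right/up are read from the
// first two columns of the view matrix's rotation part.
void appendBillboard(const Mat4& view, const Vec3& centre, float halfSize,
                     std::vector<ParticleVertex>& vertices, std::vector<std::uint16_t>& indices) {
    assert(vertices.size() + 4 <= 65536 && "16-bit index range exceeded");
    const Vec3 right{view.m[0][0], view.m[1][0], view.m[2][0]};
    const Vec3 up{view.m[0][1], view.m[1][1], view.m[2][1]};
    // sx/sy pick the corner: -1 or +1 along right and up respectively.
    auto corner = [&](float sx, float sy, float u, float v) {
        return ParticleVertex{{centre.x + (sx * right.x + sy * up.x) * halfSize,
                               centre.y + (sx * right.y + sy * up.y) * halfSize,
                               centre.z + (sx * right.z + sy * up.z) * halfSize}, u, v};
    };
    const auto base = static_cast<std::uint16_t>(vertices.size());
    vertices.push_back(corner(-1, -1, 0, 1));  // bottom-left
    vertices.push_back(corner(-1, 1, 0, 0));   // top-left
    vertices.push_back(corner(1, 1, 1, 0));    // top-right
    vertices.push_back(corner(1, -1, 1, 1));   // bottom-right
    const std::uint16_t quad[6] = {0, 1, 2, 0, 2, 3};  // two triangles per quad
    for (std::uint16_t i : quad) indices.push_back(static_cast<std::uint16_t>(base + i));
}
```

With 16-bit indices one batch tops out at 16,384 quads, and the triangle winding may need flipping to suit the renderer's cull mode.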


The last thing I had to do here was to make particle effect development as easy as possible. I have plans to make a particle effect tool in the future, but for the time being making the system data driven will have to do. After realising that simple config files would not really be a suitable way of representing this system, I took the advice of a colleague at Tag and used an XML/config hybrid. XML tags are used to break up each section and retain customisability, while key-value lists are used to represent the data for readability. It looks something like this:

 
<ParticleEffect>
    <Emitter Type="Sphere" MaxParticles="10">
        Radius = 3
    </Emitter>
    <Phase MinLifetime="1.0" MaxLifetime="2.0">
        <Images>
            CircleImage
        </Images>
        <Affector Type="Velocity">
            MinVelocityX = 1.0
            MinVelocityY = 1.0
            MinVelocityZ = 1.0
            MaxVelocityX = 2.0
            MaxVelocityY = 2.0
            MaxVelocityZ = 2.0
        </Affector>
    </Phase>
</ParticleEffect>
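The tags and attributes of the XML/config hybrid above can be handled by any XML parser; what remains is the key-value text inside each tag. A minimal C++ sketch of that half, assuming the XML parser hands over each tag's text body as a string (parseKeyValues, getFloat and trim are hypothetical helpers):

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Strip leading/trailing spaces, tabs and carriage returns.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Parse the "Key = Value" lines found inside one tag's text body.
// Lines without '=' (such as the image names in <Images>) are skipped here.
std::map<std::string, std::string> parseKeyValues(const std::string& body) {
    std::map<std::string, std::string> values;
    std::istringstream in(body);
    std::string line;
    while (std::getline(in, line)) {
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;
        const std::string key = trim(line.substr(0, eq));
        if (!key.empty()) values[key] = trim(line.substr(eq + 1));
    }
    return values;
}

// Look up a numeric value, falling back to a default when the key is absent.
// std::stof throws std::invalid_argument if the value is not a number.
float getFloat(const std::map<std::string, std::string>& values, const std::string& key,
               float fallback) {
    const auto it = values.find(key);
    return it == values.end() ? fallback : std::stof(it->second);
}
```

List-style bodies such as <Images> would need a separate "one name per line" reader.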


Now that I’ve got all the basic functionality for the engine working, my next task is tidying up the game code and starting to make the actual game. I’m hoping to have a basic example level created by next week. Finally, here’s a screenshot of some random test particles :)

 

Physics and many other things… - April 17, 2011 by Copey


While I’ve been lazy and not bothered to blog about progress in the last month or so, that doesn’t mean I haven’t made a huge amount of progress since the last post. In fact, I’ve got way too much to write about, so I’ll just give a brief rundown and then go into detail on one topic.


The last time I posted I had finished up the rendering side of my Map Exporter and was able to output my map to a custom file format; however, I could not yet load the map into the game. Since then a great deal has changed. First of all I got map loading in place, which surprisingly worked first time. After that I started working on more basic functionality for the engine, such as input and GUI rendering. With the help of a rather neat little tool called TexturePacker, it was incredibly easy to get a quick, easy to use GUI renderer with texture atlas functionality.
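A texture atlas lookup boils down to turning a sprite's pixel rectangle into normalised texture coordinates. A minimal C++ sketch, assuming TexturePacker's data file has already been parsed into AtlasFrame records. TextureAtlas, AtlasFrame and UVRect are hypothetical names, and rotated or trimmed sprites (which TexturePacker can produce) are ignored.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Pixel rectangle of one sprite inside the atlas texture.
struct AtlasFrame { int x, y, width, height; };

// Normalised texture coordinates: (u0, v0) top-left, (u1, v1) bottom-right.
struct UVRect { float u0, v0, u1, v1; };

class TextureAtlas {
public:
    TextureAtlas(int atlasWidth, int atlasHeight) : width_(atlasWidth), height_(atlasHeight) {
        assert(atlasWidth > 0 && atlasHeight > 0);
    }

    void addFrame(const std::string& name, AtlasFrame frame) { frames_[name] = frame; }

    // Convert a named frame's pixel rect into UVs; returns false if the name is unknown.
    bool uv(const std::string& name, UVRect& out) const {
        const auto it = frames_.find(name);
        if (it == frames_.end()) return false;
        const AtlasFrame& f = it->second;
        out = {static_cast<float>(f.x) / width_, static_cast<float>(f.y) / height_,
               static_cast<float>(f.x + f.width) / width_,
               static_cast<float>(f.y + f.height) / height_};
        return true;
    }

private:
    int width_, height_;
    std::unordered_map<std::string, AtlasFrame> frames_;
};
```

Because every GUI element samples the same texture, a whole screen of widgets can share one texture binding and be drawn in a single batch.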


From here it made sense to start working on entity management. I’ve chosen to go for a “component” based system, whereby an entity consists of multiple components. Each component describes a single piece of functionality (Rendering, Movement, Health Management, etc.), which keeps duplication of functionality to a minimum. For example, all entities which can die can share the same Health Management component rather than each entity handling its own health.
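A minimal C++ sketch of this kind of component system, under the assumption that an entity holds at most one component of each type. Entity, Component and HealthComponent are hypothetical names rather than the engine's classes.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

// Base class for a single piece of entity functionality.
class Component {
public:
    virtual ~Component() = default;
    virtual void update(float /*dt*/) {}
};

// An entity is just a collection of components, at most one of each type.
class Entity {
public:
    template <typename T, typename... Args>
    T& add(Args&&... args) {
        auto component = std::make_unique<T>(std::forward<Args>(args)...);
        T& ref = *component;
        components_[std::type_index(typeid(T))] = std::move(component);
        return ref;
    }

    // Returns nullptr if the entity has no component of type T.
    template <typename T>
    T* get() const {
        const auto it = components_.find(std::type_index(typeid(T)));
        return it == components_.end() ? nullptr : static_cast<T*>(it->second.get());
    }

    void update(float dt) {
        for (auto& entry : components_) entry.second->update(dt);
    }

private:
    std::unordered_map<std::type_index, std::unique_ptr<Component>> components_;
};

// Reusable health logic that any entity which can die can attach.
class HealthComponent : public Component {
public:
    explicit HealthComponent(int maxHealth) : health_(maxHealth) { assert(maxHealth > 0); }
    void damage(int amount) { health_ = std::max(0, health_ - amount); }
    bool isDead() const { return health_ == 0; }
    int health() const { return health_; }

private:
    int health_;
};
```

Because components are looked up by type, any system that cares about health simply asks an entity for its HealthComponent, and entities that cannot die just don't have one.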


In addition to this I’ve also added skybox functionality, lighting management and numerous performance improvements to the engine. After all this I moved on to the last task I’ve been working on, and the one I intend to talk about in more detail: Physics.


Physics engines are something I’m no stranger to, having based my honours project on the development and optimisation of one. Because of this I decided to build one from scratch rather than use an out-of-the-box solution. The main reason was that I don’t need a fully functional, realistic physics engine, nor do I want the performance overhead that comes with one. I do, however, want some basic 2D physics and an efficient system to handle it. This led to the following specs for the engine:

  • Accurate collision between circles and rectangles.
  • Ability for complex objects to be built out of combinations of these shapes.
  • Two different types of object: Static, which never moves or itself collides with other objects, but which others CAN collide with; and Dynamic, which can collide with anything and will produce a response. (Note: there is a potential third type, a Trigger. This would be similar to a static object, except that when a dynamic object collides with it, the dynamic object would not bounce back.)
  • Objects can be bundled into groups, and each object can choose which other groups it collides with. E.g. the player’s bullets can choose not to collide with the player, only with enemies.
  • A realistic collision response, but with no rotational component for the sake of performance.
  • Collision callbacks, so that objects can define their own response. E.g. a bullet killing an enemy.
  • System-wide point and directional forces for explosions, vortexes and “space winds” (for want of a better term :P).
  • No runtime allocation of memory.
  • And of course, as efficient as possible.
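Group filtering like the fourth spec above is commonly done with bitmasks. A small C++ sketch under that assumption. The group names and shouldCollide are hypothetical, and requiring both objects to accept each other is just one possible policy.

```cpp
#include <cassert>
#include <cstdint>

// Each object belongs to one or more groups (bits) and lists the groups it accepts.
enum CollisionGroup : std::uint32_t {
    kWorld = 1u << 0,
    kPlayer = 1u << 1,
    kEnemy = 1u << 2,
    kPlayerBullet = 1u << 3,
};

struct CollisionFilter {
    std::uint32_t groups;        // groups this object belongs to
    std::uint32_t collidesWith;  // groups this object is willing to collide with
};

// A pair is only tested if each object accepts the other's groups.
bool shouldCollide(const CollisionFilter& a, const CollisionFilter& b) {
    return (a.groups & b.collidesWith) != 0 && (b.groups & a.collidesWith) != 0;
}
```

The check is two AND operations per pair, so it is cheap enough to run before any shape test.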


All of this is now in place (with the exception of the Trigger object) and working well. Efficiency has been the focus, and as such some corners have been cut. For example, a collision between a circle and a rectangle only responds realistically if the center of the circle has not entered the rectangle; once it has, the response is only a very rough approximation. The system also doesn’t handle the situation where an object passes completely through another in the course of a frame, again because of performance… and also because I’m too lazy to implement that :P
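The circle/rectangle case and the rotation-free response can be sketched in C++17 as follows. This is an illustrative assumption, not the engine's code. circleVsRect uses the common closest-point test, and bounceOffStatic applies a linear impulse to a dynamic body hitting a static one. The fallback when the centre is inside the rectangle stands in for the rough approximation described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Circle { Vec2 centre; float radius; };
struct Rect { Vec2 min, max; };  // axis-aligned
struct Contact { bool hit; Vec2 normal; float penetration; };

// Closest-point test. The normal is only meaningful while the circle's centre
// is outside the rectangle; if it is inside, fall back to a crude "push up".
Contact circleVsRect(const Circle& c, const Rect& r) {
    const Vec2 closest{std::clamp(c.centre.x, r.min.x, r.max.x),
                       std::clamp(c.centre.y, r.min.y, r.max.y)};
    const float dx = c.centre.x - closest.x, dy = c.centre.y - closest.y;
    const float distSq = dx * dx + dy * dy;
    if (distSq > c.radius * c.radius) return {false, {0.0f, 0.0f}, 0.0f};
    const float dist = std::sqrt(distSq);
    if (dist == 0.0f) return {true, {0.0f, 1.0f}, c.radius};  // centre inside: rough approximation
    return {true, {dx / dist, dy / dist}, c.radius - dist};
}

// Linear impulse for a dynamic body hitting a static one: no rotation, no friction.
Vec2 bounceOffStatic(Vec2 velocity, Vec2 normal, float restitution) {
    assert(restitution >= 0.0f && restitution <= 1.0f);
    const float along = velocity.x * normal.x + velocity.y * normal.y;
    if (along >= 0.0f) return velocity;  // already separating
    const float impulse = -(1.0f + restitution) * along;
    return {velocity.x + impulse * normal.x, velocity.y + impulse * normal.y};
}
```

For two dynamic bodies the same impulse would be shared between them in proportion to their inverse masses.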


In any physics system, one of the main ways to increase efficiency is through coarse collision checks. Coarse collision is the process of ruling out objects which are clearly not in collision before performing a proper collision check on them. This helps enormously: with 1000 objects in the scene, testing every object against every other would mean about a million collision checks (roughly half a million if each pair is only tested once). By ruling out all objects that are definitely not in collision, this can be reduced to a tiny fraction of that.


I spent quite some time researching different methods of coarse collision, considering Quadtrees, Binary Space Partitioning and Spatial Hashing. I eventually decided on spatial hashing as it seemed to me to be the best fit for dynamic objects; rebuilding the trees each frame in the other methods simply sounded way too slow.


Spatial hashing works by converting the positional data of an object into an index in a linear table. This is achieved through a hashing function. A hashing function takes in data and converts it to a seemingly random output; however, it only seems random. If you supply the same data you will always get the same output. This means that all objects in the same position (or rather, the same cell of a 2D grid) will be placed into the same slot in the table. It can then be presumed that any objects which are NOT in the same slot are not in collision, provided each object is added to every cell its bounds overlap. The reverse does not hold: different cells can hash to the same slot, so objects that share a slot still need the proper check. The hashing function I’ve used is an integer hash called “FNV-1a”, which is free to use and can be found here.
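A minimal C++ sketch of spatial hashing with 32-bit FNV-1a, using the standard FNV offset basis (2166136261) and prime (16777619). SpatialHash, hashCell, the cell size and the bucket count are hypothetical. Inserting each object into every cell its box overlaps is an assumption about how objects that straddle cells are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// 32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime.
std::uint32_t fnv1a(const unsigned char* data, std::size_t length) {
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    for (std::size_t i = 0; i < length; ++i) {
        hash ^= data[i];
        hash *= 16777619u;             // FNV prime
    }
    return hash;
}

// Hash a 2D cell coordinate by feeding its 8 bytes (low byte first) to FNV-1a.
std::uint32_t hashCell(int cx, int cy) {
    const auto ux = static_cast<std::uint32_t>(cx), uy = static_cast<std::uint32_t>(cy);
    unsigned char bytes[8];
    for (int i = 0; i < 4; ++i) {
        bytes[i] = static_cast<unsigned char>(ux >> (8 * i));
        bytes[4 + i] = static_cast<unsigned char>(uy >> (8 * i));
    }
    return fnv1a(bytes, sizeof bytes);
}

struct AABB { float minX, minY, maxX, maxY; };

class SpatialHash {
public:
    SpatialHash(float cellSize, std::size_t bucketCount) : cellSize_(cellSize), buckets_(bucketCount) {
        assert(cellSize > 0.0f && bucketCount > 0);
    }

    // Clearing keeps each bucket's capacity, so a warmed-up table stops allocating.
    void clear() { for (auto& bucket : buckets_) bucket.clear(); }

    // Add an object to every cell its bounding box overlaps.
    void insert(int id, const AABB& box) {
        for (int cy = cell(box.minY); cy <= cell(box.maxY); ++cy)
            for (int cx = cell(box.minX); cx <= cell(box.maxX); ++cx)
                buckets_[hashCell(cx, cy) % buckets_.size()].push_back(id);
    }

    // Pairs that share a bucket; only these need the precise collision test.
    std::set<std::pair<int, int>> candidatePairs() const {
        std::set<std::pair<int, int>> pairs;
        for (const auto& bucket : buckets_)
            for (std::size_t i = 0; i < bucket.size(); ++i)
                for (std::size_t j = i + 1; j < bucket.size(); ++j)
                    if (bucket[i] != bucket[j])
                        pairs.emplace(std::min(bucket[i], bucket[j]), std::max(bucket[i], bucket[j]));
        return pairs;
    }

private:
    int cell(float v) const { return static_cast<int>(std::floor(v / cellSize_)); }

    float cellSize_;
    std::vector<std::vector<int>> buckets_;
};
```

The std::set of candidate pairs allocates, so this sketch does not meet the "no runtime allocation" spec; a version that does would write pairs into a preallocated array.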


The use of spatial hashing (plus several other physics and rendering optimisations) has brought huge improvements in the number of objects that can be in a single scene. My test scene consists of roughly 300-400 static objects. Prior to these optimisations I could have around 80 dynamic objects at around 15 FPS. Now, at the same framerate, I can have over a thousand! :D I also remain at a stable 30 FPS up until around 650 dynamic objects. This is fantastic, as I had been aiming for 150-200 dynamic objects in any one level.


Anywho, this has been quite a wall of text so I’ll leave you with a screenshot! :-)

Baked Lighting and 3D Borders - March 20, 2011 by Copey


This week I’ve been finishing up my map exporter and the results have been pretty good. I’ve managed to get to the point where I’m considering this iteration complete. As with everything though, there’s still a lot more that it needs to be able to do, but there’s enough there for me to load something into the game and get on with actually creating the game :)


First things first, I’ve broken the map mesh up into an arbitrarily sized grid. This means that once it’s imported into the game, only the sections that are on screen will need to be rendered. This actually proved to be a pretty difficult task; I went through several different methods before starting again and ending up with something pretty simple. Originally I was trying to take the outline data (the outline for the tessellation algorithm that I talked about in my last post) and break this up, but that proved quite difficult due to all the specific scenarios it would have to handle. The method I finally went for was to go back to the tiled data, break it up into sections, then pass each section through the algorithms I talked about last time. It took quite a bit of re-jigging to get it to work, but I’m happy with the results.
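A sketch of the grid split in C++, under some assumptions. The tile layer is taken to be a row-major array of tile IDs with 0 meaning empty (as in Tiled's layer data), and chunks are square. splitIntoChunks, visibleChunks and both structs are hypothetical names. Each chunk would then go through the outline and tessellation steps on its own, and visibleChunks shows how the game could pick which sections to draw.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One square section of the tile map. Tile ID 0 means "empty", which also pads
// chunks that hang over the map edge.
struct Chunk {
    int chunkX, chunkY;      // position in the chunk grid
    std::vector<int> tiles;  // chunkSize * chunkSize tile IDs, row-major
};

// Split a row-major width x height tile layer into chunkSize x chunkSize sections.
std::vector<Chunk> splitIntoChunks(const std::vector<int>& tiles, int width, int height,
                                   int chunkSize) {
    assert(chunkSize > 0 && static_cast<int>(tiles.size()) == width * height);
    const int chunksX = (width + chunkSize - 1) / chunkSize;
    const int chunksY = (height + chunkSize - 1) / chunkSize;
    std::vector<Chunk> chunks;
    chunks.reserve(static_cast<std::size_t>(chunksX) * chunksY);
    for (int cy = 0; cy < chunksY; ++cy) {
        for (int cx = 0; cx < chunksX; ++cx) {
            Chunk chunk{cx, cy, std::vector<int>(static_cast<std::size_t>(chunkSize) * chunkSize, 0)};
            for (int y = 0; y < chunkSize; ++y) {
                for (int x = 0; x < chunkSize; ++x) {
                    const int tx = cx * chunkSize + x, ty = cy * chunkSize + y;
                    if (tx < width && ty < height)
                        chunk.tiles[y * chunkSize + x] = tiles[ty * width + tx];
                }
            }
            chunks.push_back(std::move(chunk));
        }
    }
    return chunks;
}

// Inclusive range of chunk coordinates.
struct ChunkRange { int minX, minY, maxX, maxY; };

// Chunks overlapping a view rectangle given in world units, clamped to the grid.
ChunkRange visibleChunks(float minX, float minY, float maxX, float maxY,
                         float chunkWorldSize, int chunksX, int chunksY) {
    assert(chunkWorldSize > 0.0f && chunksX > 0 && chunksY > 0);
    auto toChunk = [chunkWorldSize](float v, int count) {
        const int c = static_cast<int>(std::floor(v / chunkWorldSize));
        return std::max(0, std::min(c, count - 1));
    };
    return {toChunk(minX, chunksX), toChunk(minY, chunksY),
            toChunk(maxX, chunksX), toChunk(maxY, chunksY)};
}
```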


Next I added borders to the map to make it look more 3D. This was actually pretty simple in the end: I iterated through the outline data and added an extra quad to the mesh for each edge of the outline. It was then just a case of tidying up the corners, and I had a nice looking 3D map!
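Extruding the outline into border quads could look something like this C++ sketch. It assumes the outline is a closed, anti-clockwise loop in the XY plane, with the map surface at z = 0 and the border dropping to z = -depth. appendBorder and BorderVertex are hypothetical names, and the corner tidying mentioned above is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };
struct BorderVertex { float x, y, z; float nx, ny, nz; };

// Extrude each edge of a closed, anti-clockwise outline into a vertical quad
// of the given depth, with an outward-facing normal for lighting.
void appendBorder(const std::vector<Vec2>& outline, float depth,
                  std::vector<BorderVertex>& vertices, std::vector<std::uint16_t>& indices) {
    assert(outline.size() >= 3 && depth > 0.0f);
    for (std::size_t i = 0; i < outline.size(); ++i) {
        const Vec2 a = outline[i];
        const Vec2 b = outline[(i + 1) % outline.size()];
        const float dx = b.x - a.x, dy = b.y - a.y;
        const float len = std::sqrt(dx * dx + dy * dy);
        if (len == 0.0f) continue;  // skip duplicated points
        // Anti-clockwise winding keeps the interior on the left, so (dy, -dx) points outward.
        const float nx = dy / len, ny = -dx / len;
        const auto base = static_cast<std::uint16_t>(vertices.size());
        vertices.push_back({a.x, a.y, 0.0f, nx, ny, 0.0f});    // top edge on the map surface
        vertices.push_back({b.x, b.y, 0.0f, nx, ny, 0.0f});
        vertices.push_back({b.x, b.y, -depth, nx, ny, 0.0f});  // bottom edge pushed down
        vertices.push_back({a.x, a.y, -depth, nx, ny, 0.0f});
        const std::uint16_t quad[6] = {0, 1, 2, 0, 2, 3};
        for (std::uint16_t q : quad) indices.push_back(static_cast<std::uint16_t>(base + q));
    }
}
```

Giving each quad its own four vertices keeps the edge normals flat, which suits the baked vertex lighting.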


Of course no 3D mesh looks 3D without lighting. Since performance is an issue, I want to do as few lighting calculations at run-time as possible, so I’ve baked the lighting into the vertex colour data. This also means I can have as many point lights in the map as I want, as long as I can come up with some way to fake the lighting on any objects moving past them. Check out the screenshot for a nicely lit example :)
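Baking amounts to evaluating the lighting once per vertex offline and storing the result as the vertex colour. A minimal C++ sketch, assuming Lambert diffuse lighting with a linear falloff to zero at each light's radius. The post doesn't say which lighting model is used, and bakeVertexColour and PointLight are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

struct PointLight {
    Vec3 position;
    Vec3 colour;   // RGB in 0-1
    float radius;  // the light has no effect beyond this distance
};

// Offline: ambient plus Lambert diffuse from every point light, with a simple
// linear falloff. 'normal' is assumed to be unit length.
Vec3 bakeVertexColour(const Vec3& position, const Vec3& normal, const Vec3& ambient,
                      const std::vector<PointLight>& lights) {
    Vec3 colour = ambient;
    for (const PointLight& light : lights) {
        assert(light.radius > 0.0f);
        const Vec3 d{light.position.x - position.x, light.position.y - position.y,
                     light.position.z - position.z};
        const float dist = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
        if (dist <= 0.0f || dist >= light.radius) continue;
        const float nDotL = std::max(0.0f, (normal.x * d.x + normal.y * d.y + normal.z * d.z) / dist);
        const float intensity = nDotL * (1.0f - dist / light.radius);
        colour.x += light.colour.x * intensity;
        colour.y += light.colour.y * intensity;
        colour.z += light.colour.z * intensity;
    }
    // Vertex colours are stored as 0-1 values, so clamp the accumulated light.
    return {std::min(colour.x, 1.0f), std::min(colour.y, 1.0f), std::min(colour.z, 1.0f)};
}
```

Since the cost is paid in the exporter, the number of lights only affects export time. This is also why moving objects can't receive this light and need it faked.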



I’ve now got this all outputting to a nice, clean binary file, ready for me to import into the game next week! So hopefully I’ll have some screenshots of it running on the device then. :-)

Level generation and Tessellation - March 6, 2011 by Copey


I’ve been thinking I should try to keep this site updated weekly, so here’s my next “exciting” installment :P. I’ve been making some good progress on my Windows Phone 7 game, with the beginnings of a decent, relatively efficient and easy to use engine. I’ve moved away from the engine for the time being, however, and have been developing a tool for level creation. I wanted something that would allow for easy level building, as the intention is to have a lot of 5-10 minute levels. Without a simple tool I’d be spending all my time designing levels and no time actually building the tech. I’m a programmer, not a level designer!


So with ease of use in mind, I decided that something tile-based would be a good approach, and it means I can use a third-party tool for the building interface. I quickly settled on Tiled, as it seems to be about the best tile editor out there. Tiled is a nice, clean and simple tool that can generate tile maps of literally any size, which is good as I’m still not all that sure how big my maps will be. I’m imagining they’ll be pretty big, as the game should progress pretty fast. It’s also free to use, which is of course important ;-). Tiled can be found here. Here’s an example of a very small test map I’ve been using.

 



With a map editor in place, I then needed some way to convert the map into something I could use in game. First of all, I don’t want the game to be 2D, so the 2D tile map needed to be converted into a 3D mesh. Second, I don’t want the game to look tile-based, so the mesh needed altering to de-tileify it. I dealt with the latter simply by rounding all the corners, and it’s worked pretty well. I’ll be doing more to de-tile it, but more on that later. The former, however, was a little more challenging.


I had two options. First, I could convert each of the tiles into a 3D representation and keep a tiled approach; to round the corners I could then simply find the corner tiles and chop a part off. The other option was to figure out the boundaries of the map and then use a tessellation algorithm to calculate the mesh. The second option is significantly harder, so I went with the first.


This tiled approach worked, and I ended up with a nice looking level map. There were problems, however. The first, and more minor, was that it was proving difficult to smooth off concave angles in the map. It wasn’t impossible, but I couldn’t get it working, and I bailed on it because of the next problem: this approach led to a huge number of triangles in the mesh. The tiny tiny example map ended up with 716 vertices and 1056 indices. That’s 352 triangles. Okay, that doesn’t sound like much, but when you consider that I’m expecting my level maps to be 100x bigger than that, I’m sure you can see the problem. My previous post reflected on the fact that the phone is capable of rendering quite large numbers of triangles, but those were test cases and won’t apply to a full game. I’m estimating that I’ve got 10000-15000 or so triangles to work with for my entire scene, so I don’t want to be wasting half of that on just the ground. The following are some screenies of the ground built using this method: one with solid shading, and one wireframe to show off the nice pattern ;-)

 



With this deemed a no-go, I then considered the tessellation method. I’d previously written an ear-clipping tessellation algorithm at Realtime Worlds for tessellating roads. I couldn’t quite remember how it worked, but I found a good paper describing an implementation, which was enough to jog my memory. The paper, Triangulation by Ear Clipping by David Eberly (I’m fairly certain I’ve also read a paper on physics calculations by him; he’s clearly a knowledgeable man :P), can be found here. This meant the actual tessellation part wasn’t so difficult, but when I’d implemented this before I hadn’t had to generate the outline data. This proved quite a challenge. I eventually settled on a method where I simply find the top-left-most tile in each polygon and work anti-clockwise around it, noting each corner. I then do the same clockwise for each hole in the polygon. Next I calculate the nearest point in the outer polygon to the start of the inner polygon and slot the inner polygon in there. Because the inner polygon is wound the opposite way to the outer one, this leaves a single simple polygon that can be tessellated. What we end up with looks something like this (note: this graph is for a different input map; I built it for debugging and couldn’t be bothered building another for the updated map):
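A minimal ear-clipping sketch in C++, simplified from the approach Eberly's paper describes. It assumes a simple, anti-clockwise polygon with no holes and no duplicated vertices, so the hole-bridging step above (which does create duplicate bridge vertices) is not handled. It also tests every remaining vertex rather than only the reflex ones. earClip and its helpers are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Point { float x, y; };

// Twice the signed area of triangle (o, a, b); positive when anti-clockwise.
float cross(const Point& o, const Point& a, const Point& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// True if p lies inside or on the anti-clockwise triangle (a, b, c).
bool inTriangle(const Point& p, const Point& a, const Point& b, const Point& c) {
    return cross(a, b, p) >= 0.0f && cross(b, c, p) >= 0.0f && cross(c, a, p) >= 0.0f;
}

// Ear clipping for a simple, anti-clockwise polygon with no holes.
// Returns triangle indices into 'polygon' (3 per triangle). O(n^3) worst case.
std::vector<int> earClip(const std::vector<Point>& polygon) {
    assert(polygon.size() >= 3);
    std::vector<int> remaining(polygon.size());
    std::iota(remaining.begin(), remaining.end(), 0);
    std::vector<int> triangles;
    while (remaining.size() > 3) {
        bool clipped = false;
        const std::size_t n = remaining.size();
        for (std::size_t i = 0; i < n && !clipped; ++i) {
            const int prev = remaining[(i + n - 1) % n], cur = remaining[i], next = remaining[(i + 1) % n];
            const Point &a = polygon[prev], &b = polygon[cur], &c = polygon[next];
            if (cross(a, b, c) <= 0.0f) continue;  // reflex or collinear: not an ear
            bool ear = true;
            for (int other : remaining) {
                if (other == prev || other == cur || other == next) continue;
                if (inTriangle(polygon[other], a, b, c)) { ear = false; break; }
            }
            if (!ear) continue;
            triangles.insert(triangles.end(), {prev, cur, next});
            remaining.erase(remaining.begin() + static_cast<std::ptrdiff_t>(i));
            clipped = true;
        }
        if (!clipped) break;  // degenerate input: stop rather than loop forever
    }
    if (remaining.size() == 3)
        triangles.insert(triangles.end(), {remaining[0], remaining[1], remaining[2]});
    return triangles;
}
```

A polygon with n vertices always comes out as n - 2 triangles, which is where the large saving over one quad per tile comes from.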

 



After all that I then passed this to my tessellation algorithm and, like magic, a level mesh is produced! This produced far better results: 60 vertices and 174 indices, which comes to 58 triangles. That’s about 6 times fewer, meaning 6 times bigger levels! There are potential problems, however. The algorithm outputs a lot of long, thin triangles, which I believe less powerful hardware can have trouble texturing correctly. I’ve not yet got any output from the tool that I can load into the game on device, so I’m not all that sure how the phone will handle rendering it. I’m hopeful though, and there are ways I can amend the algorithm to try and create wider angles on the triangles. Again, here’s a solid and a wireframe representation.

 



So that’s some good progress this week. There are still a lot of things that need to be done to the tool before it’s finished, however. First of all, I need to break the level polygons up into a grid so that I can omit sections that are off-screen when running on the phone. I also need to do some improved de-tile-ifying (which is totally a word :P) and add a border to the edges. The borders will appear to be rounded and should make the level mesh look significantly more 3D (since at the moment it is completely flat… not much change from 2D :P). I plan on baking lighting into the level mesh here and allowing scenery objects to be placed and batched into the level mesh. Finally, I’ll be adding functionality for processing object start positions and such, so enemies, pickups and whatnot can also be added to the level through Tiled.


Anyway, this has turned into one mammoth post; hopefully someone’s actually endured and made it this far :D. My next week at work is going to be crunch-tastic, so I probably won’t have as much to write about, but hopefully I’ll have made at least a little progress :)

New Theme and Windows Phone 7 - February 27, 2011 by Copey


I’ve not updated this site in ages, and now that I have something new to say I felt it was time for a change of setting. I’ve updated the theme to something a little less… bland. Unfortunately the switch broke pretty much all the previous content, but I’ve now fixed most of it. I’m lazy, however, so I’m not bothering to fix the very old posts, and they’re mostly still broken. Anyway, let me know what you think! :)


For a while now I’ve been wanting to develop a small project myself, something where I’ve built every aspect of it (unfortunately for anyone playing the finished product, this includes the art :P). I wanted it to be something I could release and possibly make a few extra quid from, so I wanted to make something for the mobile market.


Now, as much as I’d love it to happen, it’s very unlikely that I’m going to wake up one morning and find that a Mac has mysteriously appeared in my flat. That rules out iOS development. Android development is free and works quite happily on PC, so I considered it. However, from my experience at Tag, Android is quite frankly awful to develop for, so I was looking for an alternative. Then I looked into Windows Phone 7 development. I’ve got plenty of XNA experience, and I found I was able to get a device on a contract costing less than I was paying for my old non-smartphone.


The big problem here is, of course, that the market is currently nowhere near as strong as either Android’s or Apple’s, but I’m not counting this as a huge problem. First of all, I really doubt the game will sell on any marketplace, and I’m more interested in developing the game than selling it. Second, this could all change at some point, especially with the news about the Microsoft/Nokia partnership.


Anywho, I’m now a registered Windows Phone 7 developer (and Xbox Live Indie Games, but I doubt I’ll be releasing anything there any time soon) and I’ve started development. Having developed for iOS, Android (*shudder*…) and now WP7, I’ve got to say WP7 is by far the easiest to get into. Going from starting your first project to building on device literally could not be simpler! The simulator is great too. It seems to be every bit as good as Apple’s version, and light years ahead of Android’s laughable equivalent. It’s also fantastic to be back in Visual Studio after many, many months of the far inferior Xcode, although I do miss a couple of features from Eclipse.


I do have a couple of niggles though. The main one is that if you try to run on device while the device is locked, it fails with a truly horrible and annoying error message. This also happens with Android (well, sort of; Android just crashes…). It doesn’t sound like much, but I’ve had that irritating sound reverberate over my music every time I’ve tried to build. The other problem is that you can only have 3 devices bound to one developer account. This is rubbish compared to Apple’s 99 and Android’s unlimited. This is especially annoying considering that most developers would want to test on each different device, and there are more than 3 types. This doesn’t really affect me, however, since I’ll just be developing on my HTC 7 Trophy.


So far development of my game is going very smoothly. I’ve been doing some performance testing to see what the device is capable of, and things are looking promising. I’ve been testing using a 50-vertex model (40 triangles), textured and lit with 1 directional light. I first tried rendering individual instances of the model and managed to get 500 running at about 15 FPS. That’s 25000 vertices in 500 draw calls at an almost playable framerate. With lighting. Not bad :D. I then tried batching these into a single draw call, which proved significantly better: I managed to get 1000 instances of the model at 15 FPS. 50000 vertices on this little device is fantastic. I’m sure it would be quite a bit more without lighting or texturing, but it was about 3am by this point so I called it a night :)
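The post doesn't say how the single-draw-call batch was built. One common approach, assumed here, is to pre-transform copies of the model on the CPU into one shared vertex and index buffer. A C++ sketch with hypothetical names (Mesh, batchInstances). Each instance is only translated to keep it short, and 16-bit indices are assumed, since XNA's Reach profile used by Windows Phone 7 only supports 16-bit index buffers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;
};

// Merge many copies of one model into a single mesh so it can be drawn in one call.
// For brevity each instance is only translated; a full version would apply each
// instance's world matrix (and transform normals) on the CPU.
Mesh batchInstances(const Mesh& model, const std::vector<Vertex>& offsets) {
    // 16-bit indices cap one batch at 65,536 vertices.
    assert(model.vertices.size() * offsets.size() <= 65536);
    Mesh batch;
    batch.vertices.reserve(model.vertices.size() * offsets.size());
    batch.indices.reserve(model.indices.size() * offsets.size());
    for (const Vertex& offset : offsets) {
        const auto base = static_cast<std::uint16_t>(batch.vertices.size());
        for (const Vertex& v : model.vertices)
            batch.vertices.push_back({v.x + offset.x, v.y + offset.y, v.z + offset.z});
        for (std::uint16_t index : model.indices)
            batch.indices.push_back(static_cast<std::uint16_t>(base + index));  // shift into this copy
    }
    return batch;
}
```

The 1000 x 50 = 50,000 vertices from the test above would still fit in one 16-bit batch, but only static geometry benefits, since moving an instance means rebuilding the merged buffer.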


Anyway, that’s all for now, but I’m going to try and keep this thing up to date with the project, and I’ll be getting screenies up as soon as I can. :-)

First Attempt at VFP Unit Code - April 26, 2010 by Copey


As I’m now nearing the end of my Honours Project, I’m focusing on optimising it. One of the main ways I intend to optimise the project is to use the iPhone’s Vector Floating Point (VFP) unit and ARM assembly for some of the floating point maths.


I’ve just completed my first test, and it’s been moderately successful. I’ve not properly timed it yet, but it appears to have yielded a 5-10% increase in speed, simply by moving my 3x3 matrix multiply from C++ to the VFP unit.


#define MATRIX3x1ANDVECTOR3x1 "s8", "s9", "s10", "s11", "s12", "s13", "s14", "s15", "s16", "s17", "s18", "s19"
#define VECTOR3SCALARx1 "s0", "s1", "s2"
#define SETVECTORWIDTH3 "fmrx r0, fpscr \n\t" \
"bic r0, r0, #0x00370000 \n\t" \
"orr r0, r0, #0x00020000 \n\t" \
"fmxr fpscr, r0 \n\t"
#define SETVECTORWIDTH1 "fmrx r0, fpscr \n\t" \
"bic r0, r0, #0x00370000 \n\t" \
"orr r0, r0, #0x00020000 \n\t" \
"fmxr fpscr, r0 \n\t"

IMatrix3x3 operator*(IMatrix3x3 & _A, IMatrix3x3 & _B)
{
IMatrix3x3 C;

//C++ code for the simulator
#if TARGET_IPHONE_SIMULATOR == true
C.A0 = _A.A0 * _B.A0 + _A.A1 * _B.B0 + _A.A2 * _B.C0;
C.A1 = _A.A0 * _B.A1 + _A.A1 * _B.B1 + _A.A2 * _B.C1;
C.A2 = _A.A0 * _B.A2 + _A.A1 * _B.B2 + _A.A2 * _B.C2;

C.B0 = _A.B0 * _B.A0 + _A.B1 * _B.B0 + _A.B2 * _B.C0;
C.B1 = _A.B0 * _B.A1 + _A.B1 * _B.B1 + _A.B2 * _B.C1;
C.B2 = _A.B0 * _B.A2 + _A.B1 * _B.B2 + _A.B2 * _B.C2;

C.C0 = _A.C0 * _B.A0 + _A.C1 * _B.B0 + _A.C2 * _B.C0;
C.C1 = _A.C0 * _B.A1 + _A.C1 * _B.B1 + _A.C2 * _B.C1;
C.C2 = _A.C0 * _B.A2 + _A.C1 * _B.B2 + _A.C2 * _B.C2;

//VFP ARM asm for the device
#else
//create a pointer to the Matrices
IMatrix3x3 * pA = &_A;
IMatrix3x3 * pB = &_B;
IMatrix3x3 * pC = &C;

//asm code
asm volatile(
//turn on a vector depth of 3
SETVECTORWIDTH3

//load matrix B into the vector bank
"fldmias %1, {s8-s16} \n\t"

//load the first row of A into the scalar bank
"fldmias %0!, {s0-s2} \n\t"

//calculate C.A0, C.A1 and C.A2
"fmuls s17, s8, s0 \n\t"
"fmacs s17, s11, s1 \n\t"
"fmacs s17, s14, s2 \n\t"

//save this into the output
"fstmias %2!, {s17-s19} \n\t"

//load the second row of A into the scalar bank
"fldmias %0!, {s0-s2} \n\t"

//calculate C.B0, C.B1 and C.B2
"fmuls s17, s8, s0 \n\t"
"fmacs s17, s11, s1 \n\t"
"fmacs s17, s14, s2 \n\t"

//save this into the output
"fstmias %2!, {s17-s19} \n\t"

//load the third row of A into the scalar bank
"fldmias %0!, {s0-s2} \n\t"

//calculate C.C0, C.C1 and C.C2
"fmuls s17, s8, s0 \n\t"
"fmacs s17, s11, s1 \n\t"
"fmacs s17, s14, s2 \n\t"

//save this into the output
"fstmias %2!, {s17-s19} \n\t"

//set the vector depth back to 1
SETVECTORWIDTH1

//pass the inputs and set the clobber list
: : "r"(pA), "r"(pB), "r" (pC)
: "memory",VECTOR3SCALARx1, MATRIX3x1ANDVECTOR3x1
);
#endif

return C;
}


The main problem here appears to be that each time the vector width is changed, the pipeline stalls. My next step is to try batching multiple matrix multiplies together to reduce this stall and see if I can get a more significant speed increase.


Edit: The above code proved to be wrong for two reasons. First of all, I wasn’t including the “r0” register, which is used to change the vector width, in the clobber list. This was causing some memory problems; however, there was a bigger issue with how I was calculating the matrix multiply. When writing this code, I was not aware that the registers are broken into 4 circular banks: s0-s7, s8-s15, s16-s23 and s24-s31. This means that if you use a vector width of 3 and use, for example, s14 as the first register (“fmacs s17, s14, s0”), it will wrap round so that the vector is (s14, s15, s8), NOT (s14, s15, s16) as I had thought. In addition to fixing this, I have optimised the code by calculating the multiplies using non-dependent registers so that they can be performed at the same time. (The SETVECTORWIDTH1 macro above also repeated the orr instruction from SETVECTORWIDTH3, so it never actually restored a vector width of 1; the updated macro drops that line.)
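For anyone untangling the same problem, the two rules involved can be modelled in a small portable C++ sketch. The function names are hypothetical; the bank layout is as described above, and the FPSCR bit positions match the masks used in the macros (0x00370000 covers LEN in bits 18:16 and STRIDE in bits 21:20).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Registers s0-s31 form four circular banks of eight: s0-s7, s8-s15, s16-s23, s24-s31.
// A short-vector operand starting at 'first' wraps within its own bank.
std::vector<int> vectorRegisters(int first, int length) {
    assert(first >= 0 && first < 32 && length >= 1 && length <= 8);
    const int bankBase = first & ~7;
    std::vector<int> regs;
    for (int k = 0; k < length; ++k) regs.push_back(bankBase + ((first - bankBase + k) & 7));
    return regs;
}

// Bank 0 (s0-s7) is never treated as a vector: a bank-0 destination makes the
// whole operation scalar, and a bank-0 Fm operand is used as a scalar.
bool isScalarBank(int reg) { return reg < 8; }

// What SETVECTORWIDTHn does to FPSCR: clear LEN and STRIDE, then store
// length - 1 in LEN. A STRIDE field of 0 means stride 1.
std::uint32_t withVectorLength(std::uint32_t fpscr, unsigned length) {
    assert(length >= 1 && length <= 8);
    return (fpscr & ~0x00370000u) | ((length - 1u) << 16);
}
```

For example, vectorRegisters(14, 3) gives {14, 15, 8}, the wrap that broke the first version, while the updated code's s19-s21 and s24-s29 groups never cross a bank boundary.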


Here’s the updated, working code:


#define MATRIXMULTIPLYREGISTERS "s0", "s1", "s2", "s8", "s9", "s10","s11", "s12", "s13", "s16", "s17", "s18", "s19", "s20", "s21", "s24", "s25", "s26", "s27", "s28", "s29"
#define ALL "s0", "s1", "s2", "s3","s4", "s5", "s6", "s7", "s8", "s9", "s10","s11", "s12", "s13", "s14", "s15", "s16", "s17", "s18", "s19", "s20", "s21", "s22", "s23", "s24", "s25", "s26", "s27", "s28", "s29", "s30", "s31"
#define SETVECTORWIDTH3 "fmrx r0, fpscr \n\t" \
"bic r0, r0, #0x00370000 \n\t" \
"orr r0, r0, #0x00020000 \n\t" \
"fmxr fpscr, r0 \n\t"
#define SETVECTORWIDTH1 "fmrx r0, fpscr \n\t" \
"bic r0, r0, #0x00370000 \n\t" \
"fmxr fpscr, r0 \n\t"

IMatrix3x3 operator*(IMatrix3x3 & _A, IMatrix3x3 & _B)
{
IMatrix3x3 C;

//C++ code for the simulator
#if TARGET_IPHONE_SIMULATOR == true

C.A0 = _A.A0 * _B.A0 + _A.A1 * _B.B0 + _A.A2 * _B.C0;
C.A1 = _A.A0 * _B.A1 + _A.A1 * _B.B1 + _A.A2 * _B.C1;
C.A2 = _A.A0 * _B.A2 + _A.A1 * _B.B2 + _A.A2 * _B.C2;

C.B0 = _A.B0 * _B.A0 + _A.B1 * _B.B0 + _A.B2 * _B.C0;
C.B1 = _A.B0 * _B.A1 + _A.B1 * _B.B1 + _A.B2 * _B.C1;
C.B2 = _A.B0 * _B.A2 + _A.B1 * _B.B2 + _A.B2 * _B.C2;

C.C0 = _A.C0 * _B.A0 + _A.C1 * _B.B0 + _A.C2 * _B.C0;
C.C1 = _A.C0 * _B.A1 + _A.C1 * _B.B1 + _A.C2 * _B.C1;
C.C2 = _A.C0 * _B.A2 + _A.C1 * _B.B2 + _A.C2 * _B.C2;

//VFP ARM asm for the device
#else

//create a pointer to the Matrices
IMatrix3x3 * pA = &_A;
IMatrix3x3 * pB = &_B;
IMatrix3x3 * pC = &C;

//asm code
asm volatile(
//turn on a vector depth of 3
SETVECTORWIDTH3

//load matrix B into the vector bank
"fldmias %1!, {s8-s13} \n\t"
"fldmias %1!, {s16-s18} \n\t"

//load the first row of A into the scalar bank
"fldmias %0!, {s0-s2} \n\t"

//calculate C.A0, C.A1 and C.A2
"fmuls s19, s8, s0 \n\t"
"fmuls s24, s11, s1 \n\t"
"fmuls s27, s16, s2 \n\t"
"fadds s19, s19, s24 \n\t"
"fadds s19, s19, s27 \n\t"

//save this into the output
"fstmias %2!, {s19-s21} \n\t"

//load the second row of A into the scalar bank
"fldmias %0!, {s0-s2} \n\t"

//calculate C.B0, C.B1 and C.B2
"fmuls s19, s8, s0 \n\t"
"fmuls s24, s11, s1 \n\t"
"fmuls s27, s16, s2 \n\t"
"fadds s19, s19, s24 \n\t"
"fadds s19, s19, s27 \n\t"

//save this into the output
"fstmias %2!, {s19-s21} \n\t"

//load the third row of A into the scalar bank
"fldmias %0!, {s0-s2} \n\t"

//calculate C.C0, C.C1 and C.C2
"fmuls s19, s8, s0 \n\t"
"fmuls s24, s11, s1 \n\t"
"fmuls s27, s16, s2 \n\t"
"fadds s19, s19, s24 \n\t"
"fadds s19, s19, s27 \n\t"

//save this into the output
"fstmias %2!, {s19-s21} \n\t"

//set the vector depth back to 1
SETVECTORWIDTH1

//pass the inputs and set the clobber list
: "=r"(pA), "=r" (pB), "=r" (pC) : "0" (pA), "1"(pB), "2"(pC)
:"r0", "memory", MATRIXMULTIPLYREGISTERS
);

#endif

return C;
}
