Ideal Binary Blog 

You will find information on all aspects of our work here on our company blog.

3D Bookshelf Approved In 3 Days

We're delighted to announce that 3D Bookshelf was approved in only 3 days and is now available for download on the App Store. You can find out more in the press release.

And if you've been thinking you might like to get hold of an Apple iPad, look no further. Visit 3DBookshelf.com to find out how you can win your very own iPad.

We would love your feedback! Please let us know what you think of the app.

Looking forward to your comments.

3D Bookshelf Launched on App Store

Dublin, Ireland - January 29, 2010 - Ideal Binary Limited, a privately held iPhone® software development company, is pleased to announce the launch of 3D Bookshelf, the world's first fully 3D eBook reader application for iPhone & iPod touch®.

~ 3D Bookshelf ~

Robin Hood Edition

Free Download!

3D Bookshelf - Robin Hood Edition uses the world's first fully 3D eBook engine.

Download it for free now!

Unlike most other eBook reader applications, 3D Bookshelf is focused on simulating a fully 3D vintage reading experience. "Using a highly specialized 3D engine, we can recreate an ultra realistic three dimensional book and allow the user to open it to any page in full 3D. The user can also manipulate the book in a special viewing mode, as you would with a real book," according to Kevin Doolan, co-owner of Ideal Binary.

The high performance 3D engine is designed to use as little energy as possible and shuts down almost completely when there is no movement on screen. This ensures efficient battery power usage.

The application features fifteen time-honored eBooks, such as The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle and The Picture of Dorian Gray by Oscar Wilde. Many significant updates to the application are planned, including the addition of many more books.

To celebrate the launch of 3D Bookshelf for iPhone & iPod touch, Ideal Binary is giving away an Apple® iPad. Visit www.3DBookshelf.com for more information.

www.idealbinary.com
www.3dbookshelf.com

Founded in 2008, Ideal Binary Limited (www.idealbinary.com) is a privately held iPhone software development company specializing in the production of sophisticated high-performance applications. Its development team has a long history in both desktop and mobile application development and porting, having worked with major brands from companies like Konami, Capcom & From Software among many others. Copyright (C) 2009-2010 Ideal Binary, Ltd., All Rights Reserved. Apple, the Apple logo, iPhone and iPod are registered trademarks of Apple Inc. in the U.S. and/or other countries.

Making an iPhone App - Code Freeze

We're coming towards the end of the final QA phase of 3D Bookshelf. We'll be submitting the app to the App Store within the next few days.

It's a huge relief to have brought the project this far and we're delighted with the layers of polish we've added in the last few weeks. Several of the nice touches we've added were stumbled upon by accident and we would never have thought to add them had it not been for the decision to relax our release schedule (within reason).

We're really looking forward to seeing how the app is received, and we would love to hear from you if you have any feedback.

For those interested, here are links to my recent posts on 3D Bookshelf.

I'll be publishing a follow-up post shortly on our approach to marketing 3D Bookshelf, including details of the promotional competition we'll be running.

Also, Kevin is working on some tasty OpenGL posts detailing the tech behind the 3D rendering engine we produced. These will definitely be worth a look if you're interested in learning about high-end, high-performance 3D engine development for iPhone. In the meantime, here are some of his earlier posts on iPhone OpenGL ES.

For more updates, please subscribe to my RSS feed or you can follow me on Twitter.

iPhone OpenGL ES Performance

EDIT: This article was originally published on my personal blog in March '09, and applies to Generation 1 and 2 hardware.  I'm republishing this article (and its predecessor on UtopiaGL) here to serve as a base for future posts on UtopiaGL.

Fine tuning 3D code can be a black art, particularly if you are new to it.  If you're not familiar with the hardware, it may not be immediately obvious why certain calls are expensive and others aren't. To make matters more complicated, each system has little quirks and gotchas that surface along the way while developing 3D code.  The iPhone is no exception.  Here are some optimizations that may save you some head scratching, particularly if you're new to OpenGL.

EDIT: I have updated some of the points below to reflect the iPhone benchmarking done by Patrick Hogan. Thanks again Patrick. For more details, check out the comments at the bottom.

Timers are Great, Threads are Better

The 'Hello World' iPhone OpenGL ES project sets up a simple timer based run loop, which is a very convenient way of executing your app.  You don't have to worry about synchronization issues, there's no overhead involved in context switches (your app is executing in a single thread) and debugging is trivial.  So why would you want to rework your code to operate in a separate thread?  The reason is simple:  With timers there will potentially be a gap of 'dead time' where your engine is doing nothing, waiting for the next timer event to fire.  This simply defeats the potential parallelism that you can achieve between your app and OpenGL.  The GPU has enough to cause it to stall when interacting with your app, without the additional delay of your app also waiting for a specific time in which to execute.  As soon as there are CPU cycles available to your app, you should take them immediately, get all frame processing out of the way and issue your calls to OpenGL.  From a performance point of view, ideally neither the CPU, nor the GPU should ever be stalled if it can be avoided.  My engine, UtopiaGL, runs around 15% faster using a thread based frame update instead of a timer based approach.
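To put a rough number on that dead time, here is a minimal C++ sketch of a deliberately simplified model. It assumes a frame may only start on a timer tick, so a frame that finishes early idles until the next tick. The function names and the tick model are illustrative assumptions, not a description of how the OS timer or UtopiaGL actually schedules work.

```cpp
#include <cassert>
#include <cmath>

// Simplified model (assumption): a timer-driven frame may only start on a tick,
// so each frame occupies a whole number of ticks and the remainder is dead time.
double timerDrivenTotalMs(int frames, double workMs, double tickMs) {
    assert(frames >= 0 && workMs > 0.0 && tickMs > 0.0);
    const double ticksPerFrame = std::ceil(workMs / tickMs);
    return frames * ticksPerFrame * tickMs;
}

// A thread-driven loop starts the next frame as soon as the previous one ends.
double threadDrivenTotalMs(int frames, double workMs) {
    assert(frames >= 0 && workMs > 0.0);
    return frames * workMs;
}

// Dead time accumulated by the timer-driven loop over the same number of frames.
double deadTimeMs(int frames, double workMs, double tickMs) {
    return timerDrivenTotalMs(frames, workMs, tickMs) -
           threadDrivenTotalMs(frames, workMs);
}
```

Under this model, 60 frames of 14 ms work on a 16 ms tick take 960 ms with the timer and 840 ms back to back, so 120 ms is spent waiting. Real gains depend on the device and the content, as the measurements in this post show.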

Avoid thread synchronization issues by engineering your code to simply not need synching.

EDIT: My current project has turned up some new details here. When I tested on generation 1 hardware, the threaded approach resulted in sporadically choppy updates.  I was able to address it by introducing a short sleep per frame, but it ended up being the same as the timer method in terms of performance.  It looks like the OS background tasks get starved of CPU cycles and then all of a sudden decide to shut your app out for a relatively huge chunk of time while they look after themselves.  This feels very Brew 1.1!!  The timer method is far simpler, so for generation 1 hardware I opt for the timer approach.

Allow OpenGL to run in Parallel

The OpenGL ES pipeline executes in its own thread.  When you issue calls to the API, they are not executed immediately, and instead get placed into a command buffer which eventually gets flushed.  Calls to the API return immediately, allowing your app to proceed with whatever it's doing.  This means that in theory you should be able to execute for a considerable chunk of time in parallel with the graphics hardware.  This is as you would expect, but there are certain calls you can make that effectively break this parallelism, and with them, the frame rate of your app comes spinning down.  The usual suspects are glReadPixels, glTex(Sub)Image, glBuffer(Sub)Data etc.  If you need to use them per frame, be aware they have quite a performance hit. 

If your app runs within a single thread (i.e. the renderer) then avoid calls to glFlush and never, ever use glFinish.  glFinish is particularly bad.  It flushes the command buffer and blocks the calling thread while it does it.  glFlush does the same thing, less the block.  Both are used to synchronize with the GL driver, but you rarely need to do that.  An example of when you might need it is if you had two rendering threads, both of which were rendering into the same GL context.  If you do need synchronization, use glFlush.  Pretend glFinish doesn't exist - you should literally never call it on the iPhone.

Use Vertex Buffer Objects or Not!

Make good use of Vertex Buffer Objects for any static geometry.  In theory this allows you to store geometry (including index data) in fast video memory.  The iPhone uses shared memory i.e. the CPU and PVR hardware share the same memory, so in this case it simply means you save having to upload your geometry to GL every time you call glDrawElements etc.  The savings here can be very significant.  You would be amazed how much geometry you can throw at the iPhone with VBOs before it breaks a sweat.

EDIT: After reading Patrick's comments below, I went back and looked at my VBO code. Unfortunately the gl driver doesn't take advantage of VBOs and uses a full copy operation in both the VBO and non-VBO case. Up until now I've been working off observations with my own shader pipeline. I see an increase in speed using VBOs but from examining it more closely, the speed increase doesn't come from the fact that the vertices are in a VBO - it comes from the fact that my non-VBO path does more work on the CPU side. This surprised me because I saw a drop of around 8fps with a stress-test scene in the non-VBO case - quite a bit more than I expected from the extra CPU work alone - I expected it would be in line with VBOs being effective, and that things would run slower without them. My shader pipeline works like this. Geometry is queued for rendering, and when the scene is complete it tries to minimize the number of calls to glDrawElements by rendering from shader buckets. Effectively each shader has a geometry bucket and any geometry in the same reference frame is packed together and issued to gl in a single call to glDrawElements. Most of the geometry (anything not dynamically generated per frame) lives in its own VBO. In this case I don't add it to the shader bucket; I just render it on its own with a call to glDrawElements. Since VBOs don't seem to offer a speed increase, when I turn them off and fall back to my packing scheme, it should be fairly similar in speed - but it's not. It's doing the same amount of work as the VBO case plus the extra work I'm doing to pack everything. There seems to be no advantage in minimizing calls to glDrawElements - at least not with the scenes I've been testing. It seems it's better to just issue multiple glDrawElements calls. I didn't look at it too closely originally because my typical scenes were running at 60fps - I saw what I was expecting to see: that VBOs were faster - in my case they were, but not for the reason I assumed.

The tests I did above were not exhaustive and are very particular to my engine and current app, but they agree with Patrick's findings below. When I remove the code to minimize glDrawElements calls, and issue the same geometry with and without VBOs I see no real difference in speed. Vertices have xyz, tc and color attributes - lighting is precached.
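For readers who want to picture the shader-bucket packing described above, here is a minimal C++ sketch. It is not UtopiaGL code: the `Mesh`, `Batch` and `ShaderBuckets` names are hypothetical, vertices carry positions only, and the "same reference frame" condition is ignored for brevity. Each bucket would map to one glDrawElements call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical queued geometry: positions only, 16-bit indices.
struct Mesh {
    int shaderId;
    std::vector<float> xyz;              // 3 floats per vertex
    std::vector<std::uint16_t> indices;  // indices into this mesh's vertices
};

// Geometry merged for a single draw call.
struct Batch {
    std::vector<float> xyz;
    std::vector<std::uint16_t> indices;
};

class ShaderBuckets {
public:
    // Append a mesh to its shader's bucket, rebasing indices onto the merged vertex array.
    void queue(const Mesh& m) {
        assert(m.xyz.size() % 3 == 0);
        Batch& b = buckets_[m.shaderId];
        const std::size_t base = b.xyz.size() / 3;
        assert(base + m.xyz.size() / 3 <= 65536);  // must stay addressable with 16-bit indices
        b.xyz.insert(b.xyz.end(), m.xyz.begin(), m.xyz.end());
        for (std::uint16_t i : m.indices) {
            b.indices.push_back(static_cast<std::uint16_t>(base + i));
        }
    }

    // One draw call per bucket.
    std::size_t drawCallCount() const { return buckets_.size(); }

    const Batch& batch(int shaderId) const { return buckets_.at(shaderId); }

private:
    std::map<int, Batch> buckets_;
};
```

Note that the copying in `queue` is exactly the extra CPU work the tests above found was not paid back on this hardware, so treat this as an illustration of the scheme and not a recommendation.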

Be Cache Aware

Use indexed, stripped triangles for geometry, and render through the glDrawElements call.  Sort tris to maximize vertex cache usage, then sort vertices to be in sorted-tri-order.  Don't worry too much about having the tris in strip order - make sure cache usage is your sorting metric.  Where it makes sense, interleave vertex attributes, for example all static geometry should have interleaved attributes.
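One way to make "cache usage" a measurable sorting metric is to replay an index list through a simulated vertex cache and count the misses. The C++ sketch below assumes a simple FIFO cache; the real cache size and replacement policy of the PVR hardware are not given here, so `cacheSize` and the FIFO policy are assumptions, and `countCacheMisses` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Replays an index list through a simulated FIFO post-transform vertex cache
// and returns how many indices had to be fetched (missed the cache).
std::size_t countCacheMisses(const std::vector<std::uint16_t>& indices,
                             std::size_t cacheSize) {
    assert(cacheSize > 0);
    std::deque<std::uint16_t> fifo;
    std::size_t misses = 0;
    for (std::uint16_t idx : indices) {
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end()) {
            continue;  // hit: vertex is still in the cache
        }
        ++misses;
        fifo.push_back(idx);
        if (fifo.size() > cacheSize) {
            fifo.pop_front();  // evict the oldest entry
        }
    }
    return misses;
}
```

Comparing the miss count for two orderings of the same triangles gives a quick offline check of whether a reordering pass actually helped.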

EDIT: I haven't confirmed it in my engine yet but it seems strips do outperform lists by a small amount.

Read the Apple and PVR performance guidelines

In short: RTFM.  There's a lot of good advice in those docs.  Surprisingly, when I last checked, Vertex Buffer Objects weren't mentioned in the Apple guidelines - an odd thing to omit.

EDIT: In light of the VBO implementation, this isn't an odd thing to omit at all. Also, the Apple recommendation to use lists instead of strips doesn't seem to hold water, although it is fairly close. See the comments below from Patrick. The best advice is: Read the guidelines, but profile your code.

Use Instruments

Instruments is an excellent app that comes with Xcode and allows you to profile your app running on-device.  You can examine all kinds of details about how your app executes, and zero in on bottlenecks.  With respect to OpenGL, you can use it to see which calls are being issued a lot, and/or take the most time.  This is an invaluable tool, and should be your guide for any optimizations.

Conclusion

Every app is different, and even the same engine could perform wildly differently with different content, so when you hit a bottleneck you really need to profile to find out where your app+content is spending most of its time, and focus your optimizations there.  That said, many of the above points apply globally, and your app will almost certainly avoid performance pitfalls if you are careful to adhere to them.  The iPhone has some serious horsepower under its hood. Initially I was skeptical about just how powerful it was, and after fine tuning my engine I'm extremely pleased with it.  Compared to other mobile 3D devices, the iPhone and iPod Touch are in a class of their own.

iPhone 3D Engine Development

EDIT: This article was originally published on my personal blog in January '09.  Since then, UtopiaGL has been further developed and used in most of the work we've done here at Ideal Binary.  I'm republishing this article (and its follow-up on performance) here to serve as a base for future posts on UtopiaGL.

Last month I completed my middleware 3D engine, UtopiaGL for iPhone.  I'd never actually used a Mac before this project and I wasn't sure what kind of effort would be required to get up to speed on OSX and the Mac development process (short answer - about 2 hours of effort!).  Here are some notes on my experience.

The Engine: UtopiaGL

It's a C++ OpenGL ES, Shader-based engine and tool chain similar in feature-set to PC Games from around 5 years ago.  All OpenGL ES features supported by the iPhone are exposed through the Shader system, so you can do anything the PVR hardware can do.  The pipeline is optimized according to the PVR recommendations, as well as my own testing.  The helper tools include a Shader Compiler, Font Compiler, Model Compiler, 3DS Exporter and State Machine Generator.

Development

The core engine and tools took 8 weeks to write.  I wrote it in C++ on Vista using an OpenGL ES emulation layer.  I'm very familiar with Visual Studio - Xcode is cool, but at the time I just wasn't comfortable with it so I battled on in Windows.  I wrote the engine to be portable across OS platforms so it has a System layer to abstract away the Platform specifics.

Once I had the engine ready and running well on Windows I moved it over to Xcode and rolled up my sleeves, expecting quite a bit of work.  Amazingly, it only took a few hours to get working, and most of that was learning the ins and outs of Xcode.  When I ran it on a device using a test scene it initially ran at 30 fps.  I was pretty disappointed at that speed given the optimizations that were being used.  I took a quick look over the code and saw that the Framebuffer was being created as a 32bit RGB buffer - I had just taken the setup code from the OpenGL ES Hello World app to get up and running.  I set it to RGB565 and sure enough it shot up to 60 fps.  It does drop down to 40/45 fps when you turn on Framebuffer effects (like Bloom glow).
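To illustrate why the framebuffer format matters, here is a small C++ sketch of what RGB565 means at the bit level and how much memory each format needs per frame. The helper names are hypothetical, and the 4 bytes per pixel for the 32-bit format and the 320x480 screen size used in the note below are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packs 8-bit channels into a 16-bit RGB565 pixel: 5 bits red, 6 green, 5 blue.
std::uint16_t packRgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Bytes needed for one color buffer at the given pixel size.
std::size_t framebufferBytes(std::size_t width, std::size_t height,
                             std::size_t bytesPerPixel) {
    assert(bytesPerPixel == 2 || bytesPerPixel == 4);
    return width * height * bytesPerPixel;
}
```

For a 320x480 screen this is 614,400 bytes per frame at 4 bytes per pixel against 307,200 bytes at RGB565, so every full-screen write moves half the data.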

EDIT: After a few weeks of further optimizations and testing I've managed to squeeze quite a bit more GL performance out of the device (to my sheer astonishment).  For example Framebuffer effects now run at a constant 60 fps, details coming soon...

Xcode

I wasn't sure what to make of Xcode initially.  After getting used to it I'm quite happy with it as a development environment.  There are a few things that feel a bit minimalist about it - it doesn't feel as advanced as Visual Studio, but at the same time, you do feel like you're getting more signal and less noise, considering what's there.  (Actually Mac OSX in general feels quite minimalist, in a good way.) One thing I find annoying is there is no clean, simple way of including one iPhone project in another (this has been the single biggest influence on my view of Xcode).  VS does this extremely well with 'Solutions' which wrap Projects.  Obviously when you're writing a middleware engine this is something of a requirement.  Right now I'm including all the engine source directly in the Application projects in Xcode, unlike in Visual Studio, where they have a neat Application Solution that simply includes the engine library project.  You can get Xcode to play ball, and a few people have blogged about it, but I'm not happy with the way it works - it should be a trivial thing and it certainly is not - it involves many, many steps, when it should require only one: inclusion of a reference to the library project.

EDIT: When I wrote the above, there was no Static Library project template in Xcode, and I didn't know enough about the process on Mac to set it up quickly.  The Static Library project template does exist now (and has done for a while) and works extremely well, allowing exactly the same library project dependency set-up I was used to in Visual Studio.  Thanks to Simon for pointing me towards this in the comments to this post.

Objective C

If you haven't written Objective C before, it can look a bit weird at first if you're coming from a more traditional C derived language background.  It is growing on me, and I really do like the fact that the Apple Framework APIs look super clean.  I just have one problem with it - the way parameters are interleaved with Message names.  The idea is sound, but I don't like the way it's done in Objective C.  Many languages have done it well and they do it by allowing you to name parameters as you pass them, like MessageName( FirstParam=47, SecondParam="Hello" ).  The reason I'm not keen on the Objective C way of doing it is I find I need to stare at the code to extract the meaning - it takes effort.  This is something I'll probably get used to, but right now it's a bit annoying.  I think it's that there are no brackets delineating the parameter list - my eyes are scanning for it and can't find it, making it a bit jarring.  If I had been proficient with Objective C and moved to C++ I might find C++ equally jarring.  I've learned many new languages over the years, but I've never found any so odd - not even Lisp or Prolog.  Maybe it's just me.

In any case, I was able to avoid Objective C almost entirely.  The only Objective C is in the Platform code to set things up, drive the message pump, and provide implementations of system functions like getting the App path.  Maybe 200 lines of trivial code.  Everything else is portable C++.

Memory Management

Garbage Collection has been disabled on the iPhone (a good thing - full blown Garbage Collection is expensive, and arguably a nonsense feature to have on a mobile, power constrained device). I wrote 3 memory management systems for UtopiaGL, all very simple, and very specialized.  The first is for the core engine back-end which is very light, extremely fast and results in zero fragmentation.  It's not Garbage Collected but it does allow you to nail memory leaks immediately.  You can also use it in the front-end Application code for making gross allocations, like loading large App-specific data.  The second one is exclusively for the Geometry processing engine and is super simple, and extremely fast - it's an allocate-and-forget system designed for permanent allocations.  It gets zapped entirely at the end of each scene render.  Lastly, there's the client memory manager (the client being an Application written using the engine).  This is designed exclusively for the front-end client code to use and is Garbage Collected (reference counted, but detects cyclic refs).  It's also extremely fast (nowhere near as expensive as power-hungry Mark and Sweep GC for example), and is actually a relatively tiny piece of code.  When objects are no longer referenced they are deleted immediately unlike traditional GC.  Objects allocated with this system are only visible through reference pointer objects and reference pointer array objects (like in Java or C#).  You can create arrays (even multi-dimensional arrays) of object references that behave exactly like the Java or C# equivalents, complete with a .length member per dimension.  You never have to explicitly delete anything, space permitting. All allocations go through the 'placement' new operator, so it is very comfortably integrated with C++.
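As an illustration of the allocate-and-forget idea used for the geometry processing, here is a minimal C++ sketch of a bump allocator that is reset once per scene render. This is not the UtopiaGL allocator: the `FrameArena` name, its interface and the fixed capacity are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical allocate-and-forget arena: allocation is a pointer bump,
// nothing is freed individually, and reset() discards everything at once.
class FrameArena {
public:
    explicit FrameArena(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Returns aligned storage, or nullptr if the arena is exhausted.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t current = base + offset_;
        const std::uintptr_t aligned =
            (current + (align - 1)) & ~(static_cast<std::uintptr_t>(align) - 1);
        const std::size_t newOffset = static_cast<std::size_t>(aligned - base) + size;
        if (newOffset > buffer_.size()) {
            return nullptr;
        }
        offset_ = newOffset;
        return reinterpret_cast<void*>(aligned);
    }

    // Call at the end of each scene render; all earlier pointers become invalid.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};
```

Objects are constructed into the arena with placement new, for example `new (arena.allocate(sizeof(int), alignof(int))) int(7)`. Because `reset()` runs no destructors, a scheme like this only suits trivially destructible data such as vertex scratch buffers.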

Touch UI and the Accelerometer

These are the most interesting new features you get to play with.  The Accelerometer is impressive - it's extremely sensitive, way more than I was expecting.  It provides a 3-vector with force in each spatial dimension - simple and to the point.  You can control the rate at which the accelerometer feeds samples to your app, but for most interactive apps you'll want that to be as close to 60Hz as possible.  In order to extract meaningful information from the data coming from the accelerometer you will eventually need to apply some kind of filtering.  Some knowledge of Digital Signal Processing is very useful here, but not required (a High or Low Pass filter can be written in a few lines of code and is very straightforward to understand).  The other input method is of course the Touch interface.  Your app gets a handful of messages informing it of touch events (when they start, move, end etc).  Each individual touch is tracked, so if you can imagine pressing your finger on the screen, that creates a 'Touch' object, if you move your finger that particular Touch object gets updated with new position information and a 'Phase' field that reflects the current phase of the touch event: Began, Moved, Ended etc.  When you finally lift your finger, the Touch object gets its Phase set to Ended and after you have been informed about it, the Touch object gets recycled by the system.  One Touch object exists for each point of contact on screen and is updated as its point of contact changes.  Tap events are modeled as touches also, with the tapCount field indicating the number of taps that have occurred.
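To back up the claim that a filter takes only a few lines, here is a minimal C++ sketch of a first-order low-pass filter for accelerometer samples, with the matching high-pass obtained by subtracting the low-pass output from the raw sample. The `Vec3` and `LowPassFilter` names and the choice of smoothing factor are assumptions for illustration.

```cpp
#include <cassert>

struct Vec3 {
    float x, y, z;
};

// First-order low-pass filter (exponential smoothing). An alpha close to 0
// smooths heavily, which isolates slow changes such as gravity; an alpha of 1
// passes samples through unchanged.
class LowPassFilter {
public:
    explicit LowPassFilter(float alpha)
        : alpha_(alpha), state_{0.0f, 0.0f, 0.0f}, primed_(false) {
        assert(alpha > 0.0f && alpha <= 1.0f);
    }

    Vec3 update(const Vec3& sample) {
        if (!primed_) {
            state_ = sample;  // seed with the first sample to avoid a start-up ramp
            primed_ = true;
            return state_;
        }
        state_.x += alpha_ * (sample.x - state_.x);
        state_.y += alpha_ * (sample.y - state_.y);
        state_.z += alpha_ * (sample.z - state_.z);
        return state_;
    }

private:
    float alpha_;
    Vec3 state_;
    bool primed_;
};

// High-pass: whatever the low-pass removed, i.e. sudden movement.
Vec3 highPass(const Vec3& sample, const Vec3& lowPassed) {
    return {sample.x - lowPassed.x, sample.y - lowPassed.y, sample.z - lowPassed.z};
}
```

Feeding each incoming sample through `update` gives a steady tilt estimate, while `highPass` on the same sample picks out shakes and taps.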

I have abstracted Touch events essentially identically in UtopiaGL, except each Touch can be identified with an ID. UtopiaGL Touches can track several seconds of movement (you can configure this), unlike the raw events which just give you a current and previous position.  This eases the burden on Gesture Recognition somewhat.
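A touch that remembers a configurable window of its own movement could look something like the C++ sketch below. It is only a guess at the shape of such an abstraction: `TrackedTouch`, `TouchSample` and the pruning rule are hypothetical and not taken from UtopiaGL.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct TouchSample {
    double time;  // seconds
    float x, y;   // screen position
};

// Hypothetical touch record: an ID plus the last `historySeconds` of movement.
class TrackedTouch {
public:
    TrackedTouch(int id, double historySeconds)
        : id_(id), historySeconds_(historySeconds) {
        assert(historySeconds >= 0.0);
    }

    // Record a new position and drop samples older than the history window.
    void addSample(double time, float x, float y) {
        samples_.push_back({time, x, y});
        while (samples_.front().time < time - historySeconds_) {
            samples_.pop_front();
        }
    }

    int id() const { return id_; }
    std::size_t sampleCount() const { return samples_.size(); }
    const TouchSample& oldest() const { return samples_.front(); }
    const TouchSample& newest() const { return samples_.back(); }

private:
    int id_;
    double historySeconds_;
    std::deque<TouchSample> samples_;
};
```

A gesture recognizer can then compare `oldest()` and `newest()`, or walk the whole history, instead of accumulating deltas itself.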

UtopiaGL has its own event system, so in the platform specific code, native events are translated and fed to UtopiaGL's system object in a format it recognises.  From there, they are distributed to the rest of the engine in an entirely platform independent way.

OpenGL ES on iPhone and iPod Touch

The features exposed by the PVR hardware are excellent.  There was one disappointing omission and that was the Vertex Program extension.  The PVR hardware supports Dot3 blending which means you can do Bump/Normal mapping.  Unfortunately, without the Vertex Program extension you are forced to enlist the CPU if you're doing Tangent Space bump mapping, with a matrix multiply per vertex.  If you're doing Object Space bump mapping you don't need to do that, but you are stuck with rigid models that can't deform without breaking their lighting.  Another reason why I was hoping the VP extension would be exposed is to supply my own lighting equation.  The standard OpenGL ES lighting system is expensive.  All you need in most cases is a very trivial ambient + diffuse lighting equation.  Without the VP extension you either bite the bullet and use the Standard OpenGL ES lighting model (which allows you to store your geometry in video ram) or write your own simplified lighting code and upload the vertex colors per lighting change.  If your models require CPU work, for example if you're applying some kind of CPU-based deformation to them then it may make sense to implement the lighting on the CPU and upload everything in one go.  If your geometry is entirely static, then it may make sense to just use the OpenGL ES lighting model.  It's not a clear cut situation. Right now I'm using the OpenGL ES lighting pipeline for all lighting, but I have left a stub for CPU based lighting - I'll be experimenting with this shortly.
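The "very trivial ambient + diffuse lighting equation" mentioned above can be sketched in C++ as follows. This is an illustration of the standard Lambert term computed per vertex on the CPU, not the engine's stubbed lighting path; the type and function names are hypothetical, and both vectors passed to `shadeVertex` are assumed to be unit length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

struct Rgb {
    float r, g, b;
};

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 normalize(const Vec3& v) {
    const float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}

// color = ambient + diffuse * max(0, N . L), clamped to 1.
// `normal` and `toLight` are assumed to be unit vectors.
Rgb shadeVertex(const Vec3& normal, const Vec3& toLight,
                const Rgb& ambient, const Rgb& diffuse) {
    const float nDotL = std::max(0.0f, dot(normal, toLight));
    return {std::min(1.0f, ambient.r + diffuse.r * nDotL),
            std::min(1.0f, ambient.g + diffuse.g * nDotL),
            std::min(1.0f, ambient.b + diffuse.b * nDotL)};
}
```

Running this over every vertex when the light changes and uploading the results as vertex colors is the CPU-side alternative weighed above against the built-in OpenGL ES lighting model.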

EDIT: The shared memory model on the iPhone and the fact that the VBO extension offers no speed up really mean you're faced with a more level playing field: you just operate off system memory vertices.

The Framebuffer (FBO) extension is supported, and actually the primary way you render to screen.  The FBO extension opens up a wealth of possibilities (using render-to-texture) and I was delighted to see it on the device.

Performance is excellent, above what I was expecting from a non-dedicated games device. 

UtopiaGL Shaders

The Shader system is a pass-based renderer, which is configured by a compiled shader script.  The compiled shaders are tiny, usually around 150 bytes.  The reason I compile them offline is to remove the burden of parsing them on-device at run time.  They're not a million miles from Quake3 shaders, but they give lower-level control, for example you can fully control multi-texture (2 TMUs are expected) and the texture combiners.

I wrote a packing system for vertex attributes to minimize VGP and CPU cache misses.  It's very straightforward and works like this: Shaders expect certain attributes to be present in a vertex in order to execute, e.g. if a shader does multi-texture mapping and lighting then it needs XYZ, Normal, TC0 and TC1 attributes.  You have two options when representing these attributes in ram.  You can either have a Structure of Arrays (SOA) or you can have an Array of Structures (AOS).  SOA is conceptually easier.  Essentially you have an array of XYZs (one per vertex), an array of Normals (one per vertex) and so on.  This has the advantage that if you are performing CPU-based deformation on any attribute array, you limit cache misses.  You pay for it when the GPU gets to work though, because its cache is under-utilized as it pulls vertex information from very different locations in ram.  The alternative is to interleave the attributes à la AOS.  In this case you have an array of vertex structures, each of which has the XYZ, Normal, TC0 and TC1 attributes.  This means you under-utilize the cache if you need to perform CPU-based deformation on any attribute, but it means the cache is well utilized when the GPU is pulling in vertices.  My solution to the problem was to not use either method, but a hybrid of both.  Any static, non-changing attributes get interleaved into a per-vertex structure.  This doesn't incur any cache abuse w.r.t. the CPU because it never looks at them and the GPU maximizes cache hits as it pulls in the vertices.  Any volatile attributes that need to be processed by the CPU are arranged into arrays for quick processing and I take the cache miss on the GPU end which is essentially limited to those specific attributes.  You get the best of both worlds.
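The hybrid layout can be sketched in C++ like this, assuming for the sake of example that positions are the only volatile attribute and that Normal, TC0 and TC1 are static. The struct and function names are hypothetical; the two stride constants are the values you would hand to the GL vertex array pointer calls for each stream.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Static attributes, interleaved per vertex (array of structures).
struct StaticAttribs {
    float normal[3];
    float tc0[2];
    float tc1[2];
};

// Hybrid layout: static attributes interleaved, volatile ones in their own array.
struct HybridMesh {
    std::vector<StaticAttribs> interleaved;  // written once at load, read only by the GPU
    std::vector<float> positions;            // xyz triples, contiguous for CPU deformation
};

// Byte strides for the two vertex streams.
const std::size_t kStaticStride = sizeof(StaticAttribs);
const std::size_t kPositionStride = 3 * sizeof(float);

HybridMesh makeMesh(std::size_t vertexCount) {
    HybridMesh mesh;
    mesh.interleaved.resize(vertexCount);  // zero-initialized
    mesh.positions.assign(vertexCount * 3, 0.0f);
    return mesh;
}

// Example CPU deformation: touches only the tightly packed position array,
// so the static attributes never pass through the CPU cache.
void liftVertices(HybridMesh& mesh, float dy) {
    assert(mesh.positions.size() % 3 == 0);
    for (std::size_t i = 1; i < mesh.positions.size(); i += 3) {
        mesh.positions[i] += dy;
    }
}
```

Which attributes count as volatile depends on the shader and the content, so in practice the split would be decided per mesh by the tool chain and not hard-coded as it is here.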

Before the geometry ever gets to the Shader system, I apply a reordering algorithm to both the triangles and then the vertices to ensure maximum VGP cache usage.  It's a fast process and is performed offline in a Model Compilation tool which is part of the engine tool-chain.
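The second half of that reordering, putting vertices into the order in which the already sorted triangles first use them, can be sketched in C++ as below. The triangle sort itself is not shown, and the function and struct names are hypothetical; this is not the Model Compiler's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Remap {
    std::vector<std::uint16_t> indices;   // index list rewritten for the new vertex order
    std::vector<std::uint16_t> newToOld;  // newToOld[n] = original index of new vertex n
};

// Renumbers vertices in order of first use by the (already sorted) index list,
// so that vertex fetches walk memory roughly front to back.
Remap reorderVerticesByFirstUse(const std::vector<std::uint16_t>& indices,
                                std::size_t vertexCount) {
    const std::uint16_t kUnassigned = 0xFFFF;
    assert(vertexCount <= kUnassigned);  // keeps 0xFFFF free as a sentinel
    std::vector<std::uint16_t> oldToNew(vertexCount, kUnassigned);
    Remap out;
    for (std::uint16_t oldIndex : indices) {
        assert(oldIndex < vertexCount);
        if (oldToNew[oldIndex] == kUnassigned) {
            oldToNew[oldIndex] = static_cast<std::uint16_t>(out.newToOld.size());
            out.newToOld.push_back(oldIndex);
        }
        out.indices.push_back(oldToNew[oldIndex]);
    }
    return out;
}
```

The caller then copies vertex data so that new vertex `n` takes its attributes from original vertex `newToOld[n]`. Vertices that no triangle references are dropped from the mapping.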

EDIT: There seems to be an advantage to using strips with a call to glDrawElements: see my more recent post on gl performance.

One thing I found odd about writing the shader code was the PVR compressed texture support.  PVR compressed textures appear flipped along the y axis - they load upside down!  The apparent reason for this is to maintain consistency with the render-to-texture support.  That doesn't make any sense to me!  Anyway, our lives are now slightly harder - that's how it is, so unfortunately you need to add an entirely redundant step into your content build process to manually flip textures before PVR-compressing them, or resort to other hacks as you load the textures. Bad smell.  Apart from that oddity, the compression is excellent and the quality is likewise impressive.

EDIT: Since writing this, the PVR utilities were updated to allow you to parametrically flip-on-compress, which pretty much removes the issue.

Conclusion

iPhone development has been fun so far.  I've done a lot of work in mobile game development, and the iPhone is easily the best thing I've ever experienced in a mobile device.  I'll be submitting my first applications built using UtopiaGL to Apple soon, with a little luck.  Time permitting, I will blog about specific aspects of the engine in detail and iPhone development in general.