Or so goes the argument. Still.
I wish Apple would just pull the plug and completely remove MRC support from LLVM. I’m getting tired, annoyed and sometimes angry when I browse stackoverflow.com and frequently find MRC code samples containing one or more blatant memory management issues.
Before I rant any further: this article tests the performance difference between ARC and MRC code. I provide some examples, the updated performance measurement project I previously used for cocos2d performance analysis, and the results of the full run at the bottom. I also split the tests into synthetic low-level tests and closer-to-real-world algorithms to prove not one but two points:
ARC is generally faster, and ARC can indeed be slower – but that’s no reason to dismiss it altogether.
Measuring & Comparing Objective-C ARC vs MRC performance
Without further ado, here are the results of the low-level MRC vs ARC performance tests, obtained from an iPod touch 5th generation with compiler optimizations enabled (release build):
Times are in nanoseconds. In the result tables a period is a thousands separator, so 4.844 means 4,844 ns. A nanosecond (ns) is one billionth of a second (0.000 000 001 second). One Gigahertz (GHz) equals 1,000,000,000 Hz. One cycle of a 1 GHz CPU takes 1 nanosecond. A game that renders every frame in less than 16,700,000 nanoseconds runs at 60 frames per second. In other words, at roughly 34 ns per plain Objective-C message send (see the Messaging / Function Calls table in the complete results), you can perform close to half a million message sends every frame before your framerate drops below 60.
ARC vs MRC: Messaging Tests
Name | Each (ns) |
ARC: Message Send to Object | 49 |
MRC: Message Send to Object | 86 |
ARC: Assign (nonatomic, copy) property | 329 |
MRC: Assign (nonatomic, copy) property | 331 |
ARC vs MRC: Alloc/Init Tests
Name | Each (ns) |
ARC: Create Alloc/Init Object | 4.844 |
MRC: Create Alloc/Init Object | 4.881 |
ARC vs MRC: Autorelease Tests
Name | Each (ns) |
ARC: Create & Return Autorelease Object | 5.769 |
ARC: Create Autorelease Object | 5.840 |
MRC: Create & Return Autorelease Object | 5.843 |
MRC: Create Autorelease Object | 6.625 |
So in the low-level tests ARC and MRC are about equal, with ARC usually having a slight advantage, mainly due to faster autorelease pools and other low-level optimizations. This is why Apple says that ARC is generally faster than MRC.
Where ARC can actually be slower
ARC is not faster in every case, though, specifically where existing MRC code has already been fine-tuned for the best possible performance with regard to memory usage.
This is because in some situations ARC adds retain/release messages that are more or less optional under MRC. This happens, for example, when you send a message whose selector takes one or more object (id) parameters. It also happens when you receive a temporary object from a method: it gets retained and then released even though its lifetime is short and contained. For example:
// this is typical MRC code:
id object = [array objectAtIndex:0];
[object doSomething];
[object doAnotherThing];
// this is what ARC does (and what is considered best practice under MRC):
id object = [array objectAtIndex:0];
[object retain]; // inserted by ARC
[object doSomething];
[object doAnotherThing];
[object release]; // inserted by ARC
And again when sending a message whose selector takes one or more objects as parameters:
// this is typical MRC code:
id object = [array objectAtIndex:0];
[object doSomething];
[object doAnotherThing];
// this is what ARC does (and what is considered best practice under MRC):
id object = [array objectAtIndex:0];
[object retain]; // inserted by ARC
[object doSomething];
[object doAnotherThing];
[object release]; // inserted by ARC
ARC adds these additional retain/release calls because, above all else, ARC ensures correctness. And that code is correct under MRC as well. So why don’t we usually add these additional retain/release calls in MRC, and why does it still work?
Because most of us aren’t writing multi-threaded applications, and even those who do probably won’t realize that they’re treading dangerously close to the edge. In multithreaded applications said object could be released between the doSomething and doAnotherThing calls, thus crashing your app. Granted, it’s rare, but when it rears its ugly head you’ll wish you didn’t have to debug it.
This explains why some algorithms run slower under ARC than the same code written in MRC. I took this Objective-C ARC genetic algorithm and converted it to MRC to make the comparison. You’ll see in the results that the ARC version is noticeably slower than the MRC version, mainly because of the additional retain/release calls:
ARC vs MRC: Algorithm Tests
Name | Each (ns) |
ARC: Unoptimized Contains String algorithm | 1.696.808 |
MRC: Unoptimized Contains String algorithm | 1.752.414 |
MRC: genetic algorithm | 1.134.536.017 |
ARC: genetic algorithm | 1.579.992.123 |
PS: don’t ask me why the “contains string” algorithm is faster under ARC. It just is.
Avoid using ARC for best performance, right?
No way!
Whoever thinks like that hasn’t understood what programming or ARC are all about. First and foremost, you want to write an app and release it without bugs, especially no crashes. ARC helps you tremendously with that.
While writing the small amount of MRC code for this project, I introduced several leaks, one of which wasn’t even caught by the analyzer. It’s just too damn easy to write incorrect manual reference counting code. And it’s even more code to read and write, too. With ARC I have not had a single leak or other such issue in months! The static analyzer has changed from a tool that I used to run daily to one I run on a monthly basis only to confirm that there are few or no memory management issues.
For beginners MRC is even worse, because not understanding MRC they’ll randomly insert or remove release, retain and autorelease messages until it somehow works. Usually they just shift the problem around or change the nature of it: a crash becomes a leak, a leak becomes a dangling pointer. It makes me want to cry. It makes me want to not help anyone who is still using MRC code for no reason or simply because that’s the default in cocos2d.
That every day thousands of cute kittens have to die because developers are starting new cocos2d projects with MRC templates makes me really sad. I find solace only in knowing that Kobold2D and KoboldTouch help to preserve the imaginary kitten population.
Good programmers use ARC!
Spending as little time as possible on extraneous tasks, such as manual memory management and debugging the issues it causes, is a programmer’s primary concern, maybe second only to writing readable and maintainable code. Performance is way at the bottom of the priority list, even for game developers.
If you’re more concerned about performance than code correctness, you’re not doing a good job because you set the wrong priorities. ARC ensures that you get the most out of your time and your app’s runtime.
And then any performance difference you can measure between ARC and MRC code in synthetic tests is practically null and void in real world applications. You’re usually doing a lot more elsewhere, like rendering, and often you don’t even have much control over those parts (game engine, foundation, etc). And you always have room for optimization just by writing better code.
The genetic algorithm I picked is such an example. The code repeats a terribly wasteful memory management pattern regardless of whether you’re using ARC or MRC. Refactor or rewrite the essential parts of the code in C, or just optimized Objective-C, and you’ll likely see gains far greater than the difference between ARC and MRC in the test result above.
Finally, typically less than 5% of your code contributes to 95% of its runtime performance! It would be downright stupid not to use ARC for the remaining 95%, and hand-tune only the 5% where it’ll make an actual difference in responsiveness, speed or framerate.
Final words
Arguing against using ARC because it may be slower in some well-defined situations really makes me wonder one thing: if you’re that concerned about performance, why are you even using Objective-C to begin with?
Objective-C has been proven to be slower than C countless times already. Your argument is invalid!
You can certainly make significantly greater performance improvements by rewriting certain performance-critical code to C than you could ever hope to achieve by avoiding ARC and sticking to MRC. Fortunately only very few developers will ever have to consider doing so.
And why are we using Objective-C?
It’s because Objective-C is easier to use and safer than plain C! In that sense, using ARC is a logical next step for any Objective-C programmer because it makes your work even easier and your app more stable. And when you do find an actually noticeable performance difference, as opposed to one that’s only measurable, you can always -fno-objc-arc the hell out of that code.
Complete Test Results (iPod touch 5G)
This is from a different run, so the values you’ve already seen earlier may differ. Comparing the results gives you an indication of how much of an error range there is in these values.
The results involving cocos2d are not directly comparable to the previous performance tests I did, because the updated performance test project uses cocos2d v2.1 (rc0a) whereas the previous one used cocos2d v1.0.1.
ARC vs MRC: Messaging Tests
Name | Each (ns) |
ARC: Message Send to Object | 61 |
MRC: Message Send to Object | 61 |
ARC: Assign (nonatomic, copy) property | 329 |
MRC: Assign (nonatomic, copy) property | 330 |
ARC vs MRC: Alloc/Init Tests
Name | Each (ns) |
ARC: Create Alloc/Init Object | 4.747 |
MRC: Create Alloc/Init Object | 4.811 |
ARC vs MRC: Autorelease Tests
Name | Each (ns) |
ARC: Create Autorelease Object | 5.503 |
ARC: Create & Return Autorelease Object | 5.608 |
MRC: Create & Return Autorelease Object | 5.612 |
MRC: Create Autorelease Object | 6.342 |
ARC vs MRC: Algorithm Tests
Name | Each (ns) |
ARC: Unoptimized Contains String algorithm | 1.691.563 |
MRC: Unoptimized Contains String algorithm | 1.750.430 |
MRC: genetic algorithm | 1.144.589.482 |
ARC: genetic algorithm | 1.559.268.544 |
Loading Textures
Time it takes to load and unload the same 1024×1024 texture using a variety of different image file formats, compression and color bit depths.
Name | Each (ns) |
PVRTC2.pvr.ccz | 300.497 |
PVRTC4.pvr.ccz | 306.025 |
PVRTC2.pvr | 3.752.015 |
PVRTC4.pvr | 7.255.280 |
PVRTC2.pvr.gz | 10.308.318 |
PVRTC4.pvr.gz | 19.612.099 |
RGBA8888.pvr | 57.411.790 |
RGBA8888.pvr.ccz | 85.919.379 |
RGBA8888.png | 98.239.494 |
RGBA8888.pvr.gz | 100.842.780 |
Node Hierarchy (children)
The performance of functions that act on the node hierarchy (children list) depends heavily on the number of children.
Name | Each (ns) |
reorderChild w/ 100 Nodes | 258 |
reorderChild w/ 2,500 Nodes | 258 |
reorderChild w/ 10 Nodes | 259 |
reorderChild w/ 500 Nodes | 261 |
getChildByTag w/ 10 Nodes | 1.188 |
getChildByTag w/ 100 Nodes | 5.998 |
addChild with tag | 6.758 |
removeChildByTag | 16.493 |
getChildByTag w/ 500 Nodes | 29.022 |
getChildByTag w/ 2,500 Nodes | 178.572 |
Array Tests
Testing Cocos2D’s CCArray performance against regular NSMutableArray.
Name | Each (ns) |
CCArray objectAtIndex: | 248 |
CCArray exchangeObjectAtIndex | 267 |
NSMutableArray objectAtIndex: | 285 |
CFArray GetValueAtIndex | 319 |
CCArray withCapacity addObject: | 385 |
CCArray addObject: | 385 |
NSMutableArray insertObject: atIndex:0 | 468 |
NSMutableArray withCapacity addObject: | 539 |
NSMutableArray addObject: | 540 |
CCArray removeLastObject | 734 |
NSMutableArray removeLastObject | 1.039 |
CCArray removeObjectAtIndex | 1.546 |
NSMutableArray removeObjectAtIndex | 1.776 |
NSMutableArray exchangeObjectAtIndex | 2.069 |
NSMutableArray insertObject: atIndex:random | 2.709 |
CCArray insertObject: atIndex:random | 4.479 |
CCArray insertObject: atIndex:0 | 8.430 |
NSMutableArray fast enumeration | 9.353 |
CCArray indexOfObject | 17.732 |
CCArray containsObject | 17.799 |
NSMutableArray indexOfObject | 46.535 |
NSMutableArray containsObject | 47.374 |
NSMutableArray makeObjectsPerformSelector | 125.084 |
NSMutableArray makeObjectsPerformSelector withObject | 137.865 |
CCArray makeObjectsPerformSelector | 138.679 |
CCArray makeObjectsPerformSelector withObject | 152.657 |
CCArray fast enumeration | 333.421 |
CCArray enumeration | 408.957 |
NSMutableArray enumeration | 450.680 |
NSMutableArray add/removeObjectsInArray | 1.223.791 |
CCArray add/removeObjectsInArray | 2.732.601 |
Object Creation
These tests tell you how long it takes to allocate memory, initialize the object, and deallocate it. The longer this takes for an object, the higher the chance that doing this during gameplay will negatively affect performance. Note that these tests do not give any indication whatsoever of the runtime/rendering performance of these objects.
Name | Each (ns) |
NSAutoreleasePool alloc/init/release | 225 |
NSObject alloc/init/release | 1.868 |
CCMoveTo alloc/init/release | 2.869 |
CCSequence alloc/initOne/release | 3.864 |
CCNode alloc/init/release | 4.663 |
CCSprite alloc/initWithFile/release | 21.914 |
CCParticleSystemQuad 25 particles alloc/init/release | 284.135 |
CCParticleSystemQuad 250 particles alloc/init/release | 675.596 |
CCLabelBMFont alloc/initWithString/release | 897.208 |
CCLabelTTF alloc/initWithString/release | 1.385.200 |
CCTMXTiledMap small alloc/init/release | 6.494.550 |
CCTMXTiledMap large alloc/init/release | 492.783.615 |
CCSprite GCD alloc/initWithFile/release | 12.424.037.083 |
Messaging / Function Calls
Low-level overhead of calling C++ functions and of sending Objective-C messages in various ways.
Name | Each (ns) |
ObjC class @public variable | 3 |
C++ cached virtual method call | 11 |
C++ virtual method call | 11 |
IMP-cached message send | 11 |
Objective-C message send | 34 |
Objective-C performSelector | 115 |
ObjC class nonatomic property dot notation | 127 |
ObjC class atomic property dot notation | 141 |
ObjC class nonatomic property message send | 160 |
ObjC class atomic property message send | 164 |
NSInvocation message send | 2.101 |
Objective-C CCArray message send | 43.796 |
Objective-C NSArray message send | 43.829 |
Objective-C NSArray enumerateWithBlock msg send | 70.741 |
Objective-C NSArray enumerateWithBlock concurrent msg send | 100.265 |
Objective-C NSArray makeObjectsPerformSelector | 121.572 |
Objective-C CCArray makeObjectsPerformSelector | 138.597 |
Object Comparison
Comparing objects with various methods, and testing whether it makes any difference if the comparison fails (mismatch) or succeeds (match).
Name | Each (ns) |
NSObject hash | 35 |
NSObject is not Equal | 50 |
NSString isEqual | 50 |
NSString isEqualToString | 50 |
NSObject isEqual | 52 |
isMemberOfClass, class cached | 144 |
is not MemberOfClass, class cached | 144 |
NSString hash | 148 |
isMemberOfClass | 199 |
is not MemberOfClass | 200 |
isKindOfClass, class cached | 224 |
is not KindOfClass, class cached | 260 |
NSString is not EqualToString | 277 |
NSString is not Equal | 277 |
isKindOfClass | 278 |
is not KindOfClass | 338 |
Arithmetic Tests
Simple calculations done frequently in an app, using various data types.
Name | Each (ns) |
Double square root | -0 |
Float division with int conversion | -0 |
Float square root | -0 |
Double division with int conversion | -0 |
Integer multiplication | 0 |
Integer division | 0 |
Float multiplication | 5 |
Double multiplication | 9 |
Float division | 20 |
Accelerometer Highpass filter | 20 |
Double division | 33 |
Memory Tests
Allocating and releasing memory.
Name | Each (ns) |
16MB malloc/free | -0 |
1MB memcpy | 0 |
16 byte memcpy | 0 |
16 byte malloc/free | 0 |
File IO
Name | Each (ns) |
Read 16-byte file | 109.127 |
Read 16MB file | 110.008 |
Write 16-byte file | 202.430 |
Write 16MB file | 219.193 |
Write 16-byte file (atomic) | 359.137 |
Write 16MB file (atomic) | 362.201 |
Miscellaneous Tests
Name | Each (ns) |
Zero-second delayed perform | 7 |
pthread create/join | 158.470 |