The numbers below come from AnandTech's iPad4 battery life benchmarks running GLBenchmark 2.5, GFXBench results, and Wikipedia's Apple SoC table. Note that AnandTech has the frame rate capped at 30 fps (both the iPhone5 and iPad4 can do over 40 fps).
device ... fps ... hours .. battery Wh .. Wh/hours
---------- ------- -------- ------------- --------
iPhone5 .. 30.0 .. 3.15 ... 5.45 ........ 1.73
iPad4 .... 30.0 .. 5.93 ... 43 .......... 7.25
iPad3 .... 22.4 .. 5.78 ... 42.5 ........ 7.35
The Wh/hours column is the average power draw in watts. It gives some idea of the difference in sustained power draw, limited by what a human holding the device will tolerate thermally, between a phone and a tablet. Another interesting metric is that poorly estimated wattage divided by screen area,
device .. in2 ... Wh/h .. Wh/h/in2
--------- ------- ------- --------
iPhone .. 6.7 ... 1.73 .. 0.26
iPad4 ... 45 .... 7.25 .. 0.16
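The arithmetic behind both tables is simple enough to sketch in a few lines of Python. This is only an illustration: the function names are made up here, and the iPhone5 screen area is assumed to be about 6.7 in2 (a 4-inch 16:9 panel), which is the value the 0.26 result implies.

```python
# Battery capacity (Wh), benchmark runtime (hours), approximate screen area (in2).
# The iPhone5 area of 6.7 in2 is an assumption; the iPad3 area is not listed.
devices = {
    "iPhone5": (5.45, 3.15, 6.7),
    "iPad4": (43.0, 5.93, 45.0),
    "iPad3": (42.5, 5.78, None),
}

def average_power_w(battery_wh, runtime_hours):
    """Average draw in watts: energy (Wh) divided by runtime (h)."""
    return battery_wh / runtime_hours

def power_per_area(battery_wh, runtime_hours, area_in2):
    """Average watts per square inch of screen."""
    return average_power_w(battery_wh, runtime_hours) / area_in2

for name, (wh, hours, area) in devices.items():
    line = f"{name:8s} {average_power_w(wh, hours):5.2f} W"
    if area is not None:
        line += f"  {power_per_area(wh, hours, area):4.2f} W/in2"
    print(line)
```

These are whole-device averages over a battery rundown, so they include the display and everything else, not just the SoC.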
Devices held by humans have this very real thermal barrier. AnandTech's Nexus 10 CoreMark MT + Modern Combat 3 load graph shows measured numbers of the tablet attempting to maintain a load of around 10W, splitting the power dynamically between the needs of the CPU and GPU. Around half the power is taken by everything other than the CPU+GPU, leaving around 5W for the CPU+GPU. As for the display's load, DisplayMate is a great resource; for instance, the HTC One's 468 ppi phone screen uses around 1W.
Minimizing CPU Load is Critical
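Taking the example tablet's 5W CPU+GPU power limit, GPU performance depends directly on CPU load, because whatever the CPU draws is no longer available to the GPU. An application which uses 1W of CPU leaves 4W for the GPU, while one which uses 4W of CPU leaves only 1W, so the first is going to have roughly 4x the GPU performance of the second.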
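A minimal sketch of that trade-off, assuming a fixed shared budget and assuming GPU performance scales linearly with GPU power (a simplification; real DVFS curves are not linear). The function name is hypothetical.

```python
SOC_BUDGET_W = 5.0  # example tablet CPU+GPU limit from the text

def gpu_budget_w(soc_budget_w, cpu_w):
    """Power left for the GPU when CPU and GPU share one fixed budget."""
    if not 0.0 <= cpu_w <= soc_budget_w:
        raise ValueError("CPU power must be within the shared budget")
    return soc_budget_w - cpu_w

light_cpu = gpu_budget_w(SOC_BUDGET_W, 1.0)  # 4.0 W left for the GPU
heavy_cpu = gpu_budget_w(SOC_BUDGET_W, 4.0)  # 1.0 W left for the GPU

# Under the linear-scaling assumption, the ratio of GPU power is the
# ratio of GPU performance.
print(light_cpu / heavy_cpu)  # 4.0
```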
Hopefully it is obvious how important optimized ultra-slim driver stacks will be on mobile devices. Also, why quad-core mobile CPUs? IMO the reverse direction is better: optimize such that, when given an ARM Big+Little configuration, the game and driver run on the Little core only (25% of the power and 40% of the performance of the Big core). The Cortex A53 (the Little 64-bit ARM) is going to be awesome. Those who are truly performance oriented (i.e. no object-oriented tax) will have no problem with the in-order 8-stage pipeline given 32 registers. The A53 should improve on the A7's partial dual-issue (and on top of that, the ARMv8 ISA can load a pair of registers with a single instruction). For those paying the OOP tax, the Big A57 is out-of-order with tri-issue.
Chip-Architect.com posted an interesting die size comparison which visualizes just how small ARM cores are in comparison with x86,
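To put the Little core numbers in perspective, here is a rough sketch using only the two ratios quoted above. The function names are made up for illustration, and the model ignores idle power and memory effects.

```python
# Ratios quoted in the text for a Little core relative to a Big core.
LITTLE_POWER_RATIO = 0.25  # 25% of the Big core's power
LITTLE_PERF_RATIO = 0.40   # 40% of the Big core's performance

def perf_per_watt_gain(power_ratio, perf_ratio):
    """Little core perf/W relative to the Big core."""
    return perf_ratio / power_ratio

def energy_per_task(power_ratio, perf_ratio):
    """Energy for a fixed amount of work relative to the Big core.

    Energy = power x time, and time scales as 1 / performance.
    """
    return power_ratio / perf_ratio

print(perf_per_watt_gain(LITTLE_POWER_RATIO, LITTLE_PERF_RATIO))
print(energy_per_task(LITTLE_POWER_RATIO, LITTLE_PERF_RATIO))
```

With these ratios the Little core does the same work for a bit under two thirds of the energy, provided the workload fits in 40% of the Big core's throughput.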
28nm ARM A7 ................... 0.45 mm2
28nm ARM A15 .................. 1.62 mm2
28nm AMD Jaguar CPU Core ...... 3.1_ mm2
32nm Intel Atom Clovertrail ... 5.6_ mm2
22nm Intel Haswell ........... 14.5_ mm2
Mobile Kepler (Full Desktop GL on Mobile)
Working from just the numbers posted in the AnandTech article, the demo shows a 0.9W Kepler with the same GLES benchmark performance as an iPad4 with a measured 2.6W GPU power rail (an estimated 2.9x perf/W improvement for Kepler). Scaling the iPad4's specs by the 4.0W/0.9W ratio, for the prior example tablet with very low CPU load (1W of CPU leaving 4W for the GPU), yields something faster than an Xbox360 for ALU,
76.8 Gflop/s = iPad4
341. Gflop/s = iPad4 x4.4
Looks like Xbox360-level GPU performance in a tablet will arrive in 2014.
With desktop-performance ARM devices pending, someone needs to do something about the lack of a successful open consumer desktop OS for ARM. Windows on ARM isn't going to cut it. Until something unexpected happens, I'm betting on a very long tail for Win7 and x86-64 instead. Note that NVIDIA's Q2 earnings show 7.5% growth for non-mobile GPUs during an industry decline of 7.4% for desktop and 13.9% for notebook PC shipments. The way I see it, people upgrade to 64-bit Win7 and then only need to upgrade the GPU periodically to keep in sync with new content. Desktop GPUs are still selling.
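The scaling estimate can be reproduced with a short Python sketch. It assumes performance scales linearly with GPU power, which is optimistic, and it uses the commonly quoted 240 Gflop/s figure for the Xbox360 GPU, a number that does not come from the AnandTech article.

```python
IPAD4_GFLOPS = 76.8      # iPad4 GPU ALU throughput
IPAD4_GPU_RAIL_W = 2.6   # measured iPad4 GPU power rail
KEPLER_DEMO_W = 0.9      # Kepler power at the same benchmark performance
GPU_BUDGET_W = 4.0       # 5W CPU+GPU budget minus 1W of CPU
XBOX360_GFLOPS = 240.0   # assumption: commonly quoted figure, not from the article

# Same performance at less power gives the perf/W improvement.
kepler_perf_per_watt_gain = IPAD4_GPU_RAIL_W / KEPLER_DEMO_W

# Assumes linear scaling from 0.9W up to the full 4.0W GPU budget.
budget_scale = GPU_BUDGET_W / KEPLER_DEMO_W
scaled_gflops = IPAD4_GFLOPS * budget_scale

print(f"perf/W gain:  {kepler_perf_per_watt_gain:.2f}x")
print(f"budget scale: {budget_scale:.2f}x")
print(f"scaled ALU:   {scaled_gflops:.1f} Gflop/s")
print(f"vs Xbox360:   {scaled_gflops / XBOX360_GFLOPS:.2f}x")
```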