CPU/FPU Performance Benchmarks

From Bobs Projects

Revision as of 13:58, 14 September 2011

BogoMIPS

With the ALARM (Arch Linux ARM) port, the reported BogoMIPS values make more sense for both cores: the two cores now agree to within about 0.3% (1987.37 and 1993.93). Initially, Bob had reported that one of the cores was showing aberrant values.


[root@alarm ~]# cat /proc/cpuinfo
Processor	: ARMv7 Processor rev 0 (v7l)
processor	: 0
BogoMIPS	: 1987.37

processor	: 1
BogoMIPS	: 1993.93

Features	: swp half thumb fastmult vfp edsp thumbee vfpv3 vfpv3d16 tls 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x1
CPU part	: 0xc09
CPU revision	: 0

Hardware	: trimslice
Revision	: 0000
Serial		: 0000000000000000
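
To compare the per-core values without reading them off by eye, the output above can be parsed with a few lines of Python 3. This is only a sketch: the function name `bogomips_per_core` and the `SAMPLE` string are made up for illustration, and the sample is a trimmed copy of the output above rather than a live read of /proc/cpuinfo (other kernels may format the file differently).

```python
def bogomips_per_core(cpuinfo_text):
    """Return {core_index: bogomips} parsed from /proc/cpuinfo-style text."""
    cores = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key : value" shape
        key, value = key.strip(), value.strip()
        # The lowercase "processor" key starts a per-core block; the
        # capitalised "Processor" line is the model name and is ignored.
        if key == "processor":
            current = int(value)
        elif key == "BogoMIPS" and current is not None:
            cores[current] = float(value)
    return cores


# Trimmed copy of the output shown above (hypothetical constant name).
SAMPLE = """\
Processor\t: ARMv7 Processor rev 0 (v7l)
processor\t: 0
BogoMIPS\t: 1987.37

processor\t: 1
BogoMIPS\t: 1993.93
"""

if __name__ == "__main__":
    values = bogomips_per_core(SAMPLE)
    spread = max(values.values()) - min(values.values())
    print(values, "spread = %.2f" % spread)
```

A large spread between cores would reproduce the aberrant behaviour reported initially; here the two values differ by only a few BogoMIPS.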

CoreMark

Good news! Running CoreMark with 4 threads gives a much better score. With the optimizations described further down, it is higher than the best reported score for the Tegra 250 (5148.01) listed on the CoreMark Scores page.

Here are two separate runs without any hardware-specific compiler optimizations (plain -O2):

First run gave us 4885.595636 iterations/sec:


[root@alarm coremark_v1.0]# cat run1.log
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 24562
Total time (secs): 24.562000
Iterations/Sec   : 4885.595636
Iterations       : 120000
Compiler version : GCC4.6.1
Compiler flags   : -O2 -DPERFORMANCE_RUN=1  -lrt
Parallel PThreads : 4
Memory location  : Please put data memory location here
			(e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[1]crclist       : 0xe714
[2]crclist       : 0xe714
[3]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[1]crcmatrix     : 0x1fd7
[2]crcmatrix     : 0x1fd7
[3]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[1]crcstate      : 0x8e3a
[2]crcstate      : 0x8e3a
[3]crcstate      : 0x8e3a
[0]crcfinal      : 0x5275
[1]crcfinal      : 0x5275
[2]crcfinal      : 0x5275
[3]crcfinal      : 0x5275
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 4885.595636 / GCC4.6.1 -O2 -DPERFORMANCE_RUN=1  -lrt / Heap / 4:PThreads 

Second run gave us 4888.979426 iterations/sec:


[root@alarm coremark_v1.0]# cat run1.log
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 24545
Total time (secs): 24.545000
Iterations/Sec   : 4888.979426
Iterations       : 120000
Compiler version : GCC4.6.1
Compiler flags   : -O2 -DPERFORMANCE_RUN=1  -lrt
Parallel PThreads : 4
Memory location  : Please put data memory location here
			(e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[1]crclist       : 0xe714
[2]crclist       : 0xe714
[3]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[1]crcmatrix     : 0x1fd7
[2]crcmatrix     : 0x1fd7
[3]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[1]crcstate      : 0x8e3a
[2]crcstate      : 0x8e3a
[3]crcstate      : 0x8e3a
[0]crcfinal      : 0x5275
[1]crcfinal      : 0x5275
[2]crcfinal      : 0x5275
[3]crcfinal      : 0x5275
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 4888.979426 / GCC4.6.1 -O2 -DPERFORMANCE_RUN=1  -lrt / Heap / 4:PThreads
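
The Iterations/Sec figure in each log is simply Iterations divided by Total time (secs). As a sanity check, the following Python 3 sketch recomputes it from the other fields. The helper names (`parse_coremark_log`, `recomputed_rate`) and the `RUN1`/`RUN2` constants are hypothetical; the constants are excerpts of the two logs above.

```python
def parse_coremark_log(text):
    """Collect the 'Key : value' lines of a CoreMark log into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def recomputed_rate(fields):
    """Iterations divided by the total run time in seconds."""
    return int(fields["Iterations"]) / float(fields["Total time (secs)"])


# Excerpts of the two unoptimized logs shown above.
RUN1 = """\
Total ticks      : 24562
Total time (secs): 24.562000
Iterations/Sec   : 4885.595636
Iterations       : 120000
"""

RUN2 = """\
Total ticks      : 24545
Total time (secs): 24.545000
Iterations/Sec   : 4888.979426
Iterations       : 120000
"""

if __name__ == "__main__":
    for name, log in (("run 1", RUN1), ("run 2", RUN2)):
        fields = parse_coremark_log(log)
        reported = float(fields["Iterations/Sec"])
        print("%s: reported %.6f, recomputed %.6f"
              % (name, reported, recomputed_rate(fields)))
```

Both runs use the same iteration count, so the small difference between the two scores comes entirely from the 17-tick difference in run time.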

With Optimizations

The Tegra 250 SoC is equipped with a Cortex-A9 processor whose FPU is an ARM VFPv3-D16 (the vfpv3d16 feature in the /proc/cpuinfo output above). The hardware-specific compiler options I added were:

  • -mcpu=cortex-a9
  • -mfpu=vfpv3-d16
  • -mfloat-abi=hard

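I also raised the optimization level from -O2 to -O3 (see the Compiler flags line in the logs) and added some loop optimization and code alignment flags: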

  • -floop-optimize
  • -falign-loops
  • -falign-labels
  • -falign-functions
  • -falign-jumps

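The GCC Compiler Optimizations list (http://gcc.gnu.org/onlinedocs/gcc-3.4.5/gcc/Optimize-Options.html) describes several other optimizations that may improve CoreMark performance; these still need to be looked into. Many other ARM Optimizations (http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html) exist and could possibly be applied as well.

As expected, the results were much better than before: the best CoreMark score was 5372.011818 iterations/sec, roughly 10% higher than the runs without these optimizations.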

First run:

[root@alarm coremark_v1.0]# cat run1.log
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 29787
Total time (secs): 29.787000
Iterations/Sec   : 5371.470776
Iterations       : 160000
Compiler version : GCC4.6.1
Compiler flags   : -O3 -floop-optimize -falign-loops -falign-labels -falign-functions 
-falign-jumps -mcpu=cortex-a9 -mfpu=vfpv3-d16 -mfloat-abi=hard -DPERFORMANCE_RUN=1  -lrt
Parallel PThreads : 4
Memory location  : Please put data memory location here
			(e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[1]crclist       : 0xe714
[2]crclist       : 0xe714
[3]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[1]crcmatrix     : 0x1fd7
[2]crcmatrix     : 0x1fd7
[3]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[1]crcstate      : 0x8e3a
[2]crcstate      : 0x8e3a
[3]crcstate      : 0x8e3a
[0]crcfinal      : 0x25b5
[1]crcfinal      : 0x25b5
[2]crcfinal      : 0x25b5
[3]crcfinal      : 0x25b5
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 5371.470776 / GCC4.6.1 -O3 -floop-optimize -falign-loops -falign-labels 
-falign-functions -falign-jumps -mcpu=cortex-a9 -mfpu=vfpv3-d16 -mfloat-abi=hard 
-DPERFORMANCE_RUN=1  -lrt / Heap / 4:PThreads

Second run:

[root@alarm coremark_v1.0]# cat run1.log
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 29784
Total time (secs): 29.784000
Iterations/Sec   : 5372.011818
Iterations       : 160000
Compiler version : GCC4.6.1
Compiler flags   : -O3 -floop-optimize -falign-loops -falign-labels -falign-functions 
-falign-jumps -mcpu=cortex-a9 -mfpu=vfpv3-d16 -mfloat-abi=hard -DPERFORMANCE_RUN=1  -lrt
Parallel PThreads : 4
Memory location  : Please put data memory location here
			(e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[1]crclist       : 0xe714
[2]crclist       : 0xe714
[3]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[1]crcmatrix     : 0x1fd7
[2]crcmatrix     : 0x1fd7
[3]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[1]crcstate      : 0x8e3a
[2]crcstate      : 0x8e3a
[3]crcstate      : 0x8e3a
[0]crcfinal      : 0x25b5
[1]crcfinal      : 0x25b5
[2]crcfinal      : 0x25b5
[3]crcfinal      : 0x25b5
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 5372.011818 / GCC4.6.1 -O3 -floop-optimize -falign-loops -falign-labels 
-falign-functions -falign-jumps -mcpu=cortex-a9 -mfpu=vfpv3-d16 -mfloat-abi=hard 
-DPERFORMANCE_RUN=1  -lrt / Heap / 4:PThreads
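
To put a number on the improvement, the four scores on this page can be compared with a short Python 3 calculation. This is a rough sketch with only two samples per configuration; the constant and function names are hypothetical, and the values are copied from the logs above and from the Tegra 250 figure quoted earlier.

```python
# Iterations/Sec values copied from the four logs on this page.
UNOPTIMIZED = [4885.595636, 4888.979426]  # -O2 runs
OPTIMIZED = [5371.470776, 5372.011818]    # -O3 plus Cortex-A9 and alignment flags
TEGRA250_BEST_REPORTED = 5148.01          # best reported Tegra 250 score quoted above


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def percent_gain(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    gain = percent_gain(mean(OPTIMIZED), mean(UNOPTIMIZED))
    margin = percent_gain(max(OPTIMIZED), TEGRA250_BEST_REPORTED)
    print("optimized vs. unoptimized (mean of two runs): %+.2f%%" % gain)
    print("best optimized run vs. best reported Tegra 250 score: %+.2f%%" % margin)
```

Only the optimized build exceeds the reported Tegra 250 figure; both unoptimized runs fall below it.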