Difference between revisions of "CPU/FPU Performance Benchmarks"
Revision as of 13:58, 14 September 2011
BogoMIPS
With the ALARM (Arch Linux ARM) port, the reported BogoMIPS values make more sense for both cores. Initially, Bob had reported that one of the cores was showing aberrant values.
[root@alarm ~]# cat /proc/cpuinfo
Processor : ARMv7 Processor rev 0 (v7l)
processor : 0
BogoMIPS : 1987.37
processor : 1
BogoMIPS : 1993.93
Features : swp half thumb fastmult vfp edsp thumbee vfpv3 vfpv3d16 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x1
CPU part : 0xc09
CPU revision : 0
Hardware : trimslice
Revision : 0000
Serial : 0000000000000000
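To check that the two cores now report consistent values, the per-core BogoMIPS figures can be pulled out of the output above. This is only a minimal sketch: the function names `bogomips_per_core` and `relative_spread` are made up for illustration, and the parsing assumes the `key : value` layout shown above.

```python
# Excerpt of the /proc/cpuinfo output shown above (only the relevant lines).
CPUINFO = """\
Processor : ARMv7 Processor rev 0 (v7l)
processor : 0
BogoMIPS : 1987.37
processor : 1
BogoMIPS : 1993.93
"""


def bogomips_per_core(text):
    """Map each core index to its reported BogoMIPS value."""
    result = {}
    core = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            # The lowercase "processor" key starts a per-core entry.
            core = int(value)
        elif key == "BogoMIPS" and core is not None:
            result[core] = float(value)
    return result


def relative_spread(values):
    """Difference between the largest and smallest value, relative to the largest."""
    return (max(values) - min(values)) / max(values)


cores = bogomips_per_core(CPUINFO)
spread = relative_spread(list(cores.values()))
print(cores, f"spread = {spread:.2%}")
```

With the values above, the two cores differ by well under one percent, which matches the observation that neither core is aberrant any more.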
CoreMark
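Good news! CoreMark with 4 threads shows a much better score. With the compiler optimizations described below it reaches 5372.01, higher than the best score reported for the Tegra 250 on the CoreMark Scores list, i.e. 5148.01.
Here are 2 separate runs without any hardware-specific compiler optimizations (plain -O2, as the logs show):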
First run gave us 4885.595636 iterations/sec:
[root@alarm coremark_v1.0]# cat run1.log
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 24562
Total time (secs): 24.562000
Iterations/Sec : 4885.595636
Iterations : 120000
Compiler version : GCC4.6.1
Compiler flags : -O2 -DPERFORMANCE_RUN=1 -lrt
Parallel PThreads : 4
Memory location : Please put data memory location here (e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[1]crclist : 0xe714
[2]crclist : 0xe714
[3]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[1]crcmatrix : 0x1fd7
[2]crcmatrix : 0x1fd7
[3]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[1]crcstate : 0x8e3a
[2]crcstate : 0x8e3a
[3]crcstate : 0x8e3a
[0]crcfinal : 0x5275
[1]crcfinal : 0x5275
[2]crcfinal : 0x5275
[3]crcfinal : 0x5275
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 4885.595636 / GCC4.6.1 -O2 -DPERFORMANCE_RUN=1 -lrt / Heap / 4:PThreads
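The last line of each log condenses the run into one record: score, compiler and flags, memory location, and parallelism. The sketch below splits such a line into named fields. The function `parse_summary` and the dictionary keys are hypothetical names chosen for illustration, and the field layout is assumed from the log lines on this page rather than from a formal specification.

```python
def parse_summary(line):
    """Split a 'CoreMark 1.0 : score / compiler flags / memory / N:method' line."""
    head, compiler, memory, parallel = [part.strip() for part in line.split(" / ")]
    benchmark, _, score = head.partition(":")
    compiler_name, _, flags = compiler.partition(" ")
    contexts, _, method = parallel.partition(":")
    return {
        "benchmark": benchmark.strip(),
        "score": float(score),
        "compiler": compiler_name,
        "flags": flags.split(),
        "memory": memory,
        "contexts": int(contexts),
        "method": method,
    }


# Summary line copied from the first run above.
summary = parse_summary(
    "CoreMark 1.0 : 4885.595636 / GCC4.6.1 -O2 -DPERFORMANCE_RUN=1 -lrt"
    " / Heap / 4:PThreads"
)
print(summary)
```

Splitting on " / " works here only because none of the compiler flags contain that sequence; a log with such a flag would need a more careful parser.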
Second run gave us 4888.979426 iterations/sec:
[root@alarm coremark_v1.0]# cat run1.log
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 24545
Total time (secs): 24.545000
Iterations/Sec : 4888.979426
Iterations : 120000
Compiler version : GCC4.6.1
Compiler flags : -O2 -DPERFORMANCE_RUN=1 -lrt
Parallel PThreads : 4
Memory location : Please put data memory location here (e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[1]crclist : 0xe714
[2]crclist : 0xe714
[3]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[1]crcmatrix : 0x1fd7
[2]crcmatrix : 0x1fd7
[3]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[1]crcstate : 0x8e3a
[2]crcstate : 0x8e3a
[3]crcstate : 0x8e3a
[0]crcfinal : 0x5275
[1]crcfinal : 0x5275
[2]crcfinal : 0x5275
[3]crcfinal : 0x5275
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 4888.979426 / GCC4.6.1 -O2 -DPERFORMANCE_RUN=1 -lrt / Heap / 4:PThreads
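The Iterations/Sec figure in each log follows directly from the Iterations and Total ticks fields. The sketch below reproduces it for the two runs above and checks how far apart they are. The rate of 1000 ticks per second is inferred from the logs (24562 ticks correspond to 24.562000 seconds) and is an assumption about this particular port; `coremark_score` is an illustrative name.

```python
def coremark_score(iterations, total_ticks, ticks_per_sec=1000):
    """Iterations per second, as reported in the Iterations/Sec field."""
    total_seconds = total_ticks / ticks_per_sec
    return iterations / total_seconds


# Iterations and Total ticks copied from the two logs above.
run1 = coremark_score(120000, 24562)
run2 = coremark_score(120000, 24545)

# Run-to-run variation, relative to the first run.
variation = abs(run2 - run1) / run1
print(f"run1 = {run1:.6f}, run2 = {run2:.6f}, variation = {variation:.3%}")
```

The two runs agree to within a tenth of a percent, so differences of several percent between builds can reasonably be attributed to the compiler flags rather than to noise.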
With Optimizations
The Tegra 250 SoC is built around a dual-core Cortex-A9 and, as the vfpv3d16 entry in the /proc/cpuinfo output above confirms, has an ARM VFPv3-D16 FPU. The hardware-specific compiler options I added were:
- -mcpu=cortex-a9
- -mfpu=vfpv3-d16
- -mfloat-abi=hard
I also raised the optimization level from -O2 to -O3 (see the compiler flags in the logs below) and added some loop and alignment flags:
- -floop-optimize
- -falign-loops
- -falign-labels
- -falign-functions
- -falign-jumps
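The GCC Compiler Optimizations list (http://gcc.gnu.org/onlinedocs/gcc-3.4.5/gcc/Optimize-Options.html) describes many other optimizations that might improve CoreMark performance further; these still need to be looked into. That list documents GCC 3.4.5, while the compiler used here is GCC 4.6.1, so some flags may behave differently: -floop-optimize appears to be accepted by GCC 4.x only for backward compatibility, and the -falign-* flags are documented as already enabled at -O2 and above. Many other ARM Optimizations (http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html) exist and can possibly be applied.
As expected, the results were much better than before: the best CoreMark score was 5372.011818 iterations/sec, roughly 10% above the runs without these optimizations.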
First run:
[root@alarm coremark_v1.0]# cat run1.log
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 29787
Total time (secs): 29.787000
Iterations/Sec : 5371.470776
Iterations : 160000
Compiler version : GCC4.6.1
Compiler flags : -O3 -floop-optimize -falign-loops -falign-labels -falign-functions -falign-jumps -mcpu=cortex-a9 -mfpu=vfpv3-d16 -mfloat-abi=hard -DPERFORMANCE_RUN=1 -lrt
Parallel PThreads : 4
Memory location : Please put data memory location here (e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[1]crclist : 0xe714
[2]crclist : 0xe714
[3]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[1]crcmatrix : 0x1fd7
[2]crcmatrix : 0x1fd7
[3]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[1]crcstate : 0x8e3a
[2]crcstate : 0x8e3a
[3]crcstate : 0x8e3a
[0]crcfinal : 0x25b5
[1]crcfinal : 0x25b5
[2]crcfinal : 0x25b5
[3]crcfinal : 0x25b5
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 5371.470776 / GCC4.6.1 -O3 -floop-optimize -falign-loops -falign-labels -falign-functions -falign-jumps -mcpu=cortex-a9 -mfpu=vfpv3-d16 -mfloat-abi=hard -DPERFORMANCE_RUN=1 -lrt / Heap / 4:PThreads
Second run:
[root@alarm coremark_v1.0]# cat run1.log
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 29784
Total time (secs): 29.784000
Iterations/Sec : 5372.011818
Iterations : 160000
Compiler version : GCC4.6.1
Compiler flags : -O3 -floop-optimize -falign-loops -falign-labels -falign-functions -falign-jumps -mcpu=cortex-a9 -mfpu=vfpv3-d16 -mfloat-abi=hard -DPERFORMANCE_RUN=1 -lrt
Parallel PThreads : 4
Memory location : Please put data memory location here (e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[1]crclist : 0xe714
[2]crclist : 0xe714
[3]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[1]crcmatrix : 0x1fd7
[2]crcmatrix : 0x1fd7
[3]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[1]crcstate : 0x8e3a
[2]crcstate : 0x8e3a
[3]crcstate : 0x8e3a
[0]crcfinal : 0x25b5
[1]crcfinal : 0x25b5
[2]crcfinal : 0x25b5
[3]crcfinal : 0x25b5
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 5372.011818 / GCC4.6.1 -O3 -floop-optimize -falign-loops -falign-labels -falign-functions -falign-jumps -mcpu=cortex-a9 -mfpu=vfpv3-d16 -mfloat-abi=hard -DPERFORMANCE_RUN=1 -lrt / Heap / 4:PThreads
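To put the four runs side by side, the sketch below computes the gain of the best optimized run over the best -O2 run and over the 5148.01 reference score cited in the CoreMark section. All scores are copied from this page; `percent_gain` is an illustrative helper, and the comparison says nothing about which individual flag is responsible for the gain.

```python
# Scores in iterations/sec, copied from the logs and text on this page.
BASELINE_RUNS = [4885.595636, 4888.979426]   # -O2, no hardware-specific flags
OPTIMIZED_RUNS = [5371.470776, 5372.011818]  # -O3 plus the flags listed above
REFERENCE_SCORE = 5148.01                    # best reported Tegra 250 score cited above


def percent_gain(new, old):
    """Relative improvement of new over old, in percent."""
    return (new / old - 1.0) * 100.0


best_baseline = max(BASELINE_RUNS)
best_optimized = max(OPTIMIZED_RUNS)

gain_vs_baseline = percent_gain(best_optimized, best_baseline)
gain_vs_reference = percent_gain(best_optimized, REFERENCE_SCORE)

print(f"vs. -O2 build:       +{gain_vs_baseline:.1f}%")
print(f"vs. reference score: +{gain_vs_reference:.1f}%")
```

The optimized build comes out close to 10% ahead of the -O2 build and a little over 4% ahead of the reference score.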