How do I compare a program's performance before and after OpenMP optimization?



I applied OpenMP optimization to a program that computes pi:

#include <ti/omp/omp.h>

#include <string.h>
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS  4

#define NUM_STEPS 100000
double step, pi;

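int main(void)
{
	int i;
	double x, sum = 0.0;

	omp_set_num_threads(NTHREADS);
	step = 1.0/(double)NUM_STEPS;
	/* Midpoint-rule integration of 4/(1+x^2) over [0,1]. Each thread gets a
	   private x and a partial sum; reduction(+) adds the partial sums. */
#pragma omp parallel for reduction(+:sum) private(x)
	for(i = 0; i < NUM_STEPS; i++){
		x = (i + 0.5)*step;
		sum = sum + 4.0/(1.0 + x*x);
	}
	pi = step * sum;
	printf("Pi = %f\n", pi);
	return 0;
}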

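I used the Profiler tool to look at the clock cycle counts before and after the optimization, and found that with 4 cores the cycle count of the main function differs enormously: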

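Before optimization (single core) it is 32598913 cycles, while after optimization (4 cores) the cycle count of main on core0 is 3179633, a difference of nearly ten times.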
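
Written out as arithmetic, and assuming for the moment that both figures cover the same work (which is exactly what is in doubt), the implied speedup and per-core efficiency are as follows. The symbols $S$, $E$, $T_1$, $T_4$ and $p$ are notation introduced here for illustration, not Profiler output:

$$S = \frac{T_1}{T_4} = \frac{32598913}{3179633} \approx 10.25, \qquad E = \frac{S}{p} \approx \frac{10.25}{4} \approx 2.56$$

An efficiency well above 1 on 4 cores suggests the two counts are probably not measuring the same thing.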

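Clearly 4 cores cannot produce a speedup like this, but I don't know how the comparison should be made. How can I measure the real effect of the optimization, in terms of both time and space?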