public inbox for gcc@gcc.gnu.org
* openMP gcc vs icc, erratic results with gcc
@ 2008-05-21  7:18 diego sandoval
  2008-05-21  8:47 ` Theodore Papadopoulo
From: diego sandoval @ 2008-05-21  7:18 UTC
  To: gcc

Hi everybody,
I just started working with OpenMP. I first installed gcc-4.2.3 and
then gcc-4.3.0, both of which support OpenMP.
I tried a program that calculates the product \pi*\e. When I compile the
code with gcc (both 4.2.3 and 4.3.0) without -fopenmp, the result is
correct. When I compile with the -fopenmp option, the result is wrong.
I also tried the Intel compiler icc (with -openmp) in order to
verify the code's correctness, and there was no problem. I don't know what
is wrong with gcc and this particular code, but the results are
erratic. If any of you can help me, thanks in advance.

Let me elaborate on this problem.

I am using gcc-4.3.0 on vanilla Slackware 12.0, on a quad-core SMP machine:

$ uname -a
Linux ra 2.6.24.3-smp #1 SMP Wed Feb 27 18:46:56 COT 2008 i686
Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz GenuineIntel GNU/Linux

$ gcc-4.3.0 -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ./configure
--prefix=/home/medrano/compilers/gcc-4.3.0/ --enable-shared
--enable-languages=c,c++ --enable-threads=posix --enable-__cxa_atexit
--disable-checking --with-gnu-ld --verbose --program-suffix=-4.3.0
Thread model: posix
gcc version 4.3.0 (GCC)



I tried the OpenMP code below (from
http://www.kallipolis.com/openmp/taylor_mp.c), which is supposed to
calculate the product \pi*\e using Taylor series.

$ gcc-4.3.0 -O2 -fopenmp taylor_mp.c -o taylor.gcc.out
$ icc -O2 -openmp taylor_mp.c -o taylor.intel.out

The results are:

$ ./taylor.gcc.out
Reached result 5.142145 in 10640.000 seconds ### wrong result
$ ./taylor.gcc.out
Reached result 10.795894 in 10660.000 seconds ### wrong result

$ ./taylor.intel.out
Reached result 8.539734 in 9950.000 seconds ### right result
$ ./taylor.intel.out
Reached result 8.539734 in 9570.000 seconds  ### right result




/*
 * taylor.c
 *
 * This program calculates the value of e*pi by first calculating e
 * and pi by their taylor expansions and then multiplying them
 * together.
 */

#include <stdio.h>

#include <time.h>

#define num_steps 20000000

int main(int argc, char *argv[])
{
  double start, stop; /* times of beginning and end of procedure */
  double e, pi, factorial, product;
  int i;


  /* start the timer */
  start = clock();

  /* Now there is no first and second, we calculate e and pi */
#pragma omp parallel sections shared(e, pi)
  {
#pragma omp section
    {
      printf("e started\n");

      e = 1;
      factorial = 1; /* rather than recalculating the factorial from
                        scratch each iteration we keep it in this variable
                        and multiply it by i each iteration. */
      for (i = 1; i<num_steps; i++) {

	factorial *= i;
	e += 1.0/factorial;
      }
      printf("e done\n");
    } /* e section */

#pragma omp section
    {
      /* In this thread we calculate pi expansion */
      printf("pi started\n");


      pi = 0;
      for (i = 0; i < num_steps*10; i++) {
	/* we want 1/1 - 1/3 + 1/5 - 1/7 etc.
	   therefore we count by fours (0, 4, 8, 12...) and take
             1/(0+1) =  1/1
	   - 1/(0+3) = -1/3

             1/(4+1) =  1/5
	   - 1/(4+3) = -1/7 and so on */
	pi += 1.0/(i*4.0 + 1.0);
	pi -= 1.0/(i*4.0 + 3.0);
      }
      pi = pi * 4.0;
      printf("pi done\n");
    } /* pi section */


  } /* omp sections */
  /* at this point the threads should rejoin */

  product = e * pi;

  stop = clock();

  printf("Reached result %f in %.3f seconds\n", product, (stop-start)/1000);


  return 0;
}


* Re: openMP gcc vs icc, erratic results with gcc
  2008-05-21  7:18 openMP gcc vs icc, erratic results with gcc diego sandoval
@ 2008-05-21  8:47 ` Theodore Papadopoulo
From: Theodore Papadopoulo @ 2008-05-21  8:47 UTC
  To: diego sandoval; +Cc: gcc

diego sandoval wrote:
> Hi everybody,
> I just started working with OpenMP. I first installed gcc-4.2.3 and
> then gcc-4.3.0, both of which support OpenMP.
> I tried a program that calculates the product \pi*\e. When I compile the
> code with gcc (both 4.2.3 and 4.3.0) without -fopenmp, the result is
> correct. When I compile with the -fopenmp option, the result is wrong.
> I also tried the Intel compiler icc (with -openmp) in order to
> verify the code's correctness, and there was no problem. I don't know what
> is wrong with gcc and this particular code, but the results are
> erratic. If any of you can help me, thanks in advance.
>   
I do not know where the problem lies, but the good news seems to be that
it is corrected in the development version (gcc version 4.4.0 20080414
(experimental)), and/or it works fine on x86-64 (both 32 and 64 bit).
I would not have expected such a speedup in the 64-bit version
(the timings are in milliseconds, not in seconds)!

/usr/local/gcc/bin/gcc -fopenmp -O3 taylor.c
./a.out
Reached result 8.539734 in 5650.000 seconds

/usr/local/gcc/bin/gcc -fopenmp -O3 -m32 taylor.c
./a.out
Reached result 8.539734 in 10410.000 seconds


* Re: openMP gcc vs icc, erratic results with gcc
  2008-05-21 10:21 Ed Brambley
@ 2008-05-21 11:00 ` Jakub Jelinek
From: Jakub Jelinek @ 2008-05-21 11:00 UTC
  To: Ed Brambley; +Cc: gcc, Diego Sandoval

On Wed, May 21, 2008 at 11:21:27AM +0100, Ed Brambley wrote:
> As I understand it (which is not necessarily correct), your code is slightly
> incorrect, since variables are by default shared between parallel sections.
> Therefore, the "int i" is shared between threads, and hence the erratic
> results if both loops execute at the same time.  To fix it, you could try
> changing the first #pragma to read
> 
> #pragma omp parallel sections private(i)
> 
> Or, alternatively, define i in the for loops as, for example
> 
> for (int i = 0; i < num_steps*10; i++) {

Yep, or just in the block inside of #pragma omp sections or #pragma omp section.
In fact, it would be good to define factorial there too, or add
private(factorial). While this one is not necessary for correctness,
it would still improve readability and could be a tiny bit faster.

> So why does this work with icc or gcc 4.4?  My guess would be it's because
> OpenMP doesn't guarantee that variables are in sync between threads unless it
> hits a flush directive (either explicit or implicit), and so it would seem
> with icc or gcc 4.4 the variable i is out of sync (probably because it's held
> in a register, which is probably a good idea).

When multiple threads modify the same shared variable you are really in
undefined behavior territory, where the results will depend on what kinds
of loop optimizations are performed etc. If each iteration updates the
shared variable, then it is of course much more likely to see "unexpected"
results than if the variable is written only at the end of the loop (which is
possible, both because the loops don't call any functions and because i's
address isn't taken).

	Jakub


* Re: openMP gcc vs icc, erratic results with gcc
@ 2008-05-21 10:21 Ed Brambley
  2008-05-21 11:00 ` Jakub Jelinek
From: Ed Brambley @ 2008-05-21 10:21 UTC
  To: gcc; +Cc: Diego Sandoval

Dear Diego,

As I understand it (which is not necessarily correct), your code is slightly
incorrect, since variables are by default shared between parallel sections.
Therefore, the "int i" is shared between threads, and hence the erratic
results if both loops execute at the same time.  To fix it, you could try
changing the first #pragma to read

#pragma omp parallel sections private(i)

Or, alternatively, define i in the for loops as, for example

for (int i = 0; i < num_steps*10; i++) {
...


So why does this work with icc or gcc 4.4?  My guess would be it's because
OpenMP doesn't guarantee that variables are in sync between threads unless it
hits a flush directive (either explicit or implicit), and so it would seem
with icc or gcc 4.4 the variable i is out of sync (probably because it's held
in a register, which is probably a good idea).

Yours,
Ed

PS: For those interested, a great tutorial on OpenMP is available at
http://www.llnl.gov/computing/tutorials/openMP/.
