* [Bug libgomp/97213] New: OpenMP "if" is dramatically slower than code-level "if" - why?
From: ttsiodras at gmail dot com @ 2020-09-26 13:50 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97213

            Bug ID: 97213
           Summary: OpenMP "if" is dramatically slower than code-level
                    "if" - why?
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ttsiodras at gmail dot com
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

While trying to understand how OpenMP `task` works, I wrote this benchmark:

    #include <omp.h>
    #include <stdio.h>

    long fib(int val)
    {
        if (val < 2)
            return val;

        long total = 0;
        {
            #pragma omp task shared(total) if(val==45)
            total += fib(val-1);
            #pragma omp task shared(total) if(val==45)
            total += fib(val-2);
            #pragma omp taskwait
        }
        return total;
    }

    int main()
    {
        #pragma omp parallel
        #pragma omp single
        {
            long res = fib(45);
            printf("fib(45)=%ld\n", res);
        }
    }

It's a simple Fibonacci calculation that is meant to spawn tasks only at the
top level, in fib(45): one task computes fib(44), the other computes fib(43),
and the results are added and returned.

I know there is a potential race on the "+=" to total, but that is not the
point here. Here's the performance on my i5 laptop:

    $ gcc -O2 with_openmp_if.c -fopenmp

    $ time ./a.out 
    fib(45)=1134903170

    real    1m4.244s
    user    1m44.696s
    sys     0m0.010s

64 seconds. Now compare that with the same code, but with the "if" moved from
the OpenMP level to the user-code level, i.e. with this change to "fib":

    long fib(int val)
    {
        if (val < 2)
            return val;

        long total = 0;
        {
            if (val == 45) {
                #pragma omp task shared(total)
                total += fib(val-1);
                #pragma omp task shared(total)
                total += fib(val-2);
                #pragma omp taskwait
            } else
                return fib(val-1) + fib(val-2);
        }
        return total;
    }

    $ gcc -O2 with_normal_if.c -fopenmp

    $ time ./a.out 
    fib(45)=1134903170

    real    0m8.585s
    user    0m14.021s
    sys     0m0.011s

We go from 64 seconds down to 8.5 seconds.

Why?

What does the OpenMP-level "if" do so differently that it makes the program
roughly 7.5 times slower?

* [Bug libgomp/97213] OpenMP "if" is dramatically slower than code-level "if" - why?
From: jakub at gcc dot gnu.org @ 2020-09-26 14:12 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97213

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Even with if(false), the implementation still has to create a new data
environment for the task, among other things.
if(false) only makes the task undeferred (in effect, included): the generating
task continues only once that task has finished, and the generating thread
executes the task itself.
To let the implementation treat if(false) as though the task directive were not
there at all, you would also need to add the mergeable clause, but that is only
an optional optimization, one that GCC does not perform right now (it would
basically require copying the region once more).
There is also the overhead of the taskwait, which you execute unconditionally
at every level.
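
As a rough sketch of the two points above (the mergeable clause and the
unconditional taskwait), the reporter's fib() could be written along these
lines. The name fib_mergeable and the separate result variables a and b
(which also sidestep the race on total) are illustrative assumptions, not
code from this report. Since GCC does not currently merge such tasks, the
clause itself may bring no measurable gain.

```c
/* Hypothetical variant of the reporter's fib(); not code from this report. */
long fib_mergeable(int val)
{
    if (val < 2)
        return val;

    /* One result variable per task, so the two tasks never update the
       same location concurrently. */
    long a = 0, b = 0;

    /* mergeable permits (but does not oblige) the implementation to reuse
       the generating task's data environment when the if() is false. */
    #pragma omp task shared(a) if(val == 45) mergeable
    a = fib_mergeable(val - 1);
    #pragma omp task shared(b) if(val == 45) mergeable
    b = fib_mergeable(val - 2);

    /* Below the top level the tasks are undeferred and have already
       finished by this point, so only the top level needs to wait. */
    if (val == 45) {
        #pragma omp taskwait
    }
    return a + b;
}
```

It would be built with -fopenmp and called as fib_mergeable(45) from inside
the same parallel/single region as in the original main().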

* [Bug libgomp/97213] OpenMP "if" is dramatically slower than code-level "if" - why?
From: ttsiodras at gmail dot com @ 2020-09-26 14:58 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97213

--- Comment #2 from Thanassis Tsiodras <ttsiodras at gmail dot com> ---
I see. I was not aware of "mergeable", TBH. Thanks for pointing it out (it led
me to read up on "data environments").

Thanks, Jakub.

* [Bug libgomp/97213] OpenMP "if" is dramatically slower than code-level "if" - why?
From: jakub at gcc dot gnu.org @ 2020-09-26 15:01 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97213

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note that I think a significant part of the speedup comes from the tail
recursion optimization, which will be prevented even with a mergeable task.
Computing Fibonacci this way is not efficient.
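
For comparison, a minimal sketch of a linear-time alternative to the doubly
recursive benchmark is given here. The name fib_iter is hypothetical and the
code is not part of this report. It only illustrates why the recursive
version is a poor way to compute the value itself, as opposed to a way to
exercise tasking.

```c
/* Hypothetical helper, not part of the report: iterative Fibonacci that
   runs in O(val) additions instead of an exponential number of calls. */
long fib_iter(int val)
{
    /* Same base case as the reporter's fib(). */
    if (val < 2)
        return val;

    long prev = 0;  /* fib(i - 1), starting at fib(0) */
    long cur = 1;   /* fib(i), starting at fib(1) */
    for (int i = 1; i < val; i++) {
        long next = prev + cur;
        prev = cur;
        cur = next;
    }
    return cur;
}
```

fib_iter(45) yields 1134903170, the same value the benchmark prints.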

* [Bug libgomp/97213] OpenMP "if" is dramatically slower than code-level "if" - why?
From: ttsiodras at gmail dot com @ 2020-09-26 15:07 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97213

Thanassis Tsiodras <ttsiodras at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from Thanassis Tsiodras <ttsiodras at gmail dot com> ---
Marking as resolved.
