[Bug fortran/100855] New: pow run time gfortran vs ifort

public inbox for gcc-bugs@sourceware.org

* [Bug fortran/100855] New: pow run time gfortran vs ifort
From: nadavhalahmi560 at gmail dot com @ 2021-06-01 13:01 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100855

            Bug ID: 100855
           Summary: pow run time gfortran vs ifort
           Product: gcc
           Version: 11.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nadavhalahmi560 at gmail dot com
  Target Milestone: ---

I wrote the code below:

```
program power
    implicit none

    real :: sum, n, q
    integer :: i, j
    integer :: limit
    real :: start, finish

    sum = 0d0
    limit = 10000
    n = 2.0
    q = 0.5
    call CPU_TIME(start)
    do j=1,limit
        do i=1, limit
            n = n*q
            sum = sum + (i ** (0.05 + n))
        end do
    end do
    call CPU_TIME(finish)
    print *, sum
    print '("Time = ",f6.3," seconds.")',finish-start
end program power
```

and compiled it using:

ifort pow.f90 -O3 -no-vec -o intel.out

gfortran pow.f90 -O3 -fno-tree-vectorize -o gnu.out

When I run `./intel.out` I get the following output:
  3.3554432E+07
Time =  1.615 seconds.

When I run `./gnu.out` I get the following output:
  33554432.0    
Time =  7.817 seconds.

Therefore, gfortran is much slower than ifort here (7.817 s versus 1.615 s,
roughly a factor of 4.8). I get similar behavior for the `log` and `exp`
functions.

gfortran -v:
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/software/x86_64/3.10.0/gcc/11.1.0/libexec/gcc/x86_64-pc-linux-gnu/11.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --prefix=/software/x86_64/3.10.0/gcc/11.1.0 --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.1.0 (GCC) 

ifort -v:
ifort version 19.1.3.304

* [Bug fortran/100855] pow run time gfortran vs ifort
From: kargl at gcc dot gnu.org @ 2021-06-01 16:19 UTC
  To: gcc-bugs

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4
   Last reconfirmed|                            |2021-06-01
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |WAITING
                 CC|                            |kargl at gcc dot gnu.org

--- Comment #1 from kargl at gcc dot gnu.org ---
This is not a gfortran bug.  Adding code to use exp() and log(),
I compiled the modified code:

         s0 = s0 + i**(0.05 + n)
         s1 = s1 + exp(0.05 + n)
         s2 = s2 + log(0.05 + n)

with the -fdump-tree-optimized option.  Looking at the dumped info,
one finds the three lines 

  _107 = __builtin_powf (_103, _106);
  _109 = __builtin_expf (_105);
  _111 = __builtin_logf (_105);

If I compile the code with "-S -O3" and look at the assembly code
I see

        call    powf
        call    expf
        call    logf

which are math functions contained in your system's libm.  So this
is an issue with your libm, not with gfortran.  I'll let someone else judge
whether the bug should be closed as INVALID or WONTFIX.

* [Bug fortran/100855] pow run time gfortran vs ifort
From: anlauf at gcc dot gnu.org @ 2021-06-01 17:20 UTC
  To: gcc-bugs

--- Comment #2 from anlauf at gcc dot gnu.org ---
If you do not care about correct rounding, you can replace

   sum = sum + (i ** (0.05 + n))

by

   sum = sum + exp (log (real(i)) * (0.05 + n))

I think __builtin_powf and powf do care about correct rounding.

I do not know if there is a gcc flag that replaces __builtin_powf by the
combination of __builtin_expf / __builtin_logf which would also allow
for (better) vectorization.
(I know of a $$$$ compiler for $$$$ hardware which offers this).
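
A minimal sketch (not part of the original thread) for checking how far the rewrite above drifts from the intrinsic power over one pass of the inner loop of the original program. The program name and the choice to report the largest relative difference next to epsilon(1.0) are illustrative assumptions; the size of the difference depends on the libm in use.

```fortran
program pow_vs_exp_log
    ! Compare i**(0.05 + n) with the exp(log(real(i))*(0.05 + n)) rewrite
    ! over one pass of the inner loop of the original test program.
    implicit none
    integer :: i
    real :: n, q, y, a, b, max_rel

    n = 2.0
    q = 0.5
    max_rel = 0.0
    do i = 1, 10000
        n = n*q
        y = 0.05 + n
        a = i**y                   ! default-REAL power; gfortran emits a powf call (comment #1)
        b = exp(log(real(i))*y)    ! rounding errors of logf and of the product feed into expf
        max_rel = max(max_rel, abs(a - b)/a)   ! a >= 1 here, so no division by zero
    end do
    print *, 'max relative difference:', max_rel
    print *, 'epsilon(1.0):           ', epsilon(1.0)
end program pow_vs_exp_log
```

Because log(real(i)) grows to about 9.2 while the exponent stays below 1.1, small rounding errors in the argument of exp are plausible sources of a difference of a few units in the last place; whether the exp/log form is actually faster than powf also depends on the libm.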

* [Bug fortran/100855] pow run time gfortran vs ifort
From: rguenth at gcc dot gnu.org @ 2021-06-02  7:52 UTC
  To: gcc-bugs

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Might be interesting to see whether ifort does any expression simplification
here.  Can you share the produced assembly?

* [Bug fortran/100855] pow run time gfortran vs ifort
From: nadavhalahmi560 at gmail dot com @ 2021-06-02  9:37 UTC
  To: gcc-bugs

--- Comment #4 from Nadav Halahmi <nadavhalahmi560 at gmail dot com> ---
(In reply to Richard Biener from comment #3)
> Might be interesting to see whether ifort does any expression simplification
> here.  Can you share the produced assembly?

gfortran pow.f90 -O3 -fno-tree-vectorize -S -o gnu.s:

        .file   "pow.f90"
        .text
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC5:
        .string "pow.f90"
.LC6:
        .string "(\"Time = \",f6.3,\" seconds.\")"
        .text
        .p2align 4
        .type   MAIN__, @function
MAIN__:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        xorl    %eax, %eax
        movl    $10000, %ebp
        pushq   %rbx
        .cfi_def_cfa_offset 24
        .cfi_offset 3, -24
        subq    $568, %rsp
        .cfi_def_cfa_offset 592
        leaq    20(%rsp), %rdi
        call    _gfortran_cpu_time_4
        pxor    %xmm4, %xmm4
        movss   .LC1(%rip), %xmm2
        movss   %xmm4, 8(%rsp)
.L4:
        movss   .LC2(%rip), %xmm0
        movl    $1, %ebx
        jmp     .L3
        .p2align 4,,10
        .p2align 3
.L8:
        movss   .LC3(%rip), %xmm1
        pxor    %xmm0, %xmm0
        movss   %xmm2, 12(%rsp)
        cvtsi2ssl       %ebx, %xmm0
        mulss   %xmm2, %xmm1
        addss   .LC4(%rip), %xmm1
        call    powf
        movss   12(%rsp), %xmm2
.L3:
        addss   8(%rsp), %xmm0
        addl    $1, %ebx
        mulss   .LC3(%rip), %xmm2
        movss   %xmm0, 8(%rsp)
        cmpl    $10001, %ebx
        jne     .L8
        subl    $1, %ebp
        jne     .L4
        leaq    16(%rsp), %rdi
        xorl    %eax, %eax
        movss   %xmm0, 24(%rsp)
        call    _gfortran_cpu_time_4
        leaq    32(%rsp), %rdi
        movabsq $25769803904, %rax
        movq    $.LC5, 40(%rsp)
        movq    %rax, 32(%rsp)
        movl    $21, 48(%rsp)
        call    _gfortran_st_write
        leaq    24(%rsp), %rsi
        movl    $4, %edx
        leaq    32(%rsp), %rdi
        call    _gfortran_transfer_real_write
        leaq    32(%rsp), %rdi
        call    _gfortran_st_write_done
        leaq    32(%rsp), %rdi
        movabsq $25769807872, %rax
        movq    $.LC5, 40(%rsp)
        movq    %rax, 32(%rsp)
        movl    $22, 48(%rsp)
        movq    $.LC6, 112(%rsp)
        movq    $28, 120(%rsp)
        call    _gfortran_st_write
        movss   16(%rsp), %xmm0
        subss   20(%rsp), %xmm0
        leaq    28(%rsp), %rsi
        leaq    32(%rsp), %rdi
        movl    $4, %edx
        movss   %xmm0, 28(%rsp)
        call    _gfortran_transfer_real_write
        leaq    32(%rsp), %rdi
        call    _gfortran_st_write_done
        addq    $568, %rsp
        .cfi_def_cfa_offset 24
        popq    %rbx
        .cfi_def_cfa_offset 16
        popq    %rbp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE0:
        .size   MAIN__, .-MAIN__
        .section        .text.startup,"ax",@progbits
        .p2align 4
        .globl  main
        .type   main, @function
main:
.LFB1:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        call    _gfortran_set_args
        movl    $options.2.0, %esi
        movl    $7, %edi
        call    _gfortran_set_options
        call    MAIN__
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE1:
        .size   main, .-main
        .section        .rodata
        .align 16
        .type   options.2.0, @object
        .size   options.2.0, 28
options.2.0:
        .long   2116
        .long   4095
        .long   0
        .long   1
        .long   1
        .long   0
        .long   31
        .section        .rodata.cst4,"aM",@progbits,4
        .align 4
.LC1:
        .long   1073741824
        .align 4
.LC2:
        .long   1065353216
        .align 4
.LC3:
        .long   1056964608
        .align 4
.LC4:
        .long   1028443341
        .ident  "GCC: (GNU) 11.1.0"
        .section        .note.GNU-stack,"",@progbits

* [Bug fortran/100855] pow run time gfortran vs ifort
From: nadavhalahmi560 at gmail dot com @ 2021-06-02  9:38 UTC
  To: gcc-bugs

--- Comment #5 from Nadav Halahmi <nadavhalahmi560 at gmail dot com> ---
(In reply to Richard Biener from comment #3)
> Might be interesting to see whether ifort does any expression simplification
> here.  Can you share the produced assembly?

ifort pow.f90 -O3 -no-vec -S -o intel.s:

# mark_description "Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.3.304 Build 2020";
# mark_description "0925_000000";
# mark_description "-O3 -no-vec -S -o intel.s";
        .file "pow.f90"
        .text
..TXTST0:
.L_2__routine_start_MAIN___0:
# -- Begin  MAIN__
        .text
# mark_begin;
       .align    16,0x90
        .globl MAIN__
# --- POWER
MAIN__:
..B1.1:                         # Preds ..B1.0
                                # Execution count [1.00e+00]
        .cfi_startproc
..___tag_value_MAIN__.1:
..L2:
                                                          #1.9
        pushq     %rbp                                          #1.9
        .cfi_def_cfa_offset 16
        movq      %rsp, %rbp                                    #1.9
        .cfi_def_cfa 6, 16
        .cfi_offset 6, -16
        andq      $-128, %rsp                                   #1.9
        subq      $128, %rsp                                    #1.9
        movl      $3, %edi                                      #1.9
        xorl      %esi, %esi                                    #1.9
        call      __intel_new_feature_proc_init                 #1.9
                                # LOE rbx r12 r13 r14 r15
..B1.13:                        # Preds ..B1.1
                                # Execution count [1.00e+00]
        stmxcsr   (%rsp)                                        #1.9
        movl      $__NLITPACK_0.0.1, %edi                       #1.9
        orl       $32832, (%rsp)                                #1.9
        ldmxcsr   (%rsp)                                        #1.9
        call      for_set_reentrancy                            #1.9
                                # LOE rbx r12 r13 r14 r15
..B1.2:                         # Preds ..B1.13
                                # Execution count [1.00e+00]
        movss     .L_2il0floatpacket.0(%rip), %xmm1             #11.5
        lea       80(%rsp), %rdi                                #13.10
        movss     %xmm1, -64(%rdi)                              #11.5[spill]
        pxor      %xmm0, %xmm0                                  #9.5
        movss     %xmm0, -8(%rdi)                               #9.5[spill]
        call      for_cpusec                                    #13.10
                                # LOE rbx r12 r13 r14 r15
..B1.3:                         # Preds ..B1.2
                                # Execution count [8.67e-01]
        movl      $1, %eax                                      #14.5
        movq      %r15, (%rsp)                                  #12.5[spill]
        movq      %rbx, 8(%rsp)                                 #12.5[spill]
        .cfi_escape 0x10, 0x03, 0x0e, 0x38, 0x1c, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
        .cfi_escape 0x10, 0x0f, 0x0e, 0x38, 0x1c, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
        movl      %eax, %ebx                                    #12.5
                                # LOE r12 r13 r14 ebx
..B1.4:                         # Preds ..B1.6 ..B1.3
                                # Execution count [5.33e+00]
        movl      $1, %r15d                                     #15.9
                                # LOE r12 r13 r14 ebx r15d
..B1.5:                         # Preds ..B1.14 ..B1.4
                                # Execution count [2.96e+01]
        movss     16(%rsp), %xmm2                               #16.13[spill]
        pxor      %xmm0, %xmm0                                  #17.28
        mulss     .L_2il0floatpacket.1(%rip), %xmm2             #16.13
        cvtsi2ss  %r15d, %xmm0                                  #17.28
        movss     .L_2il0floatpacket.2(%rip), %xmm1             #17.28
        movss     %xmm2, 16(%rsp)                               #16.13[spill]
        addss     %xmm2, %xmm1                                  #17.28
        call      powf                                          #17.28
                                # LOE r12 r13 r14 ebx r15d xmm0
..B1.14:                        # Preds ..B1.5
                                # Execution count [2.96e+01]
        movss     72(%rsp), %xmm1                               #17.13[spill]
        incl      %r15d                                         #18.9
        addss     %xmm0, %xmm1                                  #17.13
        movss     %xmm1, 72(%rsp)                               #17.13[spill]
        cmpl      $10000, %r15d                                 #18.9
        jle       ..B1.5        # Prob 82%                      #18.9
                                # LOE r12 r13 r14 ebx r15d
..B1.6:                         # Preds ..B1.14
                                # Execution count [5.44e+00]
        incl      %ebx                                          #19.5
        cmpl      $10000, %ebx                                  #19.5
        jle       ..B1.4        # Prob 82%                      #19.5
                                # LOE r12 r13 r14 ebx
..B1.7:                         # Preds ..B1.6
                                # Execution count [1.00e+00]
        movq      (%rsp), %r15                                  #[spill]
        .cfi_restore 15
        lea       84(%rsp), %rdi                                #20.10
        movq      8(%rsp), %rbx                                 #[spill]
        .cfi_restore 3
        call      for_cpusec                                    #20.10
        .cfi_escape 0x10, 0x03, 0x0e, 0x38, 0x1c, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
        .cfi_escape 0x10, 0x0f, 0x0e, 0x38, 0x1c, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
                                # LOE rbx r12 r13 r14 r15
..B1.8:                         # Preds ..B1.7
                                # Execution count [1.00e+00]
        movss     72(%rsp), %xmm0                               #21.5[spill]
        lea       (%rsp), %rdi                                  #21.5
        movl      $-1, %esi                                     #21.5
        movq      $0x1208384ff00, %rdx                          #21.5
        movl      $__STRLITPACK_3.0.1, %ecx                     #21.5
        lea       64(%rsp), %r8                                 #21.5
        xorl      %eax, %eax                                    #21.5
        movq      $0, (%rdi)                                    #21.5
        movss     %xmm0, 64(%rdi)                               #21.5
        call      for_write_seq_lis                             #21.5
                                # LOE rbx r12 r13 r14 r15
..B1.9:                         # Preds ..B1.8
                                # Execution count [1.00e+00]
        movss     84(%rsp), %xmm0                               #22.5
        lea       (%rsp), %rdi                                  #22.5
        movl      $-1, %esi                                     #22.5
        movq      $0x1208384ff00, %rdx                          #22.5
        movl      $__STRLITPACK_4.0.1, %ecx                     #22.5
        lea       72(%rsp), %r8                                 #22.5
        movl      $power_$format_pack.0.1, %r9d                 #22.5
        xorl      %eax, %eax                                    #22.5
        movq      $0, (%rdi)                                    #22.5
        subss     80(%rdi), %xmm0                               #22.5
        movss     %xmm0, 72(%rdi)                               #22.5
        call      for_write_seq_fmt                             #22.5
                                # LOE rbx r12 r13 r14 r15
..B1.10:                        # Preds ..B1.9
                                # Execution count [1.00e+00]
        xorl      %eax, %eax                                    #23.1
        movq      %rbp, %rsp                                    #23.1
        popq      %rbp                                          #23.1
        .cfi_def_cfa 7, 8
        .cfi_restore 6
        ret                                                     #23.1
        .align    16,0x90
                                # LOE
        .cfi_endproc
# mark_end;
        .type   MAIN__,@function
        .size   MAIN__,.-MAIN__
..LNMAIN__.0:
        .section .rodata, "a"
        .align 4
        .align 4
__NLITPACK_0.0.1:
        .long   2
        .align 4
__STRLITPACK_3.0.1:
        .long   65818
        .byte   0
        .space 3, 0x00  # pad
        .align 4
__STRLITPACK_4.0.1:
        .long   65818
        .byte   0
        .space 3, 0x00  # pad
        .align 4
power_$format_pack.0.1:
        .byte   54
        .byte   0
        .byte   0
        .byte   0
        .byte   28
        .byte   0
        .byte   7
        .byte   0
        .byte   84
        .byte   105
        .byte   109
        .byte   101
        .byte   32
        .byte   61
        .byte   32
        .byte   0
        .byte   33
        .byte   0
        .byte   0
        .byte   3
        .byte   1
        .byte   0
        .byte   0
        .byte   0
        .byte   6
        .byte   0
        .byte   0
        .byte   0
        .byte   28
        .byte   0
        .byte   9
        .byte   0
        .byte   32
        .byte   115
        .byte   101
        .byte   99
        .byte   111
        .byte   110
        .byte   100
        .byte   115
        .byte   46
        .byte   0
        .byte   0
        .byte   0
        .byte   55
        .byte   0
        .byte   0
        .byte   0
        .data
# -- End  MAIN__
        .section .rodata, "a"
        .align 4
.L_2il0floatpacket.0:
        .long   0x40000000
        .type   .L_2il0floatpacket.0,@object
        .size   .L_2il0floatpacket.0,4
        .align 4
.L_2il0floatpacket.1:
        .long   0x3f000000
        .type   .L_2il0floatpacket.1,@object
        .size   .L_2il0floatpacket.1,4
        .align 4
.L_2il0floatpacket.2:
        .long   0x3d4ccccd
        .type   .L_2il0floatpacket.2,@object
        .size   .L_2il0floatpacket.2,4
        .data
        .section .note.GNU-stack, ""
# End

* [Bug fortran/100855] pow run time gfortran vs ifort
From: dominiq at lps dot ens.fr @ 2021-06-02 16:34 UTC
  To: gcc-bugs

--- Comment #6 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
On macOS (Core i9, 2.4 GHz), the program runs in about 1 s, almost independently
of the optimization level.

This PR reminds me of an old problem in which the transcendental functions were
slower for REAL(4) than for REAL(8) on some Unix distributions (Fedora?), because
of "correct rounding".

What are your timings if you replace

    real :: sum, n, q

with

    real(8) :: sum, n, q

and

            sum = sum + (i ** (0.05 + n))

with

            sum = sum + (i ** (0.05_8 + n))

?
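
Applied to the whole program from the original report, the suggested change gives roughly the following sketch. Two details are assumptions of this sketch rather than part of the suggestion: the kind is written as real64 from the intrinsic module iso_fortran_env instead of the literal 8 (both select IEEE double precision with gfortran and ifort on x86-64), and the accumulator is renamed to s so that it no longer hides the SUM intrinsic.

```fortran
program power
    ! Double-precision variant of the original test (REAL(8) instead of default REAL).
    use, intrinsic :: iso_fortran_env, only: real64
    implicit none

    real(real64) :: s, n, q        ! was: real :: sum, n, q
    integer :: i, j
    integer :: limit
    real :: start, finish

    s = 0.0_real64
    limit = 10000
    n = 2.0_real64
    q = 0.5_real64
    call cpu_time(start)
    do j = 1, limit
        do i = 1, limit
            n = n*q
            s = s + (i**(0.05_real64 + n))   ! was: 0.05 + n, a default-REAL exponent
        end do
    end do
    call cpu_time(finish)
    print *, s
    print '("Time = ",f6.3," seconds.")', finish - start
end program power
```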

* [Bug fortran/100855] pow run time gfortran vs ifort
From: nadavhalahmi560 at gmail dot com @ 2021-06-03  8:21 UTC
  To: gcc-bugs

--- Comment #7 from Nadav Halahmi <nadavhalahmi560 at gmail dot com> ---
(In reply to Dominique d'Humieres from comment #6)
> On macOS (Core i9, 2.4 GHz), the program runs in about 1 s, almost
> independently of the optimization level.
> 
> This PR reminds me of an old problem in which the transcendental functions
> were slower for REAL(4) than for REAL(8) on some Unix distributions
> (Fedora?), because of "correct rounding".
> 
> What are your timings if you replace
> 
>     real :: sum, n, q
> 
> with
> 
>     real(8) :: sum, n, q
> 
> and
> 
>             sum = sum + (i ** (0.05 + n))
> 
> with
> 
>             sum = sum + (i ** (0.05_8 + n))
> 
> ?

Timings for this change (notice the result was also changed):
gnu:
   150945570.07620683     
Time =  6.303 seconds.
intel:
   150945570.076207     
Time =  2.349 seconds.

So gnu is indeed faster for real(8), but the result was changed.

* [Bug fortran/100855] pow run time gfortran vs ifort
From: dominiq at lps dot ens.fr @ 2021-06-03 14:24 UTC
  To: gcc-bugs

--- Comment #8 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> So gnu is indeed faster for real(8), but the result was changed.

What OS are you using? In any sensible library, REAL(4) should be faster than
REAL(8).

> notice the result was also changed

REAL(4):  33554432.0 
REAL(8):  150945570.07620683
REAL(16): 150945570.075233660889594015556531239

I did not do a full numerical analysis, but it is well known that accumulating
SUM in REAL(4) is very limited in accuracy.
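
A short check of where the REAL(4) result stalls: 33554432 is 2**25. At that magnitude adjacent default REAL values are 4.0 apart, while once n has become negligible every term added is i**0.05 <= 10000**0.05, about 1.585, which is less than half that gap, so round-to-nearest returns the accumulator unchanged. A minimal sketch using the SPACING intrinsic, assuming default REAL is IEEE single precision (as with gfortran and ifort on x86-64):

```fortran
program float_sum_stagnation
    ! Why a default-REAL accumulator stops growing at 2**25.
    ! Assumes default REAL is IEEE single precision (24-bit significand).
    implicit none
    real :: s, term, t

    s = 2.0**25              ! 33554432.0, the sum printed for the REAL(4) runs
    term = 10000.0**0.05     ! largest term i**0.05 for i <= 10000, about 1.585
    t = s + term             ! exact result is rounded to the nearest REAL

    print *, 'spacing(s) =', spacing(s)   ! gap to the next larger REAL: 4.0
    print *, 'term       =', term
    print *, 's + term   =', t            ! term < spacing(s)/2, so t == s
    print *, 'unchanged? ', t == s
end program float_sum_stagnation
```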

* [Bug fortran/100855] pow run time gfortran vs ifort
From: dominiq at lps dot ens.fr @ 2021-06-05 11:59 UTC
  To: gcc-bugs

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|WAITING                     |RESOLVED

--- Comment #9 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
I don't know if the test is coming from a real world problem. The modified test

program power
    implicit none

    real :: sum, sum1, n, q
    integer :: i, j
    integer :: limit
    real :: start, finish

    sum = 0d0
    sum1 = 0d0
    limit = 10000
    n = 2.0
    q = 0.5
    call CPU_TIME(start)
        do i=1, limit
            n = n*q
            sum1 = sum1 + (i ** (0.05 + n))
        end do
        do i=1, limit
            sum = sum + (i ** 0.05)
        end do
        sum = sum1 + (limit-1)*sum
    call CPU_TIME(finish)
    print *, sum, n, sum1
    print '("Time = ",f6.3," seconds.")',finish-start
end program power

yields

   150945680.       0.00000000       15095.7852    
Time =  0.000 seconds.
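
The rewrite above relies on n never being reset between passes of the outer loop. Repeated halving makes 0.05 + n equal to 0.05 after roughly 30 iterations and makes n underflow to zero long before the first pass of 10000 iterations ends, so every later pass adds exactly the same terms i**0.05; that is what allows the remaining limit-1 passes to be replaced by one loop scaled by (limit-1). A minimal sketch that counts the halvings, assuming IEEE single precision with gradual underflow (no flush-to-zero mode):

```fortran
program underflow_count
    ! Count how many halvings of n (as in "n = n*q" with q = 0.5) it takes
    ! before n stops affecting 0.05 + n, and before n becomes exactly 0.
    ! Assumes IEEE single precision with gradual underflow (no flush-to-zero).
    implicit none
    real :: n, q
    integer :: k, k_negligible, k_zero

    n = 2.0
    q = 0.5
    k = 0
    k_negligible = -1
    do while (n /= 0.0)
        n = n*q
        k = k + 1
        ! First step at which adding n no longer changes 0.05 in REAL precision
        if (k_negligible < 0 .and. 0.05 + n == 0.05) k_negligible = k
    end do
    k_zero = k

    print *, 'steps until 0.05 + n == 0.05:', k_negligible
    print *, 'steps until n == 0:          ', k_zero
end program underflow_count
```

Under these assumptions the second count should be 151: after k steps n = 2*0.5**k, the smallest positive single-precision value 2**-149 is reached at k = 150, and 2**-150 rounds to zero.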

* [Bug fortran/100855] pow run time gfortran vs ifort
From: nadavhalahmi560 at gmail dot com @ 2021-06-06  8:52 UTC
  To: gcc-bugs

--- Comment #10 from Nadav Halahmi <nadavhalahmi560 at gmail dot com> ---
(In reply to Dominique d'Humieres from comment #9)
> I don't know if the test is coming from a real world problem. The modified
> test
> 
> program power
>     implicit none
> 
>     real :: sum, sum1, n, q
>     integer :: i, j
>     integer :: limit
>     real :: start, finish
> 
>     sum = 0d0
>     sum1 = 0d0
>     limit = 10000
>     n = 2.0
>     q = 0.5
>     call CPU_TIME(start)
>         do i=1, limit
>             n = n*q
>             sum1 = sum1 + (i ** (0.05 + n))
>         end do
>         do i=1, limit
>             sum = sum + (i ** 0.05)
>         end do
>         sum = sum1 + (limit-1)*sum
>     call CPU_TIME(finish)
>     print *, sum, n, sum1
>     print '("Time = ",f6.3," seconds.")',finish-start
> end program power
> 
> yields
> 
>    150945680.       0.00000000       15095.7852    
> Time =  0.000 seconds.

What did you try to show here?
