public inbox for gcc-bugs@sourceware.org
* [Bug fortran/100855] New: pow run time gfortran vs ifort
From: nadavhalahmi560 at gmail dot com @ 2021-06-01 13:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100855
Bug ID: 100855
Summary: pow run time gfortran vs ifort
Product: gcc
Version: 11.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: nadavhalahmi560 at gmail dot com
Target Milestone: ---
I wrote the code below:
```
program power
    implicit none
    real :: sum, n, q
    integer :: i, j
    integer :: limit
    real :: start, finish
    sum = 0d0
    limit = 10000
    n = 2.0
    q = 0.5
    call CPU_TIME(start)
    do j=1,limit
        do i=1, limit
            n = n*q                        ! n is halved on every pass and never reset
            sum = sum + (i ** (0.05 + n))  ! integer base, real exponent
        end do
    end do
    call CPU_TIME(finish)
    print *, sum
    print '("Time = ",f6.3," seconds.")',finish-start
end program power
```
and compiled it using:
ifort pow.f90 -O3 -no-vec -o intel.out
gfortran pow.f90 -O3 -fno-tree-vectorize -o gnu.out
When I run `./intel.out` I get the following output:
3.3554432E+07
Time = 1.615 seconds.
When I run `./gnu.out` I get the following output:
33554432.0
Time = 7.817 seconds.
Therefore, the gfortran build is about 4.8 times slower than the ifort build
(7.817 s vs. 1.615 s). I get similar behavior for the `log` and `exp` functions.
gfortran -v:
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/software/x86_64/3.10.0/gcc/11.1.0/libexec/gcc/x86_64-pc-linux-gnu/11.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --prefix=/software/x86_64/3.10.0/gcc/11.1.0 --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.1.0 (GCC)
ifort -v:
ifort version 19.1.3.304
* [Bug fortran/100855] pow run time gfortran vs ifort
From: kargl at gcc dot gnu.org @ 2021-06-01 16:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100855
kargl at gcc dot gnu.org changed:
What             |Removed      |Added
----------------------------------------------------------------------------
Priority         |P3           |P4
Last reconfirmed |             |2021-06-01
Ever confirmed   |0            |1
Status           |UNCONFIRMED  |WAITING
CC               |             |kargl at gcc dot gnu.org
--- Comment #1 from kargl at gcc dot gnu.org ---
This is not a gfortran bug. Adding code to use exp() and log(),
I compiled the modified code:
s0 = s0 + i**(0.05 + n)
s1 = s1 + exp(0.05 + n)
s2 = s2 + log(0.05 + n)
with the -fdump-tree-optimized option. Looking at the dumped info,
one finds the three lines
_107 = __builtin_powf (_103, _106);
_109 = __builtin_expf (_105);
_111 = __builtin_logf (_105);
If I compile the code with "-S -O3" and look at the assembly code
I see
call powf
call expf
call logf
which are math functions contained in your system's libm. So this
is an issue with your libm, not with gfortran. I'll let someone else judge
whether the bug should be closed as INVALID or WONTFIX.
* [Bug fortran/100855] pow run time gfortran vs ifort
From: anlauf at gcc dot gnu.org @ 2021-06-01 17:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100855
--- Comment #2 from anlauf at gcc dot gnu.org ---
If you do not care about correct rounding, you can replace
sum = sum + (i ** (0.05 + n))
by
sum = sum + exp (log (real(i)) * (0.05 + n))
I think __builtin_powf and powf do care.
I do not know if there is a gcc flag that replaces __builtin_powf by the
combination of __builtin_expf / __builtin_logf which would also allow
for (better) vectorization.
(I know of a $$$$ compiler for $$$$ hardware which offers this).
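To make the substitution above concrete, here is a minimal Python sketch. It is an illustration only: it works in double precision with `math.exp`, `math.log` and `math.pow`, not with the single-precision `powf`/`expf`/`logf` discussed in this thread, and the helper names are hypothetical. It compares `exp(y*log(x))` against `pow(x, y)` over the same range of bases as the test program.

```python
import math


def pow_via_exp_log(x, y):
    """Compute x**y for x > 0 through the identity x**y = exp(y * log(x))."""
    return math.exp(y * math.log(x))


def max_relative_difference(limit, exponent):
    """Largest relative gap between the identity and math.pow for i = 1..limit."""
    worst = 0.0
    for i in range(1, limit + 1):
        reference = math.pow(i, exponent)
        rewritten = pow_via_exp_log(float(i), exponent)
        worst = max(worst, abs(rewritten - reference) / reference)
    return worst


if __name__ == "__main__":
    # Same base range and limiting exponent as the program in the report.
    print(max_relative_difference(10000, 0.05))
```

The two forms are expected to agree closely, but nothing guarantees that they are bit-identical, which is the rounding trade-off mentioned above.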
* [Bug fortran/100855] pow run time gfortran vs ifort
From: rguenth at gcc dot gnu.org @ 2021-06-02 7:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100855
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Might be interesting to see whether ifort does any expression simplification
here. Can you share the produced assembly?
* [Bug fortran/100855] pow run time gfortran vs ifort
From: nadavhalahmi560 at gmail dot com @ 2021-06-02 9:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100855
--- Comment #4 from Nadav Halahmi <nadavhalahmi560 at gmail dot com> ---
(In reply to Richard Biener from comment #3)
> Might be interesting to see whether ifort does any expression simplification
> here. Can you share the produced assembly?
gfortran pow.f90 -O3 -fno-tree-vectorize -S -o gnu.s:
.file "pow.f90"
.text
.section .rodata.str1.1,"aMS",@progbits,1
.LC5:
.string "pow.f90"
.LC6:
.string "(\"Time = \",f6.3,\" seconds.\")"
.text
.p2align 4
.type MAIN__, @function
MAIN__:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
xorl %eax, %eax
movl $10000, %ebp
pushq %rbx
.cfi_def_cfa_offset 24
.cfi_offset 3, -24
subq $568, %rsp
.cfi_def_cfa_offset 592
leaq 20(%rsp), %rdi
call _gfortran_cpu_time_4
pxor %xmm4, %xmm4
movss .LC1(%rip), %xmm2
movss %xmm4, 8(%rsp)
.L4:
movss .LC2(%rip), %xmm0
movl $1, %ebx
jmp .L3
.p2align 4,,10
.p2align 3
.L8:
movss .LC3(%rip), %xmm1
pxor %xmm0, %xmm0
movss %xmm2, 12(%rsp)
cvtsi2ssl %ebx, %xmm0
mulss %xmm2, %xmm1
addss .LC4(%rip), %xmm1
call powf
movss 12(%rsp), %xmm2
.L3:
addss 8(%rsp), %xmm0
addl $1, %ebx
mulss .LC3(%rip), %xmm2
movss %xmm0, 8(%rsp)
cmpl $10001, %ebx
jne .L8
subl $1, %ebp
jne .L4
leaq 16(%rsp), %rdi
xorl %eax, %eax
movss %xmm0, 24(%rsp)
call _gfortran_cpu_time_4
leaq 32(%rsp), %rdi
movabsq $25769803904, %rax
movq $.LC5, 40(%rsp)
movq %rax, 32(%rsp)
movl $21, 48(%rsp)
call _gfortran_st_write
leaq 24(%rsp), %rsi
movl $4, %edx
leaq 32(%rsp), %rdi
call _gfortran_transfer_real_write
leaq 32(%rsp), %rdi
call _gfortran_st_write_done
leaq 32(%rsp), %rdi
movabsq $25769807872, %rax
movq $.LC5, 40(%rsp)
movq %rax, 32(%rsp)
movl $22, 48(%rsp)
movq $.LC6, 112(%rsp)
movq $28, 120(%rsp)
call _gfortran_st_write
movss 16(%rsp), %xmm0
subss 20(%rsp), %xmm0
leaq 28(%rsp), %rsi
leaq 32(%rsp), %rdi
movl $4, %edx
movss %xmm0, 28(%rsp)
call _gfortran_transfer_real_write
leaq 32(%rsp), %rdi
call _gfortran_st_write_done
addq $568, %rsp
.cfi_def_cfa_offset 24
popq %rbx
.cfi_def_cfa_offset 16
popq %rbp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE0:
.size MAIN__, .-MAIN__
.section .text.startup,"ax",@progbits
.p2align 4
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
call _gfortran_set_args
movl $options.2.0, %esi
movl $7, %edi
call _gfortran_set_options
call MAIN__
xorl %eax, %eax
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE1:
.size main, .-main
.section .rodata
.align 16
.type options.2.0, @object
.size options.2.0, 28
options.2.0:
.long 2116
.long 4095
.long 0
.long 1
.long 1
.long 0
.long 31
.section .rodata.cst4,"aM",@progbits,4
.align 4
.LC1:
.long 1073741824
.align 4
.LC2:
.long 1065353216
.align 4
.LC3:
.long 1056964608
.align 4
.LC4:
.long 1028443341
.ident "GCC: (GNU) 11.1.0"
.section .note.GNU-stack,"",@progbits
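The `.long` values in `.rodata.cst4` above are the bit patterns of single-precision constants. A small Python sketch (the helper name is hypothetical) decodes them, assuming IEEE 754 binary32 encoding: `.LC1` is the initial `n = 2.0`, `.LC3` is `q = 0.5`, `.LC4` is the nearest single to `0.05`, and `.LC2` is `1.0`, which appears to be the already-folded term for `i = 1`.

```python
import struct


def bits_to_float32(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Labels and values copied from the gfortran assembly listing.
CONSTANTS = {
    ".LC1": 1073741824,
    ".LC2": 1065353216,
    ".LC3": 1056964608,
    ".LC4": 1028443341,
}

if __name__ == "__main__":
    for label, bits in CONSTANTS.items():
        print(label, hex(bits), bits_to_float32(bits))
```

The last value has the bit pattern `0x3d4ccccd`, the same constant that the ifort listing stores as `.L_2il0floatpacket.2`.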
* [Bug fortran/100855] pow run time gfortran vs ifort
From: nadavhalahmi560 at gmail dot com @ 2021-06-02 9:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100855
--- Comment #5 from Nadav Halahmi <nadavhalahmi560 at gmail dot com> ---
(In reply to Richard Biener from comment #3)
> Might be interesting to see whether ifort does any expression simplification
> here. Can you share the produced assembly?
ifort pow.f90 -O3 -no-vec -S -o intel.s:
# mark_description "Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.3.304 Build 2020";
# mark_description "0925_000000";
# mark_description "-O3 -no-vec -S -o intel.s";
.file "pow.f90"
.text
..TXTST0:
.L_2__routine_start_MAIN___0:
# -- Begin MAIN__
.text
# mark_begin;
.align 16,0x90
.globl MAIN__
# --- POWER
MAIN__:
..B1.1: # Preds ..B1.0
# Execution count [1.00e+00]
.cfi_startproc
..___tag_value_MAIN__.1:
..L2:
#1.9
pushq %rbp #1.9
.cfi_def_cfa_offset 16
movq %rsp, %rbp #1.9
.cfi_def_cfa 6, 16
.cfi_offset 6, -16
andq $-128, %rsp #1.9
subq $128, %rsp #1.9
movl $3, %edi #1.9
xorl %esi, %esi #1.9
call __intel_new_feature_proc_init #1.9
# LOE rbx r12 r13 r14 r15
..B1.13: # Preds ..B1.1
# Execution count [1.00e+00]
stmxcsr (%rsp) #1.9
movl $__NLITPACK_0.0.1, %edi #1.9
orl $32832, (%rsp) #1.9
ldmxcsr (%rsp) #1.9
call for_set_reentrancy #1.9
# LOE rbx r12 r13 r14 r15
..B1.2: # Preds ..B1.13
# Execution count [1.00e+00]
movss .L_2il0floatpacket.0(%rip), %xmm1 #11.5
lea 80(%rsp), %rdi #13.10
movss %xmm1, -64(%rdi) #11.5[spill]
pxor %xmm0, %xmm0 #9.5
movss %xmm0, -8(%rdi) #9.5[spill]
call for_cpusec #13.10
# LOE rbx r12 r13 r14 r15
..B1.3: # Preds ..B1.2
# Execution count [8.67e-01]
movl $1, %eax #14.5
movq %r15, (%rsp) #12.5[spill]
movq %rbx, 8(%rsp) #12.5[spill]
.cfi_escape 0x10, 0x03, 0x0e, 0x38, 0x1c, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
.cfi_escape 0x10, 0x0f, 0x0e, 0x38, 0x1c, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
movl %eax, %ebx #12.5
# LOE r12 r13 r14 ebx
..B1.4: # Preds ..B1.6 ..B1.3
# Execution count [5.33e+00]
movl $1, %r15d #15.9
# LOE r12 r13 r14 ebx r15d
..B1.5: # Preds ..B1.14 ..B1.4
# Execution count [2.96e+01]
movss 16(%rsp), %xmm2 #16.13[spill]
pxor %xmm0, %xmm0 #17.28
mulss .L_2il0floatpacket.1(%rip), %xmm2 #16.13
cvtsi2ss %r15d, %xmm0 #17.28
movss .L_2il0floatpacket.2(%rip), %xmm1 #17.28
movss %xmm2, 16(%rsp) #16.13[spill]
addss %xmm2, %xmm1 #17.28
call powf #17.28
# LOE r12 r13 r14 ebx r15d xmm0
..B1.14: # Preds ..B1.5
# Execution count [2.96e+01]
movss 72(%rsp), %xmm1 #17.13[spill]
incl %r15d #18.9
addss %xmm0, %xmm1 #17.13
movss %xmm1, 72(%rsp) #17.13[spill]
cmpl $10000, %r15d #18.9
jle ..B1.5 # Prob 82% #18.9
# LOE r12 r13 r14 ebx r15d
..B1.6: # Preds ..B1.14
# Execution count [5.44e+00]
incl %ebx #19.5
cmpl $10000, %ebx #19.5
jle ..B1.4 # Prob 82% #19.5
# LOE r12 r13 r14 ebx
..B1.7: # Preds ..B1.6
# Execution count [1.00e+00]
movq (%rsp), %r15 #[spill]
.cfi_restore 15
lea 84(%rsp), %rdi #20.10
movq 8(%rsp), %rbx #[spill]
.cfi_restore 3
call for_cpusec #20.10
.cfi_escape 0x10, 0x03, 0x0e, 0x38, 0x1c, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
.cfi_escape 0x10, 0x0f, 0x0e, 0x38, 0x1c, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
# LOE rbx r12 r13 r14 r15
..B1.8: # Preds ..B1.7
# Execution count [1.00e+00]
movss 72(%rsp), %xmm0 #21.5[spill]
lea (%rsp), %rdi #21.5
movl $-1, %esi #21.5
movq $0x1208384ff00, %rdx #21.5
movl $__STRLITPACK_3.0.1, %ecx #21.5
lea 64(%rsp), %r8 #21.5
xorl %eax, %eax #21.5
movq $0, (%rdi) #21.5
movss %xmm0, 64(%rdi) #21.5
call for_write_seq_lis #21.5
# LOE rbx r12 r13 r14 r15
..B1.9: # Preds ..B1.8
# Execution count [1.00e+00]
movss 84(%rsp), %xmm0 #22.5
lea (%rsp), %rdi #22.5
movl $-1, %esi #22.5
movq $0x1208384ff00, %rdx #22.5
movl $__STRLITPACK_4.0.1, %ecx #22.5
lea 72(%rsp), %r8 #22.5
movl $power_$format_pack.0.1, %r9d #22.5
xorl %eax, %eax #22.5
movq $0, (%rdi) #22.5
subss 80(%rdi), %xmm0 #22.5
movss %xmm0, 72(%rdi) #22.5
call for_write_seq_fmt #22.5
# LOE rbx r12 r13 r14 r15
..B1.10: # Preds ..B1.9
# Execution count [1.00e+00]
xorl %eax, %eax #23.1
movq %rbp, %rsp #23.1
popq %rbp #23.1
.cfi_def_cfa 7, 8
.cfi_restore 6
ret #23.1
.align 16,0x90
# LOE
.cfi_endproc
# mark_end;
.type MAIN__,@function
.size MAIN__,.-MAIN__
..LNMAIN__.0:
.section .rodata, "a"
.align 4
.align 4
__NLITPACK_0.0.1:
.long 2
.align 4
__STRLITPACK_3.0.1:
.long 65818
.byte 0
.space 3, 0x00 # pad
.align 4
__STRLITPACK_4.0.1:
.long 65818
.byte 0
.space 3, 0x00 # pad
.align 4
power_$format_pack.0.1:
.byte 54
.byte 0
.byte 0
.byte 0
.byte 28
.byte 0
.byte 7
.byte 0
.byte 84
.byte 105
.byte 109
.byte 101
.byte 32
.byte 61
.byte 32
.byte 0
.byte 33
.byte 0
.byte 0
.byte 3
.byte 1
.byte 0
.byte 0
.byte 0
.byte 6
.byte 0
.byte 0
.byte 0
.byte 28
.byte 0
.byte 9
.byte 0
.byte 32
.byte 115
.byte 101
.byte 99
.byte 111
.byte 110
.byte 100
.byte 115
.byte 46
.byte 0
.byte 0
.byte 0
.byte 55
.byte 0
.byte 0
.byte 0
.data
# -- End MAIN__
.section .rodata, "a"
.align 4
.L_2il0floatpacket.0:
.long 0x40000000
.type .L_2il0floatpacket.0,@object
.size .L_2il0floatpacket.0,4
.align 4
.L_2il0floatpacket.1:
.long 0x3f000000
.type .L_2il0floatpacket.1,@object
.size .L_2il0floatpacket.1,4
.align 4
.L_2il0floatpacket.2:
.long 0x3d4ccccd
.type .L_2il0floatpacket.2,@object
.size .L_2il0floatpacket.2,4
.data
.section .note.GNU-stack, ""
# End
* [Bug fortran/100855] pow run time gfortran vs ifort
From: dominiq at lps dot ens.fr @ 2021-06-02 16:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100855
--- Comment #6 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
On macOS (Core i9, 2.4 GHz), the program runs in ~1 s, almost independently of the
optimization level.
This PR reminds me of an old problem in which the transcendental functions were
slower for REAL(4) than for REAL(8) on some Unix distros (Fedora(?),
because of "correct rounding").
What are your timings if you replace
real :: sum, n, q
with
real(8) :: sum, n, q
and
sum = sum + (i ** (0.05 + n))
with
sum = sum + (i ** (0.05_8 + n))
?
* [Bug fortran/100855] pow run time gfortran vs ifort
From: nadavhalahmi560 at gmail dot com @ 2021-06-03 8:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100855
--- Comment #7 from Nadav Halahmi <nadavhalahmi560 at gmail dot com> ---
(In reply to Dominique d'Humieres from comment #6)
> On macOS (Core i9, 2.4 GHz), the program runs in ~1 s, almost independently of
> the optimization level.
>
> This PR reminds me of an old problem in which the transcendental functions were
> slower for REAL(4) than for REAL(8) on some Unix distros (Fedora(?),
> because of "correct rounding").
>
> What are your timings if you replace
>
> real :: sum, n, q
>
> with
>
> real(8) :: sum, n, q
>
> and
>
> sum = sum + (i ** (0.05 + n))
>
> with
>
> sum = sum + (i ** (0.05_8 + n))
>
> ?
Timings with this change (note that the result also changed):
gnu:
150945570.07620683
Time = 6.303 seconds.
intel:
150945570.076207
Time = 2.349 seconds.
So gfortran is indeed faster with real(8) (6.303 s vs. 7.817 s), but the result changed.
* [Bug fortran/100855] pow run time gfortran vs ifort
From: dominiq at lps dot ens.fr @ 2021-06-03 14:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100855
--- Comment #8 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> So gnu is indeed faster for real(8), but the result was changed.
What OS are you using? In any sensible library REAL(4) should be faster than
REAL(8).
> notice the result was also changed
REAL(4): 33554432.0
REAL(8): 150945570.07620683
REAL(16): 150945570.075233660889594015556531239
I did not do a full numerical analysis, but a running sum accumulated in
REAL(4) is known to have very limited accuracy.
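The REAL(4) result 33554432 is exactly 2**25. From that value on, adjacent single-precision numbers are 4 apart, so adding a term smaller than 2 rounds back to the same sum, and the terms of this program tend to i**0.05, which is at most 10000**0.05, about 1.58. The following Python sketch illustrates the stall. It is only an emulation under stated assumptions: single precision is imitated by rounding through `struct`, round-to-nearest is assumed, and the helper names are hypothetical.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest IEEE 754 single and return it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate_f32(start, term, count):
    """Add `term` to `start` `count` times, rounding to single after each add."""
    total = to_f32(start)
    term = to_f32(term)
    for _ in range(count):
        total = to_f32(total + term)
    return total


if __name__ == "__main__":
    # Below 2**25 the spacing is 2, so a term of 1.5 still moves the sum.
    print(accumulate_f32(2.0 ** 24, 1.5, 1))
    # At 2**25 the spacing is 4, so the same term is rounded away every time.
    print(accumulate_f32(2.0 ** 25, 1.5, 1000))
```

This is consistent with the REAL(8) and REAL(16) sums agreeing with each other to about eleven significant digits while the REAL(4) sum stops growing at 2**25.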
* [Bug fortran/100855] pow run time gfortran vs ifort
From: dominiq at lps dot ens.fr @ 2021-06-05 11:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100855
Dominique d'Humieres <dominiq at lps dot ens.fr> changed:
What        |Removed  |Added
----------------------------------------------------------------------------
Resolution  |---      |INVALID
Status      |WAITING  |RESOLVED
--- Comment #9 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
I don't know whether the test comes from a real-world problem. The modified test
program power
    implicit none
    real :: sum, sum1, n, q
    integer :: i, j
    integer :: limit
    real :: start, finish
    sum = 0d0
    sum1 = 0d0
    limit = 10000
    n = 2.0
    q = 0.5
    call CPU_TIME(start)
    do i=1, limit
        n = n*q
        sum1 = sum1 + (i ** (0.05 + n))
    end do
    do i=1, limit
        sum = sum + (i ** 0.05)
    end do
    sum = sum1 + (limit-1)*sum
    call CPU_TIME(finish)
    print *, sum, n, sum1
    print '("Time = ",f6.3," seconds.")',finish-start
end program power
yields
150945680. 0.00000000 15095.7852
Time = 0.000 seconds.
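The rewrite above works because `n` is halved on every inner iteration and never reset. Once the first pass over `i` is complete, `0.05 + n` is indistinguishable from `0.05`, so each of the remaining `limit-1` outer passes adds the same sum of `i**0.05`. The following Python sketch checks this equivalence in double precision with a much smaller `limit`. The function names are hypothetical, and it assumes `limit` is large enough (roughly 60 or more in double precision) for `n` to become negligible during the first pass.

```python
import math


def brute_force(limit, n=2.0, q=0.5):
    """Double loop as in the original report."""
    total = 0.0
    for _ in range(limit):
        for i in range(1, limit + 1):
            n *= q
            total += i ** (0.05 + n)
    return total


def factored(limit, n=2.0, q=0.5):
    """First pass with the decaying n, then (limit-1) copies of the constant part."""
    sum1 = 0.0
    for i in range(1, limit + 1):
        n *= q
        sum1 += i ** (0.05 + n)
    tail = 0.0
    for i in range(1, limit + 1):
        tail += i ** 0.05
    return sum1 + (limit - 1) * tail


if __name__ == "__main__":
    print(brute_force(200), factored(200))
```

The two totals differ only by accumulation-order rounding, while the factored form evaluates the power function `2*limit` times instead of `limit**2` times.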
* [Bug fortran/100855] pow run time gfortran vs ifort
From: nadavhalahmi560 at gmail dot com @ 2021-06-06 8:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100855
--- Comment #10 from Nadav Halahmi <nadavhalahmi560 at gmail dot com> ---
(In reply to Dominique d'Humieres from comment #9)
> I don't know if the test is coming from a real world problem. The modified
> test
>
> program power
> implicit none
>
> real :: sum, sum1, n, q
> integer :: i, j
> integer :: limit
> real :: start, finish
>
> sum = 0d0
> sum1 = 0d0
> limit = 10000
> n = 2.0
> q = 0.5
> call CPU_TIME(start)
> do i=1, limit
> n = n*q
> sum1 = sum1 + (i ** (0.05 + n))
> end do
> do i=1, limit
> sum = sum + (i ** 0.05)
> end do
> sum = sum1 + (limit-1)*sum
> call CPU_TIME(finish)
> print *, sum, n, sum1
> print '("Time = ",f6.3," seconds.")',finish-start
> end program power
>
> yields
>
> 150945680. 0.00000000 15095.7852
> Time = 0.000 seconds.
What are you trying to show here?