public inbox for gcc-bugs@sourceware.org
* [Bug fortran/41137]  New: inefficient zeroing of an array
@ 2009-08-21  6:15 jv244 at cam dot ac dot uk
  2009-08-21  7:02 ` [Bug fortran/41137] " jv244 at cam dot ac dot uk
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: jv244 at cam dot ac dot uk @ 2009-08-21  6:15 UTC (permalink / raw)
  To: gcc-bugs

Triggered by some discussion in PR41113:

SUBROUTINE S(a,n)
INTEGER :: n
REAL :: a(n,n,n,n)
a(:,:,:,:)=0.0
END SUBROUTINE

generates a four-fold loop to do the zeroing, while it would be more efficient
to zero n**4 elements starting from a(1,1,1,1). That is, since a is contiguous
in memory, a memset or similar can be used (properly guarded for zero-sized
arrays).
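
As a rough illustration of the two strategies described above (an editorial
sketch, not the code gfortran actually emits; the names zero_nested and
zero_flat are made up here), the loop nest and the flattened, guarded form
might look like this:

```fortran
! Sketch only: zero_nested spells out the four-fold loop nest, zero_flat is
! a hypothetical flattened equivalent over the same contiguous storage.
subroutine zero_nested(a, n)
  implicit none
  integer, intent(in) :: n
  real, intent(out) :: a(n, n, n, n)
  integer :: i, j, k, l
  do l = 1, n
    do k = 1, n
      do j = 1, n
        do i = 1, n                  ! first index varies fastest in memory
          a(i, j, k, l) = 0.0
        end do
      end do
    end do
  end do
end subroutine zero_nested

subroutine zero_flat(a, n)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: a(*)        ! assumed size: one contiguous block
  integer :: i, m
  m = max(n, 0)                      ! guard: n <= 0 means a zero-sized array
  ! Note: m*m*m*m can overflow a default INTEGER for large n; a production
  ! version would need a wider integer kind.
  do i = 1, m * m * m * m            ! single pass over all n**4 elements
    a(i) = 0.0
  end do
end subroutine zero_flat
```

The single loop in zero_flat is the shape a compiler can turn into a memset
call, since all-zero bits represent 0.0 for IEEE reals.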

Note that the case with compile-time constant bounds is already handled, i.e.

.LFB2:
        movl    $40000, %edx
        xorl    %esi, %esi
        jmp     memset
.LFE2:

is generated for 

SUBROUTINE S(a)
REAL :: a(10,10,10,10)
a(:,:,:,:)=0.0
END SUBROUTINE


-- 
           Summary: inefficient zeroing of an array
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: fortran
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jv244 at cam dot ac dot uk


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
@ 2009-08-21  7:02 ` jv244 at cam dot ac dot uk
  2009-08-21  7:40 ` dfranke at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: jv244 at cam dot ac dot uk @ 2009-08-21  7:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from jv244 at cam dot ac dot uk  2009-08-21 07:02 -------
Just for reference, the difference in time between the two variants is truly
impressive. About a factor of 11 with gcc 4.4 and 8 with gcc 4.5. Given that a
code like CP2K sometimes spends about 5-10% of its time zeroing arrays, this
would help significantly.

trunk:

> gfortran -O3 -march=native test.f90
> ./a.out
  0.10000600
  0.84405303

4.4 branch:
> gfortran -O3 -march=native test.f90
> ./a.out
  0.10400600
  1.1320710

test code:
SUBROUTINE S(a,n)
INTEGER :: n
REAL :: a(n,n,n,n)
a(:,:,:,:)=0.0
END SUBROUTINE

SUBROUTINE S2(a)
REAL :: a(10,10,10,10)
a(:,:,:,:)=0.0
END SUBROUTINE


REAL :: a(10,10,10,10),t1,t2
INTEGER :: I,N
N=100000

CALL CPU_TIME(t1)
DO I=1,N
CALL S2(a)
ENDDO
CALL CPU_TIME(t2)
write(6,*) t2-t1

CALL CPU_TIME(t1)
DO I=1,N
CALL S(a,10)
ENDDO
CALL CPU_TIME(t2)
write(6,*) t2-t1

END


-- 



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
  2009-08-21  7:02 ` [Bug fortran/41137] " jv244 at cam dot ac dot uk
@ 2009-08-21  7:40 ` dfranke at gcc dot gnu dot org
  2009-08-21  8:29 ` jv244 at cam dot ac dot uk
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: dfranke at gcc dot gnu dot org @ 2009-08-21  7:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from dfranke at gcc dot gnu dot org  2009-08-21 07:39 -------
I think PR31009 is similar.


-- 

dfranke at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dfranke at gcc dot gnu dot
                   |                            |org



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
  2009-08-21  7:02 ` [Bug fortran/41137] " jv244 at cam dot ac dot uk
  2009-08-21  7:40 ` dfranke at gcc dot gnu dot org
@ 2009-08-21  8:29 ` jv244 at cam dot ac dot uk
  2009-08-24 20:06 ` jv244 at cam dot ac dot uk
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: jv244 at cam dot ac dot uk @ 2009-08-21  8:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from jv244 at cam dot ac dot uk  2009-08-21 08:29 -------
(In reply to comment #2)
> I think PR31009 is similar.

In fact, this is almost a dup of PR31016, since here too I'm explicitly
talking about the case of known-to-be-contiguous arrays.


-- 



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
                   ` (2 preceding siblings ...)
  2009-08-21  8:29 ` jv244 at cam dot ac dot uk
@ 2009-08-24 20:06 ` jv244 at cam dot ac dot uk
  2009-11-01 16:21 ` tkoenig at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: jv244 at cam dot ac dot uk @ 2009-08-24 20:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from jv244 at cam dot ac dot uk  2009-08-24 20:06 -------
I don't think this PR depends on PR40632, which just provides an F2008
mechanism to signal that an assumed-shape array is contiguous (certainly a
useful feature in its own right). The cases discussed here are rather
assumed-size and explicit-shape arrays, which are always contiguous. As an
added complication, certain array sections of these arrays are also known to
be contiguous at compile time.
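
To make the last point concrete: a section that takes the leading dimensions
whole and a range of the last one is a single block of storage, while a range
in the first dimension leaves gaps. A small sketch using the Fortran 2008
IS_CONTIGUOUS intrinsic (assumed to be available in the compiler at hand; the
module and procedure names are illustrative only):

```fortran
module section_contiguity
  implicit none
contains
  ! Reports the contiguity of two sections of an explicit-shape array.
  subroutine classify(trailing, leading)
    logical, intent(out) :: trailing, leading
    real :: a(4, 4, 4, 4)
    a = 1.0
    ! Full leading dimensions, range in the last one: one block of storage.
    trailing = is_contiguous(a(:, :, :, 2:3))
    ! Range in the first dimension: elements 2:3 of every column, with gaps.
    leading = is_contiguous(a(2:3, :, :, :))
  end subroutine classify
end module section_contiguity
```

Only sections of the first kind are candidates for a single memset-style
fill; the second kind still needs a loop over the separate pieces.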


-- 



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
                   ` (3 preceding siblings ...)
  2009-08-24 20:06 ` jv244 at cam dot ac dot uk
@ 2009-11-01 16:21 ` tkoenig at gcc dot gnu dot org
  2009-11-01 17:36 ` tkoenig at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: tkoenig at gcc dot gnu dot org @ 2009-11-01 16:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from tkoenig at gcc dot gnu dot org  2009-11-01 16:21 -------
A workaround (which should really be implemented within the compiler):

subroutine s(a,n)
integer :: n
real :: a(n*n*n*n)
a = 0.0
end subroutine

This is legal Fortran, equivalent to your routine, and should be much faster.

Confirmed, BTW.
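
For context on why the workaround is legal: an explicit-shape dummy argument
may be associated with an actual argument of a different rank through
sequence association, so a caller can pass its rank-4 array straight to the
rank-1 dummy. A minimal sketch, with the module and procedure names
(zero_workaround, s_flat, demo_nonzero_count) invented for illustration:

```fortran
module zero_workaround
  implicit none
contains
  ! The workaround routine: same storage, declared as a rank-1 array.
  subroutine s_flat(a, n)
    integer, intent(in) :: n
    real, intent(out) :: a(n*n*n*n)
    a = 0.0
  end subroutine s_flat

  ! Hypothetical caller: returns how many elements are still nonzero.
  function demo_nonzero_count() result(cnt)
    integer :: cnt
    real :: a(4, 4, 4, 4)
    a = 1.0
    call s_flat(a, 4)      ! rank-4 actual, rank-1 dummy: sequence association
    cnt = count(a /= 0.0)
  end function demo_nonzero_count
end module zero_workaround
```

This relies on the actual argument being contiguous; passing a strided
section would make the compiler create a temporary copy first.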


-- 

tkoenig at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-11-01 16:21:21
               date|                            |



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
                   ` (4 preceding siblings ...)
  2009-11-01 16:21 ` tkoenig at gcc dot gnu dot org
@ 2009-11-01 17:36 ` tkoenig at gcc dot gnu dot org
  2010-05-07 21:02 ` dfranke at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: tkoenig at gcc dot gnu dot org @ 2009-11-01 17:36 UTC (permalink / raw)
  To: gcc-bugs



-- 

tkoenig at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
                   ` (5 preceding siblings ...)
  2009-11-01 17:36 ` tkoenig at gcc dot gnu dot org
@ 2010-05-07 21:02 ` dfranke at gcc dot gnu dot org
  2010-06-21 15:02 ` burnus at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: dfranke at gcc dot gnu dot org @ 2010-05-07 21:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from dfranke at gcc dot gnu dot org  2010-05-07 21:01 -------
See also PR40598.


-- 



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
                   ` (6 preceding siblings ...)
  2010-05-07 21:02 ` dfranke at gcc dot gnu dot org
@ 2010-06-21 15:02 ` burnus at gcc dot gnu dot org
  2010-06-21 15:22 ` burnus at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: burnus at gcc dot gnu dot org @ 2010-06-21 15:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from burnus at gcc dot gnu dot org  2010-06-21 15:02 -------
(In reply to comment #1)
> Just for reference, the difference in time between the two variants is truly
> impressive. About a factor of 11 with gcc 4.4 and 8 with gcc 4.5.

I get for the example the following values, note especially the newly added
CONTIGUOUS result:

  0.31601900      - assumed-shape
  0.21601403      - assumed-shape CONTIGUOUS
  0.21601295      - explicit size (n,n,...)
  0.20801300      - explicit size (10,10,...)
  0.21601403      - explicit size (10*10*...)

Ignoring some measurement noise, assumed-shape is 46% (-O0) to 25% (-O3)
slower than explicit size, but with the CONTIGUOUS attribute the performance
is regained. I cannot reproduce the factor of 10 results, however. What
surprises me a bit is that -flto -fwhole-program does not reduce the speed
penalty of assumed-shape arrays.
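
For readers unfamiliar with the variants being timed, they differ only in how
the dummy argument is declared. The sketch below is not the actual benchmark
source; the module and procedure names are assumptions, and the CONTIGUOUS
attribute needs a compiler with Fortran 2008 support:

```fortran
module zero_variants
  implicit none
contains
  ! Assumed shape: strides come from a descriptor and are known only at
  ! run time, so the compiler must allow for a non-contiguous argument.
  subroutine zero_assumed_shape(a)
    real, intent(out) :: a(:, :, :, :)
    a(:, :, :, :) = 0.0
  end subroutine zero_assumed_shape

  ! Assumed shape plus CONTIGUOUS: the dummy is guaranteed to occupy one
  ! block of storage, as an explicit-shape array does.
  subroutine zero_contiguous(a)
    real, contiguous, intent(out) :: a(:, :, :, :)
    a(:, :, :, :) = 0.0
  end subroutine zero_contiguous

  ! Explicit shape with run-time bounds, as in the original report.
  subroutine zero_explicit(a, n)
    integer, intent(in) :: n
    real, intent(out) :: a(n, n, n, n)
    a(:, :, :, :) = 0.0
  end subroutine zero_explicit
end module zero_variants
```

Assumed-shape dummies require an explicit interface, which is why the
routines are placed in a module here.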


-- 



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
                   ` (7 preceding siblings ...)
  2010-06-21 15:02 ` burnus at gcc dot gnu dot org
@ 2010-06-21 15:22 ` burnus at gcc dot gnu dot org
  2010-06-21 15:49 ` jv244 at cam dot ac dot uk
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: burnus at gcc dot gnu dot org @ 2010-06-21 15:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from burnus at gcc dot gnu dot org  2010-06-21 15:22 -------
(In reply to comment #7)
> I get for the example the following values, note especially the newly added
> CONTIGUOUS result:

For the test case, see attachment 20966 at PR 44612; I filed that PR because
GCC does not optimize away the loops, which only set the variable but never
read its value. (Ifort does this optimization.) Additionally, if one prints
the variable, ifort is twice as fast. As a curiosity: using NAG, the timing is
0.6900000 vs. 1.2200000, i.e. the assumed-shape version is actually faster
[though its overall performance is poor].


-- 



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
                   ` (8 preceding siblings ...)
  2010-06-21 15:22 ` burnus at gcc dot gnu dot org
@ 2010-06-21 15:49 ` jv244 at cam dot ac dot uk
  2010-06-21 17:00 ` burnus at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: jv244 at cam dot ac dot uk @ 2010-06-21 15:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from jv244 at cam dot ac dot uk  2010-06-21 15:49 -------
(In reply to comment #7)

> I cannot reproduce the factor of 10 results, however. 

Here this still is the case (so might depend on the precise architecture):

/data03/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/f951
test.f90 -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param
l1-cache-line-size=64 --param l2-cache-size=1024 -mtune=k8 -quiet -dumpbase
test.f90 -auxbase test -O3 -version -fintrinsic-modules-path
/data03/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/finclude
-o /tmp/ccXsKXnD.s

> ./a.out
  0.10800600
   1.0520660


-- 



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
                   ` (9 preceding siblings ...)
  2010-06-21 15:49 ` jv244 at cam dot ac dot uk
@ 2010-06-21 17:00 ` burnus at gcc dot gnu dot org
  2010-06-21 17:44 ` jakub at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: burnus at gcc dot gnu dot org @ 2010-06-21 17:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from burnus at gcc dot gnu dot org  2010-06-21 17:00 -------
(In reply to comment #9)
> (In reply to comment #7)
> > I cannot reproduce the factor of 10 results, however. 
> Here this still is the case (so might depend on the precise architecture):

OK, I was using -fwhole-file out of habit, which is why the difference was
that small (at all optimization levels, including -O0). Without it, I also get
the same factor-of-10 difference. If one splits the test into two files, one
needs to use "-O3 -flto" to get a fast program.

For comparison, using two files, ifort also shows a factor of 2 to 5
difference (at -O0 it is ten times slower than gfortran; at -O2 it is twice as
fast as gfortran).


-- 



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
                   ` (10 preceding siblings ...)
  2010-06-21 17:00 ` burnus at gcc dot gnu dot org
@ 2010-06-21 17:44 ` jakub at gcc dot gnu dot org
  2010-06-22 14:42 ` burnus at gcc dot gnu dot org
  2010-06-22 15:25 ` jakub at gcc dot gnu dot org
  13 siblings, 0 replies; 15+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-06-21 17:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from jakub at gcc dot gnu dot org  2010-06-21 17:43 -------
What's the reason why gfc_trans_zero_assign insists that len is INTEGER_CST?
At least if it is contiguous (and not assumed size), why can't memset be used
even for non-constant sizes?


-- 



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
                   ` (11 preceding siblings ...)
  2010-06-21 17:44 ` jakub at gcc dot gnu dot org
@ 2010-06-22 14:42 ` burnus at gcc dot gnu dot org
  2010-06-22 15:25 ` jakub at gcc dot gnu dot org
  13 siblings, 0 replies; 15+ messages in thread
From: burnus at gcc dot gnu dot org @ 2010-06-22 14:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from burnus at gcc dot gnu dot org  2010-06-22 14:42 -------
(In reply to comment #11)
> What's the reason why gfc_trans_zero_assign insists that len is INTEGER_CST?
> At least if it is contiguous (and not assumed size), why can't memset be used
> even for non-constant sizes?

Suggested by Jakub: 

 -  if (!len || TREE_CODE (len) != INTEGER_CST)
 +  if (!len
 +      || (TREE_CODE (len) != INTEGER_CST
 +          && !gfc_is_simply_contiguous (expr, false)))

Though one needs to be careful to zero the right spot (this may already be
taken care of):
  a(5:) = 0

Additionally, one could do the same for arrays which are contiguous but have a
descriptor - for which one has to calculate the size manually (as "len" ==
NULL). At least after memset/memcpy middle-end fixes, the change should be
profitable.


-- 



* [Bug fortran/41137] inefficient zeroing of an array
  2009-08-21  6:15 [Bug fortran/41137] New: inefficient zeroing of an array jv244 at cam dot ac dot uk
                   ` (12 preceding siblings ...)
  2010-06-22 14:42 ` burnus at gcc dot gnu dot org
@ 2010-06-22 15:25 ` jakub at gcc dot gnu dot org
  13 siblings, 0 replies; 15+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-06-22 15:25 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from jakub at gcc dot gnu dot org  2010-06-22 15:25 -------
Well, a(5:)=0.0 doesn't satisfy copyable_array_p, so gfc_trans_zero_assign
isn't called at all.


-- 


