[Bug rtl-optimization/17838] spills are not re-used

public inbox for gcc-bugs@sourceware.org

* [Bug rtl-optimization/17838] spills are not re-used
       [not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
@ 2011-11-08 14:29 ` tstdenis at elliptictech dot com
  2011-11-08 21:25 ` pinskia at gcc dot gnu.org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: tstdenis at elliptictech dot com @ 2011-11-08 14:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

--- Comment #6 from Tom St Denis <tstdenis at elliptictech dot com> 2011-11-08 14:17:55 UTC ---
Created attachment 25751
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25751
Another test case

Another example using 

gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC) 

The function when compiled with "-m32 -O3" uses way more stack than it should. 
It's like it's putting the fp_int.dp[] array on the stack...

I can confirm this is a problem on 32/64 and ARM as well.



* [Bug rtl-optimization/17838] spills are not re-used
       [not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
  2011-11-08 14:29 ` [Bug rtl-optimization/17838] spills are not re-used tstdenis at elliptictech dot com
@ 2011-11-08 21:25 ` pinskia at gcc dot gnu.org
  2011-11-10 19:32 ` tstdenis at elliptictech dot com
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu.org @ 2011-11-08 21:25 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-11-08 20:24:01 UTC ---
(In reply to comment #6)
> Created attachment 25751 [details]
> Another test case
> 
> Another example using 
> 
> gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC) 
> 
> The function when compiled with "-m32 -O3" uses way more stack than it should. 
> It's like it's putting the fp_int.dp[] array on the stack...
> 
> I can confirm this is a problem on 32/64 and ARM as well.

That is a different issue, dealing with how memcpy works (or does not get
optimized).  File a different bug.



* [Bug rtl-optimization/17838] spills are not re-used
       [not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
  2011-11-08 14:29 ` [Bug rtl-optimization/17838] spills are not re-used tstdenis at elliptictech dot com
  2011-11-08 21:25 ` pinskia at gcc dot gnu.org
@ 2011-11-10 19:32 ` tstdenis at elliptictech dot com
  2011-11-10 19:38 ` tstdenis at elliptictech dot com
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: tstdenis at elliptictech dot com @ 2011-11-10 19:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

Tom St Denis <tstdenis at elliptictech dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tstdenis at elliptictech
                   |                            |dot com

--- Comment #8 from Tom St Denis <tstdenis at elliptictech dot com> 2011-11-10 19:27:23 UTC ---
(In reply to comment #7)
> (In reply to comment #6)
> > Created attachment 25751 [details]
> > Another test case
> > 
> > Another example using 
> > 
> > gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC) 
> > 
> > The function when compiled with "-m32 -O3" uses way more stack than it should. 
> > It's like it's putting the fp_int.dp[] array on the stack...
> > 
> > I can confirm this is a problem on 32/64 and ARM as well.
> 
> That is a different issue, dealing with how memcpy works (or does not get
> optimized).  File a different bug.

How the hell am I supposed to know that?  Maybe the GCC team should clean up
their stack spills once and for all.  I'm sure $OBSCURE_PLATFORM_NAME or
$WONDEROUS_NEW_TREE_REPRESENTATION can wait.

I actually have a different routine [fp_mul_comba_small_set.c] from
TomsFastMath that does use memcpy, has the same style of unrolled multipliers,
and does not have this problem.

I can file a new bug if you want, but as a user of GCC I'm not meant to
understand the ins-and-outs of the depths of the compiler.  I actually did
search for stack-waste instead of just blindly filing a new report...

/rant



* [Bug rtl-optimization/17838] spills are not re-used
       [not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2011-11-10 19:38 ` tstdenis at elliptictech dot com
@ 2011-11-15 14:29 ` tstdenis at elliptictech dot com
  2011-11-16  8:26 ` ebotcazou at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: tstdenis at elliptictech dot com @ 2011-11-15 14:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

--- Comment #10 from Tom St Denis <tstdenis at elliptictech dot com> 2011-11-15 14:20:07 UTC ---
Another update ... We've just profiled our crypto library, and across the board
[ciphers, hashes, PK functions like RSA/ECC] GCC is a complete loser against
ARMcc [r713] on stack usage.  And it's not as if GCC were faster, which would
at least make it a price worth paying... In most cases ARMcc and GCC are dead
even on speed [ARMcc faster for some things, slower for others].

This is with gcc 4.4.5 and 4.5.1 on an ARM.

This is not a new bug.  This is not a "misfeature."  This is actually something
worth working on.  This was filed in 2004 and hasn't been addressed since ...
What is the holdup?  If GCC is to be used on embedded platforms it can't go
around taking 150% of the stack space of its competitors...



* [Bug rtl-optimization/17838] spills are not re-used
       [not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2011-11-15 14:29 ` tstdenis at elliptictech dot com
@ 2011-11-16  8:26 ` ebotcazou at gcc dot gnu.org
  2013-04-20 15:47 ` bpringlemeir at gmail dot com
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: ebotcazou at gcc dot gnu.org @ 2011-11-16  8:26 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

Eric Botcazou <ebotcazou at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebotcazou at gcc dot
                   |                            |gnu.org

--- Comment #11 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2011-11-16 08:13:48 UTC ---
> This is not a new bug.  This is not a "misfeature."  This is actually something
> worth working on.  This was filed in 2004 and hasn't been addressed since ...
> What is the holdup?  If GCC is to be used on embedded platforms it can't go
> around taking 150% of the stack space of its competitors...

GCC is a volunteer project.  If you think that it can be improved, you're
welcome to implement enhancements or hire/sponsor someone to do the work for
you.

Note that using -O3 for embedded targets isn't recommended; use -Os instead.



* [Bug rtl-optimization/17838] spills are not re-used
       [not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2011-11-16  8:26 ` ebotcazou at gcc dot gnu.org
@ 2013-04-20 15:47 ` bpringlemeir at gmail dot com
  2013-04-21  9:59 ` ebotcazou at gcc dot gnu.org
  2013-04-21 20:37 ` dean at arctic dot org
  8 siblings, 0 replies; 12+ messages in thread
From: bpringlemeir at gmail dot com @ 2013-04-20 15:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

Bill Pringlemeir <bpringlemeir at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bpringlemeir at gmail dot
                   |                            |com

--- Comment #12 from Bill Pringlemeir <bpringlemeir at gmail dot com> 2013-04-20 15:47:36 UTC ---
(In reply to comment #11)

> Note that using -O3 for embedded targets isn't recommended; use -Os instead.

In this case the code is computationally intensive.  It doesn't make sense to
compile with '-Os' for cryptographic algorithms.

However, I think that a performance increase can be achieved by working with
gcc.  I have worked on an ARM project where two different developers chose
'TomsFastMath' and 'libgcrypt' as a crypto base.  It seems that 'libgcrypt' was
performing better on the ARM.  I believe this is because it used 'gcc' inline
assembler to map op-codes not available in 'C'.  Gcc's inline assembler is very
nice, as you don't have to do register allocation or any of the other nice
things that 'gcc' does for us.

http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=blob;f=mpi/longlong.h;hb=HEAD

The use of the carry bit for multi-precision arithmetic gives a large advantage
for algorithms such as RSA, which was cited as faring worse in the ARMcc versus
'gcc' comparison on the ARM.
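
The carry handling mentioned here can be sketched in portable C.  The name
`mp_add_n` and the 32-bit, least-significant-first limb layout are assumptions
made for illustration; this is not code from TomsFastMath or libgcrypt.  Plain
C has no direct access to the carry flag, so each limb needs comparisons to
recover it, which is the overhead an add-with-carry instruction reached through
inline assembler may avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from TomsFastMath or libgcrypt):
   r = a + b over n 32-bit limbs, least significant limb first.
   Returns the carry out of the top limb (0 or 1). */
static uint32_t mp_add_n(uint32_t *r, const uint32_t *a,
                         const uint32_t *b, size_t n)
{
    assert(r != NULL && a != NULL && b != NULL);

    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s  = a[i] + b[i];   /* unsigned add, may wrap around */
        uint32_t c1 = s < a[i];      /* 1 if a[i] + b[i] wrapped */
        uint32_t t  = s + carry;     /* fold in the incoming carry */
        uint32_t c2 = t < s;         /* 1 if adding the carry wrapped */
        r[i] = t;
        carry = c1 | c2;             /* the two wraps cannot both occur */
    }
    return carry;
}
```

On a target with a carry flag, the loop body above corresponds to a single
add-with-carry per limb; whether a given compiler recognises the C idiom and
emits that instruction on its own is not something this sketch guarantees.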

For the original issue for which the bug was filed (x86 sha), I can understand
your frustration.  I also tried to expand the SHA to handle 64 bits at a time,
as you have done with MMX ('__builtin_ia32_pslld', etc).  It is difficult to get
this to work with 'gcc'; I only had a 30% speedup versus 32-bit versions.


* [Bug rtl-optimization/17838] spills are not re-used
       [not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2013-04-20 15:47 ` bpringlemeir at gmail dot com
@ 2013-04-21  9:59 ` ebotcazou at gcc dot gnu.org
  2013-04-21 20:37 ` dean at arctic dot org
  8 siblings, 0 replies; 12+ messages in thread
From: ebotcazou at gcc dot gnu.org @ 2013-04-21  9:59 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

--- Comment #13 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2013-04-21 09:59:26 UTC ---
> In this case the code is computationally intensive.  It doesn't make sense to
> compile with '-Os' for cryptographic algorithms.

Huh?  Of course it makes sense to compile with -Os if you have specific code
size constraints, and it's quite easy to have code compiled at -O3 run slower
than the same code compiled at -O2/-Os on (very) embedded CPUs.



* [Bug rtl-optimization/17838] spills are not re-used
       [not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2013-04-21  9:59 ` ebotcazou at gcc dot gnu.org
@ 2013-04-21 20:37 ` dean at arctic dot org
  8 siblings, 0 replies; 12+ messages in thread
From: dean at arctic dot org @ 2013-04-21 20:37 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

dean at arctic dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WORKSFORME

--- Comment #14 from dean at arctic dot org 2013-04-21 20:36:55 UTC ---
i dug out the old c code for my original bug report -- it's fine with a 4.7.x
prerelease.  i didn't bother narrowing down to where the spills went away.

-dean



[parent not found: <bug-17838-5748@http.gcc.gnu.org/bugzilla/>]

* [Bug rtl-optimization/17838] spills are not re-used
       [not found] <bug-17838-5748@http.gcc.gnu.org/bugzilla/>
@ 2009-04-22 21:17 ` pinskia at gcc dot gnu dot org
  0 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-04-22 21:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from pinskia at gcc dot gnu dot org  2009-04-22 21:17 -------
I think this was fixed for GCC 4.4.0 with the IRA, but I can't test right now
since the preprocessed source uses builtin functions which no longer exist
in 4.4.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838



* [Bug target/17838] New: spills are not re-used
@ 2004-10-05  8:56 dean-gcc at arctic dot org
  2004-10-05 11:27 ` [Bug rtl-optimization/17838] " pinskia at gcc dot gnu dot org
  2004-10-05 13:08 ` bangerth at dealii dot org
  0 siblings, 2 replies; 12+ messages in thread
From: dean-gcc at arctic dot org @ 2004-10-05  8:56 UTC (permalink / raw)
  To: gcc-bugs

the code produced by this has a 1036 byte stack frame... hand inspection of the
assembly finds many stack spills which are never re-used after the value is
dead.  for example look at 172(%esp), 748(%esp), 628(%esp), 588(%esp), ...
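
As a rough illustration of what re-using a spill would mean, here is a small C
sketch.  It is a hypothetical stand-in, not the sha256.c from the attachments;
the names `rotr32` and `mix3` are invented for the example, and a function this
small would normally not spill anything.  The point is only that `t0`, `t1` and
`t2` have disjoint lifetimes, so a single stack slot could serve all three if
they had to be spilled.

```c
#include <stdint.h>

/* Rotate right by n bits; callers must keep 0 < n < 32. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* Hypothetical stand-in for part of an unrolled hash round.
   Each temporary is dead before the next one is computed, so a
   spill slot used for t0 could be re-used for t1 and then t2. */
uint32_t mix3(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t t0  = rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22);
    uint32_t acc = t0 + b;                       /* last use of t0 */

    uint32_t t1  = rotr32(b, 6) ^ rotr32(b, 11) ^ rotr32(b, 25);
    acc += t1 ^ c;                               /* last use of t1 */

    uint32_t t2  = rotr32(c, 7) ^ rotr32(c, 18) ^ (c >> 3);
    return acc + t2;                             /* last use of t2 */
}
```

On a GCC release that supports `-fstack-usage`, compiling with that option
(for example `gcc -O3 -fstack-usage -c mix3.c`, where the file name is an
assumption) writes a `.su` file listing the frame size of each function, which
is less tedious than reading offsets such as 172(%esp) out of the assembly by
hand.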

.i and .s will be in attachments.

-dean

% /home/dean/gcc/bin/gcc -v -save-temps -std=c99 -O3 -g -Wall -march=pentium4
-fomit-frame-pointer -c -o sha256.o sha256.c
Reading specs from /home/dean/gcc/lib/gcc/i686-pc-linux-gnu/4.0.0/specs
Configured with: ../gcc/configure --prefix=/home/dean/gcc
--with-gcc-version-trigger=/home/dean/gcc/gcc/gcc/version.c --enable-languages=c
Thread model: posix
gcc version 4.0.0 20041004 (experimental)
 /home/dean/gcc/libexec/gcc/i686-pc-linux-gnu/4.0.0/cc1 -E -quiet -v sha256.c
-march=pentium4 -std=c99 -Wall -fomit-frame-pointer -fworking-directory -O3
-fpch-preprocess -o sha256.i
ignoring nonexistent directory
"/home/dean/gcc/lib/gcc/i686-pc-linux-gnu/4.0.0/../../../../i686-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /home/dean/gcc/include
 /home/dean/gcc/lib/gcc/i686-pc-linux-gnu/4.0.0/include
 /usr/include
End of search list.
 /home/dean/gcc/libexec/gcc/i686-pc-linux-gnu/4.0.0/cc1 -fpreprocessed sha256.i
-quiet -dumpbase sha256.c -march=pentium4 -auxbase-strip sha256.o -g -O3 -Wall
-std=c99 -version -fomit-frame-pointer -o sha256.s
GNU C version 4.0.0 20041004 (experimental) (i686-pc-linux-gnu)
        compiled by GNU C version 3.3.4 (Debian 1:3.3.4-9).
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
 as -V -Qy -o sha256.o sha256.s
GNU assembler version 2.15 (i386-linux) using BFD version 2.15

-- 
           Summary: spills are not re-used
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dean-gcc at arctic dot org
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838



* [Bug rtl-optimization/17838] spills are not re-used
  2004-10-05  8:56 [Bug target/17838] New: " dean-gcc at arctic dot org
@ 2004-10-05 11:27 ` pinskia at gcc dot gnu dot org
  2004-10-05 13:08 ` bangerth at dealii dot org
  1 sibling, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-10-05 11:27 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-05 11:27 -------
Confirmed; we know about this problem already.  I thought there was a bug for
this, but I could be wrong.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
          Component|target                      |rtl-optimization
     Ever Confirmed|                            |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2004-10-05 11:27:34
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838



* [Bug rtl-optimization/17838] spills are not re-used
  2004-10-05  8:56 [Bug target/17838] New: " dean-gcc at arctic dot org
  2004-10-05 11:27 ` [Bug rtl-optimization/17838] " pinskia at gcc dot gnu dot org
@ 2004-10-05 13:08 ` bangerth at dealii dot org
  1 sibling, 0 replies; 12+ messages in thread
From: bangerth at dealii dot org @ 2004-10-05 13:08 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From bangerth at dealii dot org  2004-10-05 13:08 -------
There are probably a number of PRs about excessive stack usage. 
W. 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838

