* [Bug rtl-optimization/17838] spills are not re-used
[not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
@ 2011-11-08 14:29 ` tstdenis at elliptictech dot com
2011-11-08 21:25 ` pinskia at gcc dot gnu.org
` (7 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: tstdenis at elliptictech dot com @ 2011-11-08 14:29 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838
--- Comment #6 from Tom St Denis <tstdenis at elliptictech dot com> 2011-11-08 14:17:55 UTC ---
Created attachment 25751
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25751
Another test case
Another example using
gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC)
The function when compiled with "-m32 -O3" uses way more stack than it should.
It's like it's putting the fp_int.dp[] array on the stack...
I can confirm this is a problem on 32/64 and ARM as well.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug rtl-optimization/17838] spills are not re-used
[not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
2011-11-08 14:29 ` [Bug rtl-optimization/17838] spills are not re-used tstdenis at elliptictech dot com
@ 2011-11-08 21:25 ` pinskia at gcc dot gnu.org
2011-11-10 19:32 ` tstdenis at elliptictech dot com
` (6 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu.org @ 2011-11-08 21:25 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-11-08 20:24:01 UTC ---
(In reply to comment #6)
> Created attachment 25751 [details]
> Another test case
>
> Another example using
>
> gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC)
>
> The function when compiled with "-m32 -O3" uses way more stack than it should.
> It's like it's putting the fp_int.dp[] array on the stack...
>
> I can confirm this is a problem on 32/64 and ARM as well.
That is a different issue dealing with how memcpy works (or does not get
optimized). File a different bug.
* [Bug rtl-optimization/17838] spills are not re-used
[not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
2011-11-08 14:29 ` [Bug rtl-optimization/17838] spills are not re-used tstdenis at elliptictech dot com
2011-11-08 21:25 ` pinskia at gcc dot gnu.org
@ 2011-11-10 19:32 ` tstdenis at elliptictech dot com
2011-11-10 19:38 ` tstdenis at elliptictech dot com
` (5 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: tstdenis at elliptictech dot com @ 2011-11-10 19:32 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838
Tom St Denis <tstdenis at elliptictech dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tstdenis at elliptictech
| |dot com
--- Comment #8 from Tom St Denis <tstdenis at elliptictech dot com> 2011-11-10 19:27:23 UTC ---
(In reply to comment #7)
> (In reply to comment #6)
> > Created attachment 25751 [details]
> > Another test case
> >
> > Another example using
> >
> > gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC)
> >
> > The function when compiled with "-m32 -O3" uses way more stack than it should.
> > It's like it's putting the fp_int.dp[] array on the stack...
> >
> > I can confirm this is a problem on 32/64 and ARM as well.
>
> That is a different issue dealing with how memcpy works (or does not get
> optimized). File a different bug.
How the hell am I supposed to know that? Maybe the GCC team should clean up
their stack spills once and for all. I'm sure $OBSCURE_PLATFORM_NAME or
$WONDEROUS_NEW_TREE_REPRESENTATION can wait.
I actually have a different routine [fp_mul_comba_small_set.c] from
TomsFastMath that does use memcpy, has the same style of unrolled multipliers,
and does not have this problem.
I can file a new bug if you want, but as a user of GCC I'm not meant to
understand the ins-and-outs of the depths of the compiler. I actually did
search for stack-waste instead of just blindly filing a new report...
/rant
* [Bug rtl-optimization/17838] spills are not re-used
[not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
` (2 preceding siblings ...)
2011-11-10 19:32 ` tstdenis at elliptictech dot com
@ 2011-11-10 19:38 ` tstdenis at elliptictech dot com
2011-11-15 14:29 ` tstdenis at elliptictech dot com
` (4 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: tstdenis at elliptictech dot com @ 2011-11-10 19:38 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838
--- Comment #9 from Tom St Denis <tstdenis at elliptictech dot com> 2011-11-10 19:28:33 UTC ---
(In reply to comment #7)
> (In reply to comment #6)
> > Created attachment 25751 [details]
> > Another test case
> >
> > Another example using
> >
> > gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC)
> >
> > The function when compiled with "-m32 -O3" uses way more stack than it should.
> > It's like it's putting the fp_int.dp[] array on the stack...
> >
> > I can confirm this is a problem on 32/64 and ARM as well.
>
> That is a different issue dealing with how memcpy works (or does not get
> optimized). File a different bug.
How the hell am I supposed to know that? Maybe the GCC team should clean up
their stack spills once and for all. I'm sure $OBSCURE_PLATFORM_NAME or
$WONDEROUS_NEW_TREE_REPRESENTATION can wait.
I actually have a different routine [fp_mul_comba_small_set.c] from
TomsFastMath that does use memcpy, has the same style of unrolled multipliers,
and does not have this problem.
I can file a new bug if you want, but as a user of GCC I'm not meant to
understand the ins-and-outs of the depths of the compiler. I actually did
search for stack-waste instead of just blindly filing a new report...
/rant
* [Bug rtl-optimization/17838] spills are not re-used
[not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
` (3 preceding siblings ...)
2011-11-10 19:38 ` tstdenis at elliptictech dot com
@ 2011-11-15 14:29 ` tstdenis at elliptictech dot com
2011-11-16 8:26 ` ebotcazou at gcc dot gnu.org
` (3 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: tstdenis at elliptictech dot com @ 2011-11-15 14:29 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838
--- Comment #10 from Tom St Denis <tstdenis at elliptictech dot com> 2011-11-15 14:20:07 UTC ---
Another update ... We've just profiled our crypto library and across the board
[cipher, hashes, PK functions like RSA/ECC] GCC is a complete loser against
ARMcc [r713]. And it's not that GCC is faster and that's at least a price
worth paying... In most cases ARMcc and GCC are dead even [arm faster for some
things, slower for others].
This is with gcc 4.4.5 and 4.5.1 on an ARM.
This is not a new bug. This is not a "misfeature." This is actually something
worth working on. This was filed in 2004 and hasn't been addressed since ...
What is the hold up? If GCC is to be used in embedded platforms it can't go
around taking 150% of the stack space of its competitors...
* [Bug rtl-optimization/17838] spills are not re-used
[not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
` (4 preceding siblings ...)
2011-11-15 14:29 ` tstdenis at elliptictech dot com
@ 2011-11-16 8:26 ` ebotcazou at gcc dot gnu.org
2013-04-20 15:47 ` bpringlemeir at gmail dot com
` (2 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: ebotcazou at gcc dot gnu.org @ 2011-11-16 8:26 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838
Eric Botcazou <ebotcazou at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ebotcazou at gcc dot
| |gnu.org
--- Comment #11 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2011-11-16 08:13:48 UTC ---
> This is not a new bug. This is not a "misfeature." This is actually something
> worth working on. This was filed in 2004 and hasn't been addressed since ...
> What is the hold up? If GCC is to be used in embedded platforms it can't go
> around taking 150% of the stack space of its competitors...
GCC is a volunteer project. If you think that it can be improved, you're
welcome to implement enhancements or hire/sponsor someone to do the work for
you.
Note that using -O3 for embedded targets isn't recommended; use -Os instead.
* [Bug rtl-optimization/17838] spills are not re-used
[not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
` (5 preceding siblings ...)
2011-11-16 8:26 ` ebotcazou at gcc dot gnu.org
@ 2013-04-20 15:47 ` bpringlemeir at gmail dot com
2013-04-21 9:59 ` ebotcazou at gcc dot gnu.org
2013-04-21 20:37 ` dean at arctic dot org
8 siblings, 0 replies; 12+ messages in thread
From: bpringlemeir at gmail dot com @ 2013-04-20 15:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838
Bill Pringlemeir <bpringlemeir at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bpringlemeir at gmail dot
| |com
--- Comment #12 from Bill Pringlemeir <bpringlemeir at gmail dot com> 2013-04-20 15:47:36 UTC ---
(In reply to comment #11)
> Note that using -O3 for embedded targets isn't recommended; use -Os instead.
In this case the code is computationally intensive. It doesn't make sense to
compile with '-Os' for cryptographic algorithms.
However, I think that a performance increase can be achieved by working with
gcc. I have worked on an ARM project where two different developers chose
'TomsFastMath' and 'libgcrypt' as a crypto-base. It seems that 'libgcrypt' was
performing better on the ARM. I believe this is because it used 'gcc' inline
assembler to map op-codes not available in 'C'. Gcc's inline assembler is very
nice as you don't have to do register allocation and all the other nice things
that 'gcc' does for us.
http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=blob;f=mpi/longlong.h;hb=HEAD
The use of the carry bit for multi-precision arithmetic gives a large advantage
for algorithms such as RSA, which was cited as being worse with ARMcc versus
'gcc' on the ARM.
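To illustrate the carry-bit point above (an editorial sketch, not code taken
from libgcrypt or TomsFastMath): portable C has no way to name the carry flag,
so a multi-word addition has to recover the carry through a wider type. The
function name mp_add and the 32-bit limb size are assumptions made for this
example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: r = a + b over n 32-bit limbs, least significant
 * limb first. Returns the carry out of the top limb (0 or 1). */
static uint32_t mp_add(uint32_t *r, const uint32_t *a, const uint32_t *b,
                       size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen to 64 bits so the carry out of the 32-bit add is visible. */
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        r[i] = (uint32_t)t;           /* low 32 bits are the result limb */
        carry = (uint32_t)(t >> 32);  /* bit 32 is the carry into the next limb */
    }
    return carry;
}
```

With a = {0xFFFFFFFF, 1} and b = {1, 0}, the low limb wraps to 0 and its carry
is added into the next limb, giving r = {0, 2} with a final carry of 0. On ARM,
an inline-assembler version could chain ADDS/ADCS so the carry stays in the
flags register; whether a given GCC version produces that from the C above is
not something this sketch claims.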
For the original issue for which the bug was filed (x86 SHA), I can understand
your frustration. I also tried to expand the SHA to handle 64 bits at a time as
you have done with MMX ('__builtin_ia32_pslld', etc). It is difficult to get
this to work with 'gcc'; I only had a 30% speed up versus 32-bit versions.
* [Bug rtl-optimization/17838] spills are not re-used
[not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
` (6 preceding siblings ...)
2013-04-20 15:47 ` bpringlemeir at gmail dot com
@ 2013-04-21 9:59 ` ebotcazou at gcc dot gnu.org
2013-04-21 20:37 ` dean at arctic dot org
8 siblings, 0 replies; 12+ messages in thread
From: ebotcazou at gcc dot gnu.org @ 2013-04-21 9:59 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838
--- Comment #13 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2013-04-21 09:59:26 UTC ---
> In this case the code is computationally intensive. It doesn't make sense to
> compile with '-Os' for cryptographic algorithms.
Huh? Of course it makes sense to compile with -Os if you have specific code
size constraints and it's quite easy to have code compiled at -O3 running
slower than compiled at -O2/-Os on (very) embedded CPUs.
* [Bug rtl-optimization/17838] spills are not re-used
[not found] <bug-17838-4@http.gcc.gnu.org/bugzilla/>
` (7 preceding siblings ...)
2013-04-21 9:59 ` ebotcazou at gcc dot gnu.org
@ 2013-04-21 20:37 ` dean at arctic dot org
8 siblings, 0 replies; 12+ messages in thread
From: dean at arctic dot org @ 2013-04-21 20:37 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17838
dean at arctic dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |WORKSFORME
--- Comment #14 from dean at arctic dot org 2013-04-21 20:36:55 UTC ---
i dug out the old c code for my original bug report -- it's fine with a 4.7.x
prerelease. i didn't bother narrowing down to where the spills went away.
-dean