[Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5)


* [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5)
@ 2011-11-08  0:43 solar-gcc at openwall dot com
  2011-11-08  0:57 ` [Bug middle-end/51017] " solar-gcc at openwall dot com
                   ` (29 more replies)
  0 siblings, 30 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2011-11-08  0:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

             Bug #: 51017
           Summary: GCC 4.6 performance regression (vs. 4.4/4.5)
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: solar-gcc@openwall.com

GCC 4.6 happens to produce approx. 25% slower code on at least x86_64 than 4.4
and 4.5 did for John the Ripper 1.7.8's bitslice DES implementation.  To
reproduce, download
http://download.openwall.net/pub/projects/john/1.7.8/john-1.7.8.tar.bz2 and
build it with "make linux-x86-64" (will use SSE2 intrinsics), "make
linux-x86-64-avx" (will use AVX instead), or "make generic" (won't use any
intrinsics).  Then run "../run/john -te=1".  With GCC 4.4 and 4.5, the
"Traditional DES" benchmark reports a speed of around 2500K c/s for the
"linux-x86-64" (SSE2) build on a 2.33 GHz Core 2 (this is using one core). 
With 4.6, this drops to about 1850K c/s.  A similar slowdown was observed for AVX
on Core i7-2600K when going from GCC 4.5.x to 4.6.x.  It is also reproducible
for the without-intrinsics code, although that's of less practical
importance (the intrinsics are so much faster).  A similar slowdown with GCC 4.6
was reported by a Mac OS X user.  It was also spotted by Phoronix in their
recently published C compiler benchmarks, but misinterpreted as a GCC vs. clang
difference.

Adding "-Os" to OPT_INLINE in the Makefile partially corrects the performance
(to something like 2000K c/s - still 20% slower than GCC 4.4/4.5's).  Applying
the OpenMP patch from
http://download.openwall.net/pub/projects/john/1.7.8/john-1.7.8-omp-des-4.diff.gz
and then running with OMP_NUM_THREADS=1 (for a fair comparison) corrects the
performance almost fully.  Keeping the patch applied, but removing -fopenmp
still keeps the performance at a good level.  So it's some change made to the
source code by this patch that mitigates the GCC regression.  Similar behavior
is seen with current CVS version of John the Ripper, even though it has OpenMP
support for DES heavily revised and integrated into the tree.


* [Bug middle-end/51017] GCC 4.6 performance regression (vs. 4.4/4.5)
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
@ 2011-11-08  0:57 ` solar-gcc at openwall dot com
  2011-11-08  1:05 ` solar-gcc at openwall dot com
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2011-11-08  0:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #1 from Alexander Peslyak <solar-gcc at openwall dot com> 2011-11-08 00:47:49 UTC ---
(In reply to comment #0)
> [...] Similar behavior
> is seen with current CVS version of John the Ripper, even though it has OpenMP
> support for DES heavily revised and integrated into the tree.

I forgot to note that in the CVS version, I changed the default for non-OpenMP
builds to use the supplied SSE2 assembly code, which hides this GCC issue for
SSE2 non-OpenMP builds.  The C code may be re-enabled in x86-64.h, or
alternatively an -avx or generic build may be used.  (Yes, -avx is still fully
affected by the GCC regression even in the latest version of JtR code.)

But it is probably simpler to use the 1.7.8 release to reproduce this bug
anyway.


* [Bug middle-end/51017] GCC 4.6 performance regression (vs. 4.4/4.5)
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
  2011-11-08  0:57 ` [Bug middle-end/51017] " solar-gcc at openwall dot com
@ 2011-11-08  1:05 ` solar-gcc at openwall dot com
  2011-12-15  0:34 ` pinskia at gcc dot gnu.org
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2011-11-08  1:05 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #2 from Alexander Peslyak <solar-gcc at openwall dot com> 2011-11-08 00:56:47 UTC ---
The affected code is in DES_bs_b.c: DES_bs_crypt_25().  (Sorry, I should have
mentioned that right away.)



* [Bug middle-end/51017] GCC 4.6 performance regression (vs. 4.4/4.5)
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
  2011-11-08  0:57 ` [Bug middle-end/51017] " solar-gcc at openwall dot com
  2011-11-08  1:05 ` solar-gcc at openwall dot com
@ 2011-12-15  0:34 ` pinskia at gcc dot gnu.org
  2012-01-03  4:46 ` solar-gcc at openwall dot com
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: pinskia at gcc dot gnu.org @ 2011-12-15  0:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-12-15 00:28:51 UTC ---
It might be interesting to get numbers for the trunk.  There have been some
register allocator fixes which might have improved this.



* [Bug middle-end/51017] GCC 4.6 performance regression (vs. 4.4/4.5)
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (2 preceding siblings ...)
  2011-12-15  0:34 ` pinskia at gcc dot gnu.org
@ 2012-01-03  4:46 ` solar-gcc at openwall dot com
  2012-01-04 19:39 ` solar-gcc at openwall dot com
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2012-01-03  4:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #4 from Alexander Peslyak <solar-gcc at openwall dot com> 2012-01-03 04:45:43 UTC ---
(In reply to comment #3)
> It might be interesting to get numbers for the trunk.  There have been some
> register allocator fixes which might have improved this.

I've just tested the gcc-4.7-20111231 snapshot vs. 4.6.2 release.  There's no
improvement as it relates to this issue: I am getting the same poor performance
(a lot worse than for 4.5).  This is for generating x86-64 code with SSE2
intrinsics, benchmarking the resulting code on a Core 2'ish CPU (I used Xeon
E5420 this time).



* [Bug middle-end/51017] GCC 4.6 performance regression (vs. 4.4/4.5)
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (3 preceding siblings ...)
  2012-01-03  4:46 ` solar-gcc at openwall dot com
@ 2012-01-04 19:39 ` solar-gcc at openwall dot com
  2012-01-04 22:43 ` jakub at gcc dot gnu.org
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2012-01-04 19:39 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #5 from Alexander Peslyak <solar-gcc at openwall dot com> 2012-01-04 19:39:26 UTC ---
I wrote and ran some scripts to test many versions/snapshots of gcc.  It turns
out that 4.6-20100703 (oldest 4.6 snapshot available via FTP) was already
affected by this regression, whereas 4.5-20111229 and 4.4-20120103 are not
affected (as expected).  Also, it turns out that there was a smaller regression
on this same benchmark between 4.3 and 4.4.  That is, 4.3 produces the fastest
code of all gcc versions I tested.  Here are some numbers:

4.3.5 20100502 - 2950K c/s, 28229 bytes
4.3.6 20110626 - 2950K c/s, 28229 bytes
4.4.5 20100504 - 2697K c/s, 29764 bytes
4.4.7 20120103 - 2691K c/s, 29316 bytes
4.5.1 20100603 - 2729K c/s, 29203 bytes
4.5.4 20111229 - 2710K c/s, 29203 bytes
4.6.0 20100703 - 2133K c/s, 29911 bytes
4.6.0 20100807 - 2119K c/s, 29940 bytes
4.6.0 20100904 - 2142K c/s, 29848 bytes
4.6.0 20101106 - 2124K c/s, 29848 bytes
4.6.0 20101204 - 2114K c/s, 29624 bytes
4.6.3 20111230 - 2116K c/s, 29624 bytes
4.7.0 20111231 - 2147K c/s, 29692 bytes

These are for JtR 1.7.9 with DES_BS_ASM set to 0 on line 157 of x86-64.h (to
disable this version's workaround for this GCC 4.6 regression), built with
"make linux-x86-64" and run on one core in a Xeon E5420 2.5 GHz (the system is
otherwise idle).  The code sizes given are for .text of DES_bs_b.o (which
contains three similar functions, of which one is in use by this benchmark -
that is, the code size in the loop is about 10 KB).

As you can see, 4.3 generated code that was both significantly faster and a bit
smaller than all other versions'.  In 4.4, the speed decreased by 8.5% and code
size increased by 4.4%.  4.5 corrected this to a very limited extent - still 8%
slower and 3.5% larger than 4.3's.  4.6 brought a huge performance drop and a
slight code size increase.  4.7.0 20111231's code is still 27% slower than
4.3's.
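
The percentages above can be rechecked with a few lines of Python.  This is
only a sketch: the RESULTS table copies a subset of the rows listed in this
comment, and the helper names (pct_change, compare_to) are made up for
illustration.

```python
# Speeds (K c/s) and DES_bs_b.o .text sizes (bytes), copied from the table above.
RESULTS = {
    "4.3.6 20110626": (2950, 28229),
    "4.4.5 20100504": (2697, 29764),
    "4.5.4 20111229": (2710, 29203),
    "4.6.3 20111230": (2116, 29624),
    "4.7.0 20111231": (2147, 29692),
}


def pct_change(new, old):
    """Signed percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


def compare_to(baseline, results=RESULTS):
    """Return {version: (speed change %, size change %)} against `baseline`."""
    base_speed, base_size = results[baseline]
    return {
        version: (pct_change(speed, base_speed), pct_change(size, base_size))
        for version, (speed, size) in results.items()
        if version != baseline
    }


if __name__ == "__main__":
    for version, (speed, size) in compare_to("4.3.6 20110626").items():
        print(f"{version}: speed {speed:+.1f}%, size {size:+.1f}%")
```

Negative speed values mean slower code than the 4.3.6 baseline; positive size
values mean a larger .text section.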


* [Bug middle-end/51017] GCC 4.6 performance regression (vs. 4.4/4.5)
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (4 preceding siblings ...)
  2012-01-04 19:39 ` solar-gcc at openwall dot com
@ 2012-01-04 22:43 ` jakub at gcc dot gnu.org
  2012-01-04 23:00 ` solar-gcc at openwall dot com
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-01-04 22:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-01-04 22:42:37 UTC ---
The big performance drop seems to be from r143756 to r143757, i.e. RA changes:
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=143757
CCing Vladimir.



* [Bug middle-end/51017] GCC 4.6 performance regression (vs. 4.4/4.5)
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (5 preceding siblings ...)
  2012-01-04 22:43 ` jakub at gcc dot gnu.org
@ 2012-01-04 23:00 ` solar-gcc at openwall dot com
  2015-02-09  0:12 ` pinskia at gcc dot gnu.org
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2012-01-04 23:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #7 from Alexander Peslyak <solar-gcc at openwall dot com> 2012-01-04 23:00:24 UTC ---
(I ran the tests below and wrote this comment before seeing Jakub's.  Then I
thought I'd post it anyway.)

Here are some numbers for gcc releases:

4.0.0 - 383K c/s, 71879 bytes (this old version of gcc generates function calls
for SSE2 intrinsics)
4.1.0 - 2959K c/s, 28182 bytes
4.1.2 - 2964K c/s, 28365 bytes
4.2.0 - 2968K c/s, 28363 bytes
4.2.4 - 2971K c/s, 28382 bytes
4.3.0 - 2971K c/s, 28229 bytes
4.3.6 - 2959K c/s, 28229 bytes
4.4.0 - 2625K c/s, 29770 bytes
4.4.6 - 2695K c/s, 29316 bytes
4.5.0 - 2729K c/s, 29203 bytes
4.5.3 - 2716K c/s, 29203 bytes
4.6.0 - 2111K c/s, 29624 bytes
4.6.2 - 2123K c/s, 29624 bytes

So things were really good for versions 4.1.0 through 4.3.6, but started to get
worse afterwards and got really bad with 4.6.

To be fair, things are very different for some other hash/cipher types
supported by JtR - e.g., for Blowfish-based hashing we went from 560 c/s for
4.1.0 to 700 c/s for 4.6.2.

<plug>JtR 1.7.9 and 1.7.9-jumbo include a benchmark comparison tool called
relbench, which calculates geometric mean, median, and some other metrics for
multiple individual outputs from a pair of JtR benchmark invocations (e.g.,
built with different versions of gcc).  In 1.7.9-jumbo-5, there are over 160
individual benchmark outputs (for different hashes/ciphers) and it may be built
in a variety of ways (with/without explicit assembly code, with/without
intrinsics etc.)  relbench combines those 160+ outputs into a nice summary
showing overall speedup/slowdown and more.  It might be useful for testing of
future gcc versions for potential performance regressions like this.</plug>
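
As a rough illustration of the kind of summary described above, the sketch
below combines per-benchmark speed ratios into a geometric mean and a median.
It is not relbench's actual code: the function name, the dictionary layout,
and the benchmark labels are assumptions, and only the two c/s pairs quoted in
this comment (gcc 4.1.0 vs. 4.6.2) are used as input.

```python
from statistics import geometric_mean, median


def summarize(old, new):
    """Combine per-benchmark speeds into overall ratios (new / old)."""
    ratios = [new[name] / old[name] for name in old if name in new]
    return {
        "geometric_mean": geometric_mean(ratios),
        "median": median(ratios),
        "best": max(ratios),
        "worst": min(ratios),
    }


# c/s figures quoted in this comment; the labels are illustrative only.
GCC_4_1_0 = {"Traditional DES": 2959e3, "Blowfish": 560}
GCC_4_6_2 = {"Traditional DES": 2123e3, "Blowfish": 700}

if __name__ == "__main__":
    for metric, value in summarize(GCC_4_1_0, GCC_4_6_2).items():
        print(f"{metric}: {value:.3f}")
```

A ratio below 1 means the newer compiler produced slower code for that
benchmark.  The geometric mean keeps a single large outlier from dominating
the overall figure the way an arithmetic mean would.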


* [Bug middle-end/51017] GCC 4.6 performance regression (vs. 4.4/4.5)
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (6 preceding siblings ...)
  2012-01-04 23:00 ` solar-gcc at openwall dot com
@ 2015-02-09  0:12 ` pinskia at gcc dot gnu.org
  2015-02-16  0:08 ` solar-gcc at openwall dot com
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: pinskia at gcc dot gnu.org @ 2015-02-09  0:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2015-02-09
     Ever confirmed|0                           |1

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Can you try GCC 4.9?



* [Bug middle-end/51017] GCC 4.6 performance regression (vs. 4.4/4.5)
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (7 preceding siblings ...)
  2015-02-09  0:12 ` pinskia at gcc dot gnu.org
@ 2015-02-16  0:08 ` solar-gcc at openwall dot com
  2015-02-16  1:10 ` solar-gcc at openwall dot com
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2015-02-16  0:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #9 from Alexander Peslyak <solar-gcc at openwall dot com> ---
(In reply to Andrew Pinski from comment #8)
> Can you try GCC 4.9?

Yes.  Bad news: things mostly became even worse.  Same machine, same JtR
version, same test script as in my previous comment:

4.9.2 - 1849K c/s, 28256 bytes

The code size is back to 4.1.0 to 4.3.6 levels (good), but the performance
decreased by another 13% since 4.6.2 (and by 38% since it peaked with 4.3.0). 
I ran this benchmark multiple times, and I also re-ran benchmarks with some
previous gcc versions to make sure this isn't caused by some change in my
environment - no, I am getting consistently poor results for 4.9.2, and the
same results as before for other gcc versions.  I plan to test with some
versions in the range 4.7.0 to 4.9.0 next.

(I also see some much smaller regressions with 4.9.2 for other hash types.)


* [Bug middle-end/51017] GCC 4.6 performance regression (vs. 4.4/4.5)
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (8 preceding siblings ...)
  2015-02-16  0:08 ` solar-gcc at openwall dot com
@ 2015-02-16  1:10 ` solar-gcc at openwall dot com
  2015-02-16 10:51 ` [Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure rguenth at gcc dot gnu.org
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2015-02-16  1:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #10 from Alexander Peslyak <solar-gcc at openwall dot com> ---
I decided to take a look at the generated code.  Compared to 4.6.2, GCC 4.9.2
started generating lots of xorps, orps, andps, andnps where it previously
generated pxor, por, pand, pandn.  Changing those with:

sed -i 's/xorps/pxor/g; s/orps/por/g; s/andps/pand/g; s/andnps/pandn/g'

made no difference for performance on this machine (still 4.9.2's poor
performance).

The next suspects were the varieties of MOV instructions.  In 4.9.2's generated
code, there were 1319 movaps, 721 movups.  In 4.6.2's, there were 1258 movaps,
465 movups.  Simply changing all movups to movaps in 4.9.2's original code with
sed (thus, with no other changes except for this one), resulting in a total of
2040 movaps, brought the performance to levels similar to GCC 4.4 and 4.5's
(and is better than 4.6's, but worse than 4.3's).  So movups appear to be the
main culprit.  The same hack for 4.6.2's code brought its performance almost to
4.3's level (still 5% worse, though), and significantly above 4.9.2's (so
there's still some other, smaller regression with 4.9.2).

Here are my new results:

4.1.0o - 2960K c/s, 28182 bytes, 1758 movaps, 0 movups
4.3.6o - 2956K c/s, 28229 bytes, 1755 movaps, 0 movups
4.4.6o - 2694K c/s, 29316 bytes, 1709 movaps, 7 movups
4.4.6h - 2714K c/s, 29316 bytes, 1716 movaps, 0 movups
4.5.3o - 2709K c/s, 29203 bytes, 1669 movaps, 0 movups
4.6.2o - 2121K c/s, 29624 bytes, 1258 movaps, 465 movups
4.6.2h - 2817K c/s, 29624 bytes, 1723 movaps, 0 movups
4.9.2o - 1852K c/s, 28256 bytes, 1319 movaps, 721 movups
4.9.2h - 2688K c/s, 28256 bytes, 2040 movaps, 0 movups

"o" means original, "h" means hacked generated assembly code (all movups
changed to movaps).  (BTW, there were no movdqa/movdqu in any of these code
versions.)
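
The counting and the "h" rewrite described above could be reproduced along
these lines.  This is a sketch, not the script actually used for the table:
the function names and the sample assembly (including the DES_bs_all+16
operand) are hypothetical.  Note that movaps faults on an address that is not
16-byte aligned, so the rewrite is only safe when the data really is aligned.

```python
import re
from collections import Counter

# Match the mnemonic at the start of an instruction line (AT&T syntax assumed).
MNEMONIC = re.compile(r"^\s*(movaps|movups)\b", re.MULTILINE)


def count_moves(asm_text):
    """Return (number of movaps, number of movups) in generated assembly."""
    counts = Counter(m.group(1) for m in MNEMONIC.finditer(asm_text))
    return counts["movaps"], counts["movups"]


def force_aligned_moves(asm_text):
    """Apply the 'h' hack: rewrite every movups as movaps."""
    return re.sub(r"\bmovups\b", "movaps", asm_text)


# Hypothetical excerpt, for illustration only.
SAMPLE_ASM = (
    "\tmovups\tDES_bs_all+16(%rip), %xmm0\n"
    "\tmovaps\t%xmm0, %xmm1\n"
    "\tpxor\t%xmm1, %xmm0\n"
)

if __name__ == "__main__":
    print("original:", count_moves(SAMPLE_ASM))
    print("hacked:  ", count_moves(force_aligned_moves(SAMPLE_ASM)))
```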

Now I am wondering to what extent this is a GCC issue and to what extent it
might be my source code's, if GCC is somehow unsure it can assume alignment. 
What are the conditions when GCC should in fact use movups?  Is it intentional
that newer versions of GCC are being more careful at this, resulting in worse
performance?


* [Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (9 preceding siblings ...)
  2015-02-16  1:10 ` solar-gcc at openwall dot com
@ 2015-02-16 10:51 ` rguenth at gcc dot gnu.org
  2015-02-17  2:21 ` solar-gcc at openwall dot com
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-16 10:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|WAITING                     |NEW
                 CC|                            |rguenth at gcc dot gnu.org
          Component|middle-end                  |tree-optimization
            Summary|GCC 4.6 performance         |GCC 4.6 performance
                   |regression (vs. 4.4/4.5)    |regression (vs. 4.4/4.5),
                   |                            |PRE increases register
                   |                            |pressure

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
As for movaps vs. movups: when movaps actually works, the choice shouldn't make
any difference on modern architectures.  So I wonder if you could share the
exact CPU type you are using?

We are putting quite heavy register-pressure on the thing by means of
partial redundancy elimination, thus disabling PRE using -fno-tree-pre
might help (we still spill a lot).

Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts:     103296 c/s real, 103296 c/s virtual
Only one salt:  100736 c/s real, 100736 c/s virtual

improves to

Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts:     126848 c/s real, 126848 c/s virtual
Only one salt:  123008 c/s real, 123008 c/s virtual

with that for me (gcc 4.8, SSE2).  Which is close to what 4.5.3 gets for me:

Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts:     128384 c/s real, 128384 c/s virtual
Only one salt:  124800 c/s real, 124800 c/s virtual

albeit that doesn't need -fno-tree-pre to fix things.
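
To put a number on the improvement, the benchmark output quoted above can be
parsed and compared as in the sketch below.  The regular expression only
assumes the line format shown in this comment, and the helper names are made
up for illustration.

```python
import re

# Matches lines such as "Many salts:     103296 c/s real, 103296 c/s virtual".
RATE = re.compile(r"^(Many salts|Only one salt):\s+(\d+) c/s real", re.MULTILINE)

# Outputs quoted above for gcc 4.8 without and with -fno-tree-pre.
BEFORE = """Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts:     103296 c/s real, 103296 c/s virtual
Only one salt:  100736 c/s real, 100736 c/s virtual
"""
AFTER = """Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts:     126848 c/s real, 126848 c/s virtual
Only one salt:  123008 c/s real, 123008 c/s virtual
"""


def parse_rates(output):
    """Extract the 'real' c/s figures keyed by their label."""
    return {label: int(rate) for label, rate in RATE.findall(output)}


def speedup(before, after):
    """Return after/before ratios for each label present in `before`."""
    b, a = parse_rates(before), parse_rates(after)
    return {label: a[label] / b[label] for label in b}


if __name__ == "__main__":
    for label, ratio in speedup(BEFORE, AFTER).items():
        print(f"{label}: x{ratio:.3f}")
```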

Note that we have to use movups because DES_bs_all is not aligned as seen
from DES_bs_b.c (it's defined in DES_bs.c and only there annotated with
CC_CACHE_ALIGN, not at the point of declaration in DES_bs.h).  So the
unaligned moves are the source's fault.  Annotating that with CC_CACHE_ALIGN
produces the desired movaps instructions (with no effect on performance for
me).

I think for the effect of PRE increasing register pressure we do have some
duplicate bugs (but no good heuristic to fix anything).  LIM store-motion can
have the very same issue.


* [Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (10 preceding siblings ...)
  2015-02-16 10:51 ` [Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure rguenth at gcc dot gnu.org
@ 2015-02-17  2:21 ` solar-gcc at openwall dot com
  2015-02-17  2:56 ` solar-gcc at openwall dot com
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2015-02-17  2:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #12 from Alexander Peslyak <solar-gcc at openwall dot com> ---
(In reply to Richard Biener from comment #11)
> I wonder if you could share the exact CPU type you are using?

This is on (dual) Xeon E5420 (using only one core for these benchmarks), but
there was similar slowdown with GCC 4.6 on other Core 2'ish CPUs as well (such
as desktop Core 2 Duo CPUs). You might not call these "modern".

> Note that we have to use movups because [...]

Thank you for looking into this. I still have a question, though: does this
mean you're treating older GCC's behavior, where it dared to use movaps anyway,
as a bug?

I was under the impression that with most SSE*/AVX* intrinsics (except for those
explicitly defined to do unaligned loads/stores) natural alignment is assumed
and is supposed to be provided by the programmer. Not only with GCC, but with
compilers for x86(-64) in general. I thought this was part of the contract: I
use intrinsics and I guarantee alignment. (Things would certainly not work for
me at least with older GCC if I assumed the compiler would use unaligned loads
whenever it was unsure of alignment.) Was I wrong, or has this changed (in GCC?
or in some compiler-neutral specification?), or is GCC wrong in not assuming
alignment now?

Is there a command-line option to ask GCC to assume alignment, like it did
before?


* [Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (11 preceding siblings ...)
  2015-02-17  2:21 ` solar-gcc at openwall dot com
@ 2015-02-17  2:56 ` solar-gcc at openwall dot com
  2015-02-17  3:11 ` solar-gcc at openwall dot com
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2015-02-17  2:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #13 from Alexander Peslyak <solar-gcc at openwall dot com> ---
(In reply to Richard Biener from comment #11)
> We are putting quite heavy register-pressure on the thing by means of
> partial redundancy elimination, thus disabling PRE using -fno-tree-pre
> might help (we still spill a lot).

It looks like -fno-tree-pre or equivalent was implied in the options I was
using, which were "-O2 -fomit-frame-pointer -Os -funroll-loops
-finline-functions" - yes, with -Os added after -O2 when compiling this
specific source file.  IIRC, this was experimentally derived as producing best
performance with 4.6.x or older.  Adding -fno-tree-pre after all of these
options merely changes the label names in the generated assembly code, while
resulting in identical object files (and obviously no performance change). 
Also, I now realize -Os was probably the reason why GCC preferred SSE
"floating-point" bitwise ops and MOVs here, instead of SSE2's integer ones
(they have longer encodings). Omitting -Os results in usage of the SSE2
instructions (both bitwise and MOVs), with correspondingly larger code. And
yes, when I omit -Os, I do need to add -fno-tree-pre to regain roughly the same
performance, and then to s/movdqu/movdqa/g to regain almost the full speed
(movdqu is just as slow as movups on this CPU). I've just tested all of this
with GCC 4.8.4 to possibly match yours (you mentioned you used 4.8). So I think
you uncovered yet another performance regression I had already worked around
with -Os.

FWIW, here are the generated assembly code sizes ("wc" output) with GCC 4.8.4:

-O2 -fomit-frame-pointer -Os -funroll-loops -finline-functions
  5870  17420 137636 1.s
-O2 -fomit-frame-pointer -Os -funroll-loops -finline-functions -fno-tree-pre
  5870  17420 137636 2.s
-O2 -fomit-frame-pointer -funroll-loops -finline-functions
  6814  20193 156837 a.s
-O2 -fomit-frame-pointer -funroll-loops -finline-functions -fno-tree-pre
  6028  17842 138284 b.s

As you can see, -fno-tree-pre reduces the size almost to the -Os level. (But
the .text size would be significantly larger because of the SSE2 instruction
encodings.  This is why I show the assembly code sizes for this comparison.)



* [Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (12 preceding siblings ...)
  2015-02-17  2:56 ` solar-gcc at openwall dot com
@ 2015-02-17  3:11 ` solar-gcc at openwall dot com
  2015-02-17  9:25 ` rguenth at gcc dot gnu.org
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2015-02-17  3:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #14 from Alexander Peslyak <solar-gcc at openwall dot com> ---
For completeness, here are the results for 4.7.x, 4.8.x, and 4.9.0:

4.7.0o - 2142K c/s, 29692 bytes, 1267 movaps, 465 movups
4.7.0h - 2823K c/s, 29692 bytes, 1732 movaps, 0 movups
4.7.4o - 2144K c/s, 29692 bytes, 1267 movaps, 465 movups
4.7.4h - 2827K c/s, 29692 bytes, 1732 movaps, 0 movups
4.8.0o - 1825K c/s, 27813 bytes, 1341 movaps, 721 movups
4.8.0h - 2792K c/s, 27813 bytes, 2062 movaps, 0 movups
4.8.4o - 1827K c/s, 27807 bytes, 1341 movaps, 721 movups
4.8.4h - 2786K c/s, 27807 bytes, 2062 movaps, 0 movups
4.9.0o - 1852K c/s, 28262 bytes, 1319 movaps, 721 movups
4.9.0h - 2685K c/s, 28262 bytes, 2040 movaps, 0 movups

4.8 produces the smallest code so far, but even with the aligned loads hack is
still 6% slower than 4.3.

All of these are with "-O2 -fomit-frame-pointer -Os -funroll-loops
-finline-functions", like similar results I had posted before.  Xeon E5420,
x86_64.



* [Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (13 preceding siblings ...)
  2015-02-17  3:11 ` solar-gcc at openwall dot com
@ 2015-02-17  9:25 ` rguenth at gcc dot gnu.org
  2015-02-17  9:27 ` rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-17  9:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Peslyak from comment #12)
> (In reply to Richard Biener from comment #11)
> > I wonder if you could share the exact CPU type you are using?
> 
> This is on (dual) Xeon E5420 (using only one core for these benchmarks), but
> there was similar slowdown with GCC 4.6 on other Core 2'ish CPUs as well
> (such as desktop Core 2 Duo CPUs). You might not call these "modern".
> 
> > Note that we have to use movups because [...]
> 
> Thank you for looking into this. I still have a question, though: does this
> mean you're treating older GCC's behavior, where it dared to use movaps
> anyway, as a bug?

If you used intrinsics for aligned loads then no.

> I was under the impression that with most SSE*/AVX* intrinsics (except for those
> explicitly defined to do unaligned loads/stores) natural alignment is
> assumed and is supposed to be provided by the programmer. Not only with GCC,
> but with compilers for x86(-64) in general. I thought this was part of the
> contract: I use intrinsics and I guarantee alignment. (Things would
> certainly not work for me at least with older GCC if I assumed the compiler
> would use unaligned loads whenever it was unsure of alignment.) Was I wrong,
> or has this changed (in GCC? or in some compiler-neutral specification?), or
> is GCC wrong in not assuming alignment now?

GCC was changed to be more permissive to broken programs and also intrinsics
were changed to map to plain C code in some cases (thus they are not visible
as intrinsics to the compiler).

> Is there a command-line option to ask GCC to assume alignment, like it did
> before?

No.



* [Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (14 preceding siblings ...)
  2015-02-17  9:25 ` rguenth at gcc dot gnu.org
@ 2015-02-17  9:27 ` rguenth at gcc dot gnu.org
  2015-02-18  0:03 ` solar-gcc at openwall dot com
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-17  9:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Peslyak from comment #14)
> For completeness, here are the results for 4.7.x, 4.8.x, and 4.9.0:
> 
> 4.7.0o - 2142K c/s, 29692 bytes, 1267 movaps, 465 movups
> 4.7.0h - 2823K c/s, 29692 bytes, 1732 movaps, 0 movups
> 4.7.4o - 2144K c/s, 29692 bytes, 1267 movaps, 465 movups
> 4.7.4h - 2827K c/s, 29692 bytes, 1732 movaps, 0 movups
> 4.8.0o - 1825K c/s, 27813 bytes, 1341 movaps, 721 movups
> 4.8.0h - 2792K c/s, 27813 bytes, 2062 movaps, 0 movups
> 4.8.4o - 1827K c/s, 27807 bytes, 1341 movaps, 721 movups
> 4.8.4h - 2786K c/s, 27807 bytes, 2062 movaps, 0 movups
> 4.9.0o - 1852K c/s, 28262 bytes, 1319 movaps, 721 movups
> 4.9.0h - 2685K c/s, 28262 bytes, 2040 movaps, 0 movups
> 
> 4.8 produces the smallest code so far, but even with the aligned loads hack
> it is still 6% slower than 4.3.
> 
> All of these are with "-O2 -fomit-frame-pointer -Os -funroll-loops
> -finline-functions", like similar results I had posted before.  Xeon E5420,
> x86_64.

I'm completely confused now as to what the original regression was reported
against.  I thought it was the default options in the Makefile, -O2
-fomit-frame-pointer, which showed the regression and you found -Os would
mitigate it somewhat (and I more specifically told you it is -fno-tree-pre that
makes the actual difference).

So - what options give good results with old compilers but bad results with new
compilers?



* [Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (15 preceding siblings ...)
  2015-02-17  9:27 ` rguenth at gcc dot gnu.org
@ 2015-02-18  0:03 ` solar-gcc at openwall dot com
  2015-02-18  1:25 ` solar-gcc at openwall dot com
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2015-02-18  0:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #17 from Alexander Peslyak <solar-gcc at openwall dot com> ---
(In reply to Richard Biener from comment #16)
> I'm completely confused now as to what the original regression was reported
> against.

I'm sorry, I should have re-read my original description of the regression
before I wrote comment 13.  Together, these are indeed confusing.

> I thought it was the default options in the Makefile, -O2
> -fomit-frame-pointer, which showed the regression and you found -Os would
> mitigate it somewhat (and I more specifically told you it is -fno-tree-pre
> that makes the actual difference).

That's one of the regressions I mentioned in the original description.  Yes,
you identified -fno-tree-pre as the component of -Os that makes the difference
- thank you!  However, I also mentioned in the original description that a
bigger regression with 4.6+ vs. 4.5 and 4.4 remained despite -Os, and I had
no similar workaround for it at the time (but enabling -fopenmp made it go
away, perhaps due to changes to declarations in the source code in #ifdef
_OPENMP blocks).  I think we can now say that this bigger 4.6+ regression was
primarily caused by the unaligned load instructions.  So two regressions are
figured out, and the remaining slowdown (not investigated yet) vs. 4.1 to 4.3
(which worked best) is only 6% to 10% in recent versions (9% in 4.9.2).

> So - what options give good results with old compilers but bad results with
> new compilers?

On CPUs where movups/movdqu are slower than their aligned counterparts (for
addresses that happen to be aligned), any sane optimization options of 4.6+
give bad results as compared to pre-4.6 with the same options.  As you say, this
can be fixed in the source code (and I most likely will fix it there), but I
think many other programs may experience similar slowdowns, so maybe GCC should
do something about this too.

Other than that, either -Os or -fno-tree-pre works around the second worst
slowdown seen in 4.6+.

To avoid confusion, maybe this bug should focus on one of the three
regressions?  Should we keep it for PRE only?

Should we create a new bug for the unnecessary and non-optional use of
unaligned load instructions for source code like this, or is this considered
the new intended behavior despite the major slowdown on such CPUs? 
(Presumably not only for JtR.  I'd expect this to affect many programs.)

Should we also create a bug for investigating the remaining slowdown of 9% in
4.9.2 (vs. 4.1 to 4.3), or is it considered too minor to bother?

Thank you!


* [Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (16 preceding siblings ...)
  2015-02-18  0:03 ` solar-gcc at openwall dot com
@ 2015-02-18  1:25 ` solar-gcc at openwall dot com
  2015-02-18  3:20 ` solar-gcc at openwall dot com
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2015-02-18  1:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #18 from Alexander Peslyak <solar-gcc at openwall dot com> ---
(In reply to Richard Biener from comment #11)
> Note that we have to use movups because DES_bs_all is not aligned as seen
> from DES_bs_b.c (it's defined in DES_bs.c and only there annotated with
> CC_CACHE_ALIGN, not at the point of declaration in DES_bs.h).  So the
> unaligned moves are the source's fault.  Annotating that with CC_CACHE_ALIGN
> produces the desired movaps instructions

Confirmed also with GCC 4.9.2 on JtR 1.8.0's version of the code.

> (with no effect on performance for me).

... with the expected performance improvement for me.  I'll commit this fix. 
Thanks again!
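
The kind of fix described above can be sketched as follows.  This is an
illustration only, not the actual JtR source: CACHE_ALIGN (with an assumed
64-byte value), the bs_state type, and the bs_all object are hypothetical
stand-ins for CC_CACHE_ALIGN, the real structure type, and DES_bs_all.  The
point is that the alignment attribute appears on the declaration that other
translation units see, not only on the definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CC_CACHE_ALIGN; 64 bytes is an assumption. */
#define CACHE_ALIGN __attribute__((aligned(64)))

/* Hypothetical stand-in for the real structure.  Its natural alignment
   is small, so any stronger guarantee must come from the attribute. */
typedef struct {
    unsigned char flag;
    uint32_t words[16];
} bs_state;

/* What would go in the header: the declaration carries the alignment,
   so every file that includes it may rely on it. */
extern bs_state bs_all CACHE_ALIGN;

/* What would go in exactly one .c file: the definition, annotated the
   same way as the declaration. */
bs_state bs_all CACHE_ALIGN;

/* Returns 1 if p is a multiple of alignment (a power of two), else 0. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

If the attribute were present only on the definition, the object would still
be placed at an aligned address, but code compiled from other files would have
no way to know that.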



* [Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (17 preceding siblings ...)
  2015-02-18  1:25 ` solar-gcc at openwall dot com
@ 2015-02-18  3:20 ` solar-gcc at openwall dot com
  2015-02-18 10:32 ` [Bug tree-optimization/51017] [4.8/4.9/5 Regression] GCC performance regression (vs. 4.4/4.5), PRE increases register pressure too much rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: solar-gcc at openwall dot com @ 2015-02-18  3:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #19 from Alexander Peslyak <solar-gcc at openwall dot com> ---
(In reply to Alexander Peslyak from comment #17)
> Should we create a new bug for the unnecessary and non-optional use of
> unaligned load instructions for source code like this, or is this considered
> the new intended behavior despite the major slowdown on such CPUs? 
> (Presumably not only for JtR.  I'd expect this to affect many programs.)

Upon further analysis, I now think that this was my fault, and (presumably) not
common in other programs.  What I had was a definition that differed from its
declaration, so a bug.  The lack of an alignment specification in the declaration of the struct
essentially told (newer) GCC not to assume alignment - to an extent greater
than e.g. a pointer would.  As far as I can tell, GCC does not currently
produce unaligned load instructions (so assumes that SSE* vectors are properly
aligned) when all it has is a pointer coming from another object file.  I think
that's the common scenario, whereas mine was uncommon (and incorrect).

So let's focus on PRE only.


* [Bug tree-optimization/51017] [4.8/4.9/5 Regression] GCC performance regression (vs. 4.4/4.5), PRE increases register pressure too much
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (18 preceding siblings ...)
  2015-02-18  3:20 ` solar-gcc at openwall dot com
@ 2015-02-18 10:32 ` rguenth at gcc dot gnu.org
  2015-02-18 11:09 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-18 10:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
      Known to work|                            |4.3.4
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
   Target Milestone|---                         |4.8.5
            Summary|GCC 4.6 performance         |[4.8/4.9/5 Regression] GCC
                   |regression (vs. 4.4/4.5),   |performance regression (vs.
                   |PRE increases register      |4.4/4.5), PRE increases
                   |pressure                    |register pressure too much
      Known to fail|                            |4.8.3, 4.9.2, 5.0

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Peslyak from comment #19)
> (In reply to Alexander Peslyak from comment #17)
> > Should we create a new bug for the unnecessary and non-optional use of
> > unaligned load instructions for source code like this, or is this considered
> > the new intended behavior despite the major slowdown on such CPUs? 
> > (Presumably not only for JtR.  I'd expect this to affect many programs.)
> 
> Upon further analysis, I now think that this was my fault, and (presumably)
> not common in other programs.  What I had was a definition that differed from
> its declaration, so a bug.  The lack of an alignment specification in the
> declaration of the struct essentially told (newer) GCC not to assume
> alignment - to an extent greater than e.g. a pointer would.  As far as I can
> tell, GCC does not currently produce unaligned load instructions (so assumes
> that SSE* vectors are properly aligned) when all it has is a pointer coming
> from another object file.  I think that's the common scenario, whereas mine
> was uncommon (and incorrect).

Yes.  Note that we are trying to be more forgiving to users here and do not
exploit undefined behavior fully.

> So let's focus on PRE only.

OK.  I think there are related bug reports for that.



* [Bug tree-optimization/51017] [4.8/4.9/5 Regression] GCC performance regression (vs. 4.4/4.5), PRE increases register pressure too much
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (19 preceding siblings ...)
  2015-02-18 10:32 ` [Bug tree-optimization/51017] [4.8/4.9/5 Regression] GCC performance regression (vs. 4.4/4.5), PRE increases register pressure too much rguenth at gcc dot gnu.org
@ 2015-02-18 11:09 ` rguenth at gcc dot gnu.org
  2015-02-25 14:26 ` law at redhat dot com
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-18 11:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
We do already inhibit creating loop-carried dependencies of some kind, but only
when vectorization is enabled (because it can inhibit vectorization).  But we
still PRE invariant loads:

Replaced MEM[(vtype * {ref-all})&DES_bs_all + 20528B] with prephitmp_2898 in
all uses of _1195 = MEM[(vtype * {ref-all})&DES_bs_all + 20528B]

because we know it's {0, 0} on entry.  Note that store motion doesn't apply
here because those stores are said to alias with the
MEM[(vtype * {ref-all})k_2 + 848B] kinds (iterating DES_bs_all.KS.v;
unfortunately field-sensitive points-to analysis doesn't help here as the
points-to result itself isn't field-sensitive).

Of course, without store motion applying, this kind of PRE is not really
useful.  If store motion applied it would create the same kind of problem, of
course (in this case up to 0x300(?) live registers).

One possible solution is to simply avoid this kind of "partial" store motion,
that is, converting

  for (;;)
    {
      reg = MEM;
      MEM = fn(reg);
    }

to

  reg = MEM;
  for (;;)
    {
      reg = fn(reg);
      MEM = reg;
    }

Of course, this is also a profitable transform.  Thus the solution might
instead be to limit register pressure in some way by somehow assigning costs
to individual transforms.  At least it seems to be too difficult for
the register allocator to re-materialize 'reg' from MEM (as it would also
need to perform sophisticated analysis to determine that, basically
undoing the PRE transform).
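
To make the two loop shapes above concrete, here is a small C sketch.  It is an
illustration under assumptions, not code from JtR or GCC: step() is a
hypothetical stand-in for fn(), and a trip count n replaces the endless
for (;;).  Both functions compute the same result; the second keeps reg live
across the whole loop, which is what raises register pressure when many such
values are hoisted at once.

```c
#include <stddef.h>

/* Hypothetical stand-in for fn() in the pseudocode. */
static unsigned step(unsigned x)
{
    return x * 3u + 1u;
}

/* Original shape: load and store both happen inside the loop, so the
   value only needs a register for the duration of one iteration. */
unsigned loop_in_memory(unsigned *mem, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned reg = *mem;
        *mem = step(reg);
    }
    return *mem;
}

/* Transformed shape: the load is hoisted out of the loop while the
   store stays inside, so reg is live across every iteration. */
unsigned loop_hoisted_load(unsigned *mem, size_t n)
{
    unsigned reg = *mem;
    for (size_t i = 0; i < n; i++) {
        reg = step(reg);
        *mem = reg;
    }
    return *mem;
}
```

With a single value the hoisted form is a clear win.  The problem described in
this comment only appears when hundreds of such values are hoisted in the same
loop and no longer fit into the register file.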


* [Bug tree-optimization/51017] [4.8/4.9/5 Regression] GCC performance regression (vs. 4.4/4.5), PRE increases register pressure too much
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (20 preceding siblings ...)
  2015-02-18 11:09 ` rguenth at gcc dot gnu.org
@ 2015-02-25 14:26 ` law at redhat dot com
  2015-06-23  8:14 ` [Bug tree-optimization/51017] [4.8/4.9/5/6 " rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: law at redhat dot com @ 2015-02-25 14:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at redhat dot com



* [Bug tree-optimization/51017] [4.8/4.9/5/6 Regression] GCC performance regression (vs. 4.4/4.5), PRE increases register pressure too much
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (21 preceding siblings ...)
  2015-02-25 14:26 ` law at redhat dot com
@ 2015-06-23  8:14 ` rguenth at gcc dot gnu.org
  2015-06-26 20:04 ` [Bug tree-optimization/51017] [4.9/5/6 " jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-06-23  8:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.5                       |4.9.3

--- Comment #22 from Richard Biener <rguenth at gcc dot gnu.org> ---
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.



* [Bug tree-optimization/51017] [4.9/5/6 Regression] GCC performance regression (vs. 4.4/4.5), PRE increases register pressure too much
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (22 preceding siblings ...)
  2015-06-23  8:14 ` [Bug tree-optimization/51017] [4.8/4.9/5/6 " rguenth at gcc dot gnu.org
@ 2015-06-26 20:04 ` jakub at gcc dot gnu.org
  2015-06-26 20:33 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-06-26 20:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #23 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.9.3 has been released.



* [Bug tree-optimization/51017] [4.9/5/6 Regression] GCC performance regression (vs. 4.4/4.5), PRE increases register pressure too much
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (23 preceding siblings ...)
  2015-06-26 20:04 ` [Bug tree-optimization/51017] [4.9/5/6 " jakub at gcc dot gnu.org
@ 2015-06-26 20:33 ` jakub at gcc dot gnu.org
  2021-05-14  9:46 ` [Bug tree-optimization/51017] [9/10/11/12 Regression] GCC performance regression (vs. 4.4/4.5), PRE/LIM increase " jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-06-26 20:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.9.3                       |4.9.4



* [Bug tree-optimization/51017] [9/10/11/12 Regression] GCC performance regression (vs. 4.4/4.5), PRE/LIM increase register pressure too much
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (24 preceding siblings ...)
  2015-06-26 20:33 ` jakub at gcc dot gnu.org
@ 2021-05-14  9:46 ` jakub at gcc dot gnu.org
  2021-06-01  8:05 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-05-14  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|8.5                         |9.4

--- Comment #30 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 8 branch is being closed.


* [Bug tree-optimization/51017] [9/10/11/12 Regression] GCC performance regression (vs. 4.4/4.5), PRE/LIM increase register pressure too much
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (25 preceding siblings ...)
  2021-05-14  9:46 ` [Bug tree-optimization/51017] [9/10/11/12 Regression] GCC performance regression (vs. 4.4/4.5), PRE/LIM increase " jakub at gcc dot gnu.org
@ 2021-06-01  8:05 ` rguenth at gcc dot gnu.org
  2022-05-27  9:34 ` [Bug tree-optimization/51017] [10/11/12/13 " rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-01  8:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.4                         |9.5

--- Comment #31 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9.4 is being released, retargeting bugs to GCC 9.5.


* [Bug tree-optimization/51017] [10/11/12/13 Regression] GCC performance regression (vs. 4.4/4.5), PRE/LIM increase register pressure too much
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (26 preceding siblings ...)
  2021-06-01  8:05 ` rguenth at gcc dot gnu.org
@ 2022-05-27  9:34 ` rguenth at gcc dot gnu.org
  2022-06-28 10:30 ` jakub at gcc dot gnu.org
  2023-07-07 10:29 ` [Bug tree-optimization/51017] [11/12/13/14 " rguenth at gcc dot gnu.org
  29 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-05-27  9:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.5                         |10.4

--- Comment #32 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9 branch is being closed


* [Bug tree-optimization/51017] [10/11/12/13 Regression] GCC performance regression (vs. 4.4/4.5), PRE/LIM increase register pressure too much
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (27 preceding siblings ...)
  2022-05-27  9:34 ` [Bug tree-optimization/51017] [10/11/12/13 " rguenth at gcc dot gnu.org
@ 2022-06-28 10:30 ` jakub at gcc dot gnu.org
  2023-07-07 10:29 ` [Bug tree-optimization/51017] [11/12/13/14 " rguenth at gcc dot gnu.org
  29 siblings, 0 replies; 31+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-06-28 10:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.4                        |10.5

--- Comment #33 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 10.4 is being released, retargeting bugs to GCC 10.5.


* [Bug tree-optimization/51017] [11/12/13/14 Regression] GCC performance regression (vs. 4.4/4.5), PRE/LIM increase register pressure too much
  2011-11-08  0:43 [Bug middle-end/51017] New: GCC 4.6 performance regression (vs. 4.4/4.5) solar-gcc at openwall dot com
                   ` (28 preceding siblings ...)
  2022-06-28 10:30 ` jakub at gcc dot gnu.org
@ 2023-07-07 10:29 ` rguenth at gcc dot gnu.org
  29 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-07 10:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.5                        |11.5

--- Comment #34 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 10 branch is being closed.


