public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/44281] [4.3/4.4/4.5/4.6 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
@ 2011-03-04  7:23 ` adam at consulting dot net.nz
  2011-03-04  7:46 ` jakub at gcc dot gnu.org
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: adam at consulting dot net.nz @ 2011-03-04  7:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

--- Comment #6 from Adam Warner <adam at consulting dot net.nz> 2011-03-04 07:22:47 UTC ---
Below is a very simple test case in which an ordinary input argument to a
function is:

(a) copied to a spare register; and
(b) copied back from that spare register

even though the input argument is:

(a) never modified; and
(b) an ordinary register (not a global register variable)

unmodified_ordinary_register_is_copied.c:


#include <stdint.h>

/* Six caller-saved registers as input arguments */
#define CALLER_SAVED uint64_t REG0, uint64_t REG1, uint64_t REG2, \
                     uint64_t REG3, uint64_t REG4, uint64_t REG5
typedef void (*fn_t)(CALLER_SAVED);

/* Six callee-saved registers as global register variables */
register uint64_t REG6 __asm__("rbx");
register fn_t    *REG7 __asm__("rbp");
register uint64_t REG8 __asm__("r12");
register uint64_t REG9 __asm__("r13");
register uint64_t REG10 __asm__("r14");
register uint64_t REG11 __asm__("r15");

/* Free general purpose registers are RSP, RAX, R10 and R11 */

void optimal_code_generation(CALLER_SAVED) {
  fn_t next=REG7[1];
  next(REG0, REG1, REG2, REG3, REG4, REG5);
}

void unmodified_input_arg_is_copied(CALLER_SAVED) {
  fn_t next=REG7[1];
  ++REG7;
  next(REG0, REG1, REG2, REG3, REG4, REG5);
}

int main() {
  return 0;
}


gcc-4.5 generates optimal code for both functions:
$ gcc-4.5 -O3 unmodified_ordinary_register_is_copied.c && objdump -d -m i386:x86-64 a.out|less
...
00000000004004a0 <optimal_code_generation>:
  4004a0:       48 8b 45 08             mov    0x8(%rbp),%rax
  4004a4:       ff e0                   jmpq   *%rax
...
00000000004004b0 <unmodified_input_arg_is_copied>:
  4004b0:       48 8b 45 08             mov    0x8(%rbp),%rax
  4004b4:       48 83 c5 08             add    $0x8,%rbp
  4004b8:       ff e0                   jmpq   *%rax
...

Compare with GCC 4.6:
$ gcc-4.6 --version
gcc-4.6 (Debian 4.6-20110227-1) 4.6.0 20110227 (experimental) [trunk revision 170543]
...

$ gcc-4.6 -O3 unmodified_ordinary_register_is_copied.c && objdump -d -m i386:x86-64 a.out|less
...
00000000004004a0 <optimal_code_generation>:
  4004a0:       48 8b 45 08             mov    0x8(%rbp),%rax
  4004a4:       ff e0                   jmpq   *%rax
...
00000000004004b0 <unmodified_input_arg_is_copied>:
  4004b0:       49 89 fa                mov    %rdi,%r10
  4004b3:       48 8b 45 08             mov    0x8(%rbp),%rax
  4004b7:       48 8d 6d 08             lea    0x8(%rbp),%rbp
  4004bb:       4c 89 d7                mov    %r10,%rdi
  4004be:       ff e0                   jmpq   *%rax
...

According to the Linux x86-64 ABI %rdi is the first argument passed to the
functions. For some reason this is being copied to %r10 before being copied
back from %r10 to %rdi. At no stage is %rdi modified.

(Minor aside:
lea 0x8(%rbp),%rbp has also replaced add $0x8,%rbp. My Intel Core 2 hardware
can execute a maximum of one LEA instruction per clock cycle compared to three
ADD instructions per clock cycle. If I add -march=core2 -mtune=core2 the code
generation becomes:
00000000004004b0 <unmodified_input_arg_is_copied>:
  4004b0:       48 8b 45 08             mov    0x8(%rbp),%rax
  4004b4:       48 8d 6d 08             lea    0x8(%rbp),%rbp
  4004b8:       49 89 fa                mov    %rdi,%r10
  4004bb:       4c 89 d7                mov    %r10,%rdi
  4004be:       ff e0                   jmpq   *%rax
)

This bizarre register copying goes away if I comment out one of the six global
register variables (i.e. five callee-saved global register variables instead of
six). For some reason GCC 4.6 cannot generate sensible code with %rsp, %rax,
%r10 and %r11 available---but can generate sensible code when an additional
register (%rbx, %r12, %r13, %r14 or %r15) is available.



* [Bug rtl-optimization/44281] [4.3/4.4/4.5/4.6 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
  2011-03-04  7:23 ` [Bug rtl-optimization/44281] [4.3/4.4/4.5/4.6 Regression] Global Register variable pessimisation adam at consulting dot net.nz
@ 2011-03-04  7:46 ` jakub at gcc dot gnu.org
  2011-03-04 10:51 ` adam at consulting dot net.nz
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-03-04  7:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-03-04 07:46:11 UTC ---
Using 6 global register variables is clearly self-inflicted pain, even on
x86_64, because if you take 6 registers away and another 6 registers are used
for parameter passing, you make the target very limited on number of registers
and the compiler has much more limited choices in generating close to optimal
code.
Just don't do this.



* [Bug rtl-optimization/44281] [4.3/4.4/4.5/4.6 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
  2011-03-04  7:23 ` [Bug rtl-optimization/44281] [4.3/4.4/4.5/4.6 Regression] Global Register variable pessimisation adam at consulting dot net.nz
  2011-03-04  7:46 ` jakub at gcc dot gnu.org
@ 2011-03-04 10:51 ` adam at consulting dot net.nz
  2011-03-04 11:23 ` jakub at gcc dot gnu.org
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: adam at consulting dot net.nz @ 2011-03-04 10:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

--- Comment #8 from Adam Warner <adam at consulting dot net.nz> 2011-03-04 10:51:01 UTC ---
Jakub, I fail to see how your conclusion not to do this is supported by the
facts. There are:

(a) six global register variables (though the same effect can be observed with
one global register variable and -ffixed-rbx -ffixed-r12 -ffixed-r13
-ffixed-r14 -ffixed-r15)
(b) six function arguments
(c) one stack pointer

Therefore three registers remain free: %rax, %r10 and %r11. Only one free
register is required to generate the optimal code. GCC 4.5 can do this. GCC 4.6
can't.
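The register accounting above can be checked mechanically. The following is only a sketch: the tables and the helper name free_registers are made up for illustration, and the argument registers are the six integer argument registers of the System V x86-64 calling convention.

```c
#include <assert.h>
#include <string.h>

/* The 16 x86-64 general purpose registers. */
static const char *const GPRS[] = {
    "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp",
    "r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15"
};

/* Registers claimed in the test case. */
static const char *const GLOBAL_REG_VARS[] = {
    "rbx", "rbp", "r12", "r13", "r14", "r15"
};
static const char *const ARG_REGS[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};
static const char *const STACK_POINTER[] = { "rsp" };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

static int contains(const char *const *set, size_t n, const char *reg) {
    for (size_t i = 0; i < n; ++i)
        if (strcmp(set[i], reg) == 0)
            return 1;
    return 0;
}

/* Stores the names of unclaimed registers in out (at most cap entries)
   and returns how many were stored. */
size_t free_registers(const char *out[], size_t cap) {
    assert(out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < COUNT(GPRS); ++i) {
        const char *r = GPRS[i];
        if (contains(GLOBAL_REG_VARS, COUNT(GLOBAL_REG_VARS), r)) continue;
        if (contains(ARG_REGS, COUNT(ARG_REGS), r)) continue;
        if (contains(STACK_POINTER, COUNT(STACK_POINTER), r)) continue;
        if (n < cap)
            out[n++] = r;
    }
    return n;
}
```

Calling free_registers with a 16-entry output array leaves exactly the three registers named above: rax, r10 and r11.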

The fact GCC outputs the assembly sequence "mov %rdi,%r10; mov %r10,%rdi" is
evidence of a bizarre cascade of bugs. Even rudimentary peephole optimisation
could elide that assembly sequence.
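To make the point concrete, here is a toy sketch of such a cleanup over a tiny instruction list. It is not how GCC implements any of its passes. The struct insn model, the register enum and elide_save_restore are hypothetical names invented for this illustration, each instruction is simplified to one register read and one register written, and the sketch assumes the spare register is dead once the sequence ends.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum reg { NONE = -1, RAX, RDI, RBP, R10 };
enum opcode { MOV_REG, LOAD, LEA, JMP };

struct insn {
    enum opcode op;
    enum reg src;  /* register read (base register for LOAD/LEA) */
    enum reg dst;  /* register written, NONE for JMP */
};

static void remove_at(struct insn *code, size_t *n, size_t idx) {
    assert(idx < *n);
    memmove(&code[idx], &code[idx + 1], (*n - idx - 1) * sizeof code[0]);
    --*n;
}

/* For a save "mov a,t" at code[i], return the index of the matching
   restore "mov t,a", or n if the pair cannot be removed safely. */
static size_t find_restore(const struct insn *code, size_t n, size_t i) {
    enum reg a = code[i].src, t = code[i].dst;
    for (size_t j = i + 1; j < n; ++j) {
        if (code[j].op == MOV_REG && code[j].src == t && code[j].dst == a) {
            /* The spare register must not be read after the restore. */
            for (size_t k = j + 1; k < n; ++k)
                if (code[k].src == t)
                    return n;
            return j;
        }
        /* Any write to a or t, or any other read of t, blocks the removal. */
        if (code[j].dst == a || code[j].dst == t || code[j].src == t)
            return n;
    }
    return n;
}

/* Removes redundant save/restore pairs in place; returns the new length. */
size_t elide_save_restore(struct insn *code, size_t n) {
    size_t i = 0;
    while (i < n) {
        if (code[i].op == MOV_REG) {
            size_t j = find_restore(code, n, i);
            if (j < n) {
                remove_at(code, &n, j);  /* restore first (higher index) */
                remove_at(code, &n, i);  /* then the save */
                continue;                /* re-examine the new code[i] */
            }
        }
        ++i;
    }
    return n;
}
```

Fed the five-instruction GCC 4.6 sequence for unmodified_input_arg_is_copied, this sketch leaves the load, the lea and the jump, which is the shape of the three-instruction body GCC 4.5 emits.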

Are you able to explain why GCC emits save/restore code for a register that is
never modified? %rdi remains unmodified. This has nothing to do with a
"compiler has much more limited choices in generating close to optimal
code". The compiler has the choice to use %rax, %r10 or %r11 to store the
address to jump to without spilling. There is no register pressure in this
example. One register is required. Three are available.



* [Bug rtl-optimization/44281] [4.3/4.4/4.5/4.6 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2011-03-04 10:51 ` adam at consulting dot net.nz
@ 2011-03-04 11:23 ` jakub at gcc dot gnu.org
  2011-03-05  2:01 ` adam at consulting dot net.nz
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-03-04 11:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-03-04 11:22:51 UTC ---
You are talking about this single testcase; I'm talking in general.  On x86_64,
gcc is tuned for a medium sized general purpose register file, and if you
suddenly turn that into a very limited general purpose register file, you
can get non-optimal code.  Such bug reports are definitely much lower priority
than what you get with the common case where no global register vars are used,
or at most one or two.  The "weird" saving/restoring of %rdi into/from %r10 is
because the RA chose to use %rdi for a temporary used in incrementing of REG7
and loading the next pointer from it, while postreload managed to remove all
needs for such a temporary register, it is too late for the save/restore code
not to be emitted.



* [Bug rtl-optimization/44281] [4.3/4.4/4.5/4.6 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2011-03-04 11:23 ` jakub at gcc dot gnu.org
@ 2011-03-05  2:01 ` adam at consulting dot net.nz
  2011-06-27 14:14 ` [Bug rtl-optimization/44281] [4.3/4.4/4.5/4.6/4.7 " rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: adam at consulting dot net.nz @ 2011-03-05  2:01 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

--- Comment #10 from Adam Warner <adam at consulting dot net.nz> 2011-03-05 02:01:04 UTC ---
Jakub,

Thanks for the explanation [The "weird" saving/restoring of %rdi into/from %r10
is because the RA chose to use %rdi for a temporary used in incrementing of
REG7 and loading the next pointer from it, while postreload managed to remove
all needs for such a temporary register, it is too late for the save/restore
code not to be emitted.]

I've replaced the memory lookup and REG7 increment with equivalent inline
assembly to help clarify this explanation. With one remaining source code
variable (next of type fn_t) and everything else opaque assembly the code
generation is worse.


#include <stdint.h>

/* Six caller-saved registers as input arguments */
#define CALLER_SAVED uint64_t REG0, uint64_t REG1, uint64_t REG2, \
                     uint64_t REG3, uint64_t REG4, uint64_t REG5
typedef void (*fn_t)(CALLER_SAVED);

/* Six callee-saved registers as global register variables */
register uint64_t REG6 __asm__("rbx");
register fn_t    *REG7 __asm__("rbp");
register uint64_t REG8 __asm__("r12");
register uint64_t REG9 __asm__("r13");
register uint64_t REG10 __asm__("r14");
register uint64_t REG11 __asm__("r15");

/* Free general purpose registers are RSP, RAX, R10 and R11 */

void optimal_code_generation(CALLER_SAVED) {
  fn_t next=REG7[1];
  next(REG0, REG1, REG2, REG3, REG4, REG5);
}

void unmodified_input_arg_is_copied(CALLER_SAVED) {
  fn_t next=REG7[1];
  ++REG7;
  next(REG0, REG1, REG2, REG3, REG4, REG5);
}

void unmodified_input_arg_is_copied_alt(CALLER_SAVED) {
  fn_t next=REG7[1];
  __asm__("add $8, %0" : "+r" (REG7));
  next(REG0, REG1, REG2, REG3, REG4, REG5);
}

void unmodified_input_arg_is_copied_alt2(CALLER_SAVED) {
  fn_t next;
  __asm__("mov 0x8(%[from]), %[to]" : [to] "=a" (next) : [from] "r" (REG7));
  __asm__("add $8, %0" : "+r" (REG7));
  next(REG0, REG1, REG2, REG3, REG4, REG5);
}

int main() {
  return 0;
}


$ gcc-4.6 -O3 unmodified_ordinary_register_is_copied_with_pure_asm.c && objdump -d -m i386:x86-64 a.out|less

00000000004004a0 <optimal_code_generation>:
  4004a0:       48 8b 45 08             mov    0x8(%rbp),%rax
  4004a4:       ff e0                   jmpq   *%rax
  4004a6:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  4004ad:       00 00 00 

00000000004004b0 <unmodified_input_arg_is_copied>:
  4004b0:       49 89 fa                mov    %rdi,%r10
  4004b3:       48 8b 45 08             mov    0x8(%rbp),%rax
  4004b7:       48 8d 6d 08             lea    0x8(%rbp),%rbp
  4004bb:       4c 89 d7                mov    %r10,%rdi
  4004be:       ff e0                   jmpq   *%rax

00000000004004c0 <unmodified_input_arg_is_copied_alt>:
  4004c0:       49 89 fa                mov    %rdi,%r10
  4004c3:       48 8b 45 08             mov    0x8(%rbp),%rax
  4004c7:       4c 89 d7                mov    %r10,%rdi
  4004ca:       48 83 c5 08             add    $0x8,%rbp
  4004ce:       ff e0                   jmpq   *%rax

00000000004004d0 <unmodified_input_arg_is_copied_alt2>:
  4004d0:       49 89 fa                mov    %rdi,%r10
  4004d3:       48 89 f7                mov    %rsi,%rdi
  4004d6:       48 89 d6                mov    %rdx,%rsi
  4004d9:       48 8b 45 08             mov    0x8(%rbp),%rax
  4004dd:       48 89 f2                mov    %rsi,%rdx
  4004e0:       48 89 fe                mov    %rdi,%rsi
  4004e3:       4c 89 d7                mov    %r10,%rdi
  4004e6:       48 83 c5 08             add    $0x8,%rbp
  4004ea:       ff e0                   jmpq   *%rax

unmodified_input_arg_is_copied_alt2() specifies a variable next of type fn_t.
The first assembly statement __asm__("mov 0x8(%[from]), %[to]" : [to] "=a"
(next) : [from] "r" (REG7)); directly translates to mov 0x8(%rbp),%rax. Note
the use of the "=a" machine constraint to force use of the free %rax register.

The second assembly statement __asm__("add $8, %0" : "+r" (REG7)); directly
translates to add $0x8,%rbp. This is in-place register mutation which does not
require a temporary for incrementing.

While I suspected I might be able to work around the spurious saving/restoring
of unmodified registers with inline assembly, the results are far worse. mov
%rdi,%r10; mov %rsi,%rdi; mov %rdx,%rsi is maximally serialized. One cannot
move %rdx into %rsi until %rsi is moved into %rdi. But one cannot move %rsi
into %rdi until %rdi is moved into %r10. Restoring the unmodified registers is
also maximally serialized.
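That the six register moves in unmodified_input_arg_is_copied_alt2 achieve nothing for the argument registers can be checked by simulating them. This is a sketch only: the register file array, the struct mov type and run_moves are illustrative names, and the load into %rax and the add to %rbp are left out because they touch none of these registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum reg { RDI, RSI, RDX, R10, NREGS };

struct mov { enum reg src, dst; };

/* Register-to-register moves from the GCC 4.6 output, in program order. */
static const struct mov ALT2_MOVES[] = {
    {RDI, R10}, {RSI, RDI}, {RDX, RSI},  /* the "save" chain */
    {RSI, RDX}, {RDI, RSI}, {R10, RDI},  /* the "restore" chain */
};
#define ALT2_MOVE_COUNT (sizeof ALT2_MOVES / sizeof ALT2_MOVES[0])

/* Applies each move to the simulated register file in order. */
void run_moves(uint64_t regs[NREGS], const struct mov *m, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        assert(m[i].src < NREGS && m[i].dst < NREGS);
        regs[m[i].dst] = regs[m[i].src];
    }
}
```

Starting from any values in %rdi, %rsi and %rdx, running ALT2_MOVES through run_moves leaves all three unchanged. The only net effect is that %r10 ends up holding a copy of %rdi.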



* [Bug rtl-optimization/44281] [4.3/4.4/4.5/4.6/4.7 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2011-03-05  2:01 ` adam at consulting dot net.nz
@ 2011-06-27 14:14 ` rguenth at gcc dot gnu.org
  2012-03-13 14:38 ` [Bug rtl-optimization/44281] [4.5/4.6/4.7/4.8 " jakub at gcc dot gnu.org
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-06-27 14:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.3.6                       |4.4.7

--- Comment #11 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-06-27 12:13:58 UTC ---
4.3 branch is being closed, moving to 4.4.7 target.



* [Bug rtl-optimization/44281] [4.5/4.6/4.7/4.8 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2011-06-27 14:14 ` [Bug rtl-optimization/44281] [4.3/4.4/4.5/4.6/4.7 " rguenth at gcc dot gnu.org
@ 2012-03-13 14:38 ` jakub at gcc dot gnu.org
  2012-07-02 11:09 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-03-13 14:38 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.7                       |4.5.4

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-13 12:47:05 UTC ---
4.4 branch is being closed, moving to 4.5.4 target.



* [Bug rtl-optimization/44281] [4.5/4.6/4.7/4.8 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2012-03-13 14:38 ` [Bug rtl-optimization/44281] [4.5/4.6/4.7/4.8 " jakub at gcc dot gnu.org
@ 2012-07-02 11:09 ` rguenth at gcc dot gnu.org
  2013-02-23 14:56 ` [Bug rtl-optimization/44281] [4.6/4.7/4.8 " steven at gcc dot gnu.org
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-07-02 11:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.5.4                       |4.6.4

--- Comment #13 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-02 11:08:29 UTC ---
The 4.5 branch is being closed, adjusting target milestone.



* [Bug rtl-optimization/44281] [4.6/4.7/4.8 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2012-07-02 11:09 ` rguenth at gcc dot gnu.org
@ 2013-02-23 14:56 ` steven at gcc dot gnu.org
  2013-04-12 15:17 ` [Bug rtl-optimization/44281] [4.7/4.8/4.9 " jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: steven at gcc dot gnu.org @ 2013-02-23 14:56 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra

--- Comment #14 from Steven Bosscher <steven at gcc dot gnu.org> 2013-02-23 14:55:53 UTC ---
(In reply to comment #10)

GCC 4.8 r196182, all at -O3 and verified that -Os is similar:

0000000000400470 <optimal_code_generation>:
  400470:       48 8b 45 08             mov    0x8(%rbp),%rax
  400474:       ff e0                   jmpq   *%rax
...

0000000000400480 <unmodified_input_arg_is_copied>:
  400480:       4c 8b 55 08             mov    0x8(%rbp),%r10
  400484:       48 8d 6d 08             lea    0x8(%rbp),%rbp
  400488:       41 ff e2                jmpq   *%r10
...

0000000000400490 <unmodified_input_arg_is_copied_alt>:
  400490:       48 8b 45 08             mov    0x8(%rbp),%rax
  400494:       48 83 c5 08             add    $0x8,%rbp
  400498:       ff e0                   jmpq   *%rax
...

00000000004004a0 <unmodified_input_arg_is_copied_alt2>:
  4004a0:       48 8b 45 08             mov    0x8(%rbp),%rax
  4004a4:       48 83 c5 08             add    $0x8,%rbp
  4004a8:       ff e0                   jmpq   *%rax
...


The reported slowdown in comment #5 is also gone.

For the original complaint, comment #0:

0000000000000000 <push_flag_into_global_reg_var>:
   0:   31 c0                   xor    %eax,%eax
   2:   48 89 da                mov    %rbx,%rdx
   5:   48 39 f7                cmp    %rsi,%rdi
   8:   0f 94 c0                sete   %al
   b:   48 c1 e2 08             shl    $0x8,%rdx
   f:   48 09 d0                or     %rdx,%rax
  12:   48 89 c3                mov    %rax,%rbx
  15:   c3                      retq   
...

0000000000000020 <push_flag_into_local_var>:
  20:   31 c0                   xor    %eax,%eax
  22:   48 39 f7                cmp    %rsi,%rdi
  25:   0f 94 c0                sete   %al
  28:   48 c1 e2 08             shl    $0x8,%rdx
  2c:   48 09 d0                or     %rdx,%rax
  2f:   c3                      retq   

So the code for push_flag_into_local_var is the same as gcc-3.3 and the
code for push_flag_into_global_reg_var is the same as gcc-4.4.
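The source from comment #0 is not quoted in this part of the thread. Reading the two disassemblies, both functions appear to shift a flag stack left by 8 bits and OR in the result of comparing the first two arguments. The following portable sketch is a reconstruction under that assumption, not the original test case: the _sketch function names are invented, and an ordinary static variable stands in for the global register variable in %rbx so that it compiles on any target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the global register variable held in %rbx. */
static uint64_t global_flag_stack;

/* Pushes (a == b) onto the global flag stack. */
void push_flag_into_global_sketch(uint64_t a, uint64_t b) {
    global_flag_stack = (global_flag_stack << 8) | (uint64_t)(a == b);
}

/* Same operation on a flag stack passed in and returned by value. */
uint64_t push_flag_into_local_sketch(uint64_t a, uint64_t b,
                                     uint64_t flag_stack) {
    return (flag_stack << 8) | (uint64_t)(a == b);
}

/* Both variants should agree when started from the same stack value. */
void check_variants_agree(uint64_t a, uint64_t b, uint64_t initial) {
    global_flag_stack = initial;
    push_flag_into_global_sketch(a, b);
    assert(global_flag_stack == push_flag_into_local_sketch(a, b, initial));
}
```

Seen this way, the global variant needs only a shift of %rbx in place followed by an OR into it. The extra "mov %rbx,%rdx" and "mov %rax,%rbx" in the disassembly are the overhead this bug is about.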



* [Bug rtl-optimization/44281] [4.7/4.8/4.9 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2013-02-23 14:56 ` [Bug rtl-optimization/44281] [4.6/4.7/4.8 " steven at gcc dot gnu.org
@ 2013-04-12 15:17 ` jakub at gcc dot gnu.org
  2014-06-12 13:46 ` [Bug rtl-optimization/44281] [4.7/4.8/4.9/4.10 " rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-04-12 15:17 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.6.4                       |4.7.4

--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-04-12 15:16:43 UTC ---
GCC 4.6.4 has been released and the branch has been closed.



* [Bug rtl-optimization/44281] [4.7/4.8/4.9/4.10 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2013-04-12 15:17 ` [Bug rtl-optimization/44281] [4.7/4.8/4.9 " jakub at gcc dot gnu.org
@ 2014-06-12 13:46 ` rguenth at gcc dot gnu.org
  2014-12-19 13:44 ` [Bug rtl-optimization/44281] [4.8/4.9/5 " jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-06-12 13:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.7.4                       |4.8.4

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
The 4.7 branch is being closed, moving target milestone to 4.8.4.



* [Bug rtl-optimization/44281] [4.8/4.9/5 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2014-06-12 13:46 ` [Bug rtl-optimization/44281] [4.7/4.8/4.9/4.10 " rguenth at gcc dot gnu.org
@ 2014-12-19 13:44 ` jakub at gcc dot gnu.org
  2015-02-16 21:19 ` law at redhat dot com
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: jakub at gcc dot gnu.org @ 2014-12-19 13:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.4                       |4.8.5

--- Comment #17 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.8.4 has been released.



* [Bug rtl-optimization/44281] [4.8/4.9/5 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2014-12-19 13:44 ` [Bug rtl-optimization/44281] [4.8/4.9/5 " jakub at gcc dot gnu.org
@ 2015-02-16 21:19 ` law at redhat dot com
  2015-02-16 21:36 ` law at redhat dot com
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: law at redhat dot com @ 2015-02-16 21:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

--- Comment #18 from Jeffrey A. Law <law at redhat dot com> ---
In reference to c#10 and c#14, we get from the trunk:

   0:   ff 65 08                jmpq   *0x8(%rbp)
   3:   66 66 66 66 2e 0f 1f    data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
   a:   84 00 00 00 00 00 

0000000000000010 <unmodified_input_arg_is_copied>:
  10:   4c 8b 55 08             mov    0x8(%rbp),%r10
  14:   48 8d 6d 08             lea    0x8(%rbp),%rbp
  18:   41 ff e2                jmpq   *%r10
  1b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

0000000000000020 <unmodified_input_arg_is_copied_alt>:
  20:   48 8b 45 08             mov    0x8(%rbp),%rax
  24:   48 83 c5 08             add    $0x8,%rbp
  28:   ff e0                   jmpq   *%rax
  2a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

0000000000000030 <unmodified_input_arg_is_copied_alt2>:
  30:   48 8b 45 08             mov    0x8(%rbp),%rax
  34:   48 83 c5 08             add    $0x8,%rbp
  38:   ff e0                   jmpq   *%rax

That is better than the code referenced in c#14 and c#10 in each case.  This
was probably Kai's code to improve our indirect jump support in the backend.

We also still get the desired code for push_flag_into_local_var.   The only
issue left is the poor code for push_flag_into_global_reg_var.



* [Bug rtl-optimization/44281] [4.8/4.9/5 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2015-02-16 21:19 ` law at redhat dot com
@ 2015-02-16 21:36 ` law at redhat dot com
  2015-06-23  8:26 ` [Bug rtl-optimization/44281] [4.8/4.9/5/6 " rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: law at redhat dot com @ 2015-02-16 21:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

--- Comment #19 from Jeffrey A. Law <law at redhat dot com> ---
AFAICT the issue with push_flag_into_global_reg_var is poor register
allocation, perhaps made worse by the x86 backend's constraints on the ashldi3_1
insn.


  Loop 0 (parent -1, header bb2, depth 0)
    bbs: 2
    all: 0r96 1r94 2r93 3r92
    modified regnos: 92 93 94 96
    border:
    Pressure: GENERAL_REGS=2
    Hard reg set forest:
      0:( 0-2 4-6 8-15 21-52)@0
        1:( 0-2 4-6 21-52)@0
          2:( 0-2 4-6 37-44)@28000
            3:( 0-2 5 6 37-44)@20000
      Allocno a0r96 of ALL_REGS(46) has 38 avail. regs  0-2 4-6 21-52, node:  0-2 4-6 21-52 (confl regs =  3 7 16-20 53-79)
      Allocno a1r94 of GENERAL_REGS(14) has 14 avail. regs  0-2 4-6 37-44, node:  0-2 4-6 37-44 (confl regs =  3 7-36 45-79)
      Allocno a2r93 of GENERAL_REGS(14) has 14 avail. regs  0-2 4-6 37-44, node:  0-2 4-6 37-44 (confl regs =  3 7-36 45-79)
      Allocno a3r92 of GENERAL_REGS(14) has 13 avail. regs  0-2 5 6 37-44, node:  0-2 5 6 37-44 (confl regs =  3 4 7-36 45-79)
      Pushing a3(r92,l0)(cost 0)
      Pushing a2(r93,l0)(cost 0)
      Pushing a1(r94,l0)(cost 0)
      Pushing a0(r96,l0)(cost 0)
      Popping a0(r96,l0)  -- assign reg 0
      Popping a1(r94,l0)  -- assign reg 1
      Popping a2(r93,l0)  -- assign reg 4
      Popping a3(r92,l0)  -- assign reg 5
Disposition:
    3:r92  l0     5    2:r93  l0     4    1:r94  l0     1    0:r96  l0     0

In particular note a0(r96) going into reg 0 (%rax).  At that point, we've lost.
We'd really like to see it go into %rbx (reg 3), which is the global register
variable.  The key insns are:

(insn 11 10 12 2 (parallel [
            (set (reg:DI 96 [ D.1874 ])
                (ashift:DI (reg/v:DI 3 bx [ global_flag_stack ])
                    (const_int 8 [0x8])))
            (clobber (reg:CC 17 flags))
        ]) j.c:61 511 {*ashldi3_1}
     (expr_list:REG_DEAD (reg/v:DI 3 bx [ global_flag_stack ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 12 11 0 2 (parallel [
            (set (reg/v:DI 3 bx [ global_flag_stack ])
                (ior:DI (reg:DI 94 [ D.1873 ])
                    (reg:DI 96 [ D.1874 ])))
            (clobber (reg:CC 17 flags))
        ]) j.c:61 400 {*iordi_1}
     (expr_list:REG_DEAD (reg:DI 96 [ D.1874 ])
        (expr_list:REG_DEAD (reg:DI 94 [ D.1873 ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))



* [Bug rtl-optimization/44281] [4.8/4.9/5/6 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2015-02-16 21:36 ` law at redhat dot com
@ 2015-06-23  8:26 ` rguenth at gcc dot gnu.org
  2015-06-26 20:26 ` [Bug rtl-optimization/44281] [4.9/5/6 " jakub at gcc dot gnu.org
  2015-06-26 20:37 ` jakub at gcc dot gnu.org
  16 siblings, 0 replies; 17+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-06-23  8:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.5                       |4.9.3

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.



* [Bug rtl-optimization/44281] [4.9/5/6 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2015-06-23  8:26 ` [Bug rtl-optimization/44281] [4.8/4.9/5/6 " rguenth at gcc dot gnu.org
@ 2015-06-26 20:26 ` jakub at gcc dot gnu.org
  2015-06-26 20:37 ` jakub at gcc dot gnu.org
  16 siblings, 0 replies; 17+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-06-26 20:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

--- Comment #21 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.9.3 has been released.



* [Bug rtl-optimization/44281] [4.9/5/6 Regression] Global Register variable pessimisation
       [not found] <bug-44281-4@http.gcc.gnu.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2015-06-26 20:26 ` [Bug rtl-optimization/44281] [4.9/5/6 " jakub at gcc dot gnu.org
@ 2015-06-26 20:37 ` jakub at gcc dot gnu.org
  16 siblings, 0 replies; 17+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-06-26 20:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44281

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.9.3                       |4.9.4

