public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/24810]  New: mov + mov + testl generated instead of testb
@ 2005-11-11 19:28 dann at godzilla dot ics dot uci dot edu
  2005-11-11 19:29 ` [Bug rtl-optimization/24810] " dann at godzilla dot ics dot uci dot edu
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: dann at godzilla dot ics dot uci dot edu @ 2005-11-11 19:28 UTC (permalink / raw)
  To: gcc-bugs

Compiling i387.c from the Linux kernel using: 
 -nostdinc -isystem /usr/lib/gcc/i386-redhat-linux/4.0.1/include -D__KERNEL__
-Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing
-fno-common -ffreestanding -O2 -fomit-frame-pointer -g -save-temps -msoft-float
-m32 -fno-builtin-sprintf -fno-builtin-log2 -fno-builtin-puts
-mpreferred-stack-boundary=2 -fno-unit-at-a-time -march=i686 -mtune=pentium4
-mregparm=3 -Iinclude/asm-i386/mach-default -Wdeclaration-after-statement
-Wno-pointer-sign -DKBUILD_BASENAME=i387 -DKBUILD_MODNAME=i387
-c arch/i386/kernel/i387.c
(these are the flags generated by rpmbuild on a Fedora Core 4 system) 

Using 4.0 the restore_fpu function looks like:
restore_fpu:
        testb   $1, boot_cpu_data+15
        je      .L23
        [snip]

Using 4.1 it looks like:
restore_fpu:
        movl    %eax, %edx
        movl    boot_cpu_data+12, %eax
        testl   $16777216, %eax
        je      .L24
        [snip]

Similar code sequences appear in other functions in the same file: 
get_fpu_mxcsr, get_fpu_swd, get_fpu_cwd, set_fpregs.
The size of these functions increases by 5 bytes (i.e. 20%).

It seems that some of these functions might be on some critical path in the
kernel, so the size increase (and maybe speed penalty) could have an impact.
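
The two sequences test the same bit: 16777216 is 0x1000000 (1 << 24), and on
little-endian i386 bit 24 of the 32-bit word at boot_cpu_data+12 is bit 0 of
the byte at boot_cpu_data+15. A minimal C sketch of that equivalence follows.
The function names and the raw byte buffer standing in for boot_cpu_data are
hypothetical, and the word is assembled byte by byte so that the little-endian
layout is explicit rather than dependent on the host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: 'base' points at a raw 16-byte image of the
   start of the structure; it is not the kernel's boot_cpu_data. */

/* Word test, as in the 4.1 code: load 32 bits at offset 12, mask bit 24. */
static int test_word(const unsigned char *base)
{
    /* Assemble the word in little-endian order, as i386 would load it. */
    uint32_t w = (uint32_t)base[12]
               | ((uint32_t)base[13] << 8)
               | ((uint32_t)base[14] << 16)
               | ((uint32_t)base[15] << 24);
    return (w & UINT32_C(0x1000000)) != 0;   /* 16777216 == 1 << 24 */
}

/* Byte test, as in the 4.0 code: testb $1 at offset 15. */
static int test_byte(const unsigned char *base)
{
    return (base[15] & 1) != 0;
}

/* Self-check: both forms agree for every value of the byte at offset 15. */
static void check_equivalence(void)
{
    unsigned char img[16] = { 0 };
    for (int v = 0; v < 256; v++) {
        img[15] = (unsigned char)v;
        assert(test_word(img) == test_byte(img));
    }
}
```

Calling check_equivalence() from a small main is enough to exercise it. The
sketch only shows that the results match; it says nothing about whether the
compiler is allowed to narrow the access.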

For 4.0 the 00.expand dump looks like:

(insn 9 7 10 1 (set (reg/f:SI 59)
        (const:SI (plus:SI (symbol_ref:SI ("boot_cpu_data") [flags 0x40] <var_decl 0xb7ee2d80 boot_cpu_data>)
                (const_int 12 [0xc])))) -1 (nil)
    (nil))

(insn 10 9 11 1 (set (reg:SI 60)
        (mem/s/j:SI (reg/f:SI 59) [0 boot_cpu_data.x86_capability+0 S4 A32])) -1 (nil)
    (nil))

(insn 11 10 12 1 (parallel [
            (set (reg:SI 61)
                (and:SI (reg:SI 60)
                    (const_int 16777216 [0x1000000])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

(insn 12 11 13 1 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 61)
            (const_int 0 [0x0]))) -1 (nil)
    (nil))


For 4.1 the dump is identical except for insn 10, which has mem/s/v/j:SI
instead of mem/s/j:SI.

The combine pass of 4.0 deletes insn 10; that does not happen for 4.1.


For 4.1 the generated code does not change when using -Os or -march=pentium4.

This is one of the causes of PR23153.


-- 
           Summary: mov + mov + testl generated instead of testb
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24810



* [Bug rtl-optimization/24810] mov + mov + testl generated instead of testb
  2005-11-11 19:28 [Bug rtl-optimization/24810] New: mov + mov + testl generated instead of testb dann at godzilla dot ics dot uci dot edu
@ 2005-11-11 19:29 ` dann at godzilla dot ics dot uci dot edu
  2005-11-13  2:47 ` [Bug rtl-optimization/24810] [4.1 Regression] " dann at godzilla dot ics dot uci dot edu
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: dann at godzilla dot ics dot uci dot edu @ 2005-11-11 19:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from dann at godzilla dot ics dot uci dot edu  2005-11-11 19:29 -------
Created an attachment (id=10220)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10220&action=view)
Preprocessed code containing the functions that exhibit the problem



* [Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb
  2005-11-11 19:28 [Bug rtl-optimization/24810] New: mov + mov + testl generated instead of testb dann at godzilla dot ics dot uci dot edu
  2005-11-11 19:29 ` [Bug rtl-optimization/24810] " dann at godzilla dot ics dot uci dot edu
@ 2005-11-13  2:47 ` dann at godzilla dot ics dot uci dot edu
  2005-11-14 13:24 ` pinskia at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: dann at godzilla dot ics dot uci dot edu @ 2005-11-13  2:47 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from dann at godzilla dot ics dot uci dot edu  2005-11-13 02:47 -------
Simplified testcase: 
struct cpuinfo_x86 {
  unsigned char x86;
  unsigned char x86_vendor;
  unsigned char x86_model;
  unsigned char x86_mask;
  char wp_works_ok;
  char hlt_works_ok;
  char hard_math;
  char rfu;
  int cpuid_level;
  unsigned long x86_capability[7];
} __attribute__((__aligned__((1 << (7)))));

struct task_struct;
extern void foo (struct task_struct *tsk);
extern void bar (struct task_struct *tsk);

extern struct cpuinfo_x86 boot_cpu_data;

static inline __attribute__((always_inline)) int
constant_test_bit(int nr, const volatile unsigned long *addr)
{
 return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}

void
restore_fpu(struct task_struct *tsk)
{
  if (constant_test_bit(24, boot_cpu_data.x86_capability))
    foo (tsk);
  else
    bar (tsk);
}

The generated code for this simplified testcase shows one additional issue:

restore_fpu:
        movl    %eax, %edx
        movl    boot_cpu_data+12, %eax  ; edx could be used here
        testl   $16777216, %eax         ; and here
        je      .L2
        movl    %edx, %eax  ; then all the mov %eax, %edx and mov %edx, %eax
        jmp     foo         ; instructions could be eliminated.
        .p2align 4,,7
.L2:
        movl    %edx, %eax
        jmp     bar


-- 

dann at godzilla dot ics dot uci dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|mov + mov + testl generated |[4.1 Regression] mov + mov +
                   |instead of testb            |testl generated instead of
                   |                            |testb



* [Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb
  2005-11-11 19:28 [Bug rtl-optimization/24810] New: mov + mov + testl generated instead of testb dann at godzilla dot ics dot uci dot edu
  2005-11-11 19:29 ` [Bug rtl-optimization/24810] " dann at godzilla dot ics dot uci dot edu
  2005-11-13  2:47 ` [Bug rtl-optimization/24810] [4.1 Regression] " dann at godzilla dot ics dot uci dot edu
@ 2005-11-14 13:24 ` pinskia at gcc dot gnu dot org
  2005-11-14 22:17 ` janis at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-11-14 13:24 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 GCC target triplet|i686-pc-linux-gnu           |i?86-*-*, x86_64-*-*
   Target Milestone|---                         |4.1.0



* [Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb
  2005-11-11 19:28 [Bug rtl-optimization/24810] New: mov + mov + testl generated instead of testb dann at godzilla dot ics dot uci dot edu
                   ` (2 preceding siblings ...)
  2005-11-14 13:24 ` pinskia at gcc dot gnu dot org
@ 2005-11-14 22:17 ` janis at gcc dot gnu dot org
  2005-11-19  2:10 ` mmitchel at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: janis at gcc dot gnu dot org @ 2005-11-14 22:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from janis at gcc dot gnu dot org  2005-11-14 22:17 -------
A regression hunt using an i686-linux cross compiler identified the following
patch where the code generation changes:

http://gcc.gnu.org/viewcvs?view=rev&rev=99658

r99658 | hubicka | 2005-05-13 13:57:19 +0000 (Fri, 13 May 2005) | 15 lines


        * gcc.dg/builtins-43.c: Use gimple dump instead of generic.
        * gcc.dg/fold-xor-?.c: Likewise.
        * gcc.dg/pr15784-?.c: Likewise.
        * gcc.dg/pr20922-?.c: Likewise.
        * gcc.dg/tree-ssa/20050128-1.c: Likewise.
        * gcc.dg/tree-ssa/pr17598.c: Likewise.
        * gcc.dg/tree-ssa/pr20470.c: Likewise.

        * tree-inline.c (copy_body_r): Simplify substituted ADDR_EXPRs.
        * tree-optimize.c (pass_gimple): Kill.
        (init_tree_optimization_passes): Kill pass_gimple.
        * tree-cfg.c (build_tree_cfg): Do verify_stmts to check that we are
gimple.
        * tree-dump.c (dump_files): Rename .generic to .gimple.*



* [Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb
  2005-11-11 19:28 [Bug rtl-optimization/24810] New: mov + mov + testl generated instead of testb dann at godzilla dot ics dot uci dot edu
                   ` (3 preceding siblings ...)
  2005-11-14 22:17 ` janis at gcc dot gnu dot org
@ 2005-11-19  2:10 ` mmitchel at gcc dot gnu dot org
  2005-12-18 20:53 ` [Bug rtl-optimization/24810] [4.1/4.2 " hubicka at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-11-19  2:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from mmitchel at gcc dot gnu dot org  2005-11-19 02:10 -------
Should be fixed before 4.1, if possible.


-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2



* [Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb
  2005-11-11 19:28 [Bug rtl-optimization/24810] New: mov + mov + testl generated instead of testb dann at godzilla dot ics dot uci dot edu
                   ` (4 preceding siblings ...)
  2005-11-19  2:10 ` mmitchel at gcc dot gnu dot org
@ 2005-12-18 20:53 ` hubicka at gcc dot gnu dot org
  2005-12-18 22:57 ` dann at godzilla dot ics dot uci dot edu
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2005-12-18 20:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from hubicka at gcc dot gnu dot org  2005-12-18 20:53 -------
Simplified testcase seems to work for me on 4.1 branch:
restore_fpu:
        movl    4(%esp), %edx
        movl    boot_cpu_data+12, %eax
        testl   $16777216, %eax
        je      .L2
        jmp     foo
.L2:
        movl    %edx, 4(%esp)
        jmp     bar
"jmp foo" is not eliminated because we don't have a pattern for conditional
tailcalls.  It should not be a big issue to add the necessary patterns, however.

Honza



* [Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb
  2005-11-11 19:28 [Bug rtl-optimization/24810] New: mov + mov + testl generated instead of testb dann at godzilla dot ics dot uci dot edu
                   ` (5 preceding siblings ...)
  2005-12-18 20:53 ` [Bug rtl-optimization/24810] [4.1/4.2 " hubicka at gcc dot gnu dot org
@ 2005-12-18 22:57 ` dann at godzilla dot ics dot uci dot edu
  2005-12-19  0:37 ` kazu at gcc dot gnu dot org
  2005-12-29 11:53 ` jakub at gcc dot gnu dot org
  8 siblings, 0 replies; 10+ messages in thread
From: dann at godzilla dot ics dot uci dot edu @ 2005-12-18 22:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from dann at godzilla dot ics dot uci dot edu  2005-12-18 22:57 -------
(In reply to comment #5)
> Simplified testcase seems to work for me on 4.1 branch:
> restore_fpu:
>         movl    4(%esp), %edx
>         movl    boot_cpu_data+12, %eax
>         testl   $16777216, %eax

4.0 still does better: it uses a single "testb" instruction instead of two
dependent instructions (movl + testl).



* [Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb
  2005-11-11 19:28 [Bug rtl-optimization/24810] New: mov + mov + testl generated instead of testb dann at godzilla dot ics dot uci dot edu
                   ` (6 preceding siblings ...)
  2005-12-18 22:57 ` dann at godzilla dot ics dot uci dot edu
@ 2005-12-19  0:37 ` kazu at gcc dot gnu dot org
  2005-12-29 11:53 ` jakub at gcc dot gnu dot org
  8 siblings, 0 replies; 10+ messages in thread
From: kazu at gcc dot gnu dot org @ 2005-12-19  0:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from kazu at gcc dot gnu dot org  2005-12-19 00:37 -------
We are basically talking about narrowing the memory being loaded for testing.
Now, can we really optimize this case?  We've got

  const volatile unsigned long *addr

I am not sure if "volatile" allows us to change the width of a memory read.
I know a chip that expects you to read memory at one address repeatedly to
transfer a block of data, and people probably use volatile
for this kind of case.  If the compiler changes the width of memory access,
we may be screwing up something.

IMHO, if byte access is really desired, the code should be rewritten that way.
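
If byte access is what the caller wants, one way to say so in the source is to
index bytes directly. The sketch below is an assumption about what such a
rewrite could look like, not code from the kernel or from this report.
test_bit_bytewise is a hypothetical name, and its byte indexing matches the
word-based constant_test_bit only on a little-endian target such as i386.

```c
#include <assert.h>

/* Hypothetical byte-wise variant of constant_test_bit (not kernel code).
   The pointee type is a single byte, so the volatile access requested
   here is one byte wide.  Assumes little-endian bit numbering. */
static inline int
test_bit_bytewise(int nr, const volatile unsigned char *addr)
{
    return (addr[nr >> 3] >> (nr & 7)) & 1;
}

/* Self-check: bit 24 lives in byte 3, bit 0. */
static void check_bytewise(void)
{
    volatile unsigned char caps[4] = { 0, 0, 0, 1 };
    assert(test_bit_bytewise(24, caps) == 1);
    assert(test_bit_bytewise(25, caps) == 0);
}
```

With this form the source itself asks for a byte load, so the width of the
access no longer depends on how the compiler treats volatile.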


-- 

kazu at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kazu at gcc dot gnu dot org



* [Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb
  2005-11-11 19:28 [Bug rtl-optimization/24810] New: mov + mov + testl generated instead of testb dann at godzilla dot ics dot uci dot edu
                   ` (7 preceding siblings ...)
  2005-12-19  0:37 ` kazu at gcc dot gnu dot org
@ 2005-12-29 11:53 ` jakub at gcc dot gnu dot org
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu dot org @ 2005-12-29 11:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from jakub at gcc dot gnu dot org  2005-12-29 11:53 -------
I don't think this is a bug; in fact, not honoring the volatile in GCC 4.0.x
and earlier was a bug.  If you want to allow byte access rather than word
access, you really need to remove the volatile keyword, and then it compiles
into
restore_fpu:
        testb   $1, boot_cpu_data+15
        je      .L2
        jmp     foo
.L2:
        jmp     bar
        .size   restore_fpu, .-restore_fpu
        .ident  "GCC: (GNU) 4.2.0 20051223 (experimental)"

You should report this against the Linux kernel; it shouldn't use volatile
there.
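
For illustration, the helper from the simplified testcase with the volatile
qualifier dropped might look like the sketch below. This is an assumption about
the suggested change, not the kernel's actual code. test_bit_plain is a
hypothetical name, and uint32_t replaces unsigned long so that the 32-bit word
indexing (nr >> 5, nr & 31) also holds on hosts where long is 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical non-volatile variant of constant_test_bit.  Without the
   volatile qualifier the compiler is free to narrow the load, for
   example to the single testb shown above. */
static inline int
test_bit_plain(int nr, const uint32_t *addr)
{
    return ((UINT32_C(1) << (nr & 31)) & addr[nr >> 5]) != 0;
}

/* Self-check: only bit 24 of word 0 is set. */
static void check_plain(void)
{
    uint32_t caps[7] = { UINT32_C(0x1000000) };
    assert(test_bit_plain(24, caps) == 1);
    assert(test_bit_plain(23, caps) == 0);
}
```

Whether dropping volatile is acceptable depends on what the callers rely on;
the sketch only shows the shape of the change.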


-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID


