public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/41505]  New: GCC choosing poor code sequence for certain stores (x86)
@ 2009-09-29 15:53 law at redhat dot com
  2009-09-29 16:07 ` [Bug target/41505] " rguenth at gcc dot gnu dot org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: law at redhat dot com @ 2009-09-29 15:53 UTC (permalink / raw)
  To: gcc-bugs

Consider:

foo(int *x)
{
  *x = 0;
}

Compile with -Os -fomit-frame-pointer and you'll get something like this:

        movl    4(%esp), %eax
        movl    $0, (%eax)

It would be 2 bytes shorter to instead load the constant 0 via an xor
instruction into a scratch register, then store the scratch register into the
memory location.  Something like this:

        movl    4(%esp), %eax
        xor     %edx, %edx
        movl    %edx, (%eax)


ISTM this could easily be implemented with a peep2.

I'm not well versed enough in x86 instruction timings to know if the xor
sequence is going to generally be faster.


-- 
           Summary: GCC choosing poor code sequence for certain stores (x86)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: law at redhat dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
  2009-09-29 15:53 [Bug target/41505] New: GCC choosing poor code sequence for certain stores (x86) law at redhat dot com
@ 2009-09-29 16:07 ` rguenth at gcc dot gnu dot org
  2009-09-29 17:12 ` law at redhat dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-09-29 16:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2009-09-29 16:07 -------
difficult


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
  2009-09-29 15:53 [Bug target/41505] New: GCC choosing poor code sequence for certain stores (x86) law at redhat dot com
  2009-09-29 16:07 ` [Bug target/41505] " rguenth at gcc dot gnu dot org
@ 2009-09-29 17:12 ` law at redhat dot com
  2009-09-29 21:18 ` rth at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: law at redhat dot com @ 2009-09-29 17:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from law at redhat dot com  2009-09-29 17:12 -------
I don't understand your comment Richard.  Isn't it just something like this?
(define_peephole2
  [(match_scratch:SI 2 "r")
   (set (match_operand:SI 0 "memory_operand" "")
        (match_operand:SI 1 "const_0_operand" ""))]
  ""
  [(set (match_dup 2) (match_dup 1))
   (set (match_dup 0) (match_dup 2))]
  "")


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
  2009-09-29 15:53 [Bug target/41505] New: GCC choosing poor code sequence for certain stores (x86) law at redhat dot com
  2009-09-29 16:07 ` [Bug target/41505] " rguenth at gcc dot gnu dot org
  2009-09-29 17:12 ` law at redhat dot com
@ 2009-09-29 21:18 ` rth at gcc dot gnu dot org
  2009-09-29 21:56 ` law at redhat dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rth at gcc dot gnu dot org @ 2009-09-29 21:18 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rth at gcc dot gnu dot org  2009-09-29 21:18 -------
There are already peepholes for this, though the condition appears to be
slightly wrong for -Os.  See i386.md:21121 :

(define_peephole2
  [(match_scratch:SI 1 "r")
   (set (match_operand:SI 0 "memory_operand" "")
        (const_int 0))]
  "optimize_insn_for_speed_p ()
   && ! TARGET_USE_MOV0
   && TARGET_SPLIT_LONG_MOVES
   && get_attr_length (insn) >= ix86_cur_cost ()->large_insn
   && peep2_regno_dead_p (0, FLAGS_REG)"


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
  2009-09-29 15:53 [Bug target/41505] New: GCC choosing poor code sequence for certain stores (x86) law at redhat dot com
                   ` (2 preceding siblings ...)
  2009-09-29 21:18 ` rth at gcc dot gnu dot org
@ 2009-09-29 21:56 ` law at redhat dot com
  2009-09-29 23:43 ` rth at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: law at redhat dot com @ 2009-09-29 21:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from law at redhat dot com  2009-09-29 21:55 -------
Subject: Re:  GCC choosing poor code sequence for certain
 stores (x86)

On 09/29/09 15:18, rth at gcc dot gnu dot org wrote:
> ------- Comment #3 from rth at gcc dot gnu dot org  2009-09-29 21:18 -------
> There are already peepholes for this, though the condition appears to be
> slightly wrong for -Os.  See i386.md:21121 :
>
> (define_peephole2
>    [(match_scratch:SI 1 "r")
>     (set (match_operand:SI 0 "memory_operand" "")
>          (const_int 0))]
>    "optimize_insn_for_speed_p ()
>     &&  ! TARGET_USE_MOV0
>     &&  TARGET_SPLIT_LONG_MOVES
>     &&  get_attr_length (insn)>= ix86_cur_cost ()->large_insn
>     &&  peep2_regno_dead_p (0, FLAGS_REG)"
>
>    

Ah, yes, the flags register needs to be available.

As for the condition, after reading optimization guides for the various 
x86 chips that

     mov $0, <mem>

is generally going to be faster than

     xor  temp, temp
     mov temp, <mem>

So I was thinking we'd want something like this for the condition.

  ((optimize_insn_for_size_p ()
    || (!TARGET_USE_MOV0
&& TARGET_SPLIT_LONG_MOVES
&& get_attr_length (insn) >= ix86_cur_cost()->large_insn))
&& peep2_regno_dead_p (0, FLAGS_REG)

Which I think should always give us the xor sequence when optimizing for 
size or when optimizing for the odd x86 implementation where the xor 
sequence is faster.


I can easily bundle that up as a patch if it looks right to you...

Jeff


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
  2009-09-29 15:53 [Bug target/41505] New: GCC choosing poor code sequence for certain stores (x86) law at redhat dot com
                   ` (3 preceding siblings ...)
  2009-09-29 21:56 ` law at redhat dot com
@ 2009-09-29 23:43 ` rth at gcc dot gnu dot org
  2009-09-30  9:22 ` jakub at gcc dot gnu dot org
  2009-09-30 14:47 ` law at redhat dot com
  6 siblings, 0 replies; 8+ messages in thread
From: rth at gcc dot gnu dot org @ 2009-09-29 23:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from rth at gcc dot gnu dot org  2009-09-29 23:43 -------
Yeah, that looks right.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
  2009-09-29 15:53 [Bug target/41505] New: GCC choosing poor code sequence for certain stores (x86) law at redhat dot com
                   ` (4 preceding siblings ...)
  2009-09-29 23:43 ` rth at gcc dot gnu dot org
@ 2009-09-30  9:22 ` jakub at gcc dot gnu dot org
  2009-09-30 14:47 ` law at redhat dot com
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-09-30  9:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from jakub at gcc dot gnu dot org  2009-09-30 09:22 -------
For x86-64 we perhaps want further checks for the size optimization - if the
scratch register is %r8d through %r15d, 3 byte xorl %r8d, %r8d and e.g. 3 byte
movl %r8d, (%rdx) won't be shorter than movl $0, (%rdx) which is 6 bytes).
And likely the 2 insns will be slower.
But if the address already needs rex prefix, it is still a win.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
  2009-09-29 15:53 [Bug target/41505] New: GCC choosing poor code sequence for certain stores (x86) law at redhat dot com
                   ` (5 preceding siblings ...)
  2009-09-30  9:22 ` jakub at gcc dot gnu dot org
@ 2009-09-30 14:47 ` law at redhat dot com
  6 siblings, 0 replies; 8+ messages in thread
From: law at redhat dot com @ 2009-09-30 14:47 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from law at redhat dot com  2009-09-30 14:47 -------
Subject: Re:  GCC choosing poor code sequence for certain
 stores (x86)

On 09/30/09 03:22, jakub at gcc dot gnu dot org wrote:
> ------- Comment #6 from jakub at gcc dot gnu dot org  2009-09-30 09:22 -------
> For x86-64 we perhaps want further checks for the size optimization - if the
> scratch register is %r8d through %r15d, 3 byte xorl %r8d, %r8d and e.g. 3 byte
> movl %r8d, (%rdx) won't be shorter than movl $0, (%rdx) which is 6 bytes).
> And likely the 2 insns will be slower.
> But if the address already needs rex prefix, it is still a win.
>
>
>    
Do we have any good way to test if the address needs a rex prefix?  I 
see the rex_prefix attribute in i386.md, but that's for testing an 
entire insn and based on my quick reading of i386.md it's not complete 
as many insns set the attribute explicitly.

Jeff


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2009-09-30 14:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-29 15:53 [Bug target/41505] New: GCC choosing poor code sequence for certain stores (x86) law at redhat dot com
2009-09-29 16:07 ` [Bug target/41505] " rguenth at gcc dot gnu dot org
2009-09-29 17:12 ` law at redhat dot com
2009-09-29 21:18 ` rth at gcc dot gnu dot org
2009-09-29 21:56 ` law at redhat dot com
2009-09-29 23:43 ` rth at gcc dot gnu dot org
2009-09-30  9:22 ` jakub at gcc dot gnu dot org
2009-09-30 14:47 ` law at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).