public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/68000] Suboptimal ternary operator codegen
       [not found] <bug-68000-4@http.gcc.gnu.org/bugzilla/>
@ 2015-10-17  6:54 ` pinskia at gcc dot gnu.org
  2015-10-17  8:34 ` glisse at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2015-10-17  6:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68000

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|c                           |middle-end

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note I think the trunk already has improved code generation.



* [Bug middle-end/68000] Suboptimal ternary operator codegen
       [not found] <bug-68000-4@http.gcc.gnu.org/bugzilla/>
  2015-10-17  6:54 ` [Bug middle-end/68000] Suboptimal ternary operator codegen pinskia at gcc dot gnu.org
@ 2015-10-17  8:34 ` glisse at gcc dot gnu.org
  2015-10-18 20:38 ` thaines.astro at gmail dot com
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: glisse at gcc dot gnu.org @ 2015-10-17  8:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68000

--- Comment #2 from Marc Glisse <glisse at gcc dot gnu.org> ---
Independently of hoisting,

    mov   eax, edx
    add   edx, 1
    add   eax, 1

apparently we fail to CSE this because at the time of CSE, one addition is done
in mode QI and the other in SI, and it is only in split2 that the QI one is
promoted to SI, which is too late for CSE.


Actually, it is quite hard to notice that foo is equivalent to the other
versions. If you wrote: return (uint8_t)(p->x + 1) == p->y ? 0 : p->x + 1; it
would be much easier, and indeed I get better code. For the equivalence, the
compiler has to notice that the only case where the cast matters is when x is
255 and y is 0, and in that case both branches (after sinking the cast to
uint8_t from the return type) are equivalent.
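To make the equivalence argument concrete, here is a minimal sketch. The original testcase is not quoted in this thread, so the struct layout and the names `struct pair`, `foo_cast` and `count_return_mismatches` are assumptions for illustration; only the two return expressions and the uint8_t return type come from the discussion above. The helper checks all 65536 (x, y) pairs and counts the inputs where the returned values differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reconstruction: the real struct is not shown in the thread. */
struct pair { uint8_t x, y; };

/* Comparison performed in int, after integer promotion of both operands. */
uint8_t foo(const struct pair *p)
{
    return p->x + 1 == p->y ? 0 : p->x + 1;
}

/* Variant suggested above: the sum is truncated to uint8_t before comparing. */
uint8_t foo_cast(const struct pair *p)
{
    return (uint8_t)(p->x + 1) == p->y ? 0 : p->x + 1;
}

/* Count the inputs for which the two functions return different values. */
int count_return_mismatches(void)
{
    int mismatches = 0;
    for (int x = 0; x <= 255; x++) {
        for (int y = 0; y <= 255; y++) {
            struct pair p = { (uint8_t)x, (uint8_t)y };
            if (foo(&p) != foo_cast(&p))
                mismatches++;
        }
    }
    return mismatches;
}
```

Under these assumptions the count is zero: for x == 255 and y == 0 the conditions disagree, but one branch yields 0 directly and the other yields 256, which is truncated to 0 by the uint8_t return type.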



* [Bug middle-end/68000] Suboptimal ternary operator codegen
       [not found] <bug-68000-4@http.gcc.gnu.org/bugzilla/>
  2015-10-17  6:54 ` [Bug middle-end/68000] Suboptimal ternary operator codegen pinskia at gcc dot gnu.org
  2015-10-17  8:34 ` glisse at gcc dot gnu.org
@ 2015-10-18 20:38 ` thaines.astro at gmail dot com
  2015-10-19  5:17 ` pinskia at gcc dot gnu.org
  2021-06-03  4:11 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 5+ messages in thread
From: thaines.astro at gmail dot com @ 2015-10-18 20:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68000

--- Comment #3 from Tim Haines <thaines.astro at gmail dot com> ---
(In reply to Andrew Pinski from comment #1)
> Note I think the trunk already has improved code generation.

Here is the codegen from the latest trunk build using the same options as
before.

foo_manual_hoist:
    movzx eax, BYTE PTR [rdi]
    mov   edx, 0
    inc   eax
    cmp   al, BYTE PTR [rdi+1]
    cmove eax, edx
    ret

foo:
    movzx edx, BYTE PTR [rdi]
    movzx ecx, BYTE PTR [rdi+1]
    mov   eax, edx
    inc   edx
    cmp   edx, ecx
    je    .L6
    inc   eax
    ret
.L6:
    xor   eax, eax
    ret

foo_if:
    movzx eax, BYTE PTR [rdi]
    mov   edx, 0
    inc   eax
    cmp   al, BYTE PTR [rdi+1]
    cmove eax, edx
    ret

Changing from a cmove to a cmp/je pair doesn't change the instruction latency
(although I couldn't find the latency for je on Intel; I assume it's 1 cycle,
as on AMD), but the branch predictor is now involved, which brings possible
pipeline hazards. I don't mean to be overly critical, but I wouldn't consider
this an improvement over the previous code, especially since the other two
versions of the function use cmove. As Marc noted, we are still missing CSE
here.

NB: The structure offsets are different here because the assembly I originally
posted was poorly anonymized by me. Mea culpa!



* [Bug middle-end/68000] Suboptimal ternary operator codegen
       [not found] <bug-68000-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2015-10-18 20:38 ` thaines.astro at gmail dot com
@ 2015-10-19  5:17 ` pinskia at gcc dot gnu.org
  2021-06-03  4:11 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2015-10-19  5:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68000

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note I was looking at the code generation for AArch64, which does not have
addition in QImode and so does those adds in SImode; there, RTL CSE (or RTL
GCSE, I did not look into which one) can remove the extra addition.



* [Bug middle-end/68000] Suboptimal ternary operator codegen
       [not found] <bug-68000-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2015-10-19  5:17 ` pinskia at gcc dot gnu.org
@ 2021-06-03  4:11 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-03  4:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68000

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So take:
 p->x + 1 == p->y

If p->x == 255 and p->y == 0, the above is false due to the C integer
promotion rules: both operands are promoted to int, so 256 is compared with 0.

In the other two testcases, the comparison is true when p->x == 255 and
p->y == 0.  Basically, foo is not the same as the other two.
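The difference in the comparison can be checked directly. This is a small sketch, not the original testcase: the function names are made up for illustration, and it assumes the other two versions truncate the sum to uint8_t before comparing (which matches the byte compare, cmp al, in their assembly in comment #3).

```c
#include <assert.h>
#include <stdint.h>

/* Condition as written in foo: x and y are promoted to int, so the
   left-hand side can reach 256. */
int cond_promoted(uint8_t x, uint8_t y)
{
    return x + 1 == y;
}

/* Condition with the sum truncated to uint8_t first (assumed behaviour
   of the other two versions): 255 + 1 wraps around to 0. */
int cond_truncated(uint8_t x, uint8_t y)
{
    return (uint8_t)(x + 1) == y;
}

/* Count the (x, y) pairs on which the two conditions disagree. */
int count_condition_mismatches(void)
{
    int mismatches = 0;
    for (int x = 0; x <= 255; x++)
        for (int y = 0; y <= 255; y++)
            if (cond_promoted((uint8_t)x, (uint8_t)y) !=
                cond_truncated((uint8_t)x, (uint8_t)y))
                mismatches++;
    return mismatches;
}
```

The two conditions disagree on exactly one input, x == 255 and y == 0, which is the case described above.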
