* [Bug middle-end/112398] New: Suboptimal code generation for xor pattern on subword data
From: lis8215 at gmail dot com @ 2023-11-05 22:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

            Bug ID: 112398
           Summary: Suboptimal code generation for xor pattern on subword
                    data
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lis8215 at gmail dot com
  Target Milestone: ---

These minimal examples showcase the issue:

#include <stdint.h>

uint8_t neg8 (const uint8_t *src)
{
    return ~*src;
    // or return *src ^ 0xff;
}

uint16_t neg16 (const uint16_t *src)
{
    return ~*src;
    // or return *src ^ 0xffff;
}

GCC transforms the xor here into not + zero_extend, which isn't the best choice.
I guess the combiner should try the xor pattern instead of not + zero_extend, as
it might be cheaper.
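
As a minimal sketch of why the two forms are interchangeable on subword data: inverting in a full-width register and then zero-extending the low byte gives the same value as xor-ing the already zero-extended byte with 0xff. The helper names below are hypothetical and only model the two shapes in plain C; they are not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Models not + zero_extend: invert all 32 bits, then keep the low 8. */
static uint32_t not_then_zext8(uint8_t x)
{
    uint32_t wide = x;          /* zero-extending load, like lbu/ldrb */
    uint32_t inverted = ~wide;  /* upper 24 bits become ones */
    return inverted & 0xffu;    /* zero_extend of the low byte */
}

/* Models the single xor: the upper 24 bits stay zero. */
static uint32_t xor_ff(uint8_t x)
{
    uint32_t wide = x;
    return wide ^ 0xffu;
}

/* Exhaustive check over all 256 byte values. */
static void check_neg8_equivalence(void)
{
    for (unsigned v = 0; v < 256; v++)
        assert(not_then_zext8((uint8_t)v) == xor_ff((uint8_t)v));
}
```

The 16-bit case works the same way with a 0xffff mask.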


* [Bug target/112398] Suboptimal code generation for xor pattern on subword data
From: pinskia at gcc dot gnu.org @ 2023-11-05 22:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |target

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Which target is this for?
I ask because not is normally cheaper than xor in many senses.


* [Bug target/112398] Suboptimal code generation for xor pattern on subword data
From: pinskia at gcc dot gnu.org @ 2023-11-05 22:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2023-11-05
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
It also depends on the ABI, since on many targets the ABI doesn't care about
the upper bits of the return value.

For example, on aarch64 we get:
neg8:
        ldrb    w0, [x0]
        mvn     w0, w0
        ret
neg16:
        ldrh    w0, [x0]
        mvn     w0, w0
        ret


* [Bug target/112398] Suboptimal code generation for xor pattern on subword data
From: lis8215 at gmail dot com @ 2023-11-06  5:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

--- Comment #3 from Siarhei Volkau <lis8215 at gmail dot com> ---
Well, let's rewrite it this way:

#include <stdint.h>

void neg8 (uint8_t *restrict dst, const uint8_t *restrict src)
{
    uint8_t work = ~*src; // or *src ^ 0xff;
    dst[0] = (work >> 4) | (work << 4);
}

Wherever the upper bits have to be zero, it is cheaper to use xor. Otherwise
we rely on techniques for eliminating the redundant zero_extend, and at least
on MIPS (prior to R2) and RISC-V, GCC still emits the zero_extend instruction.

MIPS, neg8:
neg8:
        lbu     $2,0($5)
        nop
        nor     $2,$0,$2
        andi    $3,$2,0x00ff
        srl     $3,$3,4
        sll     $2,$2,4
        or      $2,$2,$3
        jr      $31
        sb      $2,0($4)

RISC-V, neg8:
        lbu     a5,0(a1)
        not     a5,a5
        andi    a4,a5,0xff
        srli    a4,a4,4
        slli    a5,a5,4
        or      a4,a4,a5
        sb      a4,0(a0)
        ret

Some other RISCs also emit zero_extend, but I'm not sure whether they have a
cheaper xor alternative (S390, SH, Xtensa).
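
To make the cost difference concrete, here is a rough C model of the two instruction sequences for the nibble swap. The first function follows the MIPS/RISC-V output above step by step; the second is a hypothetical xor-based shape, not something GCC is confirmed to emit. Both function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Shape of the output above: not, mask for the right shift, shifts, or. */
static uint8_t swap_nibbles_not(uint8_t src)
{
    uint32_t r = src;            /* lbu: zero-extending load */
    r = ~r;                      /* not / nor: upper bits become ones */
    uint32_t lo = r & 0xffu;     /* andi ..., 0xff: the extra zero_extend */
    lo >>= 4;                    /* srl / srli */
    r <<= 4;                     /* sll / slli */
    return (uint8_t)(r | lo);    /* or; sb stores only the low byte */
}

/* Hypothetical xor shape: upper bits stay zero, so no mask is needed. */
static uint8_t swap_nibbles_xor(uint8_t src)
{
    uint32_t r = src;            /* lbu: zero-extending load */
    r ^= 0xffu;                  /* xor with 0xff inverts only the low byte */
    return (uint8_t)((r >> 4) | (r << 4));
}
```

The xor version saves one instruction because the right shift can consume the inverted value directly.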


* [Bug rtl-optimization/112398] Suboptimal code generation for xor pattern on subword data
From: pinskia at gcc dot gnu.org @ 2023-11-06  5:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |UNCONFIRMED
          Component|target                      |rtl-optimization
     Ever confirmed|1                           |0

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Expand does:
;; _1 = *src_5(D);

(insn 7 6 0 (set (reg:SI 134 [ _1 ])
        (zero_extend:SI (mem:QI (reg/v/f:SI 138 [ srcD.2336 ]) [0 MEM[(const uint8_tD.2311 *)src_5(D) clique 1 base 1]+0 S1 A8]))) "/app/example.cpp":5:21 -1
     (nil))

;; work_6 = ~_1;

(insn 8 7 9 (set (reg:SI 139)
        (not:SI (reg:SI 134 [ _1 ]))) "/app/example.cpp":5:13 -1
     (nil))

(insn 9 8 0 (set (reg/v:SI 136 [ workD.2339 ])
        (zero_extend:SI (subreg:QI (reg:SI 139) 0))) "/app/example.cpp":5:13 -1
     (nil))

The bigger issue is that we don't keep track of nonzero bits as much as we
could.

The other issue shows up when combine tries the combination here:
Trying 7, 8 -> 9:
    7: r134:SI=zero_extend([r148:SI])
      REG_DEAD r148:SI
    8: r139:SI=~r134:SI
      REG_DEAD r134:SI
    9: r136:SI=zero_extend(r139:SI#0)
      REG_DEAD r139:SI
Failed to match this instruction:
(set (reg/v:SI 136 [ workD.2339 ])
    (zero_extend:SI (subreg:QI (not:SI (subreg:SI (mem:QI (reg:SI 148) [0 MEM[(const uint8_tD.2311 *)src_5(D) clique 1 base 1]+0 S1 A8]) 0)) 0)))



That could be just
(xor (zero_extend:SI (mem:QI (reg:SI 148) [0 MEM[(const uint8_tD.2311 *)src_5(D) clique 1 base 1]+0 S1 A8])) 255)
But I am not so sure combine knows how to simplify that ...
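
A rough sketch of the identity behind that rewrite, modelled in plain C rather than in RTL: zero_extend of the low subreg of (not x) equals x xor mask, but only when every bit of x above the subreg is already known to be zero, which is what the zero-extending load guarantees here. All function names are hypothetical, and only 8- and 16-bit subregs of a 32-bit value are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low `bits` bits; bits is assumed to be 8 or 16 here. */
static uint32_t mode_mask(unsigned bits)
{
    return (1u << bits) - 1u;
}

/* Models (zero_extend:SI (subreg (not:SI x) 0)) for a `bits`-wide subreg. */
static uint32_t zext_subreg_not(uint32_t x, unsigned bits)
{
    return ~x & mode_mask(bits);
}

/* Models the xor of x with the subreg's mask. */
static uint32_t xor_with_mask(uint32_t x, unsigned bits)
{
    return x ^ mode_mask(bits);
}

/* The rewrite is only valid when no bit of x above the subreg is set. */
static int upper_bits_known_zero(uint32_t x, unsigned bits)
{
    return (x & ~mode_mask(bits)) == 0;
}
```

With x = 0x100 and an 8-bit subreg the two forms differ, which is why the known-zero upper bits matter for the transformation.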


* [Bug rtl-optimization/112398] Suboptimal code generation for xor pattern on subreg
From: law at gcc dot gnu.org @ 2024-01-13  0:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2023-11-05 00:00:00         |2024-01-13
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1


* [Bug rtl-optimization/112398] Suboptimal code generation for xor pattern on subreg
From: law at gcc dot gnu.org @ 2024-01-13  0:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

--- Comment #5 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I don't think we need to do any significant bit tracking to optimize the
original neg8 test.  I think it can be handled entirely within the simplify-rtx
framework.  I've got a junior engineer who's going to take a peek at this,
so don't go and fix it, Andrew ;-)

