public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/112398] New: Suboptimal code generation for xor pattern on subword data
From: lis8215 at gmail dot com @ 2023-11-05 22:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398
Bug ID: 112398
Summary: Suboptimal code generation for xor pattern on subword data
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: lis8215 at gmail dot com
Target Milestone: ---
These minimal examples showcase the issue:
#include <stdint.h>
uint8_t neg8 (const uint8_t *src)
{
return ~*src;
// or return *src ^ 0xff;
}
uint16_t neg16 (const uint16_t *src)
{
return ~*src;
// or return *src ^ 0xffff;
}
GCC transforms the xor here into not + zero_extend, which isn't the best choice.
I guess the combiner should try the xor pattern instead of not + zero_extend,
as it might be cheaper.
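The two forms are interchangeable whenever the result is zero-extended from the narrow type: the zero-extended input already has its upper bits clear, so xor with 0xff (or 0xffff) flips exactly the bits that complement-then-truncate would keep. A small exhaustive check as a sketch; the helper names are hypothetical and not part of the report:

```c
#include <assert.h>
#include <stdint.h>

/* "not + zero_extend": complement the promoted int, truncate back to the
   narrow type, then widen to 32 bits (which zero-extends). */
static uint32_t not_zext8 (uint8_t x)   { return (uint32_t) (uint8_t) ~x; }
static uint32_t not_zext16 (uint16_t x) { return (uint32_t) (uint16_t) ~x; }

/* "xor": the zero-extended input already has its upper bits clear, so xor
   with the narrow mode mask flips only the low bits. */
static uint32_t xor8 (uint8_t x)   { return (uint32_t) x ^ 0xffu; }
static uint32_t xor16 (uint16_t x) { return (uint32_t) x ^ 0xffffu; }

/* Exhaustively checks that both forms agree for every 8- and 16-bit input. */
static void check_forms_equal (void)
{
  for (uint32_t v = 0; v <= 0xffu; v++)
    assert (not_zext8 ((uint8_t) v) == xor8 ((uint8_t) v));
  for (uint32_t v = 0; v <= 0xffffu; v++)
    assert (not_zext16 ((uint16_t) v) == xor16 ((uint16_t) v));
}
```

Calling check_forms_equal () from main completes without an assertion failure. This only shows the two shapes compute the same value; which one is cheaper remains a per-target question.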
* [Bug target/112398] Suboptimal code generation for xor pattern on subword data
From: pinskia at gcc dot gnu.org @ 2023-11-05 22:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|middle-end |target
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Which target is this for?
Because not is normally cheaper than xor in many respects.
* [Bug target/112398] Suboptimal code generation for xor pattern on subword data
From: pinskia at gcc dot gnu.org @ 2023-11-05 22:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |WAITING
Last reconfirmed| |2023-11-05
Ever confirmed|0 |1
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Also, it depends on the ABI, since on many targets the ABI does not care about
the upper bits of the return value.
For example, on aarch64 we get:
neg8:
ldrb w0, [x0]
mvn w0, w0
ret
neg16:
ldrh w0, [x0]
mvn w0, w0
ret
* [Bug target/112398] Suboptimal code generation for xor pattern on subword data
From: lis8215 at gmail dot com @ 2023-11-06 5:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398
--- Comment #3 from Siarhei Volkau <lis8215 at gmail dot com> ---
Well, let's rewrite it this way:
#include <stdint.h>
void neg8 (uint8_t *restrict dst, const uint8_t *restrict src)
{
uint8_t work = ~*src; // or *src ^ 0xff;
dst[0] = (work >> 4) | (work << 4);
}
Wherever the upper bits have to be zero, it is cheaper to use xor; otherwise
we rely on techniques for eliminating a redundant zero_extend, and at least on
MIPS (prior to R2) and RISC-V, GCC emits the zero_extend instruction.
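The upper bits matter in this variant because of the right shift: work >> 4 moves bits 8 to 11 into the low byte. A sketch with hypothetical helper names contrasting a complement with no re-zero-extension, the masked complement that corresponds to the nor/not + andi 0xff sequences below, and the xor form:

```c
#include <assert.h>
#include <stdint.h>

/* Complement without re-zero-extending: ~ on the promoted value sets every
   bit above bit 7, and work >> 4 shifts bits 8..11 into the result. */
static uint8_t swap_not_unmasked (uint8_t x)
{
  unsigned work = ~(unsigned) x;
  return (uint8_t) ((work >> 4) | (work << 4));
}

/* Complement followed by an explicit zero_extend (the andi 0xff). */
static uint8_t swap_not_masked (uint8_t x)
{
  unsigned work = ~(unsigned) x & 0xffu;
  return (uint8_t) ((work >> 4) | (work << 4));
}

/* xor with 0xff: the upper bits of the zero-extended load stay zero, so no
   separate mask is needed before the shift. */
static uint8_t swap_xor (uint8_t x)
{
  unsigned work = (unsigned) x ^ 0xffu;
  return (uint8_t) ((work >> 4) | (work << 4));
}

/* The masked and xor forms agree on every byte value. */
static void check_swaps (void)
{
  for (unsigned v = 0; v <= 0xffu; v++)
    assert (swap_not_masked ((uint8_t) v) == swap_xor ((uint8_t) v));
}
```

For example, swap_not_unmasked (0x01) yields 0xff instead of the correct 0xef, which is why the mask must follow a plain complement; with xor the mask is redundant. Whether xor is actually cheaper than not + mask is target-specific.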
MIPS, neg8:
neg8:
lbu $2,0($5)
nop
nor $2,$0,$2
andi $3,$2,0x00ff
srl $3,$3,4
sll $2,$2,4
or $2,$2,$3
jr $31
sb $2,0($4)
RISC-V, neg8:
lbu a5,0(a1)
not a5,a5
andi a4,a5,0xff
srli a4,a4,4
slli a5,a5,4
or a4,a4,a5
sb a4,0(a0)
ret
Some other RISCs (S390, SH, Xtensa) also emit the zero_extend, but I'm not sure
whether a cheaper xor alternative exists on them.
* [Bug rtl-optimization/112398] Suboptimal code generation for xor pattern on subword data
From: pinskia at gcc dot gnu.org @ 2023-11-06 5:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |UNCONFIRMED
Component|target |rtl-optimization
Ever confirmed|1 |0
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Expand does:
;; _1 = *src_5(D);
(insn 7 6 0 (set (reg:SI 134 [ _1 ])
        (zero_extend:SI (mem:QI (reg/v/f:SI 138 [ srcD.2336 ]) [0 MEM[(const uint8_tD.2311 *)src_5(D) clique 1 base 1]+0 S1 A8]))) "/app/example.cpp":5:21 -1
     (nil))
;; work_6 = ~_1;
(insn 8 7 9 (set (reg:SI 139)
        (not:SI (reg:SI 134 [ _1 ]))) "/app/example.cpp":5:13 -1
     (nil))
(insn 9 8 0 (set (reg/v:SI 136 [ workD.2339 ])
        (zero_extend:SI (subreg:QI (reg:SI 139) 0))) "/app/example.cpp":5:13 -1
     (nil))
The bigger issue is that we don't keep track of nonzero bits as much as we could.
The other issue is what combine does when it combines here:
Trying 7, 8 -> 9:
7: r134:SI=zero_extend([r148:SI])
REG_DEAD r148:SI
8: r139:SI=~r134:SI
REG_DEAD r134:SI
9: r136:SI=zero_extend(r139:SI#0)
REG_DEAD r139:SI
Failed to match this instruction:
(set (reg/v:SI 136 [ workD.2339 ])
    (zero_extend:SI (subreg:QI (not:SI (subreg:SI (mem:QI (reg:SI 148) [0 MEM[(const uint8_tD.2311 *)src_5(D) clique 1 base 1]+0 S1 A8]) 0)) 0)))
That could be just (xor:SI (zero_extend:SI (mem:QI (reg:SI 148) [0 MEM[(const uint8_tD.2311 *)src_5(D) clique 1 base 1]+0 S1 A8])) (const_int 255))
But I am not so sure combine knows how to simplify that ...
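The rewrite suggested in this comment can be sketched on a toy expression tree. Everything here is hypothetical: struct expr, the node codes and simplify_zext_not are invented for illustration and are not GCC's rtx or simplify-rtx API, and LOWPART stands in for (subreg:QI ... 0), which is the low part only on little-endian targets. The rule needs no nonzero-bits tracking, because the inner zero_extend already guarantees that the bits above the narrow mode are clear:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy node codes loosely modelled on the RTL above (not GCC's rtx codes). */
enum code { MEM_VAL, ZERO_EXTEND, LOWPART, NOT, XOR, CONST };

struct expr
{
  enum code code;
  unsigned width;          /* width of this node's value, in bits (<= 32) */
  const struct expr *op0;  /* first operand, or NULL */
  const struct expr *op1;  /* second operand (XOR only), or NULL */
  uint32_t value;          /* constant (CONST) or loaded value (MEM_VAL) */
};

static uint32_t mask_of (unsigned width)
{
  return width >= 32 ? 0xffffffffu : ((uint32_t) 1 << width) - 1;
}

/* Evaluates a node; every result is truncated to the node's width. */
static uint32_t eval (const struct expr *e)
{
  uint32_t r;
  switch (e->code)
    {
    case MEM_VAL:
    case CONST:       r = e->value; break;
    case ZERO_EXTEND: r = eval (e->op0); break;  /* operand already fits */
    case LOWPART:     r = eval (e->op0); break;  /* truncated below */
    case NOT:         r = ~eval (e->op0); break;
    case XOR:         r = eval (e->op0) ^ eval (e->op1); break;
    default:          abort ();
    }
  return r & mask_of (e->width);
}

/* Hypothetical rule:
     (zero_extend:W (lowpart:N (not:W (zero_extend:W X))))
       -> (xor:W (zero_extend:W X) (const M)), M = N-bit mask,
   valid when X is at most N bits wide. Returns NULL if the pattern does not
   match; new nodes are written into caller-provided storage. */
static const struct expr *
simplify_zext_not (const struct expr *e, struct expr *xor_out,
                   struct expr *const_out)
{
  if (e->code != ZERO_EXTEND || e->op0->code != LOWPART)
    return NULL;
  const struct expr *low = e->op0;
  const struct expr *not_e = low->op0;
  if (not_e->code != NOT || not_e->width != e->width)
    return NULL;
  const struct expr *inner = not_e->op0;
  if (inner->code != ZERO_EXTEND || inner->width != e->width
      || inner->op0->width > low->width)
    return NULL;

  *const_out = (struct expr) { CONST, e->width, NULL, NULL, mask_of (low->width) };
  *xor_out = (struct expr) { XOR, e->width, inner, const_out, 0 };
  return xor_out;
}

/* For every N-bit value v (N < 32), builds the unsimplified tree from the
   combine dump and checks the rewritten tree evaluates to the same value. */
static void check_rule (unsigned n)
{
  for (uint32_t v = 0; v <= mask_of (n); v++)
    {
      struct expr mem = { MEM_VAL, n, NULL, NULL, v };
      struct expr zext = { ZERO_EXTEND, 32, &mem, NULL, 0 };
      struct expr not_e = { NOT, 32, &zext, NULL, 0 };
      struct expr low = { LOWPART, n, &not_e, NULL, 0 };
      struct expr outer = { ZERO_EXTEND, 32, &low, NULL, 0 };
      struct expr x, c;
      const struct expr *s = simplify_zext_not (&outer, &x, &c);
      assert (s != NULL && s->code == XOR);
      assert (eval (&outer) == eval (s));
    }
}
```

A tree that does not have this shape, such as a plain zero-extended load, makes simplify_zext_not return NULL and is left unchanged.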
* [Bug rtl-optimization/112398] Suboptimal code generation for xor pattern on subreg
From: law at gcc dot gnu.org @ 2024-01-13 0:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398
Jeffrey A. Law <law at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2023-11-05 00:00:00 |2024-01-13
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
* [Bug rtl-optimization/112398] Suboptimal code generation for xor pattern on subreg
From: law at gcc dot gnu.org @ 2024-01-13 0:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398
--- Comment #5 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I don't think we need to do any significant bit tracking to optimize the
original neg8 test. I think it can be handled entirely within the simplify-rtx
framework. I've got a junior engineer that's going to take a peek at this --
so don't go and fix it Andrew ;-)