public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/112398] New: Suboptimal code generation for xor pattern on subword data
From: lis8215 at gmail dot com @ 2023-11-05 22:02 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

            Bug ID: 112398
           Summary: Suboptimal code generation for xor pattern on subword
                    data
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lis8215 at gmail dot com
  Target Milestone: ---

These minimal examples showcase the issue:

uint8_t neg8 (const uint8_t *src)
{
    return ~*src;
    // or return *src ^ 0xff;
}

uint16_t neg16 (const uint16_t *src)
{
    return ~*src;
    // or return *src ^ 0xffff;
}

GCC transforms the xor here to not + zero_extend, which isn't the best choice.
I guess the combiner should try the xor pattern instead of not + zero_extend,
as it might be cheaper.
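The two spellings in the report compute the same value, which is what makes the xor form a legal alternative to not + zero_extend. A minimal C sketch that checks this exhaustively follows; the names `neg8_not`, `neg8_xor`, `neg16_not`, `neg16_xor` and `check_spellings` are made up for illustration and do not appear in the bug report, and the sketch assumes `int` is at least 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names. The "not" spelling from the report: the operand is
   promoted to int, complemented, then truncated back to the subword type. */
uint8_t neg8_not(const uint8_t *src) { return (uint8_t)~*src; }

/* The "xor" spelling from the report's comments: the result already fits
   in the subword type, so no truncation is needed. */
uint8_t neg8_xor(const uint8_t *src) { return *src ^ 0xff; }

uint16_t neg16_not(const uint16_t *src) { return (uint16_t)~*src; }
uint16_t neg16_xor(const uint16_t *src) { return *src ^ 0xffff; }

/* Asserts that both spellings agree on every 8-bit and 16-bit input. */
void check_spellings(void)
{
    for (unsigned v = 0; v <= 0xffff; v++) {
        uint16_t h = (uint16_t)v;
        assert(neg16_not(&h) == neg16_xor(&h));
        if (v <= 0xff) {
            uint8_t b = (uint8_t)v;
            assert(neg8_not(&b) == neg8_xor(&b));
        }
    }
}
```

Which form is cheaper is a separate, target-dependent question; the sketch only shows that the compiler is free to choose either.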
* [Bug target/112398] Suboptimal code generation for xor pattern on subword data
From: pinskia at gcc dot gnu.org @ 2023-11-05 22:16 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |target

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Which target is this for? Because not is normally cheaper than xor in many
senses.
* [Bug target/112398] Suboptimal code generation for xor pattern on subword data
From: pinskia at gcc dot gnu.org @ 2023-11-05 22:17 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2023-11-05
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
It also depends on the ABI, since on many targets the ABI doesn't care about
the upper bits of the return value.

For example, on aarch64 we get:

neg8:
        ldrb    w0, [x0]
        mvn     w0, w0
        ret
neg16:
        ldrh    w0, [x0]
        mvn     w0, w0
        ret
* [Bug target/112398] Suboptimal code generation for xor pattern on subword data
From: lis8215 at gmail dot com @ 2023-11-06 5:32 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

--- Comment #3 from Siarhei Volkau <lis8215 at gmail dot com> ---
Well, let's rewrite it this way:

void neg8 (uint8_t *restrict dst, const uint8_t *restrict src)
{
    uint8_t work = ~*src; // or *src ^ 0xff;
    dst[0] = (work >> 4) | (work << 4);
}

Wherever the upper bits have to be zero, it is cheaper to use xor;
otherwise we rely on techniques for eliminating the redundant zero_extend,
and at least on MIPS (prior to R2) and RISC-V GCC emits the zero_extend
instruction.

MIPS, neg8:

neg8:
        lbu     $2,0($5)
        nop
        nor     $2,$0,$2
        andi    $3,$2,0x00ff
        srl     $3,$3,4
        sll     $2,$2,4
        or      $2,$2,$3
        jr      $31
        sb      $2,0($4)

RISC-V, neg8:

        lbu     a5,0(a1)
        not     a5,a5
        andi    a4,a5,0xff
        srli    a4,a4,4
        slli    a5,a5,4
        or      a4,a4,a5
        sb      a4,0(a0)
        ret

Some other RISCs (S390, SH, Xtensa) also emit zero_extend, but I'm not sure
whether a cheaper xor alternative exists on them.
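The cost difference described in comment #3 can be modelled on 32-bit registers. In the rough C sketch below, `swap_not` follows the not + andi sequence shown in the MIPS and RISC-V listings, while `swap_xor` is a hypothetical sequence that GCC does not currently emit: it assumes a single xor with 0xff is available and drops the masking step. The function names and the `check_swap` helper are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Models the emitted sequence: five ALU operations between load and store. */
uint8_t swap_not(uint8_t in)
{
    uint32_t r2 = in;         /* lbu: zero-extending byte load             */
    r2 = ~r2;                 /* nor/not: upper 24 bits become ones        */
    uint32_t r3 = r2 & 0xff;  /* andi: the zero_extend needed by the shift */
    r3 >>= 4;                 /* srl                                       */
    r2 <<= 4;                 /* sll                                       */
    r2 |= r3;                 /* or                                        */
    return (uint8_t)r2;       /* sb: only the low byte is stored           */
}

/* Hypothetical xor-based sequence: four ALU operations, no masking step. */
uint8_t swap_xor(uint8_t in)
{
    uint32_t r2 = in;         /* lbu                                       */
    r2 ^= 0xff;               /* xor with 0xff: upper 24 bits stay zero    */
    uint32_t r3 = r2 >> 4;    /* srl: safe, nothing shifts in from above   */
    r2 <<= 4;                 /* sll                                       */
    r2 |= r3;                 /* or                                        */
    return (uint8_t)r2;       /* sb                                        */
}

/* Asserts that both sequences store the same byte for every input. */
void check_swap(void)
{
    for (unsigned v = 0; v <= 0xff; v++)
        assert(swap_not((uint8_t)v) == swap_xor((uint8_t)v));
}
```

Only the right shift depends on the upper bits being zero; the left shift and the final byte store discard them anyway, which is why the masking step can go once the complement itself leaves those bits clear.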
* [Bug rtl-optimization/112398] Suboptimal code generation for xor pattern on subword data
From: pinskia at gcc dot gnu.org @ 2023-11-06 5:46 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |UNCONFIRMED
          Component|target                      |rtl-optimization
     Ever confirmed|1                           |0

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Expand does:

;; _1 = *src_5(D);

(insn 7 6 0 (set (reg:SI 134 [ _1 ])
        (zero_extend:SI (mem:QI (reg/v/f:SI 138 [ srcD.2336 ]) [0 MEM[(const uint8_tD.2311 *)src_5(D) clique 1 base 1]+0 S1 A8]))) "/app/example.cpp":5:21 -1
     (nil))

;; work_6 = ~_1;

(insn 8 7 9 (set (reg:SI 139)
        (not:SI (reg:SI 134 [ _1 ]))) "/app/example.cpp":5:13 -1
     (nil))

(insn 9 8 0 (set (reg/v:SI 136 [ workD.2339 ])
        (zero_extend:SI (subreg:QI (reg:SI 139) 0))) "/app/example.cpp":5:13 -1
     (nil))

The bigger issue is that we don't keep track of nonzero bits as much as we
could.

Though the other issue is what happens when combine does the combining here:

Trying 7, 8 -> 9:
    7: r134:SI=zero_extend([r148:SI])
      REG_DEAD r148:SI
    8: r139:SI=~r134:SI
      REG_DEAD r134:SI
    9: r136:SI=zero_extend(r139:SI#0)
      REG_DEAD r139:SI
Failed to match this instruction:
(set (reg/v:SI 136 [ workD.2339 ])
    (zero_extend:SI (subreg:QI (not:SI (subreg:SI (mem:QI (reg:SI 148) [0 MEM[(const uint8_tD.2311 *)src_5(D) clique 1 base 1]+0 S1 A8]) 0)) 0)))

That could be just:

(xor (zero_extend:SI (mem:QI (reg:SI 148) [0 MEM[(const uint8_tD.2311 *)src_5(D) clique 1 base 1]+0 S1 A8])) 255)

But I am not so sure combine knows how to simplify that ...
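The rewrite suggested in comment #4 depends on the operand of the not having no bits set above the low byte, which holds here because it comes from a zero_extend of a QI value. Below is a small C model of the two RTL shapes, offered as a sketch rather than as how combine or simplify-rtx would implement it; the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models (zero_extend:SI (subreg:QI (not:SI x) 0)):
   complement in SImode, keep the low byte, zero-extend it. */
uint32_t not_then_zext8(uint32_t x) { return (uint8_t)~x; }

/* Models (xor:SI x (const_int 255)). */
uint32_t xor_255(uint32_t x) { return x ^ 0xffu; }

/* Asserts the identity under its precondition and shows one input
   outside the precondition where the two shapes disagree. */
void check_rewrite(void)
{
    /* Precondition: bits 8..31 of x are zero, as after a QI zero_extend. */
    for (uint32_t x = 0; x <= 0xff; x++)
        assert(not_then_zext8(x) == xor_255(x));

    /* With bit 8 set the results differ (0xff versus 0x1ff), so the
       rewrite is only valid when the upper bits are known to be zero. */
    assert(not_then_zext8(0x100) != xor_255(0x100));
}
```

In this model the precondition is visible from the zero_extend feeding the not, so establishing it does not need information from outside the combined expression.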
* [Bug rtl-optimization/112398] Suboptimal code generation for xor pattern on subreg
From: law at gcc dot gnu.org @ 2024-01-13 0:17 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2023-11-05 00:00:00         |2024-01-13
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
* [Bug rtl-optimization/112398] Suboptimal code generation for xor pattern on subreg
From: law at gcc dot gnu.org @ 2024-01-13 0:25 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

--- Comment #5 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I don't think we need to do any significant bit tracking to optimize the
original neg8 test. I think it can be handled entirely within the
simplify-rtx framework.

I've got a junior engineer that's going to take a peek at this -- so don't go
and fix it Andrew ;-)