public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/115529] New: Optimization with "bit mask and compare equality" ((x & C1) == C2), ((x | C3) == C4)
@ 2024-06-18  1:36 Explorer09 at gmail dot com
  2024-06-18  1:43 ` [Bug tree-optimization/115529] " pinskia at gcc dot gnu.org
  0 siblings, 1 reply; 2+ messages in thread
From: Explorer09 at gmail dot com @ 2024-06-18  1:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115529

            Bug ID: 115529
           Summary: Optimization with "bit mask and compare equality" ((x
                    & C1) == C2), ((x | C3) == C4)
           Product: gcc
           Version: 14.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

I thought this pattern is commonly seen, but I didn't find any issue report in
GCC mentioning this.

```c
#include <stdbool.h>

bool pred_bitor(unsigned int x) {
    return (x | 0x1F) == 0x381F;
}
bool pred_bitand(unsigned int x) {
    return (x & ~(uint32_t)0x1F) == 0x3800;
}
bool pred_shift(unsigned int x) {
    return (x >> 5) == (0x3800 >> 5);
}
```

The three predicates are equivalent, but GCC doesn't seem to recognize one can
be converted to another. Clang does recognize that, however.

My optimization request is this:
For the patterns ((x & C1) == C2) and ((x | C3) == C4), where C1 to C4 are all
compile time constants, try converting one code pattern to another, and figure
out which can generate faster or smaller code (or maybe both).

* If any constant happens to be already loaded into one register, then reusing
that constant can save code size by not needing to load the constant into
register for compare op. In other words, which pattern is the best depends a
lot on the surrounding code, as well is CPU instruction sets.

* My personal testing says that pred_bitand() can win in more cases than the
other two:
** In 32-bit ARM, the constant 0x3800 can fit into an immediate operand but
0x381F cannot, which means there would be an additional "mov" instruction for
0x381F.
** In RISC-V, the "andi" has a 16-bit encoding (with the "C"/Compressed
instruction extension) that works here, but there is no 16-bit encoding for
"ori" instruction.

* The pred_shift() case can work only if the mask constant is in the `((1 << N)
- 1)` form, or `((unsigned)-1 << M)` form. It doesn't work with all mask
constants but I think you knew that already.

^ permalink raw reply	[flat|nested] 2+ messages in thread

* [Bug tree-optimization/115529] Optimization with "bit mask and compare equality" ((x & C1) == C2), ((x | C3) == C4)
  2024-06-18  1:36 [Bug tree-optimization/115529] New: Optimization with "bit mask and compare equality" ((x & C1) == C2), ((x | C3) == C4) Explorer09 at gmail dot com
@ 2024-06-18  1:43 ` pinskia at gcc dot gnu.org
  0 siblings, 0 replies; 2+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-06-18  1:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115529

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>((x | C3) == C4)
shows up in PR 86975, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86975#c2
.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-06-18  1:43 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-18  1:36 [Bug tree-optimization/115529] New: Optimization with "bit mask and compare equality" ((x & C1) == C2), ((x | C3) == C4) Explorer09 at gmail dot com
2024-06-18  1:43 ` [Bug tree-optimization/115529] " pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).