public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/114087] New: RISC-V optimization on checking certain bits set ((x & mask) == val)
@ 2024-02-24 12:48 Explorer09 at gmail dot com
2024-02-24 17:32 ` [Bug middle-end/114087] " pinskia at gcc dot gnu.org
2024-02-27 1:16 ` andrew at sifive dot com
0 siblings, 2 replies; 3+ messages in thread
From: Explorer09 at gmail dot com @ 2024-02-24 12:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114087
Bug ID: 114087
Summary: RISC-V optimization on checking certain bits set ((x &
mask) == val)
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: Explorer09 at gmail dot com
Target Milestone: ---
It might be common in the C family of languages to check if certain bits are
set in an integer with a code pattern like this:
```c
unsigned int x;
if ((x & 0x3000) == 0x1000) {
// Do something...
}
```
And I am surprised when compilers like GCC and Clang didn't realize they can
use some bit shifts and inversions of bit masks to save some instructions and
emit smaller code.
Here I present 3 possible optimizations that could be implemented in a
compiler. Two of them can apply not only to RISC-V, but other RISC
architectures as well (except ARM, perhaps). The last one is specific to RISC-V
due to the 20-bit immediate operand of its "lui" (load upper immediate)
instruction.
The bit masks should be compile time constants, and the "-Os" flag (optimize
for size) is assumed.
### Test code
The example code and constants are crafted specifically for RISC-V.
Each group of `pred*` functions should function identically (if not, please let
me know; it might be a typo).
* The "a" variants are what I commonly write for checking the set bits.
* The "b" variants are what I believe the compiler should ideally transform the
code to. I wrote them to let compiler developers know how the optimization can
be done. (But in practice the "b" code might transform to "a", meaning the
"optimization" direction reversed.)
* The "c" variants are hacks to make things work. They contain `__asm__
volatile` directives to force GCC or Clang to optimize in the direction I want.
The generated assembly should present what I considered the ideal result.
```c
#include <stdbool.h>
#include <stdint.h>
#define POWER_OF_TWO_FACTOR(x) ((x) & -(x))
// -------------------------------------------------------------------
// Example 1: The bitwise AND mask contains lower bits in all ones.
// By converting the bitwise AND into a bitwise OR, an "addi"
// instruction can be saved.
// (This might conflict with optimizations utilizing RISC-V "bclri"
// instruction; use one or the other.)
// (In ARM there are "bic" instructions already, making this
// optimization useless.)
static uint32_t mask1 = 0x55555FFF;
static uint32_t val1 = 0x14501DEF;
// static_assert((mask1 & val1) == val1);
// static_assert((mask1 & 0xFFF) == 0xFFF);
bool pred1a(uint32_t x) {
return ((x & mask1) == val1);
}
bool pred1b(uint32_t x) {
return ((x | ~mask1) == (val1 | ~mask1));
}
bool pred1c(uint32_t x) {
register uint32_t temp = x | ~mask1;
__asm__ volatile ("" : "+r" (temp));
return (temp == (val1 | ~mask1));
}
// -------------------------------------------------------------------
// Example 2: The bitwise AND mask could fit an 11-bit immediate
// operand of RISC-V "andi" instruction with a help of right
// shifting. (Keep the sign bit of the immediate operand zero.)
// (This kind of optimization could also work with other RISC
// architectures, except ARM.)
static uint32_t mask2 = 0x55500000;
static uint32_t val2 = 0x14500000;
// static_assert(mask2 != 0);
// static_assert((mask2 & val2) == val2);
// static_assert(mask2 / POWER_OF_TWO_FACTOR(mask2) <= 0x7FF);
bool pred2a(uint32_t x) {
return ((x & mask2) == val2);
}
bool pred2b(uint32_t x) {
uint32_t factor = POWER_OF_TWO_FACTOR(mask2);
return ((x >> __builtin_ctz(factor)) & (mask2 / factor))
== (val2 / factor);
}
bool pred2c(uint32_t x) {
uint32_t factor = POWER_OF_TWO_FACTOR(mask2);
register uint32_t temp = x >> 20;
__asm__ volatile ("" : "+r" (temp));
return (temp & 0x555) == 0x145;
}
// -------------------------------------------------------------------
// Example 3: The bitwise AND mask could fit a 20-bit immediate
// operand of RISC-V "lui" instruction.
// Only RISC-V has this 20-bit immediate "U-type" format, AFAIK.
static uint32_t mask3 = 0x00055555;
static uint32_t val3 = 0x00045014;
// static_assert(mask3 / POWER_OF_TWO_FACTOR(mask3) <= 0xFFFFF);
bool pred3a(uint32_t x) {
return ((x & mask3) == val3);
}
bool pred3b(uint32_t x) {
uint32_t factor = POWER_OF_TWO_FACTOR(mask3);
return (((x / factor) << 12) & ((mask3 / factor) << 12))
== ((val3 / factor) << 12);
}
bool pred3c(uint32_t x) {
uint32_t factor = POWER_OF_TWO_FACTOR(mask3);
register uint32_t temp = x << 12;
__asm__ volatile ("" : "+r" (temp));
return (temp & ((mask3 / factor) << 12))
== ((val3 / factor) << 12);
}
```
I tested the code in the Compiler Explorer (godbolt.org).
### Generated assembly (for reference only)
```
pred1a:
li a5,1431658496
addi a5,a5,-1
and a0,a0,a5
li a5,340795392
addi a5,a5,-529
sub a0,a0,a5
seqz a0,a0
ret
pred1b: # Expected result
li a5,-1431658496
or a0,a0,a5
li a5,-1090863104
addi a5,a5,-529
sub a0,a0,a5
seqz a0,a0
ret
pred2a:
li a5,1431306240
and a0,a0,a5
li a5,340787200
sub a0,a0,a5
seqz a0,a0
ret
# pred2b compiles to same code as pred2a
pred2c: # Expected result
srliw a0,a0,20
andi a0,a0,1365
addi a0,a0,-325
seqz a0,a0
ret
pred3a:
li a5,348160
addi a5,a5,1365
and a0,a0,a5
li a5,282624
addi a5,a5,20
sub a0,a0,a5
seqz a0,a0
ret
# pred3b compiles to same code as pred3a
pred3c: # Expected result
slliw a0,a0,12
li a5,1431654400
and a0,a0,a5
li a5,1157709824
sub a0,a0,a5
seqz a0,a0
ret
```
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Bug middle-end/114087] RISC-V optimization on checking certain bits set ((x & mask) == val)
2024-02-24 12:48 [Bug rtl-optimization/114087] New: RISC-V optimization on checking certain bits set ((x & mask) == val) Explorer09 at gmail dot com
@ 2024-02-24 17:32 ` pinskia at gcc dot gnu.org
2024-02-27 1:16 ` andrew at sifive dot com
1 sibling, 0 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-24 17:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114087
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Keywords| |missed-optimization
Component|rtl-optimization |middle-end
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Bug middle-end/114087] RISC-V optimization on checking certain bits set ((x & mask) == val)
2024-02-24 12:48 [Bug rtl-optimization/114087] New: RISC-V optimization on checking certain bits set ((x & mask) == val) Explorer09 at gmail dot com
2024-02-24 17:32 ` [Bug middle-end/114087] " pinskia at gcc dot gnu.org
@ 2024-02-27 1:16 ` andrew at sifive dot com
1 sibling, 0 replies; 3+ messages in thread
From: andrew at sifive dot com @ 2024-02-27 1:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114087
Andrew Waterman <andrew at sifive dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |andrew at sifive dot com
--- Comment #1 from Andrew Waterman <andrew at sifive dot com> ---
Note that, in some of these cases, there is a tradeoff. If this code were
executed in a loop, such that the constant loads could be hoisted, (2a) would
have a shorter critical path than (2c); likewise (3a) vs. (3c).
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-02-27 1:16 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-24 12:48 [Bug rtl-optimization/114087] New: RISC-V optimization on checking certain bits set ((x & mask) == val) Explorer09 at gmail dot com
2024-02-24 17:32 ` [Bug middle-end/114087] " pinskia at gcc dot gnu.org
2024-02-27 1:16 ` andrew at sifive dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).