public inbox for gcc-bugs@sourceware.org
* [Bug target/111252] New: LoongArch: Suboptimal code for (a & ~mask) | (b & mask) where mask is a constant with value ((1 << n) - 1) << m
@ 2023-08-31 4:31 xry111 at gcc dot gnu.org
2023-08-31 4:33 ` [Bug target/111252] " xry111 at gcc dot gnu.org
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-08-31 4:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111252
Bug ID: 111252
Summary: LoongArch: Suboptimal code for (a & ~mask) | (b &
mask) where mask is a constant with value ((1 << n) -
1) << m
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: xry111 at gcc dot gnu.org
Target Milestone: ---
int test(int a, int b)
{
    return (a & ~0x10) | (b & 0x10);
}
compiles to:
addi.w $r12,$r0,-17 # 0xffffffffffffffef
and $r12,$r12,$r4
andi $r5,$r5,16
or $r12,$r12,$r5
slli.w $r4,$r12,0
jr $r1
It should be improved to:
bstrpick.w $r5, $r5, 4, 4
bstrins.w $r4, $r5, 4, 4
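To make the requested transformation concrete, here is a minimal C sketch (not taken from the report) showing that the and/or form equals "extract a field from b, insert it into a", which is the shape of a bstrpick + bstrins pair. The helper names field_mask, merge_and_or and merge_extract_insert are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The mask used in the report: one set bit (n = 1) at position m = 4. */
static_assert((((UINT32_C(1) << 1) - 1) << 4) == 0x10, "mask from the report");

/* Hypothetical helper: n set bits starting at bit m, i.e. ((1 << n) - 1) << m.
   Requires 1 <= n and n + m <= 32. */
static inline uint32_t field_mask(unsigned n, unsigned m)
{
    uint32_t low = (n == 32) ? UINT32_MAX : ((UINT32_C(1) << n) - 1);
    return low << m;
}

/* The form written in the report: masked bits come from b, the rest from a. */
static inline uint32_t merge_and_or(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* The same value computed as "extract the field from b, insert it into a". */
static inline uint32_t merge_extract_insert(uint32_t a, uint32_t b,
                                            unsigned n, unsigned m)
{
    uint32_t mask = field_mask(n, m);
    uint32_t field = (b & mask) >> m;   /* extract bits [m+n-1:m] of b */
    return (a & ~mask) | (field << m);  /* insert them at the same place in a */
}
```

Both functions agree whenever the mask is contiguous, which is why a target with bit-field insert instructions does not need to materialize ~mask in a register.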
* [Bug target/111252] LoongArch: Suboptimal code for (a & ~mask) | (b & mask) where mask is a constant with value ((1 << n) - 1) << m
From: xry111 at gcc dot gnu.org @ 2023-08-31 4:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111252
Xi Ruoyao <xry111 at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chenglulu at loongson dot cn,
                   |                            |chenxiaolong at loongson dot cn
             Target|                            |loongarch*-*-*
--- Comment #1 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
In particular, this issue causes the compiler to compile __builtin_copysignf128
into very inefficient code.
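The link to copysign is that copying a sign is exactly this mask pattern with the mask equal to the sign bit. The sketch below illustrates it for double rather than _Float128, assuming an IEEE 754 binary64 representation. The name copysign_bits is hypothetical, and this is not how GCC expands the builtin.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: double is a 64-bit IEEE 754 value with the sign in bit 63. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* copysign written as (x & ~mask) | (y & mask) with mask = sign bit. */
static inline double copysign_bits(double x, double y)
{
    const uint64_t sign = UINT64_C(1) << 63;
    uint64_t xb, yb, rb;
    double r;

    memcpy(&xb, &x, sizeof xb);       /* reinterpret without aliasing issues */
    memcpy(&yb, &y, sizeof yb);
    rb = (xb & ~sign) | (yb & sign);  /* magnitude of x, sign of y */
    memcpy(&r, &rb, sizeof r);
    return r;
}
```

For example, copysign_bits(1.5, -2.0) yields -1.5. The mask here has n = 1 and m = 63, so it falls under the pattern in the bug title.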
* [Bug target/111252] LoongArch: Suboptimal code for (a & ~mask) | (b & mask) where mask is a constant with value ((1 << n) - 1) << m
From: pinskia at gcc dot gnu.org @ 2023-08-31 4:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111252
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-08-31
           Keywords|                            |missed-optimization
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Interesting:
int test(int a, int b)
{
    return (a & ~0x80000000) | (b & 0x80000000);
}
Produces better code:
lu12i.w $r12,-2147483648>>12 # 0xffffffff80000000
and $r12,$r12,$r5
bstrpick.w $r4,$r4,30,0
or $r4,$r4,$r12
slli.w $r4,$r4,0
jr $r1
But note the expansion of __builtin_copysignf128 should be using
extract_bit_field and friends to extract the bit and do the insertion. I have
not looked into that yet, though.
* [Bug target/111252] LoongArch: Suboptimal code for (a & ~mask) | (b & mask) where mask is a constant with value ((1 << n) - 1) << m
From: xry111 at gcc dot gnu.org @ 2023-08-31 4:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111252
Xi Ruoyao <xry111 at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
* [Bug target/111252] LoongArch: Suboptimal code for (a & ~mask) | (b & mask) where mask is a constant with value ((1 << n) - 1) << m
From: pinskia at gcc dot gnu.org @ 2023-08-31 4:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111252
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The easiest fix for __builtin_copysignf128 is to change expand_copysign_bit in
optabs.cc to use extract_bit_field for the extraction and store_bit_field for
the insertion, instead of the ands and ors it currently uses.
* [Bug target/111252] LoongArch: Suboptimal code for (a & ~mask) | (b & mask) where mask is a constant with value ((1 << n) - 1) << m
From: xry111 at gcc dot gnu.org @ 2023-08-31 4:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111252
--- Comment #4 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #2)
> Interesting:
> int test(int a, int b)
> {
> return (a & ~0x80000000) | (b & 0x80000000);
> }
>
> Produces better code:
> lu12i.w $r12,-2147483648>>12 # 0xffffffff80000000
> and $r12,$r12,$r5
> bstrpick.w $r4,$r4,30,0
> or $r4,$r4,$r12
> slli.w $r4,$r4,0
> jr $r1
Hmm, this seems to be a separate issue. The compiler knows how to optimize
(a & mask) if mask is ((1 << a) - 1) << b with a + b = 32 or b = 0, but not for
any other mask, even if it's "expensive" to materialize the mask:
long test(long a, long b)
{
    return a & 0xfffff0000l;
}
compiles to:
lu12i.w $r12,-65536>>12 # 0xffffffffffff0000
lu32i.d $r12,0xf00000000>>32
and $r4,$r4,$r12
jr $r1
But the following is better:
bstrpick.d $r4, $r4, 35, 16
slli.d $r4, $r4, 16
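The two-instruction sequence relies on the identity sketched below: and-ing with 0xfffff0000 is the same as picking bits 35..16 and shifting them back into place. This is a small illustration only, and the function names and_with_mask and pick_then_shift are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 0xfffff0000 is 20 one bits shifted left by 16, i.e. bits [35:16]. */
static_assert(UINT64_C(0xfffff0000) == ((UINT64_C(1) << 20) - 1) << 16,
              "mask is ((1 << 20) - 1) << 16");

/* What the source code asks for: needs the 36-bit constant in a register. */
static inline uint64_t and_with_mask(uint64_t a)
{
    return a & UINT64_C(0xfffff0000);
}

/* The same value without the constant: pick bits [35:16], then shift left. */
static inline uint64_t pick_then_shift(uint64_t a)
{
    uint64_t field = (a >> 16) & ((UINT64_C(1) << 20) - 1); /* like bstrpick.d */
    return field << 16;                                     /* like slli.d */
}
```

The second form trades the two instructions that build the constant plus the and for one extract and one shift.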
* [Bug target/111252] LoongArch: Suboptimal code for (a & ~mask) | (b & mask) where mask is a constant with value ((1 << n) - 1) << m
From: xry111 at gcc dot gnu.org @ 2023-08-31 4:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111252
--- Comment #5 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
(In reply to Xi Ruoyao from comment #4)
> Hmm, this seems a separate issue. The compiler knows to optimize (a & mask)
> if mask is ((1 << a) - 1) << b iff a + b = 32 or b = 0, but not for any
I mean "32 or 64".
> other masks even if it's "expensive" to materialize the mask:
* [Bug target/111252] LoongArch: Suboptimal code for (a & ~mask) | (b & mask) where mask is a constant with value ((1 << n) - 1) << m
From: cvs-commit at gcc dot gnu.org @ 2023-09-07 7:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111252
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Xi Ruoyao <xry111@gcc.gnu.org>:
https://gcc.gnu.org/g:5b857e87201335148f23ec7134cf7fbf97c04c72
commit r14-3773-g5b857e87201335148f23ec7134cf7fbf97c04c72
Author: Xi Ruoyao <xry111@xry111.site>
Date: Tue Sep 5 19:42:30 2023 +0800
LoongArch: Use bstrins instruction for (a & ~mask) and (a & mask) | (b &
~mask) [PR111252]
If mask is a constant with value ((1 << N) - 1) << M we can perform this
optimization.
gcc/ChangeLog:
PR target/111252
* config/loongarch/loongarch-protos.h
(loongarch_pre_reload_split): Declare new function.
(loongarch_use_bstrins_for_ior_with_mask): Likewise.
* config/loongarch/loongarch.cc
(loongarch_pre_reload_split): Implement.
(loongarch_use_bstrins_for_ior_with_mask): Likewise.
* config/loongarch/predicates.md (ins_zero_bitmask_operand):
New predicate.
* config/loongarch/loongarch.md (bstrins_<mode>_for_mask):
New define_insn_and_split.
(bstrins_<mode>_for_ior_mask): Likewise.
(define_peephole2): Further optimize code sequence produced by
bstrins_<mode>_for_ior_mask if possible.
gcc/testsuite/ChangeLog:
* g++.target/loongarch/bstrins-compile.C: New test.
* g++.target/loongarch/bstrins-run.C: New test.
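The condition in the commit message, "mask is a constant with value ((1 << N) - 1) << M", can be tested with a few bit tricks. The sketch below only illustrates that condition; it is not the code of the ins_zero_bitmask_operand predicate, and the names is_shifted_mask and decode_shifted_mask are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if x is one contiguous run of set bits, i.e. ((1 << N) - 1) << M
   for some N >= 1. */
static inline bool is_shifted_mask(uint64_t x)
{
    if (x == 0)
        return false;
    uint64_t filled = x | (x - 1);        /* set every bit below the run */
    return (filled & (filled + 1)) == 0;  /* filled must now be 2^k - 1 */
}

/* On success stores the run length in *n and its start bit in *m. */
static inline bool decode_shifted_mask(uint64_t x, unsigned *n, unsigned *m)
{
    assert(n != NULL && m != NULL);
    if (!is_shifted_mask(x))
        return false;

    unsigned shift = 0, width = 0;
    while ((x & 1) == 0) { x >>= 1; shift++; }  /* count trailing zeros */
    while (x & 1)        { x >>= 1; width++; }  /* count the run of ones */
    *n = width;
    *m = shift;
    return true;
}
```

For the masks seen in this thread, 0x10 decodes to N = 1, M = 4 and 0xfffff0000 decodes to N = 20, M = 16, while a value such as 0x5 is rejected.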
* [Bug target/111252] LoongArch: Suboptimal code for (a & ~mask) | (b & mask) where mask is a constant with value ((1 << n) - 1) << m
From: xry111 at gcc dot gnu.org @ 2023-09-07 8:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111252
Xi Ruoyao <xry111 at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #7 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
Done.