* [Bug middle-end/98865] Missed transform of (a >> 63) * b
2021-01-28 13:38 [Bug middle-end/98865] New: Missed transform of (a >> 63) * b rguenth at gcc dot gnu.org
@ 2021-01-28 13:40 ` rguenth at gcc dot gnu.org
2021-01-28 14:26 ` jakub at gcc dot gnu.org
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-28 13:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98865
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Target| |x86_64-*-*
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Happens in Botan AES-128/XTS (seen in PR98856). Probably something for RTL
expansion or even match.pd, and not target specific. The transform should be
considerably faster for arithmetic wider than word_mode (only the upper part
needs shifting, and the result can be shared for the bitwise and), but that
case is then really one for RTL expansion.
* [Bug middle-end/98865] Missed transform of (a >> 63) * b
From: jakub at gcc dot gnu.org @ 2021-01-28 14:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98865
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
PR middle-end/98865
* match.pd (a * (b >> (prec-1)) to ((signed)b >> (prec-1)) & a): New
simplification.
--- gcc/match.pd.jj 2021-01-22 11:50:09.882909120 +0100
+++ gcc/match.pd 2021-01-28 15:20:20.536238614 +0100
@@ -793,6 +793,16 @@ (define_operator_list COND_TERNARY
&& tree_nop_conversion_p (type, TREE_TYPE (@1)))
(lshift @0 @2)))
+/* Fold (a * (b >> (prec-1))) with logical shift into
+ ((signed)b >> (prec-1)) & a. */
+(simplify
+ (mult:c @0 (nop_convert? (rshift @1 INTEGER_CST@2)))
+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
+ && TYPE_UNSIGNED (TREE_TYPE (@1))
+ && wi::to_widest (@2) + 1 == TYPE_PRECISION (TREE_TYPE (@1)))
+ (with { tree stype = signed_type_for (TREE_TYPE (@1)); }
+ (bit_and (convert:type (rshift (convert:stype @1) @2)) @0))))
+
/* Fold (1 << (C - x)) where C = precision(type) - 1
into ((1 << C) >> x). */
(simplify
(completely untested) does that.
It doesn't handle vector types; whether that is a good idea or not depends on
how we deal with the issue of match.pd simplifications after the last veclower
pass.
And, given:
unsigned long long
foo (unsigned long long a, unsigned long long b)
{
return (a >> 63) * b;
}
long long
bar (long long a, long long b)
{
return -(a >> 63) * b;
}
long long
baz (long long a, long long b)
{
long long c = a >> 63;
long long d = -c;
return d * b;
}
With it we optimize foo and bar but not baz; apparently the transformation of
-(a >> 63) with an arithmetic shift into (a >> 63) with a logical shift is done
only in GENERIC folding and not later.
* [Bug middle-end/98865] Missed transform of (a >> 63) * b
From: pinskia at gcc dot gnu.org @ 2021-03-07 2:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98865
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2021-03-07
Severity|normal |enhancement
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
* [Bug middle-end/98865] Missed transform of (a >> 63) * b
From: pinskia at gcc dot gnu.org @ 2021-07-20 22:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98865
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |pinskia at gcc dot gnu.org
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Actually I think we should do:
(simplify
 (mult:c truth_valued_p@0 @1)
 (bit_and (negate @0) @1))
Instead. What do you think?
This will catch things like:
unsigned long foo (long a, unsigned long b)
{
unsigned long t = a & 1;
return t * b;
}
---- CUT ---
We can even put a ! after the negate if we want the transform applied only when
the negation itself gets optimized out.
* [Bug middle-end/98865] Missed transform of (a >> 63) * b
From: cvs-commit at gcc dot gnu.org @ 2021-09-22 18:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98865
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:8f571e64713cc72561f84241863496e473eae4c6
commit r12-3824-g8f571e64713cc72561f84241863496e473eae4c6
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Wed Sep 22 19:17:49 2021 +0100
More NEGATE_EXPR folding in match.pd
As observed by Jakub in comment #2 of PR 98865, the expression -(a>>63)
is optimized in GENERIC but not in GIMPLE. Investigating further it
turns out that this is one of a few transformations performed by
fold_negate_expr in fold-const.c that aren't yet performed by match.pd.
This patch moves/duplicates them there, and should be relatively safe
as these transformations are already performed by the compiler, but
just in different passes.
This revised patch adds a Boolean simplify argument to tree-ssa-sccvn.c's
vn_nary_build_or_lookup_1 to control whether simplification should be
performed before value numbering, updating the callers, but then
avoiding simplification when constructing/value-numbering NEGATE_EXPR.
This avoids the regression of gcc.dg/tree-ssa/ssa-fre-88.c, and enables
the new test case(s) to pass.
2021-09-22 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
* match.pd (negation simplifications): Implement some negation
folding transformations from fold-const.c's fold_negate_expr.
* tree-ssa-sccvn.c (vn_nary_build_or_lookup_1): Add a SIMPLIFY
argument, to control whether the op should be simplified prior
to looking up/assigning a value number.
(vn_nary_build_or_lookup): Update call to
vn_nary_build_or_lookup_1.
(vn_nary_simplify): Likewise.
(visit_nary_op): Likewise, but when constructing a NEGATE_EXPR
now call vn_nary_build_or_lookup_1 disabling simplification.
gcc/testsuite/ChangeLog
* gcc.dg/fold-negate-1.c: New test case.
* [Bug middle-end/98865] Missed transform of (a >> 63) * b
From: rguenth at gcc dot gnu.org @ 2022-01-11 11:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98865
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|enhancement |normal
Last reconfirmed|2021-03-07 00:00:00 |2022-1-11
* [Bug middle-end/98865] Missed transform of (a >> 63) * b
From: cvs-commit at gcc dot gnu.org @ 2022-05-18 15:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98865
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:4a9be8d51182076222d707d9d68f6eda78e8ee2c
commit r13-624-g4a9be8d51182076222d707d9d68f6eda78e8ee2c
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Wed May 18 16:23:01 2022 +0100
Correct ix86_rtx_costs for multi-word multiplication.
This is the i386 backend specific piece of my revised patch for
PR middle-end/98865, where Richard Biener has suggested that I perform
the desired transformation during RTL expansion where the backend can
control whether it is profitable to convert a multiplication into a
bit-wise AND and a negation. This works well for x86_64, but alas
exposes a latent bug with -m32, where a DImode multiplication incorrectly
appears to be cheaper than negdi2+anddi3(!?). The fix to ix86_rtx_costs
is to report that a DImode (multi-word) multiplication actually requires
three SImode multiplications and two SImode additions. This also corrects
the cost of TImode multiplication on TARGET_64BIT.
2022-05-18 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.cc (ix86_rtx_costs) [MULT]: When mode size
is wider than word_mode, a multiplication costs three word_mode
multiplications and two word_mode additions.
* [Bug middle-end/98865] Missed transform of (a >> 63) * b
From: cvs-commit at gcc dot gnu.org @ 2022-05-19 16:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98865
--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:d863ba23fb16122bb0547b0c678173be0d98f43c
commit r13-673-gd863ba23fb16122bb0547b0c678173be0d98f43c
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Thu May 19 17:54:38 2022 +0100
PR middle-end/98865: Expand X*Y as X&-Y when Y is [0,1].
The patch is a revised solution for PR middle-end/98865 incorporating
the feedback/suggestions from Richard Biener's review here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593928.html
Most significantly, this patch now performs the transformation/optimization
during RTL expansion, where the target's rtx_costs can be used to determine
whether the original multiplication (that may potentially be implemented by
a shift or lea) is cheaper than a negation and a bit-wise and.
Previously the expression (x>>63)*y would be compiled with -O2 as
shrq $63, %rdi
movq %rdi, %rax
imulq %rsi, %rax
but with this patch now produces:
sarq $63, %rdi
movq %rdi, %rax
andq %rsi, %rax
Likewise the expression (x>>63)*135 [that appears in a hot-spot of the
Botan AES-128 benchmark] was previously:
shrq $63, %rdi
leaq (%rdi,%rdi,8), %rdx
movq %rdx, %rax
salq $4, %rax
subq %rdx, %rax
now becomes:
movq %rdi, %rax
sarq $63, %rax
andl $135, %eax
2022-05-19 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/98865
* expr.cc (expand_expr_real_2) [MULT_EXPR]: Expand X*Y as X&Y
when both X and Y are [0, 1], X*Y as X&-Y when Y is [0,1] and
likewise X*Y as -X&Y when X is [0,1] using tree_nonzero_bits.
gcc/testsuite/ChangeLog
PR middle-end/98865
* gcc.target/i386/pr98865.c: New test case.
* [Bug middle-end/98865] Missed transform of (a >> 63) * b
From: cvs-commit at gcc dot gnu.org @ 2022-05-27 8:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98865
--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:8fb94fc6097c0a934aac0d89c9c5e2038da67655
commit r13-793-g8fb94fc6097c0a934aac0d89c9c5e2038da67655
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Fri May 27 08:57:46 2022 +0100
Canonicalize X&-Y as X*Y in match.pd when Y is [0,1].
"For every pessimization, there's an equal and opposite optimization".
In the review of my original patch for PR middle-end/98865, Richard
Biener pointed out that match.pd shouldn't be transforming X*Y into
X&-Y as the former is considered cheaper by tree-ssa's cost model
(operator count). A corollary of this is that we should instead be
transforming X&-Y into the cheaper X*Y as a preferred canonical form
(especially as RTL expansion now intelligently selects the appropriate
implementation based on the target's costs).
With this patch we now generate identical code for:
int foo(int x, int y) { return -(x&1) & y; }
int bar(int x, int y) { return (x&1) * y; }
specifically on x86_64-pc-linux-gnu both use and/neg/and with -O2,
but both use and/mul with -Os.
One minor wrinkle/improvement is that this patch includes three
additional optimizations (that account for the change in canonical
form) to continue to optimize PR92834 and PR94786.
2022-05-27 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* match.pd (match_zero_one_valued_p): New predicate.
(mult @0 @1): Use zero_one_valued_p for optimization to the
expression "bit_and @0 @1".
(bit_and (negate zero_one_valued_p@0) @1): Optimize to MULT_EXPR.
(plus @0 (mult (minus @1 @0) zero_one_valued_p@2)): New transform.
(minus @0 (mult (minus @0 @1) zero_one_valued_p@2)): Likewise.
(bit_xor @0 (mult (bit_xor @0 @1) zero_one_valued_p@2)): Likewise.
Remove three redundant transforms obsoleted by the three above.
gcc/testsuite/ChangeLog
* gcc.dg/pr98865.c: New test case.
* [Bug middle-end/98865] Missed transform of (a >> 63) * b
From: roger at nextmovesoftware dot com @ 2022-05-28 9:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98865
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |roger at nextmovesoftware dot com
Status|NEW |RESOLVED
Resolution|--- |FIXED
Target Milestone|--- |13.0
--- Comment #9 from Roger Sayle <roger at nextmovesoftware dot com> ---
This is now fixed/implemented on mainline for GCC 13.