public inbox for gcc-bugs@sourceware.org
* [Bug c/107090] New: [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-09-29 17:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090
Bug ID: 107090
Summary: [aarch64] sequence logic should be combined with mul
and umulh
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: zhongyunde at huawei dot com
Target Milestone: ---
* test case: https://godbolt.org/z/x5jMhqW8s
```
# define BN_BITS4 32
# define BN_MASK2 (0xffffffffffffffffL)
# define BN_MASK2l (0xffffffffL)
# define BN_MASK2h (0xffffffff00000000L)
# define BN_MASK2h1 (0xffffffff80000000L)
# define LBITS(a) ((a)&BN_MASK2l)
# define HBITS(a) (((a)>>BN_BITS4)&BN_MASK2l)
# define L2HBITS(a) (((a)<<BN_BITS4)&BN_MASK2)
void mul64(unsigned long in0, unsigned long in1,
unsigned long &l, unsigned long &h) {
unsigned long m, m1, lt, ht, bl, bh;
lt = LBITS(in0);
ht = HBITS(in0);
bl = LBITS(in1);
bh = HBITS(in1);
m = bh * lt;
lt = bl * lt;
m1 = bl * ht;
ht = bh * ht;
m = (m + m1) & BN_MASK2;
if (m < m1) ht += L2HBITS((unsigned long)1);
ht += HBITS(m);
m1 = L2HBITS(m);
lt = (lt + m1) & BN_MASK2; if (lt < m1) ht++;
l = lt;
h = ht;
}
```
* The above source is equivalent to a full multiplication of two 64-bit integer values (a 64x64 -> 128-bit product),
so it should be folded to assembly similar to:
```
mul x8,x1,x0
umulh x9,x0,x1
str x8,[x2]
str x9,[x3]
ret
```
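For reference, the same 64x64 -> 128-bit product can be written directly with GCC's `unsigned __int128` extension. This is a minimal sketch, assuming an LP64 target such as aarch64 Linux where `unsigned long` is 64 bits; `mul64_wide` is a hypothetical name, not part of the report. Written this way, GCC is expected to emit the `mul`/`umulh` pair above, which is the result the hand-written `mul64` should ideally fold to.

```cpp
// Full 64x64 -> 128-bit unsigned multiply using GCC's unsigned __int128.
// Same interface as mul64 from the report: low half in l, high half in h.
void mul64_wide(unsigned long in0, unsigned long in1,
                unsigned long &l, unsigned long &h) {
  unsigned __int128 p = (unsigned __int128)in0 * in1;
  l = (unsigned long)p;          // low 64 bits  (aarch64: mul)
  h = (unsigned long)(p >> 64);  // high 64 bits (aarch64: umulh)
}
```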
* [Bug middle-end/107090] [aarch64] sequence logic should be combined with mul and umulh
From: pinskia at gcc dot gnu.org @ 2022-09-29 17:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: pinskia at gcc dot gnu.org @ 2022-09-29 21:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2022-09-29
Component|middle-end |tree-optimization
Ever confirmed|0 |1
CC| |pinskia at gcc dot gnu.org
Status|UNCONFIRMED |NEW
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
A few issues.
First is:
if (_26 != 0)
goto <bb 3>; [50.00%]
else
goto <bb 4>; [50.00%]
<bb 3> [local count: 536870913]:
ht_15 = ht_13 + 4294967296;
<bb 4> [local count: 1073741824]:
# ht_2 = PHI <ht_13(2), ht_15(3)>
This should be done as:
tmp_ = _26 != 0;
tmp1_ = (unsigned long) tmp_;
tmp2_ = tmp1_ << 32;
ht_2 = ht_13 + tmp2_;
And then a huge pattern has to be matched to turn this into a widening
multiply.
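A source-level sketch of the two shapes in comment #1, assuming `uint64_t` for `unsigned long` and a `bool` for the `_26 != 0` test; the function names are hypothetical. Both compute the same value for every input, but only the second is straight-line code with no branch and no PHI.

```cpp
#include <cstdint>

// Branchy form, mirroring the GIMPLE above:
//   if (_26 != 0) ht_15 = ht_13 + 4294967296;  ht_2 = PHI <ht_13, ht_15>
uint64_t add_carry_branchy(uint64_t ht, bool carry) {
  if (carry)
    ht += 4294967296ULL;  // 1 << 32
  return ht;
}

// Branchless form suggested above:
//   tmp1_ = (unsigned long) (_26 != 0); tmp2_ = tmp1_ << 32; ht_2 = ht_13 + tmp2_;
uint64_t add_carry_branchless(uint64_t ht, bool carry) {
  uint64_t tmp1 = static_cast<uint64_t>(carry);  // 0 or 1
  uint64_t tmp2 = tmp1 << 32;                    // 0 or 1 << 32
  return ht + tmp2;                              // wraps modulo 2^64, like the original
}
```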
* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-01 23:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090
--- Comment #2 from vfdff <zhongyunde at huawei dot com> ---
Thanks for your suggestion.
Since the combine pass can't handle sequences of more than 4 insns, which pass would be
more suitable for matching the huge pattern once the first issue is fixed?
* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: pinskia at gcc dot gnu.org @ 2022-10-01 23:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to vfdff from comment #2)
> Thanks for your suggestion.
>
> As the combine pass can't address more than 4 sequence insns, which pass may
> be more suitable to match the huge pattern after fixing the 1st issue.
match.pd can handle patterns spanning more than 4 GIMPLE statements.
* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-07 9:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090
--- Comment #4 from vfdff <zhongyunde at huawei dot com> ---
(In reply to Andrew Pinski from comment #1)
> A few issues.
> First is:
>
> if (_26 != 0)
> goto <bb 3>; [50.00%]
> else
> goto <bb 4>; [50.00%]
>
> <bb 3> [local count: 536870913]:
> ht_15 = ht_13 + 4294967296;
>
> <bb 4> [local count: 1073741824]:
> # ht_2 = PHI <ht_13(2), ht_15(3)>
>
For the first issue, I see that GCC handled this well with if-conversion in versions before GCC 8:
https://godbolt.org/z/99a9e59Ge
* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-10 4:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090
vfdff <zhongyunde at huawei dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |zhongyunde at huawei dot com
--- Comment #5 from vfdff <zhongyunde at huawei dot com> ---
Created attachment 53684
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53684&action=edit
Add A ? B + CST : B match and simplify optimizations
Fixes the first issue of the pattern match.
* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: pinskia at gcc dot gnu.org @ 2022-10-10 4:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Depends on| |103216
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to vfdff from comment #5)
> Created attachment 53684 [details]
> Add A ? B + CST : B match and simplify optimizations
>
> Fix the 1st issue of the pattern match
There is a generic way of implementing this which I had posted at
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584411.html (you can
finish up that patch if you want).
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103216
[Bug 103216] missed optimization, phiopt/vrp?
* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-10 9:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090
vfdff <zhongyunde at huawei dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #53684|0 |1
is obsolete| |
--- Comment #7 from vfdff <zhongyunde at huawei dot com> ---
Created attachment 53685
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53685&action=edit
[PHIOPT] Add A ? B + CST : B match and simplify optimizations
Thank you for your guidance. I referred to your linked patch to cover the various
kinds of operations.
Because I hit an error when building (op @1 (cond^ @0 @2 {build_zero_cst (type);})),
I have not incorporated these changes into your patch yet. With your consent,
I would like to merge these changes separately first, and then finish up your patch
after asking for further details.
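The rewrite under discussion, `A ? B op CST : B` to `B op (A ? CST : 0)`, which appears to be what `(op @1 (cond^ @0 @2 {build_zero_cst (type);}))` builds, is valid for any `op` whose right identity is zero. A minimal C++ check of that identity, with hypothetical helper names; it illustrates the algebra only and is not the match.pd patch itself. With `op` = plus and `c` = `1 << 32`, it reduces to the first issue from comment #1.

```cpp
#include <cstdint>

// Original shape: a ? b OP c : b
template <typename Op>
uint64_t select_form(bool a, uint64_t b, uint64_t c, Op op) {
  return a ? op(b, c) : b;
}

// Rewritten shape: b OP (a ? c : 0). Equal to select_form whenever
// b OP 0 == b, e.g. for +, -, |, ^, << and >> (with shift counts < 64).
template <typename Op>
uint64_t identity_form(bool a, uint64_t b, uint64_t c, Op op) {
  return op(b, a ? c : uint64_t{0});
}
```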
* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-12 13:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090
--- Comment #8 from vfdff <zhongyunde at huawei dot com> ---
Hi @Andrew Pinski,
For the second issue, I have also matched the huge pattern, but it needs to return two
values, which does not seem to work with the current framework. Should I split it
into two simplifications that match the high and low values (ResHi and ResLo)
separately?
```
(i64 ResLo, i64 ResHi) = Mul64(i64 In0, i64 In1) {
  In0Lo = In0(D) & 4294967295;
  In0Hi = In0(D) >> 32;
  In1Lo = In1(D) & 4294967295;
  In1Hi = In1(D) >> 32;
  Mull_01 = In0Hi * In1Lo;
  Addc = In0Lo * In1Hi + Mull_01;
  addc32 = Addc << 32;
  ResLo = In0Lo * In1Lo + addc32;
  ResHi = ((long unsigned int) (addc32 > ResLo)) + In0Hi * In1Hi +
          (((long unsigned int) (Mull_01 > Addc)) << 32) + (Addc >> 32);
}
```
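A minimal C++ sketch of the pseudocode above, split along the lines suggested (one function per result). `res_lo`, `res_hi`, `lo32` and `hi32` are hypothetical names; each function recomputes the shared partial products so that it returns a single value, and the two `>` comparisons are the carry tests from the pseudocode. Comparing against a plain 128-bit product makes the carry logic easy to check.

```cpp
#include <cstdint>

// Lower / upper 32 bits of a 64-bit value.
static inline uint64_t lo32(uint64_t x) { return x & 0xffffffffULL; }
static inline uint64_t hi32(uint64_t x) { return x >> 32; }

// Low 64 bits of in0 * in1 (ResLo).
uint64_t res_lo(uint64_t in0, uint64_t in1) {
  uint64_t mull_01 = hi32(in0) * lo32(in1);
  uint64_t addc = lo32(in0) * hi32(in1) + mull_01;  // cross terms, may wrap
  uint64_t addc32 = addc << 32;
  return lo32(in0) * lo32(in1) + addc32;
}

// High 64 bits of in0 * in1 (ResHi).
uint64_t res_hi(uint64_t in0, uint64_t in1) {
  uint64_t mull_01 = hi32(in0) * lo32(in1);
  uint64_t addc = lo32(in0) * hi32(in1) + mull_01;
  uint64_t addc32 = addc << 32;
  uint64_t lo = lo32(in0) * lo32(in1) + addc32;
  return static_cast<uint64_t>(addc32 > lo)              // carry out of the low word
       + hi32(in0) * hi32(in1)
       + (static_cast<uint64_t>(mull_01 > addc) << 32)   // carry out of the cross terms
       + (addc >> 32);
}
```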
* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: pinskia at gcc dot gnu.org @ 2022-10-12 21:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090
--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Look at how ctz_table_index is defined and used.
The matching is written in the match.pd language, and the result is then used inside
simplify_count_trailing_zeroes (tree-ssa-forwprop.cc).
nop_atomic_bit_test_and_p is another example, but it is more complex and is
used inside tree-ssa-ccp.cc.
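As we understand it, ctz_table_index is meant to recognize table-based count-trailing-zeros code such as the classic de Bruijn lookup below. The constant 0x077CB531 and the table come from the well-known "Bit Twiddling Hacks" version, not from this thread, and `ctz_table` is a hypothetical name. The point relevant here is the division of labor: match.pd describes the expression shape, and C++ code in tree-ssa-forwprop.cc checks the table and performs the replacement.

```cpp
#include <cstdint>

// De Bruijn table lookup for count-trailing-zeros. x & -x isolates the
// lowest set bit; the multiply moves a unique 5-bit pattern into the top
// bits; the table maps that pattern back to the bit index. Requires x != 0.
static const int kDeBruijnCtz32[32] = {
    0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
    31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9};

int ctz_table(uint32_t x) {
  return kDeBruijnCtz32[((x & -x) * 0x077CB531u) >> 27];
}
```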
* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-13 2:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090
--- Comment #10 from vfdff <zhongyunde at huawei dot com> ---
Created attachment 53698
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53698&action=edit
the huge bb changes only slightly after matching ResLo
Thanks for your suggestion. I think both ctz_table_index and
nop_atomic_bit_test_and_p return only one value, so I'll try to match ResHi and
ResLo separately, since the bb changes only slightly after ResLo has been matched first.
* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-29 1:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090
--- Comment #11 from vfdff <zhongyunde at huawei dot com> ---
Created attachment 53787
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53787&action=edit
has different operand order based on different commit node
Hi @Andrew Pinski,
* As shown in the attached figure swap_order.jpg, we can add the :c flag to
the plus node m_13 so that it also matches the commuted operand order, according to
https://gcc.gnu.org/onlinedocs/gccint/The-Language.html.
Is there a similar flag for the plus node _24 that would simplify the matching?
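One reason matching either operand order is safe for the carry tests in this pattern (such as `m < m1` in the original source or `Mull_01 > Addc` in comment #8): the wrapped sum of two 64-bit values is below one addend exactly when it is below the other, so the comparison yields the real carry whichever addend it uses and whichever order the plus has. A small C++ check of that property, with hypothetical function names; it does not say which match.pd flag, if any, covers node _24.

```cpp
#include <cstdint>

// Carry out of a 64-bit addition, tested against the first addend.
bool carry_vs_first(uint64_t a, uint64_t b) {
  uint64_t sum = a + b;  // wraps modulo 2^64
  return sum < a;
}

// Same addition with commuted operands, tested against the other addend.
bool carry_vs_second(uint64_t a, uint64_t b) {
  uint64_t sum = b + a;
  return sum < b;
}
```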