public inbox for gcc-bugs@sourceware.org
* [Bug c/107090] New: [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-09-29 17:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

            Bug ID: 107090
           Summary: [aarch64] sequence logic should be combined with mul
                    and umulh
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zhongyunde at huawei dot com
  Target Milestone: ---

* test case: https://godbolt.org/z/x5jMhqW8s
```
#  define BN_BITS4        32
#  define BN_MASK2        (0xffffffffffffffffL)
#  define BN_MASK2l       (0xffffffffL)
#  define BN_MASK2h       (0xffffffff00000000L)
#  define BN_MASK2h1      (0xffffffff80000000L)
#  define LBITS(a)        ((a)&BN_MASK2l)
#  define HBITS(a)        (((a)>>BN_BITS4)&BN_MASK2l)
#  define L2HBITS(a)      (((a)<<BN_BITS4)&BN_MASK2)

void mul64(unsigned long in0, unsigned long in1,
           unsigned long &l, unsigned long &h) {
    unsigned long m, m1, lt, ht, bl, bh;
    lt = LBITS(in0);
    ht = HBITS(in0);
    bl = LBITS(in1);
    bh = HBITS(in1);
    m  = bh * lt;
    lt = bl * lt;
    m1 = bl * ht;
    ht = bh * ht;
    m  = (m + m1) & BN_MASK2;
    if (m < m1) ht += L2HBITS((unsigned long)1);
    ht += HBITS(m);
    m1 = L2HBITS(m);
    lt = (lt + m1) & BN_MASK2; if (lt < m1) ht++;
    l  = lt;
    h  = ht;
}
```
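The test case is the schoolbook method applied to 32-bit halves. Writing in0 = a and in1 = b, the identity it implements is:

```latex
\begin{aligned}
a &= a_h \cdot 2^{32} + a_l, \qquad b = b_h \cdot 2^{32} + b_l, \qquad 0 \le a_h, a_l, b_h, b_l < 2^{32},\\
a \cdot b &= a_h b_h \cdot 2^{64} + (a_l b_h + a_h b_l) \cdot 2^{32} + a_l b_l .
\end{aligned}
```

In the code, lt = a_l b_l, m = a_l b_h and m1 = a_h b_l are the partial products and ht = a_h b_h. The middle sum m + m1 can exceed 2^64; its carry is worth 2^96, i.e. 2^32 in the high word, which is the `ht += L2HBITS(1)` step. Adding L2HBITS(m) into lt can carry once more, which is the final `ht++`.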
* The above source is equivalent to a full 64x64 -> 128-bit multiplication of
two 64-bit integer values (low half stored to l, high half to h), so it should
be folded to assembly similar to:
```
   mul   x8,x1,x0
   umulh x9,x0,x1
   str   x8,[x2]
   str   x9,[x3]
   ret
```
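For comparison, here is a minimal sketch of the same operation written directly with GCC's `unsigned __int128` extension (the name `mul64_ref` is ours, not from the report). On aarch64, GCC typically compiles this form to a `mul`/`umulh` pair like the listing above, which is the code the open-coded version is expected to match; we have not checked this against every GCC version.

```cpp
#include <cstdint>

// Hypothetical reference version of mul64: a 64x64 -> 128-bit multiply
// using GCC's unsigned __int128 extension.
void mul64_ref(uint64_t in0, uint64_t in1, uint64_t &l, uint64_t &h) {
    unsigned __int128 p = (unsigned __int128)in0 * in1;
    l = (uint64_t)p;          // low 64 bits of the product (mul)
    h = (uint64_t)(p >> 64);  // high 64 bits of the product (umulh)
}
```

Because it is short and obviously correct, a function like this can also serve as an oracle when differential-testing the open-coded mul64.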


* [Bug middle-end/107090] [aarch64] sequence logic should be combined with mul and umulh
From: pinskia at gcc dot gnu.org @ 2022-09-29 17:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement


* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: pinskia at gcc dot gnu.org @ 2022-09-29 21:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-09-29
          Component|middle-end                  |tree-optimization
     Ever confirmed|0                           |1
                 CC|                            |pinskia at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
A few issues.
First is:

  if (_26 != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:
  ht_15 = ht_13 + 4294967296;

  <bb 4> [local count: 1073741824]:
  # ht_2 = PHI <ht_13(2), ht_15(3)>

This should be done as:
  tmp_ = _26 != 0;
  tmp1_ = (unsigned long) tmp_;
  tmp2_ = tmp1_ << 32;
  ht_2 = ht_13 + tmp2_;

And then there is a huge amount of pattern matching needed to recognize the
widening multiply here.
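In C++ terms, the rewrite suggested in comment #1 replaces the conditional add with an unconditional add of a shifted 0/1 value. A minimal sketch with hypothetical function names, isolating the `if (m < m1) ht += L2HBITS(1)` step of the test case:

```cpp
#include <cstdint>

// Branchy form, as in the GIMPLE dump: add 2^32 to ht only when the
// 64-bit middle sum m + m1 wrapped around (detected as m < m1 after the add).
uint64_t add_carry_branchy(uint64_t ht, uint64_t m, uint64_t m1) {
    m = m + m1;
    if (m < m1)
        ht += 1ULL << 32;
    return ht;
}

// Branch-free form: turn the comparison into 0 or 1, widen it,
// shift it into place and add it unconditionally.
uint64_t add_carry_branchless(uint64_t ht, uint64_t m, uint64_t m1) {
    m = m + m1;
    uint64_t carry = (uint64_t)(m < m1);  // tmp1_ = (unsigned long) tmp_
    return ht + (carry << 32);            // ht_2 = ht_13 + tmp2_
}
```

Since the comparison yields 0 or 1, the shifted value is either 0 or 4294967296, so both forms compute the same result; the branch-free one is straight-line code, which is what later pattern matching needs to see.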


* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-01 23:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

--- Comment #2 from vfdff <zhongyunde at huawei dot com> ---
Thanks for your suggestion.

Since the combine pass cannot handle sequences of more than 4 insns, which pass
would be more suitable for matching the huge pattern once the first issue is
fixed?


* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: pinskia at gcc dot gnu.org @ 2022-10-01 23:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to vfdff from comment #2)
> Thanks for your suggestion.
> 
> As the combine pass can't address more than 4 sequence insns, which pass may
> be more suitable to match the huge pattern after fixing the 1st issue.

match.pd can match patterns that span more than 4 GIMPLE statements.


* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-07  9:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

--- Comment #4 from vfdff <zhongyunde at huawei dot com> ---
(In reply to Andrew Pinski from comment #1)
> A few issues.
> First is:
> 
>   if (_26 != 0)
>     goto <bb 3>; [50.00%]
>   else
>     goto <bb 4>; [50.00%]
> 
>   <bb 3> [local count: 536870913]:
>   ht_15 = ht_13 + 4294967296;
> 
>   <bb 4> [local count: 1073741824]:
>   # ht_2 = PHI <ht_13(2), ht_15(3)>
> 

For the first issue, GCC handled this well before GCC 8, thanks to
if-conversion: https://godbolt.org/z/99a9e59Ge


* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-10  4:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

vfdff <zhongyunde at huawei dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zhongyunde at huawei dot com

--- Comment #5 from vfdff <zhongyunde at huawei dot com> ---
Created attachment 53684
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53684&action=edit
Add A ? B + CST : B match and simplify optimizations

This fixes the first issue in the pattern matching.


* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: pinskia at gcc dot gnu.org @ 2022-10-10  4:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |103216

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to vfdff from comment #5)
> Created attachment 53684 [details]
> Add A ? B + CST : B match and simplify optimizations
> 
> Fix the 1st issue of the pattern match

There is a generic way of implementing this which I had posted at
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584411.html (you can
finish up that patch if you want).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103216
[Bug 103216] missed optimization, phiopt/vrp?


* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-10  9:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

vfdff <zhongyunde at huawei dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #53684|0                           |1
        is obsolete|                            |

--- Comment #7 from vfdff <zhongyunde at huawei dot com> ---
Created attachment 53685
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53685&action=edit
[PHIOPT] Add A ? B + CST : B match and simplify optimizations

Thank you for your guidance. I used your linked patch as a reference to cover
the various kinds of operations.
Because of an error when building (op @1 (cond^ @0 @2 {build_zero_cst (type);})),
I have not merged these changes into your patch yet. With your consent, I would
like to commit these changes separately first, and then finish up your patch
once I have more information.
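The generic form `(op @1 (cond^ @0 @2 {build_zero_cst (type);}))` rests on the identity `c ? (b op x) : b == b op (c ? x : 0)`, which holds whenever 0 is a right identity of `op`. A small C++ sketch (helper names are hypothetical, not taken from either patch) that checks the identity for several operators:

```cpp
#include <cstdint>

// "c ? (b op x) : b": op is applied only when c is true.
template <typename Op>
uint64_t select_form(bool c, uint64_t b, uint64_t x, Op op) {
    return c ? op(b, x) : b;
}

// "b op (c ? x : 0)": op is always applied; the select picks x or 0.
// Equivalent to select_form only when op(b, 0) == b
// (true for +, -, |, ^, <<, >>; false for * and &).
template <typename Op>
uint64_t folded_form(bool c, uint64_t b, uint64_t x, Op op) {
    return op(b, c ? x : 0);
}

// Returns true if the two forms agree for both values of c.
template <typename Op>
bool forms_agree(uint64_t b, uint64_t x, Op op) {
    return select_form(true, b, x, op) == folded_form(true, b, x, op) &&
           select_form(false, b, x, op) == folded_form(false, b, x, op);
}
```

In comment #1's case, op is + and x is 4294967296, so the remaining select `c ? 4294967296 : 0` can in turn become `(unsigned long) c << 32`.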


* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-12 13:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

--- Comment #8 from vfdff <zhongyunde at huawei dot com> ---
Hi @Andrew Pinski,
  For the second issue, I have also matched the huge pattern, but it needs to
return two values, which does not seem to work with the current framework.
Should I split it into two simpler patterns that match the high value (ResHi)
and the low value (ResLo) separately?
```
 (i64 ResLo, i64 ResHi) = Mul64(i64 In0, i64 In1) {
    In0Lo = In0(D) & 4294967295;
    In0Hi = In0(D) >> 32;
    In1Lo = In1(D) & 4294967295;
    In1Hi = In1(D) >> 32;
    Mull_01 = In0Hi * In1Lo;
    Addc = In0Lo * In1Hi + Mull_01;
    addc32 = Addc << 32;
    ResLo = In0Lo * In1Lo + addc32;
    ResHi = ((long unsigned int) (addc32 > ResLo)) + In0Hi * In1Hi +
             (((long unsigned int) (Mull_01 > Addc)) << 32) + (Addc >> 32);
 }
```
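A C++ transcription of the pseudocode above makes the two outputs explicit; the names `Mul64Result` and `mul64_parts` are hypothetical. It also suggests why matching ResLo first is cheap: modulo 2^64, ResLo is simply `in0 * in1`, so only ResHi needs the large pattern.

```cpp
#include <cstdint>

// Stands in for the "(i64 ResLo, i64 ResHi)" pair returned by the pseudocode.
struct Mul64Result {
    uint64_t lo;  // ResLo
    uint64_t hi;  // ResHi
};

Mul64Result mul64_parts(uint64_t in0, uint64_t in1) {
    uint64_t lo0 = in0 & 0xffffffffULL, hi0 = in0 >> 32;
    uint64_t lo1 = in1 & 0xffffffffULL, hi1 = in1 >> 32;
    uint64_t mull01 = hi0 * lo1;         // one cross product
    uint64_t addc = lo0 * hi1 + mull01;  // middle sum, may wrap
    uint64_t addc32 = addc << 32;        // low 32 bits of the middle sum, moved up
    Mul64Result r;
    r.lo = lo0 * lo1 + addc32;
    r.hi = (uint64_t)(addc32 > r.lo)               // carry out of the low word
         + hi0 * hi1
         + ((uint64_t)(mull01 > addc) << 32)       // carry out of the middle sum
         + (addc >> 32);                           // high 32 bits of the middle sum
    return r;
}
```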


* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: pinskia at gcc dot gnu.org @ 2022-10-12 21:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Look at how ctz_table_index is defined and used.
The matching is done in the match.pd language, and the result is then used
inside simplify_count_trailing_zeroes (tree-ssa-forwprop.cc).

nop_atomic_bit_test_and_p is another example, but it is more complex and is
used inside tree-ssa-ccp.cc.
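For context, as we understand it, ctz_table_index matches the index expression of a table-based count-trailing-zeros, `table[((x & -x) * C) >> SHIFT]`, and simplify_count_trailing_zeroes then checks the table contents before replacing the load with a ctz. A minimal C++ example of that source idiom, using the well-known 32-bit de Bruijn constant purely as an illustration (the names are ours):

```cpp
#include <cstdint>

// Classic 32-bit de Bruijn table for the multiplier 0x077CB531.
static const int kDeBruijnCtz[32] = {
    0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
    31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9};

// Count trailing zeros of a nonzero x: isolate the lowest set bit
// (x & -x), multiply by the de Bruijn constant so that the top 5 bits
// are unique for each bit position, and use them as the table index.
int ctz_debruijn(uint32_t x) {
    return kDeBruijnCtz[(uint32_t)((x & -x) * 0x077CB531u) >> 27];
}
```

Note that this pattern yields a single value (the index), which is the shape match.pd predicates like this one are built around.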


* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-13  2:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

--- Comment #10 from vfdff <zhongyunde at huawei dot com> ---
Created attachment 53698
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53698&action=edit
the huge bb changes slightly after matching ResLo

Thanks for your suggestion. As far as I can tell, both ctz_table_index and
nop_atomic_bit_test_and_p return only one value, so I'll try to match ResHi and
ResLo separately, since the bb changes only slightly once ResLo has been matched.


* [Bug tree-optimization/107090] [aarch64] sequence logic should be combined with mul and umulh
From: zhongyunde at huawei dot com @ 2022-10-29  1:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090

--- Comment #11 from vfdff <zhongyunde at huawei dot com> ---
Created attachment 53787
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53787&action=edit
has a different operand order based on a different commit node

Hi @Andrew Pinski,

* As shown in the attached figure swap_order.jpg, we can add the :c flag to
the plus node m_13 so that it also matches the commuted operand order, according
to https://gcc.gnu.org/onlinedocs/gccint/The-Language.html.

Is there a similar flag for the plus node _24 that would simplify the matching?
