public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
@ 2021-06-17 12:07 ` rguenth at gcc dot gnu.org
  2021-06-18  1:22 ` crazylht at gmail dot com
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-17 12:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-06-17

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The complication is that the define_insn looks like

(define_insn "*fma_fmaddsub_<mode>"
  [(set (match_operand:VF_128_256 0 "register_operand" "=v,v,v,x,x")
        (unspec:VF_128_256
          [(match_operand:VF_128_256 1 "nonimmediate_operand" "%0,0,v,x,x")
           (match_operand:VF_128_256 2 "nonimmediate_operand" "vm,v,vm,x,m")
           (match_operand:VF_128_256 3 "nonimmediate_operand" "v,vm,0,xm,x")]
          UNSPEC_FMADDSUB))]
  "TARGET_FMA || TARGET_FMA4"
  "@
   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
   vfmaddsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
   vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
   vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"

so it can handle the {132,213,231} variants without much duplication from a
single pattern.  But it seems this cannot prevail when we open-code the
operation.  Likewise handling multiple modes with the VF_128_256 iterator
becomes difficult since the addsub requires either a vec_merge (as in the
addsub patterns) with a constant selector specific to the mode (unless we
can provide a large one that is implicitely truncated) or a vector constant
when using the (vec_select (vec_concat ...)) form.  So the above would
split into 12 somewhat repetitive patterns.

Doable.

Repeat for the AVX512 ones.  Tedious ;)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
  2021-06-17 12:07 ` [Bug target/87555] There is no need for UNSPEC_FMADDSUB rguenth at gcc dot gnu.org
@ 2021-06-18  1:22 ` crazylht at gmail dot com
  2021-06-18  1:57 ` crazylht at gmail dot com
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: crazylht at gmail dot com @ 2021-06-18  1:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
I'll take a look.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
  2021-06-17 12:07 ` [Bug target/87555] There is no need for UNSPEC_FMADDSUB rguenth at gcc dot gnu.org
  2021-06-18  1:22 ` crazylht at gmail dot com
@ 2021-06-18  1:57 ` crazylht at gmail dot com
  2021-06-18  5:40 ` crazylht at gmail dot com
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: crazylht at gmail dot com @ 2021-06-18  1:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> The complication is that the define_insn looks like
> 
> (define_insn "*fma_fmaddsub_<mode>"
>   [(set (match_operand:VF_128_256 0 "register_operand" "=v,v,v,x,x")
>         (unspec:VF_128_256
>           [(match_operand:VF_128_256 1 "nonimmediate_operand" "%0,0,v,x,x")
>            (match_operand:VF_128_256 2 "nonimmediate_operand" "vm,v,vm,x,m")
>            (match_operand:VF_128_256 3 "nonimmediate_operand" "v,vm,0,xm,x")]
>           UNSPEC_FMADDSUB))]
>   "TARGET_FMA || TARGET_FMA4"
>   "@
>    vfmaddsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
>    vfmaddsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
>    vfmaddsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
>    vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
>    vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> 
> so it can handle the {132,213,231} variants without much duplication from a
> single pattern.  But it seems this cannot prevail when we open-code the
> operation.  Likewise handling multiple modes with the VF_128_256 iterator
> becomes difficult since the addsub requires either a vec_merge (as in the
> addsub patterns) with a constant selector specific to the mode (unless we
> can provide a large one that is implicitely truncated) or a vector constant
> when using the (vec_select (vec_concat ...)) form.  So the above would
> split into 12 somewhat repetitive patterns.

we already has *fma_fmadd_<mode> with one pattern

(define_insn "*fma_fmadd_<mode>"
  [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x")
        (fma:FMAMODE
          (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x")
          (match_operand:FMAMODE 2 "nonimmediate_operand" "vm,v,vm,x,m")
          (match_operand:FMAMODE 3 "nonimmediate_operand" "v,vm,0,xm,x")))]
  "TARGET_FMA || TARGET_FMA4"
  "@
   vfmadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
   vfmadd213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
   vfmadd231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
   vfmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
   vfmadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
  [(set_attr "isa" "fma,fma,fma,fma4,fma4")
   (set_attr "type" "ssemuladd")
   (set_attr "mode" "<MODE>")])

So can't it be changed to 

(define_mode_attr addsub_cst [(V4DF "5") (V2DF "1")
                              (V4SF "5") (V8SF "85")])
(define_insn "*fma_fmaddsub_<mode>"
  [(set (match_operand:VF_128_256 0 "register_operand" "=v,v,v,x,x")
        (vec_merge:VF_128_256
          (fma:VF_128_256
            (match_operand:VF_128_256 1 "nonimmediate_operand" "%0,0,v,x,x")
            (match_operand:VF_128_256 2 "nonimmediate_operand" "vm,v,vm,x,m")
            (match_operand:VF_128_256 3 "nonimmediate_operand" "v,vm,0,xm,x"))
          (fma:VF_128_256
            (match_dup 1)
            (match_dup 2)
            (neg:VF_128_256 (match_dup 3))
          (const_int <addsub_cst>)))]
  "TARGET_FMA || TARGET_FMA4"
  "@
   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
   vfmaddsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
   vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
   vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
  [(set_attr "isa" "fma,fma,fma,fma4,fma4")
   (set_attr "type" "ssemuladd")
   (set_attr "mode" "<MODE>")])

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2021-06-18  1:57 ` crazylht at gmail dot com
@ 2021-06-18  5:40 ` crazylht at gmail dot com
  2021-06-18  6:00 ` crazylht at gmail dot com
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: crazylht at gmail dot com @ 2021-06-18  5:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---

> (define_mode_attr addsub_cst [(V4DF "5") (V2DF "1")
>                               (V4SF "5") (V8SF "85")])
> (define_insn "*fma_fmaddsub_<mode>"
>   [(set (match_operand:VF_128_256 0 "register_operand" "=v,v,v,x,x")
>   	(vec_merge:VF_128_256
> 	  (fma:VF_128_256
> 	    (match_operand:VF_128_256 1 "nonimmediate_operand" "%0,0,v,x,x")
> 	    (match_operand:VF_128_256 2 "nonimmediate_operand" "vm,v,vm,x,m")
> 	    (match_operand:VF_128_256 3 "nonimmediate_operand" "v,vm,0,xm,x"))
> 	  (fma:VF_128_256
> 	    (match_dup 1)
> 	    (match_dup 2)
> 	    (neg:VF_128_256 (match_dup 3))
> 	  (const_int <addsub_cst>)))]


Should be bellow, addsub means add in the highpart

(define_insn "*fma_fmaddsub_<mode>"
  [(set (match_operand:VF_128_256 0 "register_operand" "=v,v,v,x,x")
        (vec_merge:VF_128_256
          (fma:VF_128_256
            (match_operand:VF_128_256 1 "nonimmediate_operand" "%0,0,v,x,x")
            (match_operand:VF_128_256 2 "nonimmediate_operand" "vm,v,vm,x,m")
            (neg:VF_128_256 (match_operand:VF_128_256 3 "nonimmediate_operand"
"v,vm,0,xm,x"))
          (fma:VF_128_256
            (match_dup 1)
            (match_dup 2)
            (match_dup 3))
          (const_int <addsub_cst>)))]
  "TARGET_FMA || TARGET_FMA4"

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2021-06-18  5:40 ` crazylht at gmail dot com
@ 2021-06-18  6:00 ` crazylht at gmail dot com
  2021-06-18  6:02 ` crazylht at gmail dot com
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: crazylht at gmail dot com @ 2021-06-18  6:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
With open-code

successfully optimize

__m128d f1(__m128d x, __m128d y, __m128d z){
    __m128d tem = _mm_mul_pd (x,y);
    __m128d tem2 = tem + z;
    __m128d tem3 = tem - z;
    return __builtin_shuffle (tem2, tem3, (__m128i) {0, 3});
}

to

f1:
.LFB5481:
        .cfi_startproc
        vfmsubadd132pd  %xmm1, %xmm2, %xmm0
        ret
        .cfi_endproc


But failed to optimize

__m256d f2(__m256d x, __m256d y, __m256d z){
    __m256d tem = _mm256_mul_pd (x,y);
    __m256d tem2 = tem + z;
    __m256d tem3 = tem - z;
    return __builtin_shuffle (tem2, tem3, (__m256i) {0, 5, 2, 7});
}

since simplify_rtx didn't realize

Failed to match this instruction:
(set (reg:V4SF 88)
    (vec_merge:V4SF (fma:V4SF (reg/v:V4SF 85 [ x ])
            (reg/v:V4SF 86 [ y ])
            (neg:V4SF (reg/v:V4SF 87 [ z ])))
        (fma:V4SF (reg/v:V4SF 85 [ x ])
            (reg/v:V4SF 86 [ y ])
            (reg/v:V4SF 87 [ z ]))
        (const_int 10 [0xa])))

is equal to

(set (reg:V4SF 88)
    (vec_merge:V4SF 
        (fma:V4SF (reg/v:V4SF 85 [ x ])
            (reg/v:V4SF 86 [ y ])
            (reg/v:V4SF 87 [ z ]))
        (fma:V4SF (reg/v:V4SF 85 [ x ])
            (reg/v:V4SF 86 [ y ])
            (neg:V4SF (reg/v:V4SF 87 [ z ])))
        (const_int 5 [0x5])))

later is how our pattern is defined.

So it there any canonical rtx for vec_merge? 
(vec_merge (A B const_int 10) should abviously equal to (vec_merge B A
const_int 5)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2021-06-18  6:00 ` crazylht at gmail dot com
@ 2021-06-18  6:02 ` crazylht at gmail dot com
  2021-06-18  6:11 ` crazylht at gmail dot com
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: crazylht at gmail dot com @ 2021-06-18  6:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---

> So it there any canonical rtx for vec_merge? 
> (vec_merge (A B const_int 10) should abviously equal to (vec_merge B A
> const_int 5)

A and B here are 4-element vector.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2021-06-18  6:02 ` crazylht at gmail dot com
@ 2021-06-18  6:11 ` crazylht at gmail dot com
  2021-06-18  6:34 ` crazylht at gmail dot com
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: crazylht at gmail dot com @ 2021-06-18  6:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #6)
> > So it there any canonical rtx for vec_merge? 
> > (vec_merge (A B const_int 10) should abviously equal to (vec_merge B A
> > const_int 5)
> 
> A and B here are 4-element vector.

Similar for 8-element vector C, D.
(vec_merge (C D const_int 170 [0xaa])) equal to (vec_merge (D C const_int 85
[0x55]))

16-element vector [0xaaaa] and [0x5555]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2021-06-18  6:11 ` crazylht at gmail dot com
@ 2021-06-18  6:34 ` crazylht at gmail dot com
  2021-06-18  7:00 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: crazylht at gmail dot com @ 2021-06-18  6:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #7)
> (In reply to Hongtao.liu from comment #6)
> > > So it there any canonical rtx for vec_merge? 
> > > (vec_merge (A B const_int 10) should abviously equal to (vec_merge B A
> > > const_int 5)
> > 
> > A and B here are 4-element vector.
> 
> Similar for 8-element vector C, D.
> (vec_merge (C D const_int 170 [0xaa])) equal to (vec_merge (D C const_int 85
> [0x55]))
> 
> 16-element vector [0xaaaa] and [0x5555]

And also 

(vec_merge (C D const_int 170 [0xaa])) equal to (vec_merge (D C const_int 85
[0x55]))
is equal to
 (vec_select (vec_concat D C) (0,9,2,11,4,13,6,15))

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2021-06-18  6:34 ` crazylht at gmail dot com
@ 2021-06-18  7:00 ` rguenth at gcc dot gnu.org
  2021-06-18  7:09 ` crazylht at gmail dot com
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-18  7:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
I don't think there's a documented canonical form but if I were to write one
I'd say the smaller (as in integer) merge mask should win?  There might also be
the argument that fma vs fms (or add vs sub in the addsub case) can be the
tie-breaker.

Note I'm not sure that doing fmaddsub as merge of fma and fms will be
optimal since that most definitely will preclude combine from recognizing
fmaddsub from (addsub (mul ..) x) which would be another goal to support
(PR81904)

I'm looking at merging the addsub patterns with the (const_int <mode-attr>)
trick, but I'm struggling with alternatives - time to learn ;)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2021-06-18  7:00 ` rguenth at gcc dot gnu.org
@ 2021-06-18  7:09 ` crazylht at gmail dot com
  2021-06-18  7:09 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: crazylht at gmail dot com @ 2021-06-18  7:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---

> Note I'm not sure that doing fmaddsub as merge of fma and fms will be
> optimal since that most definitely will preclude combine from recognizing
> fmaddsub from (addsub (mul ..) x) which would be another goal to support
> (PR81904)

I guess you're talking about 

#include <x86intrin.h>
__m128d f(__m128d x, __m128d y, __m128d z){
  return _mm_addsub_pd(_mm_mul_pd(x,y),z);
}

which pass_combine tries

Failed to match this instruction:
(set (reg:V2DF 88)
    (vec_merge:V2DF (minus:V2DF (mult:V2DF (reg:V2DF 90)
                (reg:V2DF 91))
            (reg:V2DF 92))
        (plus:V2DF (mult:V2DF (reg:V2DF 90)
                (reg:V2DF 91))
            (reg:V2DF 92))
        (const_int 1 [0x1])))

but doesn't realize fisrt merge operand is fms and second is fma.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2021-06-18  7:09 ` crazylht at gmail dot com
@ 2021-06-18  7:09 ` rguenth at gcc dot gnu.org
  2021-06-18  7:11 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-18  7:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #9)
> I don't think there's a documented canonical form but if I were to write one
> I'd say the smaller (as in integer) merge mask should win?

Alternatively the one with bit 0 set (first lane selected from first operand).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2021-06-18  7:09 ` rguenth at gcc dot gnu.org
@ 2021-06-18  7:11 ` rguenth at gcc dot gnu.org
  2021-06-18  7:16 ` crazylht at gmail dot com
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-18  7:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #10)
> > Note I'm not sure that doing fmaddsub as merge of fma and fms will be
> > optimal since that most definitely will preclude combine from recognizing
> > fmaddsub from (addsub (mul ..) x) which would be another goal to support
> > (PR81904)
> 
> I guess you're talking about 
> 
> #include <x86intrin.h>
> __m128d f(__m128d x, __m128d y, __m128d z){
>   return _mm_addsub_pd(_mm_mul_pd(x,y),z);
> }
> 
> which pass_combine tries
>  
> Failed to match this instruction:
> (set (reg:V2DF 88)
>     (vec_merge:V2DF (minus:V2DF (mult:V2DF (reg:V2DF 90)
>                 (reg:V2DF 91))
>             (reg:V2DF 92))
>         (plus:V2DF (mult:V2DF (reg:V2DF 90)
>                 (reg:V2DF 91))
>             (reg:V2DF 92))
>         (const_int 1 [0x1])))
> 
> but doesn't realize fisrt merge operand is fms and second is fma.

Yes.  This situation will happen when I push the SLP pattern detection
for addsub - we then no longer detect FMA on the GIMPLE level (we might
want to improve that as well, of course, exposing standard pattern names
for fmaddsub and fmsubadd).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2021-06-18  7:11 ` rguenth at gcc dot gnu.org
@ 2021-06-18  7:16 ` crazylht at gmail dot com
  2021-06-18  7:32 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: crazylht at gmail dot com @ 2021-06-18  7:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #13 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #12)
> (In reply to Hongtao.liu from comment #10)
> > > Note I'm not sure that doing fmaddsub as merge of fma and fms will be
> > > optimal since that most definitely will preclude combine from recognizing
> > > fmaddsub from (addsub (mul ..) x) which would be another goal to support
> > > (PR81904)
> > 
> > I guess you're talking about 
> > 
> > #include <x86intrin.h>
> > __m128d f(__m128d x, __m128d y, __m128d z){
> >   return _mm_addsub_pd(_mm_mul_pd(x,y),z);
> > }
> > 
> > which pass_combine tries
> >  
> > Failed to match this instruction:
> > (set (reg:V2DF 88)
> >     (vec_merge:V2DF (minus:V2DF (mult:V2DF (reg:V2DF 90)
> >                 (reg:V2DF 91))
> >             (reg:V2DF 92))
> >         (plus:V2DF (mult:V2DF (reg:V2DF 90)
> >                 (reg:V2DF 91))
> >             (reg:V2DF 92))
> >         (const_int 1 [0x1])))
> > 
> > but doesn't realize fisrt merge operand is fms and second is fma.
> 
> Yes.  This situation will happen when I push the SLP pattern detection
> for addsub - we then no longer detect FMA on the GIMPLE level (we might
> want to improve that as well, of course, exposing standard pattern names
> for fmaddsub and fmsubadd).

if fm{a,s}_optab is supported in the backend, can we always simplify (minus A
(mult B C)) to (fma B C (neg A)) and (plus A (mult B C)) to (fma B C A)?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2021-06-18  7:16 ` crazylht at gmail dot com
@ 2021-06-18  7:32 ` rguenth at gcc dot gnu.org
  2021-06-18  7:48 ` crazylht at gmail dot com
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 17+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-18  7:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #13)
> (In reply to Richard Biener from comment #12)
> > (In reply to Hongtao.liu from comment #10)
> > > > Note I'm not sure that doing fmaddsub as merge of fma and fms will be
> > > > optimal since that most definitely will preclude combine from recognizing
> > > > fmaddsub from (addsub (mul ..) x) which would be another goal to support
> > > > (PR81904)
> > > 
> > > I guess you're talking about 
> > > 
> > > #include <x86intrin.h>
> > > __m128d f(__m128d x, __m128d y, __m128d z){
> > >   return _mm_addsub_pd(_mm_mul_pd(x,y),z);
> > > }
> > > 
> > > which pass_combine tries
> > >  
> > > Failed to match this instruction:
> > > (set (reg:V2DF 88)
> > >     (vec_merge:V2DF (minus:V2DF (mult:V2DF (reg:V2DF 90)
> > >                 (reg:V2DF 91))
> > >             (reg:V2DF 92))
> > >         (plus:V2DF (mult:V2DF (reg:V2DF 90)
> > >                 (reg:V2DF 91))
> > >             (reg:V2DF 92))
> > >         (const_int 1 [0x1])))
> > > 
> > > but doesn't realize fisrt merge operand is fms and second is fma.
> > 
> > Yes.  This situation will happen when I push the SLP pattern detection
> > for addsub - we then no longer detect FMA on the GIMPLE level (we might
> > want to improve that as well, of course, exposing standard pattern names
> > for fmaddsub and fmsubadd).
> 
> if fm{a,s}_optab is supported in the backend, can we always simplify (minus
> A (mult B C)) to (fma B C (neg A)) and (plus A (mult B C)) to (fma B C A)?

I suppose we could within the appropriate constraints (FP contraction allowed).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2021-06-18  7:32 ` rguenth at gcc dot gnu.org
@ 2021-06-18  7:48 ` crazylht at gmail dot com
  2021-06-18 10:29 ` rguenth at gcc dot gnu.org
  2021-09-12 23:44 ` pinskia at gcc dot gnu.org
  16 siblings, 0 replies; 17+ messages in thread
From: crazylht at gmail dot com @ 2021-06-18  7:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #11)
> (In reply to Richard Biener from comment #9)
> > I don't think there's a documented canonical form but if I were to write one
> > I'd say the smaller (as in integer) merge mask should win?
> 
> Alternatively the one with bit 0 set (first lane selected from first
> operand).

Also would (vec_merge (D, C 0x55)) always win (vec_select (vec_concat D C)
(0,9,2,11,4,13,6,15))?

If not should we define canonical rtx to alway transform the later to vec_merge
 if possible?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2021-06-18  7:48 ` crazylht at gmail dot com
@ 2021-06-18 10:29 ` rguenth at gcc dot gnu.org
  2021-09-12 23:44 ` pinskia at gcc dot gnu.org
  16 siblings, 0 replies; 17+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-18 10:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #15)
> (In reply to Richard Biener from comment #11)
> > (In reply to Richard Biener from comment #9)
> > > I don't think there's a documented canonical form but if I were to write one
> > > I'd say the smaller (as in integer) merge mask should win?
> > 
> > Alternatively the one with bit 0 set (first lane selected from first
> > operand).
> 
> Also would (vec_merge (D, C 0x55)) always win (vec_select (vec_concat D C)
> (0,9,2,11,4,13,6,15))?
> 
> If not should we define canonical rtx to alway transform the later to
> vec_merge  if possible?

It's a historical wart that we have two representation for the same,
generally backends do not expect either to turn into the other and you'd
need to touch quite some patterns to avoid regressing if you do.

IMHO vec_merge should go away.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/87555] There is no need for UNSPEC_FMADDSUB
       [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2021-06-18 10:29 ` rguenth at gcc dot gnu.org
@ 2021-09-12 23:44 ` pinskia at gcc dot gnu.org
  16 siblings, 0 replies; 17+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-12 23:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87555

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-09-12 23:44 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-87555-4@http.gcc.gnu.org/bugzilla/>
2021-06-17 12:07 ` [Bug target/87555] There is no need for UNSPEC_FMADDSUB rguenth at gcc dot gnu.org
2021-06-18  1:22 ` crazylht at gmail dot com
2021-06-18  1:57 ` crazylht at gmail dot com
2021-06-18  5:40 ` crazylht at gmail dot com
2021-06-18  6:00 ` crazylht at gmail dot com
2021-06-18  6:02 ` crazylht at gmail dot com
2021-06-18  6:11 ` crazylht at gmail dot com
2021-06-18  6:34 ` crazylht at gmail dot com
2021-06-18  7:00 ` rguenth at gcc dot gnu.org
2021-06-18  7:09 ` crazylht at gmail dot com
2021-06-18  7:09 ` rguenth at gcc dot gnu.org
2021-06-18  7:11 ` rguenth at gcc dot gnu.org
2021-06-18  7:16 ` crazylht at gmail dot com
2021-06-18  7:32 ` rguenth at gcc dot gnu.org
2021-06-18  7:48 ` crazylht at gmail dot com
2021-06-18 10:29 ` rguenth at gcc dot gnu.org
2021-09-12 23:44 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).