public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* x86: making better use of vpternlog{d,q}
@ 2023-05-24  7:57 Jan Beulich
  2023-05-24  9:01 ` Hongtao Liu
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Beulich @ 2023-05-24  7:57 UTC (permalink / raw)
  To: gcc; +Cc: Kirill Yukhin, Hongtao Liu

Hello,

for a couple of years I was meaning to extend the use of these AVX512F
insns beyond the pretty minimalistic ones there are so far. Now that I've
got around to at least draft something, I ran into a couple of issues I
cannot explain. I'd like to start with understanding the unexpected
effects of a change to an existing insn I have made (reproduced at the
bottom). I certainly was prepared to observe testsuite failures, but it
ends up failing tests I didn't expect it would fail, and - upon looking
at sibling ones - also ends up leaving intact tests which I would expect
would then need adjustment (because of using the new alternative).

In particular (all mentioned tests are in gcc.target/i386/)
- avx512f-andn-si-zmm-1.c (and its AVX512VL counterparts) fails because
  for whatever reason generated code reverts back to using vpbroadcastd,
- avx512f-andn-di-zmm-1.c, otoh, is unaffected (i.e. continues to use
  vpandnq with embedded broadcast),
- avx512f-andn-si-zmm-2.c doesn't use the new 4th insn alternative when
  at the same time a made-up DI variant of the test (akin to what might
  be an avx512f-andn-di-zmm-2.c testcase) does.
IOW: How is SI mode element size different here from DI mode one? Is
there anything wrong with the 4th alternative I'm adding, or is this
hinting at some anomaly elsewhere?

Just to mention it, avx512f-andn-si-zmm-5.c similarly fails
unexpectedly, but I guess for the same reason (and there aren't AVX512VL
or DI mode element counterparts thereof).

Jan

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17019,11 +17019,11 @@
   "TARGET_AVX512F")
 
 (define_insn "*andnot<mode>3"
-  [(set (match_operand:VI 0 "register_operand" "=x,x,v")
+  [(set (match_operand:VI 0 "register_operand" "=x,x,v,v")
 	(and:VI
-	  (not:VI (match_operand:VI 1 "vector_operand" "0,x,v"))
-	  (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr")))]
-  "TARGET_SSE"
+	  (not:VI (match_operand:VI 1 "bcst_vector_operand" "0,x,v,mBr"))
+	  (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr,v")))]
+  "TARGET_SSE && (REG_P (operands[1]) || REG_P (operands[2]))"
 {
   char buf[64];
   const char *ops;
@@ -17090,6 +17090,11 @@
     case 2:
       ops = "v%s%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
       break;
+    case 3:
+      tmp = "pternlog";
+      ssesuffix = "<ternlogsuffix>";
+      ops = "v%s%s\t{$0x44, %%1, %%2, %%0|%%0, %%2, %%1, $0x44}";
+      break;
     default:
       gcc_unreachable ();
     }
@@ -17098,7 +17103,7 @@
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx")
+  [(set_attr "isa" "noavx,avx,avx,avx512f")
    (set_attr "type" "sselog")
    (set (attr "prefix_data16")
      (if_then_else
@@ -17106,7 +17111,7 @@
 	    (eq_attr "mode" "TI"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "prefix" "orig,vex,evex,evex")
    (set (attr "mode")
 	(cond [(match_test "TARGET_AVX2")
 		 (const_string "<sseinsnmode>")
@@ -17119,7 +17124,11 @@
 		    (match_test "optimize_function_for_size_p (cfun)"))
 		 (const_string "V4SF")
 	      ]
-	      (const_string "<sseinsnmode>")))])
+	      (const_string "<sseinsnmode>")))
+   (set (attr "enabled")
+	(if_then_else (eq_attr "alternative" "3")
+		      (symbol_ref "<MODE_SIZE> == 64 ? TARGET_AVX512F : TARGET_AVX512VL")
+		      (const_string "*")))])
 
 ;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn
 (define_split

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: x86: making better use of vpternlog{d,q}
  2023-05-24  7:57 x86: making better use of vpternlog{d,q} Jan Beulich
@ 2023-05-24  9:01 ` Hongtao Liu
  2023-05-25  9:43   ` Jan Beulich
  0 siblings, 1 reply; 3+ messages in thread
From: Hongtao Liu @ 2023-05-24  9:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: gcc, Kirill Yukhin, Hongtao Liu

On Wed, May 24, 2023 at 3:58 PM Jan Beulich via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hello,
>
> for a couple of years I was meaning to extend the use of these AVX512F
> insns beyond the pretty minimalistic ones there are so far. Now that I've
> got around to at least draft something, I ran into a couple of issues I
> cannot explain. I'd like to start with understanding the unexpected
> effects of a change to an existing insn I have made (reproduced at the
> bottom). I certainly was prepared to observe testsuite failures, but it
> ends up failing tests I didn't expect it would fail, and - upon looking
> at sibling ones - also ends up leaving intact tests which I would expect
> would then need adjustment (because of using the new alternative).
>
> In particular (all mentioned tests are in gcc.target/i386/)
> - avx512f-andn-si-zmm-1.c (and its AVX512VL counterparts) fails because
>   for whatever reason generated code reverts back to using vpbroadcastd,
> - avx512f-andn-di-zmm-1.c, otoh, is unaffected (i.e. continues to use
>   vpandnq with embedded broadcast),
> - avx512f-andn-si-zmm-2.c doesn't use the new 4th insn alternative when
>   at the same time a made-up DI variant of the test (akin to what might
>   be an avx512f-andn-di-zmm-2.c testcase) does.
> IOW: How is SI mode element size different here from DI mode one? Is
> there anything wrong with the 4th alternative I'm adding, or is this
> hinting at some anomaly elsewhere?
__m512i is defined as __v8di, when it's used for _mm512_andnot_epi32,
it's explicitlt converted to (__v16si) and creates an extra subreg
which is not needed for DImode cases.
And pass_combine try to match the below pattern but failed due to the
condition REG_P (operands[1]) || REG_P (operands[2]). Here I think you
want register_operand instead of REG_P.
157(set (reg:V16SI 91)
158    (and:V16SI (not:V16SI (subreg:V16SI (reg:V8DI 98) 0))
159        (vec_duplicate:V16SI (mem:SI (reg:DI 99) [1 *f_3(D)+0 S4 A32]))))


>
> Just to mention it, avx512f-andn-si-zmm-5.c similarly fails
> unexpectedly, but I guess for the same reason (and there aren't AVX512VL
> or DI mode element counterparts thereof).
>
> Jan
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17019,11 +17019,11 @@
>    "TARGET_AVX512F")
>
>  (define_insn "*andnot<mode>3"
> -  [(set (match_operand:VI 0 "register_operand" "=x,x,v")
> +  [(set (match_operand:VI 0 "register_operand" "=x,x,v,v")
>         (and:VI
> -         (not:VI (match_operand:VI 1 "vector_operand" "0,x,v"))
> -         (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr")))]
> -  "TARGET_SSE"
> +         (not:VI (match_operand:VI 1 "bcst_vector_operand" "0,x,v,mBr"))
> +         (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr,v")))]
> +  "TARGET_SSE && (REG_P (operands[1]) || REG_P (operands[2]))"
>  {
>    char buf[64];
>    const char *ops;
> @@ -17090,6 +17090,11 @@
>      case 2:
>        ops = "v%s%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
>        break;
> +    case 3:
> +      tmp = "pternlog";
> +      ssesuffix = "<ternlogsuffix>";
> +      ops = "v%s%s\t{$0x44, %%1, %%2, %%0|%%0, %%2, %%1, $0x44}";
> +      break;
>      default:
>        gcc_unreachable ();
>      }
> @@ -17098,7 +17103,7 @@
>    output_asm_insn (buf, operands);
>    return "";
>  }
> -  [(set_attr "isa" "noavx,avx,avx")
> +  [(set_attr "isa" "noavx,avx,avx,avx512f")
>     (set_attr "type" "sselog")
>     (set (attr "prefix_data16")
>       (if_then_else
> @@ -17106,7 +17111,7 @@
>             (eq_attr "mode" "TI"))
>         (const_string "1")
>         (const_string "*")))
> -   (set_attr "prefix" "orig,vex,evex")
> +   (set_attr "prefix" "orig,vex,evex,evex")
>     (set (attr "mode")
>         (cond [(match_test "TARGET_AVX2")
>                  (const_string "<sseinsnmode>")
> @@ -17119,7 +17124,11 @@
>                     (match_test "optimize_function_for_size_p (cfun)"))
>                  (const_string "V4SF")
>               ]
> -             (const_string "<sseinsnmode>")))])
> +             (const_string "<sseinsnmode>")))
> +   (set (attr "enabled")
> +       (if_then_else (eq_attr "alternative" "3")
> +                     (symbol_ref "<MODE_SIZE> == 64 ? TARGET_AVX512F : TARGET_AVX512VL")
> +                     (const_string "*")))])
>
>  ;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn
>  (define_split



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: x86: making better use of vpternlog{d,q}
  2023-05-24  9:01 ` Hongtao Liu
@ 2023-05-25  9:43   ` Jan Beulich
  0 siblings, 0 replies; 3+ messages in thread
From: Jan Beulich @ 2023-05-25  9:43 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: gcc, Kirill Yukhin, Hongtao Liu

On 24.05.2023 11:01, Hongtao Liu wrote:
> On Wed, May 24, 2023 at 3:58 PM Jan Beulich via Gcc <gcc@gcc.gnu.org> wrote:
>>
>> Hello,
>>
>> for a couple of years I was meaning to extend the use of these AVX512F
>> insns beyond the pretty minimalistic ones there are so far. Now that I've
>> got around to at least draft something, I ran into a couple of issues I
>> cannot explain. I'd like to start with understanding the unexpected
>> effects of a change to an existing insn I have made (reproduced at the
>> bottom). I certainly was prepared to observe testsuite failures, but it
>> ends up failing tests I didn't expect it would fail, and - upon looking
>> at sibling ones - also ends up leaving intact tests which I would expect
>> would then need adjustment (because of using the new alternative).
>>
>> In particular (all mentioned tests are in gcc.target/i386/)
>> - avx512f-andn-si-zmm-1.c (and its AVX512VL counterparts) fails because
>>   for whatever reason generated code reverts back to using vpbroadcastd,
>> - avx512f-andn-di-zmm-1.c, otoh, is unaffected (i.e. continues to use
>>   vpandnq with embedded broadcast),
>> - avx512f-andn-si-zmm-2.c doesn't use the new 4th insn alternative when
>>   at the same time a made-up DI variant of the test (akin to what might
>>   be an avx512f-andn-di-zmm-2.c testcase) does.
>> IOW: How is SI mode element size different here from DI mode one? Is
>> there anything wrong with the 4th alternative I'm adding, or is this
>> hinting at some anomaly elsewhere?
> __m512i is defined as __v8di, when it's used for _mm512_andnot_epi32,
> it's explicitlt converted to (__v16si) and creates an extra subreg
> which is not needed for DImode cases.
> And pass_combine try to match the below pattern but failed due to the
> condition REG_P (operands[1]) || REG_P (operands[2]). Here I think you
> want register_operand instead of REG_P.

Thanks, this has indeed made things match my expectations wrt testsuite
results. Sadly similar adjustments for other (new) insns didn't make
any difference with the further issues I'm facing. I may therefore need
to ask more questions; I hope they're not going to be too dumb.

Jan

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-05-25  9:43 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-24  7:57 x86: making better use of vpternlog{d,q} Jan Beulich
2023-05-24  9:01 ` Hongtao Liu
2023-05-25  9:43   ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).