From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=MEVP=6K=mentor.com=Andrew_Stubbs@sourceware.org>
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
	by sourceware.org (Postfix) with ESMTPS id F2D6438582A3
	for <gcc-patches@gcc.gnu.org>; Tue, 14 Feb 2023 14:28:43 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F2D6438582A3
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.97,296,1669104000"; 
   d="scan'208";a="98060282"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
  by esa2.mentor.iphmx.com with ESMTP; 14 Feb 2023 06:28:41 -0800
IronPort-SDR: SneSMzPj9U/iuPCGeKVL8/cXTzJS3gJZt1mhLYqNwKebUqdznIhzlAziRB2FOstw0MzbDyVIRm
 u+JjejlHhF2Hr/yOLOb21gM4tFEU7XLugZ0eefhxTZH/jkhAifi4zHaIUEigIq/Iphmf6jIP2v
 spyC6WHNpDBFhe38RZGwGnPHWsnX95yv3MmDodbEC6GVqzec5syU8zu/vO+4fvV7BkL7+jzPwz
 b0D1nWoMdNDeYEdXTkiPqhN1XiQH2KG0wx3KgUGr56kgW5TNf0HzByMNF/vEcBJyXUzv0lorXt
 p10=
Message-ID: <ce7d56c5-be6b-8c06-f8ca-8107439bf0a4@codesourcery.com>
Date: Tue, 14 Feb 2023 14:28:36 +0000
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.6.1
Subject: Re: [PATCH] amdgcn: Add instruction patterns for vector operations on
 complex numbers
Content-Language: en-GB
To: Andrew Jenner <andrew@codesourcery.com>, GCC Patches
	<gcc-patches@gcc.gnu.org>
References: <b3ec97bf-eaa2-d472-3db8-74989015407e@codesourcery.com>
From: Andrew Stubbs <ams@codesourcery.com>
In-Reply-To: <b3ec97bf-eaa2-d472-3db8-74989015407e@codesourcery.com>
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 8bit
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) To
 svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,BODY_8BITS,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,KAM_SHORT,NICE_REPLY_A,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On 09/02/2023 20:13, Andrew Jenner wrote:
> This patch introduces instruction patterns for complex number operations 
> in the GCN machine description. These patterns are cmul, cmul_conj, 
> vec_addsub, vec_fmaddsub, vec_fmsubadd, cadd90, cadd270, cmla and cmls 
> (cmla_conj and cmls_conj were not found to be favorable to implement). 
> As a side effect of adding cmls, I also added fms patterns corresponding 
> to the existing fma patterns. Tested on CDNA2 GFX90a.
> 
> OK to commit?
> 
> 
> gcc/ChangeLog:
> 
>      * config/gcn/gcn-protos.h (gcn_expand_dpp_swap_pairs_insn)
>          (gcn_expand_dpp_distribute_even_insn)
>          (gcn_expand_dpp_distribute_odd_insn): Declare.
>          * config/gcn/gcn-valu.md (@dpp_swap_pairs<mode>)
>          (@dpp_distribute_even<mode>, @dpp_distribute_odd<mode>)
>          (cmul<conj_op><mode>3, cml<addsub_as><mode>4, vec_addsub<mode>3)
>          (cadd<rot><mode>3, vec_fmaddsub<mode>4, vec_fmsubadd<mode>4)
>          (fms<mode>4<exec>, fms<mode>4_negop2<exec>, fms<mode>4)
>          (fms<mode>4_negop2): New patterns.
>          * config/gcn/gcn.cc (gcn_expand_dpp_swap_pairs_insn)
>          (gcn_expand_dpp_distribute_even_insn)
>          (gcn_expand_dpp_distribute_odd_insn): New functions.
>          * config/gcn/gcn.md: Add entries to unspec enum.
> 
> gcc/testsuite/ChangeLog:
> 
>      * gcc.target/gcn/complex.c: New test.

+;; It would be possible to represent these without the UNSPEC as
+;;
+;; (vec_merge
+;;   (fma op1 op2 op3)
+;;   (fma op1 op2 (neg op3))
+;;   (merge-const))
+;;
+;; But this doesn't seem useful in practice.
+
+(define_expand "vec_fmaddsub<mode>4"
+  [(set (match_operand:V_noHI 0 "register_operand" "=&v")
+        (unspec:V_noHI
+          [(match_operand:V_noHI 1 "register_operand" "v")
+           (match_operand:V_noHI 2 "register_operand" "v")
+           (match_operand:V_noHI 3 "register_operand" "v")]
+          UNSPEC_FMADDSUB))]

This is a define_expand pattern that has a custom-code expansion with an 
unconditional "DONE", so the actual RTL representation is irrelevant 
here: it only needs to have the match_operand entries. The 
UNSPEC_FMADDSUB is therefore dead (as in, it will never appear in the 
IR). We can safely remove those, although I don't hate them for 
readability purposes.

The UNSPEC_CMUL and UNSPEC_CMUL_CONJ are similarly "dead", but since you 
use them for an iterator they're still useful in the machine description.

+(define_insn "fms<mode>4<exec>"
+  [(set (match_operand:V_FP 0 "register_operand"  "=  v,   v")
+	(fma:V_FP
+	  (match_operand:V_FP 1 "gcn_alu_operand" "% vA,  vA")
+	(match_operand:V_FP 2 "gcn_alu_operand" "  vA,vSvA")
+	(neg:V_FP
+	  (match_operand:V_FP 3 "gcn_alu_operand" "vSvA,  vA"))))]
+  ""
+  "v_fma%i0\t%0, %1, %2, -%3"
+  [(set_attr "type" "vop3a")
+   (set_attr "length" "8")])

Please ensure that the alternatives are vertically aligned in the same 
style as the rest of the file.

+/* Generate DPP pairwise swap instruction.
+   The opcode is given by INSN.  */
+
+char *
+gcn_expand_dpp_swap_pairs_insn (machine_mode mode, const char *insn,
+				int ARG_UNUSED (unspec))
....
+/* Generate DPP distribute even instruction.
+   The opcode is given by INSN.  */
+
+char *
+gcn_expand_dpp_distribute_even_insn (machine_mode mode, const char *insn,
+				     int ARG_UNUSED (unspec))
....
+/* Generate DPP distribute odd instruction.
+   The opcode is given by INSN.  */
+
+char *
+gcn_expand_dpp_distribute_odd_insn (machine_mode mode, const char *insn,
+				    int ARG_UNUSED (unspec))

Please add a comment that isn't just the function name in words. Explain 
what operation happens here and maybe show an example of what it produces.

+++ b/gcc/testsuite/gcc.target/gcn/complex.c
@@ -0,0 +1,640 @@
+// { dg-do run }
+// { dg-options "-O -fopenmp-simd -ftree-loop-if-convert -fno-ssa-phiopt" }

Does the -fopenmp-simd option do anything here? There are no "omp 
declare simd" directives.

+void cmulF(float *td, float *te, float *tf, float *tg, int tas)
+{
+  typedef _Complex float complexT;
+  int array_size = tas/2;
+  complexT *d = (complexT*)(td);
+  complexT *e = (complexT*)(te);
+  complexT *f = (complexT*)(tf);
+#pragma omp target teams distribute parallel for simd
+  for (int i = 0; i < array_size; i++)
+    {
+      d[i] = e[i] * f[i];
+    }
+}

Tests in gcc.target/gcn won't do anything with "omp target" directives. 
I would expect the loop to vectorize without, at -O2 or above (or "-O1 
-ftree-vectorize"), but you might find the output easier to read with 
"__restrict" on the parameters as that will avoid emitting the runtime 
alias check and scalar code implementation.

I'd also expect you to have to do something to avoid inlining.

+  td = (float*)omp_aligned_alloc(ALIGNMENT, sizeof(float)*array_size, 
omp_default_mem_alloc);
+  te = (float*)omp_aligned_alloc(ALIGNMENT, sizeof(float)*array_size, 
omp_default_mem_alloc);
+  tf = (float*)omp_aligned_alloc(ALIGNMENT, sizeof(float)*array_size, 
omp_default_mem_alloc);
+  tg = (float*)omp_aligned_alloc(ALIGNMENT, sizeof(float)*array_size, 
omp_default_mem_alloc);

There's no need to use libgomp to allocate memory on the device -- 
malloc works just fine -- and it doesn't need to be specifically aligned 
unless you're wanting performance.

In general I'm confused by this testcase because it looks like it was 
written for an offloading toolchain, but it's placed into the 
bare-machine GCN testsuite.

Andrew