From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id F2D6438582A3 for ; Tue, 14 Feb 2023 14:28:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F2D6438582A3 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.97,296,1669104000"; d="scan'208";a="98060282" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa2.mentor.iphmx.com with ESMTP; 14 Feb 2023 06:28:41 -0800 IronPort-SDR: SneSMzPj9U/iuPCGeKVL8/cXTzJS3gJZt1mhLYqNwKebUqdznIhzlAziRB2FOstw0MzbDyVIRm u+JjejlHhF2Hr/yOLOb21gM4tFEU7XLugZ0eefhxTZH/jkhAifi4zHaIUEigIq/Iphmf6jIP2v spyC6WHNpDBFhe38RZGwGnPHWsnX95yv3MmDodbEC6GVqzec5syU8zu/vO+4fvV7BkL7+jzPwz b0D1nWoMdNDeYEdXTkiPqhN1XiQH2KG0wx3KgUGr56kgW5TNf0HzByMNF/vEcBJyXUzv0lorXt p10= Message-ID: Date: Tue, 14 Feb 2023 14:28:36 +0000 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.6.1 Subject: Re: [PATCH] amdgcn: Add instruction patterns for vector operations on complex numbers Content-Language: en-GB To: Andrew Jenner , GCC Patches References: From: Andrew Stubbs In-Reply-To: Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,BODY_8BITS,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,KAM_SHORT,NICE_REPLY_A,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 09/02/2023 20:13, Andrew Jenner wrote: > This patch introduces instruction patterns for complex number operations > in the GCN machine description. These patterns are cmul, cmul_conj, > vec_addsub, vec_fmaddsub, vec_fmsubadd, cadd90, cadd270, cmla and cmls > (cmla_conj and cmls_conj were not found to be favorable to implement). > As a side effect of adding cmls, I also added fms patterns corresponding > to the existing fma patterns. Tested on CDNA2 GFX90a. > > OK to commit? > > > gcc/ChangeLog: > >     * config/gcn/gcn-protos.h (gcn_expand_dpp_swap_pairs_insn) >         (gcn_expand_dpp_distribute_even_insn) >         (gcn_expand_dpp_distribute_odd_insn): Declare. >         * config/gcn/gcn-valu.md (@dpp_swap_pairs) >         (@dpp_distribute_even, @dpp_distribute_odd) >         (cmul3, cml4, vec_addsub3) >         (cadd3, vec_fmaddsub4, vec_fmsubadd4) >         (fms4, fms4_negop2, fms4) >         (fms4_negop2): New patterns. >         * config/gcn/gcn.cc (gcn_expand_dpp_swap_pairs_insn) >         (gcn_expand_dpp_distribute_even_insn) >         (gcn_expand_dpp_distribute_odd_insn): New functions. >         * config/gcn/gcn.md: Add entries to unspec enum. > > gcc/testsuite/ChangeLog: > >     * gcc.target/gcn/complex.c: New test. +;; It would be possible to represent these without the UNSPEC as +;; +;; (vec_merge +;; (fma op1 op2 op3) +;; (fma op1 op2 (neg op3)) +;; (merge-const)) +;; +;; But this doesn't seem useful in practice. + +(define_expand "vec_fmaddsub4" + [(set (match_operand:V_noHI 0 "register_operand" "=&v") + (unspec:V_noHI + [(match_operand:V_noHI 1 "register_operand" "v") + (match_operand:V_noHI 2 "register_operand" "v") + (match_operand:V_noHI 3 "register_operand" "v")] + UNSPEC_FMADDSUB))] This is a define_expand pattern that has a custom-code expansion with an unconditional "DONE", so the actual RTL representation is irrelevant here: it only needs to have the match_operand entries. The UNSPEC_FMADDSUB is therefore dead (as in, it will never appear in the IR). We can safely remove those, although I don't hate them for readability purposes. The UNSPEC_CMUL and UNSPEC_CMUL_CONJ are similarly "dead", but since you use them for an iterator they're still useful in the machine description. +(define_insn "fms4" + [(set (match_operand:V_FP 0 "register_operand" "= v, v") + (fma:V_FP + (match_operand:V_FP 1 "gcn_alu_operand" "% vA, vA") + (match_operand:V_FP 2 "gcn_alu_operand" " vA,vSvA") + (neg:V_FP + (match_operand:V_FP 3 "gcn_alu_operand" "vSvA, vA"))))] + "" + "v_fma%i0\t%0, %1, %2, -%3" + [(set_attr "type" "vop3a") + (set_attr "length" "8")]) Please ensure that the alternatives are vertically aligned in the same style as the rest of the file. +/* Generate DPP pairwise swap instruction. + The opcode is given by INSN. */ + +char * +gcn_expand_dpp_swap_pairs_insn (machine_mode mode, const char *insn, + int ARG_UNUSED (unspec)) .... +/* Generate DPP distribute even instruction. + The opcode is given by INSN. */ + +char * +gcn_expand_dpp_distribute_even_insn (machine_mode mode, const char *insn, + int ARG_UNUSED (unspec)) .... +/* Generate DPP distribute odd instruction. + The opcode is given by INSN. */ + +char * +gcn_expand_dpp_distribute_odd_insn (machine_mode mode, const char *insn, + int ARG_UNUSED (unspec)) Please add a comment that isn't just the function name in words. Explain what operation happens here and maybe show an example of what it produces. +++ b/gcc/testsuite/gcc.target/gcn/complex.c @@ -0,0 +1,640 @@ +// { dg-do run } +// { dg-options "-O -fopenmp-simd -ftree-loop-if-convert -fno-ssa-phiopt" } Does the -fopenmp-simd option do anything here? There are no "omp declare simd" directives. +void cmulF(float *td, float *te, float *tf, float *tg, int tas) +{ + typedef _Complex float complexT; + int array_size = tas/2; + complexT *d = (complexT*)(td); + complexT *e = (complexT*)(te); + complexT *f = (complexT*)(tf); +#pragma omp target teams distribute parallel for simd + for (int i = 0; i < array_size; i++) + { + d[i] = e[i] * f[i]; + } +} Tests in gcc.target/gcn won't do anything with "omp target" directives. I would expect the loop to vectorize without, at -O2 or above (or "-O1 -ftree-vectorize"), but you might find the output easier to read with "__restrict" on the parameters as that will avoid emitting the runtime alias check and scalar code implementation. I'd also expect you to have to do something to avoid inlining. + td = (float*)omp_aligned_alloc(ALIGNMENT, sizeof(float)*array_size, omp_default_mem_alloc); + te = (float*)omp_aligned_alloc(ALIGNMENT, sizeof(float)*array_size, omp_default_mem_alloc); + tf = (float*)omp_aligned_alloc(ALIGNMENT, sizeof(float)*array_size, omp_default_mem_alloc); + tg = (float*)omp_aligned_alloc(ALIGNMENT, sizeof(float)*array_size, omp_default_mem_alloc); There's no need to use libgomp to allocate memory on the device -- malloc works just fine -- and it doesn't need to be specifically aligned unless you're wanting performance. In general I'm confused by this testcase because it looks like it was written for an offloading toolchain, but it's placed into the bare-machine GCN testsuite. Andrew