From: "yangyang305 at huawei dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/96342] [SVE] Add support for "omp declare simd"
Date: Wed, 21 Oct 2020 08:38:20 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342

--- Comment #3 from yangyang ---
Hi,

Sorry for the slow reply.

After studying the specification of SVE "omp declare simd" and GCC's current implementation of "omp declare simd", I have developed a rough plan for making GCC generate SVE functions for "omp declare simd". However, some parts of the plan are still uncertain and may need further discussion.

The work consists of three main parts: generating SVE functions for "omp declare simd" in pass_omp_simd_clone, supporting the SVE PCS for non-builtin types, and generating calls to SVE vectorized functions in pass_vect.
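For reference, the source-level construct this work targets looks roughly like the sketch below. The function names scale_add and apply are made up for illustration and do not come from the PR. When the code is built without -fopenmp or -fopenmp-simd the pragmas are simply ignored, so the code still behaves as plain scalar C.

```c
#include <stddef.h>

/* Ask the compiler to also emit vector variants (simd clones) of this
   scalar function.  Without -fopenmp or -fopenmp-simd the pragma is
   ignored and only the scalar version exists.  */
#pragma omp declare simd
float scale_add(float x, float y)
{
    return 2.0f * x + y;
}

/* A loop that the vectorizer may turn into calls to a vector variant of
   scale_add, if one is available for the target.  */
void apply(float *out, const float *a, const float *b, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        out[i] = scale_add(a[i], b[i]);
}
```

The plan in this comment is about which clones GCC creates for a function like scale_add on an SVE target, and how a loop like the one in apply ends up calling them.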
I plan to finish this work in the following five steps, each corresponding to a patch:

Part 1) Change the type of the field "simdlen" of struct cgraph_simd_clone from unsigned int to poly_uint64, with the related adaptations, since the length might be variable in the SVE case. PR96342-part1-v1.patch

Part 2) During debugging, I found that all calls to simd_clone_subparts need to be replaced with calls to TYPE_VECTOR_SUBPARTS because of the introduction of SVE simdclones. I plan to do all of these replacements in one patch. PR96342-part1-v2.patch

Part 3) Add generation of VLA SVE (vector length agnostic, without "simdlen") functions for "omp declare simd" and skip the VLS (vector length specific) ones. Specifically:
a) In aarch64_simd_clone_compute_vecsize_and_simdlen, add 1 to "count" when TARGET_SVE is specified.
b) Add a bool field "always_masked" to struct cgraph_simd_clone to mark simdclones that are always masked, and skip generating the notinbranch version when always_masked is true. In aarch64_simd_clone_compute_vecsize_and_simdlen, set it to true when processing SVE simdclones.
c) In aarch64_simd_clone_compute_vecsize_and_simdlen, set "vecsize_mangle" to 's' and "vec_bits" to BITS_PER_SVE_VECTOR when processing VLA SVE simdclones. Report an "unsupported" warning when processing VLS SVE simdclones.
d) Adjust simd_clone_mangle.
e) Support SVE masking: for SVE vector functions, masked signatures are generated by adding a svbool_t mask (corresponding to a predicate register) as the last parameter. Since aarch64 GCC currently doesn't support multi-type simdclones, the input predicate works for all the types, so GCC doesn't need any special adjustment.
For now, I plan to follow the current scheme: transform the input predicate into a bool array with [16, 16] elements (since the input predicate always has mode VNx16BImode) and use the active elements to build the branch. The following gimple stmts are expected to be generated:

MEM > [( *)&mask.34] = mask.37_17(D);
...
_9 = iter.38_6 * 4;
_8 = mask.34[_9];
if (_8 == 0)
...

The number 4 in "_9 = iter.38_6 * 4;" comes from arg_unit_size / mask_unit_size.

To implement this, set "clonei->mask_mode" to VNx16BImode when processing SVE simdclones in aarch64_simd_clone_compute_vecsize_and_simdlen. When handling cgraph_simd_clone->mask_mode in common code, add special treatment if cgraph_simd_clone->mask_mode != VOIDmode and cgraph_simd_clone->mask_mode is a vector mode, which corresponds to the SVE case. (It's OK to do so, since in current GCC cgraph_simd_clone->mask_mode != VOIDmode holds only when the mask is passed in integer argument(s).)
f) In pass_expand, types use the SVE PCS only when an "SVE type" attribute has been added to the tree nodes of the argument and return types. For now, GCC only has a mechanism for adding attributes to SVE builtin types, so I plan to define a new hook that adds the attribute to the argument and return types of generated simdclones where needed. In addition, the related processing functions are planned to be moved from aarch64-sve-builtins.cc to aarch64.c.

Part 4) Add generation of VLS SVE functions for "omp declare simd". The specification says: "When using a simdlen(len) clause, the compiler expects a VLS vector version of the function that is tuned for a specific implementation of SVE."
Therefore I think GCC should generate VLS SVE functions for "omp declare simd" only when the number of bits in an SVE vector register of the target is specified and coincides with the simdlen clause. Specifically:
a) In aarch64_simd_clone_compute_vecsize_and_simdlen, when processing VLS SVE simdclones, if the number of bits in an SVE vector register is specified and coincides with the simdlen clause, set "clonei->vecsize_mangle", "clonei->mask_mode" and "clonei->always_masked" and calculate "vec_bits"; otherwise report a warning and return NULL.
b) In this case the field "simdlen" is a constant, so using build_vector_type to build the vector type will produce an Advanced SIMD type instead of an SVE type, which seems wrong. I plan to add a new hook that does some special treatment to build an SVE vector type when processing VLS SVE simdclones and calls build_vector_type directly in other cases.

Part 5) Generate calls to SVE vectorized functions in pass_vect. Specifically:
a) Define a new hook that returns true if the target supports variable vector length simdclones, and make the aarch64 implementation return true if TARGET_SVE. In vectorizable_simd_clone_call, continue the analysis instead of returning false immediately.
b) Adjust the calculation of badness.
c) Generate the mask.
Since there has not been enough debugging yet, the detailed implementation plans for Part 5) b) and Part 5) c) have not been developed.

For now, I'm working on Part 3) and Part 4). I think it's necessary to put the plan up for review and see whether there are any suggestions, since there are many design details where I'm not sure my choice is the best one. Any comments?

In addition, I have finished the first two patches and attached them to this PR. Is it necessary to send the patches to the GCC patches mailing list for review?
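To make the mask indexing in Part 3) e) more concrete, here is a small scalar model of what the expected gimple does. It is only a sketch under stated assumptions: the function name masked_negate is hypothetical, the predicate is modelled as a plain bool array with one flag per byte of the vector, and int is assumed to be 4 bytes so that the stride matches the "* 4" in the dump above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of a masked clone body for int arguments.
   `mask` holds one flag per byte of the vector, like a VNx16BI predicate
   stored to a bool array; `lanes` is the number of int elements.  */
void masked_negate(int *data, const bool *mask, size_t lanes)
{
    /* arg_unit_size / mask_unit_size: 4 when int is 4 bytes and each
       flag covers 1 byte.  */
    const size_t stride = sizeof(int) / sizeof(bool);

    for (size_t iter = 0; iter < lanes; iter++) {
        /* Only the first flag of each element decides whether the lane
           is active; the other flags of that element are ignored.  */
        if (!mask[iter * stride])
            continue;
        data[iter] = -data[iter];
    }
}
```

With four int lanes and a 16-entry mask, setting mask[0] and mask[8] activates lanes 0 and 2, while a flag at mask[5] has no effect because it is not the first flag of lane 1.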