Date: Thu, 1 Feb 2024 08:19:32 +0100 (CET)
From: Richard Biener
To: "Andre Vieira (lists)"
cc: gcc-patches@gcc.gnu.org, Richard.Sandiford@arm.com
Subject: Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
References: <20240130143132.9575-1-andre.simoesdiasvieira@arm.com> <20240130143132.9575-2-andre.simoesdiasvieira@arm.com> <47e1aeb2-94ac-4733-b49f-ea97932cc49f@arm.com> <545r8s73-675p-4o48-sr66-q6956nqp6r6p@fhfr.qr> <3rq8sn71-8188-o4rq-9spp-q9spn98163q5@fhfr.qr>

On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:

> On 31/01/2024 14:35, Richard Biener wrote:
> > On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> >
> >> On 31/01/2024 13:58, Richard Biener wrote:
> >>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> >>>
> >>>> On 31/01/2024 12:13, Richard Biener wrote:
> >>>>> On Wed, 31 Jan 2024, Richard Biener wrote:
> >>>>>
> >>>>>> On Tue, 30 Jan 2024, Andre Vieira wrote:
> >>>>>>
> >>>>>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make
> >>>>>>> sure the target can reject a simd_clone based on the vector mode
> >>>>>>> it is using. This is needed because for VLS SVE vectorization
> >>>>>>> the vectorizer accepts Advanced SIMD simd clones when vectorizing
> >>>>>>> using SVE types because the simdlens might match. This will
> >>>>>>> cause type errors later on.
> >>>>>>>
> >>>>>>> Other targets do not currently need to use this argument.
> >>>>>>
> >>>>>> Can you instead pass down the mode?
> >>>>>
> >>>>> Thinking about that again the cgraph_simd_clone info in the clone
> >>>>> should have sufficient information to disambiguate. If it doesn't
> >>>>> then we should amend it.
> >>>>>
> >>>>> Richard.
> >>>>
> >>>> Hi Richard,
> >>>>
> >>>> Thanks for the review, I don't think cgraph_simd_clone_info is the
> >>>> right place to pass down this information, since this is information
> >>>> about the caller rather than the simdclone itself. What we are trying
> >>>> to achieve here is making the vectorizer being able to accept or
> >>>> reject simdclones based on the ISA we are vectorizing for. To
> >>>> distinguish between SVE and Advanced SIMD ISAs we use modes, I am
> >>>> also not sure that's ideal but it is what we currently use. So to
> >>>> answer your earlier question, yes I can also pass down mode if that's
> >>>> preferable.
> >>>
> >>> Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
> >>> whether that's POLY or constant. I wonder how aarch64_sve_mode_p
> >>> comes into play here which in the end classifies VLS SVE modes as
> >>> non-SVE?
> >>>
> >>
> >> Using -msve-vector-bits=128
> >> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
> >> $4 = E_VNx4SImode
> >> (gdb) p TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo))
> >> $5 = (tree) 0xfffff741c1b0
> >> (gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
> >> 128
> >> (gdb) p aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo)))
> >> $5 = true
> >>
> >> and for reference without vls codegen:
> >> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
> >> $1 = E_VNx4SImode
> >> (gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
> >> POLY_INT_CST [128, 128]
> >>
> >> Having said that I believe that the USABLE targethook implementation
> >> for aarch64 should also block other uses, like an Advanced SIMD mode
> >> being used as input for a SVE VLS SIMDCLONE.
> >> The reason being that for instance 'half' registers like VNx2SI are
> >> packed differently from V2SI.
> >>
> >> We could teach the vectorizer to support these of course, but that
> >> requires more work and is not extremely useful just yet. I'll add the
> >> extra check to the patch once we agree on how to pass down the
> >> information we need. Happy to use either mode, or stmt_vec_info and
> >> extract the mode from it like it does now.
> >
> > As said, please pass down 'mode'. But I wonder how to document it,
> > which mode is that supposed to be? Any of result or any argument
> > mode that happens to be a vector? I think that we might be able
> > to mix Advanced SIMD modes and SVE modes with -msve-vector-bits=128
> > in the same loop?
> >
> > Are the simd clones you don't want to use with -msve-vector-bits=128
> > having constant simdlen? If so why do you generate them in the first
> > place?
>
> So this is where things get a bit confusing and I will write up some text for
> these cases to put in our ABI document (currently in Beta and in need of some
> tlc).
>
> Our intended behaviour is for a 'declare simd' without a simdlen to generate
> simdclones for:
> * Advanced SIMD 128 and 64-bit vectors, where possible (we don't allow for
>   simdlen 1, Tamar fixed that in gcc recently),
> * SVE VLA vectors.
>
> Let me illustrate this with an example:
>
> __attribute__ ((simd (notinbranch), const)) float cosf(float);
>
> Should tell the compiler the following simd clones are available:
> __ZGVnN4v_cosf   128-bit          4x4 float    Advanced SIMD clone
> __ZGVnN2v_cosf   64-bit           4x2 float    Advanced SIMD clone
> __ZGVsMxv_cosf   [128, 128]-bit   4x4xN        SVE SIMD clone
>
> [To save you looking into the abi let me break this down, _ZGV is prefix, then
> 'n' or 's' picks between Advanced SIMD and SVE, 'N' or 'M' picks between Not
> Masked and Masked (SVE is always masked even if we ask for notinbranch), then
> a digit or 'x' picks between Vector Length or VLA, and after that you get a
> letter per argument, where v = vector mapped]
>
> Regardless of -msve-vector-bits, however, the vectorizer (and any other part
> of the compiler) may assume that the VL of the VLA SVE clone is that specified
> by -msve-vector-bits, which if the clone is written in a VLA way will still
> work.
>
> If the attribute is used with a function definition rather than declaration,
> so:
>
> __attribute__ ((simd (notinbranch), const)) float fn0(float a)
> {
>   return a + 1.0f;
> }
>
> the compiler should again generate the three simd clones:
> __ZGVnN4v_fn0   128-bit          4x4 float    Advanced SIMD clone
> __ZGVnN2v_fn0   64-bit           4x2 float    Advanced SIMD clone
> __ZGVsMxv_fn0   [128, 128]-bit   4x4xN        SVE SIMD clone
>
> However, in the last one it may assume a VL for the codegen of the body and
> it's the user's responsibility to only use it for targets with that length,
> much like any other code produced this way.
>
> So that's what we tell the compiler is available and what the compiler
> generates depending on where we use the attribute. The question at hand here
> is, what can the vectorizer use for a specific loop. If we are using Advanced
> SIMD modes then it needs to call an Advanced SIMD clone, and if we are using
> SVE modes then it needs to call an SVE clone.
> At least until we support the ABI conversion, because like I said for an
> unpacked argument they behave differently.
>
> PS: In the future OpenMP may add specifications that allow us to define a
> specific VLA simdlen... in other words, whether we want [128, 128] or [256,
> 256], [512, 512] ... etc, but that still needs agreement on the OpenMP Spec,
> which is why for now we piggy back on the simdlen-less definition to provide
> us a VLA SVE simdclone with [128, 128] VL.
>
> Hopefully this makes things a bit clearer :/

So where does it go wrong? What case does the patch fix?

For the non-definition case the SVE clone should have a POLY_INT
simdlen and as you say it should be fine to use that even with
-msve-vector-bits.

For the definition case the SVE clone might have a constant
simdlen but so does the caller (unless we allow different
setting between functions/TUs?).

The only thing the vectorizer looks at is I think

        if (!constant_multiple_p (vf * group_size, n->simdclone->simdlen,
                                  &num_calls)
            || (!n->simdclone->inbranch && (masked_call_offset > 0))
            || (nargs != simd_nargs))
          continue;

plus your 2nd patch rejecting num_calls > 1 for variable-length SVE.

The patch didn't come with a testcase so it's really hard to tell
what goes wrong now and how it is fixed ...

Richard.

> >
> > That said, I wonder how we end up mixing things up in the first place.
> >
> > Richard.
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
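
Illustrative note on the candidate filter quoted in the reply: the sketch below restates that check with plain integers, then adds the ISA match that the quoted text argues for (an Advanced SIMD loop calls an Advanced SIMD clone, an SVE loop calls an SVE clone). It is only a sketch under stated assumptions, not GCC code: `struct clone_info`, `struct call_info`, `enum isa` and `clone_usable_p` are hypothetical names, simdlen is assumed constant (no poly_int), and the argument-count comparison is simplified.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the vectorizer's data.
   These are NOT GCC types; they exist only for this sketch.  */
enum isa { ISA_ADVSIMD, ISA_SVE };

struct clone_info
{
  enum isa isa;        /* ISA the clone was built for ('n' or 's').  */
  unsigned simdlen;    /* Lanes handled per call; constant here.  */
  bool inbranch;       /* True if the clone takes a mask argument.  */
  unsigned nargs;      /* Number of arguments the clone expects.  */
};

struct call_info
{
  enum isa loop_isa;   /* ISA of the vector mode chosen for the loop.  */
  unsigned vf;         /* Vectorization factor.  */
  unsigned group_size; /* SLP group size, 1 when not using SLP.  */
  bool masked;         /* True if the call is made under a mask.  */
  unsigned nargs;      /* Number of arguments at the call site.  */
};

/* Return true, and store the number of clone calls needed in *NUM_CALLS,
   if CLONE could serve CALL.  The first three tests mirror the quoted
   condition; the last one is the assumed ISA check under discussion.  */
bool
clone_usable_p (const struct clone_info *clone,
                const struct call_info *call, unsigned *num_calls)
{
  assert (clone->simdlen != 0);
  unsigned lanes = call->vf * call->group_size;

  /* The lanes to cover must be a whole multiple of simdlen.  */
  if (lanes % clone->simdlen != 0)
    return false;

  /* A notinbranch clone cannot serve a masked call.  */
  if (!clone->inbranch && call->masked)
    return false;

  /* Argument counts must agree.  */
  if (clone->nargs != call->nargs)
    return false;

  /* Assumed extra rule: the clone's ISA must match the loop's ISA.  */
  if (clone->isa != call->loop_isa)
    return false;

  *num_calls = lanes / clone->simdlen;
  return true;
}
```

With a four-lane Advanced SIMD clone and an SVE loop whose vf is 4 (the -msve-vector-bits=128 situation described in the quoted text), the first three tests pass because the lane counts match, and only the ISA test rejects the clone. Whether that check belongs in the target hook or can be derived from the clone info is the open question in this thread.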