From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) by sourceware.org (Postfix) with ESMTPS id 264553858D20 for ; Thu, 29 Feb 2024 07:26:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 264553858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 264553858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709191614; cv=none; b=DT3AkZiMNs5HhNysl/79O7FIlaDYYwgwpsW9WsBADvAN0FraortDbwpmatR75zj24kL66BTbSaNTn/NYfMPY2rLjYvJosqBge2aoVpODSB0vcRghxctbpvTZyTfPEVnmACd3w+6NhhOHzXdbsIEJlgT5/DWlNVFD8vqPVZsYqlI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709191614; c=relaxed/simple; bh=eSuGbjMpvNiBgKMPGf4LsBE7/3p7hU/p7CWpSSsANh4=; h=DKIM-Signature:DKIM-Signature:DKIM-Signature:DKIM-Signature:Date: From:To:Subject:Message-ID:MIME-Version; b=Y/6DQHyBGg2mn6mf/XibYPY+c3rFvsqqyjLKT1CDyfWeFwZscIVsWVXMaWpIZslRBM4o2CaklCxCKFRA39s9h3ixk+QJOVK2zanQS0T5ILzsYeUqa12sIoXQZ2ZiGrQB/fotsKfYTB0E6eAYFz9REzMJ5Us7QXZI6kziuik1kLE= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from [10.168.4.150] (unknown [10.168.4.150]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id F3C2B2289D; Thu, 29 Feb 2024 07:26:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1709191609; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=hufzY0+oveC3yIJrhbW96EEODpSfYVSnHOzN2l7916U=; b=rVw2jFoKlCykFeTNokZ7bvnvC9gMQMWHsbSbYFIa3yVHyCp2L5k55TrRepf9bxZs46d1ca KfYNsM3wqY1JS1IzXiwvS6iN2QHldOtR5gIg7oUFqoqR3d/VpvhXlZPeswcd20DwqonBaC JbNUzue8jCx7fnW6BI9r/yhQtbz6JBE= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1709191609; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=hufzY0+oveC3yIJrhbW96EEODpSfYVSnHOzN2l7916U=; b=9+APHJsDQLLVpMf+3C41yU3PTlhhw+2Ikv3RQ+pBTYkZHjtBzS4wL3j7wNjtiCBTvKWEsw XM6PQdt5WmtTT3DQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1709191609; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=hufzY0+oveC3yIJrhbW96EEODpSfYVSnHOzN2l7916U=; b=rVw2jFoKlCykFeTNokZ7bvnvC9gMQMWHsbSbYFIa3yVHyCp2L5k55TrRepf9bxZs46d1ca KfYNsM3wqY1JS1IzXiwvS6iN2QHldOtR5gIg7oUFqoqR3d/VpvhXlZPeswcd20DwqonBaC JbNUzue8jCx7fnW6BI9r/yhQtbz6JBE= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1709191609; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=hufzY0+oveC3yIJrhbW96EEODpSfYVSnHOzN2l7916U=; b=9+APHJsDQLLVpMf+3C41yU3PTlhhw+2Ikv3RQ+pBTYkZHjtBzS4wL3j7wNjtiCBTvKWEsw XM6PQdt5WmtTT3DQ== Date: Thu, 29 Feb 2024 08:26:48 +0100 (CET) From: Richard Biener To: "Andre Vieira (lists)" cc: gcc-patches@gcc.gnu.org, Richard.Sandiford@arm.com Subject: Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE In-Reply-To: <0a69b67c-7fa3-4a79-86c4-c6311cc56ea7@arm.com> Message-ID: <00p926sn-s1oo-s856-9p4o-2p2n8oo1q0s8@fhfr.qr> References: <20240130143132.9575-1-andre.simoesdiasvieira@arm.com> <20240130143132.9575-2-andre.simoesdiasvieira@arm.com> <47e1aeb2-94ac-4733-b49f-ea97932cc49f@arm.com> <545r8s73-675p-4o48-sr66-q6956nqp6r6p@fhfr.qr> <3rq8sn71-8188-o4rq-9spp-q9spn98163q5@fhfr.qr> <359e8112-65c9-40b1-9566-aa31165c05e8@arm.com> <0a69b67c-7fa3-4a79-86c4-c6311cc56ea7@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Authentication-Results: smtp-out1.suse.de; none X-Spamd-Result: default: False [-3.10 / 50.00]; ARC_NA(0.00)[]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; TO_DN_SOME(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain]; DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.de:email]; FUZZY_BLOCKED(0.00)[rspamd.com]; RCVD_COUNT_ZERO(0.00)[0]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; BAYES_HAM(-3.00)[100.00%] X-Spam-Level: X-Spam-Score: -3.10 X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, 28 Feb 2024, Andre Vieira (lists) wrote: > > > On 27/02/2024 08:47, Richard Biener wrote: > > On Mon, 26 Feb 2024, Andre Vieira (lists) wrote: > > > >> > >> > >> On 05/02/2024 09:56, Richard Biener wrote: > >>> On Thu, 1 Feb 2024, Andre Vieira (lists) wrote: > >>> > >>>> > >>>> > >>>> On 01/02/2024 07:19, Richard Biener wrote: > >>>>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote: > >>>>> > >>>>> > >>>>> The patch didn't come with a testcase so it's really hard to tell > >>>>> what goes wrong now and how it is fixed ... > >>>> > >>>> My bad! I had a testcase locally but never added it... > >>>> > >>>> However... now I look at it and ran it past Richard S, the codegen isn't > >>>> 'wrong', but it does have the potential to lead to some pretty slow > >>>> codegen, > >>>> especially for inbranch simdclones where it transforms the SVE predicate > >>>> into > >>>> an Advanced SIMD vector by inserting the elements one at a time... > >>>> > >>>> An example of which can be seen if you do: > >>>> > >>>> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 -fopenmp-simd t.c -S > >>>> > >>>> with the following t.c: > >>>> #pragma omp declare simd simdlen(4) inbranch > >>>> int __attribute__ ((const)) fn5(int); > >>>> > >>>> void fn4 (int *a, int *b, int n) > >>>> { > >>>> for (int i = 0; i < n; ++i) > >>>> b[i] = fn5(a[i]); > >>>> } > >>>> > >>>> Now I do have to say, for our main usecase of libmvec we won't have any > >>>> 'inbranch' Advanced SIMD clones, so we avoid that issue... But of course > >>>> that > >>>> doesn't mean user-code will. > >>> > >>> It seems to use SVE masks with vector(4) and the > >>> ABI says the mask is vector(4) int. You say that's because we choose > >>> a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5). > >>> > >>> The vectorizer creates > >>> > >>> _44 = VEC_COND_EXPR ; > >>> > >>> and then vector lowering decomposes this. That means the vectorizer > >>> lacks a check that the target handles this VEC_COND_EXPR. > >>> > >>> Of course I would expect that SVE with VLS vectors is able to > >>> code generate this operation, so it's missing patterns in the end. > >>> > >>> Richard. > >>> > >> > >> What should we do for GCC-14? Going forward I think the right thing to do > >> is > >> to add these patterns. But I am not even going to try to do that right now > >> and > >> even though we can codegen for this, the result doesn't feel like it would > >> ever be profitable which means I'd rather not vectorize, or well pick a > >> different vector mode if possible. > >> > >> This would be achieved with the change to the targethook. If I change the > >> hook > >> to take modes, using STMT_VINFO_VECTYPE (stmt_vinfo), is that OK for now? > > > > Passing in a mode is OK. I'm still not fully understanding why the > > clone isn't fully specifying 'mode' and if it does not why the > > vectorizer itself can not disregard it. > > > We could check that the modes of the parameters & return type are the same as > the vector operands & result in the vectorizer. But then we'd also want to > make sure we don't reject cases where we have simdclones with compatible > modes, aka same element type, but a multiple element count. Which is where'd > we get in trouble again I think, because we'd want to accept V8SI -> 2x V4SI, > but not V8SI -> 2x VNx4SI (with VLS and aarch64_sve_vg = 2), not because it's > invalid, but because right now the codegen is bad. And it's easier to do this > in the targethook, which we can technically also use to 'rank' simdclones by > setting a target_badness value, so in the future we could decide to assign > some 'badness' to influence the rank a SVE simdclone for Advanced SIMD loops > vs an Advanced SIMD clone for Advanced SIMD loops. > > This does touch another issue of simdclone costing, which is a larger issue in > general and one we (arm) might want to approach in the future. It's a complex > issue, because the vectorizer doesn't know the performance impact of a > simdclone, we assume (as we should) that its faster than the original scalar, > though we currently don't record costs for either, but we don't know by how > much or how much impact it has, so the vectorizer can't reason whether it's > beneficial to use a simdclone if it has to do a lot of operand preparation, we > can merely tell it to use it, or not and all the other operations in the loop > will determine costing. > > > > From the past discussion I understood the existing situation isn't > > as bad as initially thought and no bad things happen right now? > Nope, I thought they compiler would fall apart, but it seems to be able to > transform the operands from one mode into the other, so without the targethook > it just generates slower loops in certain cases, which we'd rather avoid given > the usecase for simdclones is to speed things up ;) > > > Attached reworked patch. > > > This patch adds a machine_mode argument to TARGET_SIMD_CLONE_USABLE to make > sure the target can reject a simd_clone based on the vector mode it is using. > This is needed because for VLS SVE vectorization the vectorizer accepts > Advanced SIMD simd clones when vectorizing using SVE types because the > simdlens might match, this currently leads to suboptimal codegen. > > Other targets do not currently need to use this argument. +ix86_simd_clone_usable (struct cgraph_node *node, + machine_mode ARG_UNUSED (vector_mode)) we use C++, just omit the parameter name. You use STMT_VINFO_VECTYPE conditional and vinfo->vector_mode otherwise. I think simdclones without a return value might be a thing? What type would STMT_VINFO_VECTYPE correspond to? The documentation also doesn't say whether it's the mode of a return value, or which argument value. It seems it is just a random mode that might or might not provide properties of the incoming(?) argument values and that might differ from the actual argument modes? As said, I think the vectorizer has more info - it knows the incoming mode for each arg and the expected argument modes for each arg, including mask modes involved (that extra mode argument is for values and never for masks?). Anyway, I can live with this but I'll leave it to Richard S. to approve (and take the blame ;)). Thanks, Richard. > gcc/ChangeLog: > > * target.def (TARGET_SIMD_CLONE_USABLE): Add argument. > * tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass vector_mode > to call TARGET_SIMD_CLONE_USABLE. > * config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument > and use it to reject the use of SVE simd clones with Advanced SIMD > modes. > * config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument. > * config/i386/i386.cc (ix86_simd_clone_usable): Likewise. > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)