Message-ID: <359e8112-65c9-40b1-9566-aa31165c05e8@arm.com>
Date: Thu, 1 Feb 2024 17:01:40 +0000
Subject: Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
From: "Andre Vieira (lists)"
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org, Richard.Sandiford@arm.com

On 01/02/2024 07:19, Richard Biener wrote:
> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>
> The patch didn't come with a testcase so it's really hard to tell
> what goes wrong now and how it is fixed ...

My bad! I had a testcase locally but never added it...

However... now that I have looked at it and run it past Richard S, the
codegen isn't 'wrong', but it does have the potential to lead to some
pretty slow codegen, especially for inbranch simdclones, where it
transforms the SVE predicate into an Advanced SIMD vector by inserting
the elements one at a time...

An example of which can be seen if you do:

gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 -fopenmp-simd t.c -S

with the following t.c:

#pragma omp declare simd simdlen(4) inbranch
int __attribute__ ((const)) fn5(int);

void fn4 (int *a, int *b, int n)
{
     for (int i = 0; i < n; ++i)
         b[i] = fn5(a[i]);
}

Now I do have to say, for our main usecase of libmvec we won't have any
'inbranch' Advanced SIMD clones, so we avoid that issue... But of course
that doesn't mean user code will.

I'm gonna remove this patch and do another regression test run to see if
it catches anything weird, but if not then I guess we do have the option
to not use this patch and aim to solve the costing or codegen issue in
GCC-15.

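To make the inbranch concern above a bit more concrete, here is a rough scalar model in plain C of what the masked call amounts to. It is only a sketch, not what GCC emits: fn5_scalar is a made-up body (the mail only declares fn5), the names predicate_to_mask, fn5_inbranch_model and fn4_model are hypothetical, and the mask encoding (all-ones for an active lane, zero otherwise) is an assumption.

```c
#include <stdbool.h>

#define SIMDLEN 4

/* Hypothetical scalar body: the mail only declares fn5, so this
   definition is made up purely so the sketch can run.  */
int fn5_scalar (int x)
{
  return x * 2 + 1;
}

/* Model of the slow step: turn a per-lane predicate into an integer
   mask vector by writing one lane at a time.  Assumption: an active
   lane is encoded as all-ones (-1) and an inactive lane as 0.  */
void predicate_to_mask (const bool pred[SIMDLEN], int mask[SIMDLEN])
{
  for (int lane = 0; lane < SIMDLEN; ++lane)
    mask[lane] = pred[lane] ? -1 : 0;
}

/* Model of an inbranch simdlen(4) clone: only lanes whose mask entry
   is non-zero call the scalar body; other lanes of OUT are untouched.  */
void fn5_inbranch_model (const int in[SIMDLEN], const int mask[SIMDLEN],
                         int out[SIMDLEN])
{
  for (int lane = 0; lane < SIMDLEN; ++lane)
    if (mask[lane] != 0)
      out[lane] = fn5_scalar (in[lane]);
}

/* Model of a predicated fn4 loop: every iteration carries a predicate,
   and the final iteration may have fewer than SIMDLEN live lanes.  */
void fn4_model (const int *a, int *b, int n)
{
  for (int i = 0; i < n; i += SIMDLEN)
    {
      bool pred[SIMDLEN];
      int in[SIMDLEN], mask[SIMDLEN], out[SIMDLEN];

      for (int lane = 0; lane < SIMDLEN; ++lane)
        {
          pred[lane] = i + lane < n;
          in[lane] = pred[lane] ? a[i + lane] : 0;
          out[lane] = 0;
        }

      /* This per-lane conversion stands in for the costly part.  */
      predicate_to_mask (pred, mask);
      fn5_inbranch_model (in, mask, out);

      for (int lane = 0; lane < SIMDLEN; ++lane)
        if (pred[lane])
          b[i + lane] = out[lane];
    }
}
```

Calling fn4_model (a, b, 6) runs one full group of four lanes and one partial group of two. The point of the model is that predicate_to_mask runs on every iteration, which is the per-call argument preparation overhead discussed above.
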
We don't currently do any simdclone costing and I don't have a clear
suggestion for how, given OpenMP has no mechanism that I know of to
expose the speedup of a simdclone over its scalar variant. So how would
we 'compare' a simdclone call with the extra overhead of argument
preparation vs scalar? Though at least we could prefer a call to a
different simdclone with less argument preparation. Anyways, I digress.

Other tests; these require aarch64-autovec-preference=2, so that also
has me worried less...

gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 --param aarch64-autovec-preference=2 -fopenmp-simd t.c -S

t.c:

#pragma omp declare simd simdlen(2) notinbranch
float __attribute__ ((const)) fn1(double);

void fn0 (float *a, float *b, int n)
{
     for (int i = 0; i < n; ++i)
         b[i] = fn1((double) a[i]);
}

#pragma omp declare simd simdlen(2) notinbranch
float __attribute__ ((const)) fn3(float);

void fn2 (float *a, double *b, int n)
{
     for (int i = 0; i < n; ++i)
         b[i] = (double) fn3(a[i]);
}

> Richard.
>
>>>
>>> That said, I wonder how we end up mixing things up in the first place.
>>>
>>> Richard.
>>
>