From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 13 Dec 2023 11:17:12 +0100 (CET)
From: Richard Biener
To: Juzhe-Zhong
cc: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com, jeffreyalaw@gmail.com
Subject: Re: [PATCH] Middle-end: Adjust decrement IV style partial vectorization COST model
In-Reply-To: <20231213092107.191733-1-juzhe.zhong@rivai.ai>
Message-ID: <44on3q56-415s-n6s6-4922-sr36p2s5p33r@fhfr.qr>
References: <20231213092107.191733-1-juzhe.zhong@rivai.ai>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Wed, 13 Dec 2023, Juzhe-Zhong wrote:

> Hi, before this patch, a simple conversion case for RVV codegen:
>
> foo:
>         ble     a2,zero,.L8
>         addiw   a5,a2,-1
>         li      a4,6
>         bleu    a5,a4,.L6
>         srliw   a3,a2,3
>         slli    a3,a3,3
>         add     a3,a3,a0
>         mv      a5,a0
>         mv      a4,a1
>         vsetivli        zero,8,e16,m1,ta,ma
> .L4:
>         vle8.v  v2,0(a5)
>         addi    a5,a5,8
>         vzext.vf2       v1,v2
>         vse16.v v1,0(a4)
>         addi    a4,a4,16
>         bne     a3,a5,.L4
>         andi    a5,a2,-8
>         beq     a2,a5,.L10
> .L3:
>         slli    a4,a5,32
>         srli    a4,a4,32
>         subw    a2,a2,a5
>         slli    a2,a2,32
>         slli    a5,a4,1
>         srli    a2,a2,32
>         add     a0,a0,a4
>         add     a1,a1,a5
>         vsetvli zero,a2,e16,m1,ta,ma
>         vle8.v  v2,0(a0)
>         vzext.vf2       v1,v2
>         vse16.v v1,0(a1)
> .L8:
>         ret
> .L10:
>         ret
> .L6:
>         li      a5,0
>         j       .L3
>
> This vectorization goes through the first loop:
>
>         vsetivli        zero,8,e16,m1,ta,ma
> .L4:
>         vle8.v  v2,0(a5)
>         addi    a5,a5,8
>         vzext.vf2       v1,v2
>         vse16.v v1,0(a4)
>         addi    a4,a4,16
>         bne     a3,a5,.L4
>
> Each iteration processes 8 elements.
>
> For scalable vectorization on a CPU with VLEN >= 128 bits, this is fine when VLEN = 128.
> But as soon as VLEN > 128 bits, it wastes CPU resources. E.g. with VLEN = 256 bits,
> only half of the vector units are working and the other half is idle.
>
> After investigation, I realized that I forgot to adjust COST for SELECT_VL.
> So, adjust COST for SELECT_VL style length vectorization. We adjust COST from 3 to 2, since
> after this patch:
>
> foo:
>         ble     a2,zero,.L5
> .L3:
>         vsetvli a5,a2,e16,m1,ta,ma     -----> SELECT_VL cost.
>         vle8.v  v2,0(a0)
>         slli    a4,a5,1                -----> additional shift of outcome SELECT_VL for memory address calculation.
>         vzext.vf2       v1,v2
>         sub     a2,a2,a5
>         vse16.v v1,0(a1)
>         add     a0,a0,a5
>         add     a1,a1,a4
>         bne     a2,zero,.L3
> .L5:
>         ret
>
> This patch is a simple fix that I previously forgot.
>
> Ok for trunk?

OK.

Richard.

> If not, I am going to adjust cost in backend cost model.
>
>         PR target/111317
>
> gcc/ChangeLog:
>
>         * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust for COST for decrement IV.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/costmodel/riscv/rvv/pr111317.c: New test.
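
For reference, the loop shape after the patch can be modeled in scalar C. This is only a sketch, not GCC or RVV code: select_vl and widen_copy are hypothetical names, select_vl is assumed to return min(remaining, vlmax) (a simplification of what vsetvli may return), and unsigned char is used so that the widening matches the zero extension done by vzext.vf2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SELECT_VL / vsetvli: the number of elements
   to process in this iteration, capped at vlmax.  Assumption: the result
   is simply min (remaining, vlmax).  */
size_t
select_vl (size_t remaining, size_t vlmax)
{
  return remaining < vlmax ? remaining : vlmax;
}

/* Scalar model of the decrementing-IV loop.  Widens n bytes from a into
   16-bit elements in b and returns the number of loop iterations.  */
size_t
widen_copy (const unsigned char *a, short *b, size_t n, size_t vlmax)
{
  size_t iterations = 0;

  assert (vlmax > 0);                   /* otherwise the loop never ends */
  while (n != 0)                        /* bne a2,zero,.L3 */
    {
      size_t vl = select_vl (n, vlmax); /* vsetvli a5,a2,e16,m1,ta,ma */
      for (size_t i = 0; i < vl; i++)   /* vle8.v, vzext.vf2, vse16.v */
        b[i] = (short) a[i];
      n -= vl;                          /* sub a2,a2,a5 */
      a += vl;                          /* add a0,a0,a5 */
      b += vl;                          /* add a1,a1,a4 with a4 = vl << 1 */
      iterations++;
    }
  return iterations;
}
```

With n = 19, a call with vlmax = 8 (VLEN = 128, e16, m1) takes three iterations and a call with vlmax = 16 (VLEN = 256) takes two, with no separate epilogue in either case. That is the property the fixed 8-element loop lacks on wider hardware.
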
>
> ---
>  .../gcc.dg/vect/costmodel/riscv/rvv/pr111317.c  | 12 ++++++++++++
>  gcc/tree-vect-loop.cc                           | 17 ++++++++++++++---
>  2 files changed, 26 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c
> new file mode 100644
> index 00000000000..d4bea242a9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize --param=riscv-autovec-lmul=m1" } */
> +
> +void
> +foo (char *__restrict a, short *__restrict b, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    b[i] = (short) a[i];
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e16,\s*m1,\s*t[au],\s*m[au]} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 6261cd1be1d..19e38b8637b 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4870,10 +4870,21 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
>                if (partial_load_store_bias != 0)
>                  body_stmts += 1;
>
> -              /* Each may need two MINs and one MINUS to update lengths in body
> -                 for next iteration.  */
> +              unsigned int length_update_cost = 0;
> +              if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +                /* For decrement IV style, we use a single SELECT_VL since
> +                   beginning to calculate the number of elements need to be
> +                   processed in current iteration, and a SHIFT operation to
> +                   compute the next memory address instead of adding vectorization
> +                   factor.  */
> +                length_update_cost = 2;
> +              else
> +                /* For increment IV stype, Each may need two MINs and one MINUS to
> +                   update lengths in body for next iteration.  */
> +                length_update_cost = 3;
> +
>                if (need_iterate_p)
> -                body_stmts += 3 * num_vectors;
> +                body_stmts += length_update_cost * num_vectors;
>              }
>
>            (void) add_stmt_cost (target_cost_data, prologue_stmts,
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
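
The arithmetic changed by the tree-vect-loop.cc hunk can be illustrated with a small standalone helper. This is a sketch under assumptions, not the GCC implementation: length_update_stmts is a hypothetical name, and it only covers the contribution of one group of num_vectors length-controlled vectors to the body statement count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: the number of body statements
   charged for updating the lengths of num_vectors vectors.
   Decrementing IV: one SELECT_VL plus one SHIFT per vector (2).
   Incrementing IV: two MINs plus one MINUS per vector (3).  */
unsigned int
length_update_stmts (bool decrementing_iv, bool need_iterate,
                     unsigned int num_vectors)
{
  assert (num_vectors >= 1);   /* assumed: at least one vector per group */

  unsigned int length_update_cost = decrementing_iv ? 2 : 3;

  /* A loop that does not iterate needs no length update at all.  */
  return need_iterate ? length_update_cost * num_vectors : 0;
}
```

For a single vector, the charge drops from 3 statements to 2 when the decrementing IV is in use. With two vectors it drops from 6 to 4.
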