From: "juzhe.zhong at rivai dot ai"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114809] [RISC-V RVV] Counting elements might be simpler
Date: Mon, 22 Apr 2024 22:23:52 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809

JuzheZhong changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juzhe.zhong at rivai dot ai

--- Comment #3 from JuzheZhong ---
I noticed this missed peephole optimization a long time ago and have already filed a PR for it:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014

The issue will go away once Richard Sandiford's (Arm) late-combine pass is merged in GCC 15.
Also, GCC supports dynamic LMUL selection with -mrvv-max-lmul=dynamic:
https://godbolt.org/z/646nYoKbv

ASM:

count_chars(char const*, unsigned long, char):
        beq       a1,zero,.L4
        vsetvli   a4,zero,e8,m1,ta,ma
        vmv.v.x   v1,a2
        vsetvli   zero,zero,e64,m8,ta,ma
        vmv.v.i   v8,0
.L3:
        vsetvli   a5,a1,e8,m1,ta,ma
        vle8.v    v0,0(a0)
        sub       a1,a1,a5
        add       a0,a0,a5
        vmseq.vv  v0,v0,v1
        vsetvli   zero,zero,e64,m8,tu,mu
        vadd.vi   v8,v8,1,v0.t
        bne       a1,zero,.L3
        vsetvli   a5,zero,e64,m8,ta,ma
        li        a4,0
        vmv.s.x   v1,a4
        vredsum.vs v8,v8,v1
        vmv.x.s   a0,v8
        ret
.L4:
        li        a0,0
        ret

GCC picks LMUL = 8 because, based on the loop's register pressure, doing so causes no additional register spilling.
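To make the ASM easier to follow, here is a rough scalar C++ model of what the vectorized loop computes. It is a sketch under assumptions, not GCC's actual transformation: count_chars_scalar is only a guess at the original test case (which is not quoted in this comment), the helpers vlmax and count_chars_model are hypothetical names, VLEN defaults to an assumed 128 bits, and taking min(n, VLMAX) per iteration is just one of the values vsetvli is allowed to return. The point it illustrates is that e8,m1 and e64,m8 share the same SEW/LMUL ratio (8), so they give the same VLMAX. That is why "vsetvli zero,zero,e64,m8,tu,mu" can switch to 64-bit counters without changing VL.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// VLMAX = (VLEN / SEW) * LMUL. VLEN is an assumed parameter here;
// real hardware exposes it through the vlenb CSR.
constexpr std::size_t vlmax(std::size_t vlen, std::size_t sew, std::size_t lmul) {
    return vlen / sew * lmul;
}

// Hypothetical scalar source matching the demangled signature
// count_chars(char const*, unsigned long, char).
std::size_t count_chars_scalar(const char* s, std::size_t n, char c) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        count += (s[i] == c);
    return count;
}

// Hypothetical scalar model of the vector loop: one 64-bit counter per lane,
// strip-mined in chunks of at most VLMAX elements, then a sum reduction.
std::uint64_t count_chars_model(const char* s, std::size_t n, char c,
                                std::size_t vlen = 128) {
    // e8,m1 and e64,m8 have the same SEW/LMUL ratio, hence the same VLMAX.
    const std::size_t lanes = vlmax(vlen, 8, 1);
    std::vector<std::uint64_t> acc(lanes, 0);       // vmv.v.i v8,0
    while (n > 0) {
        std::size_t vl = n < lanes ? n : lanes;     // vsetvli a5,a1,e8,m1 (one legal choice)
        for (std::size_t i = 0; i < vl; ++i)
            if (s[i] == c)                          // vle8.v + vmseq.vv builds the mask
                acc[i] += 1;                        // vadd.vi ...,v0.t; other lanes keep their value (tu,mu)
        s += vl;                                    // add a0,a0,a5
        n -= vl;                                    // sub a1,a1,a5
    }
    std::uint64_t total = 0;                        // vmv.s.x v1,a4 with a4 = 0
    for (std::uint64_t lane : acc) total += lane;   // vredsum.vs
    return total;                                   // vmv.x.s a0,v8
}
```

Because the counters are 64 bits wide (matching the unsigned long result), per-lane overflow is not a concern, but the accumulator occupies a whole m8 register group (v8-v15). That register cost is what the dynamic LMUL heuristic has to weigh against spilling.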