From: "juzhe.zhong at rivai dot ai"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.
Date: Wed, 24 Jan 2024 14:42:31 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583

--- Comment #1 from JuzheZhong ---
Interesting: with Clang, only the RISC-V target vectorizes this loop.

I think there are two topics:

1. Support vectorization of this code in the loop vectorizer.
2. Transform gather/scatter into strided load/store for RISC-V.

For the second topic: LLVM does this with a RISC-V target-specific lowering
pass, "RISC-V gather/scatter lowering" (riscv-gather-scatter-lowering).
This is the code in the LLVM RISC-V backend:

    if (II->getIntrinsicID() == Intrinsic::masked_gather)
      Call = Builder.CreateIntrinsic(
          Intrinsic::riscv_masked_strided_load,
          {DataType, BasePtr->getType(), Stride->getType()},
          {II->getArgOperand(3), BasePtr, Stride, II->getArgOperand(2)});
    else
      Call = Builder.CreateIntrinsic(
          Intrinsic::riscv_masked_strided_store,
          {DataType, BasePtr->getType(), Stride->getType()},
          {II->getArgOperand(0), BasePtr, Stride, II->getArgOperand(3)});

I once tried to support strided load/store in the GCC loop vectorizer, but
that approach did not seem to be acceptable. Maybe we can support strided
load/store by following the LLVM approach?

Btw, LLVM's RISC-V gather/scatter lowering doesn't do a perfect job here:

    vid.v v8
    vmul.vx v8, v8, a3
    ....
    vsoxei64.v v10, (s2), v14

This is an ordered indexed store, which is very costly in hardware. It
should be an unordered indexed store or a strided store.

Anyway, I think we should first investigate how to support vectorization of
lbm in the loop vectorizer.
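
To illustrate the second topic: the vid.v / vmul.vx pair in the assembly above builds an index vector of the form i * stride, so the indexed access it feeds touches the same addresses as a strided access with that one scalar stride. The following is only a scalar C model of that equivalence, not GCC or LLVM code. The function names, the element count and the stride of 20 doubles are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Gather model: one explicit byte offset per element, which is what an
   indexed vector load consumes. */
static void gather_load(double *dst, const char *base,
                        const uint64_t *byte_offsets, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = *(const double *)(base + byte_offsets[i]);
}

/* Strided model: a single scalar byte stride replaces the offset vector. */
static void strided_load(double *dst, const char *base,
                         uint64_t byte_stride, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = *(const double *)(base + i * byte_stride);
}

/* Hypothetical self-check: when the offsets are i * stride, both models
   read the same elements. */
static void check_strided_matches_gather(void)
{
    enum { N = 8, STRIDE_ELEMS = 20 };  /* illustrative values only */
    double cells[N * STRIDE_ELEMS];
    uint64_t offsets[N];
    double from_gather[N], from_strided[N];
    const uint64_t stride = STRIDE_ELEMS * sizeof(double);

    for (size_t i = 0; i < N * STRIDE_ELEMS; i++)
        cells[i] = (double)i;

    /* Same shape as "vid.v; vmul.vx": offsets[i] = i * stride. */
    for (size_t i = 0; i < N; i++)
        offsets[i] = i * stride;

    gather_load(from_gather, (const char *)cells, offsets, N);
    strided_load(from_strided, (const char *)cells, stride, N);

    for (size_t i = 0; i < N; i++)
        assert(from_gather[i] == from_strided[i]);
}
```

Calling check_strided_matches_gather() from a test driver exercises both models on the same buffer. A lowering pass has to prove the same property symbolically, namely that the index vector is a step sequence scaled by a loop-invariant value, before it can replace the gather/scatter with a strided access.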