From mboxrd@z Thu Jan  1 00:00:00 1970
From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision
Date: Mon, 16 Oct 2023 09:29:38 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

--- Comment #8 from rguenther at suse dot de ---
On Mon, 16 Oct 2023, rdapp at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
>
> --- Comment #7 from Robin Dapp ---
>   vectp.4_188 = x_50(D);
>   vect__1.5_189 = MEM [(int *)vectp.4_188];
>   mask__2.6_190 = { 1, 1, 1, 1, 1, 1, 1, 1 } == vect__1.5_189;
>   mask_patt_156.7_191 = VIEW_CONVERT_EXPR >(mask__2.6_190);
>   _1 = *x_50(D);
>   _2 = _1 == 1;
>   vectp.9_192 = y_51(D);
>   vect__3.10_193 = MEM [(short int *)vectp.9_192];
>   mask__4.11_194 = { 2, 2, 2, 2, 2, 2, 2, 2 } == vect__3.10_193;
>   mask_patt_157.12_195 = mask_patt_156.7_191 & mask__4.11_194;
>   vect_patt_158.13_196 = VEC_COND_EXPR 1, 1, 1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
>   vect_patt_159.14_197 = (vector(8) int) vect_patt_158.13_196;
>
> This yields the following assembly:
>
>         vsetivli        zero,8,e32,m2,ta,ma
>         vle32.v         v2,0(a0)
>         vmv.v.i         v4,1
>         vle16.v         v1,0(a1)
>         vmseq.vv        v0,v2,v4
>         vsetvli         zero,zero,e16,m1,ta,ma
>         vmseq.vi        v1,v1,2
>         vsetvli         zero,zero,e32,m2,ta,ma
>         vmv.v.i         v2,0
>         vmand.mm        v0,v0,v1
>         vmerge.vvm      v2,v2,v4,v0
>         vse32.v         v2,0(a0)
>
> Apart from CSE'ing v4 this looks pretty good to me.  My connection is
> really poor at the moment so I cannot quickly compare what aarch64 does
> for that example.

That looks reasonable.  Note this then goes through vectorizable_assignment
as a no-op move.  The question is whether we can arrive here with
signed bool : 2 vs. _Bool : 2 somehow (I wonder how we arrive with
signed bool : 1 here - that's from pattern recog, right?  Why didn't that
produce a COND_EXPR for this?).

I think for more thorough testing the condition should change to

      /* But a conversion that does not change the bit-pattern is ok.  */
      && !(INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
           && INTEGRAL_TYPE_P (TREE_TYPE (op))
           && ((TYPE_PRECISION (TREE_TYPE (scalar_dest))
                > TYPE_PRECISION (TREE_TYPE (op)))
               && TYPE_UNSIGNED (TREE_TYPE (op))))
          || TYPE_PRECISION (TREE_TYPE (scalar_dest))
             == TYPE_PRECISION (TREE_TYPE (op)))))

rather than just doing >=, which would be odd (why allow skipping
sign-extending from the unsigned MSB but not allow skipping
zero-extending from it)