From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
    id 655AB385C41C; Fri, 30 Sep 2022 12:30:46 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 655AB385C41C
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
    s=default; t=1664541046;
    bh=ILnzlupcDVx0wQwtZg+i07qO4Hec2k+pgb7VyGkU+Gg=;
    h=From:To:Subject:Date:In-Reply-To:References:From;
    b=HOe9Z9WexxvRWFGc5lpjz8t9YM8fdjnsXB7YCkyxPkQtryw36+kqDMneHft0EzPO/
     bP10R38HTP/5zFibM8flAwNZ9Tt7KkVcaSCRLAcyf+yI+zq6Nj57Hf9HYSdk4g3E3i
     R6qwrgQiPbccJUP68f4acxvy1z7O5+1INbhduQyY=
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107096] Fully masking vectorization with
 AVX512 ICEs gcc.dg/vect/vect-over-widen-*.c
Date: Fri, 30 Sep 2022 12:30:45 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cf_gcctarget cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107096

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #1 from Richard Biener ---
Take the simplified

void foo (int * __restrict dst, short *src, int n)
{
  for (int i = 0; i < n; ++i)
    dst[i] = src[2*i] + src[2*i+1];
}

here we get vect_record_loop_mask twice with nvectors == 2 and
V16HImode for the load which populates masks[2].  Then we once get V8SImode
but also with nvectors == 2 which leaves the data unadjusted since it looks
at masks[2] as well but if it were to come first we'd have recorded a
different mask vector type.  The masks seem to be constructed in a way to
produce two bits per lane (but we still apply it naively?!) and the V_C_E
does actually look wrong to me.  Huh.

With SVE I seem to get (besides a .LOAD_LANE version) permutes of the mask
vector:

  loop_mask_116 = VEC_PERM_EXPR ;
  loop_mask_117 = VEC_PERM_EXPR ;
...
  vect__69.20_118 = .MASK_LOAD (vectp_src.18_114, 16B, loop_mask_116);
  vectp_src.18_119 = vectp_src.18_114 + POLY_INT_CST [8, 8];
  vect__69.21_120 = .MASK_LOAD (vectp_src.18_119, 16B, loop_mask_117);
...
  .MASK_STORE (vectp_dst.25_129, 32B, loop_mask_131, vect__79.24_125);

so the original mask provider here is the larger element vector type and
the smaller element masks are produced from that.  That's something I'd
expect for AVX512 as well, not sure where it goes "wrong".

See PR107093 for a patch implementing WHILE_ULT for AVX512.
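
For illustration only, here is a scalar C sketch of the relationship
described above, where the mask for the larger element type is the provider
and the mask for the smaller elements is derived from it.  It is not GCC
code.  It assumes one mask bit per lane (as in an AVX512 mask register) and
16 int lanes per vector iteration, i.e. two V8SI vectors fed by two V16HI
loads.  The helper names int_lane_mask and short_lane_mask are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC function: a WHILE_ULT-style mask over
   NLANES int lanes.  Bit i is set when lane i is below REMAINING, the
   number of scalar iterations still to execute.  */
static uint32_t
int_lane_mask (unsigned remaining, unsigned nlanes)
{
  assert (nlanes <= 16);
  uint32_t mask = 0;
  for (unsigned i = 0; i < nlanes; ++i)
    if (i < remaining)
      mask |= UINT32_C (1) << i;
  return mask;
}

/* Hypothetical helper: derive the mask for the 2 * NLANES short lanes by
   duplicating every int-lane bit, since int lane i consumes src[2*i]
   and src[2*i+1].  */
static uint32_t
short_lane_mask (uint32_t int_mask, unsigned nlanes)
{
  assert (nlanes <= 16);
  uint32_t mask = 0;
  for (unsigned i = 0; i < nlanes; ++i)
    if (int_mask & (UINT32_C (1) << i))
      mask |= UINT32_C (3) << (2 * i);
  return mask;
}
```

With 11 iterations remaining, the int mask has 11 bits set and the derived
short mask has 22.  Its low 16 bits would guard the first V16HI load and the
high 16 bits the second.  Reusing the int mask unchanged for the short lanes
would instead cover only the first 11 shorts.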