From: "tnfchris at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/111156] [14 Regression] aarch64 aarch64/sve/mask_struct_store_4.c failures
Date: Thu, 15 Feb 2024 08:39:59 +0000
X-Bugzilla-Keywords: testsuite-fail, wrong-code
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P1
X-Bugzilla-Target-Milestone: 14.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111156

--- Comment #14 from Tamar Christina ---
(In reply to Richard Biener from comment #13)
> I didn't add STMT_VINFO_SLP_VECT_ONLY, I'm quite sure we can now do both SLP
> of masked loads and stores, so yes, STMT_VINFO_SLP_VECT_ONLY (when we formed
> a DR group of stmts we cannot combine without SLP as the masks are not equal)
> should be set for both loads and stores.
>
> The can_group_stmts_p checks as present seem correct here (but the dump
> should not say "Load" but maybe "Access")

I guess I'm wondering because of this usage:

          /* Check that the data-refs have same first location (except init)
             and they are both either store or load (not load and store,
             not masked loads or stores).  */
          if (DR_IS_READ (dra) != DR_IS_READ (drb)
              || data_ref_compare_tree (DR_BASE_ADDRESS (dra),
                                        DR_BASE_ADDRESS (drb)) != 0
              || data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb)) != 0
              || !can_group_stmts_p (stmtinfo_a, stmtinfo_b, true))
            break;

We don't exit there now for non-SLP.

>
> So what's the testcase comment#9 talks about?

You should be able to reproduce it with:

---
typedef __SIZE_TYPE__ size_t;
typedef signed char int8_t;
typedef unsigned short uint16_t;

void __attribute__((noinline, noclone))
test_i8_i8_i16_2(int8_t *__restrict dest, int8_t *__restrict src,
                 uint16_t *__restrict cond, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      if (cond[i] < 8)
        dest[i * 2] = src[i];
      if (cond[i] > 2)
        dest[i * 2 + 1] = src[i];
    }
}

void __attribute__((noinline, noclone))
test_i8_i8_i16_2_1(volatile int8_t * dest, volatile int8_t * src,
                   volatile uint16_t * cond, size_t n)
{
#pragma GCC novector
  for (size_t i = 0; i < n; ++i)
    {
      if (cond[i] < 8)
        dest[i * 2] = src[i];
      if (cond[i] > 2)
        dest[i * 2 + 1] = src[i];
    }
}

#define size 16
int8_t srcarray[size];
uint16_t maskarray[size];
int8_t destarray[size*2];
int8_t destarray1[size*2];

int main()
{
#pragma GCC novector
  for(int i = 0; i < size; i++)
    {
      maskarray[i] = i == 10 ? 0 : (i == 5 ? 9 : (21111*i) & 0xff);
      srcarray[i] = i;
    }

#pragma GCC novector
  for(int i = 0; i < size*2; i++)
    {
      destarray[i] = i;
      destarray1[i] = i;
    }

  test_i8_i8_i16_2(destarray, srcarray, maskarray, size);
  test_i8_i8_i16_2_1(destarray1, srcarray, maskarray, size);

#pragma GCC novector
  for(int i = 0; i < size*2; i++)
    {
      if (destarray[i] != destarray1[i])
        __builtin_abort();
    }
}
---

since really only one of the functions needs to vectorize.
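
For illustration only, here is a small sketch of why the two stores in the testcase end up as a group whose masks are not equal. It recomputes the testcase's maskarray and counts the iterations where the condition guarding dest[i * 2] disagrees with the one guarding dest[i * 2 + 1]. The helper names fill_testcase_masks and count_mask_mismatches are made up for this sketch and are not part of GCC or of the testcase.

```c
#include <stdint.h>

/* Hypothetical helper: fill COND with the same values the testcase's
   main() writes into maskarray.  */
static void
fill_testcase_masks (uint16_t *cond, int n)
{
  for (int i = 0; i < n; i++)
    cond[i] = i == 10 ? 0 : (i == 5 ? 9 : (21111 * i) & 0xff);
}

/* Hypothetical helper: count the iterations where the two store
   conditions disagree, i.e. where exactly one of dest[i * 2] and
   dest[i * 2 + 1] would be written.  */
static int
count_mask_mismatches (const uint16_t *cond, int n)
{
  int mismatches = 0;
  for (int i = 0; i < n; i++)
    {
      int mask_even = cond[i] < 8;  /* guards dest[i * 2] */
      int mask_odd = cond[i] > 2;   /* guards dest[i * 2 + 1] */
      if (mask_even != mask_odd)
        mismatches++;
    }
  return mismatches;
}
```

With the 16 values from the testcase, no element falls in the range 3..7, so the two masks disagree on every iteration. That matches the situation described in the quoted text: a DR group of masked stores whose masks are not equal.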