From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/98138] BB vect fail to SLP one case
Date: Wed, 04 Aug 2021 10:31:15 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138

--- Comment #9 from Richard Biener ---
The full satd_8x4 looks like the following; the 2nd loop isn't to be
disregarded:

typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;

/* 4-point Hadamard butterfly: s0..s3 are inputs, d0..d3 are outputs.  */
#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
    int t0 = s0 + s1;\
    int t1 = s0 - s1;\
    int t2 = s2 + s3;\
    int t3 = s2 - s3;\
    d0 = t0 + t2;\
    d2 = t0 - t2;\
    d1 = t1 + t3;\
    d3 = t1 - t3;\
}

/* Absolute value of two 16-bit values packed into one 32-bit word.  */
static inline uint32_t abs2( uint32_t a )
{
    uint32_t s = ((a>>15)&0x10001)*0xffff;
    return (a+s)^s;
}

int x264_pixel_satd_8x4( uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2 )
{
    uint32_t tmp[4][4];
    uint32_t a0, a1, a2, a3;
    int sum = 0;
    /* 1st loop: per row, pack the differences of columns 0-3 (low half)
       and 4-7 (high half) and transform horizontally.  */
    for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 )
    {
        a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
        a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
        a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
        a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
        HADAMARD4( tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0,a1,a2,a3 );
    }
    /* 2nd loop: transform vertically and accumulate absolute values.  */
    for( int i = 0; i < 4; i++ )
    {
        HADAMARD4( a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i] );
        sum += abs2(a0) + abs2(a1) + abs2(a2) + abs2(a3);
    }
    /* Add the two packed 16-bit halves of the sum, then halve.  */
    return (((uint16_t)sum) + ((uint32_t)sum>>16)) >> 1;
}
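
The testcase keeps two signed 16-bit values in each 32-bit word, and abs2 takes the absolute value of both halves at once. A minimal sketch of how one might check that behaviour in isolation follows. The helpers pack2 and abs2_mismatches are hypothetical names introduced only for this illustration and are not part of the bug report. pack2 packs in wrapping unsigned arithmetic, which is assumed here to match what the first loop computes once its result is stored in a uint32_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* abs2 as in the testcase: bits 15 and 31 are the two lane signs; s is
   0xffff in every negative lane, and (a + s) ^ s negates those lanes.  */
static inline uint32_t abs2(uint32_t a)
{
    uint32_t s = ((a >> 15) & 0x10001) * 0xffff;
    return (a + s) ^ s;
}

/* Hypothetical helper: pack lo + (hi << 16) with unsigned wrap-around,
   so a negative lo borrows from the high half.  */
static inline uint32_t pack2(int lo, int hi)
{
    return (uint32_t)lo + ((uint32_t)hi << 16);
}

/* Hypothetical helper: count the (lo, hi) pairs in [-limit, limit] for
   which abs2 disagrees with taking abs() of each half separately.
   Assumes limit is below 32768 so each value fits its 16-bit lane.  */
int abs2_mismatches(int limit)
{
    int bad = 0;
    for (int lo = -limit; lo <= limit; lo++)
        for (int hi = -limit; hi <= limit; hi++)
            if (abs2(pack2(lo, hi)) != pack2(abs(lo), abs(hi)))
                bad++;
    return bad;
}
```

For instance, abs2(pack2(-3, 5)) should equal pack2(3, 5). In satd_8x4 each half is a sum of 16 signed byte differences, so its magnitude is at most 16*255 = 4080, which fits comfortably within a 16-bit lane.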