From: "rsandifo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/113196] New: [14 Regression] Failure to use ushll{,2}
Date: Tue, 02 Jan 2024 10:06:59 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113196

            Bug ID: 113196
           Summary: [14 Regression] Failure to use ushll{,2}
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
                CC: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*-*-*

For this testcase, adapted from the one
for PR110625:

int test(unsigned array[4][4]);

int foo(unsigned short *a, unsigned long n)
{
  unsigned array[4][4];

  for (unsigned i = 0; i < 4; i++, a += 4)
    {
      array[i][0] = a[0] << 6;
      array[i][1] = a[1] << 6;
      array[i][2] = a[2] << 6;
      array[i][3] = a[3] << 6;
    }

  return test(array);
}

GCC now uses:

        mov     x1, x0
        stp     x29, x30, [sp, -80]!
        movi    v30.4s, 0
        mov     x29, sp
        ldp     q0, q29, [x1]
        add     x0, sp, 16
        zip1    v1.8h, v0.8h, v30.8h
        zip1    v31.8h, v29.8h, v30.8h
        zip2    v0.8h, v0.8h, v30.8h
        zip2    v29.8h, v29.8h, v30.8h
        shl     v1.4s, v1.4s, 6
        shl     v31.4s, v31.4s, 6
        shl     v0.4s, v0.4s, 6
        shl     v29.4s, v29.4s, 6
        stp     q1, q0, [sp, 16]
        stp     q31, q29, [sp, 48]
        bl      test(unsigned int (*) [4])
        ldp     x29, x30, [sp], 80
        ret

whereas previously it used USHLL{,2}:

        mov     x1, x0
        stp     x29, x30, [sp, -80]!
        mov     x29, sp
        ldp     q1, q0, [x1]
        add     x0, sp, 16
        ushll   v3.4s, v1.4h, 6
        ushll   v2.4s, v0.4h, 6
        ushll2  v1.4s, v1.8h, 6
        ushll2  v0.4s, v0.8h, 6
        stp     q3, q1, [sp, 16]
        stp     q2, q0, [sp, 48]
        bl      test(unsigned int (*) [4])
        ldp     x29, x30, [sp], 80
        ret

This changed with g:f26f92b534f9, which expanded zero-extensions to ZIPs.
The patch included *ADDW patterns for the new representation, but it looks
like there are several more that should be included for full coverage.

AIUI, the point of lowering to ZIPs during expand was to allow the zero to
be hoisted.  An alternative might be to lower during split, but forcibly
hoist the zero by inserting around the FUNCTION_BEG note.  We could then
cache the insn that does that for manual CSE.

Godbolt link: https://godbolt.org/z/vzfnebMhb
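
For reference, the equivalence between the two sequences can be sketched in portable C. This is only a scalar model, assuming the architectural (little-endian) lane numbering: a ZIP1/ZIP2 of the input with an all-zero vector, viewed as .4s and followed by SHL, should produce the same lanes as USHLL/USHLL2. The function names (model_ushll, model_zip, model_zip_shl, check_equivalence) are hypothetical and are not GCC internals or ACLE intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative models only: the function names are hypothetical and are
   not GCC internals or ACLE intrinsics.  Lanes are numbered as in the
   architectural (little-endian) register view.  */

/* USHLL (high == 0) or USHLL2 (high == 1): zero-extend four 16-bit
   lanes from one half of the input to 32 bits, then shift left.  */
static inline void
model_ushll (uint32_t out[4], const uint16_t in[8], int high, unsigned shift)
{
  for (int i = 0; i < 4; i++)
    out[i] = (uint32_t) in[(high ? 4 : 0) + i] << shift;
}

/* ZIP1 (high == 0) or ZIP2 (high == 1) on .8h lanes: interleave one
   half of A with the same half of B.  */
static inline void
model_zip (uint16_t out[8], const uint16_t a[8], const uint16_t b[8],
           int high)
{
  for (int i = 0; i < 4; i++)
    {
      out[2 * i] = a[(high ? 4 : 0) + i];
      out[2 * i + 1] = b[(high ? 4 : 0) + i];
    }
}

/* ZIP with an all-zero vector, reinterpret as .4s, then SHL.  */
static inline void
model_zip_shl (uint32_t out[4], const uint16_t in[8], int high,
               unsigned shift)
{
  static const uint16_t zero[8] = { 0 };  /* models "movi v30.4s, 0" */
  uint16_t zipped[8];

  model_zip (zipped, in, zero, high);
  for (int i = 0; i < 4; i++)
    {
      /* Each .4s lane is an even .8h lane (low bits) plus the odd
         .8h lane that follows it (high bits, zero here).  */
      uint32_t lane = zipped[2 * i] | ((uint32_t) zipped[2 * i + 1] << 16);
      out[i] = lane << shift;
    }
}

/* Assert that both sequences agree on both halves of IN.  */
static inline void
check_equivalence (const uint16_t in[8], unsigned shift)
{
  for (int high = 0; high < 2; high++)
    {
      uint32_t widened[4], zipped[4];

      model_ushll (widened, in, high, shift);
      model_zip_shl (zipped, in, high, shift);
      for (int i = 0; i < 4; i++)
        assert (widened[i] == zipped[i]);
    }
}
```

Calling check_equivalence (in, 6) on any eight-element input exercises both the ZIP1/USHLL and ZIP2/USHLL2 pairings with the shift amount used in the testcase. The model says nothing about which form is cheaper; it only shows why a pattern that matches ZIP-with-zero plus SHL could be rewritten to USHLL{,2}.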