From: "wilco at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/98891] [10/11 regression] Neon logical operations not vectorized in DImode since g:cdfc0e863a03698a80c74896cbdc9f5c8c652e64
Date: Wed, 17 Mar 2021 13:09:26 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98891

--- Comment #4 from Wilco ---
(In reply to Jakub Jelinek from comment #1)
> Reduced testcase:
> extern unsigned long long a, b, c;
>
> void
> foo (void)
> {
>   a = b | ~c;
> }
>
> Seems this is the usual dilemma between split double-word operations early
> vs. split it late, each has its advantages and serious disadvantages.
> By splitting early, combiner can't really do much with it, it is split into
> loads, not, or and store of the halves separately and combiner doesn't see
> the two halves together, one would need essentially vectorization on RTL to
> match that.

Splitting early is required since it results in much more efficient code.
However, the real underlying problem is the concept that a type can map to
different register files. Generally a compiler must decide the register file
for each operand before register allocation, but GCC does this during register
allocation. And it does it badly, with incomplete knowledge and way too many
costing hacks. To get decent code for AArch64 we had to add special hooks to
force the allocator to strongly prefer allocating integer types to integer
registers and FP/SIMD types to FP/SIMD registers.
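
For illustration only, the early split discussed in the quoted comment can be
modelled in plain C. This is a hand-written sketch, not GCC output and not RTL:
the names orn_dword, orn_split and orn_selfcheck are hypothetical, and the only
point is that the 64-bit "b | ~c" decomposes into two independent 32-bit
operations on the halves, which is why a pass that sees the halves separately
no longer sees one DImode operation.

```c
#include <assert.h>
#include <stdint.h>

/* Double-word form, as in the reduced testcase: one 64-bit OR-NOT. */
static uint64_t orn_dword(uint64_t b, uint64_t c)
{
    return b | ~c;
}

/* Hand-split form (illustrative model of splitting early):
   each 32-bit half is loaded, inverted and ORed on its own. */
static uint64_t orn_split(uint64_t b, uint64_t c)
{
    uint32_t b_lo = (uint32_t)b;
    uint32_t b_hi = (uint32_t)(b >> 32);
    uint32_t c_lo = (uint32_t)c;
    uint32_t c_hi = (uint32_t)(c >> 32);

    uint32_t r_lo = b_lo | ~c_lo;   /* low half: not, then or */
    uint32_t r_hi = b_hi | ~c_hi;   /* high half: not, then or */

    /* Recombine the halves into the 64-bit result. */
    return ((uint64_t)r_hi << 32) | r_lo;
}

/* Small self-check that both forms agree on a few inputs. */
static void orn_selfcheck(void)
{
    assert(orn_split(0, 0) == orn_dword(0, 0));
    assert(orn_split(UINT64_MAX, 0) == orn_dword(UINT64_MAX, 0));
    assert(orn_split(0x0123456789abcdefULL, 0xfedcba9876543210ULL)
           == orn_dword(0x0123456789abcdefULL, 0xfedcba9876543210ULL));
}
```

Both functions compute the same value; the difference is only in how many
separate operations the compiler has to reason about, and in which register
file each of them can end up.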