From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 1904 invoked by alias); 5 Oct 2010 07:16:49 -0000
Received: (qmail 1888 invoked by uid 22791); 5 Oct 2010 07:16:47 -0000
X-SWARE-Spam-Status: No, hits=-2.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,MISSING_MID
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 05 Oct 2010 07:16:44 +0000
From: "ramana at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/43725] Poor instructions selection, scheduling and registers allocation for ARM NEON intrinsics
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: ramana at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
In-Reply-To:
References:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Date: Tue, 05 Oct 2010 07:16:00 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2010-10/txt/msg00371.txt.bz2
Message-ID: <20101005071600.26_E-a4RcI30BH4yTVgpVJrE6YURRDOPbVh1srUVg4s@z>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

--- Comment #4 from Ramana Radhakrishnan 2010-10-05 07:16:35 UTC ---
(In reply to comment #2)
> (In reply to comment #1)
> > So the compiler is correct not to be using vld1 for this code. The memory
> > format of int32x4_t is defined to be the format of a neon register that has
> > been filled from an array of int32 values and then stored to memory using VSTM
> > (or equivalent sequence).
> > The implication of all this is that int32x4_t does
> > not (necessarily) have the same memory layout as int32_t[4].
>
> Could you elaborate on this? Specifically about the case when memory format for
> VSTM and VST1 may differ.
>
> I thought that VST1 instruction could be always used as a replacement for VSTM,
> it is just a little bit less convenient in some cases because it is lacking
> some more advanced addressing modes. Moreover, VSTM is VFP instruction and VST1
> is NEON one. So I guess mixing VSTM with true NEON instructions may be
> additionally a bad idea (for performance reasons on Cortex-A9 or other
> processors?).

The ARM ARM states that VLDM / VSTM and VLDR / VSTR for 64-bit values are
compliant with VFPv2 / VFPv3 and Advanced SIMD, i.e. they can be executed by
both units. Thus there should be no performance regressions on the A9, AFAIK,
for VLDM and VSTM / VLDR and VSTR of 64-bit registers interleaved with other
NEON instructions.

cheers
Ramana
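
On the layout question quoted above, here is a minimal sketch of one way the two memory formats can differ. It is plain portable C, not NEON code, and it rests on an assumption not stated in this thread: that on a big-endian target VSTM / VSTR store a D register as one 64-bit value, while VST1.32 stores it as an array of 32-bit elements. The names pack_d, store_elements_be, store_doubleword_be and load_u32_be are hypothetical helpers for illustration only.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit D register holding two 32-bit lanes:
   lane 0 in bits 31:0, lane 1 in bits 63:32. */
uint64_t pack_d(uint32_t lane0, uint32_t lane1)
{
    return (uint64_t)lane0 | ((uint64_t)lane1 << 32);
}

/* VST1.32-like store (assumed big-endian target): each 32-bit lane is
   written in lane order, with big-endian byte order inside the lane. */
void store_elements_be(uint64_t d, uint8_t out[8])
{
    for (int lane = 0; lane < 2; lane++) {
        uint32_t v = (uint32_t)(d >> (32 * lane));
        for (int b = 0; b < 4; b++)
            out[4 * lane + b] = (uint8_t)(v >> (24 - 8 * b));
    }
}

/* VSTM / VSTR-like store (assumed big-endian target): the register is
   written as a single 64-bit big-endian value, most significant byte first. */
void store_doubleword_be(uint64_t d, uint8_t out[8])
{
    for (int b = 0; b < 8; b++)
        out[b] = (uint8_t)(d >> (56 - 8 * b));
}

/* Read one big-endian 32-bit array element back from memory. */
uint32_t load_u32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Under that assumption, storing pack_d(0x11111111, 0x22222222) with store_elements_be leaves lane 0 at the lower address, as int32_t[4] indexing would expect, whereas store_doubleword_be leaves lane 1 there. On a little-endian target the two stores would produce the same bytes, which is presumably why the difference is easy to miss.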