From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-333341-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 26763 invoked by alias); 8 Oct 2010 14:13:30 -0000
Received: (qmail 26752 invoked by uid 22791); 8 Oct 2010 14:13:29 -0000
X-SWARE-Spam-Status: No, hits=-2.4 required=5.0	tests=ALL_TRUSTED,AWL,BAYES_00,MISSING_MID
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 08 Oct 2010 14:13:25 +0000
From: "siarhei.siamashka at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/43725] Poor instructions selection, scheduling and registers allocation for ARM NEON intrinsics
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: siarhei.siamashka at gmail dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
In-Reply-To: <bug-43725-4@http.gcc.gnu.org/bugzilla/>
References: <bug-43725-4@http.gcc.gnu.org/bugzilla/>
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Date: Fri, 08 Oct 2010 14:13:00 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2010-10/txt/msg00730.txt.bz2
Message-ID: <20101008141300.K5ScD5cIb09I9aY-h33f5U3IO2q65Mh3BX-hFISmzrc@z>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725

--- Comment #5 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2010-10-08 14:13:08 UTC ---
(In reply to comment #3)
> On Mon, 4 Oct 2010, siarhei.siamashka at gmail dot com wrote:
> 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43725
> > 
> > --- Comment #2 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2010-10-04 22:59:56 UTC ---
> > (In reply to comment #1)
> > > So the compiler is correct not to be using vld1 for this code.  The memory
> > > format of int32x4_t is defined to be the format of a neon register that has
> > > been filled from an array of int32 values and then stored to memory using VSTM
> > > (or equivalent sequence).  The implication of all this is that int32x4_t does
> > > not (necessarily) have the same memory layout as int32_t[4].
> > 
> > Could you elaborate on this? Specifically about the case when memory format for
> > VSTM and VST1 may differ.
> 
> Big-endian.

OK, I see. It looks like VLDM/VSTM instructions could be replaced with
VLD1/VST1 (by artificially forcing the element size to 64 bits) in almost all
cases. The exception is SCTLR.A == 1, where the replacement could cause
unwanted alignment traps.
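
To illustrate the big-endian point, here is a small host-side model in C. It
is only a sketch, not actual NEON code: the function names (model_vld1_32,
model_vldm, lane) are made up for this example, a D register is modelled as a
uint64_t with lane 0 in the low 32 bits, and memory is assumed to hold
big-endian data. Under those assumptions it shows that VLD1.32 puts the
lower-address element into lane 0, while VLDM (and likewise VLD1 with the
element size forced to 64) puts it into lane 1.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value at p in big-endian byte order. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Load a 32-bit big-endian value from p. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical model of "vld1.32 {d0}, [p]" in big-endian mode:
 * each 32-bit element is loaded separately, and the element at the
 * lower address goes to lane 0 (the low half of the register). */
static uint64_t model_vld1_32(const uint8_t *p)
{
    return (uint64_t)load_be32(p) | ((uint64_t)load_be32(p + 4) << 32);
}

/* Hypothetical model of "vldm p, {d0}" (or vld1.64) in big-endian mode:
 * the register is loaded as one 64-bit big-endian value, so the word
 * at the lower address lands in the high half (lane 1). */
static uint64_t model_vldm(const uint8_t *p)
{
    return ((uint64_t)load_be32(p) << 32) | (uint64_t)load_be32(p + 4);
}

/* Extract 32-bit lane i (0 or 1) from a modelled D register. */
static uint32_t lane(uint64_t d, int i)
{
    return (uint32_t)(d >> (32 * i));
}
```

In this model the two loads only disagree on lane order. On a little-endian
system both would place the lower-address element in lane 0, which is why the
memory layouts of int32x4_t and int32_t[4] coincide there.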

But is it really necessary to suffer a performance penalty on little-endian
systems?

> I previously explained the issues with big-endian NEON vectors in GCC at 
> length:
> 
> http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html

Thanks for the link. Something seems to be seriously overengineered here. It
looks like you brought a problem upon yourself and are now valiantly trying to
solve it.

Does (efficient) support for NEON intrinsics on big-endian systems even have
any practical value? Maybe it makes sense to get reasonable performance on
little-endian systems first. To me it looks like you are just chasing two
hares...