Date: Mon, 12 May 2003 23:16:00 -0000
Message-ID: <20030512231601.18429.qmail@sources.redhat.com>
To: hubicka@gcc.gnu.org
Cc: gcc-prs@gcc.gnu.org
From: Kevin J Bowers
Subject: Re: target/10691: Invalid assembly emitted when using _m128 datatypes on x86

The following reply was made to PR target/10691; it has been noted by GNATS.

To: gcc-prs@gcc.gnu.org, hubicka@gcc.gnu.org, kbowers@lanl.gov, gcc-bugs@gcc.gnu.org, gcc-gnats@gcc.gnu.org
Date: Mon, 12 May 2003 17:13:51 -0600

http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=10691

Brief follow-up: the problem has cropped up in a couple of other situations. Here is some information that might help in isolating it.

Consider _mm_storel_pi in "xmmintrin.h":

  static __inline void
  _mm_storel_pi (__m64 *__P, __m128 __A)
  {
    __builtin_ia32_storelps ((__v2si *)__P, (__v4sf)__A);
  }

In a -g compile, it appears that this inline function is not expanded inline. For the function call, the compiler puts the arguments on the stack as:

  (%esp)  -> __P
  4(%esp) -> __A

To store the caller's __A at 4(%esp), the compiler emits something along the lines of:

  movaps mem128, %xmm_reg
  movaps %xmm_reg, 4(%esp)    ==> faults
  ...
  call _mm_storel_pi

movaps requires its memory operand to be 16-byte aligned, so this store faults whenever 4(%esp) does not fall on a 16-byte boundary. At higher optimization levels, the faulting move is sometimes optimized away and sometimes not. I suspect that the faults will go away if __m128 arguments in a function call are placed so that they fall on 16-byte boundaries.
In the above example, this would mean switching the order of the __P and __A arguments. In any case, the faults seem to go away if I override the various xmmintrin.h store intrinsics with macros that bypass the inline intrinsic functions. For example:

  #define _mm_storel_pi(__P, __A) \
    __builtin_ia32_storelps ((__v2si *)(__P), (__v4sf)(__A))

The preferred solution is for __m128 arguments that are put onto the stack to be placed on 16-byte boundaries. A solution that would work (but would defeat the point of using SSE instructions) is to use "movups" to put the __m128 arguments onto the stack.

A previous response says that this problem might have been fixed. Does this mean it is fixed in gcc-3.3? I've had this problem with gcc-3.2.x.

-- 
Kevin J Bowers, Ph.D.
Plasma Physics Group (X-1)
Applied Physics Division
Los Alamos National Lab