From: "H.J. Lu"
To: Uros Bizjak
Cc: Ulrich Weigand, gcc-patches@gcc.gnu.org, GCC Development
Date: Mon, 08 Aug 2011 17:14:00 -0000
Subject: Re: [RFC PATCH, i386]: Allow zero_extended addresses (+ problems with reload and offsetable address, "o" constraint)

On Mon, Aug 8, 2011 at 10:11 AM, Uros Bizjak wrote:
> On Mon, Aug 8, 2011 at 5:30 PM, Ulrich Weigand wrote:
>> Uros Bizjak wrote:
>>
>>> Although it would be nice for reload to subsequently fix CSE'd
>>> non-offsettable memory by copying the address to a temporary reg (*as
>>> said in the documentation*), we could simply require an XMM temporary
>>> for TImode reloads to/from integer registers, and this fixes the ICE
>>> for x32.
>>
>> Moves are special as far as reload is concerned.  If there is already
>> a move instruction present *before* reload, it will get fixed up
>> according to its constraints like any other instruction.
>>
>> However, reload will *introduce* new moves as part of its operation,
>> and those will *not* themselves get reloaded.  Instead, reload simply
>> assumes that every plain move will just succeed without requiring
>> any reload; if this is not true, the target *must* provide a
>> secondary reload for this move.
>>
>> (Note that the secondary reload could also work by reloading the
>> target address into a temporary; that's up to the target to
>> implement.)
>
> Whoa, indeed.
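[For reference: the mechanism Ulrich describes is the documented
TARGET_SECONDARY_RELOAD target hook.  Below is a minimal sketch of the
XMM-temporary variant mentioned above, with an assumed, simplified
trigger condition; it is an illustration only, not the actual i386.c
implementation and not the patch attached to this thread.]

/* Sketch only: request an SSE register as intermediate when reload
   wants a TImode move between GENERAL_REGS and a memory operand whose
   address is not offsettable.  The condition is an assumed
   simplification of the real one.  */

static reg_class_t
ix86_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x,
                       reg_class_t rclass, enum machine_mode mode,
                       struct secondary_reload_info *sri ATTRIBUTE_UNUSED)
{
  if (mode == TImode
      && rclass == GENERAL_REGS
      && MEM_P (x)
      && !offsettable_memref_p (x))
    return SSE_REGS;

  return NO_REGS;
}

#undef TARGET_SECONDARY_RELOAD
#define TARGET_SECONDARY_RELOAD ix86_secondary_reload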
>
> Using the attached patch, which reloads the memory address instead of
> going through an XMM register, the code for the testcase improves from:
>
> test:
> .LFB0:
>         .cfi_startproc
>         pushq   %rbx
>         .cfi_def_cfa_offset 16
>         .cfi_offset 3, -16
>         sall    $4, %esi
>         addl    %edi, %esi
>         movdqa  (%esi), %xmm0
>         movdqa  %xmm0, -16(%rsp)
>         movq    -16(%rsp), %rcx
>         movq    -8(%rsp), %rbx
>         addq    $1, %rcx
>         adcq    $0, %rbx
>         movq    %rcx, -16(%rsp)
>         sall    $4, %edx
>         movq    %rbx, -8(%rsp)
>         movdqa  -16(%rsp), %xmm0
>         movdqa  %xmm0, (%esi)
>         pxor    %xmm0, %xmm0
>         movdqa  %xmm0, (%edx,%esi)
>         popq    %rbx
>         .cfi_def_cfa_offset 8
>         ret
>
> to:
>
> test:
> .LFB0:
>         .cfi_startproc
>         sall    $4, %esi
>         pushq   %rbx
>         .cfi_def_cfa_offset 16
>         .cfi_offset 3, -16
>         addl    %edi, %esi
>         pxor    %xmm0, %xmm0
>         mov     %esi, %eax
>         movq    (%rax), %rcx
>         movq    8(%rax), %rbx
>         addq    $1, %rcx
>         adcq    $0, %rbx
>         sall    $4, %edx
>         movq    %rcx, (%rax)
>         movq    %rbx, 8(%rax)
>         movdqa  %xmm0, (%edx,%esi)
>         popq    %rbx
>         .cfi_def_cfa_offset 8
>         ret
>
> H.J., can you please test the attached patch?  This optimization won't
> trigger on x86_64 anymore.
>

I will test it.  Thanks.

--
H.J.
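[The testcase and the patch were attachments to the original messages
and are not part of the archived text.  For illustration, a
hypothetical reduction of roughly the shape the assembly above implies
(all names here are invented; a 16-byte struct carrying a 128-bit
field, built with something like -O2 -msse2 for the x32 target) would
be:]

/* Hypothetical testcase, reconstructed from the assembly above; the
   real attachment is not reproduced here.  */

typedef unsigned int uv4si __attribute__ ((vector_size (16)));

struct s
{
  unsigned __int128 i;  /* 16 bytes: gives the sall $4 index scaling.  */
};

void
test (struct s *p, int x, int y)
{
  p[x].i += 1;                                    /* addq $1 / adcq $0 */
  *(uv4si *) &p[x + y] = (uv4si) { 0, 0, 0, 0 };  /* pxor + movdqa */
}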