From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 4826 invoked by alias); 21 Apr 2014 18:59:42 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 4810 invoked by uid 89); 21 Apr 2014 18:59:41 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.0 required=5.0 tests=AWL,BAYES_50,KAM_STOCKGEN,RCVD_IN_DNSWL_LOW,RP_MATCHES_RCVD,SPF_PASS autolearn=no version=3.3.2 X-HELO: mail-wg0-f41.google.com Received: from mail-wg0-f41.google.com (HELO mail-wg0-f41.google.com) (74.125.82.41) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Mon, 21 Apr 2014 18:59:40 +0000 Received: by mail-wg0-f41.google.com with SMTP id n12so3047300wgh.24 for ; Mon, 21 Apr 2014 11:59:37 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=wXFpUZTZ1MdKaJ9x9+Di9jVpI9Kh6jX1VAPhFqmBxw8=; b=b0DsgkuDyOpyQ2dg6nCiZXqWPtDJ0HQM/jwA0JEgWGSmrAE6A3cNKbHEwrjxG9SJNt TgdETK8ULjtuv0CNFda2MgOQeiQxWMQ2QaJ7jNZxWy2fNcMsRvd2T4nVka990orj3odS RVn9QP8e/9pLpmQBFgY8tvTOIBoJFNYnVpEMtX6pLwsOv0pBEr28HAPljaduI60SKvoH z5i6YPd8LNHaWx83jpR1lpoygctgOHLmjqlijYmxCDlLpFedXhS2yGFwnvlirbXrnjDt fBfOaoGSsgiVEGorFM7fxJvbiDT4IqTOOtG3gS8mQO0+tJmxFetCZz4HvIvdHIttuH9K eBdg== X-Gm-Message-State: ALoCoQlsoUPm42uQW7/Jv1C6+HxT7I+ShPj8eHHyeCRdwxCWKMJYl9WXZHyc9f8HF12tjwQBGS01jNOjY9P6tOjRvJ3d/fz94sLfAzDOl/FcLPAxrJPzNeRF/oQZtc7Iim7cVHifeTAVA0viQpS+TVhGerFk6/MFP58PgK+NsPAKNEZp5nc0Gl61uutMeSlwTr1XVLBqQ74yjcEITLbL9mNlY7NCQSYOiw== MIME-Version: 1.0 X-Received: by 10.194.60.4 with SMTP id d4mr14018105wjr.28.1398106777004; Mon, 21 Apr 2014 11:59:37 -0700 (PDT) Received: by 10.180.24.134 with HTTP; Mon, 21 Apr 2014 11:59:36 -0700 (PDT) In-Reply-To: References: Date: Mon, 21 Apr 2014 19:26:00 -0000 Message-ID: Subject: Re: [PATCH, x86] merge movsd/movhpd pair in peephole From: Xinliang David Li To: "Bin.Cheng" Cc: Wei Mi , GCC Patches , Uros Bizjak , "H.J. Lu" Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2014-04/txt/msg01244.txt.bz2 Bin, when will the patch for the generic pass be available for review? David On Wed, Apr 9, 2014 at 7:27 PM, Bin.Cheng wrote: > On Thu, Apr 10, 2014 at 8:18 AM, Wei Mi wrote: >> Hi, >> >> For the testcase 1.c >> >> #include >> >> double a[1000]; >> >> __m128d foo1() { >> __m128d res; >> res = _mm_load_sd(&a[1]); >> res = _mm_loadh_pd(res, &a[2]); >> return res; >> } >> >> llvm will merge movsd/movhpd to movupd while gcc will not. The merge >> is beneficial on x86 machines starting from Nehalem. >> >> The patch is to add the merging in peephole. >> bootstrap and regression pass. Is it ok for stage1? >> >> Thanks, >> Wei. >> >> gcc/ChangeLog: >> >> 2014-04-09 Wei Mi >> >> * config/i386/i386.c (get_memref_parts): New function. >> (adjacent_mem_locations): Ditto. >> * config/i386/i386-protos.h: Add decl for adjacent_mem_locations. >> * config/i386/sse.md: Add define_peephole rule. >> >> gcc/testsuite/ChangeLog: >> >> 2014-04-09 Wei Mi >> >> * gcc.target/i386/sse2-unaligned-mov.c: New test. >> >> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h >> index 6e32978..3ae0d6d 100644 >> --- a/gcc/config/i386/i386-protos.h >> +++ b/gcc/config/i386/i386-protos.h >> @@ -312,6 +312,7 @@ extern enum attr_cpu ix86_schedule; >> #endif >> >> extern const char * ix86_output_call_insn (rtx insn, rtx call_op); >> +extern bool adjacent_mem_locations (rtx mem1, rtx mem2); >> >> #ifdef RTX_CODE >> /* Target data for multipass lookahead scheduling. >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >> index 3eefe4a..a330e84 100644 >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -46737,6 +46737,70 @@ ix86_atomic_assign_expand_fenv (tree *hold, >> tree *clear, tree *update) >> atomic_feraiseexcept_call); >> } >> >> +/* Try to determine BASE/OFFSET/SIZE parts of the given MEM. >> + Return true if successful, false if all the values couldn't >> + be determined. >> + >> + This function only looks for REG/SYMBOL or REG/SYMBOL+CONST >> + address forms. */ >> + >> +static bool >> +get_memref_parts (rtx mem, rtx *base, HOST_WIDE_INT *offset, >> + HOST_WIDE_INT *size) >> +{ >> + rtx addr_rtx; >> + if MEM_SIZE_KNOWN_P (mem) >> + *size = MEM_SIZE (mem); >> + else >> + return false; >> + >> + if (GET_CODE (XEXP (mem, 0)) == CONST) >> + addr_rtx = XEXP (XEXP (mem, 0), 0); >> + else >> + addr_rtx = (XEXP (mem, 0)); >> + >> + if (GET_CODE (addr_rtx) == REG >> + || GET_CODE (addr_rtx) == SYMBOL_REF) >> + { >> + *base = addr_rtx; >> + *offset = 0; >> + } >> + else if (GET_CODE (addr_rtx) == PLUS >> + && CONST_INT_P (XEXP (addr_rtx, 1))) >> + { >> + *base = XEXP (addr_rtx, 0); >> + *offset = INTVAL (XEXP (addr_rtx, 1)); >> + } >> + else >> + return false; >> + >> + return true; >> +} >> + >> +/* If MEM1 is adjacent to MEM2 and MEM1 has lower address, >> + return true. */ >> + >> +extern bool >> +adjacent_mem_locations (rtx mem1, rtx mem2) >> +{ >> + rtx base1, base2; >> + HOST_WIDE_INT off1, size1, off2, size2; >> + >> + if (get_memref_parts (mem1, &base1, &off1, &size1) >> + && get_memref_parts (mem2, &base2, &off2, &size2)) >> + { >> + if (GET_CODE (base1) == SYMBOL_REF >> + && GET_CODE (base2) == SYMBOL_REF >> + && SYMBOL_REF_DECL (base1) == SYMBOL_REF_DECL (base2)) >> + return (off1 + size1 == off2); >> + else if (REG_P (base1) >> + && REG_P (base2) >> + && REGNO (base1) == REGNO (base2)) >> + return (off1 + size1 == off2); >> + } >> + return false; >> +} >> + >> /* Initialize the GCC target structure. */ >> #undef TARGET_RETURN_IN_MEMORY >> #define TARGET_RETURN_IN_MEMORY ix86_return_in_memory >> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md >> index 72a4d6d..4bf8461 100644 >> --- a/gcc/config/i386/sse.md >> +++ b/gcc/config/i386/sse.md >> @@ -15606,3 +15606,37 @@ >> [(set_attr "type" "sselog1") >> (set_attr "length_immediate" "1") >> (set_attr "mode" "TI")]) >> + >> +;; merge movsd/movhpd to movupd when TARGET_SSE_UNALIGNED_LOAD_OPTIMAL >> +;; is true. >> +(define_peephole2 >> + [(set (match_operand:DF 0 "register_operand") >> + (match_operand:DF 1 "memory_operand")) >> + (set (match_operand:V2DF 2 "register_operand") >> + (vec_concat:V2DF (match_dup 0) >> + (match_operand:DF 3 "memory_operand")))] >> + "TARGET_SSE_UNALIGNED_LOAD_OPTIMAL >> + && REGNO (operands[0]) == REGNO (operands[2]) >> + && adjacent_mem_locations (operands[1], operands[3])" >> + [(set (match_dup 2) >> + (unspec:V2DF [(match_dup 4)] UNSPEC_LOADU))] >> +{ >> + operands[4] = gen_rtx_MEM (V2DFmode, XEXP(operands[1], 0)); >> +}) >> + >> +;; merge movsd/movhpd to movupd when TARGET_SSE_UNALIGNED_STORE_OPTIMAL >> +;; is true. >> +(define_peephole2 >> + [(set (match_operand:DF 0 "memory_operand") >> + (vec_select:DF (match_operand:V2DF 1 "register_operand") >> + (parallel [(const_int 0)]))) >> + (set (match_operand:DF 2 "memory_operand") >> + (vec_select:DF (match_dup 1) >> + (parallel [(const_int 1)])))] >> + "TARGET_SSE_UNALIGNED_STORE_OPTIMAL >> + && adjacent_mem_locations (operands[0], operands[2])" >> + [(set (match_dup 3) >> + (unspec:V2DF [(match_dup 1)] UNSPEC_STOREU))] >> +{ >> + operands[3] = gen_rtx_MEM (V2DFmode, XEXP(operands[0], 0)); >> +}) >> diff --git a/gcc/testsuite/gcc.target/i386/sse2-unaligned-mov.c >> b/gcc/testsuite/gcc.target/i386/sse2-unaligned-mov.c >> new file mode 100644 >> index 0000000..28470ce >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/i386/sse2-unaligned-mov.c >> @@ -0,0 +1,20 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-mtune=corei7 -O2" } */ >> + >> +#include >> + >> +double a[1000]; >> + >> +__m128d foo1() { >> + __m128d res; >> + res = _mm_load_sd(&a[1]); >> + res = _mm_loadh_pd(res, &a[2]); >> + return res; >> +} >> + >> +void foo2(__m128d res) { >> + _mm_store_sd(&a[1], res); >> + _mm_storeh_pd(&a[2], res); >> +} >> + >> +/* { dg-final { scan-assembler-times "movup" 2 } } */ > > Hi Wei, > Just FYI, though I am not sure whether it can help. I am working on > load store pairing on ARM too. On ARM (maybe other machines like > mips), the problem is more general because we can pair memory accesses > with respect to both core register and fpu register. The current > strategy taken by GCC is to use peephole2 but it's too weak, for > example, it can't handle pairs which are intervened by other > instructions. Right now I am working on a new pass in GCC to do that > work, and have already worked out a draft patch. The instrumental > data looks promising with many opportunities captured besides the > original peephole2 pass. Furthermore, the result could be even better > if register renaming is enabled. Hopefully I will try to bring back > register renaming for this purpose later. > > Thanks, > bin