From: "H.J. Lu"
Date: Tue, 31 Jul 2018 12:39:00 -0000
Subject: Re: [PATCH] combine: Allow combining two insns to two insns
To: Richard Biener
Cc: Segher Boessenkool, GCC Patches
References: <402e00c62fa533333b1e1dd69f468f7f4e43939b.1532449714.git.segher@kernel.crashing.org>
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm

On Wed, Jul 25, 2018 at 1:28 AM, Richard Biener wrote:
> On Tue, Jul 24, 2018 at 7:18 PM Segher Boessenkool
> wrote:
>>
>> This patch allows combine to combine two insns into two.  This helps
>> in many cases, by reducing instruction path length, and also allowing
>> further combinations to happen.  PR85160 is a typical example of code
>> that it can improve.
>>
>> This patch does not allow such combinations if either of the original
>> instructions was a simple move instruction.  In those cases combining
>> the two instructions increases register pressure without improving the
>> code.  With this move test register pressure does no longer increase
>> noticeably as far as I can tell.
>>
>> (At first I also didn't allow either of the resulting insns to be a
>> move instruction.  But that is actually a very good thing to have, as
>> should have been obvious).
>>
>> Tested for many months; tested on about 30 targets.
>>
>> I'll commit this later this week if there are no objections.
>
> Sounds good - but, _any_ testcase?  Please!
> ;)
>

Here is a testcase:

For

---
#define N 16
float f[N];
double d[N];
int n[N];

__attribute__((noinline)) void
f3 (void)
{
  int i;
  for (i = 0; i < N; i++)
    d[i] = f[i];
}
---

r263067 improved -O3 -mavx2 -mtune=generic -m64 from

        .cfi_startproc
        vmovaps f(%rip), %xmm2
        vmovaps f+32(%rip), %xmm3
        vinsertf128     $0x1, f+16(%rip), %ymm2, %ymm0
        vcvtps2pd       %xmm0, %ymm1
        vextractf128    $0x1, %ymm0, %xmm0
        vmovaps %xmm1, d(%rip)
        vextractf128    $0x1, %ymm1, d+16(%rip)
        vcvtps2pd       %xmm0, %ymm0
        vmovaps %xmm0, d+32(%rip)
        vextractf128    $0x1, %ymm0, d+48(%rip)
        vinsertf128     $0x1, f+48(%rip), %ymm3, %ymm0
        vcvtps2pd       %xmm0, %ymm1
        vextractf128    $0x1, %ymm0, %xmm0
        vmovaps %xmm1, d+64(%rip)
        vextractf128    $0x1, %ymm1, d+80(%rip)
        vcvtps2pd       %xmm0, %ymm0
        vmovaps %xmm0, d+96(%rip)
        vextractf128    $0x1, %ymm0, d+112(%rip)
        vzeroupper
        ret
        .cfi_endproc

to

        .cfi_startproc
        vcvtps2pd       f(%rip), %ymm0
        vmovaps %xmm0, d(%rip)
        vextractf128    $0x1, %ymm0, d+16(%rip)
        vcvtps2pd       f+16(%rip), %ymm0
        vmovaps %xmm0, d+32(%rip)
        vextractf128    $0x1, %ymm0, d+48(%rip)
        vcvtps2pd       f+32(%rip), %ymm0
        vextractf128    $0x1, %ymm0, d+80(%rip)
        vmovaps %xmm0, d+64(%rip)
        vcvtps2pd       f+48(%rip), %ymm0
        vextractf128    $0x1, %ymm0, d+112(%rip)
        vmovaps %xmm0, d+96(%rip)
        vzeroupper
        ret
        .cfi_endproc

This is:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86752

H.J.
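
P.S. In case a runtime check is useful next to the assembly comparison,
here is a minimal sketch.  The helper check_f3 and the input values are
assumptions made for illustration; they are not part of the testcase
above or of the bug report.  It only confirms that the float-to-double
widening loop produces the same values however the instructions end up
being combined; it says nothing about which instructions are generated.

```c
#define N 16
float f[N];
double d[N];

/* Same loop as the testcase: widen each float element to double.  */
__attribute__((noinline)) void
f3 (void)
{
  int i;
  for (i = 0; i < N; i++)
    d[i] = f[i];
}

/* Hypothetical helper (not in the original testcase): fill f[] with
   exactly representable values, run f3, and return the number of
   elements of d[] that differ from the widened input.  */
int
check_f3 (void)
{
  int i, bad = 0;

  for (i = 0; i < N; i++)
    f[i] = i * 0.5f;

  f3 ();

  for (i = 0; i < N; i++)
    if (d[i] != (double) f[i])
      bad++;

  return bad;
}
```

Calling check_f3 from main and exiting nonzero when it returns a
nonzero count would turn this into a run test; comparing the generated
assembly would still need a separate check.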