Date: Tue, 20 Mar 2018 18:00:00 -0000
From: Richard Biener
To: "Bin.Cheng", Aldy Hernandez
CC: gcc-patches
Subject: Re: [PR middle-end/70359] uncoalesce IVs outside of loops
References: <9f418de9-7a65-7a25-2bf4-d4bec6681211@redhat.com>

On March 20, 2018 6:11:53 PM GMT+01:00, "Bin.Cheng" wrote:
>On Mon, Mar 19, 2018 at 5:08 PM, Aldy Hernandez wrote:
>> Hi Richard.
>>
>> As discussed in the PR, the problem here is that we have two different
>> iterations of an IV live outside of a loop.  This inhibits us from using
>> autoinc/dec addressing on ARM, and causes extra lea's on x86.
>>
>> An abbreviated example is this:
>>
>> loop:
>>   # p_9 = PHI
>>   p_20 = p_9 + 18446744073709551615;
>>   goto loop
>>   p_24 = p_9 + 18446744073709551614;
>>   MEM[(char *)p_20 + -1B] = 45;
>>
>> Here we have both the previous IV (p_9) and the current IV (p_20) used
>> outside of the loop.  On Arm this keeps us from using auto-dec addressing,
>> because one use is -2 and the other one is -1.
>>
>> With the attached patch we attempt to rewrite out-of-loop uses of the IV in
>> terms of the current/last IV (p_20 in the case above).  With it, we end up
>> with:
>>
>>   p_24 = p_20 + 18446744073709551615;
>>   *p_24 = 45;
>>
>> ...which helps both x86 and Arm.
>>
>> As you have suggested in comment 38 on the PR, I handle specially
>> out-of-loop IV uses of the form IV+CST and propagate those accordingly
>> (along with the MEM_REF above).  Otherwise, in less specific cases, we un-do
>> the IV increment, and use that value in all out-of-loop uses.
>> For instance, in the attached testcase, we rewrite:
>>
>>   george (p_9);
>>
>> into
>>
>>   _26 = p_20 + 1;
>>   ...
>>   george (_26);
>>
>> The attached testcase tests the IV+CST specific case, as well as the more
>> generic case with george().
>>
>> Although the original PR was for ARM, this behavior can be noticed on x86,
>> so I tested on x86 with a full bootstrap + tests.  I also ran the specific
>> test on an x86 cross ARM build and made sure we had 2 auto-dec with the
>> test.  For the original test (slightly different than the testcase in this
>> patch), with this patch we are at 104 bytes versus 116 without it.  There is
>> still the issue of a division optimization which would further reduce the
>> code size.  I will discuss this separately as it is independent from this
>> patch.
>>
>> Oh yeah, we could make this more generic, and maybe handle any multiple of
>> the constant, or perhaps *= and /=.  Perhaps something for next stage1...
>>
>> OK for trunk?
>Just FYI, this looks similar to what I did in
>https://gcc.gnu.org/ml/gcc-patches/2013-11/msg00535.html
>That change was non-trivial and didn't give obvious improvement back
>in time.  But I still wonder if this
>can be done at rewriting iv_use in a light-overhead way.

Certainly, but the issue is we wreck it again at forwprop time as ivopts
runs too early.

Richard.

>
>Thanks,
>bin
>> Aldy
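
For readers following the quoted GIMPLE, the arithmetic behind the rewrite can be checked at source level.  What follows is only a rough C sketch, not code from the patch or its testcase: the function names (walk_back, finish_before, finish_after, rewrite_is_equivalent), the digit-writing loop body and the 16-byte buffer are all assumptions made up for illustration.  It relies on the fact that adding 18446744073709551615 to a 64-bit pointer is the same as adding -1, so p_20 = p_9 - 1, and therefore p_9 - 2 and p_20 - 1 name the same address, as do p_9 and p_20 + 1 (the george() case).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the loop: step p backwards once per decimal
   digit of n.  *prev receives the IV at the top of the final iteration
   (p_9 in the dump); the return value is the IV after the final
   decrement (p_20).  */
static char *walk_back(char *p, unsigned n, char **prev)
{
    char *cur;
    do {
        *prev = p;                      /* p_9 */
        cur = p - 1;                    /* p_20 = p_9 + (-1) */
        *cur = (char)('0' + n % 10);    /* assumed loop body */
        p = cur;
        n /= 10;
    } while (n != 0);
    return cur;
}

/* Out-of-loop code as in the first dump: it mixes the previous IV (p_9)
   and the current IV (p_20).  */
static char *finish_before(char *p_9, char *p_20)
{
    char *p_24 = p_9 - 2;
    *(p_20 - 1) = 45;
    return p_24;
}

/* Out-of-loop code as in the rewritten dump: only p_20 is used.  */
static char *finish_after(char *p_20)
{
    char *p_24 = p_20 - 1;
    *p_24 = 45;
    return p_24;
}

/* Returns 1 when both forms leave identical buffers and pointers.  */
int rewrite_is_equivalent(unsigned n)
{
    char a[16], b[16];
    char *a9, *b9;

    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    char *a20 = walk_back(a + 15, n, &a9);
    char *b20 = walk_back(b + 15, n, &b9);

    /* The store at p_20 - 1 must stay inside the buffer.  */
    assert(a20 - a >= 1 && b20 - b >= 1);

    char *ra = finish_before(a9, a20);
    char *rb = finish_after(b20);

    return (ra - a) == (rb - b)             /* same p_24 */
        && memcmp(a, b, sizeof a) == 0      /* same stores */
        && b20 + 1 == b9;                   /* george (p_9) == george (p_20 + 1) */
}
```

Calling rewrite_is_equivalent(12345) returns 1.  This only demonstrates that the two forms compute the same thing; whether the compiler then picks auto-dec addressing depends on the target and on later passes such as forwprop, which is the concern raised in the reply above.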