From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 26357 invoked by alias); 26 Nov 2015 09:51:00 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 26241 invoked by uid 89); 26 Nov 2015 09:50:59 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 26 Nov 2015 09:50:56 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-31-Q_Mb51t2Ru-H3fm4uqMkJg-1; Thu, 26 Nov 2015 09:50:50 +0000 Received: from [10.2.206.200] ([10.1.2.79]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Thu, 26 Nov 2015 09:50:49 +0000 Message-ID: <5656D5FA.90805@arm.com> Date: Thu, 26 Nov 2015 09:55:00 -0000 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Segher Boessenkool CC: GCC Patches Subject: Re: [PATCH][combine] PR rtl-optimization/68381: Only restrict pure simplification in mult-extend subst case, allow other substitutions References: <564DA3D1.10905@arm.com> <20151119105739.GA32343@gate.crashing.org> <564DD0ED.9010604@arm.com> <20151119144102.GB32343@gate.crashing.org> <564DE415.1090600@arm.com> <564DE8B6.9040902@arm.com> <20151124001553.GA9049@gate.crashing.org> In-Reply-To: <20151124001553.GA9049@gate.crashing.org> X-MC-Unique: Q_Mb51t2Ru-H3fm4uqMkJg-1 Content-Type: text/plain; charset=WINDOWS-1252; format=flowed Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2015-11/txt/msg03192.txt.bz2 On 24/11/15 00:15, Segher Boessenkool wrote: > On Thu, Nov 19, 2015 at 03:20:22PM +0000, Kyrill Tkachov wrote: >> Hmmm, so the answer to that is a bit further down the validate_replaceme= nt: >> path. >> It's the code after the big comment: >> /* See if this is a PARALLEL of two SETs where one SET's destination = is >> a register that is unused and this isn't marked as an instruction = that >> might trap in an EH region. In that case, we just need the other = SET. >> We prefer this over the PARALLEL. >> >> This can occur when simplifying a divmod insn. We *must* test for= this >> case here because the code below that splits two independent SETs >> doesn't >> handle this case correctly when it updates the register status. >> >> It's pointless doing this if we originally had two sets, one from >> i3, and one from i2. Combining then splitting the parallel results >> in the original i2 again plus an invalid insn (which we delete). >> The net effect is only to move instructions around, which makes >> debug info less accurate. */ >> >> The code extracts all the valid sets inside the PARALLEL and calls >> recog_for_combine on them >> individually, ignoring the clobber. > Before I made this use is_parallel_of_n_reg_sets the code used to test > if it is a parallel of two sets, and no clobbers allowed. So it would > never allow a clobber of zero. But now it does. I'll fix this in > is_parallel_of_n_reg_sets. > > Thanks for finding the problem! Thanks for fixing the wrong-code issue. As I mentioned on IRC, this patch improves codegen on aarch64 as well. I've re-checked SPEC2006 and it seems to improve codegen around multiply-ex= tend-accumulate instructions. For example the sequence: mov w4, 64 mov x1, 24 smaddl x1, w9, w4, x1 // multiply-sign-extend-accumulate add x1, x3, x1 becomes something like this: mov w3, 64 smaddl x1, w9, w3, x0 add x1, x1, 24 // constant 24 propagated into the add Another was transforming the muliply-extend into something cheaper: mov x0, 40 mov w22, 32 umaddl x22, w21, w22, x0 // multiply-zero-extend-accumulate changed becomes: ubfiz x22, x21, 5, 32 // ASHIFT+extend add x22, x22, 40 which should be always beneficial. From what I can see we don't lose any of the multiply-extend-accumulate opportunities that we gained from the original combine patch. So can we take this patch in as well? Thanks, Kyrill > Segher >