References: <5420CC52.8030707@redhat.com> <54218928.4040709@redhat.com> <54219053.4030805@redhat.com>
Subject: Re: RFA: another patch to fix PR61360
From: Richard Biener
Date: Tue, 23 Sep 2014 16:40:00 -0000
To: Uros Bizjak, Vladimir Makarov
CC: GCC Patches, Richard Sandiford, "Gopalasubramanian, Ganesh"
Message-ID: <592c8313-30b3-4130-b1e6-f6a9016deb51@email.android.com>

On September 23, 2014 5:33:35 PM CEST, Uros Bizjak wrote:
>On Tue, Sep 23, 2014 at 5:22 PM, Vladimir Makarov wrote:
>
>>>> You are right constrain_operands is not upto LRA possibilities and we should make the following change:
>>>>
>>>> Index: recog.c
>>>> ===================================================================
>>>> --- recog.c (revision 215337)
>>>> +++ recog.c (working copy)
>>>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>>>>                  || (strict < 0 && CONSTANT_P (op))
>>>>                  /* During reload, accept a pseudo */
>>>>                  || (reload_in_progress && REG_P (op)
>>>> -                    && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
>>>> +                    && REGNO (op) >= FIRST_PSEUDO_REGISTER)
>>>> +                /* LRA can put reg value into memory if
>>>> +                   it is necessary. */
>>>> +                || (strict <= 0 && targetm.lra_p () && REG_P (op)))
>>>>            win = 1;
>>>>          else if (insn_extra_address_constraint (cn)
>>>>                   /* Every address operand can be reloaded to fit. */
>>>>
>>>> But that is a different story (for insns with single alternative containing only "m").
>>>>
>>>> I guess I should submit such change for recog.c as a separate patch.
>>> I think that the above is the right approach to fix PR60704, so the
>>> current PR60704 fix [1] should be reverted.
>>>
>>> [1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989
>>>
>>>
>> Ok. I can submit patch reverting it + the change in recog.c.
>>
>> I have still a question: do we really need
>>
>>          (eq_attr "alternative" "1")
>>            (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>>                         || optimize_function_for_size_p (cfun)")
>>
>> As I wrote I'd always enable the alternative. I don't expect performance improvement in disabling this alternative when path r->x is slow (as I heard it is implemented internally by moving through cache anyway). Even it is slow I believe it is still not faster than r->m->x. What do you think?
>
>The "r->x" alternative results in "vector" decoding on amdfam10. This
>is AMD-speak for microcoded instructions, and AMD optimization manual
>strongly recommends avoiding them. I have CC'd Ganesh, maybe he can
>provide more relevant data on the performance impact.

IIRC a vector-decoded instruction merely limits the frontend, which can decode and dispatch at most one such insn at a time. So the performance impact depends on the context.

Richard.

>Thanks,
>Uros.
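
For readers following the quoted recog.c hunk, the clauses visible in it can be modeled in isolation. This is only a sketch under stated assumptions, not GCC code: struct model_operand, MODEL_FIRST_PSEUDO_REGISTER and model_mem_constraint_wins are hypothetical stand-ins for the rtx operand, the target-specific FIRST_PSEUDO_REGISTER and the memory-constraint test in constrain_operands, and whatever conditions precede the hunk in recog.c are left out because the quote does not show them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in; the real FIRST_PSEUDO_REGISTER is target-specific. */
#define MODEL_FIRST_PSEUDO_REGISTER 64u

/* Hypothetical, simplified operand description (not GCC's rtx). */
struct model_operand {
  bool is_constant;   /* models CONSTANT_P (op) */
  bool is_reg;        /* models REG_P (op) */
  unsigned regno;     /* models REGNO (op); meaningful only if is_reg */
};

/* Return true when one of the clauses visible in the patched hunk would
   accept OP for a memory constraint.  Conditions that precede the hunk
   in the real function are not modeled.  */
bool
model_mem_constraint_wins (const struct model_operand *op, int strict,
                           bool reload_in_progress, bool lra_p)
{
  assert (op != NULL);

  /* Non-strict checking before reload: a constant is accepted.  */
  if (strict < 0 && op->is_constant)
    return true;

  /* During reload, accept a pseudo register.  */
  if (reload_in_progress && op->is_reg
      && op->regno >= MODEL_FIRST_PSEUDO_REGISTER)
    return true;

  /* Clause added by the patch: with LRA and non-strict checking, any
     register is accepted, since LRA can put its value into memory.  */
  if (strict <= 0 && lra_p && op->is_reg)
    return true;

  return false;
}
```

In this model a hard register is accepted for an "m"-only alternative when lra_p is true and strict <= 0, and is rejected once strict > 0 or when LRA is not in use, which is the behavior change the hunk describes.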