From: Richard Sandiford
To: "Maciej W. Rozycki"
Cc: Sandra Loosemore
Subject: Re: PING Re: [PATCH, MIPS] add new peephole for 74k dspr2
Date: Mon, 24 Sep 2012 21:40:00 -0000
Message-ID: <87sja7tadr.fsf@talisman.home>
In-Reply-To: (Maciej W. Rozycki's message of "Mon, 24 Sep 2012 16:08:55 +0100")
References: <502D0DF4.3070302@codesourcery.com> <87zk5qu783.fsf@talisman.home> <5034EBC6.5080602@codesourcery.com> <87bohw1ecb.fsf@talisman.home> <5058ACDA.5060706@codesourcery.com> <87lig719ig.fsf@talisman.home>
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm

"Maciej W. Rozycki" writes:
> On Tue, 18 Sep 2012, Richard Sandiford wrote:
>
>> > Have you had time to think about this some more?  I am not sure I can
>> > guess how you'd like me to fix this patch now without some more specific
>> > review and/or suggestions about where the optimization should happen and
>> > what cases it should be extended to detect in addition to the dsp
>> > accumulator multiplies.
>>
>> The patch below is the one I've been testing.  But I got sidetracked
>> by looking into the possibility of removing the MD0_REG and MD1_REG
>> classes, in order to get more sensible costs.  I think that was needed
>> for the madd-9.c test to pass.
>
> Sorry to come up with this so late -- I have only now noticed this being
> discussed.
>
>> @@ -4105,39 +4105,55 @@ mips_subword (rtx op, bool high_p)
>>    return simplify_gen_subreg (word_mode, op, mode, byte);
>>  }
>>
>> -/* Return true if a 64-bit move from SRC to DEST should be split into two.  */
>> +/* Return true if SRC can be moved into DEST using MULT $0, $0.  */
>> +
>> +static bool
>> +mips_mult_move_p (rtx dest, rtx src)
>> +{
>> +  return (src == const0_rtx
>> +          && REG_P (dest)
>> +          && GET_MODE_SIZE (GET_MODE (dest)) == 2 * UNITS_PER_WORD
>> +          && (ISA_HAS_DSP_MULT
>> +              ? ACC_REG_P (REGNO (dest))
>> +              : MD_REG_P (REGNO (dest))));
>> +}
>> +
>> +/* Return true if a move from SRC to DEST should be split into two.  */
>
> Does the DSP ASE guarantee that a MULT $0, $0 is going not to be slower
> than MTHI $0/MTLO $0?  The latency of multiplication varies among
> implementations, for example the original R3000 took 12 cycles (of course
> the R3000 itself is not relevant for this change, but you see the
> picture!).  On the other hand in some (but not all!) processors
> multiplication runs in parallel to the main pipeline so it is the
> difference, if positive, between the number of cycles consumed by other
> instructions up to the next HI/LO access instruction and the latency of
> MULT run in the background that matters.
>
> From the context I am assuming none of this matters for the 74K (and
> presumably the 24KE/34K) and a MULT $0, $0 is indeed faster, but overall
> isn't it something that should be decided based on instruction costs from
> DFA schedulers?  Is there anything that I've missed here?  It doesn't
> appear to me your (and neither the original) proposal takes instruction
> cost calculation into consideration.

In practice, we only move 0 into HI and LO for MADD- and MSUB-style
operations.  We deliberately don't use HI and LO as scratch space.
I think it's a reasonable default assumption that anything that
supports those instructions also has a fast path from MULT to MADD
or MULT to MSUB.  I certainly don't know of any counter-examples.

The decision is deliberately centralised in one place so that the
condition can be tweaked in future if necessary.

Richard
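
For illustration only, here is a standalone C model of the "one predicate
decides" arrangement described above.  It is a sketch, not GCC code: the
enum, the struct, UNITS_PER_WORD_MODEL and both model_* functions are
hypothetical stand-ins for the rtx-based tests in the quoted hunk.  It
also assumes that ACC_REG_P accepts both HI/LO and the DSP accumulators,
and that the fallback is a pair of word-sized moves (MTHI $0/MTLO $0).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the register tests in the quoted patch.  */
enum reg_kind { REG_GPR, REG_MD, REG_DSP_ACC };

/* Hypothetical model of a move; the real code inspects rtx operands.  */
struct move
{
  bool src_is_zero;         /* models src == const0_rtx */
  enum reg_kind dest_kind;  /* models MD_REG_P / ACC_REG_P on DEST */
  unsigned dest_size;       /* models GET_MODE_SIZE (GET_MODE (dest)) */
};

/* Assumed 32-bit word size, for the model only.  */
enum { UNITS_PER_WORD_MODEL = 4 };

/* The single place where the "zero with MULT $0, $0" policy lives.
   Changing the policy later means editing only this function.  */
static bool
model_mult_move_p (const struct move *m, bool isa_has_dsp_mult)
{
  return (m->src_is_zero
          && m->dest_size == 2 * UNITS_PER_WORD_MODEL
          && (isa_has_dsp_mult
              ? (m->dest_kind == REG_MD || m->dest_kind == REG_DSP_ACC)
              : m->dest_kind == REG_MD));
}

/* A caller that asks the predicate instead of repeating the test:
   one instruction for MULT $0, $0, otherwise two word-sized moves.  */
static int
model_zero_move_insns (const struct move *m, bool isa_has_dsp_mult)
{
  return model_mult_move_p (m, isa_has_dsp_mult) ? 1 : 2;
}
```

If a core turned up where MULT $0, $0 was the slower choice, a tuning
check could be added inside model_mult_move_p alone and every caller
would follow it.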