Message-ID: <46CA222D.2050107@codesourcery.com>
Date: Mon, 20 Aug 2007 23:38:00 -0000
From: Sandra Loosemore
To: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung, Thiemo Seufer, Mark Mitchell, richard@codesourcery.com
Subject: Re: PATCH: fine-tuning for can_store_by_pieces
References: <46C3343A.5080407@codesourcery.com> <87ps1nop2x.fsf@firetop.home> <46C778D6.5060808@codesourcery.com> <87y7g6r50c.fsf@firetop.home>
In-Reply-To: <87y7g6r50c.fsf@firetop.home>

Richard Sandiford wrote:
> Thanks for the testing.  In that case, I agree 4 is fine for everything.
> If you still have the results, could you post the totals?  I'm curious
> what kind of figures we're talking about here.

Here's what I have.  Except for measuring the original version of the
patch on a mips64-elfoabi build, everything else was done with
mips32r2-elfoabi; the numbers are total sizes from CSiBE.

                 default   -mips16   -mabicalls    mips64
baseline         3583977   2860177                 3558373
call ratio 3     3566997   2859401     4039960     3541493
call ratio 4     3565961   2858881     4037876
call ratio 5     3566857   2859901     4037172
call ratio 6                           4037332

>> + #define MIPS_CALL_RATIO 4
>
> I think the number you use in CLEAR_RATIO (MIPS_CALL_RATIO + 2)
> is effectively estimating the number of instructions for a call.
> ISTM CLEAR_RATIO is basically being compared against an estimate of
> the number of zero stores, and zero stores are 1 instruction on MIPS.
> (Also, nothing really explained why CLEAR_RATIO adds a magic 2 to the
> ratio.)
>
> So I think this should really be 6 and that CLEAR_RATIO should be:
>
>   #define CLEAR_RATIO (optimize_size ? MIPS_CALL_RATIO : 15)
>
> Then...
>
>> + #define MOVE_RATIO ((TARGET_MIPS16 || TARGET_MEMCPY) ? MIPS_CALL_RATIO : 2)
>
> ...a comment in the original patch said that MOVE_RATIO effectively
> counted memory-to-memory moves.  I think that was a useful comment,
> and that the use of the old MIPS_CALL_RATIO above should be the new
> MIPS_CALL_RATIO / 2.  Conveniently, that gives us the 3 that you had
> in the original patch.

Except that 4 seems to be a better number, and that number doesn't fall
out of this theory.  I guess I could run some tests with different values
for CLEAR_RATIO too, and just document both numbers as being
experimentally determined?

> (You didn't say whether you'd benchmarked
> -mips16 or -mmemcpy; if so, did you see any difference between a
> MOVE_RATIO of 3 and a MOVE_RATIO of 4?)

I tried -mips16 but not -mmemcpy.  See table above.

>> + /* STORE_BY_PIECES_P can be used when copying a constant string, but
>> +    in that case each word takes 3 insns (lui, ori, sw), or more in
>> +    64-bit mode, instead of 2 (lw, sw).  So better to always fail this
>> +    and let the move_by_pieces code copy the string from read-only
>> +    memory.  */
>> +
>> + #define STORE_BY_PIECES_P(SIZE, ALIGN) 0
>
> You asked when lui/ori/sw might be faster.
> Consider a three-word store on a typical 2-way superscalar target:
>
>    Cycle 1:   lui   lui
>          2:   ori   ori
>          3:   sw    lui
>          4:   sw    ori
>          5:   sw
>
> That's 5 cycles.  The equivalent lw/sw version is at least 6 cycles
> (more if the read-only string is not in cache).

OK, but what I was really asking was, is there a way to *test* for
situations where we should generate the lui/ori/sw sequences instead of
the lw/sw?  Some combination of TARGET_foo flags and/or the size of the
string?

-Sandra the clueless
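
For reference, the arithmetic behind the scheme proposed in the quoted review can be sketched in plain C.  This is only an illustration, not code from the patch: clear_ratio(), move_ratio() and check_ratios() are hypothetical helpers, and their parameters are stand-ins for GCC's optimize_size, TARGET_MIPS16 and TARGET_MEMCPY.

```c
#include <assert.h>

/* Under the review's reading, the call ratio estimates the number of
   instructions needed for a call.  */
#define MIPS_CALL_RATIO 6

/* Hypothetical helper: CLEAR_RATIO is compared against a count of zero
   stores, which are 1 instruction each on MIPS.  The parameter stands
   in for GCC's optimize_size.  */
int clear_ratio(int optimize_size)
{
  return optimize_size ? MIPS_CALL_RATIO : 15;
}

/* Hypothetical helper: MOVE_RATIO counts memory-to-memory moves, each a
   load plus a store, hence the division by 2.  The parameters stand in
   for TARGET_MIPS16 and TARGET_MEMCPY.  */
int move_ratio(int mips16, int use_memcpy)
{
  return (mips16 || use_memcpy) ? MIPS_CALL_RATIO / 2 : 2;
}

/* Check the values the review derives from a call ratio of 6.  */
void check_ratios(void)
{
  assert(clear_ratio(1) == 6);    /* optimizing for size */
  assert(clear_ratio(0) == 15);   /* optimizing for speed */
  assert(move_ratio(1, 0) == 3);  /* the 3 from the original patch */
  assert(move_ratio(0, 0) == 2);
}
```

Note that this scheme yields a MOVE_RATIO of 3 for -mips16, so the value of 4 that looked best in the CSiBE totals would still have to be set by hand.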