From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 26534 invoked by alias); 15 Jan 2003 20:14:21 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 26520 invoked from network); 15 Jan 2003 20:14:19 -0000
Received: from unknown (HELO linisoft.localdomain) (24.80.72.10)
  by sources.redhat.com with SMTP; 15 Jan 2003 20:14:19 -0000
Received: from linisoft.com (IDENT:reza@linisoft.localdomain [127.0.0.1])
  by linisoft.localdomain (8.9.3/8.8.7) with ESMTP id MAA18608;
  Wed, 15 Jan 2003 12:11:26 -0800
Message-ID: <3E25C06E.4E7D37E8@linisoft.com>
Date: Thu, 16 Jan 2003 10:53:00 -0000
From: Reza Roboubi
X-Accept-Language: en
MIME-Version: 1.0
To: Bonzini
CC: gcc@gcc.gnu.org, gcc-help@gcc.gnu.org
Subject: Re: optimizations
References: <005801c2bcc5$e9e2f6a0$421f1897@bonz>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-SW-Source: 2003-01/txt/msg00731.txt.bz2

Bonzini wrote:
>
> > > Could you please also tell me if 3.3 and 3.4 remove the extra mov's in
> > > and out of %eax. Ideally, there should be no more than 4 instructions
> > > in the critical loop.
> >
> > .L2:
> >         movl    -4(%ebp), %eax    <== still does the load
> >         cmpl    $16, %eax
> >         je      .L7
> >         incl    %eax
> >         movl    %eax, -4(%ebp)    <== and store
> >         jmp     .L2
> > .L7:
> >
> > For some reason it is not (even with -fnew-ra), but on PPC there
> > is no extra load/store.
>
> Instruction counts do not tell the whole story; gcc is simply putting more
> pressure on the decoding unit but less pressure on the execution unit (which
> otherwise would execute two loads in the `taken' case).  Things might be

Would you please elaborate on that?  I don't understand what you mean by the
"taken case."

The suggested optimization is:

CHANGE:
-------
.L2:
        movl    -4(%ebp), %eax    <== still does the load
        cmpl    $16, %eax
        je      .L7
        incl    %eax
        movl    %eax, -4(%ebp)    <== and store
        jmp     .L2
.L7:

TO:
-------
        movl    -4(%ebp), %eax
.L2:
        cmpl    $16, %eax
        je      .L7
        incl    %eax
        jmp     .L2
.L7:
        movl    %eax, -4(%ebp)

The mov's have moved _outside_ of the critical loop, and the register
allocator may still be able to remove the extra mov at entry to the loop.
The total number of instructions, and hence the total program size, will
remain the same even in the worst possible case.

Furthermore, an extra jump can be removed from the critical loop.  If you
compile:

        i=0;
        for(;i<10;i++);
        write(1,&i,4);  //make i volatile

then you will see that gcc optimizes away even this redundant jump, hence
producing only _three_ lines of code in the loop.  But when a while() loop is
used instead of the equivalent for() loop, that does not happen.

This seems like a crystal clear case for optimization, unless I am missing
something, in which case please explain it to me in more detail.

Thanks,
Reza.

> different if gcc is given other options like -mtune=i386.