From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-66250-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 26534 invoked by alias); 15 Jan 2003 20:14:21 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 26520 invoked from network); 15 Jan 2003 20:14:19 -0000
Received: from unknown (HELO linisoft.localdomain) (24.80.72.10)
  by sources.redhat.com with SMTP; 15 Jan 2003 20:14:19 -0000
Received: from linisoft.com (IDENT:reza@linisoft.localdomain [127.0.0.1])
	by linisoft.localdomain (8.9.3/8.8.7) with ESMTP id MAA18608;
	Wed, 15 Jan 2003 12:11:26 -0800
Message-ID: <3E25C06E.4E7D37E8@linisoft.com>
Date: Thu, 16 Jan 2003 10:53:00 -0000
From: Reza Roboubi <reza@linisoft.com>
X-Accept-Language: en
MIME-Version: 1.0
To: Bonzini <bonzini@gnu.org>
CC: gcc@gcc.gnu.org, gcc-help@gcc.gnu.org
Subject: Re: optimizations
References: <005801c2bcc5$e9e2f6a0$421f1897@bonz>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-SW-Source: 2003-01/txt/msg00731.txt.bz2

Bonzini wrote:
> 
> > > Could you please also tell me if 3.3 and 3.4 remove the extra mov's in and
> > > out of %eax. Ideally, there should be no more than 4 instructions in the
> > > critical loop.
> >
> > .L2:
> > movl -4(%ebp), %eax <== still does the load
> > cmpl $16, %eax
> > je .L7
> > incl %eax
> > movl %eax, -4(%ebp) <== and store
> > jmp .L2
> > .L7:
> >
> > For some reason it is not (even with -fnew-ra), but on PPC there
> > is no extra load/store.
> 
> Instruction counts do not tell the whole story; gcc is simply putting more
> pressure on the decoding unit but less pressure on the execution unit (which
> otherwise would execute two loads in the `taken' case).  Things might be

Would you please elaborate on that?  I don't understand what you mean by the
"taken case."  The suggested optimization is:

CHANGE:
-------
.L2:
movl -4(%ebp), %eax <== still does the load
cmpl $16, %eax
je .L7
incl %eax
movl %eax, -4(%ebp) <== and store
jmp .L2
.L7:

TO:
-------
movl -4(%ebp), %eax
.L2:
cmpl $16, %eax
je .L7
incl %eax
jmp .L2
.L7:
movl %eax, -4(%ebp)

The mov's have moved _outside_ of the critical loop, and the register allocator
may still be able to remove the extra mov at entry to the loop.

The total number of instructions, and hence the total program size, will remain
the same even in the worst possible case.
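
For reference, the same transformation can be sketched at the C level. This is
only an illustration under stated assumptions: the function names are
hypothetical, and `volatile` is used in the first function purely to force the
per-iteration load and store that the CHANGE listing shows. It is not the
original test case.

```c
/* Hypothetical helper names, for illustration only. */

/* Shape of the CHANGE listing: one load and one store per iteration.
   volatile is an assumption used here to force the memory traffic. */
unsigned count_in_memory(volatile unsigned *slot)
{
    for (;;) {
        unsigned v = *slot;   /* movl -4(%ebp), %eax */
        if (v == 16)          /* cmpl $16, %eax ; je .L7 */
            break;
        v++;                  /* incl %eax */
        *slot = v;            /* movl %eax, -4(%ebp) */
    }
    return *slot;
}

/* Shape of the TO listing: load once, loop in a register, store once. */
unsigned count_in_register(unsigned *slot)
{
    unsigned v = *slot;       /* movl -4(%ebp), %eax */
    while (v != 16)           /* cmpl $16, %eax ; je .L7 */
        v++;                  /* incl %eax ; jmp .L2 */
    *slot = v;                /* movl %eax, -4(%ebp) */
    return v;
}
```

Both functions leave the same value in memory; only the number of memory
accesses inside the loop differs.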

Furthermore, an extra jump can be removed from the critical loop. If you
compile:
i=0;
for(;i<10;i++);
write(1,&i,4);   //make i volatile

then you will see that gcc optimizes away even this redundant jump, hence
producing only _three_ instructions in the loop. But when a while() loop is used
instead of the equivalent for() loop, that does not happen.
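
A minimal pair for making that comparison might look like the sketch below. The
function names are hypothetical, and the bodies simply wrap the snippet above in
two otherwise identical functions; what a given gcc version emits for each is
not something this sketch claims.

```c
#include <unistd.h>

/* Hypothetical function names; the loops follow the snippet above. */

int count_for(void)
{
    int i = 0;
    for (; i < 10; i++)
        ;                                  /* empty loop body */
    ssize_t n = write(1, &i, sizeof i);    /* &i escapes, so i must be stored */
    (void)n;                               /* result ignored in this sketch */
    return i;
}

int count_while(void)
{
    int i = 0;
    while (i < 10)                         /* same loop, written as while() */
        i++;
    ssize_t n = write(1, &i, sizeof i);
    (void)n;
    return i;
}
```

Compiling this with something like `gcc -O2 -S` and comparing the two loops in
the generated assembly shows whether the extra jump remains in either form.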

This seems like a crystal clear case for optimization, unless I am missing
something, in which case I would appreciate a more detailed explanation.

Thanks, Reza.


> different if gcc is given other options like -mtune=i386.