From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-276619-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 4401 invoked by alias); 28 Oct 2010 18:44:38 -0000
Received: (qmail 4381 invoked by uid 22791); 28 Oct 2010 18:44:37 -0000
X-SWARE-Spam-Status: No, hits=-1.4 required=5.0	tests=AWL,BAYES_00,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 28 Oct 2010 18:44:32 +0000
Received: (qmail 10938 invoked from network); 28 Oct 2010 18:44:31 -0000
Received: from unknown (HELO ?192.168.1.66?) (vries@127.0.0.2)  by mail.codesourcery.com with ESMTPA; 28 Oct 2010 18:44:31 -0000
Message-ID: <4CC9C4EC.4060200@codesourcery.com>
Date: Thu, 28 Oct 2010 20:45:00 -0000
From: Tom de Vries <tom@codesourcery.com>
User-Agent: Thunderbird 2.0.0.24 (X11/20100411)
MIME-Version: 1.0
To: Eric Botcazou <ebotcazou@adacore.com>
CC: gcc-patches@gcc.gnu.org, Bernd Schmidt <bernds@codesourcery.com>
Subject: Re: new sign/zero extension elimination pass
References: <4CBC698B.3080204@codesourcery.com> <201010221030.30363.ebotcazou@adacore.com>
In-Reply-To: <201010221030.30363.ebotcazou@adacore.com>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 7bit
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2010-10/txt/msg02435.txt.bz2

Eric Botcazou wrote:
>> This new pass manages to analyze this pattern, and replace the sign_extend
>> with a regcopy, which results in 1 less instruction in the assembly. The
>> other passes that eliminate sign/zero extensions do no manage to do that.
>> Combine doesn't work since the def and the use of reg 199 are in a
>> different bb. Implicit_zee does not work here since it only combines an
>> extension with the defs of its src operand, which is not applicable in this
>> case.
> 
> Was it originally written for a pre-4.3 compiler?  

No, it was written on top of a 4.5.1 compiler.

> If not, why does it not use 
> the DF framework instead of recomputing DU chains manually?
> 

The current form of the pass does not compute DU-chains. It treats all defs and
uses of the same reg as if all defs reach all uses. This is imprecise, and I'm
expecting to find an example where this is a limiting factor, but until now I
haven't.

>> The pass does a couple of linear scans over the insns, so it's not
>> expensive in terms of runtime.
> 
> At -O2 it can do up to 5 full scans over the insns AFAICS; the head comment in 
> the file has "a number of times".  This appears to be quite suboptimal and you 
> rightfully mentioned it in the head comment.  Couldn't we avoid doing a full 
> scan for each new round during the propagation phase, even without using a 
> fully-fledged iterative algorithm?
> 

If we save the use size for each use in the second scan, rather than just saving
the biggest use size for each reg, we only have to visit instructions where
propagation can happen in a propagating scan. So we can make a list of
instructions where that can happen, and iterate over those only. The second
example I mentioned has 14 insns, of which 8 would be in such a list. Actually,
currently it's 6, but it will be 8 if we also propagate over extensions, which
we don't do yet. Either way, this is already better than doing a full scan.

Another runtime improvement would be to keep track of the propagated size per
insn, which we can use to avoid processing an insn (applying the transfer
function) when it won't have an effect.

A further runtime improvement would be to move insn out the the list when
processed, and back into the list when necessary, using UD-chains. Essentially
this is already an iterative algorithm with a worklist.

The best version I can think now of is an iterative solution using DU/UD-chains,
like this:
- Calculate and store the size of each use.
- Calculate and store the biggest use of each def (using DU-chains).
- Push the progating defs where the biggest use size is smaller than word size
  on a stack.
- Pop a def.
- For the def, check if the propagated biggest use is bigger than the current
  biggest use.
  If not, back to pop a def.
  If so, store current biggest use as propagated biggest use.
- For the def, propagate the biggest use of the def to the operands.
- For each operand, if the use size was reduced, find the reaching definitions
  (using UD-chains).
- For each propagating definition, recompute the biggest use. If that one was
  reduced below the propagated biggest use, push the def on the stack.
- back to pop a def.

I think it would make sense to do either the first 2 improvements (which are
relatively easy to do on top of the current version), or to go directly for the
last version (which would mean a rewrite).

- Tom