From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-276698-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 29755 invoked by alias); 29 Oct 2010 14:22:22 -0000
Received: (qmail 29737 invoked by uid 22791); 29 Oct 2010 14:22:21 -0000
X-SWARE-Spam-Status: No, hits=-1.6 required=5.0	tests=AWL,BAYES_00,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 29 Oct 2010 14:22:16 +0000
Received: (qmail 26294 invoked from network); 29 Oct 2010 14:22:14 -0000
Received: from unknown (HELO ?192.168.1.66?) (vries@127.0.0.2)  by mail.codesourcery.com with ESMTPA; 29 Oct 2010 14:22:14 -0000
Message-ID: <4CCAD8F8.4000408@codesourcery.com>
Date: Fri, 29 Oct 2010 15:06:00 -0000
From: Tom de Vries <tom@codesourcery.com>
User-Agent: Thunderbird 2.0.0.24 (X11/20100411)
MIME-Version: 1.0
To: Andrew Pinski <pinskia@gmail.com>
CC: gcc-patches@gcc.gnu.org, Bernd Schmidt <bernds@codesourcery.com>
Subject: Re: new sign/zero extension elimination pass
References: <4CBC698B.3080204@codesourcery.com>	<AANLkTik3umTLD4-EFgpgemZZX0gNmhO+2qU_gr+EEyAc@mail.gmail.com>	<AANLkTi==BHdr3ZZNt_99tu4tbGWqz8xJQf1m4FkDMxz4@mail.gmail.com>	<4CC9CF40.2020904@codesourcery.com> <AANLkTi=7QkZr_PC2EDVoCvvqK1xoBM24JGNMMVRO9sJB@mail.gmail.com>
In-Reply-To: <AANLkTi=7QkZr_PC2EDVoCvvqK1xoBM24JGNMMVRO9sJB@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2010-10/txt/msg02514.txt.bz2

Andrew,

Andrew Pinski wrote:
> On Thu, Oct 28, 2010 at 12:30 PM, Tom de Vries <tom@codesourcery.com> wrote:
>> I agreed with Paolo in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01897.html
>> that for the example with which I submitted the pass initially, it would make
>> sense to handle it in fwprop. However, I also think that for the example
>> mentioned in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01796.html, that
>> wouldn't work, so we still need the new pass.
> 
> Well when I look at the tree dumps I see:
> (short unsigned int) ((int) (*D.1255)[1] + (int) (*D.1255)[0])
> 

I'm not able to reproduce that tree. Can you tell me how you got that one?

I compiled the example http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40893#c0 with
MIPS as default target:
...
$ rm -f *.c.* ; gcc -O3 test2.c -S -fdump-rtl-all -fdump-tree-all
...

and grepped for a pattern matching the tree code you mention:
...
$ grep '(short unsigned int)' test2.c.* | grep '+'
test2.c.003t.original:  (*d)[0] = (int16_t) ((short unsigned int) d0 + (short
unsigned int) d1);
...

The only match is 003t.original, which looks like:
...
{
  int d0 = (int) (*d)[0] + (int) (*d)[1];
  int d1 = (int) (*(d + 4))[0] + (int) (*(d + 4))[1];

    int d0 = (int) (*d)[0] + (int) (*d)[1];
    int d1 = (int) (*(d + 4))[0] + (int) (*(d + 4))[1];
  (*d)[0] = (int16_t) ((short unsigned int) d0 + (short unsigned int) d1);
  (*d)[1] = (int16_t) ((short unsigned int) d0 - (short unsigned int) d1);
}
...

I did a clean checkout and build for x86_64 build this morning. Same tree as for
MIPS, no other match for the grep.

The tree you mention looks like '(short unsigned int) d0' combined with the def
of d0.

> So we should be optimizing this at the tree level such that we don't
> see extra sign extends there.  We would optimize it such that it looks
> like:
> (short unsigned int)(*D.1255)[1] + (short unsigned int)(*D.1255)[0]
> 
> I think we need a tree combiner for that and fold to do the folding.
> See PR 14844 for another case of the problem.
> 

Thanks for the links to PR 14844 and the tree combiner (PR 15459). Do you know
of an updated (to gimple) version of the tree combiner? I would like to try it
on this example.

If I look at the representation after cselim, the 2 uses of d0 are cse-ed:
...
{
  int d1;
  int d0;
  int16_t D.2096;
  short unsigned int D.2095;
  int16_t D.2094;
  short unsigned int D.2093;
  short unsigned int D.2092;
  short unsigned int D.2091;
  int D.2090;
  int16_t D.2089;
  int D.2088;
  int16_t D.2087;
  int D.2085;
  int16_t D.2084;
  int D.2083;
  int16_t D.2082;

<bb 2>:
  D.2082_2 = *d_1(D)[0];
  D.2083_3 = (int) D.2082_2;
  D.2084_4 = *d_1(D)[1];
  D.2085_5 = (int) D.2084_4;
  d0_6 = D.2083_3 + D.2085_5;
  D.2087_8 = MEM[(int16_t[2] *)d_1(D) + 4B][0];
  D.2088_9 = (int) D.2087_8;
  D.2089_11 = MEM[(int16_t[2] *)d_1(D) + 4B][1];
  D.2090_12 = (int) D.2089_11;
  d1_13 = D.2088_9 + D.2090_12;
  D.2091_14 = (short unsigned int) d0_6;
  D.2092_15 = (short unsigned int) d1_13;
  D.2093_16 = D.2091_14 + D.2092_15;
  D.2094_17 = (int16_t) D.2093_16;
  *d_1(D)[0] = D.2094_17;
  D.2095_20 = D.2091_14 - D.2092_15;
  D.2096_21 = (int16_t) D.2095_20;
  *d_1(D)[1] = D.2096_21;
  return;
}
...

So if I filter out all the statements related to d0 we get:
...
{
  int d0;
  short unsigned int D.2091;
  int D.2085;
  int16_t D.2084;
  int D.2083;
  int16_t D.2082;

<bb 2>:
  D.2082_2 = *d_1(D)[0];
  D.2083_3 = (int) D.2082_2;
  D.2084_4 = *d_1(D)[1];
  D.2085_5 = (int) D.2084_4;
  d0_6 = D.2083_3 + D.2085_5;
  D.2091_14 = (short unsigned int) d0_6;
  ...
}
...

I think a tree combiner would work here like you suggested, although it would
have to combine 6 gimple statements to get there.

But I wonder, the rtl combiner has a weak point: it only combines chains of
instructions that have no other uses of the intermediate results (PR 18395/1).
AFAIU the source code of the tree combiner, it suffers from the same problem. Is
my understanding correct there?

Thanks,
- Tom