From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 29755 invoked by alias); 29 Oct 2010 14:22:22 -0000 Received: (qmail 29737 invoked by uid 22791); 29 Oct 2010 14:22:21 -0000 X-SWARE-Spam-Status: No, hits=-1.6 required=5.0 tests=AWL,BAYES_00,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 29 Oct 2010 14:22:16 +0000 Received: (qmail 26294 invoked from network); 29 Oct 2010 14:22:14 -0000 Received: from unknown (HELO ?192.168.1.66?) (vries@127.0.0.2) by mail.codesourcery.com with ESMTPA; 29 Oct 2010 14:22:14 -0000 Message-ID: <4CCAD8F8.4000408@codesourcery.com> Date: Fri, 29 Oct 2010 15:06:00 -0000 From: Tom de Vries User-Agent: Thunderbird 2.0.0.24 (X11/20100411) MIME-Version: 1.0 To: Andrew Pinski CC: gcc-patches@gcc.gnu.org, Bernd Schmidt Subject: Re: new sign/zero extension elimination pass References: <4CBC698B.3080204@codesourcery.com> <4CC9CF40.2020904@codesourcery.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2010-10/txt/msg02514.txt.bz2 Andrew, Andrew Pinski wrote: > On Thu, Oct 28, 2010 at 12:30 PM, Tom de Vries wrote: >> I agreed with Paolo in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01897.html >> that for the example with which I submitted the pass initially, it would make >> sense to handle it in fwprop. However, I also think that for the example >> mentioned in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01796.html, that >> wouldn't work, so we still need the new pass. > > Well when I look at the tree dumps I see: > (short unsigned int) ((int) (*D.1255)[1] + (int) (*D.1255)[0]) > I'm not able to reproduce that tree. Can you tell me how you got that one? I compiled the example http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40893#c0 with MIPS as default target: ... $ rm -f *.c.* ; gcc -O3 test2.c -S -fdump-rtl-all -fdump-tree-all ... and grepped for a pattern matching the tree code you mention: ... $ grep '(short unsigned int)' test2.c.* | grep '+' test2.c.003t.original: (*d)[0] = (int16_t) ((short unsigned int) d0 + (short unsigned int) d1); ... The only match is 003t.original, which looks like: ... { int d0 = (int) (*d)[0] + (int) (*d)[1]; int d1 = (int) (*(d + 4))[0] + (int) (*(d + 4))[1]; int d0 = (int) (*d)[0] + (int) (*d)[1]; int d1 = (int) (*(d + 4))[0] + (int) (*(d + 4))[1]; (*d)[0] = (int16_t) ((short unsigned int) d0 + (short unsigned int) d1); (*d)[1] = (int16_t) ((short unsigned int) d0 - (short unsigned int) d1); } ... I did a clean checkout and build for x86_64 build this morning. Same tree as for MIPS, no other match for the grep. The tree you mention looks like '(short unsigned int) d0' combined with the def of d0. > So we should be optimizing this at the tree level such that we don't > see extra sign extends there. We would optimize it such that it looks > like: > (short unsigned int)(*D.1255)[1] + (short unsigned int)(*D.1255)[0] > > I think we need a tree combiner for that and fold to do the folding. > See PR 14844 for another case of the problem. > Thanks for the links to PR 14844 and the tree combiner (PR 15459). Do you know of an updated (to gimple) version of the tree combiner? I would like to try it on this example. If I look at the representation after cselim, the 2 uses of d0 are cse-ed: ... { int d1; int d0; int16_t D.2096; short unsigned int D.2095; int16_t D.2094; short unsigned int D.2093; short unsigned int D.2092; short unsigned int D.2091; int D.2090; int16_t D.2089; int D.2088; int16_t D.2087; int D.2085; int16_t D.2084; int D.2083; int16_t D.2082; : D.2082_2 = *d_1(D)[0]; D.2083_3 = (int) D.2082_2; D.2084_4 = *d_1(D)[1]; D.2085_5 = (int) D.2084_4; d0_6 = D.2083_3 + D.2085_5; D.2087_8 = MEM[(int16_t[2] *)d_1(D) + 4B][0]; D.2088_9 = (int) D.2087_8; D.2089_11 = MEM[(int16_t[2] *)d_1(D) + 4B][1]; D.2090_12 = (int) D.2089_11; d1_13 = D.2088_9 + D.2090_12; D.2091_14 = (short unsigned int) d0_6; D.2092_15 = (short unsigned int) d1_13; D.2093_16 = D.2091_14 + D.2092_15; D.2094_17 = (int16_t) D.2093_16; *d_1(D)[0] = D.2094_17; D.2095_20 = D.2091_14 - D.2092_15; D.2096_21 = (int16_t) D.2095_20; *d_1(D)[1] = D.2096_21; return; } ... So if I filter out all the statements related to d0 we get: ... { int d0; short unsigned int D.2091; int D.2085; int16_t D.2084; int D.2083; int16_t D.2082; : D.2082_2 = *d_1(D)[0]; D.2083_3 = (int) D.2082_2; D.2084_4 = *d_1(D)[1]; D.2085_5 = (int) D.2084_4; d0_6 = D.2083_3 + D.2085_5; D.2091_14 = (short unsigned int) d0_6; ... } ... I think a tree combiner would work here like you suggested, although it would have to combine 6 gimple statements to get there. But I wonder, the rtl combiner has a weak point: it only combines chains of instructions that have no other uses of the intermediate results (PR 18395/1). AFAIU the source code of the tree combiner, it suffers from the same problem. Is my understanding correct there? Thanks, - Tom