From: "Timothy J. Wood"
To: Diego Novillo
Cc: gcc@gcc.gnu.org
Date: Wed, 14 May 2003 15:31:00 -0000
Subject: Re: [tree-ssa] RFC: Dropping INDIRECT_REF variables
In-Reply-To: <20030514150917.GA12344@tornado.toronto.redhat.com>
Message-Id: <2D0876A9-8621-11D7-93E7-000A9567A046@omnigroup.com>

On Wednesday, May 14, 2003, at 08:09 AM, Diego Novillo wrote:

> - We disable the ability to treat non-aliased pointer
>   dereferences as if they were variables.  We would lose the
>   ability to do some optimizations like:

Please forgive me if I'm way off base here, but one example on PPC that your code snippet reminded me of was int->float conversion. On PPC this happens by building a pair of 32-bit ints and storing them on the stack, loading them as a double, and then doing some more contortions with the result.

The current problem is that if you have float conversion in a loop:

void convert(int *input, float *output, unsigned int count)
{
    while (count--)
        *output++ = *input++;
}

both words of the double on the stack are written on each iteration, even though one of the two 32-bit halves is always the same.
This store should be hoisted outside the loop (in the degenerate case above this can be a pretty big performance win).

I don't have a 3.3 compiler right now, but on Mac OS X 10.2 (3.1-based):

cc -mdynamic-no-pic -S -O2 foo.c

.data
.literal8
        .align 3
LC0:
        .long   1127219200
        .long   -2147483648
.text
        .align 2
        .globl _convert
_convert:
        cmpwi cr0,r5,0
        addi r5,r5,-1
        beqlr- cr0
        addi r5,r5,1
        lis r11,ha16(LC0)
        mtctr r5
        lfd f13,lo16(LC0)(r11)
        lis r9,0x4330
L8:
        lwz r0,0(r3)
        addi r3,r3,4
        stw r9,-16(r1)      <--- should be outside the loop
        xoris r0,r0,0x8000
        stw r0,-12(r1)
        lfd f0,-16(r1)
        fsub f0,f0,f13
        frsp f0,f0
        stfs f0,0(r4)
        addi r4,r4,4
        bdnz L8
        blr

I'm not sure this applies to what you were talking about, but I thought I'd bring it up since it looks similar and we could benefit from having this optimization.

-tim
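
For reference, here is a rough C sketch of the conversion sequence in the listing, with the constant half set up once before the loop. It is only an illustration of the desired result, not what GCC emits: the name convert_hoisted and the use of a uint64_t in place of the stack slot are my own assumptions, and it assumes IEEE 754 doubles whose byte order matches the integer byte order.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch only: convert_hoisted is a hypothetical name, and a
   uint64_t stands in for the 8-byte stack slot used by the assembly. */
void convert_hoisted(const int *input, float *output, unsigned int count)
{
    /* 2^52 + 2^31: the double whose bit pattern is 0x43300000 80000000,
       i.e. the two .long values stored at LC0. */
    const double bias = 4503601774854144.0;

    /* Loop-invariant high word (the value built by lis r9,0x4330).
       It is set up once here instead of being rewritten per iteration. */
    const uint64_t high = (uint64_t)0x43300000u << 32;

    while (count--) {
        /* Only the low word changes: the input with its sign bit flipped
           (the xoris r0,r0,0x8000 step). */
        uint64_t bits = high | ((uint32_t)*input++ ^ 0x80000000u);
        double d;

        memcpy(&d, &bits, sizeof d);   /* reinterpret as a double (the lfd) */
        *output++ = (float)(d - bias); /* fsub, then frsp */
    }
}
```

The bit pattern 0x43300000 in the high word gives an exponent of 52, so the low word is read as an integer added to 2^52. Subtracting 2^52 + 2^31 undoes both the exponent offset and the sign-bit flip and leaves the original int value, which is then rounded to float.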