Message-ID: <1475591331.19773.104.camel@t-online.de>
Subject: Re: [RFC][PATCH] Canonicalize address multiplies
From: Oleg Endo
To: Wilco Dijkstra, GCC Patches
Cc: nd, Erik Varga
Date: Tue, 04 Oct 2016 14:29:00 -0000

On Tue, 2016-10-04 at 12:53 +0000, Wilco Dijkstra wrote:
> GCC currently doesn't canonicalize address expressions.  As a result,
> inefficient code is generated even for trivial index address
> expressions, blocking CSE and other optimizations:
>
> int f(int *p, int i) { return p[i+2] + p[i+1]; }
>
>         sxtw    x1, w1
>         add     x1, x1, 2
>         add     x2, x0, x1, lsl 2
>         ldr     w0, [x0, x1, lsl 2]
>         ldr     w1, [x2, -4]
>         add     w0, w1, w0
>         ret
>
> After this patch:
>
>         add     x1, x0, x1, sxtw 2
>         ldp     w0, w2, [x1, 4]
>         add     w0, w2, w0
>         ret
>
> The reason for this is that array index expressions are preferably
> kept in the *(p + (i + C0) * C1) form, even though on most targets it
> is best to make use of an offset in memory accesses, i.e.
> *(p + i * C1 + (C0 * C1)).
>
> This patch disables the folding in fold_plusminus_mult_expr that
> changes the latter form into the former.  Unfortunately it isn't
> possible to know whether it is an address expression, nor is there a
> way to decide when C0*C1 is too complex.
>
> So is there a better way/place to do this, or do we need an address
> canonicalization phase in the tree that ensures we expand addresses
> in an efficient manner, taking target offsets into account?

There's been an effort to implement an address mode selection (AMS)
optimization in GCC as part of the GSoC program.  It hasn't been
mainlined yet and is currently SH-only, but I'd like to move it
forward and make it available to other backends, too.

It's an RTL pass that analyzes the memory accesses inside basic
blocks, figures out the effective address expressions, and queries the
backend for the address mode alternatives of each memory access along
with their associated costs.  With that information it tries to find a
minimal solution (minimizing both the address register calculations
and the address mode alternative costs), which is currently
implemented with backtracking.
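To give a rough idea of how that search works, here's a simplified
sketch in C++.  All names and types below are made up for illustration
and don't match the actual ams.h / ams.cc interface; the real pass
also models a lot more (auto-inc, register reuse between alternatives
and so on):

  // Sketch of the AMS cost minimization; illustrative names only,
  // not the real ams.h interface.
  #include <algorithm>
  #include <cstddef>
  #include <limits>
  #include <set>
  #include <vector>

  // One addressing mode alternative for a memory access.  base_value
  // identifies the address calculation this alternative would need.
  struct alternative
  {
    int base_value;
    int cost;       // backend-reported cost of this addressing mode
  };

  struct mem_access
  {
    std::vector<alternative> alts;  // valid alternatives for this access
  };

  static const int setup_cost = 2;  // assumed cost of one address calculation

  // Choose one alternative per access so that the sum of the
  // alternative costs plus the address calculation costs is minimal.
  // An address calculation is done once and then shared by later
  // accesses, which is why a per-access greedy choice isn't enough and
  // the search backtracks instead.  Assumes every access has at least
  // one alternative.
  static int
  min_cost (const std::vector<mem_access> &mems, size_t i,
            std::set<int> bases)
  {
    if (i == mems.size ())
      return 0;

    int best = std::numeric_limits<int>::max ();
    for (const alternative &a : mems[i].alts)
      {
        std::set<int> b = bases;
        int setup = b.insert (a.base_value).second ? setup_cost : 0;
        best = std::min (best, a.cost + setup + min_cost (mems, i + 1, b));
      }
    return best;
  }

In your example, both loads would have a reg+disp alternative against
the same base value (p + i*4), so the search settles on a single
address calculation shared by two displacement accesses.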
For SH, the AMS pass can convert your example above from this

_f:
        mov     r5,r0
        add     #2,r0
        shll2   r0
        mov     r4,r1
        add     r0,r1
        mov.l   @(r0,r4),r0
        add     #-4,r1
        mov.l   @r1,r2
        rts
        add     r2,r0

into this:

_f:
        shll2   r5
        add     r5,r4
        mov.l   @(4,r4),r0
        mov.l   @(8,r4),r1
        rts
        add     r1,r0

... which is minimal on SH.  It also fixes several missed auto-inc
opportunities and was meant to allow further address-mode-related
optimizations such as displacement range fitting or access reordering.

Although not yet ready for mainline, the current code can be found on
github:

https://github.com/erikvarga/gcc/commits/master
https://github.com/erikvarga/gcc/blob/master/gcc/ams.h
https://github.com/erikvarga/gcc/blob/master/gcc/ams.cc

The way AMS gets address mode information from the backend is
different from GCC's current approach:

https://github.com/erikvarga/gcc/blob/master/gcc/config/sh/sh.c#L11946

Since the SH ISA is a bit irregular, there are a bunch of exceptions
and special cases in the cost calculations, which take the surrounding
insns and memory accesses into account.  I guess a more regular or
less restrictive ISA wouldn't need so many special cases.

Cheers,
Oleg
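P.S.  To illustrate how that querying scheme differs from the current
yes/no legitimate-address checks, a backend hook could conceptually
look like this.  This is a hypothetical sketch with made-up names, not
the actual interface from ams.h or the SH code linked above:

  // Hypothetical per-access query hook, for illustration only.
  #include <vector>

  enum addr_mode_kind { BASE_REG, BASE_DISP, BASE_INDEX, POST_INC };

  struct addr_mode_alt
  {
    addr_mode_kind kind;
    int min_disp, max_disp;  // valid displacement range, if any
    int cost;                // relative cost of the mode for this access
  };

  // Instead of answering "is this address legitimate?", the backend
  // enumerates the addressing modes it supports for a given access and
  // attaches a cost to each.  For a 4-byte access on SH that could be:
  std::vector<addr_mode_alt>
  addr_mode_alternatives (/* machine mode, access context, ... */)
  {
    return {
      { BASE_REG,   0,  0, 1 },  // @Rn
      { BASE_DISP,  0, 60, 1 },  // @(disp,Rn), disp 0..60 for 4-byte accesses
      { BASE_INDEX, 0,  0, 2 },  // @(R0,Rn), ties up R0
      { POST_INC,   0,  0, 1 },  // @Rn+
    };
  }

The costs here are invented, but the point is that they are per-access
and contextual, which is where the SH special-casing mentioned above
goes.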