From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-367418-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 1791 invoked by alias); 14 May 2014 03:46:26 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 1774 invoked by uid 89); 14 May 2014 03:46:25 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-vc0-f175.google.com
Received: from mail-vc0-f175.google.com (HELO mail-vc0-f175.google.com) (209.85.220.175) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Wed, 14 May 2014 03:46:23 +0000
Received: by mail-vc0-f175.google.com with SMTP id hu19so1687039vcb.34        for <gcc-patches@gcc.gnu.org>; Tue, 13 May 2014 20:46:21 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.52.228.134 with SMTP id si6mr697249vdc.5.1400039181039; Tue, 13 May 2014 20:46:21 -0700 (PDT)
Received: by 10.220.11.5 with HTTP; Tue, 13 May 2014 20:46:20 -0700 (PDT)
In-Reply-To: <20140514030448.GJ5162@bubble.grove.modra.org>
References: <20140508014846.GA5162@bubble.grove.modra.org>	<CAGWvnyme6a+QLSxXwzUOxWAHWTokxZCySua_+25hUzaEVYvgPA@mail.gmail.com>	<20140509024054.GE5162@bubble.grove.modra.org>	<CAGWvny=3+XuqiWtxhB4HSXDWHRZMpNkhioTVeEpEs9C4vMWRGg@mail.gmail.com>	<20140514030448.GJ5162@bubble.grove.modra.org>
Date: Wed, 14 May 2014 03:46:00 -0000
Message-ID: <CAGWvnymJhh1iR4wQW7GOBkAd=C6RH9eHMOJe8JimNrBsN3MOHg@mail.gmail.com>
Subject: Re: [RS6000] Fix PR61098, Poor code setting count register
From: David Edelsohn <dje.gcc@gmail.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>, Alan Modra <amodra@gmail.com>
Content-Type: text/plain; charset=UTF-8
X-SW-Source: 2014-05/txt/msg01033.txt.bz2

Alan,

Danny may have re-organized the code, but I thought that it originally
came from Tom Rixx, if not earlier.

I seem to remember problems in the past with late creation of TOC
entries for constants causing problems, so it was easier to fall back
to materializing all integer constants inline. I don't remember the
PRs, but I think there were issues with creating a TOC if the late
constant were the only TOC reference, or maybe the issue was buying a
stack frame to materialize the TOC/GOT for a late constant.  And
maximum 5 instruction sequence is not really bad relative to a load
from the TOC (even with medium model / data in TOC). There are a lot
of trade-offs with respect to I$ expansion versus the load hitting in
the L1 D$.

Alpha emit_set_const does limit the number of instructions, but that
is a search for a more efficient sequence than the naive sequence. The
Alpha splitters use alpha_split_const_mov(), which tries
alpha_emit_set_const() for an efficient sequence and then falls back
to alpha_emit_set_long_const() for a naive sequence.  Alpha uses PLUS
instead of IOR because of the way its ISA works.
alpha_emit_set_long_const() always will materialize the constant and
does not check for a maximum number of instructions. This is why it's
comment says "fall back to straight forward decomposition".

However, alpha_emit_set_long_const() and alpha_split_const_mov() can
fail, presumably because emit_move_insn() fails, not because of
reaching a maximum number of instructions.

alpha_legitimate_constant_p() rejecs expensive constants early. Once
the splitter is invoked, it always tries to materialize the constant,
but the splitter apparently can fail for other reasons.

I don't mind exploring the benefits of tighening up
rs6000_legitimate_const(), but I'm not sure it's an obvious win,
especially with the potential failure corner cases.

However, I want to have a better understanding about the part of the
patch that removes the FAIL path from the splitters.

Thanks, David


On Tue, May 13, 2014 at 11:04 PM, Alan Modra <amodra@gmail.com> wrote:
> On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote:
>> On Thu, May 8, 2014 at 10:40 PM, Alan Modra <amodra@gmail.com> wrote:
>>
>> >> Please do not remove all of the comments from the two functions. The
>> >> comments should provide some documentation about the different
>> >> purposes of the two functions other than setting DEST to a CONST.
>> >
>> > I believe my updated comment covers the complete purpose of the
>> > function nowadays.  The comments I removed are out-dated, and should
>> > have been removed a long time ago..  rs6000_emit_set_const does not
>> > even look at N, it always returns a non-zero result, and the return is
>> > only tested for non-zero.  I removed MODE too, because that is always
>> > the same as GET_MODE (dest).
>>
>> It is helpful if the comment expresses more than restating the
>> information one can glean from the function name. It's useful to note
>> that rs6000_emit_set_long_const is a standard decomposition with a
>> bounded number of instructions.
>>
>> >> I think that the way you rearranged the invocations of copy_rtx() in
>> >> rs6000_emit_set_long_const() is okay, but it would be good for someone
>> >> else to double check.
>> >
>> > Yeah, that function is a bit messy.  I took the approach of always use
>> > a bare "dest" once in the last instruction emitted, with every other
>> > use getting hit with copy_rtx.  The previous approach was similar,
>> > but used the bare "dest" on the first instruction emitted.  Obviously
>> > you don't need copy_rtx anywhere with the new code when
>> > can_create_pseudo_p is true, but I felt it wasn't worth optimising
>> > that for the added source complication.
>>
>> Can you help clarify the removal of the code that tests if the
>> splitter failed?  The splitters in the Alpha port follow mostly the
>> same rhythm, with a little bit of further cleanup and consolidation
>> relative to the rs6000 port. alpha_split_const_mov() falls back on
>> alpha_emit_set_long_const(), but checks that the target is valid and
>> allows the splitter to fail. Either the Alpha port is doing
>> unnecessary work or this cleanup patch is too aggressive. Either way,
>> a comment seems necessary.
>
> OK, I've had a good look at the history of this code.
>
> rs6000_emit_set_const and rs6000_emit_set_long_const were introduced
> with revision 44516, a largish patch by Dan Berlin.  As you hint
> above, it seems the functions were copied from alpha.  So the
> parameters were unnecessary and the comments just plain wrong for the
> rs6000 version of code right from the initial commit.  Worse, only
> half of necessary infrastructure was copied from alpha..
>
> So let me lay out what I believe should be happening with
> (set (reg) (constant))
>
> At expand time, if the above set can't be implemented in a single
> instruction, then it should be decomposed to the equivalent set high
> part, ori low part, and possibly shift instructions so long as the
> resulting sequence is small.  I think we basically do this correctly
> in rs6000_emit_move.  See the num_insns_constant call there.
> Constants that can't be evaluated inline by two (or three)
> instructions will be replaced with a load from the TOC.
>
> The same thing ought to happen in the splitters that use
> rs6000_emit_set_const.  rs6000_emit_set_const should refuse to expand
> to too many instructions (just like alpha).  We don't do this, but if
> we did, this would leave some (set (reg) (constant)) instructions in
> the RTL.  Alternatively the splitters could generate loads from the
> TOC, but see pr57836, which shows the loads from the TOC we crafted
> at expand time being reduced back to (set (reg) (constant)).
>
> Finally, at reload time, any remaining (set (reg) (constant))
> (ie. those that result in a long inline sequence) should be forced to
> the TOC.  This is the missing part of the infrastructure that wasn't
> copied from alpha.  Our legitimate_constant_p needs to reject some
> constants..  As it is, reload simply expands to a four or five
> inline instruction sequence.
>
> David, I'd like some help with the legitimate_constant_p
> implementation.  I have something that seems to work (not yet
> regression tested) but there are a number of things that I'm not clear
> on (eg. the revision 20229 change) so likely will get it wrong.
>
> --
> Alan Modra
> Australia Development Lab, IBM