Date: Tue, 02 May 2017 12:12:00 -0000
From: Jakub Jelinek <jakub@redhat.com>
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Optimize in VRP loads from constant
 arrays (take 2)
Message-ID: <20170502121213.GB1809@tucnak>
References: <20170421140659.GB1809@tucnak> <5842BC6C-6CD8-4059-9DCF-625D39D2BAA6@suse.de> <20170429162849.GQ1809@tucnak>

On Tue, May 02, 2017 at 01:13:19PM +0200, Richard Biener wrote:
> On Sat, 29 Apr 2017, Jakub Jelinek wrote:
>
> > On Fri, Apr 21, 2017 at 04:42:03PM +0200, Richard Biener wrote:
> > > On April 21, 2017 4:06:59 PM GMT+02:00, Jakub Jelinek wrote:
> > > > This patch attempts to implement the optimization mentioned in
> > > > http://gcc.gnu.org/ml/gcc-patches/2017-02/msg00945.html
> > > > If we know a (relatively small) value range for an ARRAY_REF index
> > > > of a constant array load, and the values in the array are the same
> > > > (and constant) for all indexes in that range, we can just replace
> > > > the load with the constant.
> > >
> > > But this should be done during propagation (and thus can even
> > > produce a range rather than just a constant).
> >
> > True, I've changed the implementation to do that.
> > Except that it is only useful during propagation for integral or
> > pointer types; for anything else (and for pointers when not just
> > doing a NULL vs. non-NULL vr) it is also performed during
> > simplification (in that case it requires operand_equal_p values).
> >
> > The patch also newly computes a range for the index and a range from
> > the array bounds (taking into account arrays at struct end) and
> > intersects them, plus takes into account that some iterators might
> > have a couple of zero low (LSB) bits (as computed by ccp).
>
> Hmm, while I can see how this helps, I'd rather not do it this way
> (see PR80533 and my followup with a testcase which shows
> array_at_struct_end_p is wrong).  Ideally we'd add asserts for array
> indices instead.
> Thus
>
>   i_3 = ASSERT_EXPR
>   .. = a[i_3];
>
> which of course needs adjustment to -Warray-bounds to look at the
> range of the original SSA name (and in loops even that might
> degrade...).
>
> Was this necessary to get any reasonable results?

Yes, it is very common that you end up with a VR that has a negative
min, or a very large max; but if the array is in the middle of a
struct, or it is a toplevel non-common array variable, etc. (which is
quite a common case), then we still should be able to optimize it.

Adding extra ASSERT_EXPRs would work too, but wouldn't it be too
expensive?  I mean, there can be lots of array accesses with the same
index; if we'd add an ASSERT_EXPR for every case, VRP work could
increase many times.  It is true that we'd then be able to optimize
(array renamed from foo to a here so it doesn't clash with the
function name):

extern void link_error (void);

int a[5] = { 1, 2, 3, 4, 5 };

void
foo (int i, int j)
{
  if (j > 10)
    {
      a[i] = 10;
      if (i < 0 || i >= 5)
        link_error ();
    }
  a[i]++;
}

If array_at_struct_end_p is wrong, it should be fixed ;)

> Still, given the high cost of get_array_ctor_element_at_index, which
> does linear searching, I'd add a few early-outs like
>
>   base = get_base_address (rhs);
>   ctor = ctor_for_folding (base);
>   if (! ctor)
>     return NULL_TREE;

This one I can add easily.

> I'd restructure the patch quite differently: using for_each_index on
> the ref, gather an array of index pointers (bail out on sth
> unhandled).  Then I'd see if I have interesting ranges for them; if
> not, bail out.  Also compute the size product of all ranges and test
> that against PARAM_MAX_VRP_CONSTANT_ARRAY_LOADS.  Then store the
> minimum range value in the index places (temporarily) and use
> get_ref_base_and_extent to get at the constant "starting" offset.
> From there iterate using the remembered indices (remember the ref
> tree as well via for_each_index), directly adjusting the constant
> offset so you can feed fold_ctor_reference the constant offsets of
> all elements that need to be considered.
> As an optimization, fold_ctor_reference would know how to start from
> the "last" offset (much refactoring would need to be done here given
> nested ctors and multiple indices, I guess).

But for this, don't you want to take it over?  I agree that the
current implementation is not very efficient, and that is why it is
also limited to that small number of iterations.  As many cases just
aren't able to use the valueize callback, handling arbitrary numbers
of non-constant indexes would be harder.

	Jakub