Subject: Re: backward threading heuristics tweek
From: Jeff Law
To: Jan Hubicka, James Greenhalgh
Cc: Andrew Pinski, GCC Patches, nd@arm.com
Date: Mon, 15 Aug 2016 16:57:00 -0000
References: <20160606101953.GC12313@kam.mff.cuni.cz> <20160808082830.GA14494@arm.com> <20160811123558.GB67433@kam.mff.cuni.cz>
In-Reply-To: <20160811123558.GB67433@kam.mff.cuni.cz>

On 08/11/2016 06:35 AM, Jan Hubicka wrote:
>> This also caused:
>>
>> FAIL: gcc.dg/tree-ssa/pr69270-3.c scan-tree-dump-times uncprop1 ", 1" 4
>>
>> Failures:
>>   gcc.dg/tree-ssa/pr69270-3.c
>>
>> Bisected to:
>>
>> Author: hubicka
>> Date: Sun Aug 7 10:50:16 2016 +0000
>>
>>     * tree-ssa-threadbackward.c: Include tree-inline.h
>>     (profitable_jump_thread_path): Use estimate_num_insns to estimate
>>     size of copied block; for cold paths reduce duplication.
>>     (find_jump_threads_backwards): Remove redundant tests.
>>     (pass_thread_jumps::gate): Enable for -Os.
>>     * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Update testcase.
>>
>> svn+ssh://gcc.gnu.org/svn/gcc/trunk@239219
>>
>> On my aarch64-none-linux-gnu and x86_64-none-linux-gnu automatic
>> bisect robots.
>
> Sorry for that - it seems I have missed this failure. The reason is
> that FSM now stops on:
>
>   /* We avoid creating irreducible inner loops unless we thread through
>      a multiway branch, in which case we have deemed it worth losing
>      other loop optimizations later.
>
>      We also consider it worth creating an irreducible inner loop if
>      the number of copied statement is low relative to the length of
>      the path -- in that case there's little the traditional loop
>      optimizer would have done anyway, so an irreducible loop is not
>      so bad.  */
>   if (!threaded_multiway_branch && creates_irreducible_loop
>       && (n_insns * PARAM_VALUE (PARAM_FSM_SCALE_PATH_STMTS)
>           > path_length * PARAM_VALUE (PARAM_FSM_SCALE_PATH_BLOCKS)))
>     {
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         fprintf (dump_file,
>                  "FSM would create irreducible loop without threading "
>                  "multiway branch.\n");
>       path->pop ();
>       return NULL;
>     }
>
> The threaded path now gets n_insns == 13 and path_length == 6. I guess
> the difference is that the path consists of several calls that are
> considered heavy by the new code size estimate, which is correct. It is
> definitely heavier than a path consisting of a few increments.
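
For concreteness, here is a minimal standalone sketch of that guard with
the numbers from this thread plugged in. The parameter defaults are an
assumption (fsm-scale-path-stmts = 2 and fsm-scale-path-blocks = 3, as
in params.def of this era), and rejects_path is an illustrative
stand-in, not the actual GCC function:

  #include <stdio.h>

  /* Assumed defaults of --param fsm-scale-path-stmts and
     --param fsm-scale-path-blocks.  */
  #define FSM_SCALE_PATH_STMTS  2
  #define FSM_SCALE_PATH_BLOCKS 3

  /* Mirrors the irreducible-loop guard quoted above: reject the path
     unless it threads a multiway branch or the copied statements are
     cheap relative to the number of blocks on the path.  */
  static int
  rejects_path (int threaded_multiway_branch,
                int creates_irreducible_loop,
                int n_insns, int path_length)
  {
    return !threaded_multiway_branch
           && creates_irreducible_loop
           && (n_insns * FSM_SCALE_PATH_STMTS
               > path_length * FSM_SCALE_PATH_BLOCKS);
  }

  int
  main (void)
  {
    /* The path in this thread: 13 * 2 = 26 > 6 * 3 = 18 -> rejected.  */
    printf ("%d\n", rejects_path (0, 1, 13, 6));

    /* Under the old roughly one-statement-per-block size estimate the
       same path would score 6 * 2 = 12 <= 18 and still be threaded,
       which is presumably why the uncprop test regressed.  */
    printf ("%d\n", rejects_path (0, 1, 6, 6));
    return 0;
  }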
[In the dump below, the basic-block labels and goto targets were <bb N>
markers, which the list archive stripped.]

> :
>   if (phi_inserted_5 == 0)
>     goto ;
>   else
>     goto ;
>
> :
>   _2 = boo ();
>   if (_2 != 20)
>     goto ;
>   else
>     goto ;
>
> :
>   _1 = arf ();
>   if (_1 != 10)
>     goto ;
>   else
>     goto ;
>
> :
>   # phi_inserted_5 = PHI <0(2), phi_inserted_4(8)>
>   _3 = end_imm_use_stmt_p ();
>   if (_3 == 0)
>     goto ;
>   else
>     goto ;
>
> loop latch.
> :
>   # phi_inserted_4 = PHI
>   next_imm_use_stmt ();
>
> :
>   if (phi_inserted_5 == 0)
>     goto ;
>   else
>     goto ;
>
> I would say that optimizing this path to dead is not the most
> important thing. The question is whether there really is a problem
> with an irreducible loop. There are two loops in the function body
> prior to threading:
>
> ;; Loop 0
> ;;  header 0, latch 1
> ;;  depth 0, outer -1
> ;;  nodes: 0 1 2 3 4 6 7 8 9 10
> ;;
> ;; Loop 1
> ;;  header 9, latch 8
> ;;  depth 1, outer 0
> ;;  nodes: 9 8 4 6 7 3
> ;; 2 succs { 9 }
> ;; 3 succs { 8 4 }
> ;; 4 succs { 8 6 }
> ;; 6 succs { 8 7 }
> ;; 7 succs { 8 }
> ;; 8 succs { 9 }
> ;; 9 succs { 3 10 }
> ;; 10 succs { 1 }
>
> So the threaded path lives fully inside loop 1: 6->8->9->3->4->6,
> propagating the fact that phi_inserted is 0 after the first iteration
> of the loop. This looks like a useful loop peeling opportunity which
> does not garble the loop structure. So perhaps threading paths that
> start at and pass through the loop latch (i.e. peeling) is sane?
> Perhaps all paths fully captured in the loop in question are?

Peeling like this has long been a point of contention -- it totally
mucks up things like vectorization. The general issue is that the
threader knows nothing about the characteristics of the loop -- thus
peeling at this point is premature and just as likely to hinder
performance as to improve it.

I've never been happy with how this aspect of threading vs. loop opts
turned out, and we have open BZs related to this rat's nest of issues.

jeff
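
To make the peeling observation above concrete, here is a hypothetical
C analogue of the loop in Jan's dump. The function names come from the
dump and the header/latch block numbers from the loop dump; the control
flow between the calls is a guess, since the archive stripped the block
labels:

  /* Hypothetical reconstruction, not code from the thread.  */
  int arf (void);
  int boo (void);
  int end_imm_use_stmt_p (void);
  void next_imm_use_stmt (void);

  void
  walk_uses (void)
  {
    int phi_inserted = 0;            /* phi_inserted_5 in the dump */
    while (!end_imm_use_stmt_p ())   /* bb 9, the loop header */
      {
        /* phi_inserted == 0 can hold only on the first iteration, so
           duplicating the path through the latch (6->8->9->3->4->6)
           amounts to peeling these tests out of the loop.  */
        if (phi_inserted == 0
            && arf () == 10
            && boo () == 20)
          phi_inserted = 1;
        next_imm_use_stmt ();        /* bb 8, the loop latch */
      }
  }

After such peeling, every remaining iteration sees phi_inserted != 0,
which is the win Jan points at; Jeff's counterpoint is that the
threader cannot tell whether the duplication will hurt later loop
optimizations such as vectorization.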