From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH GCC 6/9] Simplify control flow graph for vectorized loop
From: Jeff Law
To: Richard Biener, Bin Cheng
Cc: "gcc-patches@gcc.gnu.org", nd
Date: Wed, 14 Sep 2016 16:52:00 -0000
Message-ID: <40df8fb3-0047-0095-507e-80273b331940@redhat.com>

On 09/14/2016 07:21 AM, Richard Biener wrote:
> On Tue, Sep 6, 2016 at 8:52 PM, Bin Cheng wrote:
>> Hi,
>> This is the main patch improving the control flow graph for the
>> vectorized loop.  It generally rewrites the loop peeling code in the
>> vectorizer.  As described in the patch, a typical loop to be
>> vectorized looks like:
>>
>>   preheader:
>>   LOOP:
>>     header_bb:
>>       loop_body
>>       if (exit_loop_cond) goto exit_bb
>>       else                goto header_bb
>>   exit_bb:
>>
>> This patch peels the prolog and epilog from the loop and adds guards
>> skipping PROLOG and EPILOG under various conditions.  As a result,
>> the changed CFG would look like:
>>
>>   guard_bb_1:
>>     if (prefer_scalar_loop) goto merge_bb_1
>>     else                    goto guard_bb_2
>>
>>   guard_bb_2:
>>     if (skip_prolog) goto merge_bb_2
>>     else             goto prolog_preheader
>>
>>   prolog_preheader:
>>   PROLOG:
>>     prolog_header_bb:
>>       prolog_body
>>       if (exit_prolog_cond) goto prolog_exit_bb
>>       else                  goto prolog_header_bb
>>   prolog_exit_bb:
>>
>>   merge_bb_2:
>>
>>   vector_preheader:
>>   VECTOR LOOP:
>>     vector_header_bb:
>>       vector_body
>>       if (exit_vector_cond) goto vector_exit_bb
>>       else                  goto vector_header_bb
>>   vector_exit_bb:
>>
>>   guard_bb_3:
>>     if (skip_epilog) goto merge_bb_3
>>     else             goto epilog_preheader
>>
>>   merge_bb_1:
>>
>>   epilog_preheader:
>>   EPILOG:
>>     epilog_header_bb:
>>       epilog_body
>>       if (exit_epilog_cond) goto merge_bb_3
>>       else                  goto epilog_header_bb
>>
>>   merge_bb_3:
>>
>> Note this patch peels the prolog and epilog only when necessary, and
>> likewise adds the different guard conditions/branches only when they
>> are needed.  The first guard/branch could be further improved by
>> merging it with loop versioning.
>>
>> Before this patch, up to 4 branch instructions had to be executed
>> before the vectorized loop was reached in the worst case, while that
>> number is reduced to 2 with this patch.  The patch also does better
>> compile-time analysis to avoid unnecessary peeling/branching.
>> From the implementation's point of view, the vectorizer needs to
>> update induction variables and iteration bounds along with the
>> control flow changes.  Unfortunately, it also becomes much harder to
>> follow because the slpeel_* functions update SSA by themselves rather
>> than using the update_ssa interface.  This patch tries to factor the
>> SSA/IV/Niter_bound changes out of the CFG changes.  This should make
>> the implementation easier to read, and I think it may be a step
>> forward towards replacing the slpeel_* functions with generic GIMPLE
>> loop copy interfaces, as Richard suggested.
>
> I've skimmed over the patch and it looks reasonable to me.

Thanks.  I was maybe 15% of the way through the main patch.  Nothing
that gave me cause for concern, but I wasn't ready to ACK it myself
yet.

jeff
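
(A rough scalar-level sketch of the peeled structure Bin describes
above, purely for illustration.  The function, the names niters, vf,
prolog_niters and prefer_scalar_loop, and the trivial a[i] += 1 loop
body are assumptions made for this sketch, not code from the patch,
and edge cases such as prolog_niters exceeding niters are glossed
over.)

  /* Illustrative only: roughly how the guarded PROLOG / VECTOR LOOP /
     EPILOG structure executes, with comments mapping back to the
     basic-block names in the CFG above.  */
  void
  peeled_loop_sketch (int *a, int niters, int vf, int prolog_niters,
                      int prefer_scalar_loop)
  {
    int i = 0;

    /* guard_bb_1: prefer the scalar loop, jump straight to merge_bb_1;
       the scalar EPILOG below then acts as the scalar loop.  */
    if (prefer_scalar_loop)
      goto scalar_epilog;

    /* guard_bb_2: skip_prolog when no peeled iterations are needed,
       otherwise run the scalar PROLOG.  */
    if (prolog_niters != 0)
      for (; i < prolog_niters; i++)
        a[i] += 1;

    /* merge_bb_2 / vector_preheader: the VECTOR LOOP handles vf
       elements per iteration (modelled here with an inner loop).  */
    for (; i + vf <= niters; i += vf)
      for (int j = 0; j < vf; j++)
        a[i + j] += 1;

    /* guard_bb_3: skip_epilog when the vector loop consumed all
       remaining iterations, i.e. branch to merge_bb_3.  */
    if (i == niters)
      return;

    /* epilog_preheader: scalar EPILOG for the leftover iterations.  */
   scalar_epilog:
    for (; i < niters; i++)
      a[i] += 1;
  }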