From: Richard Biener
Date: Wed, 06 Nov 2019 12:16:00 -0000
Subject: Re: [5/6] Account for the cost of generating loop masks
To: Richard Sandiford
Cc: "gcc-patches@gcc.gnu.org"

On Tue, Nov 5, 2019 at 3:31 PM Richard Sandiford wrote:
>
> We didn't take the cost of generating loop masks into account, and so
> tended to underestimate the cost of loops that need
> multiple masks.

OK.

>
> 2019-11-05 Richard Sandiford
>
> gcc/
>       * tree-vect-loop.c (vect_estimate_min_profitable_iters): Include
>       the cost of generating loop masks.
>
> gcc/testsuite/
>       * gcc.target/aarch64/sve/mask_struct_store_3.c: Add
>       -fno-vect-cost-model.
>       * gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise.
>       * gcc.target/aarch64/sve/peel_ind_3.c: Likewise.
>       * gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise.
>
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c      2019-11-05 14:19:58.781197820 +0000
> +++ gcc/tree-vect-loop.c      2019-11-05 14:20:40.188909187 +0000
> @@ -3435,6 +3435,32 @@ vect_estimate_min_profitable_iters (loop
>                                     si->kind, si->stmt_info, si->misalign,
>                                     vect_epilogue);
>             }
> +
> +      /* Calculate how many masks we need to generate. */
> +      unsigned int num_masks = 0;
> +      rgroup_masks *rgm;
> +      unsigned int num_vectors_m1;
> +      FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), num_vectors_m1, rgm)
> +        if (rgm->mask_type)
> +          num_masks += num_vectors_m1 + 1;
> +      gcc_assert (num_masks > 0);
> +
> +      /* In the worst case, we need to generate each mask in the prologue
> +         and in the loop body. One of the loop body mask instructions
> +         replaces the comparison in the scalar loop, and since we don't
> +         count the scalar comparison against the scalar body, we shouldn't
> +         count that vector instruction against the vector body either.
> +
> +         Sometimes we can use unpacks instead of generating prologue
> +         masks and sometimes the prologue mask will fold to a constant,
> +         so the actual prologue cost might be smaller. However, it's
> +         simpler and safer to use the worst-case cost; if this ends up
> +         being the tie-breaker between vectorizing or not, then it's
> +         probably better not to vectorize. */
> +      (void) add_stmt_cost (target_cost_data, num_masks, vector_stmt,
> +                            NULL, 0, vect_prologue);
> +      (void) add_stmt_cost (target_cost_data, num_masks - 1, vector_stmt,
> +                            NULL, 0, vect_body);
>      }
>    else if (npeel < 0)
>      {
> Index: gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3.c
> ===================================================================
> --- gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3.c        2019-03-08 18:14:29.768994780 +0000
> +++ gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3.c        2019-11-05 14:20:40.184909216 +0000
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */
> +/* { dg-options "-O2 -ftree-vectorize -ffast-math -fno-vect-cost-model" } */
>
>  #include
>
> Index: gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3_run.c
> ===================================================================
> --- gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3_run.c    2019-03-08 18:14:29.772994767 +0000
> +++ gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3_run.c    2019-11-05 14:20:40.184909216 +0000
> @@ -1,5 +1,5 @@
>  /* { dg-do run { target aarch64_sve_hw } } */
> -/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */
> +/* { dg-options "-O2 -ftree-vectorize -ffast-math -fno-vect-cost-model" } */
>
>  #include "mask_struct_store_3.c"
>
> Index: gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3.c
> ===================================================================
> --- gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3.c 2019-03-08 18:14:29.776994751 +0000
> +++ gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3.c 2019-11-05 14:20:40.184909216 +0000
> @@ -1,7 +1,7 @@
>  /* { dg-do compile } */
>  /* Pick an arbitrary target for which unaligned accesses are more
>     expensive. */
> -/* { dg-options "-O3 -msve-vector-bits=256 -mtune=thunderx" } */
> +/* { dg-options "-O3 -msve-vector-bits=256 -mtune=thunderx -fno-vect-cost-model" } */
>
>  #define N 32
>  #define MAX_START 8
> Index: gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3_run.c
> ===================================================================
> --- gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3_run.c     2019-03-08 18:14:29.784994721 +0000
> +++ gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3_run.c     2019-11-05 14:20:40.184909216 +0000
> @@ -1,6 +1,6 @@
>  /* { dg-do run { target aarch64_sve_hw } } */
> -/* { dg-options "-O3 -mtune=thunderx" } */
> -/* { dg-options "-O3 -mtune=thunderx -msve-vector-bits=256" { target aarch64_sve256_hw } } */
> +/* { dg-options "-O3 -mtune=thunderx -fno-vect-cost-model" } */
> +/* { dg-options "-O3 -mtune=thunderx -msve-vector-bits=256 -fno-vect-cost-model" { target aarch64_sve256_hw } } */
>
>  #include "peel_ind_3.c"
>