From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 83222 invoked by alias); 15 Oct 2019 12:11:43 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 83212 invoked by uid 89); 15 Oct 2019 12:11:43 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-5.8 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.1 spammy=reductions, trees X-HELO: mx1.suse.de Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 15 Oct 2019 12:11:41 +0000 Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 995BCB2CF; Tue, 15 Oct 2019 12:11:38 +0000 (UTC) Date: Tue, 15 Oct 2019 12:17:00 -0000 From: Richard Biener To: Joel Hutton cc: GCC Patches , nd Subject: Re: [SLP] SLP vectorization: vectorize vector constructors In-Reply-To: <5edb0b00-4ae2-41c0-80ec-76de15d0b110@arm.com> Message-ID: References: <5edb0b00-4ae2-41c0-80ec-76de15d0b110@arm.com> User-Agent: Alpine 2.21 (LSU 202 2017-01-01) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="-1609908220-274247742-1571141498=:5566" X-SW-Source: 2019-10/txt/msg01082.txt.bz2 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. ---1609908220-274247742-1571141498=:5566 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT Content-length: 10304 On Fri, 11 Oct 2019, Joel Hutton wrote: > Hi Richard, > > Thanks for your help, I've reworked my SLP RFC based on your feedback. > > I think a better place for the loop searching for CONSTRUCTORs is > > vect_slp_analyze_bb_1 where I'd put it before the check you remove, > > and I'd simply append found CONSTRUCTORs to the grouped_stores > > array > I've moved this check into a separate function and called it from > vect_slp_analyze_bb_1 > > > The fixup you do in vectorizable_operation doesn't > > belong there either, I'd add a new field to the SLP instance > > structure refering to the CONSTRUCTOR stmt and do the fixup > > in vect_schedule_slp_instance instead where you can simply > > replace the CONSTRUCTOR with the vectorized SSA name then. > > Done. > > > +           /* Check that the constructor elements are unique.  */ > > +           FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val) > > +             { > > +               tree prev_val; > > +               int j; > > +               FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), j, > > prev_val) > > +               { > > +                 if (val == prev_val && i!=j) > > > > why's that necessary? (it looks incomplete, also doesn't catch > > [duplicate] constants) > > The thinking was that there was no benefit in vectorizing a constructor of > duplicates, or a vector of constants, although now you mention it that > thinking > may be flawed. I've removed it > > > You miss to check that CONSTRUCTOR_NELTS == TYPE_VECTOR_SUBPARTS > > (we can have omitted trailing zeros). ^^^ I don't see this being handled? You give up on non-SSA names but not on the omitted trailing zeros. > ... > > What happens if you have a vector constructor that is twice > > as large as the machine supports?  The vectorizer will happily > > produce a two vector SSA name vectorized result but your > > CONSTRUCTOR replacement doesn't work here.  I think this should > > be made work correctly (not give up on that case). > > I've reworked the patch to account for this, by checking that the > vectorized version > has one vectorized stmt at the root of the SLP tree. I'm not sure how to > handle > a vector constructor twice as large as the machine supports, as far as I > can see, > when only analyzing a within a basic block, the SSA name of the > constructor has to > be maintained. You build a CONSTRUCTOR of vectors, thus _orig_ssa_1 = { vect_part1_2, vect_part2_3, ... }; + + if (constructor) + { + SLP_INSTANCE_ROOT_STMT (new_instance) = stmt_info->stmt; + } + else + SLP_INSTANCE_ROOT_STMT (new_instance) = NULL; + too much vertical space, no {} around single-stmt if clauses @@ -2725,6 +2760,10 @@ vect_bb_slp_scalar_cost (basic_block bb, stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt); if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info)) { + /* Check this is not a constructor that will be vectorized + away. */ + if (BB_VINFO_GROUPED_STORES (vinfo).contains (use_stmt_info)) + continue; hmm, so why not set the slp type on SLP_INSTANCE_ROOT_STMT instead? In theory the stmt should be part of the SLP tree itself but that's probably too awkward to be made work at the moment ;) vect_ssa_use_outside_bb and vect_slp_check_for_constructors are new functions which need comments. + /* For vector constructors, the same SSA name must be used to maintain data + flow into other basic blocks. */ + if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance) + && SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1 + && SLP_TREE_VEC_STMTS (node).exists ()) + { it should read /* Vectorize the instance root. */ and be in vect_schedule_slp after the vect_schedule_slp_instance. As said above you need to handle SLP_TREE_NUMBER_OF_VEC_STMTS > 1, you also cannot simply do "nothing" here, "failing" vectorization (well, you probably can - DCE will remove the vectorized code - but at least if there were calls in the SLP tree they will be mangled by vectorization so the scalar code is wrecked). SO it should be if (SLP_INSTANCE_ROOT_STMT (instance)) .. you may not fail to replace the scalar root stmt here .. + if (CONSTRUCTOR_NELTS (rhs) == 0) + vectorizable = false; + if you use continue; you can elide the 'vectorizable' variable. + if (!vect_ssa_use_outside_bb (gimple_assign_lhs (stmt))) + vectorizable = false; + why's that? no comments that clarify ;) The vector may be used as argument to a call or as source of a store. So I'd simply remove this check (and the function). Thanks, Richard. > Currently SLP vectorization can build SLP trees starting from reductions or > from group stores. This patch adds a third starting point: vector > constructors. > > For the following test case (compiled with -O3): > > char g_d[1024], g_s1[1024], g_s2[1024]; > void test_loop(void) > { >   char d = g_d, s1 = g_s1, *s2 = g_s2; >   for ( int y = 0; y < 128; y++ ) > >   { >     for ( int x = 0; x < 16; x++ ) >       d[x] = s1[x] + s2[x]; >     d += 16; >   } > } > > before patch: > test_loop: > .LFB0: >         .cfi_startproc >         adrp    x0, g_s1 >         adrp    x2, g_s2 >         add     x3, x0, :lo12:g_s1 >         add     x4, x2, :lo12:g_s2 >         ldrb    w7, [x2, #:lo12:g_s2] >         ldrb    w1, [x0, #:lo12:g_s1] >         adrp    x0, g_d >         ldrb    w6, [x4, 1] >         add     x0, x0, :lo12:g_d >         ldrb    w5, [x3, 1] >         add     w1, w1, w7 >         fmov    s0, w1 >         ldrb    w7, [x4, 2] >         add     w5, w5, w6 >         ldrb    w1, [x3, 2] >         ldrb    w6, [x4, 3] >         add     x2, x0, 2048 >         ins     v0.b[1], w5 >         add     w1, w1, w7 >         ldrb    w7, [x3, 3] >         ldrb    w5, [x4, 4] >         add     w7, w7, w6 >         ldrb    w6, [x3, 4] >         ins     v0.b[2], w1 >         ldrb    w8, [x4, 5] >         add     w6, w6, w5 >         ldrb    w5, [x3, 5] >         ldrb    w9, [x4, 6] >         add     w5, w5, w8 >         ldrb    w1, [x3, 6] >         ins     v0.b[3], w7 >         ldrb    w8, [x4, 7] >         add     w1, w1, w9 >         ldrb    w11, [x3, 7] >         ldrb    w7, [x4, 8] >         add     w11, w11, w8 >         ldrb    w10, [x3, 8] >         ins     v0.b[4], w6 >         ldrb    w8, [x4, 9] >         add     w10, w10, w7 >         ldrb    w9, [x3, 9] >         ldrb    w7, [x4, 10] >         add     w9, w9, w8 >         ldrb    w8, [x3, 10] >         ins     v0.b[5], w5 >         ldrb    w6, [x4, 11] >         add     w8, w8, w7 >         ldrb    w7, [x3, 11] >         ldrb    w5, [x4, 12] >         add     w7, w7, w6 >         ldrb    w6, [x3, 12] >         ins     v0.b[6], w1 >         ldrb    w12, [x4, 13] >         add     w6, w6, w5 >         ldrb    w5, [x3, 13] >         ldrb    w1, [x3, 14] >         add     w5, w5, w12 >         ldrb    w13, [x4, 14] >         ins     v0.b[7], w11 >         ldrb    w12, [x4, 15] >         add     w4, w1, w13 >         ldrb    w1, [x3, 15] >         add     w1, w1, w12 >         ins     v0.b[8], w10 >         ins     v0.b[9], w9 >         ins     v0.b[10], w8 >         ins     v0.b[11], w7 >         ins     v0.b[12], w6 >         ins     v0.b[13], w5 >         ins     v0.b[14], w4 >         ins     v0.b[15], w1 >         .p2align 3,,7 > .L2: >         str     q0, [x0], 16 >         cmp     x2, x0 >         bne     .L2 >         ret >         .cfi_endproc > .LFE0: > > After patch: > > test_loop: > .LFB0: >         .cfi_startproc >         adrp    x3, g_s1 >         adrp    x2, g_s2 >         add     x3, x3, :lo12:g_s1 >         add     x2, x2, :lo12:g_s2 >         adrp    x0, g_d >         add     x0, x0, :lo12:g_d >         add     x1, x0, 2048 >         ldr     q1, [x2] >         ldr     q0, [x3] >         add     v0.16b, v0.16b, v1.16b >         .p2align 3,,7 > .L2: >         str     q0, [x0], 16 >         cmp     x0, x1 >         bne     .L2 >         ret >         .cfi_endproc > .LFE0: > > > 2019-10-11  Joel Hutton  Joel.Hutton@arm.com > >     * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector > constructors. >     (vect_bb_slp_scalar_cost): Likewise. >     (vect_ssa_use_outside_bb): New function. >     (vect_slp_check_for_constructors): New function. >     (vect_slp_analyze_bb_1): Add check for vector constructors. >     (vect_schedule_slp_instance): Add case to fixup vector constructor > stmt. >     * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field. > > > gcc/testsuite/ChangeLog: > > 2019-10-11  Joel Hutton  Joel.Hutton@arm.com > >     * gcc.dg/vect/bb-slp-40.c: New test. > > bootstrapped and regression tested on aarch64-none-linux-gnu > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) ---1609908220-274247742-1571141498=:5566--