From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x642.google.com (mail-ej1-x642.google.com [IPv6:2a00:1450:4864:20::642]) by sourceware.org (Postfix) with ESMTPS id 3B524385782D; Tue, 15 Sep 2020 06:51:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 3B524385782D Received: by mail-ej1-x642.google.com with SMTP id q13so3370773ejo.9; Mon, 14 Sep 2020 23:51:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=DS3qjBDrS0fcCe+BV1HLZR5DTkeYlrS8zje0cMy2TeE=; b=QJES5cul/jXbJYNo9Srv/7FQ7Eld7nIB1FKmHfNF+9M9fbujWCAAXLsEhPreWMSAUC 0NSuIsh7Itf9jyuQMVLm6IWteuhalMCWXC3cqUqMubNxb7PRoV7p5xAnT6DRHn6ls6AM pk0mHI44XmqU1+rOymZUiTlqAuwb9/nJWbhcewlckoSegTTdym7JtDtwExAisHXGq0fX ovYMfuUQZhy+w4hr6ipOly6ya3dqdAhKyxpXlS9Pzj9buHR+UoS3Xo967ABchCvfdfWp piquT6csB+Klpcke4IrYuWI+GgRXmlagXMDQQK8PH1Wbec1ilH9KHmoBwUyO7wR87Enz NtEA== X-Gm-Message-State: AOAM531kCWz0HHKmIVBIW952XCPSueDdbryRzq0pqdGnGPZWf+f4rHDX redQL2r3yycW+o+NVFKYs4mFH8LzqSj6gagj0Zs= X-Google-Smtp-Source: ABdhPJyU7hCqpUGJUgaLGk/zm5bU+GfDllqMVUnQetZVyKIz+UiiBE97ISgPuivnwC8NZc3HkeiRb8LotpVxaNwRkSY= X-Received: by 2002:a17:906:91d3:: with SMTP id b19mr19075898ejx.235.1600152680260; Mon, 14 Sep 2020 23:51:20 -0700 (PDT) MIME-Version: 1.0 References: <20200904102357.GF28786@gate.crashing.org> <98b124ee-b32d-71f7-a662-e0ce2520de6a@linux.ibm.com> <20200909134739.GX28786@gate.crashing.org> <20200909160057.GZ28786@gate.crashing.org> <572b36a7-2d52-2f46-05bd-140e16794a61@linux.ibm.com> <4013a2e0-2fe2-b626-31ae-f7325a9be678@linux.ibm.com> In-Reply-To: <4013a2e0-2fe2-b626-31ae-f7325a9be678@linux.ibm.com> From: Richard Biener Date: Tue, 15 Sep 2020 08:51:09 +0200 Message-ID: Subject: Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251] To: luoxhu Cc: Segher Boessenkool , GCC Patches , David Edelsohn , Bill Schmidt , linkw@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Sep 2020 06:51:23 -0000 On Tue, Sep 15, 2020 at 5:56 AM luoxhu wrote: > > > > On 2020/9/14 17:47, Richard Biener wrote: > > On Mon, Sep 14, 2020 at 10:05 AM luoxhu wrote: > > >> Not sure whether this reflects the issues you discussed above. > >> I constructed below test cases and tested with and without this patch, > >> only if "a+c"(which means store only), the performance is getting bad with > >> this patch due to extra load/store(about 50% slower). > >> For "v = vec_insert (i, v, n);" usage, v is always loaded after store, so this > >> patch will always get big benefit. > >> But for "v[n % 4] = i;" usage, it depends on whether "v" is used immediately > >> inside the function or out of the function soon. Does this mean unify the two > >> usage to same gimple code not a good idea sometimes? Or is it possible to > >> detect the generated IFN ".VEC_INSERT (&v, i_4(D), _1);" destination "v" is > >> used not far away inside or out side of the function? > >> > >> > >> #define TYPE int > >> > >> vector TYPE v = {1, 2, 3, 4}; // a. global vector. > >> vector TYPE s = {4, 3, 2, 1}; > >> > >> __attribute__ ((noinline)) > >> vector TYPE > >> test (vector TYPE u, TYPE i, size_t n) > >> { > >> v[n % 4] = i; > > > > ^^^ > > > > this should be > > > > u[n % 4] = i; > > > > I guess. Is the % 4 mandated by the vec_insert semantics btw? > > Yes. Segher pasted the builtin description in his reply. "v = vec_insert (i, u, n);" > is a bit different with "u[n % 4] = i;" since it returns a copy of u instead of modify > u. The adjust in lower __builtin_vec_insert_xxx gimple will make a copy of u first to > meet the requirements. > > > > > If you tested with the above error you probably need to re-measure. > > No I did test for u as local instead of global before. If u is not used very > soon, the performance is almost the same for generating single store or IFN_SET > of inserting with variable index. > > source: > __attribute__ ((noinline)) vector TYPE > test (vector TYPE u, TYPE i, size_t n) > { > u[n % 4] = i; > vector TYPE r = {0}; > for (long k = 0; k < 100; k++) > { > r += v; > } > return u+r; > } > > => store hit load is relieved due to long distance. > > ASM: > 0: addis 2,12,.TOC.-.LCF0@ha > addi 2,2,.TOC.-.LCF0@l > .localentry test,.-test > addis 10,2,.LANCHOR0@toc@ha > li 8,50 > xxspltib 32,0 > addi 9,1,-16 > rldic 6,6,2,60 > stxv 34,-16(1) > addi 10,10,.LANCHOR0@toc@l > mtctr 8 > xxlor 33,32,32 > stwx 5,9,6 // short store > lxv 45,0(10) > .p2align 4,,15 > .L2: > vadduwm 0,0,13 > vadduwm 1,1,13 > bdnz .L2 > vadduwm 0,0,1 > lxv 33,-16(1) // wide load > vadduwm 2,0,1 > blr > > > Then I intend to use "v" there for global memory insert test to separate > the store and load into different function ("v" is stored in function, > but loaded out of function will get different performance for single store > or lxv+xxperm+xxsel+stxv, the inner function doesn't know the distance > between load and store across functions). > > Do you mean that we don't need generate IFN_SET if "v" is global memory? Yes. > I only see VAR_DECL and PARM_DECL, is there any function to check the tree > variable is global? I added DECL_REGISTER, but the RTL still expands to stack: is_global_var () or alternatively !auto_var_in_fn_p (), I think doing IFN_SET only makes sense if there's the chance we can promote the variable to a register. But it would be an incorrect transform (it stores the whole vector) if the vector storage could "escape" to another thread - which means you probably have to check !TREE_ADDRESSABLE as well. > > gcc/internal-fn.c: rtx to_rtx = expand_expr (view_op0, NULL_RTX, VOIDmode, EXPAND_WRITE); > > (gdb) p view_op0 > $584 = > (gdb) p DECL_REGISTER(view_op0) > $585 = 1 > (gdb) p to_rtx > $586 = (rtx_def *) (mem/c:V4SI (reg/f:DI 112 virtual-stack-vars) [1 D.3190+0 S16 A128]) > > > > > > On gimple the above function (after fixing it) looks like > > > > VIEW_CONVERT_EXPR(u)[_1] = i_4(D); > > > > and the IFN idea I had would - for non-global memory 'u' only - transform > > this to > > > > vector_register_2 = u; > > vector_register_3 = .IFN_VEC_SET (vector_register_2, _1, i_4(D)); > > u = vector_register_3; > > > > if vec_set can handle variable indexes. This then becomes a > > vec_set on a register and if that was the only variable indexing of 'u' > > will also cause 'u' to be expanded as register rather than stack memory. > > > > Note we can't use the direct-optab method here since the vec_set optab > > modifies operand 0 which isn't possible in SSA form. That might hint > > at that we eventually want to extend BIT_INSERT_EXPR to handle > > a non-constant bit position but for experiments using an alternate > > internal function is certainly easier. > > > > My current implementation does: > > 1) v = vec_insert (i, u, n); > > =>gimple: > { > register __vector signed int D.3190; > D.3190 = u; // *new decl and copy u first.* > _1 = n & 3; > VIEW_CONVERT_EXPR(D.3190)[_1] = i; // *update op0 of VIEW_CONVERT_EXPR* > _2 = D.3190; > ... > } > > =>isel: > { > register __vector signed int D.3190; > D.3190 = u_4(D); > _1 = n_6(D) & 3; > .VEC_SET (&D.3190, i_7(D), _1); why are you passing the address of D.3190 to .VEC_SET? That will not make D.3190 be expanded to a pseudo. You really need to have GIMPLE registers here (SSA name) and thus a return value, leaving the argument unmodified D.3190_3 = .VEC_SET (D.3190_4, i_7(D), _1); note this is why I asked about the actual CPU instruction - as I read Seghers mail the instruction modifies a vector register, not memory. > _2 = D.3190; > ... > } > > > 2) u[n % 4] = i; > > =>gimple: > { > __vector signed int D.3191; > _1 = n & 3; > VIEW_CONVERT_EXPR(u)[_1] = i; // *update op0 of VIEW_CONVERT_EXPR* > D.3191 = u; > ... > } > > =>isel: > { > D.3190 = u_4(D); > _1 = n_6(D) & 3; > .VEC_SET (&D.3191, i_7(D), _1); > _2 = D.3190; > v = _2; > } > > The IFN ".VEC_SET" behavior quite like the other IFN STORE functions and doesn't > require a dest operand to be set? Both 1) and 2) are modifying operand 0 of > VIEW_CONVERT_EXPR just like vec_set optab. > > Attached the IFN vec_set patch part, the expand part is moved from expr.c:expand_assignment > to internal-fn.c:expand_vec_set_optab_fn now. > > > Thanks, > Xionghu