From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sourceware.org (Postfix) with ESMTPS id 7BA55384403C for ; Fri, 7 Aug 2020 12:15:33 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 7BA55384403C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=inria.fr Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=marc.glisse@inria.fr X-IronPort-AV: E=Sophos;i="5.75,445,1589234400"; d="scan'208";a="356131267" Received: from grove.saclay.inria.fr ([193.55.177.244]) by mail3-relais-sop.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 07 Aug 2020 14:15:32 +0200 Date: Fri, 7 Aug 2020 14:15:28 +0200 (CEST) From: Marc Glisse X-X-Sender: glisse@grove.saclay.inria.fr To: Richard Biener cc: Christophe Lyon , GCC Patches Subject: Re: VEC_COND_EXPR optimizations v2 In-Reply-To: Message-ID: References: User-Agent: Alpine 2.02 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="8323329-1907453594-1596802532=:596504" X-Spam-Status: No, score=-8.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_NUMSUBJECT, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Aug 2020 12:15:35 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --8323329-1907453594-1596802532=:596504 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII On Fri, 7 Aug 2020, Richard Biener wrote: > On Fri, Aug 7, 2020 at 10:33 AM Marc Glisse wrote: >> >> On Fri, 7 Aug 2020, Richard Biener wrote: >> >>> On Thu, Aug 6, 2020 at 8:07 PM Marc Glisse wrote: >>>> >>>> On Thu, 6 Aug 2020, Christophe Lyon wrote: >>>> >>>>>> Was I on the right track configuring with >>>>>> --target=arm-none-linux-gnueabihf --with-cpu=cortex-a9 >>>>>> --with-fpu=neon-fp16 >>>>>> then compiling without any special option? >>>>> >>>>> Maybe you also need --with-float=hard, I don't remember if it's >>>>> implied by the 'hf' target suffix >>>> >>>> Thanks! That's what I was missing to reproduce the issue. Now I can >>>> reproduce it with just >>>> >>>> typedef unsigned int vec __attribute__((vector_size(16))); >>>> typedef int vi __attribute__((vector_size(16))); >>>> vi f(vec a,vec b){ >>>> return a==5 | b==7; >>>> } >>>> >>>> with -fdisable-tree-forwprop1 -fdisable-tree-forwprop2 -fdisable-tree-forwprop3 -O1 >>>> >>>> _1 = a_5(D) == { 5, 5, 5, 5 }; >>>> _3 = b_6(D) == { 7, 7, 7, 7 }; >>>> _9 = _1 | _3; >>>> _7 = .VCOND (_9, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 107); >>>> >>>> we fail to expand the equality comparison (expand_vec_cmp_expr_p returns >>>> false), while with -fdisable-tree-forwprop4 we do manage to expand >>>> >>>> _2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112); >>>> >>>> It doesn't make much sense to me that we can expand the more complicated >>>> form and not the simpler form of the same operation (both compare a to 5 >>>> and produce a vector of -1 or 0 of the same size), especially when the >>>> target has an instruction (vceq) that does just what we want. >>>> >>>> Introducing boolean vectors was fine, but I think they should be real >>>> types, that we can operate on, not be forced to appear only as the first >>>> argument of a vcond. >>>> >>>> I can think of 2 natural ways to improve things: either implement vector >>>> comparisons in the ARM backend (possibly by forwarding to their existing >>>> code for vcond), or in the generic expansion code try using vcond if the >>>> direct comparison opcode is not provided. >>>> >>>> We can temporarily revert my patch, but I would like it to be temporary. >>>> Since aarch64 seems to handle the same code just fine, maybe someone who >>>> knows arm could copy the relevant code over? >>>> >>>> Does my message make sense, do people have comments? >>> >>> So what complicates things now (and to some extent pre-existed when you >>> used AVX512 which _could_ operate on boolean vectors) is that we >>> have split out the condition from VEC_COND_EXPR to separate stmts >>> but we do not expect backends to be able to code-generate the separate >>> form - instead we rely on the ISEL pass to trasform VEC_COND_EXPRs >>> to .VCOND[U] "merging" the compares again. Now that process breaks >>> down once we have things like _9 = _1 | _3; - at some point I argued >>> that we should handle vector compares [and operations on boolean vectors] >>> as well in ISEL but then when it came up again for some reason I >>> disregarded that again. >>> >>> Thus - we don't want to go back to fixing up the generic expansion code >>> (which looks at one instruction at a time and is restricted by TER single-use >>> restrictions). >> >> Here, it would only be handling comparisons left over by ISEL because they >> do not feed a VEC_COND_EXPR, maybe not so bad. Handling it in ISEL would >> be more consistent, but then we may have to teach the vector lowering pass >> about this. >> >>> Instead we want to deal with this in ISEL which should >>> behave more intelligently. In the above case it might involve turning >>> the _1 and _3 defs into .VCOND [with different result type], doing >>> _9 in that type and then somehow dealing with _7 ... but this eventually >>> means undoing the match simplification that introduced the code? >> >> For targets that do not have compact boolean vectors, >> VEC_COND_EXPR<*,-1,0> does nothing, it is just there as a technicality. >> Removing it to allow more simplifications makes sense, reintroducing it >> for expansion can make sense as well, I think it is ok if the second one >> reverses the first, but very locally, without having to propagate a change >> of type to the uses. So on ARM we could turn _1 and _3 into .VCOND >> producing bool:32 vectors, and replace _7 with a VIEW_CONVERT_EXPR. Or >> does using bool vectors like that seem bad? (maybe they aren't guaranteed >> to be equivalent to signed integer vectors with values 0 and -1 and we >> need to query the target to know if that is the case, or introduce an >> extra .VCOND) >> >> Also, we only want to replace comparisons with .VCOND if the target >> doesn't handle the comparison directly. For AVX512, we do want to produce >> SImode bool vectors (for k* registers) and operate on them directly, >> that's the whole point of introducing the bool vectors (if bool vectors >> were only used to feed VEC_COND_EXPR and were all turned into .VCOND >> before expansion, I don't see the point of specifying different types for >> bool vectors for AVX512 vs non-AVX512, as it would make no difference on >> what is passed to the backend). >> >> I think targets should provide some number of operations, including for >> instance bit_and and bit_ior on bool vectors, and be a bit consistent >> about what they provide, it becomes unmanageable in the middle-end >> otherwise... > > It's currently somewhat of a mess I guess. It might help to > restrict the simplification patterns to passes before vector lowering > so maybe add something like (or re-use) > canonicalize_math[_after_vectorization]_p ()? That's indeed a nice way to gain time to sort out the mess :-) This patch solved the issue on ARM for at least 1 testcase. I have a bootstrap+regtest running on x86_64-linux-gnu, at least the tests added in the previous patch still work. 2020-08-05 Marc Glisse * generic-match-head.c (optimize_vectors_before_lowering_p): New function. * gimple-match-head.c (optimize_vectors_before_lowering_p): Likewise. * match.pd ((v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Use it. -- Marc Glisse --8323329-1907453594-1596802532=:596504 Content-Type: TEXT/x-diff; name=lvec.patch Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: inline; filename=lvec.patch ZGlmZiAtLWdpdCBhL2djYy9nZW5lcmljLW1hdGNoLWhlYWQuYyBiL2djYy9n ZW5lcmljLW1hdGNoLWhlYWQuYw0KaW5kZXggMjQ1NGJhYWM5ZDQuLmZkYjUy OGQ5Njg2IDEwMDY0NA0KLS0tIGEvZ2NjL2dlbmVyaWMtbWF0Y2gtaGVhZC5j DQorKysgYi9nY2MvZ2VuZXJpYy1tYXRjaC1oZWFkLmMNCkBAIC04MCw2ICs4 MCwxNiBAQCBjYW5vbmljYWxpemVfbWF0aF9hZnRlcl92ZWN0b3JpemF0aW9u X3AgKCkNCiAgIHJldHVybiBmYWxzZTsNCiB9DQogDQorLyogUmV0dXJuIHRy dWUgaWYgd2UgY2FuIHN0aWxsIHBlcmZvcm0gdHJhbnNmb3JtYXRpb25zIHRo YXQgbWF5IGludHJvZHVjZQ0KKyAgIHZlY3RvciBvcGVyYXRpb25zIHRoYXQg YXJlIG5vdCBzdXBwb3J0ZWQgYnkgdGhlIHRhcmdldC4gVmVjdG9yIGxvd2Vy aW5nDQorICAgbm9ybWFsbHkgaGFuZGxlcyB0aG9zZSwgYnV0IGFmdGVyIHRo YXQgcGFzcywgaXQgYmVjb21lcyB1bnNhZmUuICAqLw0KKw0KK3N0YXRpYyBp bmxpbmUgYm9vbA0KK29wdGltaXplX3ZlY3RvcnNfYmVmb3JlX2xvd2VyaW5n X3AgKCkNCit7DQorICByZXR1cm4gdHJ1ZTsNCit9DQorDQogLyogUmV0dXJu IHRydWUgaWYgc3VjY2Vzc2l2ZSBkaXZpc2lvbnMgY2FuIGJlIG9wdGltaXpl ZC4NCiAgICBEZWZlciB0byBHSU1QTEUgb3B0cy4gICovDQogDQpkaWZmIC0t Z2l0IGEvZ2NjL2dpbXBsZS1tYXRjaC1oZWFkLmMgYi9nY2MvZ2ltcGxlLW1h dGNoLWhlYWQuYw0KaW5kZXggOWIzZTcyOThkODcuLjRhNjViZTcwM2I5IDEw MDY0NA0KLS0tIGEvZ2NjL2dpbXBsZS1tYXRjaC1oZWFkLmMNCisrKyBiL2dj Yy9naW1wbGUtbWF0Y2gtaGVhZC5jDQpAQCAtMTE1OCw2ICsxMTU4LDE2IEBA IGNhbm9uaWNhbGl6ZV9tYXRoX2FmdGVyX3ZlY3Rvcml6YXRpb25fcCAoKQ0K ICAgcmV0dXJuICFjZnVuIHx8IChjZnVuLT5jdXJyX3Byb3BlcnRpZXMgJiBQ Uk9QX2dpbXBsZV9sdmVjKSAhPSAwOw0KIH0NCiANCisvKiBSZXR1cm4gdHJ1 ZSBpZiB3ZSBjYW4gc3RpbGwgcGVyZm9ybSB0cmFuc2Zvcm1hdGlvbnMgdGhh dCBtYXkgaW50cm9kdWNlDQorICAgdmVjdG9yIG9wZXJhdGlvbnMgdGhhdCBh cmUgbm90IHN1cHBvcnRlZCBieSB0aGUgdGFyZ2V0LiBWZWN0b3IgbG93ZXJp bmcNCisgICBub3JtYWxseSBoYW5kbGVzIHRob3NlLCBidXQgYWZ0ZXIgdGhh dCBwYXNzLCBpdCBiZWNvbWVzIHVuc2FmZS4gICovDQorDQorc3RhdGljIGlu bGluZSBib29sDQorb3B0aW1pemVfdmVjdG9yc19iZWZvcmVfbG93ZXJpbmdf cCAoKQ0KK3sNCisgIHJldHVybiAhY2Z1biB8fCAoY2Z1bi0+Y3Vycl9wcm9w ZXJ0aWVzICYgUFJPUF9naW1wbGVfbHZlYykgPT0gMDsNCit9DQorDQogLyog UmV0dXJuIHRydWUgaWYgcG93KGNzdCwgeCkgc2hvdWxkIGJlIG9wdGltaXpl ZCBpbnRvIGV4cChsb2coY3N0KSAqIHgpLg0KICAgIEFzIGEgd29ya2Fyb3Vu ZCBmb3IgU1BFQyBDUFUyMDE3IDYyOC5wb3AyX3MsIGRvbid0IGRvIGl0IGlm IGFyZzANCiAgICBpcyBhbiBleGFjdCBpbnRlZ2VyLCBhcmcxID0gcGhpX3Jl cyArLy0gY3N0MSBhbmQgcGhpX3JlcyA9IFBISSA8Y3N0MiwgLi4uPg0KZGlm ZiAtLWdpdCBhL2djYy9tYXRjaC5wZCBiL2djYy9tYXRjaC5wZA0KaW5kZXgg ZDhlMzkyN2QzYzcuLjdlNWM1YTZlYWU2IDEwMDY0NA0KLS0tIGEvZ2NjL21h dGNoLnBkDQorKysgYi9nY2MvbWF0Y2gucGQNCkBAIC0zNDYxLDQwICszNDYx LDQyIEBAIERFRklORV9JTlRfQU5EX0ZMT0FUX1JPVU5EX0ZOIChSSU5UKQ0K ICAgKHZlY19jb25kIEAwIChvcCEgQDMgQDEpIChvcCEgQDMgQDIpKSkpDQog I2VuZGlmDQogDQotLyogKHYgPyB3IDogMCkgPyBhIDogYiBpcyBqdXN0ICh2 ICYgdykgPyBhIDogYiAgKi8NCisvKiAodiA/IHcgOiAwKSA/IGEgOiBiIGlz IGp1c3QgKHYgJiB3KSA/IGEgOiBiDQorICAgQ3VycmVudGx5IGRpc2FibGVk IGFmdGVyIHBhc3MgbHZlYyBiZWNhdXNlIEFSTSB1bmRlcnN0YW5kcw0KKyAg IFZFQ19DT05EX0VYUFI8dj09dywtMSwwPiBidXQgbm90IGEgcGxhaW4gdj09 dyBmZWQgdG8gQklUX0lPUl9FWFBSLiAgKi8NCiAoc2ltcGxpZnkNCiAgKHZl Y19jb25kICh2ZWNfY29uZDpzIEAwIEAzIGludGVnZXJfemVyb3ApIEAxIEAy KQ0KLSAoaWYgKHR5cGVzX21hdGNoIChAMCwgQDMpKQ0KKyAoaWYgKG9wdGlt aXplX3ZlY3RvcnNfYmVmb3JlX2xvd2VyaW5nX3AgKCkgJiYgdHlwZXNfbWF0 Y2ggKEAwLCBAMykpDQogICAodmVjX2NvbmQgKGJpdF9hbmQgQDAgQDMpIEAx IEAyKSkpDQogKHNpbXBsaWZ5DQogICh2ZWNfY29uZCAodmVjX2NvbmQ6cyBA MCBpbnRlZ2VyX2FsbF9vbmVzcCBAMykgQDEgQDIpDQotIChpZiAodHlwZXNf bWF0Y2ggKEAwLCBAMykpDQorIChpZiAob3B0aW1pemVfdmVjdG9yc19iZWZv cmVfbG93ZXJpbmdfcCAoKSAmJiB0eXBlc19tYXRjaCAoQDAsIEAzKSkNCiAg ICh2ZWNfY29uZCAoYml0X2lvciBAMCBAMykgQDEgQDIpKSkNCiAoc2ltcGxp ZnkNCiAgKHZlY19jb25kICh2ZWNfY29uZDpzIEAwIGludGVnZXJfemVyb3Ag QDMpIEAxIEAyKQ0KLSAoaWYgKHR5cGVzX21hdGNoIChAMCwgQDMpKQ0KKyAo aWYgKG9wdGltaXplX3ZlY3RvcnNfYmVmb3JlX2xvd2VyaW5nX3AgKCkgJiYg dHlwZXNfbWF0Y2ggKEAwLCBAMykpDQogICAodmVjX2NvbmQgKGJpdF9pb3Ig QDAgKGJpdF9ub3QgQDMpKSBAMiBAMSkpKQ0KIChzaW1wbGlmeQ0KICAodmVj X2NvbmQgKHZlY19jb25kOnMgQDAgQDMgaW50ZWdlcl9hbGxfb25lc3ApIEAx IEAyKQ0KLSAoaWYgKHR5cGVzX21hdGNoIChAMCwgQDMpKQ0KKyAoaWYgKG9w dGltaXplX3ZlY3RvcnNfYmVmb3JlX2xvd2VyaW5nX3AgKCkgJiYgdHlwZXNf bWF0Y2ggKEAwLCBAMykpDQogICAodmVjX2NvbmQgKGJpdF9hbmQgQDAgKGJp dF9ub3QgQDMpKSBAMiBAMSkpKQ0KIA0KIC8qIGMxID8gYzIgPyBhIDogYiA6 IGIgIC0tPiAgKGMxICYgYzIpID8gYSA6IGIgICovDQogKHNpbXBsaWZ5DQog ICh2ZWNfY29uZCBAMCAodmVjX2NvbmQ6cyBAMSBAMiBAMykgQDMpDQotIChp ZiAodHlwZXNfbWF0Y2ggKEAwLCBAMSkpDQorIChpZiAob3B0aW1pemVfdmVj dG9yc19iZWZvcmVfbG93ZXJpbmdfcCAoKSAmJiB0eXBlc19tYXRjaCAoQDAs IEAxKSkNCiAgICh2ZWNfY29uZCAoYml0X2FuZCBAMCBAMSkgQDIgQDMpKSkN CiAoc2ltcGxpZnkNCiAgKHZlY19jb25kIEAwIEAyICh2ZWNfY29uZDpzIEAx IEAyIEAzKSkNCi0gKGlmICh0eXBlc19tYXRjaCAoQDAsIEAxKSkNCisgKGlm IChvcHRpbWl6ZV92ZWN0b3JzX2JlZm9yZV9sb3dlcmluZ19wICgpICYmIHR5 cGVzX21hdGNoIChAMCwgQDEpKQ0KICAgKHZlY19jb25kIChiaXRfaW9yIEAw IEAxKSBAMiBAMykpKQ0KIChzaW1wbGlmeQ0KICAodmVjX2NvbmQgQDAgKHZl Y19jb25kOnMgQDEgQDIgQDMpIEAyKQ0KLSAoaWYgKHR5cGVzX21hdGNoIChA MCwgQDEpKQ0KKyAoaWYgKG9wdGltaXplX3ZlY3RvcnNfYmVmb3JlX2xvd2Vy aW5nX3AgKCkgJiYgdHlwZXNfbWF0Y2ggKEAwLCBAMSkpDQogICAodmVjX2Nv bmQgKGJpdF9pb3IgKGJpdF9ub3QgQDApIEAxKSBAMiBAMykpKQ0KIChzaW1w bGlmeQ0KICAodmVjX2NvbmQgQDAgQDMgKHZlY19jb25kOnMgQDEgQDIgQDMp KQ0KLSAoaWYgKHR5cGVzX21hdGNoIChAMCwgQDEpKQ0KKyAoaWYgKG9wdGlt aXplX3ZlY3RvcnNfYmVmb3JlX2xvd2VyaW5nX3AgKCkgJiYgdHlwZXNfbWF0 Y2ggKEAwLCBAMSkpDQogICAodmVjX2NvbmQgKGJpdF9hbmQgKGJpdF9ub3Qg QDApIEAxKSBAMiBAMykpKQ0KIA0KIC8qIFNpbXBsaWZpY2F0aW9uIG1vdmVk IGZyb20gZm9sZF9jb25kX2V4cHJfd2l0aF9jb21wYXJpc29uLiAgSXQgbWF5 IGFsc28NCg== --8323329-1907453594-1596802532=:596504--