Date: Wed, 27 Jul 2022 11:37:53 +0000 (UTC)
From: Richard Biener
To: "Andre Vieira (lists)"
Cc: gcc-patches@gcc.gnu.org, Richard Sandiford, pinskia@gmail.com
Subject: Re: [RFC] Teach vectorizer to deal with bitfield reads
In-Reply-To: <4a6f2350-f070-1473-63a5-3232968d3a89@arm.com>

On Tue, 26 Jul 2022, Andre Vieira (lists) wrote:

> Hi,
>
> This is an RFC for my prototype for bit-field read vectorization.
> This patch enables bit-field read vectorization by removing the
> rejection of bit-field reads during DR analysis and by adding two
> vect patterns.
> The first one transforms TREE_COMPONENTs with BIT_FIELD_DECLs into
> BIT_FIELD_REFs; this is a temporary one, as I believe there are plans
> to do this lowering earlier in the compiler. To avoid having to wait
> for that to happen, we decided to implement this temporary vect
> pattern.
> The second one looks for conversions of the result of BIT_FIELD_REFs
> from a 'partial' type to a 'full' type and transforms them into a
> 'full-type' load followed by the necessary shifting and masking.
>
> The patch is not perfect; one thing I'd like to change, for instance,
> is the way the 'full-type' load is represented. I currently abuse the
> fact that the vectorizer transforms the original TREE_COMPONENT with
> a BIT_FIELD_DECL into a full-type vector load, because it uses the
> smallest mode necessary for that precision. The reason I do this is
> that I was struggling to construct a MEM_REF that the vectorizer
> would accept, and this for some reason seemed to work ... I'd
> appreciate some pointers on how to do this properly :)
>
> Another aspect that I haven't paid much attention to yet is costing;
> I've noticed some testcases fail to vectorize due to costing where I
> think the costs might be wrong, but like I said, I haven't paid much
> attention to it.
>
> Finally, another aspect I'd eventually like to tackle is the sinking
> of the masking when possible; for instance, in bit-field-read-3.c the
> 'masking' does not need to be inside the loop because we are doing
> bitwise operations. Though I suspect we are better off looking at
> things like this elsewhere, maybe where we do codegen for the
> reduction ... I haven't looked at this at all yet.
>
> Let me know whether you believe this is a good approach. I've run
> regression tests and this hasn't broken anything so far ...

I don't think this is a good approach for what you gain and for how
necessarily limited it will be.
Similar to the recent experiment with handling _Complex loads/stores,
this is much better tackled by lowering things earlier (it will be
lowered at RTL expansion time anyway).  One place to do this
experimentation would be to piggy-back on the if-conversion pass so
the lowering would happen only on the vectorized code path.  Note
that the desired lowering would look like the following for reads:

  _1 = a.b;

to

  _2 = a.<representative>;
  _1 = BIT_FIELD_REF <_2, ...>; // extract bits

and for writes:

  a.b = _1;

to

  _2 = a.<representative>;
  _3 = BIT_INSERT_EXPR <_2, _1, ...>; // insert bits
  a.<representative> = _3;

so you trade the now handled loads/stores for the not handled
BIT_FIELD_REF / BIT_INSERT_EXPR, which you would then need to pattern
match to shifts and logical ops in the vectorizer.

There's a separate issue of actually promoting all uses.  For example,

  struct { long long x : 33; } a;

  a.x = a.x + 1;

will get you 33-bit precision adds (bit fields narrower than 32 bits
get promoted to int, but larger bit fields do not).  RTL expansion
again will rewrite this into larger ops plus masking.

So I think the time is better spent working on the lowering of
bit-field accesses; if sufficiently separated, it could be used from
if-conversion by working on loop SEME regions.

The patches doing previous implementations are probably not too
useful anymore (I found one from 2011 and one from 2016, both
pre-dating BIT_INSERT_EXPR).

Richard.