From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c]) by sourceware.org (Postfix) with ESMTPS id 9038D385742C for ; Fri, 29 Jul 2022 09:11:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9038D385742C Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id B78CC34528; Fri, 29 Jul 2022 09:11:46 +0000 (UTC) Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id B02872C141; Fri, 29 Jul 2022 09:11:46 +0000 (UTC) Date: Fri, 29 Jul 2022 09:11:46 +0000 (UTC) From: Richard Biener To: "Andre Vieira (lists)" cc: "gcc-patches@gcc.gnu.org" , Richard Sandiford , pinskia@gmail.com Subject: Re: [RFC] Teach vectorizer to deal with bitfield reads In-Reply-To: <4ea91767-de43-9d30-5bd2-a9a0759760c7@arm.com> Message-ID: References: <4a6f2350-f070-1473-63a5-3232968d3a89@arm.com> <4ea91767-de43-9d30-5bd2-a9a0759760c7@arm.com> User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Jul 2022 09:11:50 -0000 On Fri, 29 Jul 2022, Andre Vieira (lists) wrote: > Hi Richard, > > Thanks for the review, I don't completely understand all of the below, so I > added some extra questions to help me understand :) > > On 27/07/2022 12:37, Richard Biener wrote: > > On Tue, 26 Jul 2022, Andre Vieira (lists) wrote: > > > > I don't think this is a good approach for what you gain and how > > necessarily limited it will be. Similar to the recent experiment with > > handling _Complex loads/stores this is much better tackled by lowering > > things earlier (it will be lowered at RTL expansion time). > I assume the approach you are referring to here is the lowering of the > BIT_FIELD_DECL to BIT_FIELD_REF in the vect_recog part of the vectorizer. I am > all for lowering earlier, the reason I did it there was as a 'temporary' > approach until we have that earlier loading. I understood, but "temporary" things in GCC tend to be still around 10 years later, so ... > > > > One place to do this experimentation would be to piggy-back on the > > if-conversion pass so the lowering would happen only on the > > vectorized code path. > This was one of my initial thoughts, though the if-conversion changes are a > bit more intrusive for a temporary approach and not that much earlier. It does > however have the added benefit of not having to make any changes to the > vectorizer itself later if we do do the earlier lowering, assuming the > lowering results in the same. > > The 'only on the vectorized code path' remains the same though as vect_recog > also only happens on the vectorized code path right? > > Note that the desired lowering would look like > > the following for reads: > > > > _1 = a.b; > > > > to > > > > _2 = a.; > > _1 = BIT_FIELD_REF <2, ...>; // extract bits > I don't yet have a well formed idea of what '' is > supposed to look like in terms of tree expressions. I understand what it's > supposed to be representing, the 'larger than bit-field'-load. But is it going > to be a COMPONENT_REF with a fake 'FIELD_DECL' with the larger size? Like I > said on IRC, the description of BIT_FIELD_REF makes it sound like this isn't > how we are supposed to use it, are we intending to make a change to that here? is what DECL_BIT_FIELD_REPRESENTATIVE (FIELD_DECL-for-b) gives you, it is a "fake" FIELD_DECL for the underlying storage, conveniently made available to you by the middle-end. For your 31bit field it would be simply 'int' typed. The BIT_FIELD_REF then extracts the actual bitfield from the underlying storage, but it's now no longer operating on memory but on the register holding the underlying data. To the vectorizer we'd probably have to pattern-match this to shifts & masks and hope for the conversion to combine with a later one. > > and for writes: > > > > a.b = _1; > > > > to > > > > _2 = a.; > > _3 = BIT_INSERT_EXPR <_2, _1, ...>; // insert bits > > a. = _3; > I was going to avoid writes for now because they are somewhat more > complicated, but maybe it's not that bad, I'll add them too. Only handling loads at start is probably fine as experiment, but handling stores should be straight forward - of course the BIT_INSERT_EXPR lowering to shifts & masks will be more complicated. > > so you trade now handled loads/stores with not handled > > BIT_FIELD_REF / BIT_INSERT_EXPR which you would then need to > > pattern match to shifts and logical ops in the vectorizer. > Yeah that vect_recog pattern already exists in my RFC patch, though I can > probably simplify it by moving the bit-field-ref stuff to ifcvt. > > > > There's a separate thing of actually promoting all uses, for > > example > > > > struct { long long x : 33; } a; > > > > a.a = a.a + 1; > > > > will get you 33bit precision adds (for bit fields less than 32bits > > they get promoted to int but not for larger bit fields). RTL > > expansion again will rewrite this into larger ops plus masking. > Not sure I understand why this is relevant here? The current way I am doing > this would likely lower a  bit-field like that to a 64-bit load  followed by > the masking away of the top 31 bits, same would happen with a ifcvt-lowering > approach. Yes, but if there's anything besides loading or storing you will have a conversion from, say int:31 to int in the IL before any arithmetic. I've not looked but your patch probably handles conversions to/from bitfield types by masking / extending. What I've mentioned with the 33bit example is that with that you can have arithmetic in 33 bits _without_ intermediate conversions, so you'd have to properly truncate after every such operation (or choose not to vectorize which I think is what would happen now). > > > > So I think the time is better spent in working on the lowering of > > bitfield accesses, if sufficiently separated it could be used > > from if-conversion by working on loop SEME regions. > I will start to look at modifying ifcvt to add the lowering there. Will likely > require two pass though because we can no longer look at the number of BBs to > determine whether ifcvt is even needed, so we will first need to look for > bit-field-decls, then version the loops and then look for them again for > transformation, but I guess that's fine? Ah, yeah - I guess that's fine. Just add an need_to_lower_bitfields along need_to_predicate and friends. Richard. > > The patches > > doing previous implementations are probably not too useful anymore > > (I find one from 2011 and one from 2016, both pre-dating BIT_INSERT_EXPR) > > > > Richard. > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)