From: "Andre Vieira (lists)"
Date: Thu, 8 Sep 2022 10:07:47 +0100
Subject: Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
To: Richard Biener
Cc: Jakub Jelinek, Richard Sandiford, gcc-patches@gcc.gnu.org

Ping.

On 25/08/2022 10:09, Andre Vieira (lists) via Gcc-patches wrote:
>
> On 17/08/2022 13:49, Richard Biener wrote:
>> Yes, of course.  What you need to do is subtract DECL_FIELD_BIT_OFFSET
>> of the representative from DECL_FIELD_BIT_OFFSET of the original
>> bitfield access - that's the offset within the representative (by
>> construction both fields share DECL_FIELD_OFFSET).
> Doh! That makes sense...
>>> So instead I change bitpos such that:
>>> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
>>> bitpos -= bitpos.to_constant () / align_of_representative
>>>           * align_of_representative;
>> ?  Not sure why alignment comes into play here?
> Yeah, just forget about this... it was my ill attempt at basically
> doing what you described above.
>> Not sure what you are saying but "yes", all shifting and masking should
>> happen in the type of the representative.
>>
>> +  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
>>
>> for your convenience there's bitsize_int (bitpos) you can use.
>>
>> I don't think you are using the correct bitpos though, you fail to
>> adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR.
> Not sure I understand what you mean? I do adjust it; I've changed it
> now so it should hopefully be clearer.
>>
>> +                        build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
>>
>> the size of the bitfield reference is DECL_SIZE of the original
>> FIELD_DECL - it might be bigger than the precision of its type.
>> You probably want to double-check it's equal to the precision
>> (because of the insert but also because of all the masking) and
>> refuse to lower if not.
> I added a check for this but, out of curiosity, how can the DECL_SIZE
> of a bitfield FIELD_DECL be different from its type's precision?
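> The only cases I could come up with myself are C++ ones, something
> like the following (my own guess at what you mean, not something from
> the patch):
>
>   struct A { bool a : 8; };  /* DECL_SIZE is 8 bits, but
>                                 TYPE_PRECISION (bool) is 1.  */
>   struct B { int b : 40; };  /* Over-wide bitfields are allowed in
>                                 C++; the excess bits are padding.  */
>
> Is that the sort of case the check is guarding against?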
>>
>> +/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
>> +   with data structures representing these bitfields.  */
>> +
>> +static bool
>> +bitfields_to_lower_p (class loop *loop,
>> +                      vec &reads_to_lower,
>> +                      vec &writes_to_lower)
>> +{
>> +  basic_block *bbs = get_loop_body (loop);
>> +  gimple_stmt_iterator gsi;
>>
>> as said I'd prefer to do this walk as part of the other walks we
>> already do - if only because get_loop_body () is a DFS
>> walk over the loop body (you should at least share that).
> I'm now sharing the use of ifc_bbs. The reason why I'd rather not
> share the walk over them is that it becomes quite complex to separate
> the decision not to lower ifs because there are none (in which case
> we still want to lower bitfields) from the decision not to lower ifs
> when they are present but aren't lowerable (at which point we forgo
> lowering bitfields, since we will not vectorize this loop anyway).
>>
>> +      value = fold_build1 (NOP_EXPR, load_type, value);
>>
>> fold_convert (load_type, value)
>>
>> +      if (!CONSTANT_CLASS_P (value))
>> +       {
>> +         pattern_stmt
>> +           = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
>> +                                  value);
>> +         value = gimple_get_lhs (pattern_stmt);
>>
>> there's in principle
>>
>>       gimple_seq stmts = NULL;
>>       value = gimple_convert (&stmts, load_type, value);
>>       if (!gimple_seq_empty_p (stmts))
>>         {
>>           pattern_stmt = gimple_seq_first_stmt (stmts);
>>           append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
>>         }
>>
>> though an append_pattern_def_seq helper to add a convenience sequence
>> would be nice to have here.
> Ended up using the existing 'vect_convert_input'; it seems to do
> nicely here.
>> You probably want to double-check your lowering code by
>> bootstrapping / testing with -ftree-loop-if-convert.
> Done. This led me to find a new failure mode, where the type of the
> first operand of BIT_FIELD_REF was an FP type (TF mode), which then
> led to failures when constructing the masking and shifting. I ended
> up adding a nop-conversion to an INTEGER type of the same width first
> where necessary. Also did a follow-up bootstrap with the addition of
> `-ftree-vectorize` and `-fno-vect-cost-model` to further test the
> codegen. All seems to be working on aarch64-linux-gnu.
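>
> For the record, the offset adjustment suggested at the top now boils
> down to something along these lines (a sketch rather than the exact
> hunk; variable names are approximate):
>
>   /* Offset of the bitfield within its representative, in bits;
>      both fields share DECL_FIELD_OFFSET by construction, so the
>      difference of the bit offsets is the position to shift by.  */
>   tree bf_pos = size_binop (MINUS_EXPR,
>                             DECL_FIELD_BIT_OFFSET (field_decl),
>                             DECL_FIELD_BIT_OFFSET (rep_decl));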
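>
> Finally, for reference, the kind of loop this patch is after looks
> something like this (paraphrasing the testcases rather than quoting
> one verbatim):
>
>   struct s { unsigned i : 31; };
>
>   unsigned
>   f (struct s *ptr, unsigned n)
>   {
>     unsigned res = 0;
>     for (unsigned i = 0; i < n; ++i)
>       res += ptr[i].i;  /* bitfield read to be lowered */
>     return res;
>   }
>
> where the bitfield read gets lowered to a load of the representative
> followed by the appropriate shifting and masking, after which the
> loop vectorizes as usual.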