On 17/08/2022 13:49, Richard Biener wrote:
> Yes, of course.  What you need to do is subtract DECL_FIELD_BIT_OFFSET
> of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield
> access - that's the offset within the representative (by construction
> both fields share DECL_FIELD_OFFSET).

Doh! That makes sense...

>> So instead I change bitpos such that:
>> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
>> bitpos -= bitpos.to_constant () / align_of_representative *
>> align_of_representative;
>
> ? Not sure why alignment comes into play here?

Yeah, just forget about this... it was my ill-fated attempt at basically
doing what you described above.

> Not sure what you are saying but "yes", all shifting and masking should
> happen in the type of the representative.
>
> +      tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
>
> for your convenience there's bitsize_int (bitpos) you can use.
>
> I don't think you are using the correct bitpos though, you fail to
> adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR.

Not sure I understand what you mean? I do adjust it; I've changed it now
so it should hopefully be clearer.

> +       build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
>
> the size of the bitfield reference is DECL_SIZE of the original
> FIELD_DECL - it might be bigger than the precision of its type.
> You probably want to double-check it's equal to the precision
> (because of the insert but also because of all the masking) and
> refuse to lower if not.

I added a check for this, but out of curiosity, how can the DECL_SIZE of
a bitfield FIELD_DECL be different from its type's precision?

> +/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
> +   with data structures representing these bitfields.  */
> +
> +static bool
> +bitfields_to_lower_p (class loop *loop,
> +                      vec <gassign *> &reads_to_lower,
> +                      vec <gassign *> &writes_to_lower)
> +{
> +  basic_block *bbs = get_loop_body (loop);
> +  gimple_stmt_iterator gsi;
>
> as said I'd prefer to do this walk as part of the other walks we
> already do - if and if only because get_loop_body () is a DFS
> walk over the loop body (you should at least share that).

I'm now sharing the use of ifc_bbs. The reason why I'd rather not share
the walk over them is that it becomes quite complex to split out the two
cases: not lowering ifs because there are none, in which case we still
want to lower bitfields, versus not lowering ifs because they are present
but not lowerable, in which case we forgo lowering bitfields too, since
we will not vectorize this loop anyway.

> +      value = fold_build1 (NOP_EXPR, load_type, value);
>
> fold_convert (load_type, value)
>
> +      if (!CONSTANT_CLASS_P (value))
> +       {
> +         pattern_stmt
> +           = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
> +                                  value);
> +         value = gimple_get_lhs (pattern_stmt);
>
> there's in principle
>
>      gimple_seq stmts = NULL;
>      value = gimple_convert (&stmts, load_type, value);
>      if (!gimple_seq_empty_p (stmts))
>        {
>          pattern_stmt = gimple_seq_first_stmt (stmts);
>          append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
>        }
>
> though a append_pattern_def_seq helper to add a convenience sequence
> would be nice to have here.

Ended up using the existing 'vect_convert_input'; it seems to do nicely
here.
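For the archive, the offset computation and the size check now look
roughly like this; a paraphrased sketch rather than the exact hunk, and
'field_decl' here just stands for the FIELD_DECL of the bitfield access:

  /* Offset of the bitfield within its representative, in bits.  Both
     FIELD_DECLs share DECL_FIELD_OFFSET by construction, so subtracting
     the DECL_FIELD_BIT_OFFSETs is enough.  */
  tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl);
  tree bf_pos = fold_build2 (MINUS_EXPR, bitsizetype,
                             DECL_FIELD_BIT_OFFSET (field_decl),
                             DECL_FIELD_BIT_OFFSET (rep_decl));

  /* Refuse to lower when the field's size and its type's precision
     disagree, as suggested.  */
  if (tree_to_uhwi (DECL_SIZE (field_decl))
      != TYPE_PRECISION (TREE_TYPE (field_decl)))
    return false;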
> You probably want to double-check your lowering code by
> bootstrapping / testing with -ftree-loop-if-convert.

Done; this led me to find a new failure mode, where the type of the first
operand of the BIT_FIELD_REF was a floating-point type (TF mode), which
then led to failures when constructing the masking and shifting. I ended
up adding a nop-conversion to an INTEGER type of the same width first
where necessary. I also did a follow-up bootstrap with the addition of
`-ftree-vectorize` and `-fno-vect-cost-model` to further test the
codegen. All seems to be working on aarch64-linux-gnu.
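The fix-up looks roughly like the following sketch; 'container' is an
illustrative name for the first operand of the BIT_FIELD_REF, and I
express the same-width reinterpretation as a VIEW_CONVERT_EXPR here:

  /* If the container is not of an integral type (e.g. a TF-mode float),
     reinterpret its bits as an unsigned integer of the same width
     before constructing the masking and shifting.  */
  tree cont_type = TREE_TYPE (container);
  if (!INTEGRAL_TYPE_P (cont_type))
    {
      unsigned HOST_WIDE_INT bits = tree_to_uhwi (TYPE_SIZE (cont_type));
      cont_type = build_nonstandard_integer_type (bits, /*unsignedp=*/ 1);
      container = build1 (VIEW_CONVERT_EXPR, cont_type, container);
    }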