Hi,

For the first piece of code, I tried:

    unsigned int nitems_per_iter
      = dest_rgm->max_nscalars_per_iter * dest_rgm->factor;
    step = gimple_build (seq, MULT_EXPR, iv_type, step,
                         build_int_cst (iv_type, nitems_per_iter));

The optimized IR is then:

    loop_len_34 = MIN_EXPR ;
    _74 = loop_len_34 * 4;
    loop_len_51 = _74 + 18446744073709551604;
    _16 = (void *) ivtmp.27_41;
    _17 = &MEM [(short int *)_16];
    vect__1.7_33 = .LEN_LOAD (_17, 16B, loop_len_34, 0);
    vect__2.8_23 = VIEW_CONVERT_EXPR(vect__1.7_33);
    vect__3.9_22 = vect__2.8_23 + { 1, 2, 1, 2, 1, 2, 1, 2 };
    vect__4.10_21 = VIEW_CONVERT_EXPR(vect__3.9_22);
    .LEN_STORE (_17, 16B, loop_len_34, vect__4.10_21, 0);
    _20 = (void *) ivtmp.28_1;
    _31 = &MEM [(int *)_20];
    vect__10.15_52 = .LEN_LOAD (_31, 32B, 4, 0);
    _30 = (void *) ivtmp.31_4;
    _29 = &MEM [(int *)_30];
    vect__10.16_54 = .LEN_LOAD (_29, 32B, 4, 0);
    _26 = (void *) ivtmp.32_8;
    _25 = &MEM [(int *)_26];
    vect__10.17_56 = .LEN_LOAD (_25, 32B, 4, 0);
    _79 = (void *) ivtmp.33_12;
    _80 = &MEM [(int *)_79];
    vect__10.18_58 = .LEN_LOAD (_80, 32B, loop_len_51, 0);

Is this correct? It looks weird.

juzhe.zhong@rivai.ai

From: Richard Sandiford
Date: 2023-05-25 00:00
To: 钟居哲
CC: gcc-patches; rguenther
Subject: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

钟居哲 writes:
> Oh. I see. Thank you so much for pointing this out.
> Could you tell me what I should do in the code?
> It seems that I should adjust it in
> vect_adjust_loop_lens_control:
> multiply by some factor? Is it correct to multiply by
> max_nscalars_per_iter?

max_nscalars_per_iter * factor rather than just max_nscalars_per_iter.

Note that it's possible for a later rgroup's max_nscalars_per_iter * factor
to be smaller, so a division might be needed in rare cases.
E.g.:

    #include <stdint.h>

    uint64_t x[100];
    uint16_t y[200];

    void f()
    {
      for (int i = 0, j = 0; i < 100; i += 2, j += 4)
        {
          x[i + 0] += 1;
          x[i + 1] += 2;
          y[j + 0] += 1;
          y[j + 1] += 2;
          y[j + 2] += 3;
          y[j + 3] += 4;
        }
    }

where y has a single-control rgroup with max_nscalars_per_iter == 4
and x has a 2-control rgroup with max_nscalars_per_iter == 2.

What gives the best code in these cases?  Is emitting a multiplication
better?  Or is using a new IV better?

Thanks,
Richard
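To compare the two options on Richard's example, here is a small C model. It is not GCC code, and every name in it (`simulate_example`, `control_len`, the `Y_*`/`X_*` constants) is hypothetical. It assumes 128-bit vectors (so y's single control is a vector(8) of uint16_t and x's two controls are vector(2)s of uint64_t), assumes factor == 1 for both rgroups, and drives the loop with a decrementing IV counted in y items using a plain MIN; a target such as RVV may choose lengths differently. On each iteration it derives x's item count either by scaling y's length by X_NITEMS / Y_NITEMS (each value being max_nscalars_per_iter * factor, so this needs a division here) or from a separate x IV, and checks that the two agree. Each x control's length is clamped to [0, nunits].

```c
#include <assert.h>

/* Hypothetical model of the example loop, not GCC code.  Assumes 128-bit
   vectors and factor == 1, so nitems == max_nscalars_per_iter.  */
enum
{
  Y_NITEMS = 4,    /* y: max_nscalars_per_iter (4) * factor (1) */
  Y_NUNITS = 8,    /* vector(8) uint16_t, one control */
  X_NITEMS = 2,    /* x: max_nscalars_per_iter (2) * factor (1) */
  X_NUNITS = 2,    /* vector(2) uint64_t ... */
  X_NCONTROLS = 2  /* ... two controls */
};

static unsigned
min_u (unsigned a, unsigned b)
{
  return a < b ? a : b;
}

/* Length of control I when an rgroup processes TOTAL items in vectors of
   NUNITS items: MIN (MAX (TOTAL - I * NUNITS, 0), NUNITS), written so the
   unsigned subtraction cannot wrap.  */
static unsigned
control_len (unsigned total, unsigned i, unsigned nunits)
{
  unsigned start = i * nunits;
  return total > start ? min_u (total - start, nunits) : 0;
}

/* Run the vector loop for N scalar iterations.  Return 1 if scaling y's
   length (option 1) always matches a separate x IV (option 2) and every
   x item is covered exactly once, else 0.  */
int
simulate_example (unsigned n)
{
  unsigned y_remaining = n * Y_NITEMS;  /* IV counted in y items */
  unsigned x_remaining = n * X_NITEMS;  /* option 2: IV counted in x items */
  unsigned x_covered = 0;

  while (y_remaining > 0)
    {
      unsigned y_len = min_u (y_remaining, Y_NUNITS);

      /* Option 1: x's nitems is smaller than y's, so scaling needs a
         division.  It is exact because y_len is a multiple of Y_NITEMS.  */
      assert (y_len % Y_NITEMS == 0);
      unsigned x_total_scaled = y_len / Y_NITEMS * X_NITEMS;

      /* Option 2: a separate decrementing IV for x.  */
      unsigned x_total_iv = min_u (x_remaining, X_NCONTROLS * X_NUNITS);

      if (x_total_scaled != x_total_iv)
        return 0;

      /* Split x's items across its controls.  */
      for (unsigned i = 0; i < X_NCONTROLS; i++)
        x_covered += control_len (x_total_iv, i, X_NUNITS);

      y_remaining -= y_len;
      x_remaining -= x_total_iv;
    }
  return x_remaining == 0 && x_covered == n * X_NITEMS;
}
```

The example loop runs 50 scalar iterations (i = 0, 2, ..., 98), and simulate_example (50) returns 1. The model only shows that both options compute the same lengths; which one gives better code is the question left open above. When reading the dump in the first message, note that 18446744073709551604 is 2^64 - 12, so loop_len_51 is loop_len_34 * 4 - 12 in 64-bit unsigned arithmetic, which would wrap if loop_len_34 * 4 were less than 12.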