Hi, Richard.

After fixing the code, I think the IR is now correct:

  loop_len_34 = MIN_EXPR ;
  _74 = loop_len_34 * 2;
  loop_len_48 = MIN_EXPR <_74, 4>;
  _75 = _74 - loop_len_48;
  loop_len_49 = MIN_EXPR <_75, 4>;
  _76 = _75 - loop_len_49;
  loop_len_50 = MIN_EXPR <_76, 4>;
  loop_len_51 = _76 - loop_len_50;
  ...
  vect__1.8_33 = .LEN_LOAD (_17, 16B, loop_len_34, 0);
  ...
  .LEN_STORE (_17, 16B, loop_len_34, vect__4.11_21, 0);
  ...
  vect__10.16_52 = .LEN_LOAD (_31, 32B, loop_len_48, 0);
  ...
  vect__10.17_54 = .LEN_LOAD (_29, 32B, loop_len_49, 0);
  ...
  vect__10.18_56 = .LEN_LOAD (_25, 32B, loop_len_50, 0);
  ...
  vect__10.19_58 = .LEN_LOAD (_80, 32B, loop_len_51, 0);

For this case:

  uint64_t x2[100];
  uint16_t y2[200];

  void f2(int n) {
    for (int i = 0, j = 0; i < n; i += 2, j += 4) {
      x2[i + 0] += 1;
      x2[i + 1] += 2;
      y2[j + 0] += 1;
      y2[j + 1] += 2;
      y2[j + 2] += 3;
      y2[j + 3] += 4;
    }
  }

The IR is like this:

  loop_len_56 = MIN_EXPR ;
  _66 = loop_len_56 * 4;
  loop_len_43 = _66 + 18446744073709551614;
  ...
  vect__1.44_44 = .LEN_LOAD (_6, 64B, 2, 0);
  ...
  vect__1.45_46 = .LEN_LOAD (_14, 64B, loop_len_43, 0);
  vect__2.46_47 = vect__1.44_44 + { 1, 2 };
  vect__2.46_48 = vect__1.45_46 + { 1, 2 };
  .LEN_STORE (_6, 64B, 2, vect__2.46_47, 0);
  .LEN_STORE (_14, 64B, loop_len_43, vect__2.46_48, 0);
  ...
  vect__6.51_57 = .LEN_LOAD (_10, 16B, loop_len_56, 0);
  vect__7.52_58 = vect__6.51_57 + { 1, 2, 3, 4, 1, 2, 3, 4 };
  .LEN_STORE (_10, 16B, loop_len_56, vect__7.52_58, 0);

Does this seem correct too?

>> What gives the best code in these cases?  Is emitting a multiplication
>> better?  Or is using a new IV better?

Could you give me more details about the "new refresh IV" approach?
I'd like to try that.

Thanks.

juzhe.zhong@rivai.ai

From: Richard Sandiford
Date: 2023-05-25 00:00
To: 钟居哲
CC: gcc-patches; rguenther
Subject: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

钟居哲 writes:
> Oh. I see. Thank you so much for pointing this out.
> Could you tell me what I should do in the code?
> It seems that I should adjust it in
> vect_adjust_loop_lens_control
>
> multiply by some factor? Is it correct to multiply by max_nscalars_per_iter?

max_nscalars_per_iter * factor rather than just max_nscalars_per_iter

Note that it's possible for later max_nscalars_per_iter * factor to
be smaller, so a division might be needed in rare cases.  E.g.:

  uint64_t x[100];
  uint16_t y[200];

  void f() {
    for (int i = 0, j = 0; i < 100; i += 2, j += 4) {
      x[i + 0] += 1;
      x[i + 1] += 2;
      y[j + 0] += 1;
      y[j + 1] += 2;
      y[j + 2] += 3;
      y[j + 3] += 4;
    }
  }

where y has a single-control rgroup with max_nscalars_per_iter == 4
and x has a 2-control rgroup with max_nscalars_per_iter == 2

What gives the best code in these cases?  Is emitting a multiplication
better?  Or is using a new IV better?

Thanks,
Richard
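
For reference, the length arithmetic discussed in this thread can be modelled in plain C. This is only a rough sketch, not the vectorizer code: scale_len and split_len are hypothetical helper names, factor is assumed to be 1, and the incoming length is assumed to cover a whole number of scalar iterations. scale_len converts a length counted in one rgroup's items into another rgroup's items (a multiplication when the destination has more items per iteration, a division when it has fewer, as in the x/y example above). split_len then distributes the total over the controls with the MIN/subtract chain seen in the first IR dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert LEN, counted in items of a source rgroup
   that processes SRC_ITEMS_PER_ITER items per scalar iteration, into
   items of a destination rgroup that processes DST_ITEMS_PER_ITER.
   Assumes LEN covers a whole number of scalar iterations.  */
uint64_t
scale_len (uint64_t len, uint64_t src_items_per_iter,
           uint64_t dst_items_per_iter)
{
  assert (src_items_per_iter != 0 && len % src_items_per_iter == 0);
  return len / src_items_per_iter * dst_items_per_iter;
}

/* Hypothetical helper: split TOTAL items across N controls, each of
   which can hold at most LIMIT items.  Every control except the last
   takes MIN (remaining, LIMIT); the last takes whatever remains.  */
void
split_len (uint64_t total, uint64_t limit, size_t n, uint64_t *lens)
{
  assert (n > 0 && total <= limit * n);
  uint64_t remain = total;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t len;
      if (i + 1 == n)
        len = remain;                           /* last control: remainder */
      else
        len = remain < limit ? remain : limit;  /* MIN (remain, LIMIT) */
      lens[i] = len;
      remain -= len;
    }
}
```

Under these assumptions, a single-control length of 8 for y (4 items per iteration) scales to 4 items for x (2 items per iteration), which split_len distributes as 2 and 2 over the two controls; a length of 4 for y gives 2 and 0.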