* [PATCH] LoongArch: Use LSX and LASX for block move @ 2023-09-07 16:14 Xi Ruoyao 2023-09-09 6:10 ` chenglulu 2023-09-09 7:04 ` [PATCH] " chenglulu 0 siblings, 2 replies; 7+ messages in thread From: Xi Ruoyao @ 2023-09-07 16:14 UTC (permalink / raw) To: gcc-patches; +Cc: chenglulu, Chenghui Pan, i, xuchenghua, Xi Ruoyao gcc/ChangeLog: * config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN): Define to the maximum amount of bytes able to be loaded or stored with one machine instruction. * config/loongarch/loongarch.cc (loongarch_mode_for_move_size): New static function. (loongarch_block_move_straight): Call loongarch_mode_for_move_size for machine_mode to be moved. (loongarch_expand_block_move): Use LARCH_MAX_MOVE_PER_INSN instead of UNITS_PER_WORD. --- Bootstrapped and regtested on loongarch64-linux-gnu, with PR110939 patch applied, the "lib_build_self_spec = %<..." line in t-linux commented out (because it's silently making -mlasx in BOOT_CFLAGS ineffective, Yujie is working on a proper fix), and BOOT_CFLAGS="-O3 -mlasx". Ok for trunk? gcc/config/loongarch/loongarch.cc | 22 ++++++++++++++++++---- gcc/config/loongarch/loongarch.h | 3 +++ 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 6698414281e..509ef2b97f1 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -5191,6 +5191,20 @@ loongarch_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED, return true; } +static machine_mode +loongarch_mode_for_move_size (HOST_WIDE_INT size) +{ + switch (size) + { + case 32: + return V32QImode; + case 16: + return V16QImode; + } + + return int_mode_for_size (size * BITS_PER_UNIT, 0).require (); +} + /* Emit straight-line code to move LENGTH bytes from SRC to DEST. Assume that the areas do not overlap. */ @@ -5220,7 +5234,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length, for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2) { - mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require (); + mode = loongarch_mode_for_move_size (delta_cur); for (; offs + delta_cur <= length; offs += delta_cur, i++) { @@ -5231,7 +5245,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length, for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2) { - mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require (); + mode = loongarch_mode_for_move_size (delta_cur); for (; offs + delta_cur <= length; offs += delta_cur, i++) loongarch_emit_move (adjust_address (dest, mode, offs), regs[i]); @@ -5326,8 +5340,8 @@ loongarch_expand_block_move (rtx dest, rtx src, rtx r_length, rtx r_align) HOST_WIDE_INT align = INTVAL (r_align); - if (!TARGET_STRICT_ALIGN || align > UNITS_PER_WORD) - align = UNITS_PER_WORD; + if (!TARGET_STRICT_ALIGN || align > LARCH_MAX_MOVE_PER_INSN) + align = LARCH_MAX_MOVE_PER_INSN; if (length <= align * LARCH_MAX_MOVE_OPS_STRAIGHT) { diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h index 3fc9dc43ab1..7e391205583 100644 --- a/gcc/config/loongarch/loongarch.h +++ b/gcc/config/loongarch/loongarch.h @@ -1181,6 +1181,9 @@ typedef struct { least twice. */ #define LARCH_MAX_MOVE_OPS_STRAIGHT (LARCH_MAX_MOVE_OPS_PER_LOOP_ITER * 2) +#define LARCH_MAX_MOVE_PER_INSN \ + (ISA_HAS_LASX ? 32 : (ISA_HAS_LSX ? 16 : UNITS_PER_WORD)) + /* The base cost of a memcpy call, for MOVE_RATIO and friends. These values were determined experimentally by benchmarking with CSiBE. */ -- 2.42.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] LoongArch: Use LSX and LASX for block move 2023-09-07 16:14 [PATCH] LoongArch: Use LSX and LASX for block move Xi Ruoyao @ 2023-09-09 6:10 ` chenglulu 2023-09-09 7:03 ` Pushed: [PATCH v2] " Xi Ruoyao 2023-09-09 7:04 ` [PATCH] " chenglulu 1 sibling, 1 reply; 7+ messages in thread From: chenglulu @ 2023-09-09 6:10 UTC (permalink / raw) To: Xi Ruoyao, gcc-patches; +Cc: Chenghui Pan, i, xuchenghua 在 2023/9/8 上午12:14, Xi Ruoyao 写道: > gcc/ChangeLog: > > * config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN): > Define to the maximum amount of bytes able to be loaded or > stored with one machine instruction. > * config/loongarch/loongarch.cc (loongarch_mode_for_move_size): > New static function. > (loongarch_block_move_straight): Call > loongarch_mode_for_move_size for machine_mode to be moved. > (loongarch_expand_block_move): Use LARCH_MAX_MOVE_PER_INSN > instead of UNITS_PER_WORD. > --- > > Bootstrapped and regtested on loongarch64-linux-gnu, with PR110939 patch > applied, the "lib_build_self_spec = %<..." line in t-linux commented out > (because it's silently making -mlasx in BOOT_CFLAGS ineffective, Yujie > is working on a proper fix), and BOOT_CFLAGS="-O3 -mlasx". Ok for trunk? I think test cases need to be added here. Otherwise OK, thanks! > gcc/config/loongarch/loongarch.cc | 22 ++++++++++++++++++---- > gcc/config/loongarch/loongarch.h | 3 +++ > 2 files changed, 21 insertions(+), 4 deletions(-) > > diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc > index 6698414281e..509ef2b97f1 100644 > --- a/gcc/config/loongarch/loongarch.cc > +++ b/gcc/config/loongarch/loongarch.cc > @@ -5191,6 +5191,20 @@ loongarch_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED, > return true; > } > > +static machine_mode > +loongarch_mode_for_move_size (HOST_WIDE_INT size) > +{ > + switch (size) > + { > + case 32: > + return V32QImode; > + case 16: > + return V16QImode; > + } > + > + return int_mode_for_size (size * BITS_PER_UNIT, 0).require (); > +} > + > /* Emit straight-line code to move LENGTH bytes from SRC to DEST. > Assume that the areas do not overlap. */ > > @@ -5220,7 +5234,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length, > > for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2) > { > - mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require (); > + mode = loongarch_mode_for_move_size (delta_cur); > > for (; offs + delta_cur <= length; offs += delta_cur, i++) > { > @@ -5231,7 +5245,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length, > > for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2) > { > - mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require (); > + mode = loongarch_mode_for_move_size (delta_cur); > > for (; offs + delta_cur <= length; offs += delta_cur, i++) > loongarch_emit_move (adjust_address (dest, mode, offs), regs[i]); > @@ -5326,8 +5340,8 @@ loongarch_expand_block_move (rtx dest, rtx src, rtx r_length, rtx r_align) > > HOST_WIDE_INT align = INTVAL (r_align); > > - if (!TARGET_STRICT_ALIGN || align > UNITS_PER_WORD) > - align = UNITS_PER_WORD; > + if (!TARGET_STRICT_ALIGN || align > LARCH_MAX_MOVE_PER_INSN) > + align = LARCH_MAX_MOVE_PER_INSN; > > if (length <= align * LARCH_MAX_MOVE_OPS_STRAIGHT) > { > diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h > index 3fc9dc43ab1..7e391205583 100644 > --- a/gcc/config/loongarch/loongarch.h > +++ b/gcc/config/loongarch/loongarch.h > @@ -1181,6 +1181,9 @@ typedef struct { > least twice. */ > #define LARCH_MAX_MOVE_OPS_STRAIGHT (LARCH_MAX_MOVE_OPS_PER_LOOP_ITER * 2) > > +#define LARCH_MAX_MOVE_PER_INSN \ > + (ISA_HAS_LASX ? 32 : (ISA_HAS_LSX ? 16 : UNITS_PER_WORD)) > + > /* The base cost of a memcpy call, for MOVE_RATIO and friends. These > values were determined experimentally by benchmarking with CSiBE. > */ ^ permalink raw reply [flat|nested] 7+ messages in thread
* Pushed: [PATCH v2] LoongArch: Use LSX and LASX for block move 2023-09-09 6:10 ` chenglulu @ 2023-09-09 7:03 ` Xi Ruoyao 0 siblings, 0 replies; 7+ messages in thread From: Xi Ruoyao @ 2023-09-09 7:03 UTC (permalink / raw) To: chenglulu, gcc-patches; +Cc: Chenghui Pan, i, xuchenghua [-- Attachment #1: Type: text/plain, Size: 1335 bytes --] Pushed r14-3818 with test cases added. The pushed patch is attached. On Sat, 2023-09-09 at 14:10 +0800, chenglulu wrote: > > 在 2023/9/8 上午12:14, Xi Ruoyao 写道: > > gcc/ChangeLog: > > > > * config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN): > > Define to the maximum amount of bytes able to be loaded or > > stored with one machine instruction. > > * config/loongarch/loongarch.cc (loongarch_mode_for_move_size): > > New static function. > > (loongarch_block_move_straight): Call > > loongarch_mode_for_move_size for machine_mode to be moved. > > (loongarch_expand_block_move): Use LARCH_MAX_MOVE_PER_INSN > > instead of UNITS_PER_WORD. > > --- > > > > Bootstrapped and regtested on loongarch64-linux-gnu, with PR110939 patch > > applied, the "lib_build_self_spec = %<..." line in t-linux commented out > > (because it's silently making -mlasx in BOOT_CFLAGS ineffective, Yujie > > is working on a proper fix), and BOOT_CFLAGS="-O3 -mlasx". Ok for trunk? > > I think test cases need to be added here. > > Otherwise OK, thanks! /* snip */ -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University [-- Attachment #2: v2-0001-LoongArch-Use-LSX-and-LASX-for-block-move.patch --] [-- Type: text/x-patch, Size: 5978 bytes --] From 35adc54b55aa199f17e2c84e382792e424b6171e Mon Sep 17 00:00:00 2001 From: Xi Ruoyao <xry111@xry111.site> Date: Tue, 5 Sep 2023 21:02:38 +0800 Subject: [PATCH v2] LoongArch: Use LSX and LASX for block move gcc/ChangeLog: * config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN): Define to the maximum amount of bytes able to be loaded or stored with one machine instruction. * config/loongarch/loongarch.cc (loongarch_mode_for_move_size): New static function. (loongarch_block_move_straight): Call loongarch_mode_for_move_size for machine_mode to be moved. (loongarch_expand_block_move): Use LARCH_MAX_MOVE_PER_INSN instead of UNITS_PER_WORD. gcc/testsuite/ChangeLog: * gcc.target/loongarch/memcpy-vec-1.c: New test. * gcc.target/loongarch/memcpy-vec-2.c: New test. * gcc.target/loongarch/memcpy-vec-3.c: New test. --- gcc/config/loongarch/loongarch.cc | 22 +++++++++++++++---- gcc/config/loongarch/loongarch.h | 3 +++ .../gcc.target/loongarch/memcpy-vec-1.c | 11 ++++++++++ .../gcc.target/loongarch/memcpy-vec-2.c | 12 ++++++++++ .../gcc.target/loongarch/memcpy-vec-3.c | 6 +++++ 5 files changed, 50 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/memcpy-vec-1.c create mode 100644 gcc/testsuite/gcc.target/loongarch/memcpy-vec-2.c create mode 100644 gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 6698414281e..509ef2b97f1 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -5191,6 +5191,20 @@ loongarch_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED, return true; } +static machine_mode +loongarch_mode_for_move_size (HOST_WIDE_INT size) +{ + switch (size) + { + case 32: + return V32QImode; + case 16: + return V16QImode; + } + + return int_mode_for_size (size * BITS_PER_UNIT, 0).require (); +} + /* Emit straight-line code to move LENGTH bytes from SRC to DEST. Assume that the areas do not overlap. */ @@ -5220,7 +5234,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length, for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2) { - mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require (); + mode = loongarch_mode_for_move_size (delta_cur); for (; offs + delta_cur <= length; offs += delta_cur, i++) { @@ -5231,7 +5245,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length, for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2) { - mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require (); + mode = loongarch_mode_for_move_size (delta_cur); for (; offs + delta_cur <= length; offs += delta_cur, i++) loongarch_emit_move (adjust_address (dest, mode, offs), regs[i]); @@ -5326,8 +5340,8 @@ loongarch_expand_block_move (rtx dest, rtx src, rtx r_length, rtx r_align) HOST_WIDE_INT align = INTVAL (r_align); - if (!TARGET_STRICT_ALIGN || align > UNITS_PER_WORD) - align = UNITS_PER_WORD; + if (!TARGET_STRICT_ALIGN || align > LARCH_MAX_MOVE_PER_INSN) + align = LARCH_MAX_MOVE_PER_INSN; if (length <= align * LARCH_MAX_MOVE_OPS_STRAIGHT) { diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h index 3fc9dc43ab1..7e391205583 100644 --- a/gcc/config/loongarch/loongarch.h +++ b/gcc/config/loongarch/loongarch.h @@ -1181,6 +1181,9 @@ typedef struct { least twice. */ #define LARCH_MAX_MOVE_OPS_STRAIGHT (LARCH_MAX_MOVE_OPS_PER_LOOP_ITER * 2) +#define LARCH_MAX_MOVE_PER_INSN \ + (ISA_HAS_LASX ? 32 : (ISA_HAS_LSX ? 16 : UNITS_PER_WORD)) + /* The base cost of a memcpy call, for MOVE_RATIO and friends. These values were determined experimentally by benchmarking with CSiBE. */ diff --git a/gcc/testsuite/gcc.target/loongarch/memcpy-vec-1.c b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-1.c new file mode 100644 index 00000000000..8d9fedc9e4f --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mabi=lp64d -march=la464 -mno-strict-align" } */ +/* { dg-final { scan-assembler-times "xvst" 2 } } */ +/* { dg-final { scan-assembler-times "\tvst" 1 } } */ +/* { dg-final { scan-assembler-times "st\\.d|stptr\\.d" 1 } } */ +/* { dg-final { scan-assembler-times "st\\.w|stptr\\.w" 1 } } */ +/* { dg-final { scan-assembler-times "st\\.h" 1 } } */ +/* { dg-final { scan-assembler-times "st\\.b" 1 } } */ + +extern char a[], b[]; +void test() { __builtin_memcpy(a, b, 95); } diff --git a/gcc/testsuite/gcc.target/loongarch/memcpy-vec-2.c b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-2.c new file mode 100644 index 00000000000..6b28b884db0 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mabi=lp64d -march=la464 -mno-strict-align" } */ +/* { dg-final { scan-assembler-times "xvst" 2 } } */ +/* { dg-final { scan-assembler-times "\tvst" 1 } } */ +/* { dg-final { scan-assembler-times "st\\.d|stptr\\.d" 1 } } */ +/* { dg-final { scan-assembler-times "st\\.w|stptr\\.w" 1 } } */ +/* { dg-final { scan-assembler-times "st\\.h" 1 } } */ +/* { dg-final { scan-assembler-times "st\\.b" 1 } } */ + +typedef char __attribute__ ((vector_size (32), aligned (32))) vec; +extern vec a[], b[]; +void test() { __builtin_memcpy(a, b, 95); } diff --git a/gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c new file mode 100644 index 00000000000..233ed215078 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/memcpy-vec-3.c @@ -0,0 +1,6 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=la464 -mabi=lp64d -mstrict-align" } */ +/* { dg-final { scan-assembler-not "vst" } } */ + +extern char a[], b[]; +void test() { __builtin_memcpy(a, b, 16); } -- 2.42.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] LoongArch: Use LSX and LASX for block move 2023-09-07 16:14 [PATCH] LoongArch: Use LSX and LASX for block move Xi Ruoyao 2023-09-09 6:10 ` chenglulu @ 2023-09-09 7:04 ` chenglulu 2023-09-09 7:06 ` Xi Ruoyao 1 sibling, 1 reply; 7+ messages in thread From: chenglulu @ 2023-09-09 7:04 UTC (permalink / raw) To: Xi Ruoyao, gcc-patches; +Cc: Chenghui Pan, i, xuchenghua Hi,RuoYao: I think the test example memcpy-vec-3.c submitted in r14-3818 is implemented incorrectly. The 16-byte length in this test example will cause can_move_by_pieces to return true when with '-mstrict-align', so no vector load instructions will be generated. 在 2023/9/8 上午12:14, Xi Ruoyao 写道: > gcc/ChangeLog: > > * config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN): > Define to the maximum amount of bytes able to be loaded or > stored with one machine instruction. > * config/loongarch/loongarch.cc (loongarch_mode_for_move_size): > New static function. > (loongarch_block_move_straight): Call > loongarch_mode_for_move_size for machine_mode to be moved. > (loongarch_expand_block_move): Use LARCH_MAX_MOVE_PER_INSN > instead of UNITS_PER_WORD. > --- > > Bootstrapped and regtested on loongarch64-linux-gnu, with PR110939 patch > applied, the "lib_build_self_spec = %<..." line in t-linux commented out > (because it's silently making -mlasx in BOOT_CFLAGS ineffective, Yujie > is working on a proper fix), and BOOT_CFLAGS="-O3 -mlasx". Ok for trunk? > > gcc/config/loongarch/loongarch.cc | 22 ++++++++++++++++++---- > gcc/config/loongarch/loongarch.h | 3 +++ > 2 files changed, 21 insertions(+), 4 deletions(-) > > diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc > index 6698414281e..509ef2b97f1 100644 > --- a/gcc/config/loongarch/loongarch.cc > +++ b/gcc/config/loongarch/loongarch.cc > @@ -5191,6 +5191,20 @@ loongarch_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED, > return true; > } > > +static machine_mode > +loongarch_mode_for_move_size (HOST_WIDE_INT size) > +{ > + switch (size) > + { > + case 32: > + return V32QImode; > + case 16: > + return V16QImode; > + } > + > + return int_mode_for_size (size * BITS_PER_UNIT, 0).require (); > +} > + > /* Emit straight-line code to move LENGTH bytes from SRC to DEST. > Assume that the areas do not overlap. */ > > @@ -5220,7 +5234,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length, > > for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2) > { > - mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require (); > + mode = loongarch_mode_for_move_size (delta_cur); > > for (; offs + delta_cur <= length; offs += delta_cur, i++) > { > @@ -5231,7 +5245,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length, > > for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2) > { > - mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require (); > + mode = loongarch_mode_for_move_size (delta_cur); > > for (; offs + delta_cur <= length; offs += delta_cur, i++) > loongarch_emit_move (adjust_address (dest, mode, offs), regs[i]); > @@ -5326,8 +5340,8 @@ loongarch_expand_block_move (rtx dest, rtx src, rtx r_length, rtx r_align) > > HOST_WIDE_INT align = INTVAL (r_align); > > - if (!TARGET_STRICT_ALIGN || align > UNITS_PER_WORD) > - align = UNITS_PER_WORD; > + if (!TARGET_STRICT_ALIGN || align > LARCH_MAX_MOVE_PER_INSN) > + align = LARCH_MAX_MOVE_PER_INSN; > > if (length <= align * LARCH_MAX_MOVE_OPS_STRAIGHT) > { > diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h > index 3fc9dc43ab1..7e391205583 100644 > --- a/gcc/config/loongarch/loongarch.h > +++ b/gcc/config/loongarch/loongarch.h > @@ -1181,6 +1181,9 @@ typedef struct { > least twice. */ > #define LARCH_MAX_MOVE_OPS_STRAIGHT (LARCH_MAX_MOVE_OPS_PER_LOOP_ITER * 2) > > +#define LARCH_MAX_MOVE_PER_INSN \ > + (ISA_HAS_LASX ? 32 : (ISA_HAS_LSX ? 16 : UNITS_PER_WORD)) > + > /* The base cost of a memcpy call, for MOVE_RATIO and friends. These > values were determined experimentally by benchmarking with CSiBE. > */ ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] LoongArch: Use LSX and LASX for block move 2023-09-09 7:04 ` [PATCH] " chenglulu @ 2023-09-09 7:06 ` Xi Ruoyao 2023-09-09 7:14 ` chenglulu 0 siblings, 1 reply; 7+ messages in thread From: Xi Ruoyao @ 2023-09-09 7:06 UTC (permalink / raw) To: chenglulu, gcc-patches; +Cc: Chenghui Pan, i, xuchenghua On Sat, 2023-09-09 at 15:04 +0800, chenglulu wrote: > Hi,RuoYao: > > I think the test example memcpy-vec-3.c submitted in r14-3818 is > implemented incorrectly. > > The 16-byte length in this test example will cause can_move_by_pieces to > return true when with '-mstrict-align', so no vector load instructions > will be generated. Yes, in this case we cannot use vst because we don't know if b is aligned. Thus a { scan-assembler-not "vst" } guarantees that. Or am I understanding something wrongly here? -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] LoongArch: Use LSX and LASX for block move 2023-09-09 7:06 ` Xi Ruoyao @ 2023-09-09 7:14 ` chenglulu 2023-09-09 7:15 ` Xi Ruoyao 0 siblings, 1 reply; 7+ messages in thread From: chenglulu @ 2023-09-09 7:14 UTC (permalink / raw) To: Xi Ruoyao, gcc-patches; +Cc: Chenghui Pan, i, xuchenghua 在 2023/9/9 下午3:06, Xi Ruoyao 写道: > On Sat, 2023-09-09 at 15:04 +0800, chenglulu wrote: >> Hi,RuoYao: >> >> I think the test example memcpy-vec-3.c submitted in r14-3818 is >> implemented incorrectly. >> >> The 16-byte length in this test example will cause can_move_by_pieces to >> return true when with '-mstrict-align', so no vector load instructions >> will be generated. > Yes, in this case we cannot use vst because we don't know if b is > aligned. Thus a { scan-assembler-not "vst" } guarantees that. > > Or am I understanding something wrongly here? > Well, what I mean is that even if '-mno-strict-align' is used here, vst/vld will not be used, so this test example cannot test what we want to test. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] LoongArch: Use LSX and LASX for block move 2023-09-09 7:14 ` chenglulu @ 2023-09-09 7:15 ` Xi Ruoyao 0 siblings, 0 replies; 7+ messages in thread From: Xi Ruoyao @ 2023-09-09 7:15 UTC (permalink / raw) To: chenglulu, gcc-patches; +Cc: Chenghui Pan, i, xuchenghua On Sat, 2023-09-09 at 15:14 +0800, chenglulu wrote: > > 在 2023/9/9 下午3:06, Xi Ruoyao 写道: > > On Sat, 2023-09-09 at 15:04 +0800, chenglulu wrote: > > > Hi,RuoYao: > > > > > > I think the test example memcpy-vec-3.c submitted in r14-3818 is > > > implemented incorrectly. > > > > > > The 16-byte length in this test example will cause can_move_by_pieces to > > > return true when with '-mstrict-align', so no vector load instructions > > > will be generated. > > Yes, in this case we cannot use vst because we don't know if b is > > aligned. Thus a { scan-assembler-not "vst" } guarantees that. > > > > Or am I understanding something wrongly here? > > > Well, what I mean is that even if '-mno-strict-align' is used here, > vst/vld will not be used, > > so this test example cannot test what we want to test. Let me revise it... -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-09-09 7:15 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-09-07 16:14 [PATCH] LoongArch: Use LSX and LASX for block move Xi Ruoyao 2023-09-09 6:10 ` chenglulu 2023-09-09 7:03 ` Pushed: [PATCH v2] " Xi Ruoyao 2023-09-09 7:04 ` [PATCH] " chenglulu 2023-09-09 7:06 ` Xi Ruoyao 2023-09-09 7:14 ` chenglulu 2023-09-09 7:15 ` Xi Ruoyao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).