* [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
From: juzhe.zhong @ 2023-04-10 14:48 UTC
To: gcc-patches
Cc: kito.cheng, palmer, jeffreyalaw, jakub, richard.sandiford, rguenther, Juzhe-Zhong

From: Juzhe-Zhong <juzhe.zhong@rivai.ai>

According to the RVV ISA:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-type-register-vtype
we have LMUL: 1/8, 1/4, 1/2, 1, 2, 4, 8.
For segment instructions we also have tuple types for NF = 2 ~ 8.
For example, for LMUL = 1/2, SEW = 32, we have vint32mf2_t, and we will
have tuples for NF from 2 ~ 8: vint32mf2x2_t, vint32mf2x3_t, ..., vint32mf2x8_t.
So we will end up with more than 220 vector machine modes for RVV,

PLUS the scalar machine modes that we already have in the RISC-V port.

The total number of machine modes in the RISC-V port is then > 256.

Current GCC therefore cannot support the tuple types of the RVV segment
instructions.

So extend the machine mode size from 8 bits to 16 bits.

I have another possible solution related to this patch:
maybe adding a target-dependent macro is better?
The patch would be revised like this:

#ifdef TARGET_MAX_MACHINE_MODE_LARGER_THAN_256
ENUM_BITFIELD(machine_mode) last_set_mode : 16;
#else
ENUM_BITFIELD(machine_mode) last_set_mode : 8;
#endif

I am not sure whether that solution is better.

This patch bootstraps successfully on x86. I will run make check on the
GCC testsuite tomorrow.

I expect this to land in GCC 14. Any suggestions?

gcc/ChangeLog:

        * combine.cc (struct reg_stat_type): Extend 8bit to 16bit.
        * cse.cc (struct qty_table_elem): Ditto.
        (struct table_elt): Ditto.
        (struct set): Ditto.
        * genopinit.cc (main): Ditto.
        * ira-int.h (struct ira_allocno): Ditto.
        * ree.cc (struct ATTRIBUTE_PACKED): Ditto.
        * rtl-ssa/accesses.h: Ditto.
        * rtl.h (struct GTY): Ditto.
        (subreg_shape::unique_id): Ditto.
        * rtlanal.h: Ditto.
        * tree-core.h (struct tree_type_common): Ditto.
        (struct tree_decl_common): Ditto.

---
 gcc/combine.cc         | 4 ++--
 gcc/cse.cc             | 6 +++---
 gcc/genopinit.cc       | 2 +-
 gcc/ira-int.h          | 4 ++--
 gcc/ree.cc             | 2 +-
 gcc/rtl-ssa/accesses.h | 2 +-
 gcc/rtl.h              | 4 ++--
 gcc/rtlanal.h          | 2 +-
 gcc/tree-core.h        | 4 ++--
 9 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 053879500b7..af9bae23c92 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -200,7 +200,7 @@ struct reg_stat_type {

   unsigned HOST_WIDE_INT        last_set_nonzero_bits;
   char                          last_set_sign_bit_copies;
-  ENUM_BITFIELD(machine_mode)   last_set_mode : 8;
+  ENUM_BITFIELD(machine_mode)   last_set_mode : 16;

   /* Set nonzero if references to register n in expressions should not be
      used.  last_set_invalid is set nonzero when this register is being
@@ -235,7 +235,7 @@ struct reg_stat_type {
      truncation if we know that value already contains a truncated
      value.  */

-  ENUM_BITFIELD(machine_mode)   truncated_to_mode : 8;
+  ENUM_BITFIELD(machine_mode)   truncated_to_mode : 16;
 };


diff --git a/gcc/cse.cc b/gcc/cse.cc
index 8fbda4ecc86..d78efaa39f7 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -251,7 +251,7 @@ struct qty_table_elem
   /* The sizes of these fields should match the sizes of the
      code and mode fields of struct rtx_def (see rtl.h).  */
   ENUM_BITFIELD(rtx_code) comparison_code : 16;
-  ENUM_BITFIELD(machine_mode) mode : 8;
+  ENUM_BITFIELD(machine_mode) mode : 16;
 };

 /* The table of all qtys, indexed by qty number.  */
@@ -406,7 +406,7 @@ struct table_elt
   int regcost;
   /* The size of this field should match the size
      of the mode field of struct rtx_def (see rtl.h).  */
-  ENUM_BITFIELD(machine_mode) mode : 8;
+  ENUM_BITFIELD(machine_mode) mode : 16;
   char in_memory;
   char is_const;
   char flag;
@@ -4146,7 +4146,7 @@ struct set
   /* Original machine mode, in case it becomes a CONST_INT.
      The size of this field should match the size of the mode
      field of struct rtx_def (see rtl.h).  */
-  ENUM_BITFIELD(machine_mode) mode : 8;
+  ENUM_BITFIELD(machine_mode) mode : 16;
   /* Hash value of constant equivalent for SET_SRC.  */
   unsigned src_const_hash;
   /* A constant equivalent for SET_SRC, if any.  */
diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
index 83cb7504fa1..3ca3e9fd946 100644
--- a/gcc/genopinit.cc
+++ b/gcc/genopinit.cc
@@ -182,7 +182,7 @@ main (int argc, const char **argv)

   progname = "genopinit";

-  if (NUM_OPTABS > 0xffff || MAX_MACHINE_MODE >= 0xff)
+  if (NUM_OPTABS > 0xffff || MAX_MACHINE_MODE >= 0xffff)
     fatal ("genopinit range assumptions invalid");

   if (!init_rtx_reader_args_cb (argc, argv, handle_arg))
diff --git a/gcc/ira-int.h b/gcc/ira-int.h
index e2de47213b4..65ec1678146 100644
--- a/gcc/ira-int.h
+++ b/gcc/ira-int.h
@@ -281,10 +281,10 @@ struct ira_allocno
   int regno;
   /* Mode of the allocno which is the mode of the corresponding
      pseudo-register.  */
-  ENUM_BITFIELD (machine_mode) mode : 8;
+  ENUM_BITFIELD (machine_mode) mode : 16;
   /* Widest mode of the allocno which in at least one case could be
      for paradoxical subregs where wmode > mode.  */
-  ENUM_BITFIELD (machine_mode) wmode : 8;
+  ENUM_BITFIELD (machine_mode) wmode : 16;
   /* Register class which should be used for allocation for given
      allocno.  NO_REGS means that we should use memory.  */
   ENUM_BITFIELD (reg_class) aclass : 16;
diff --git a/gcc/ree.cc b/gcc/ree.cc
index 413aec7c8eb..e74b96cdfac 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -567,7 +567,7 @@ enum ext_modified_kind
 struct ATTRIBUTE_PACKED ext_modified
 {
   /* Mode from which ree has zero or sign extended the destination.  */
-  ENUM_BITFIELD(machine_mode) mode : 8;
+  ENUM_BITFIELD(machine_mode) mode : 16;

   /* Kind of modification of the insn.  */
   ENUM_BITFIELD(ext_modified_kind) kind : 2;
diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h
index c5180b9308a..3c928058490 100644
--- a/gcc/rtl-ssa/accesses.h
+++ b/gcc/rtl-ssa/accesses.h
@@ -254,7 +254,7 @@ private:
   unsigned int m_spare : 2;

   // The value returned by the accessor above.
-  machine_mode m_mode : 8;
+  machine_mode m_mode : 16;
 };

 // A contiguous array of access_info pointers.  Used to represent a
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 52f0419af29..c228c89da63 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -313,7 +313,7 @@ struct GTY((desc("0"), tag("0"),
   ENUM_BITFIELD(rtx_code) code: 16;

   /* The kind of value the expression has.  */
-  ENUM_BITFIELD(machine_mode) mode : 8;
+  ENUM_BITFIELD(machine_mode) mode : 16;

   /* 1 in a MEM if we should keep the alias set for this mem
      unchanged when we access a component.
@@ -2157,7 +2157,7 @@ subreg_shape::operator != (const subreg_shape &other) const
 inline unsigned HOST_WIDE_INT
 subreg_shape::unique_id () const
 {
-  { STATIC_ASSERT (MAX_MACHINE_MODE <= 256); }
+  { STATIC_ASSERT (MAX_MACHINE_MODE <= 32768); }
   { STATIC_ASSERT (NUM_POLY_INT_COEFFS <= 3); }
   { STATIC_ASSERT (sizeof (offset.coeffs[0]) <= 2); }
   int res = (int) inner_mode + ((int) outer_mode << 8);
diff --git a/gcc/rtlanal.h b/gcc/rtlanal.h
index 5fbed816e20..bdd84e39c76 100644
--- a/gcc/rtlanal.h
+++ b/gcc/rtlanal.h
@@ -100,7 +100,7 @@ public:

   /* The mode of the reference.  If IS_MULTIREG, this is the mode of
      REGNO - MULTIREG_OFFSET.  */
-  machine_mode mode : 8;
+  machine_mode mode : 16;

   /* If IS_MULTIREG, the offset of REGNO from the start of the register.  */
   unsigned int multireg_offset : 8;
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index fd2be57b78c..19d7c011530 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1693,7 +1693,7 @@ struct GTY(()) tree_type_common {
   unsigned restrict_flag : 1;
   unsigned contains_placeholder_bits : 2;

-  ENUM_BITFIELD(machine_mode) mode : 8;
+  ENUM_BITFIELD(machine_mode) mode : 16;

   /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
      TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
@@ -1776,7 +1776,7 @@ struct GTY(()) tree_decl_common {
   struct tree_decl_minimal common;
   tree size;

-  ENUM_BITFIELD(machine_mode) mode : 8;
+  ENUM_BITFIELD(machine_mode) mode : 16;

   unsigned nonlocal_flag : 1;
   unsigned virtual_flag : 1;
--
2.36.1

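One detail of the rtl.h hunk may deserve a second look: the STATIC_ASSERT bound in subreg_shape::unique_id is raised to 32768, but the context line below it still packs outer_mode with "<< 8". The following is a minimal sketch of why that could matter, assuming a simplified model in which only the two mode fields are packed (the offset term of the real function is left out). pack_modes and MODE_BITS are hypothetical names for illustration, not GCC identifiers.

```cpp
#include <cstdint>

// Simplified stand-in for the mode part of subreg_shape::unique_id.
// MODE_BITS is the shift applied to outer_mode (8 in the patch context line).
template <unsigned MODE_BITS>
constexpr std::uint64_t
pack_modes (unsigned inner_mode, unsigned outer_mode)
{
  return std::uint64_t (inner_mode) + (std::uint64_t (outer_mode) << MODE_BITS);
}

// With an 8-bit shift, an inner mode number of 256 lands on the same bit
// as outer mode 1, so two different (inner, outer) pairs get the same id.
static_assert (pack_modes<8> (256, 0) == pack_modes<8> (0, 1),
               "distinct pairs collide once mode numbers reach 256");

// With a 16-bit shift the same two pairs stay distinct.
static_assert (pack_modes<16> (256, 0) != pack_modes<16> (0, 1),
               "a wider shift keeps the ids unique");
```

If this model matches the real function closely enough, the shift (and the later shifts of the offset term) would need widening together with the assertion.
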
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
From: Jeff Law @ 2023-04-10 14:54 UTC
To: juzhe.zhong, gcc-patches
Cc: kito.cheng, palmer, jakub, richard.sandiford, rguenther

On 4/10/23 08:48, juzhe.zhong@rivai.ai wrote:
> From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
>
> According to the RVV ISA:
> https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-type-register-vtype
> we have LMUL: 1/8, 1/4, 1/2, 1, 2, 4, 8.
> For segment instructions we also have tuple types for NF = 2 ~ 8.
> For example, for LMUL = 1/2, SEW = 32, we have vint32mf2_t, and we will
> have tuples for NF from 2 ~ 8: vint32mf2x2_t, vint32mf2x3_t, ..., vint32mf2x8_t.
> So we will end up with more than 220 vector machine modes for RVV,
>
> PLUS the scalar machine modes that we already have in the RISC-V port.
>
> The total number of machine modes in the RISC-V port is then > 256.
>
> Current GCC therefore cannot support the tuple types of the RVV segment
> instructions.
>
> So extend the machine mode size from 8 bits to 16 bits.
>
> I have another possible solution related to this patch:
> maybe adding a target-dependent macro is better?
> The patch would be revised like this:
>
> #ifdef TARGET_MAX_MACHINE_MODE_LARGER_THAN_256
> ENUM_BITFIELD(machine_mode) last_set_mode : 16;
> #else
> ENUM_BITFIELD(machine_mode) last_set_mode : 8;
> #endif
>
> I am not sure whether that solution is better.
>
> This patch bootstraps successfully on x86. I will run make check on the
> GCC testsuite tomorrow.
>
> I expect this to land in GCC 14. Any suggestions?
>
> gcc/ChangeLog:
>
>         * combine.cc (struct reg_stat_type): Extend 8bit to 16bit.
>         * cse.cc (struct qty_table_elem): Ditto.
>         (struct table_elt): Ditto.
>         (struct set): Ditto.
>         * genopinit.cc (main): Ditto.
>         * ira-int.h (struct ira_allocno): Ditto.
>         * ree.cc (struct ATTRIBUTE_PACKED): Ditto.
>         * rtl-ssa/accesses.h: Ditto.
>         * rtl.h (struct GTY): Ditto.
>         (subreg_shape::unique_id): Ditto.
>         * rtlanal.h: Ditto.
>         * tree-core.h (struct tree_type_common): Ditto.
>         (struct tree_decl_common): Ditto.

This is likely going to be very controversial. It's going to increase
the size of two of the most heavily used data structures in GCC (rtx and
trees).

The first thing I would ask is whether or not we really need the full
matrix in practice or if we can combine some of the modes.

Why hasn't aarch64 stumbled over this problem?

Jeff

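The size concern can be made concrete with a small sketch. The two structs below are hypothetical, heavily simplified models of the leading bit-fields of rtx_def: a 16-bit code, the mode, and the 1-bit flags lumped together as one 8-bit field (an assumption for brevity). They are not the real GCC definitions, and bit-field layout is implementation-defined, so the exact numbers depend on the ABI.

```cpp
#include <cstddef>

// Model of the current layout: 16 + 8 + 8 = 32 bits of header.
struct header_mode8
{
  unsigned code : 16;
  unsigned mode : 8;
  unsigned flags : 8;   // stand-in for the individual 1-bit flags
};

// Model of the patched layout: 16 + 16 + 8 = 40 bits of header.
struct header_mode16
{
  unsigned code : 16;
  unsigned mode : 16;
  unsigned flags : 8;   // no longer fits in the first 32-bit unit
};

constexpr std::size_t size_mode8 = sizeof (header_mode8);
constexpr std::size_t size_mode16 = sizeof (header_mode16);

// The widened header can never be smaller than the original one.
static_assert (size_mode16 >= size_mode8, "widening cannot shrink the header");
```

On common 64-bit ABIs one would expect size_mode8 to be 4 and size_mode16 to be 8, which is the kind of growth being worried about here; printing the two constants on the host of interest is the reliable way to check.
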
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
From: juzhe.zhong @ 2023-04-10 15:02 UTC
To: Jeff Law, gcc-patches
Cc: kito.cheng, palmer, jakub, richard.sandiford, rguenther

Because RVV has many more types than aarch64.
The rvv-intrinsic doc shows how many RVV intrinsics there are:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md
The number of RVV intrinsics explodes.

For segment instructions, RVV has array types supporting NF from 2 ~ 8
for LMUL <= 1 (MF8, MF4, MF2, M1), whereas aarch64 only has array types
with array size 2 ~ 4, and only for LMUL = 1 (a whole vector).

I think Kito can explain this issue more clearly.

juzhe.zhong@rivai.ai

* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
From: juzhe.zhong @ 2023-04-10 15:14 UTC
To: Jeff Law, gcc-patches
Cc: kito.cheng, palmer, jakub, richard.sandiford, rguenther

ARM SVE has: svint8_t, svint8x2_t, svint8x3_t, svint8x4_t.
As far as I know, they don't have tuple types for partial vectors.
However, RVV not only has vint8m1_t, vint8m1x2_t, vint8m1x3_t,
vint8m1x4_t, vint8m1x5_t, vint8m1x6_t, vint8m1x7_t, vint8m1x8_t

but also vint8mf8_t, vint8mf8x2_t, vint8mf8x3_t,
vint8mf8x4_t, vint8mf8x5_t, vint8mf8x6_t, vint8mf8x7_t, vint8mf8x8_t

vint8mf4_t, vint8mf4x2_t, vint8mf4x3_t,
vint8mf4x4_t, vint8mf4x5_t, vint8mf4x6_t, vint8mf4x7_t, vint8mf4x8_t

...etc.

So many tuple types.

I saw that there are redundant scalar modes in the RISC-V backend, like
UQQmode, HQQmode, ...
Maybe we can remove these scalar modes to bring the total number of
machine modes below 256?

juzhe.zhong@rivai.ai

* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
From: Jakub Jelinek @ 2023-04-11 9:16 UTC
To: juzhe.zhong
Cc: Jeff Law, gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther

On Mon, Apr 10, 2023 at 11:14:46PM +0800, juzhe.zhong@rivai.ai wrote:
> ARM SVE has: svint8_t, svint8x2_t, svint8x3_t, svint8x4_t.
> As far as I know, they don't have tuple types for partial vectors.
> However, RVV not only has vint8m1_t, vint8m1x2_t, vint8m1x3_t,
> vint8m1x4_t, vint8m1x5_t, vint8m1x6_t, vint8m1x7_t, vint8m1x8_t
>
> but also vint8mf8_t, vint8mf8x2_t, vint8mf8x3_t,
> vint8mf8x4_t, vint8mf8x5_t, vint8mf8x6_t, vint8mf8x7_t, vint8mf8x8_t
>
> vint8mf4_t, vint8mf4x2_t, vint8mf4x3_t,
> vint8mf4x4_t, vint8mf4x5_t, vint8mf4x6_t, vint8mf4x7_t, vint8mf4x8_t
>
> ...etc.
>
> So many tuple types.

Do all of them need their own mode?
I mean, can't you instead use, say, some backend aggregate types which
act like homogeneous aggregates in various backends?
Modes are needed for something that can appear in instructions; for
something that can be lowered, say during expansion at the latest, you
don't need special modes.

I admit I don't know much about RVV, but if those tuples are to be
handled as: configure the CPU for a certain vector length, perform some
instruction on an effectively variable-length vector with a certain
element type, and then reconfigure the CPU again for something else,
couldn't the only vector modes there be the variable-length ones?

	Jakub

* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
From: juzhe.zhong @ 2023-04-11 9:46 UTC
To: jakub
Cc: jeffreyalaw, gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther

I am not sure whether an aggregate type without a tuple mode can work for us.
Here is an example:

We already have a vector type "vint8mf8_t"; the corresponding mode is VNx1SImode.

Now we have an intrinsic as follows:

    vint8mf8x2_t test_vlseg2e8_v_i8mf8(const int8_t *base, size_t vl) {
      return __riscv_vlseg2e8_v_i8mf8(base, vl);
    }

This intrinsic is supposed to generate a "vlseg2e8.v" instruction, and the
dest operand of the intrinsic should be 2 contiguous registers.

Another intrinsic:

    vint8mf8x3_t test_vlseg3e8_v_i8mf8(const int8_t *base, size_t vl) {
      return __riscv_vlseg3e8_v_i8mf8(base, vl);
    }

This intrinsic is supposed to generate a "vlseg3e8.v" instruction, and the
dest operand of the intrinsic should be 3 contiguous registers.

Now, my plan is to use build_array_type for both "vint8mf8x2_t" and
"vint8mf8x3_t" and make their TYPE_MODE "VNx2x1SI" and "VNx3x1SI"
respectively, like ARM SVE.
Then define the RTL pattern whose dest operand is a register_operand with
mode "VNx2x1SI" or "VNx3x1SI". Then we can do the codegen.

If we don't have a mode for "vint8mf8x2_t" and "vint8mf8x3_t", I don't know
how to define such an instruction RTL pattern. Should its dest operand mode
be BLKmode? But we want the dest operand to be a register operand.

juzhe.zhong@rivai.ai

* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
From: Jakub Jelinek @ 2023-04-11 10:11 UTC
To: juzhe.zhong
Cc: jeffreyalaw, gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther, Vladimir Makarov

On Tue, Apr 11, 2023 at 05:46:15PM +0800, juzhe.zhong@rivai.ai wrote:
> I am not sure whether an aggregate type without a tuple mode can work for us.
> Here is an example:
>
> We already have a vector type "vint8mf8_t"; the corresponding mode is VNx1SImode.
>
> Now we have an intrinsic as follows:
>
>     vint8mf8x2_t test_vlseg2e8_v_i8mf8(const int8_t *base, size_t vl) {
>       return __riscv_vlseg2e8_v_i8mf8(base, vl);
>     }
>
> This intrinsic is supposed to generate a "vlseg2e8.v" instruction, and the
> dest operand of the intrinsic should be 2 contiguous registers.
>
> Another intrinsic:
>
>     vint8mf8x3_t test_vlseg3e8_v_i8mf8(const int8_t *base, size_t vl) {
>       return __riscv_vlseg3e8_v_i8mf8(base, vl);
>     }
>
> This intrinsic is supposed to generate a "vlseg3e8.v" instruction, and the
> dest operand of the intrinsic should be 3 contiguous registers.
>
> Now, my plan is to use build_array_type for both "vint8mf8x2_t" and
> "vint8mf8x3_t" and make their TYPE_MODE "VNx2x1SI" and "VNx3x1SI"
> respectively, like ARM SVE.
> Then define the RTL pattern whose dest operand is a register_operand with
> mode "VNx2x1SI" or "VNx3x1SI". Then we can do the codegen.

Another possibility would be to just make it explicit in the RTL that it
sets 3 VNx1SI mode REGs rather than one, as long as there is some way to
tell the RA that they need to be consecutive. CCing Vlad on that.

	Jakub

* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
From: juzhe.zhong @ 2023-04-11 10:25 UTC
To: jakub
Cc: jeffreyalaw, gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther, vmakarov

Explicitly setting multiple VNx1SImode dest operands and letting the RA
assign them contiguous registers will make the RTL patterns in RVV hard
to maintain.

Assume we have a new pattern flag that tells the RA to assign contiguous
registers to multiple dest operands, and the RA can handle this.
In RVV we have NF = 2 ~ 8, so we would need to define the RTL patterns
for "vlseg" as follows:

NF = 2:
    define_insn "vlseg2"
    [(parallel_with_contiguous_reg
       (set dest operand 0)
       (set dest operand 1)...])

NF = 3:
    define_insn "vlseg3"
    [(parallel_with_contiguous_reg
       (set dest operand 0)
       (set dest operand 1)
       (set dest operand 2)...])

...

NF = 7:
    define_insn "vlseg7"
    [(parallel_with_contiguous_reg
       (set dest operand 0)
       (set dest operand 1)
       (set dest operand 2)
       (set dest operand 3)
       (set dest operand 4)...])

juzhe.zhong@rivai.ai

* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
From: Jakub Jelinek @ 2023-04-11 10:52 UTC
To: juzhe.zhong
Cc: jeffreyalaw, gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther, vmakarov

On Tue, Apr 11, 2023 at 06:25:58PM +0800, juzhe.zhong@rivai.ai wrote:
> Explicitly setting multiple VNx1SImode dest operands and letting the RA
> assign them contiguous registers will make the RTL patterns in RVV hard
> to maintain.

Not necessarily. It can be handled through define_subst.

	Jakub

* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
From: Richard Sandiford @ 2023-04-11 9:46 UTC
To: juzhe.zhong
Cc: Jeff Law, gcc-patches, kito.cheng, palmer, jakub, rguenther

<juzhe.zhong@rivai.ai> writes:
> ARM SVE has: svint8_t, svint8x2_t, svint8x3_t, svint8x4_t.
> As far as I know, they don't have tuple types for partial vectors.

Yeah, there are no separate types for partial vectors, but there
are separate modes. E.g. VNx2QI is a partial vector of QIs,
with each QI stored in a 64-bit container.

I agree with all the comments about the danger of growing the number of
modes too much. But it looks like rtx_def should be easy to rearrange.
Unless I'm missing something, there are fewer than 256 rtx codes at
present. So one simple option would be to make the code 8 bits and
the machine_mode 16 bits (and swap them, so that they stay well-aligned
wrt their size).

That of course would create a new problem if we want more than 256 codes
in future. But then there would be the option of a non-power-of-2
split (12/12 or whatever). Also, it's possible to multiplex operations
into a single code by adding an extra operand, whereas it's harder to
multiplex modes.

Thanks,
Richard

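The rearrangement suggested above (8-bit code, 16-bit mode, mode placed first) can be sketched as follows. header_swapped is a hypothetical, simplified model rather than the real rtx_def, the 1-bit flags are lumped into a single 8-bit field as an assumption, and the sample values 201 and 300 are only illustrative (201 being the generator code count quoted elsewhere in the thread). The sketch needs C++14 or later for the constexpr function body.

```cpp
// Hypothetical model of a rearranged rtx header: 16 + 8 + 8 = 32 bits,
// with the 16-bit mode first so each field sits on a natural boundary.
struct header_swapped
{
  unsigned mode : 16;   // room for up to 65536 machine modes
  unsigned code : 8;    // room for up to 256 rtx codes
  unsigned flags : 8;   // stand-in for the individual 1-bit flags
};

// Check that a given (code, mode) pair survives being stored and reloaded.
constexpr bool
round_trips (unsigned code, unsigned mode)
{
  header_swapped h{};
  h.mode = mode;
  h.code = code;
  return h.mode == mode && h.code == code;
}

// A code number of 201 and a mode number above 255 both fit.
static_assert (round_trips (201, 300), "code and mode fit in 8 and 16 bits");
```

Whether sizeof (header_swapped) stays at 4 bytes depends on the ABI's bit-field layout, so that is left for the reader to check on the host of interest rather than asserted here.
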
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
From: Jakub Jelinek @ 2023-04-11 9:59 UTC
To: juzhe.zhong, Jeff Law, gcc-patches, kito.cheng, palmer, rguenther, richard.sandiford

On Tue, Apr 11, 2023 at 10:46:25AM +0100, Richard Sandiford wrote:
> I agree with all the comments about the danger of growing the number of
> modes too much. But it looks like rtx_def should be easy to rearrange.
> Unless I'm missing something, there are fewer than 256 rtx codes at
> present. So one simple option would be to make the code 8 bits and
> the machine_mode 16 bits (and swap them, so that they stay well-aligned
> wrt their size).
>
> That of course would create a new problem if we want more than 256 codes
> in future. But then there would be the option of a non-power-of-2
> split (12/12 or whatever). Also, it's possible to multiplex operations
> into a single code by adding an extra operand, whereas it's harder to
> multiplex modes.

We have 151 rtx codes if not a generator, 201 otherwise.
That is closer to the limit except for the RISCV proposed changes.

	Jakub

* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
From: juzhe.zhong @ 2023-04-11 10:11 UTC
To: jakub, jeffreyalaw, gcc-patches, kito.cheng, palmer, rguenther, richard.sandiford

Will RTX codes grow faster than machine modes?
RTX codes grow in a target-independent way, whereas machine modes grow
per target.
In the future we may well have more and more targets, and some of them
may have a lot of machine modes.
Maybe Richard Sandiford's suggestion is a good way to fix it?
Thanks for all the comments.

juzhe.zhong@rivai.ai

* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-11 9:46 ` Richard Sandiford 2023-04-11 9:59 ` Jakub Jelinek @ 2023-04-11 10:05 ` Richard Earnshaw 2023-04-11 10:15 ` Richard Sandiford 2023-04-11 10:59 ` Richard Biener 2 siblings, 1 reply; 63+ messages in thread From: Richard Earnshaw @ 2023-04-11 10:05 UTC (permalink / raw) To: juzhe.zhong, Jeff Law, gcc-patches, kito.cheng, palmer, jakub, rguenther, richard.sandiford On 11/04/2023 10:46, Richard Sandiford via Gcc-patches wrote: > <juzhe.zhong@rivai.ai> writes: >> ARM SVE has:svint8_t, svint8x2_t, svint8x3_t, svint8x4_t >> As far as I known, they don't have tuple type for partial vector. > > Yeah, there are no separate types for partial vectors, but there > are separate modes. E.g. VNx2QI is a partial vector of QIs, > with each QI stored in a 64-bit container. > > I agree with all the comments about the danger of growing the number of > modes too much. But it looks like rtx_def should be easy to rearrange. > Unless I'm missing something, there are less than 256 rtx codes at > present. So one simple option would be to make the code 8 bits and > the machine_mode 16 bits (and swap them, so that they stay well-aligned > wrt their size). > > That of course would create new problem if we want more than 256 codes > in future. But then there would be the option of a non-power-of-2 > split (12/12 or whatever). Also, it's possible to multiplex operations > into a single code by adding an extra operand, whereas it's harder to > multiplex modes. > > Thanks, > Richard The rtx code and mode are both accessed quite frequently, making them non-native machine sizes might have impact on the performance of accessing the fields. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-11 10:05 ` Richard Earnshaw @ 2023-04-11 10:15 ` Richard Sandiford 0 siblings, 0 replies; 63+ messages in thread From: Richard Sandiford @ 2023-04-11 10:15 UTC (permalink / raw) To: Richard Earnshaw Cc: juzhe.zhong, Jeff Law, gcc-patches, kito.cheng, palmer, jakub, rguenther Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes: > On 11/04/2023 10:46, Richard Sandiford via Gcc-patches wrote: >> <juzhe.zhong@rivai.ai> writes: >>> ARM SVE has:svint8_t, svint8x2_t, svint8x3_t, svint8x4_t >>> As far as I known, they don't have tuple type for partial vector. >> >> Yeah, there are no separate types for partial vectors, but there >> are separate modes. E.g. VNx2QI is a partial vector of QIs, >> with each QI stored in a 64-bit container. >> >> I agree with all the comments about the danger of growing the number of >> modes too much. But it looks like rtx_def should be easy to rearrange. >> Unless I'm missing something, there are less than 256 rtx codes at >> present. So one simple option would be to make the code 8 bits and >> the machine_mode 16 bits (and swap them, so that they stay well-aligned >> wrt their size). >> >> That of course would create new problem if we want more than 256 codes >> in future. But then there would be the option of a non-power-of-2 >> split (12/12 or whatever). Also, it's possible to multiplex operations >> into a single code by adding an extra operand, whereas it's harder to >> multiplex modes. >> >> Thanks, >> Richard > > The rtx code and mode are both accessed quite frequently, making them > non-native machine sizes might have impact on the performance of > accessing the fields. Yeah, that's why I suggested that having a subcode operand would be an alternative to abandoning non-power-of-2 sizes. It seems unlikely that any new codes we add now will be so frequently used that an extra operand would be a problem in terms of either size or speed. 
Having a subcode operand would be very much like UNSPECs today. But as it is, we've added 9 new rtx codes in the last 10 years. So even with 203 at present, with the current rate of expansion, it would be at least the 2070s before this becomes an issue. Richard ^ permalink raw reply [flat|nested] 63+ messages in thread
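The estimate above can be checked with a little integer arithmetic. This is only a back-of-the-envelope sketch that assumes the growth rate stays constant; the function name `years_of_headroom` is made up for illustration.

```cpp
// Whole years until `limit` codes are reached, assuming codes keep being added
// at `added` per `over_years` years (integer division rounds down).
constexpr int years_of_headroom(int current, int limit, int added, int over_years) {
  return (limit - current) * over_years / added;
}

// 203 codes today, 256 available in 8 bits, 9 added in the last 10 years.
static_assert(years_of_headroom(203, 256, 9, 10) == 58, "about 58 years of headroom");
static_assert(2023 + years_of_headroom(203, 256, 9, 10) >= 2070, "no earlier than the 2070s");
```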
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-11 9:46 ` Richard Sandiford 2023-04-11 9:59 ` Jakub Jelinek 2023-04-11 10:05 ` Richard Earnshaw @ 2023-04-11 10:59 ` Richard Biener 2023-04-11 11:11 ` Richard Sandiford 2 siblings, 1 reply; 63+ messages in thread From: Richard Biener @ 2023-04-11 10:59 UTC (permalink / raw) To: Richard Sandiford Cc: juzhe.zhong, Jeff Law, gcc-patches, kito.cheng, palmer, jakub On Tue, 11 Apr 2023, Richard Sandiford wrote: > <juzhe.zhong@rivai.ai> writes: > > ARM SVE has:svint8_t, svint8x2_t, svint8x3_t, svint8x4_t > > As far as I known, they don't have tuple type for partial vector. > > Yeah, there are no separate types for partial vectors, but there > are separate modes. E.g. VNx2QI is a partial vector of QIs, > with each QI stored in a 64-bit container. > > I agree with all the comments about the danger of growing the number of > modes too much. But it looks like rtx_def should be easy to rearrange. > Unless I'm missing something, there are less than 256 rtx codes at > present. So one simple option would be to make the code 8 bits and > the machine_mode 16 bits (and swap them, so that they stay well-aligned > wrt their size). But then the bigger issue is tree_type_common, where we agreed to bump precision from 10 to 16 bits; with bumping machine_mode from 8 to 16 we are then left with only 3 spare bits from 15 now - if the comments are correct. In tree_decl_common we have 13 unused bits. IRA allocno would also increase and its hard_regno field looks suspiciously unaligned already (unless unsigned/signed re-aligns bitfields). > That of course would create new problem if we want more than 256 codes > in future. But then there would be the option of a non-power-of-2 > split (12/12 or whatever). Also, it's possible to multiplex operations > into a single code by adding an extra operand, whereas it's harder to > multiplex modes. 
> > Thanks, > Richard > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-11 10:59 ` Richard Biener @ 2023-04-11 11:11 ` Richard Sandiford 2023-04-11 11:19 ` juzhe.zhong 0 siblings, 1 reply; 63+ messages in thread From: Richard Sandiford @ 2023-04-11 11:11 UTC (permalink / raw) To: Richard Biener Cc: juzhe.zhong, Jeff Law, gcc-patches, kito.cheng, palmer, jakub Richard Biener <rguenther@suse.de> writes: > On Tue, 11 Apr 2023, Richard Sandiford wrote: > >> <juzhe.zhong@rivai.ai> writes: >> > ARM SVE has?svint8_t, svint8x2_t, svint8x3_t, svint8x4_t >> > As far as I known, they don't have tuple type for partial vector. >> >> Yeah, there are no separate types for partial vectors, but there >> are separate modes. E.g. VNx2QI is a partial vector of QIs, >> with each QI stored in a 64-bit container. >> >> I agree with all the comments about the danger of growing the number of >> modes too much. But it looks like rtx_def should be easy to rearrange. >> Unless I'm missing something, there are less than 256 rtx codes at >> present. So one simple option would be to make the code 8 bits and >> the machine_mode 16 bits (and swap them, so that they stay well-aligned >> wrt their size). > > But then the bigger issue is tree_type_common where we agreed to > bump precision from 10 to 16 bits, with bumping machine_mode from > 8 to 16 we then are left with only 3 spare bits from 15 now - if > the comments are correct. Hmm, true. I guess the two options are: (1) Increase the size of the machine_mode field by the smallest amount possible (accepting that it will be non-power-of-2). I'd be surprised if that's a significant performance issue, since modes aren't as fundamental to trees as rtxes (and since a non-power-of-2 precision doesn't seem to have hurt). (2) Increase the size to 16 anyway, with the understanding that the mode is the first thing to shrink if we need a fourth spare bit. > In tree_decl_common we have 13 unused bits. 
> > IRA allocno would also increase and it's hard_regno field looks > suspiciously unaligned already (unless unsigned/signed re-aligns > bitfields). Yeah, agree it looks unaligned. If I've read it correctly, it looks like there's a 32-bit gap on 64-bit hosts before objects[2]. So perhaps we could move the mode fields there and put hard_regno where the modes are now. Thanks, Richard ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-11 11:11 ` Richard Sandiford @ 2023-04-11 11:19 ` juzhe.zhong 2023-04-11 13:50 ` Kito Cheng 0 siblings, 1 reply; 63+ messages in thread From: juzhe.zhong @ 2023-04-11 11:19 UTC (permalink / raw) To: richard.sandiford, rguenther Cc: jeffreyalaw, gcc-patches, kito.cheng, palmer, jakub A 9-bit mode field (512 modes) should be enough for RVV. In the future I expect we will add BF16 vector, FP16 vector, ... and matrix modes, and I do not think that will take us past 512 modes. juzhe.zhong@rivai.ai ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-11 11:19 ` juzhe.zhong @ 2023-04-11 13:50 ` Kito Cheng 2023-04-12 7:53 ` Richard Biener 0 siblings, 1 reply; 63+ messages in thread From: Kito Cheng @ 2023-04-11 13:50 UTC (permalink / raw) To: juzhe.zhong Cc: richard.sandiford, rguenther, jeffreyalaw, gcc-patches, palmer, jakub Let me explain in more detail why RISC-V vector needs so many more modes than AArch64. The following uses "RVV" as an abbreviation for "RISC-V Vector" instructions. There are two key points here: - RVV has a concept called LMUL - you can think of it as register grouping: we can group up to 8 adjacent registers together and then operate on them at once, e.g. one vadd can add two 8-register groups at once. - We have segment load/store instructions that require vector tuple types. - AArch64 has similar things in both Neon and SVE, e.g. int32x2x2_t or svint32x2_t. In order to model LMUL in the backend, we need every combination of scalar type and LMUL; the possible LMULs are 1, 2, 4, 8, 1/2, 1/4, 1/8 - 7 different LMULs - and we have QI, HI, SI, DI, HF, SF and DF, so basically we have 7 (LMUL types) * 7 (scalar types) here. Okay, let's talk about tuple types. AArch64 also has tuple types, so why does it not have such a huge number of modes? It is mainly caused by LMUL; here is a concrete example to explain why this leads to a different design for machine modes, using scalable vector modes with SI mode tuples: AArch64: svint32_t (VNx4SI) svint32x2_t (VNx8SI) svint32x3_t (VNx12SI) svint32x4_t (VNx16SI) AArch64 only has up to 4-tuples, but RISC-V can have up to 8-tuples, so we already have 8 different types for each scalar mode even before we count the LMUL concept. 
RISC-V*: vint32m1_t (VNx4SI) vint32m1x2_t (VNx8SI) vint32m1x3_t (VNx12SI) vint32m1x4_t (VNx16SI) vint32m1x5_t (VNx20SI) vint32m1x6_t (VNx24SI) vint32m1x7_t (VNx28SI) vint32m1x8_t (VNx32SI) * Using VLEN=128 as the base for the type system; you can ignore this if you don't understand the meaning for now. Now let's consider LMUL and add the LMUL=2 case. RVV has a constraint that LMUL * NF (for an NF-tuple) must be less than or equal to 8, so we have only 3 extra modes for LMUL=2. RISC-V*: vint32m2_t (VNx8SI) vint32m2x2_t (VNx16SI) vint32m2x3_t (VNx24SI) vint32m2x4_t (VNx32SI) However, there is a big problem: RVV has different register constraints for different LMUL types. LMUL <= 1 can use any register, an LMUL=2 type requires the register to be aligned to a multiple of 2 (v0, v2, …), and an LMUL=4 type requires the register to be aligned to a multiple of 4 (v0, v4, …). So vint32m1x2_t (LMUL=1x2) and vint32m2_t (LMUL=2) have the same size and NUNITS, but they have different register constraints: vint32m1x2_t is LMUL 1, so it has no register constraint, but vint32m2_t is LMUL 2, so it has a register constraint and must be aligned to a multiple of 2. For this reason, those tuple types must have separate machine modes even if they have the same size and NUNITS. Why don't Neon and SVE have this issue? Because SVE and Neon don't have the concept of LMUL, so tuple types in SVE and Neon never give two vector types that have the same size but different register constraints or alignment - one size is one type. So based on LMUL and the register constraint issue for tuple types, we must have 37 types for vector tuples, plus 48 variable-length vector modes and 42 scalar modes - so we have ~140 modes now. That still sounds like less than 256, so what happened? 
RVV has one more special thing in its type system due to the ISA design: the minimal vector length of RVV is 32 bits, unlike SVE, which guarantees a minimum of 128 bits. So we play a trick in our type system: we have different modes depending on whether the minimal vector length (MIN_VLEN) is 32, 64, or greater than or equal to 128. This design is friendlier to the vectorizer and also models things precisely for better code generation. e.g. vint32m1_t is VNx1SI in MIN_VLEN>=32 vint32m1_t is VNx2SI in MIN_VLEN>=64 vint32m1_t is VNx4SI in MIN_VLEN>=128 So we will actually have 37 * 3 modes for vector tuples, and ~210 modes in total (the result is a little different from JuZhe's number since I ignore some modes that aren't used in C but are still defined as machine modes, because current GCC always defines all possible scalar modes for a vector mode). We also plan to add some traditional fixed-length vector types like V2SI in the future… and apparently 256 modes isn't enough for this plan :( ^ permalink raw reply [flat|nested] 63+ messages in thread
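Two pieces of the arithmetic in the message above can be sketched in code: how many tuple sizes survive the LMUL * NF <= 8 constraint, and how the N in a VNx<N>SI mode name follows from MIN_VLEN. This is only an illustration restricted to integer LMUL; the function names `tuple_sizes_for_lmul` and `nunits` are hypothetical and do not come from the RISC-V port, and the loop inside a constexpr function assumes C++14 or later.

```cpp
// Number of tuple sizes NF (2..8) allowed for an integer LMUL under LMUL * NF <= 8.
constexpr int tuple_sizes_for_lmul(int lmul) {
  int count = 0;
  for (int nf = 2; nf <= 8; ++nf)
    if (lmul * nf <= 8)
      ++count;
  return count;
}

// Element count N of a VNx<N> mode for an integer LMUL and an NF-tuple
// (pass nf = 1 for a plain, non-tuple vector).
constexpr int nunits(int min_vlen, int sew, int lmul, int nf) {
  return min_vlen / sew * lmul * nf;
}

static_assert(tuple_sizes_for_lmul(1) == 7, "LMUL=1: x2 .. x8");
static_assert(tuple_sizes_for_lmul(2) == 3, "LMUL=2: x2, x3, x4");
static_assert(tuple_sizes_for_lmul(4) == 1, "LMUL=4: x2 only");
static_assert(nunits(32, 32, 1, 1) == 1, "vint32m1_t is VNx1SI at MIN_VLEN>=32");
static_assert(nunits(64, 32, 1, 1) == 2, "vint32m1_t is VNx2SI at MIN_VLEN>=64");
static_assert(nunits(128, 32, 1, 1) == 4, "vint32m1_t is VNx4SI at MIN_VLEN>=128");
static_assert(nunits(128, 32, 2, 4) == 32, "vint32m2x4_t is VNx32SI at VLEN=128");
```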
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-11 13:50 ` Kito Cheng @ 2023-04-12 7:53 ` Richard Biener 2023-04-12 9:06 ` Kito Cheng 0 siblings, 1 reply; 63+ messages in thread From: Richard Biener @ 2023-04-12 7:53 UTC (permalink / raw) To: Kito Cheng Cc: juzhe.zhong, richard.sandiford, jeffreyalaw, gcc-patches, palmer, jakub On Tue, 11 Apr 2023, Kito Cheng wrote: > Let me give more explanation why RISC-V vector need so many modes than AArch64. > > The following will use "RVV" as an abbreviation for "RISC-V Vector" > instructions. > > There are two key points here: > > - RVV has a concept called LMUL - you can understand that as register > grouping, we can group up to 8 adjacent registers together and then > operate at once, e.g. one vadd can operate on adding two 8-reg groups > at once. > - We have segment load/store that require vector tuple types. - > AArch64 has similar stuffs on both Neon and SVE, e.g. int32x2x2_t or > svint32x2_t. > > In order to model LMUL in backend, we have to the combination of > scalar type and LMUL; possible LMUL is 1, 2, 4, 8, 1/2, 1/4, 1/8 - 8 > different types of LMUL, and we'll have QI, HI, SI, DI, HF, SF and DF, > so basically we'll have 7 (LMUL type) * 7 (scalar type) here. Other archs have load/store-multiple instructions, IIRC those are modeled with the appropriate set of operands. Do RVV LMUL group inputs/outputs overlap with the non-LMUL grouped registers and can they be used as aliases or is this supposed to be implemented transparently on the register file level only? But yes, implementing this as operations on multi-register ops with large modes is probably the only sensible approach. I don't see how LMUL of 1/2, 1/4 or 1/8 is useful though? Can you explain? Is that supposed to virtually increase the number of registers? How do you represent r0:1/8:0 vs r0:1/8:3 (the first and the third "virtual" register decomposed from r0) in GCC? To me the natural way would be a subreg of r0? 
Somehow RVV seems to have more knobs than necessary for tuning the actual vector register layout (aka N axes but only N-1 dimensions thus the axes are not orthogonal). > Okay, let's talk about tuple type AArch64 also having tuple type, but > why is it not having such a huge number of modes? It mainly cause by > LMUL; use a concrete example to explain why this cause different > design on machine mode, using scalable vector mode with SI mode tuple > here: > > AArch64: svint32_t (VNx4SI) svint32x2_t (VNx8SI) svint32x3_t (VNx12SI) > svint32x3_t (VNx16SI) > > AArch64 only has up to 3-tuple, but RISC-V could have up to 8-tuple, > so we already have 8 different types for each scalar mode even though > we don't count LMUL concept yet. > > RISC-V*: vint32m1_t (VNx4SI) vint32m1x2_t (VNx8SI) vint32m1x3_t > (VNx12SI) vint32m1x4_t (VNx16SI) vint32m1x5_t (VNx20SI) vint32m1x6_t > (VNx24SI) vint32m1x7_t (VNx28SI) vint32m1x8_t (VNx32SI) > > Using VLEN=128 as the base type system, you can ignore it if you don't > understand the meaning for now. > > And let's consider LMUL now, add LMUL=2 case here, RVV has a > constraint that the LMUL * NF(NF-tuple) must be less or equal to 8, so > we have only 3 extra modes for LMUL=2. > > RISC-V*: vint32m2_t (VNx8SI) vint32m2x2_t (VNx16SI) vint32m2x3_t > (VNx24SI) vint32m2x4_t (VNx32SI) > > However, there is a big problem RVV have different register constraint > for different LMUL type, LMUL <= 1 can use any register, LMUL=2 type > require register align to multiple-of-2 (v0, v2, ?), and LMUL=4 type > requires register align to multiple-of-4 (v0, v4, ?). > > So vint32m1x2_t (LMUL=1x2) and vint32m2_t (LMUL=2) have the same size > and NUNIT, but they have different register constraint, vint32m1x2_t > is LMUL 1, so we don't have register constraint, but vint32m2_t is > LMUL 2 so it has reg. constraint, it must be aligned to multiple-of-2. 
> > Based on the above reason, those tuple types must have separated > machine mode even if they have the same size and NUNIT. > > Why Neon and SVE didn't have such an issue? Because SVE and Neon > didn't have the concept of LMUL, so tuple type in SVE and Neon won't > have two vector types that have the same size but different register > constraints or alignment - one size is one type. > > So based on LMUL and register constraint issue of tuple type, we must > have 37 types for vector tuples, and plus 48 modes variable-length > vector mode, and 42 scalar mode - so we have ~140 modes now, it sounds > like still less than 256, so what happened? > > > RVV has one more thing special thing in our type system due to ISA > design, the minimal vector length of RVV is 32 bit unlike SVE > guarantee, the minimal is 128 bits, so we did some tricks one our type > system is we have a different mode for minimal vector length > (MIN_VLEN) is 32, 64 or large or equal to 128, this design is because > it would be more friendly for vectorizer, and also model things > precisely for better code gen. > > e.g. > > vint32m1_t is VNx1SI in MIN_VLEN>=32 > > vint32m1_t is VNx2SI in MIN_VLEN>=64 > > vint32m1_t is VNx4SI in MIN_VLEN>=128 > > So actually we will have 37 * 3 modes for vector tuple mode, and now > ~210 modes now (the result is little different than JuZhe's number > since I ignore some mode isn't used in C, but it defined in machine > mode due the the current GCC will always define all possible scalar > mode for a vector mode) > > We also plan to add some traditional fixed length vector types like > V2SI in future?and apparently 256 mode isn't enough for this plan :( > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-12 7:53 ` Richard Biener @ 2023-04-12 9:06 ` Kito Cheng 2023-04-12 9:21 ` Richard Biener 0 siblings, 1 reply; 63+ messages in thread From: Kito Cheng @ 2023-04-12 9:06 UTC (permalink / raw) To: Richard Biener Cc: juzhe.zhong, richard.sandiford, jeffreyalaw, gcc-patches, palmer, jakub Hi Richard: > > In order to model LMUL in backend, we have to the combination of > > scalar type and LMUL; possible LMUL is 1, 2, 4, 8, 1/2, 1/4, 1/8 - 8 > > different types of LMUL, and we'll have QI, HI, SI, DI, HF, SF and DF, > > so basically we'll have 7 (LMUL type) * 7 (scalar type) here. > > Other archs have load/store-multiple instructions, IIRC those > are modeled with the appropriate set of operands. Do RVV LMUL > group inputs/outputs overlap with the non-LMUL grouped registers > and can they be used as aliases or is this supposed to be > implemented transparently on the register file level only? LMUL and non-LMUL (or LMUL=1) modes use the same vector register file. Reg for LMUL=1/2 : { {v0, v1, ...v31} } Reg for LMUL=1 : { {v0, v1, ...v31} } Reg for LMUL=2 : { {v0, v1}, {v2, v3}, ... {v30, v31} } // reg. must align to multiple of 2. Reg for LMUL=4 : { {v0, v1, v2, v3}, {v4, v5, v6, v7}, ... {v28, v29, v30, v31} } // reg. must align to multiple of 4. .. Reg for 2-tuples of LMUL=1 : { {v0, v1}, {v1, v2}, ... {v29, v30}, {v30, v31} } Reg for 2-tuples of LMUL=2 : { {v0, v1, v2, v3}, {v2, v3, v4, v5}, ... {v26, v27, v28, v29}, {v28, v29, v30, v31} } // reg. must align to multiple of 2. ... > But yes, implementing this as operations on multi-register > ops with large modes is probably the only sensible approach. > > I don't see how LMUL of 1/2, 1/4 or 1/8 is useful though? Can you > explain? Is that supposed to virtually increase the number of > registers? How do you represent r0:1/8:0 vs r0:1/8:3 (the first > and the third "virtual" register decomposed from r0) in GCC? 
To > me the natural way would be a subreg of r0? > > Somehow RVV seems to have more knobs than necessary for tuning > the actual vector register layout (aka N axes but only N-1 dimensions > thus the axes are The concept of fractional LMUL is the same as the concept of AArch64's partial SVE vectors, so they can only access the lowest part, like SVE's partial vector. We want to spill/restore the exact size of those modes (1/2, 1/4, 1/8), so adding dedicated modes for those partial vector modes should be unavoidable IMO. And even if we use sub-vector, we still need to define those partial vector types. ^ permalink raw reply [flat|nested] 63+ messages in thread
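The register listing in the message above can be reproduced with a small predicate. This is a sketch under the assumptions stated in that listing (32 vector registers, integer LMUL, base register aligned to LMUL, NF groups laid out consecutively); `valid_base` and `count_bases` are illustrative names, not functions from GCC, and the constexpr loop assumes C++14 or later.

```cpp
constexpr int kNumVectorRegs = 32;  // v0 .. v31

// True if an NF-tuple of LMUL-register groups may start at register `base`:
// the base must be aligned to LMUL and the whole tuple must fit in the file.
constexpr bool valid_base(int base, int lmul, int nf) {
  return base >= 0 && base % lmul == 0 && base + lmul * nf <= kNumVectorRegs;
}

// Number of distinct starting registers for the given LMUL and tuple size.
constexpr int count_bases(int lmul, int nf) {
  int count = 0;
  for (int base = 0; base < kNumVectorRegs; ++base)
    if (valid_base(base, lmul, nf))
      ++count;
  return count;
}

static_assert(count_bases(1, 1) == 32, "LMUL=1: any of v0..v31");
static_assert(count_bases(2, 1) == 16, "LMUL=2: v0, v2, ..., v30");
static_assert(count_bases(4, 1) == 8, "LMUL=4: v0, v4, ..., v28");
static_assert(count_bases(1, 2) == 31, "2-tuples of LMUL=1: v0..v30");
static_assert(count_bases(2, 2) == 15, "2-tuples of LMUL=2: v0, v2, ..., v28");
static_assert(valid_base(1, 1, 2) && !valid_base(1, 2, 1), "same size, different constraint");
```

The last check is the point made earlier in the thread: a 2-tuple of LMUL=1 and a single LMUL=2 group cover the same number of registers, yet only the former may start at an odd register.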
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-12 9:06 ` Kito Cheng @ 2023-04-12 9:21 ` Richard Biener 2023-04-12 9:31 ` Kito Cheng 0 siblings, 1 reply; 63+ messages in thread From: Richard Biener @ 2023-04-12 9:21 UTC (permalink / raw) To: Kito Cheng Cc: juzhe.zhong, richard.sandiford, jeffreyalaw, gcc-patches, palmer, jakub On Wed, 12 Apr 2023, Kito Cheng wrote: > Hi Richard: > > > > In order to model LMUL in backend, we have to the combination of > > > scalar type and LMUL; possible LMUL is 1, 2, 4, 8, 1/2, 1/4, 1/8 - 8 > > > different types of LMUL, and we'll have QI, HI, SI, DI, HF, SF and DF, > > > so basically we'll have 7 (LMUL type) * 7 (scalar type) here. > > > > Other archs have load/store-multiple instructions, IIRC those > > are modeled with the appropriate set of operands. Do RVV LMUL > > group inputs/outputs overlap with the non-LMUL grouped registers > > and can they be used as aliases or is this supposed to be > > implemented transparently on the register file level only? > > LMUL and non-LMUL (or LMUL=1) modes use the same vector register file. > > Reg for LMUL=1/2 : { {v0, v1, ...v31} } > Reg for LMUL=1 : { {v0, v1, ...v31} } > Reg for LMUL=2 : { {v0, v1}, {v2, v3}, ... {v30, v31} } // reg. must > align to multiple of 2. > Reg for LMUL=4 : { {v0, v1, v2, v3}, {v4, v5, v6, v7}, ... {v28, v29, > v30, v31} } // reg. must align to multiple of 4. > .. > Reg for 2-tuples of LMUL=1 : { {v0, v1}, {v1, v2}, ... {v29, v30}, {v30, v31} } > Reg for 2-tuples of LMUL=2 : { {v0, v1, v2, v3}, {v2, v3, v4, v5}, ... > {v28, v29, v30, v31}, {v28, v29, v30, v31} } // reg. must align to > multiple of 2. > ... > > > But yes, implementing this as operations on multi-register > > ops with large modes is probably the only sensible approach. > > > > I don't see how LMUL of 1/2, 1/4 or 1/8 is useful though? Can you > > explain? Is that supposed to virtually increase the number of > > registers? 
How do you represent r0:1/8:0 vs r0:1/8:3 (the first > > and the third "virtual" register decomposed from r0) in GCC? To > > me the natural way would be a subreg of r0? > > > > Somehow RVV seems to have more knobs than necessary for tuning > > the actual vector register layout (aka N axes but only N-1 dimensions > > thus the axes are > > The concept of fractional LMUL is the same as the concept of AArch64's > partial SVE vectors, > so they can only access the lowest part, like SVE's partial vector. > > We want to spill/restore the exact size of those modes (1/2, 1/4, > 1/8), so adding dedicated modes for those partial vector modes should > be unavoidable IMO. > > And even if we use sub-vector, we still need to define those partial > vector types. Could you use integer modes for the fractional vectors? For computation you can always appropriately limit the LEN? ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-12 9:21 ` Richard Biener @ 2023-04-12 9:31 ` Kito Cheng 2023-04-12 23:22 ` 钟居哲 0 siblings, 1 reply; 63+ messages in thread From: Kito Cheng @ 2023-04-12 9:31 UTC (permalink / raw) To: Richard Biener Cc: juzhe.zhong, richard.sandiford, jeffreyalaw, gcc-patches, palmer, jakub > > The concept of fractional LMUL is the same as the concept of AArch64's > > partial SVE vectors, > > so they can only access the lowest part, like SVE's partial vector. > > > > We want to spill/restore the exact size of those modes (1/2, 1/4, > > 1/8), so adding dedicated modes for those partial vector modes should > > be unavoidable IMO. > > > > And even if we use sub-vector, we still need to define those partial > > vector types. > > Could you use integer modes for the fractional vectors? You mean using a scalar integer mode, e.g. (subreg:SI (reg:VNx4SI) 0), to represent LMUL=1/4? (Assume VNx4SI is the mode for M1.) If so, I think it might not model that correctly - it looks like we are using 32 bits, but actually we are using poly_int16(1, 1) * 32 bits. > For computation you can always appropriately limit the LEN? RVV provides zvl*b extensions of the form zvl<N>b (e.g. zvl128b or zvl256b) to guarantee that the vector length is at least N bits, but that only guarantees the minimal length, just as SVE guarantees that the minimal vector length is 128 bits. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-12 9:31 ` Kito Cheng @ 2023-04-12 23:22 ` 钟居哲 2023-04-13 13:06 ` Richard Sandiford 2023-05-05 1:43 ` Li, Pan2 0 siblings, 2 replies; 63+ messages in thread From: 钟居哲 @ 2023-04-12 23:22 UTC (permalink / raw) To: kito.cheng, rguenther Cc: richard.sandiford, Jeff Law, gcc-patches, palmer, jakub Yeah, as Kito said, it turns out that the tuple type model in ARM SVE is the optimal solution for RVV, and we like the ARM SVE style implementation. We now see that swapping rtx_code and mode in rtx_def can keep rtx_def from exceeding 64 bits overall. But it seems there is still a problem in tree_type_common and tree_decl_common, is that right? After several attempts (removing all redundant TI/TF vector modes and FP16 vector modes), there are now 252 modes in the RISC-V port. Basically, that lets me keep supporting the recent new RVV intrinsics features. However, we can't support more in the future, for example FP16 vectors, BF16 vectors, matrix modes, VLS modes, etc. From the RVV side, I think extending the machine mode by 1 more bit should be enough for RVV (512 modes overall). Is it possible to make that happen in tree_type_common and tree_decl_common, Richards? Thank you so much for all the comments. juzhe.zhong@rivai.ai ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-12 23:22 ` 钟居哲 @ 2023-04-13 13:06 ` Richard Sandiford 2023-04-13 14:02 ` Richard Biener 2023-05-05 1:43 ` Li, Pan2 1 sibling, 1 reply; 63+ messages in thread From: Richard Sandiford @ 2023-04-13 13:06 UTC (permalink / raw) To: 钟居哲 Cc: kito.cheng, rguenther, Jeff Law, gcc-patches, palmer, jakub 钟居哲 <juzhe.zhong@rivai.ai> writes: > Yeah, like kito said. > Turns out the tuple type model in ARM SVE is the optimal solution for RVV. > And we like ARM SVE style implmentation. > > And now we see swapping rtx_code and mode in rtx_def can make rtx_def overal not exceed 64 bit. > But it seems that there is still problem in tree_type_common and tree_decl_common, is that right? I thought upthread we had a way forward for tree_type_common and tree_decl_common too, but maybe I only convinced myself. :) > After several trys (remove all redundant TI/TF vector modes and FP16 vector mode), now there are 252 modes > in RISC-V port. Basically, I can keep supporting new RVV intrinsisc features recently. > However, we can't support more in the future, for example, FP16 vector, BF16 vector, matrix modes, VLS modes,...etc. I agree it doesn't make sense to try to squeeze modes out like this. It's a bit artificial, and like you say, it's likely only putting off the inevitable. Thanks, Richard > > From RVV side, I think extending 1 more bit of machine mode should be enough for RVV (overal 512 modes). > Is it possible make it happen in tree_type_common and tree_decl_common, Richards? > > Thank you so much for all comments. > > > juzhe.zhong@rivai.ai ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-04-13 13:06 ` Richard Sandiford
@ 2023-04-13 14:02 ` Richard Biener
  2023-04-15  2:58   ` Hans-Peter Nilsson
  0 siblings, 1 reply; 63+ messages in thread
From: Richard Biener @ 2023-04-13 14:02 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: 钟居哲, kito.cheng, Jeff Law, gcc-patches, palmer, jakub

On Thu, 13 Apr 2023, Richard Sandiford wrote:

> 钟居哲 <juzhe.zhong@rivai.ai> writes:
> > Yeah, like kito said.
> > Turns out the tuple type model in ARM SVE is the optimal solution for RVV.
> > And we like ARM SVE style implmentation.
> >
> > And now we see swapping rtx_code and mode in rtx_def can make rtx_def overal not exceed 64 bit.
> > But it seems that there is still problem in tree_type_common and tree_decl_common, is that right?
>
> I thought upthread we had a way forward for tree_type_common and
> tree_decl_common too, but maybe I only convinced myself. :)
>
> > After several trys (remove all redundant TI/TF vector modes and FP16 vector mode), now there are 252 modes
> > in RISC-V port. Basically, I can keep supporting new RVV intrinsisc features recently.
> > However, we can't support more in the future, for example, FP16 vector, BF16 vector, matrix modes, VLS modes,...etc.
>
> I agree it doesn't make sense to try to squeeze modes out like this.
> It's a bit artificial, and like you say, it's likely only putting
> off the inevitable.

Agreed. Let's do the proposed TYPE_PRECISION change first and then
see how bad 16bit mode will be.

Richard.

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-13 14:02 ` Richard Biener @ 2023-04-15 2:58 ` Hans-Peter Nilsson 2023-04-17 6:38 ` Richard Biener 0 siblings, 1 reply; 63+ messages in thread From: Hans-Peter Nilsson @ 2023-04-15 2:58 UTC (permalink / raw) To: Richard Biener Cc: Richard Sandiford, 钟居哲, kito.cheng, Jeff Law, gcc-patches, palmer, jakub On Thu, 13 Apr 2023, Richard Biener via Gcc-patches wrote: > On Thu, 13 Apr 2023, Richard Sandiford wrote: > > > ??? <juzhe.zhong@rivai.ai> writes: > > > Yeah, like kito said. > > > Turns out the tuple type model in ARM SVE is the optimal solution for RVV. > > > And we like ARM SVE style implmentation. > > > > > > And now we see swapping rtx_code and mode in rtx_def can make rtx_def overal not exceed 64 bit. > > > But it seems that there is still problem in tree_type_common and tree_decl_common, is that right? > > > > I thought upthread we had a way forward for tree_type_common and > > tree_decl_common too, but maybe I only convinced myself. :) > > > > > After several trys (remove all redundant TI/TF vector modes and FP16 vector mode), now there are 252 modes > > > in RISC-V port. Basically, I can keep supporting new RVV intrinsisc features recently. > > > However, we can't support more in the future, for example, FP16 vector, BF16 vector, matrix modes, VLS modes,...etc. > > > > I agree it doesn't make sense to try to squeeze modes out like this. > > It's a bit artificial, and like you say, it's likely only putting > > off the inevitable. > > Agreed. Let's do the proposed TYPE_PRECISION change first and then > see how bad 16bit mode will be. 
(I don't see the following obvious point having been raised, or an explanation of why it doesn't apply; if it has been, I hope you don't mind my repeating it.)

If, after all, the size of the code and mode bit-fields in rtx_def has to change, for example so that they become non-byte sizes in order to still fit in 64 bytes, *and* that matters for compilation time, can that change please be made target-dependent? Not set by a target macro, but deduced from the number of modes defined by the target.

After all, that number is readily available (or, if there is an ordering problem, it seems likely that it can easily be made available to the rtx_def build-time definition, as opposed to a gen-* -time definition).

brgds, H-P

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-15 2:58 ` Hans-Peter Nilsson @ 2023-04-17 6:38 ` Richard Biener 2023-04-20 5:37 ` Hans-Peter Nilsson 0 siblings, 1 reply; 63+ messages in thread From: Richard Biener @ 2023-04-17 6:38 UTC (permalink / raw) To: Hans-Peter Nilsson Cc: Richard Sandiford, 钟居哲, kito.cheng, Jeff Law, gcc-patches, palmer, jakub On Fri, 14 Apr 2023, Hans-Peter Nilsson wrote: > On Thu, 13 Apr 2023, Richard Biener via Gcc-patches wrote: > > > On Thu, 13 Apr 2023, Richard Sandiford wrote: > > > > > ??? <juzhe.zhong@rivai.ai> writes: > > > > Yeah, like kito said. > > > > Turns out the tuple type model in ARM SVE is the optimal solution for RVV. > > > > And we like ARM SVE style implmentation. > > > > > > > > And now we see swapping rtx_code and mode in rtx_def can make rtx_def overal not exceed 64 bit. > > > > But it seems that there is still problem in tree_type_common and tree_decl_common, is that right? > > > > > > I thought upthread we had a way forward for tree_type_common and > > > tree_decl_common too, but maybe I only convinced myself. :) > > > > > > > After several trys (remove all redundant TI/TF vector modes and FP16 vector mode), now there are 252 modes > > > > in RISC-V port. Basically, I can keep supporting new RVV intrinsisc features recently. > > > > However, we can't support more in the future, for example, FP16 vector, BF16 vector, matrix modes, VLS modes,...etc. > > > > > > I agree it doesn't make sense to try to squeeze modes out like this. > > > It's a bit artificial, and like you say, it's likely only putting > > > off the inevitable. > > > > Agreed. Let's do the proposed TYPE_PRECISION change first and then > > see how bad 16bit mode will be. 
>
> (I don't see the following obvious having been pointed out, or
> why it doesn't apply, but if so, I hope you don't mind repeating
> it, so:)
>
> If after all, a change to the size of the code and mode
> bit-fields in rtx_def is necessary, like to still fit 64 bytes
> such become non-byte sizes *and* that matters for compilation
> time, can that change please be made target-dependent? Not as
> in set by a target macro, but rather deduced from the number of
> modes defined by the target?
>
> After all, that number is readily available (or if there's an
> order problem seems likely to easily be made available to the
> rtx_def build-time definition (as opposed to a gen-* -time
> definition).

But it gets us in the "wrong" direction with the goal of having
pluggable targets (aka a multi-target compiler)?

Anyway, I suggest we'll see how the space requirements work out.
We should definitely try hard to put the fields on a byte
boundary so accesses become at most a load + and.

Richard.

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-04-17  6:38 ` Richard Biener
@ 2023-04-20  5:37 ` Hans-Peter Nilsson
  0 siblings, 0 replies; 63+ messages in thread
From: Hans-Peter Nilsson @ 2023-04-20 5:37 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Sandiford, 钟居哲, kito.cheng, Jeff Law, gcc-patches, palmer, jakub

On Mon, 17 Apr 2023, Richard Biener wrote:
> On Fri, 14 Apr 2023, Hans-Peter Nilsson wrote:
> > If after all, a change to the size of the code and mode
> > bit-fields in rtx_def is necessary, like to still fit 64 bytes

(Sorry: 64 bits, not counting the union u.)

> > such become non-byte sizes *and* that matters for compilation
> > time, can that change please be made target-dependent? Not as
> > in set by a target macro, but rather deduced from the number of
> > modes defined by the target?
> >
> > After all, that number is readily available (or if there's an
> > order problem seems likely to easily be made available to the
> > rtx_def build-time definition (as opposed to a gen-* -time
> > definition).
>
> But it gets us in the "wrong" direction with the goal of having
> pluggable targets (aka a multi-target compiler)?

But also away from the slippery slope of slowing down gcc
compilation (building and running) while not adding any
observable value.

(Also, a unified gcc would be years in the future, and the
proposal is easily removed.)

> Anyway, I suggest we'll see how the space requirements work out.
> We should definitely try hard to put the fields on a byte
> boundary so accesses become at most a load + and.

I'll be quiet until then. :)

brgds, H-P

^ permalink raw reply [flat|nested] 63+ messages in thread
* RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-04-12 23:22 ` 钟居哲
  2023-04-13 13:06   ` Richard Sandiford
@ 2023-05-05  1:43 ` Li, Pan2
  2023-05-05  6:25   ` Richard Biener
  1 sibling, 1 reply; 63+ messages in thread
From: Li, Pan2 @ 2023-05-05 1:43 UTC (permalink / raw)
  To: 钟居哲, kito.cheng, rguenther
  Cc: richard.sandiford, Jeff Law, gcc-patches, palmer, jakub

I profiled memory usage for this change with valgrind --tool=memcheck --trace-children=yes, targeting the SPEC 2006 INT benchmarks with rv64gcv. Note that we only count the bytes allocated, taken from valgrind log lines like this one: "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".

Allowing for some variance in valgrind, it looks like the impact on bytes allocated may be limited. However, I am still running this for x86; each iteration takes more than 30 hours...

RISC-V GCC Version:
>> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 (experimental)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Bytes allocated with O2:
-------------------------------------------------------------
Benchmark       | upstream     | with this PATCH | change
-------------------------------------------------------------
400.perlbench   |  29699642875 |     29949876269 | ~0.0%
401.bzip2       |   1641041659 |      1755563972 | +6.95%
403.gcc         |  68447500516 |     68900883291 | ~0.0%
429.mcf         |   1433156462 |      1433253373 | ~0.0%
445.gobmk       |  14239225210 |     14463438465 | ~0.0%
456.hmmer       |   9635955623 |      9808534948 | +1.8%
458.sjeng       |   2419478204 |      2545478940 | +5.4%
462.libquantum  |   1686404489 |      1800884197 | +6.8%
464.h264ref     |  10190413900 |     10351134161 | +1.6%
471.omnetpp     |  40814627684 |     41185864529 | ~0.0%
473.astar       |   3807097529 |      3928428183 | +3.2%
483.xalancbmk   | 152959418167 |    154201738843 | ~0.0%

Bytes allocated with Ofast + funroll-loops:
-------------------------------------------------------------
Benchmark       | upstream     | with this PATCH | change
-------------------------------------------------------------
400.perlbench   |  39491184733 |     39223020267 | ~0.0%
401.bzip2       |   2843871517 |      2730383463 | ~0%
403.gcc         |  84195991898 |     83730632955 | -4.0%
429.mcf         |   1481381164 |      1367309565 | -7.7%
445.gobmk       |  20123943663 |     19886116394 | -1.2%
456.hmmer       |  12302445139 |     12121745383 | -1.5%
458.sjeng       |   3884712615 |      3755481930 | -3.3%
462.libquantum  |   1966619940 |      1852274342 | -5.8%
464.h264ref     |  19219365552 |     19050288201 | ~0.0%
471.omnetpp     |  45701008325 |     45327805079 | ~0.0%
473.astar       |   4118600354 |      3995943705 | -3.0%
483.xalancbmk   | 179481305182 |    178160306301 | ~0.0%

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of 钟居哲
Sent: Thursday, April 13, 2023 7:23 AM To: kito.cheng <kito.cheng@gmail.com>; rguenther <rguenther@suse.de> Cc: richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit Yeah, like kito said. Turns out the tuple type model in ARM SVE is the optimal solution for RVV. And we like ARM SVE style implmentation. And now we see swapping rtx_code and mode in rtx_def can make rtx_def overal not exceed 64 bit. But it seems that there is still problem in tree_type_common and tree_decl_common, is that right? After several trys (remove all redundant TI/TF vector modes and FP16 vector mode), now there are 252 modes in RISC-V port. Basically, I can keep supporting new RVV intrinsisc features recently. However, we can't support more in the future, for example, FP16 vector, BF16 vector, matrix modes, VLS modes,...etc. From RVV side, I think extending 1 more bit of machine mode should be enough for RVV (overal 512 modes). Is it possible make it happen in tree_type_common and tree_decl_common, Richards? Thank you so much for all comments. juzhe.zhong@rivai.ai From: Kito Cheng Date: 2023-04-12 17:31 To: Richard Biener CC: juzhe.zhong@rivai.ai; richard.sandiford; jeffreyalaw; gcc-patches; palmer; jakub Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit > > The concept of fractional LMUL is the same as the concept of > > AArch64's partial SVE vectors, so they can only access the lowest > > part, like SVE's partial vector. > > > > We want to spill/restore the exact size of those modes (1/2, 1/4, > > 1/8), so adding dedicated modes for those partial vector modes > > should be unavoidable IMO. > > > > And even if we use sub-vector, we still need to define those partial > > vector types. > > Could you use integer modes for the fractional vectors? 
You mean using the scalar integer mode like using (subreg:SI (reg:VNx4SI) 0) to represent LMUL=1/4? (Assume VNx4SI is mode for M1) If so I think it might not be able to model that right - it seems like we are using 32-bits but actually we are using poly_int16(1, 1) * 32 bits. > For computation you can always appropriately limit the LEN? RVV provide zvl*b extension like zvl<N>b (e.g.zvl128b or zvl256b) to guarantee the vector length is at least larger than N bits, but it's just guarantee the minimal length like SVE guarantee the minimal vector length is 128 bits ^ permalink raw reply [flat|nested] 63+ messages in thread
* RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-05-05 1:43 ` Li, Pan2 @ 2023-05-05 6:25 ` Richard Biener 2023-05-06 1:10 ` Li, Pan2 0 siblings, 1 reply; 63+ messages in thread From: Richard Biener @ 2023-05-05 6:25 UTC (permalink / raw) To: Li, Pan2 Cc: 钟居哲, kito.cheng, richard.sandiford, Jeff Law, gcc-patches, palmer, jakub On Fri, 5 May 2023, Li, Pan2 wrote: > I tried the memory profiling by valgrind --tool=memcheck --trace-children=yes for this change, target the SPEC 2006 INT part with rv64gcv. Note we only count the bytes allocated from valgrind log like this "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated". > > Consider some variance of valgrind, it looks like the impact to bytes > allocated may be limited. However, I am still running this for x86, it > will take more than 30 hours for each iteration... I'm not sure I'd call +- 7% on memory use "limited" - but I fear the numbers are off. Note since various structures reside in GC memory there's also changes to GC overhead and fragmentation, so precise measurements are difficult. Richard. > RISC-V GCC Version: > >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version > riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 (experimental) > Copyright (C) 2023 Free Software Foundation, Inc. > This is free software; see the source for copying conditions. There is NO > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
> > Bytes allocated with O2: > ----------------------------------------------------------------------------------------------------- > Benchmark | upstream | with this PATCH > ----------------------------------------------------------------------------------------------------- > 400.perlbench | 29699642875 | 29949876269 ~0.0% > 401.bzip2 | 1641041659 | 1755563972 +6.95% > 403.gcc | 68447500516 | 68900883291 ~0.0% > 429.mcf | 1433156462 | 1433253373 ~0.0% > 445.gobmk | 14239225210 | 14463438465 ~0.0% > 456.hmmer | 9635955623 | 9808534948 +1.8% > 458.sjeng | 2419478204 | 2545478940 +5.4% > 462.libquantum | 1686404489 | 1800884197 +6.8% > 464.h264ref 8j1 | 10190413900 | 10351134161 +1.6% > 471.omnetpp | 40814627684 | 41185864529 ~0.0% > 473.astar | 3807097529 | 3928428183 +3.2% > 483.xalancbmk | 152959418167 | 154201738843 ~0.0% > > Bytes allocated with Ofast + funroll-loops: > ------------------------------------------------------------------------------------------ > Benchmark | upstream | with this PATCH > ------------------------------------------------------------------------------------------ > 400.perlbench | 39491184733 | 39223020267 ~0.0% > 401.bzip2 | 2843871517 | 2730383463 ~0% > 403.gcc | 84195991898 | 83730632955 -4.0% > 429.mcf | 1481381164 | 1367309565 -7.7% > 445.gobmk | 20123943663 | 19886116394 -1.2% > 456.hmmer | 12302445139 | 12121745383 -1.5% > 458.sjeng | 3884712615 | 3755481930 -3.3% > 462.libquantum | 1966619940 | 1852274342 -5.8% > 464.h264ref | 19219365552 | 19050288201 ~0.0% > 471.omnetpp | 45701008325 | 45327805079 ~0.0% > 473.astar | 4118600354 | 3995943705 -3.0% > 483.xalancbmk | 179481305182 | 178160306301 ~0.0% > > Pan > > > -----Original Message----- > From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of ??? 
> Sent: Thursday, April 13, 2023 7:23 AM > To: kito.cheng <kito.cheng@gmail.com>; rguenther <rguenther@suse.de> > Cc: richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com> > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit > > Yeah, like kito said. > Turns out the tuple type model in ARM SVE is the optimal solution for RVV. > And we like ARM SVE style implmentation. > > And now we see swapping rtx_code and mode in rtx_def can make rtx_def overal not exceed 64 bit. > But it seems that there is still problem in tree_type_common and tree_decl_common, is that right? > > After several trys (remove all redundant TI/TF vector modes and FP16 vector mode), now there are 252 modes in RISC-V port. Basically, I can keep supporting new RVV intrinsisc features recently. > However, we can't support more in the future, for example, FP16 vector, BF16 vector, matrix modes, VLS modes,...etc. > > From RVV side, I think extending 1 more bit of machine mode should be enough for RVV (overal 512 modes). > Is it possible make it happen in tree_type_common and tree_decl_common, Richards? > > Thank you so much for all comments. > > > juzhe.zhong@rivai.ai > > From: Kito Cheng > Date: 2023-04-12 17:31 > To: Richard Biener > CC: juzhe.zhong@rivai.ai; richard.sandiford; jeffreyalaw; gcc-patches; palmer; jakub > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit > > > The concept of fractional LMUL is the same as the concept of > > > AArch64's partial SVE vectors, so they can only access the lowest > > > part, like SVE's partial vector. > > > > > > We want to spill/restore the exact size of those modes (1/2, 1/4, > > > 1/8), so adding dedicated modes for those partial vector modes > > > should be unavoidable IMO. 
> > > > > > And even if we use sub-vector, we still need to define those partial > > > vector types. > > > > Could you use integer modes for the fractional vectors? > > You mean using the scalar integer mode like using (subreg:SI > (reg:VNx4SI) 0) to represent > LMUL=1/4? > (Assume VNx4SI is mode for M1) > > If so I think it might not be able to model that right - it seems like we are using 32-bits but actually we are using poly_int16(1, 1) * 32 bits. > > > For computation you can always appropriately limit the LEN? > > RVV provide zvl*b extension like zvl<N>b (e.g.zvl128b or zvl256b) to guarantee the vector length is at least larger than N bits, but it's just guarantee the minimal length like SVE guarantee the minimal vector length is 128 bits > > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 63+ messages in thread
* RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-05-05  6:25 ` Richard Biener
@ 2023-05-06  1:10 ` Li, Pan2
  2023-05-06  1:53   ` Kito Cheng
  0 siblings, 1 reply; 63+ messages in thread
From: Li, Pan2 @ 2023-05-06 1:10 UTC (permalink / raw)
  To: Richard Biener
  Cc: 钟居哲, kito.cheng, richard.sandiford, Jeff Law, gcc-patches, palmer, jakub

Yes, I totally agree that the numbers can only be accurate up to a point. Here are the corresponding bytes-allocated figures for the x86 target.

Bytes allocated with O2:
-------------------------------------------------------------
Benchmark       | upstream     | with this PATCH | change
-------------------------------------------------------------
400.perlbench   |  25286185160 |     25176544846 | ~0.0%
401.bzip2       |   1429883731 |      1391040027 | -2.7%
403.gcc         |  55023568981 |     54798890746 | ~0.0%
429.mcf         |   1360975660 |      1321537710 | -2.9%
445.gobmk       |  12791636502 |     12666523431 | -1.0%
456.hmmer       |   9354433652 |      9279189174 | ~0.0%
458.sjeng       |   1991260562 |      1944031904 | -2.4%
462.libquantum  |   1725112078 |      1684213981 | -2.4%
464.h264ref     |   8597673515 |      8528855778 | ~0.0%
471.omnetpp     |  37613034778 |     37432278047 | ~0.0%
473.astar       |   3817295518 |      3772460508 | -1.2%
483.xalancbmk   | 149418776991 |    148545162207 | ~0.0%

Bytes allocated with Ofast + funroll-loops:
-------------------------------------------------------------
Benchmark       | upstream     | with this PATCH | change
-------------------------------------------------------------
400.perlbench   |  30438407499 |     30574152897 | ~0.0%
401.bzip2       |   2277114519 |      2319432664 | +1.9%
403.gcc         |  64499664264 |     64781232731 | ~0.0%
429.mcf         |   1361486758 |      1399942116 | +2.8%
445.gobmk       |  15258056111 |     15396801542 | +1.0%
456.hmmer       |  10896615649 |     10936223486 | ~0.0%
458.sjeng       |   2592620709 |      2641687496 | +1.9%
462.libquantum  |   1814487525 |      1854518500 | +2.2%
464.h264ref     |  13528736878 |     13614517066 | ~0.0%
471.omnetpp     |  38721066702 |     38910524667 | ~0.0%
473.astar       |   3924015756 |      3968057027 | +1.1%
483.xalancbmk   | 165897692838 |    166843885880 | ~0.0%

Pan

-----Original Message-----
From: Richard Biener <rguenther@suse.de>
Sent: Friday, May 5, 2023 2:25 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: 钟居哲 <juzhe.zhong@rivai.ai>; kito.cheng <kito.cheng@gmail.com>; richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

On Fri, 5 May 2023, Li, Pan2 wrote:

> I tried the memory profiling by valgrind --tool=memcheck --trace-children=yes for this change, target the SPEC 2006 INT part with rv64gcv. Note we only count the bytes allocated from valgrind log like this "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
>
> Consider some variance of valgrind, it looks like the impact to bytes
> allocated may be limited. However, I am still running this for x86, it
> will take more than 30 hours for each iteration...

I'm not sure I'd call +- 7% on memory use "limited" - but I fear the numbers are off. Note since various structures reside in GC memory there's also changes to GC overhead and fragmentation, so precise measurements are difficult.

Richard.

> RISC-V GCC Version:
> >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
> riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503
> (experimental) Copyright (C) 2023 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There
> is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> > Bytes allocated with O2: > ----------------------------------------------------------------------------------------------------- > Benchmark | upstream | with this PATCH > ----------------------------------------------------------------------------------------------------- > 400.perlbench | 29699642875 | 29949876269 ~0.0% > 401.bzip2 | 1641041659 | 1755563972 +6.95% > 403.gcc | 68447500516 | 68900883291 ~0.0% > 429.mcf | 1433156462 | 1433253373 ~0.0% > 445.gobmk | 14239225210 | 14463438465 ~0.0% > 456.hmmer | 9635955623 | 9808534948 +1.8% > 458.sjeng | 2419478204 | 2545478940 +5.4% > 462.libquantum | 1686404489 | 1800884197 +6.8% > 464.h264ref 8j1 | 10190413900 | 10351134161 +1.6% > 471.omnetpp | 40814627684 | 41185864529 ~0.0% > 473.astar | 3807097529 | 3928428183 +3.2% > 483.xalancbmk | 152959418167 | 154201738843 ~0.0% > > Bytes allocated with Ofast + funroll-loops: > ------------------------------------------------------------------------------------------ > Benchmark | upstream | with this PATCH > ------------------------------------------------------------------------------------------ > 400.perlbench | 39491184733 | 39223020267 ~0.0% > 401.bzip2 | 2843871517 | 2730383463 ~0% > 403.gcc | 84195991898 | 83730632955 -4.0% > 429.mcf | 1481381164 | 1367309565 -7.7% > 445.gobmk | 20123943663 | 19886116394 -1.2% > 456.hmmer | 12302445139 | 12121745383 -1.5% > 458.sjeng | 3884712615 | 3755481930 -3.3% > 462.libquantum | 1966619940 | 1852274342 -5.8% > 464.h264ref | 19219365552 | 19050288201 ~0.0% > 471.omnetpp | 45701008325 | 45327805079 ~0.0% > 473.astar | 4118600354 | 3995943705 -3.0% > 483.xalancbmk | 179481305182 | 178160306301 ~0.0% > > Pan > > > -----Original Message----- > From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of ??? 
> Sent: Thursday, April 13, 2023 7:23 AM > To: kito.cheng <kito.cheng@gmail.com>; rguenther <rguenther@suse.de> > Cc: richard.sandiford <richard.sandiford@arm.com>; Jeff Law > <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer > <palmer@dabbelt.com>; jakub <jakub@redhat.com> > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from > 8-bit to 16-bit > > Yeah, like kito said. > Turns out the tuple type model in ARM SVE is the optimal solution for RVV. > And we like ARM SVE style implmentation. > > And now we see swapping rtx_code and mode in rtx_def can make rtx_def overal not exceed 64 bit. > But it seems that there is still problem in tree_type_common and tree_decl_common, is that right? > > After several trys (remove all redundant TI/TF vector modes and FP16 vector mode), now there are 252 modes in RISC-V port. Basically, I can keep supporting new RVV intrinsisc features recently. > However, we can't support more in the future, for example, FP16 vector, BF16 vector, matrix modes, VLS modes,...etc. > > From RVV side, I think extending 1 more bit of machine mode should be enough for RVV (overal 512 modes). > Is it possible make it happen in tree_type_common and tree_decl_common, Richards? > > Thank you so much for all comments. > > > juzhe.zhong@rivai.ai > > From: Kito Cheng > Date: 2023-04-12 17:31 > To: Richard Biener > CC: juzhe.zhong@rivai.ai; richard.sandiford; jeffreyalaw; gcc-patches; > palmer; jakub > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from > 8-bit to 16-bit > > > The concept of fractional LMUL is the same as the concept of > > > AArch64's partial SVE vectors, so they can only access the lowest > > > part, like SVE's partial vector. > > > > > > We want to spill/restore the exact size of those modes (1/2, 1/4, > > > 1/8), so adding dedicated modes for those partial vector modes > > > should be unavoidable IMO. 
> > > > > > And even if we use sub-vector, we still need to define those > > > partial vector types. > > > > Could you use integer modes for the fractional vectors? > > You mean using the scalar integer mode like using (subreg:SI > (reg:VNx4SI) 0) to represent > LMUL=1/4? > (Assume VNx4SI is mode for M1) > > If so I think it might not be able to model that right - it seems like we are using 32-bits but actually we are using poly_int16(1, 1) * 32 bits. > > > For computation you can always appropriately limit the LEN? > > RVV provide zvl*b extension like zvl<N>b (e.g.zvl128b or zvl256b) to > guarantee the vector length is at least larger than N bits, but it's > just guarantee the minimal length like SVE guarantee the minimal > vector length is 128 bits > > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-05-06  1:10 ` Li, Pan2
@ 2023-05-06  1:53 ` Kito Cheng
  2023-05-06  1:59   ` juzhe.zhong
  0 siblings, 1 reply; 63+ messages in thread
From: Kito Cheng @ 2023-05-06 1:53 UTC (permalink / raw)
  To: Li, Pan2
  Cc: Richard Biener, 钟居哲, richard.sandiford, Jeff Law, gcc-patches, palmer, jakub

Hi Pan:

Could you try applying the following diff and measuring again? It keeps the size of tree_type_common unchanged.

sizeof tree_type_common = 128 (mode = 8 bit)
sizeof tree_type_common = 136 (mode = 16 bit)
sizeof tree_type_common = 128 (mode = 16 bit w/ this diff)

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index af795aa81f98..b8ccfa407ed9 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common {
   tree attributes;
   unsigned int uid;
 
+  ENUM_BITFIELD(machine_mode) mode : 16;
+
   unsigned int precision : 10;
   unsigned no_force_blk_flag : 1;
   unsigned needs_constructing_flag : 1;
@@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common {
   unsigned restrict_flag : 1;
   unsigned contains_placeholder_bits : 2;
 
-  ENUM_BITFIELD(machine_mode) mode : 16;
 
   /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
      TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
@@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common {
   unsigned empty_flag : 1;
   unsigned indivisible_p : 1;
   unsigned no_named_args_stdarg_p : 1;
-  unsigned spare : 15;
+  unsigned spare : 7;
 
   alias_set_type alias_set;
   tree pointer_to;

On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> Yes, totally agree the number cannot be very accurate up to a point. Update the correlated memory bytes allocated for the X86 target.
> > Bytes allocated with O2: > ----------------------------------------------------------------------------------------------------- > Benchmark | upstream | with this PATCH > ----------------------------------------------------------------------------------------------------- > 400.perlbench | 25286185160 | 25176544846 ~0.0% > 401.bzip2 | 1429883731 | 1391040027 -2.7% > 403.gcc | 55023568981 | 54798890746 ~0.0% > 429.mcf | 1360975660 | 1321537710 -2.9% > 445.gobmk | 12791636502 | 12666523431 -1.0% > 456.hmmer | 9354433652 | 9279189174 ~0.0% > 458.sjeng | 1991260562 | 1944031904 -2.4% > 462.libquantum | 1725112078 | 1684213981 -2.4% > 464.h264ref | 8597673515 | 8528855778 ~0.0% > 471.omnetpp | 37613034778 | 37432278047 ~0.0% > 473.astar | 3817295518 | 3772460508 -1.2% > 483.xalancbmk | 149418776991 | 148545162207 ~0.0% > > Bytes allocated with Ofast + funroll-loops: > ------------------------------------------------------------------------------------------ > Benchmark | upstream | with this PATCH > ------------------------------------------------------------------------------------------ > 400.perlbench | 30438407499 | 30574152897 ~0.0% > 401.bzip2 | 2277114519 | 2319432664 +1.9% > 403.gcc | 64499664264 | 64781232731 ~0.0% > 429.mcf | 1361486758 | 1399942116 +2.8% > 445.gobmk | 15258056111 | 15396801542 +1.0% > 456.hmmer | 10896615649 | 10936223486 ~0.0% > 458.sjeng | 2592620709 | 2641687496 +1.9% > 462.libquantum | 1814487525 | 1854518500 +2.2% > 464.h264ref | 13528736878 | 13614517066 ~0.0% > 471.omnetpp | 38721066702 | 38910524667 ~0.0% > 473.astar | 3924015756 | 3968057027 +1.1% > 483.xalancbmk | 165897692838 | 166843885880 ~0.0% > > Pan > > > -----Original Message----- > From: Richard Biener <rguenther@suse.de> > Sent: Friday, May 5, 2023 2:25 PM > To: Li, Pan2 <pan2.li@intel.com> > Cc: 钟居哲 <juzhe.zhong@rivai.ai>; kito.cheng <kito.cheng@gmail.com>; richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches 
<gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com> > Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit > > On Fri, 5 May 2023, Li, Pan2 wrote: > > > I tried the memory profiling by valgrind --tool=memcheck --trace-children=yes for this change, target the SPEC 2006 INT part with rv64gcv. Note we only count the bytes allocated from valgrind log like this "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated". > > > > Consider some variance of valgrind, it looks like the impact to bytes > > allocated may be limited. However, I am still running this for x86, it > > will take more than 30 hours for each iteration... > > I'm not sure I'd call +- 7% on memory use "limited" - but I fear the numbers are off. Note since various structures reside in GC memory there's also changes to GC overhead and fragmentation, so precise measurements are difficult. > > Richard. > > > RISC-V GCC Version: > > >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version > > riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 > > (experimental) Copyright (C) 2023 Free Software Foundation, Inc. > > This is free software; see the source for copying conditions. There > > is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
> > > > Bytes allocated with O2: > > ----------------------------------------------------------------------------------------------------- > > Benchmark | upstream | with this PATCH > > ----------------------------------------------------------------------------------------------------- > > 400.perlbench | 29699642875 | 29949876269 ~0.0% > > 401.bzip2 | 1641041659 | 1755563972 +6.95% > > 403.gcc | 68447500516 | 68900883291 ~0.0% > > 429.mcf | 1433156462 | 1433253373 ~0.0% > > 445.gobmk | 14239225210 | 14463438465 ~0.0% > > 456.hmmer | 9635955623 | 9808534948 +1.8% > > 458.sjeng | 2419478204 | 2545478940 +5.4% > > 462.libquantum | 1686404489 | 1800884197 +6.8% > > 464.h264ref 8j1 | 10190413900 | 10351134161 +1.6% > > 471.omnetpp | 40814627684 | 41185864529 ~0.0% > > 473.astar | 3807097529 | 3928428183 +3.2% > > 483.xalancbmk | 152959418167 | 154201738843 ~0.0% > > > > Bytes allocated with Ofast + funroll-loops: > > ------------------------------------------------------------------------------------------ > > Benchmark | upstream | with this PATCH > > ------------------------------------------------------------------------------------------ > > 400.perlbench | 39491184733 | 39223020267 ~0.0% > > 401.bzip2 | 2843871517 | 2730383463 ~0% > > 403.gcc | 84195991898 | 83730632955 -4.0% > > 429.mcf | 1481381164 | 1367309565 -7.7% > > 445.gobmk | 20123943663 | 19886116394 -1.2% > > 456.hmmer | 12302445139 | 12121745383 -1.5% > > 458.sjeng | 3884712615 | 3755481930 -3.3% > > 462.libquantum | 1966619940 | 1852274342 -5.8% > > 464.h264ref | 19219365552 | 19050288201 ~0.0% > > 471.omnetpp | 45701008325 | 45327805079 ~0.0% > > 473.astar | 4118600354 | 3995943705 -3.0% > > 483.xalancbmk | 179481305182 | 178160306301 ~0.0% > > > > Pan > > > > > > -----Original Message----- > > From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of ??? 
> > Sent: Thursday, April 13, 2023 7:23 AM > > To: kito.cheng <kito.cheng@gmail.com>; rguenther <rguenther@suse.de> > > Cc: richard.sandiford <richard.sandiford@arm.com>; Jeff Law > > <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer > > <palmer@dabbelt.com>; jakub <jakub@redhat.com> > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from > > 8-bit to 16-bit > > > > Yeah, like kito said. > > Turns out the tuple type model in ARM SVE is the optimal solution for RVV. > > And we like ARM SVE style implmentation. > > > > And now we see swapping rtx_code and mode in rtx_def can make rtx_def overal not exceed 64 bit. > > But it seems that there is still problem in tree_type_common and tree_decl_common, is that right? > > > > After several trys (remove all redundant TI/TF vector modes and FP16 vector mode), now there are 252 modes in RISC-V port. Basically, I can keep supporting new RVV intrinsisc features recently. > > However, we can't support more in the future, for example, FP16 vector, BF16 vector, matrix modes, VLS modes,...etc. > > > > From RVV side, I think extending 1 more bit of machine mode should be enough for RVV (overal 512 modes). > > Is it possible make it happen in tree_type_common and tree_decl_common, Richards? > > > > Thank you so much for all comments. > > > > > > juzhe.zhong@rivai.ai > > > > From: Kito Cheng > > Date: 2023-04-12 17:31 > > To: Richard Biener > > CC: juzhe.zhong@rivai.ai; richard.sandiford; jeffreyalaw; gcc-patches; > > palmer; jakub > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from > > 8-bit to 16-bit > > > > The concept of fractional LMUL is the same as the concept of > > > > AArch64's partial SVE vectors, so they can only access the lowest > > > > part, like SVE's partial vector. > > > > > > > > We want to spill/restore the exact size of those modes (1/2, 1/4, > > > > 1/8), so adding dedicated modes for those partial vector modes > > > > should be unavoidable IMO. 
> > > > > > > > And even if we use sub-vector, we still need to define those > > > > partial vector types. > > > > > > Could you use integer modes for the fractional vectors? > > > > You mean using the scalar integer mode like using (subreg:SI > > (reg:VNx4SI) 0) to represent > > LMUL=1/4? > > (Assume VNx4SI is mode for M1) > > > > If so I think it might not be able to model that right - it seems like we are using 32-bits but actually we are using poly_int16(1, 1) * 32 bits. > > > > > For computation you can always appropriately limit the LEN? > > > > RVV provide zvl*b extension like zvl<N>b (e.g.zvl128b or zvl256b) to > > guarantee the vector length is at least larger than N bits, but it's > > just guarantee the minimal length like SVE guarantee the minimal > > vector length is 128 bits > > > > > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 63+ messages in thread
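The layout argument behind Kito's diff can be tried outside of GCC with a small sketch. The three structs below are hypothetical, simplified stand-ins for the bit-field region of tree_type_common only; the grouped flag widths (flags_a, flags_b, flags_c) and the align widths are assumptions chosen so that the bit counts line up with the `spare : 15` to `spare : 7` change in the diff, and bit-field layout is implementation-defined, so the sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of the bit-fields with an 8-bit mode.
   10 + 6 + 8 + 8 = 32 bits, then 6 + 6 + 4 + 15 = 31 bits.  */
struct mode8 {
  unsigned precision : 10;
  unsigned flags_a : 6;            /* assumed: the six 1/2-bit flags after precision */
  unsigned mode : 8;
  unsigned flags_b : 8;            /* assumed: string_flag and lang flags */
  unsigned align : 6;              /* assumed width */
  unsigned warn_if_not_align : 6;  /* assumed width */
  unsigned flags_c : 4;            /* assumed: empty_flag, indivisible_p, ... */
  unsigned spare : 15;
};

/* Mode widened to 16 bits but left where it was.
   10 + 6 + 16 = 32 bits, then 8 + 6 + 6 + 4 = 24 bits; a 15-bit spare
   no longer fits in the remaining 8 bits of that unit.  */
struct mode16_in_place {
  unsigned precision : 10;
  unsigned flags_a : 6;
  unsigned mode : 16;
  unsigned flags_b : 8;
  unsigned align : 6;
  unsigned warn_if_not_align : 6;
  unsigned flags_c : 4;
  unsigned spare : 15;
};

/* Mode moved to the front and spare shrunk to 7 bits, as in the diff.
   16 + 10 + 6 = 32 bits, then 8 + 6 + 6 + 4 + 7 = 31 bits.  */
struct mode16_moved {
  unsigned mode : 16;
  unsigned precision : 10;
  unsigned flags_a : 6;
  unsigned flags_b : 8;
  unsigned align : 6;
  unsigned warn_if_not_align : 6;
  unsigned flags_c : 4;
  unsigned spare : 7;
};

/* Print the size of each variant so the three layouts can be compared.  */
void report_layout_sizes(void) {
  printf("mode8           : %zu bytes\n", sizeof(struct mode8));
  printf("mode16_in_place : %zu bytes\n", sizeof(struct mode16_in_place));
  printf("mode16_moved    : %zu bytes\n", sizeof(struct mode16_moved));
}
```

Calling report_layout_sizes() from a main function shows the effect: with GCC on a target where unsigned int is 32 bits, the in-place variant should need one more 32-bit storage unit than the other two. In the real struct that extra unit sits before a pointer member, which would be consistent with the 128 to 136 byte growth reported above on a 64-bit host, though this sketch does not model that part.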
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-05-06  1:53 ` Kito Cheng
@ 2023-05-06  1:59 ` juzhe.zhong
  2023-05-06  2:12 ` Li, Pan2
  0 siblings, 1 reply; 63+ messages in thread
From: juzhe.zhong @ 2023-05-06  1:59 UTC (permalink / raw)
  To: kito.cheng, pan2.li
  Cc: rguenther, richard.sandiford, jeffreyalaw, gcc-patches, palmer, jakub

Yeah, you should also swap mode and code in rtx_def, according to Richard's
suggestion, since that will not change the size of the rtx_def data structure.

I think the only remaining problem is the mode field in the tree data structures.

juzhe.zhong@rivai.ai

^ permalink raw reply	[flat|nested] 63+ messages in thread
* RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-05-06  1:59 ` juzhe.zhong
@ 2023-05-06  2:12 ` Li, Pan2
  2023-05-06  2:18 ` Kito Cheng
  0 siblings, 1 reply; 63+ messages in thread
From: Li, Pan2 @ 2023-05-06  2:12 UTC (permalink / raw)
  To: juzhe.zhong, kito.cheng
  Cc: rguenther, richard.sandiford, jeffreyalaw, gcc-patches, palmer, jakub

Sure thing, I will pick them all up together and trigger the test again (I will send out the overall diff before starting, to make sure my understanding is correct). BTW, which target do we prefer first: X86 or RISC-V?

Pan

^ permalink raw reply	[flat|nested] 63+ messages in thread
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-05-06  2:12 ` Li, Pan2
@ 2023-05-06  2:18 ` Kito Cheng
  2023-05-06  2:20 ` Li, Pan2
  0 siblings, 1 reply; 63+ messages in thread
From: Kito Cheng @ 2023-05-06  2:18 UTC (permalink / raw)
  To: Li, Pan2
  Cc: juzhe.zhong, rguenther, richard.sandiford, jeffreyalaw, gcc-patches, palmer, jakub

I think x86 first?

The major thing we want to make sure is that this change won't affect
those targets which do not really require 16 bit machine_mode too
much.

On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Sure thing, I will pick them all together and trigger(will send out the overall diff before start to make sure my understand is correct) the test again. BTW which target do we prefer first? X86 or RISC-V.
>
> Pan
>
> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Sent: Saturday, May 6, 2023 10:00 AM
> To: kito.cheng <kito.cheng@gmail.com>; Li, Pan2 <pan2.li@intel.com>
> Cc: rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
>
> Yeah, you should also swap mode and code in rtx_def according to Richard suggestion
> since it will not change the rtx_def data structure.
>
> I think the only problem is the mode in tree data structure.
> ________________________________ > juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai> > > From: Kito Cheng<mailto:kito.cheng@gmail.com> > Date: 2023-05-06 09:53 > To: Li, Pan2<mailto:pan2.li@intel.com> > CC: Richard Biener<mailto:rguenther@suse.de>; 钟居哲<mailto:juzhe.zhong@rivai.ai>; richard.sandiford<mailto:richard.sandiford@arm.com>; Jeff Law<mailto:jeffreyalaw@gmail.com>; gcc-patches<mailto:gcc-patches@gcc.gnu.org>; palmer<mailto:palmer@dabbelt.com>; jakub<mailto:jakub@redhat.com> > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit > Hi Pan: > > Could you try to apply the following diff and measure again? This > makes tree_type_common size unchanged. > > > sizeof tree_type_common= 128 (mode = 8 bit) > sizeof tree_type_common= 136 (mode = 16 bit) > sizeof tree_type_common= 128 (mode = 8 bit w/ this diff) > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > index af795aa81f98..b8ccfa407ed9 100644 > --- a/gcc/tree-core.h > +++ b/gcc/tree-core.h > @@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common { > tree attributes; > unsigned int uid; > > + ENUM_BITFIELD(machine_mode) mode : 16; > + > unsigned int precision : 10; > unsigned no_force_blk_flag : 1; > unsigned needs_constructing_flag : 1; > @@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common { > unsigned restrict_flag : 1; > unsigned contains_placeholder_bits : 2; > > - ENUM_BITFIELD(machine_mode) mode : 16; > > /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE. > TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE. */ > @@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common { > unsigned empty_flag : 1; > unsigned indivisible_p : 1; > unsigned no_named_args_stdarg_p : 1; > - unsigned spare : 15; > + unsigned spare : 7; > > alias_set_type alias_set; > tree pointer_to; > > On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches > <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>> wrote: > > > > Yes, totally agree the number cannot be very accurate up to a point. 
> > Here are the corresponding bytes-allocated numbers for the x86 target.
> >
> > Bytes allocated with O2:
> > ------------------------------------------------------------
> > Benchmark      | upstream     | with this PATCH | change
> > ------------------------------------------------------------
> > 400.perlbench  | 25286185160  | 25176544846     | ~0.0%
> > 401.bzip2      | 1429883731   | 1391040027      | -2.7%
> > 403.gcc        | 55023568981  | 54798890746     | ~0.0%
> > 429.mcf        | 1360975660   | 1321537710      | -2.9%
> > 445.gobmk      | 12791636502  | 12666523431     | -1.0%
> > 456.hmmer      | 9354433652   | 9279189174      | ~0.0%
> > 458.sjeng      | 1991260562   | 1944031904      | -2.4%
> > 462.libquantum | 1725112078   | 1684213981      | -2.4%
> > 464.h264ref    | 8597673515   | 8528855778      | ~0.0%
> > 471.omnetpp    | 37613034778  | 37432278047     | ~0.0%
> > 473.astar      | 3817295518   | 3772460508      | -1.2%
> > 483.xalancbmk  | 149418776991 | 148545162207    | ~0.0%
> >
> > Bytes allocated with Ofast + funroll-loops:
> > ------------------------------------------------------------
> > Benchmark      | upstream     | with this PATCH | change
> > ------------------------------------------------------------
> > 400.perlbench  | 30438407499  | 30574152897     | ~0.0%
> > 401.bzip2      | 2277114519   | 2319432664      | +1.9%
> > 403.gcc        | 64499664264  | 64781232731     | ~0.0%
> > 429.mcf        | 1361486758   | 1399942116      | +2.8%
> > 445.gobmk      | 15258056111  | 15396801542     | +1.0%
> > 456.hmmer      | 10896615649  | 10936223486     | ~0.0%
> > 458.sjeng      | 2592620709   | 2641687496      | +1.9%
> > 462.libquantum | 1814487525   | 1854518500      | +2.2%
> > 464.h264ref    | 13528736878  | 13614517066     | ~0.0%
> > 471.omnetpp    | 38721066702  | 38910524667     | ~0.0%
> > 473.astar      | 3924015756   | 3968057027      | +1.1%
> > 483.xalancbmk  | 165897692838 | 166843885880    | ~0.0%
> >
> > Pan
> >
> >
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Friday, May 5, 2023 2:25 PM
> > To: Li, Pan2 <pan2.li@intel.com>
> > Cc: 钟居哲 <juzhe.zhong@rivai.ai>; kito.cheng <kito.cheng@gmail.com>; richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> > Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> >
> > On Fri, 5 May 2023, Li, Pan2 wrote:
> >
> > > I profiled memory for this change with valgrind --tool=memcheck --trace-children=yes, targeting the SPEC 2006 INT benchmarks with rv64gcv. Note that we only count the bytes allocated, taken from valgrind log lines like this: "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
> > >
> > > Allowing for some variance in valgrind, it looks like the impact on
> > > bytes allocated may be limited. However, I am still running this for
> > > x86; each iteration takes more than 30 hours...
> >
> > I'm not sure I'd call +- 7% on memory use "limited" - but I fear the numbers are off. Note that since various structures reside in GC memory there are also changes to GC overhead and fragmentation, so precise measurements are difficult.
> >
> > Richard.
> >
> > > RISC-V GCC Version:
> > > >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
> > > riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503
> > > (experimental) Copyright (C) 2023 Free Software Foundation, Inc.
> > > This is free software; see the source for copying conditions. There
> > > is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> > >
> > > Bytes allocated with O2:
> > > ------------------------------------------------------------
> > > Benchmark      | upstream     | with this PATCH | change
> > > ------------------------------------------------------------
> > > 400.perlbench  | 29699642875  | 29949876269     | ~0.0%
> > > 401.bzip2      | 1641041659   | 1755563972      | +6.95%
> > > 403.gcc        | 68447500516  | 68900883291     | ~0.0%
> > > 429.mcf        | 1433156462   | 1433253373      | ~0.0%
> > > 445.gobmk      | 14239225210  | 14463438465     | ~0.0%
> > > 456.hmmer      | 9635955623   | 9808534948      | +1.8%
> > > 458.sjeng      | 2419478204   | 2545478940      | +5.4%
> > > 462.libquantum | 1686404489   | 1800884197      | +6.8%
> > > 464.h264ref    | 10190413900  | 10351134161     | +1.6%
> > > 471.omnetpp    | 40814627684  | 41185864529     | ~0.0%
> > > 473.astar      | 3807097529   | 3928428183      | +3.2%
> > > 483.xalancbmk  | 152959418167 | 154201738843    | ~0.0%
> > >
> > > Bytes allocated with Ofast + funroll-loops:
> > > ------------------------------------------------------------
> > > Benchmark      | upstream     | with this PATCH | change
> > > ------------------------------------------------------------
> > > 400.perlbench  | 39491184733  | 39223020267     | ~0.0%
> > > 401.bzip2      | 2843871517   | 2730383463      | ~0%
> > > 403.gcc        | 84195991898  | 83730632955     | -4.0%
> > > 429.mcf        | 1481381164   | 1367309565      | -7.7%
> > > 445.gobmk      | 20123943663  | 19886116394     | -1.2%
> > > 456.hmmer      | 12302445139  | 12121745383     | -1.5%
> > > 458.sjeng      | 3884712615   | 3755481930      | -3.3%
> > > 462.libquantum | 1966619940   | 1852274342      | -5.8%
> > > 464.h264ref    | 19219365552  | 19050288201     | ~0.0%
> > > 471.omnetpp    | 45701008325  | 45327805079     | ~0.0%
> > > 473.astar      | 4118600354   | 3995943705      | -3.0%
> > > 483.xalancbmk  | 179481305182 | 178160306301    | ~0.0%
> > >
> > > Pan
> > >
> > >
> > > -----Original Message-----
> > > From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of 钟居哲
> > > Sent: Thursday, April 13, 2023 7:23 AM
> > > To: kito.cheng <kito.cheng@gmail.com>; rguenther <rguenther@suse.de>
> > > Cc: richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> > >
> > > Yeah, like Kito said.
> > > It turns out the tuple type model in ARM SVE is the optimal solution for RVV,
> > > and we like the ARM SVE style implementation.
> > >
> > > We now see that swapping rtx_code and mode in rtx_def keeps rtx_def overall within 64 bits.
> > > But it seems there is still a problem in tree_type_common and tree_decl_common, is that right?
> > >
> > > After several tries (removing all redundant TI/TF vector modes and the FP16 vector modes), there are now 252 modes in the RISC-V port. Basically, I can keep supporting the recent new RVV intrinsics features.
> > > However, we can't support more in the future, for example FP16 vectors, BF16 vectors, matrix modes, VLS modes, etc.
> > >
> > > From the RVV side, I think extending machine mode by 1 more bit should be enough for RVV (512 modes overall).
> > > Is it possible to make that happen in tree_type_common and tree_decl_common, Richards?
> > >
> > > Thank you so much for all the comments.
> > >
> > >
> > > juzhe.zhong@rivai.ai
> > >
> > > From: Kito Cheng
> > > Date: 2023-04-12 17:31
> > > To: Richard Biener
> > > CC: juzhe.zhong@rivai.ai; richard.sandiford; jeffreyalaw; gcc-patches; palmer; jakub
> > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> > > > > The concept of fractional LMUL is the same as the concept of
> > > > > AArch64's partial SVE vectors, so they can only access the lowest
> > > > > part, like SVE's partial vector.
> > > > >
> > > > > We want to spill/restore the exact size of those modes (1/2, 1/4,
> > > > > 1/8), so adding dedicated modes for those partial vector modes
> > > > > should be unavoidable IMO.
> > > > >
> > > > > And even if we use sub-vector, we still need to define those
> > > > > partial vector types.
> > > >
> > > > Could you use integer modes for the fractional vectors?
> > >
> > > You mean using a scalar integer mode, like using
> > > (subreg:SI (reg:VNx4SI) 0) to represent LMUL=1/4?
> > > (Assume VNx4SI is the mode for M1.)
> > >
> > > If so, I think it might not be able to model that right - it looks like we are using 32 bits, but actually we are using poly_int16(1, 1) * 32 bits.
> > >
> > > > For computation you can always appropriately limit the LEN?
> > >
> > > RVV provides the zvl*b extensions, zvl<N>b (e.g. zvl128b or zvl256b),
> > > to guarantee that the vector length is at least N bits, but that only
> > > guarantees the minimum length, just as SVE guarantees a minimum
> > > vector length of 128 bits.
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
>

^ permalink raw reply [flat|nested] 63+ messages in thread
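The reordering diff quoted in the message above relies on how bit-fields pack into storage units: widening mode in place can push a later field into a new unit, while moving mode and shrinking spare can keep the total the same. The following is only a rough sketch of that effect, not the real tree_type_common. The three struct names, the grouped flags_a/flags_b/flags_c fields, and all widths other than mode, precision and spare are hypothetical, chosen so that each group of fields sums to 32 bits. Bit-field layout is implementation-defined; the sizes in the comments are what I would expect from GCC or Clang on a typical Linux target with 32-bit unsigned int.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the flag words of
   tree_type_common.  Field names and widths are illustrative only.  */

/* 8-bit mode: 10 + 6 + 8 + 8 = 32 bits, then 17 + 15 = 32 bits.
   Expected to occupy two 32-bit units (8 bytes).  */
struct mode8_layout {
    unsigned precision : 10;
    unsigned flags_a : 6;
    unsigned mode : 8;
    unsigned flags_b : 8;
    unsigned flags_c : 17;
    unsigned spare : 15;
};

/* 16-bit mode widened in place: 10 + 6 + 16 = 32 bits, then
   8 + 17 = 25 bits.  spare (15 bits) no longer fits in the second
   unit, so it is expected to start a third one (12 bytes).  */
struct mode16_in_place {
    unsigned precision : 10;
    unsigned flags_a : 6;
    unsigned mode : 16;
    unsigned flags_b : 8;
    unsigned flags_c : 17;
    unsigned spare : 15;
};

/* 16-bit mode moved to the front and spare shrunk from 15 to 7:
   16 + 10 + 6 = 32 bits, then 8 + 17 + 7 = 32 bits.
   Expected to be back to two units (8 bytes).  */
struct mode16_reordered {
    unsigned mode : 16;
    unsigned precision : 10;
    unsigned flags_a : 6;
    unsigned flags_b : 8;
    unsigned flags_c : 17;
    unsigned spare : 7;
};

/* Extra bytes a layout costs relative to the 8-bit baseline.  */
size_t growth_over_baseline(size_t layout_size)
{
    return layout_size - sizeof(struct mode8_layout);
}
```

In the real structure the neighbouring members are pointers, so the same kind of spill shows up as 8 extra bytes (128 to 136) rather than 4.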
* RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-05-06  2:18 ` Kito Cheng
@ 2023-05-06  2:20 ` Li, Pan2
  2023-05-06  2:48 ` Li, Pan2
  2023-05-08  1:35 ` Li, Pan2
  0 siblings, 2 replies; 63+ messages in thread
From: Li, Pan2 @ 2023-05-06  2:20 UTC (permalink / raw)
  To: Kito Cheng
  Cc: juzhe.zhong, rguenther, richard.sandiford, jeffreyalaw, gcc-patches, palmer, jakub

Yes, that makes sense. I will give it a try and keep you posted.

Pan

-----Original Message-----
From: Kito Cheng <kito.cheng@gmail.com>
Sent: Saturday, May 6, 2023 10:19 AM
To: Li, Pan2 <pan2.li@intel.com>
Cc: juzhe.zhong@rivai.ai; rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

I think x86 first?

The main thing we want to make sure of is that this change does not have much impact on targets that do not really need a 16-bit machine_mode.

On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> Sure thing, I will pick them all together and trigger the test again (I will send out the overall diff before starting, to make sure my understanding is correct). BTW, which target do we prefer first, x86 or RISC-V?

^ permalink raw reply [flat|nested] 63+ messages in thread
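The change column in the bytes-allocated tables quoted in this thread is a relative difference between the upstream and patched byte counts. As a rough way to recompute it from the raw numbers, here is a small sketch. The function names percent_change and max_abs_percent_change and the struct sample type are hypothetical helpers introduced for illustration, and the byte counts in the usage note are copied from the RISC-V tables in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* One table row: benchmark name plus bytes allocated by the upstream
   and the patched compiler.  Hypothetical helper type.  */
struct sample {
    const char *name;
    unsigned long long baseline;
    unsigned long long patched;
};

/* Relative change of `patched` against `baseline`, in percent.
   Negative values mean the patched compiler allocated fewer bytes.  */
double percent_change(unsigned long long baseline, unsigned long long patched)
{
    assert(baseline != 0);
    return ((double)patched - (double)baseline) * 100.0 / (double)baseline;
}

/* Largest absolute relative change over a set of rows, in percent.  */
double max_abs_percent_change(const struct sample *rows, size_t count)
{
    double worst = 0.0;
    for (size_t i = 0; i < count; i++) {
        double change = percent_change(rows[i].baseline, rows[i].patched);
        if (change < 0.0)
            change = -change;
        if (change > worst)
            worst = change;
    }
    return worst;
}
```

For example, percent_change(1641041659ULL, 1755563972ULL) covers the 401.bzip2 row at O2 and percent_change(1481381164ULL, 1367309565ULL) covers the 429.mcf row at Ofast; those two rows are roughly the +7% and -7% extremes behind the "+- 7%" remark in the thread.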
* RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-05-06  2:20 ` Li, Pan2
@ 2023-05-06  2:48 ` Li, Pan2
  2023-05-07  1:55 ` Li, Pan2
  2023-05-08  1:35 ` Li, Pan2
  1 sibling, 1 reply; 63+ messages in thread
From: Li, Pan2 @ 2023-05-06  2:48 UTC (permalink / raw)
  To: Kito Cheng
  Cc: juzhe.zhong, rguenther, richard.sandiford, jeffreyalaw, gcc-patches, palmer, jakub

[-- Attachment #1: Type: text/plain, Size: 15085 bytes --]

I have collected all the changes mentioned previously into a single patch, attached. Please help review it and point out any mistakes.

Pan

-----Original Message-----
From: Li, Pan2
Sent: Saturday, May 6, 2023 10:20 AM
To: Kito Cheng <kito.cheng@gmail.com>
Cc: juzhe.zhong@rivai.ai; rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

Yes, that makes sense. I will give it a try and keep you posted.

Pan

-----Original Message-----
From: Kito Cheng <kito.cheng@gmail.com>
Sent: Saturday, May 6, 2023 10:19 AM
To: Li, Pan2 <pan2.li@intel.com>
Cc: juzhe.zhong@rivai.ai; rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

I think x86 first?

The main thing we want to make sure of is that this change does not have much impact on targets that do not really need a 16-bit machine_mode.

On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> Sure thing, I will pick them all together and trigger the test again (I will send out the overall diff before starting, to make sure my understanding is correct). BTW, which target do we prefer first, x86 or RISC-V?
> > Pan > > From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai> > Sent: Saturday, May 6, 2023 10:00 AM > To: kito.cheng <kito.cheng@gmail.com>; Li, Pan2 <pan2.li@intel.com> > Cc: rguenther <rguenther@suse.de>; richard.sandiford > <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; > gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; > jakub <jakub@redhat.com> > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from > 8-bit to 16-bit > > Yeah, you should also swap mode and code in rtx_def according to > Richard suggestion since it will not change the rtx_def data structure. > > I think the only problem is the mode in tree data structure. > ________________________________ > juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai> > > From: Kito Cheng<mailto:kito.cheng@gmail.com> > Date: 2023-05-06 09:53 > To: Li, Pan2<mailto:pan2.li@intel.com> > CC: Richard Biener<mailto:rguenther@suse.de>; > 钟居哲<mailto:juzhe.zhong@rivai.ai>; > richard.sandiford<mailto:richard.sandiford@arm.com>; Jeff > Law<mailto:jeffreyalaw@gmail.com>; > gcc-patches<mailto:gcc-patches@gcc.gnu.org>; > palmer<mailto:palmer@dabbelt.com>; jakub<mailto:jakub@redhat.com> > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from > 8-bit to 16-bit Hi Pan: > > Could you try to apply the following diff and measure again? This > makes tree_type_common size unchanged. 
> > > sizeof tree_type_common= 128 (mode = 8 bit) sizeof tree_type_common= > 136 (mode = 16 bit) sizeof tree_type_common= 128 (mode = 8 bit w/ this > diff) > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h index > af795aa81f98..b8ccfa407ed9 100644 > --- a/gcc/tree-core.h > +++ b/gcc/tree-core.h > @@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common { > tree attributes; > unsigned int uid; > > + ENUM_BITFIELD(machine_mode) mode : 16; > + > unsigned int precision : 10; > unsigned no_force_blk_flag : 1; > unsigned needs_constructing_flag : 1; @@ -1687,7 +1689,6 @@ struct > GTY(()) tree_type_common { > unsigned restrict_flag : 1; > unsigned contains_placeholder_bits : 2; > > - ENUM_BITFIELD(machine_mode) mode : 16; > > /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE. > TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE. */ @@ -1712,7 > +1713,7 @@ struct GTY(()) tree_type_common { > unsigned empty_flag : 1; > unsigned indivisible_p : 1; > unsigned no_named_args_stdarg_p : 1; > - unsigned spare : 15; > + unsigned spare : 7; > > alias_set_type alias_set; > tree pointer_to; > > On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches > <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>> wrote: > > > > Yes, totally agree the number cannot be very accurate up to a point. Update the correlated memory bytes allocated for the X86 target. 
> > > > Bytes allocated with O2: > > ----------------------------------------------------------------------------------------------------- > > Benchmark | upstream | with this PATCH > > ----------------------------------------------------------------------------------------------------- > > 400.perlbench | 25286185160 | 25176544846 ~0.0% > > 401.bzip2 | 1429883731 | 1391040027 -2.7% > > 403.gcc | 55023568981 | 54798890746 ~0.0% > > 429.mcf | 1360975660 | 1321537710 -2.9% > > 445.gobmk | 12791636502 | 12666523431 -1.0% > > 456.hmmer | 9354433652 | 9279189174 ~0.0% > > 458.sjeng | 1991260562 | 1944031904 -2.4% > > 462.libquantum | 1725112078 | 1684213981 -2.4% > > 464.h264ref | 8597673515 | 8528855778 ~0.0% > > 471.omnetpp | 37613034778 | 37432278047 ~0.0% > > 473.astar | 3817295518 | 3772460508 -1.2% > > 483.xalancbmk | 149418776991 | 148545162207 ~0.0% > > > > Bytes allocated with Ofast + funroll-loops: > > ------------------------------------------------------------------------------------------ > > Benchmark | upstream | with this PATCH > > ------------------------------------------------------------------------------------------ > > 400.perlbench | 30438407499 | 30574152897 ~0.0% > > 401.bzip2 | 2277114519 | 2319432664 +1.9% > > 403.gcc | 64499664264 | 64781232731 ~0.0% > > 429.mcf | 1361486758 | 1399942116 +2.8% > > 445.gobmk | 15258056111 | 15396801542 +1.0% > > 456.hmmer | 10896615649 | 10936223486 ~0.0% > > 458.sjeng | 2592620709 | 2641687496 +1.9% > > 462.libquantum | 1814487525 | 1854518500 +2.2% > > 464.h264ref | 13528736878 | 13614517066 ~0.0% > > 471.omnetpp | 38721066702 | 38910524667 ~0.0% > > 473.astar | 3924015756 | 3968057027 +1.1% > > 483.xalancbmk | 165897692838 | 166843885880 ~0.0% > > > > Pan > > > > > > -----Original Message----- > > From: Richard Biener <rguenther@suse.de<mailto:rguenther@suse.de>> > > Sent: Friday, May 5, 2023 2:25 PM > > To: Li, Pan2 <pan2.li@intel.com<mailto:pan2.li@intel.com>> > > Cc: 钟居哲 
<juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>>; > > kito.cheng <kito.cheng@gmail.com<mailto:kito.cheng@gmail.com>>; > > richard.sandiford > > <richard.sandiford@arm.com<mailto:richard.sandiford@arm.com>>; Jeff > > Law <jeffreyalaw@gmail.com<mailto:jeffreyalaw@gmail.com>>; > > gcc-patches > > <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>>; palmer > > <palmer@dabbelt.com<mailto:palmer@dabbelt.com>>; jakub > > <jakub@redhat.com<mailto:jakub@redhat.com>> > > Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size > > from 8-bit to 16-bit > > > > On Fri, 5 May 2023, Li, Pan2 wrote: > > > > > I tried the memory profiling by valgrind --tool=memcheck --trace-children=yes for this change, target the SPEC 2006 INT part with rv64gcv. Note we only count the bytes allocated from valgrind log like this "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated". > > > > > > Consider some variance of valgrind, it looks like the impact to > > > bytes allocated may be limited. However, I am still running this > > > for x86, it will take more than 30 hours for each iteration... > > > > I'm not sure I'd call +- 7% on memory use "limited" - but I fear the numbers are off. Note since various structures reside in GC memory there's also changes to GC overhead and fragmentation, so precise measurements are difficult. > > > > Richard. > > > > > RISC-V GCC Version: > > > >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc > > > >> --version > > > riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 > > > (experimental) Copyright (C) 2023 Free Software Foundation, Inc. > > > This is free software; see the source for copying conditions. > > > There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
> > > > > > Bytes allocated with O2: > > > ----------------------------------------------------------------------------------------------------- > > > Benchmark | upstream | with this PATCH > > > ----------------------------------------------------------------------------------------------------- > > > 400.perlbench | 29699642875 | 29949876269 ~0.0% > > > 401.bzip2 | 1641041659 | 1755563972 +6.95% > > > 403.gcc | 68447500516 | 68900883291 ~0.0% > > > 429.mcf | 1433156462 | 1433253373 ~0.0% > > > 445.gobmk | 14239225210 | 14463438465 ~0.0% > > > 456.hmmer | 9635955623 | 9808534948 +1.8% > > > 458.sjeng | 2419478204 | 2545478940 +5.4% > > > 462.libquantum | 1686404489 | 1800884197 +6.8% > > > 464.h264ref 8j1 | 10190413900 | 10351134161 +1.6% > > > 471.omnetpp | 40814627684 | 41185864529 ~0.0% > > > 473.astar | 3807097529 | 3928428183 +3.2% > > > 483.xalancbmk | 152959418167 | 154201738843 ~0.0% > > > > > > Bytes allocated with Ofast + funroll-loops: > > > ------------------------------------------------------------------------------------------ > > > Benchmark | upstream | with this PATCH > > > ------------------------------------------------------------------------------------------ > > > 400.perlbench | 39491184733 | 39223020267 ~0.0% > > > 401.bzip2 | 2843871517 | 2730383463 ~0% > > > 403.gcc | 84195991898 | 83730632955 -4.0% > > > 429.mcf | 1481381164 | 1367309565 -7.7% > > > 445.gobmk | 20123943663 | 19886116394 -1.2% > > > 456.hmmer | 12302445139 | 12121745383 -1.5% > > > 458.sjeng | 3884712615 | 3755481930 -3.3% > > > 462.libquantum | 1966619940 | 1852274342 -5.8% > > > 464.h264ref | 19219365552 | 19050288201 ~0.0% > > > 471.omnetpp | 45701008325 | 45327805079 ~0.0% > > > 473.astar | 4118600354 | 3995943705 -3.0% > > > 483.xalancbmk | 179481305182 | 178160306301 ~0.0% > > > > > > Pan > > > > > > > > > -----Original Message----- > > > From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org<mailto:gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org>> 
On Behalf Of ???
> > > Sent: Thursday, April 13, 2023 7:23 AM
> > > To: kito.cheng
> > > <kito.cheng@gmail.com<mailto:kito.cheng@gmail.com>>; rguenther
> > > <rguenther@suse.de<mailto:rguenther@suse.de>>
> > > Cc: richard.sandiford
> > > <richard.sandiford@arm.com<mailto:richard.sandiford@arm.com>>;
> > > Jeff Law <jeffreyalaw@gmail.com<mailto:jeffreyalaw@gmail.com>>;
> > > gcc-patches
> > > <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>>; palmer
> > > <palmer@dabbelt.com<mailto:palmer@dabbelt.com>>; jakub
> > > <jakub@redhat.com<mailto:jakub@redhat.com>>
> > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size
> > > from 8-bit to 16-bit
> > >
> > > Yeah, like kito said.
> > > Turns out the tuple type model in ARM SVE is the optimal solution for RVV.
> > > And we like the ARM SVE style implementation.
> > >
> > > And now we see that swapping rtx_code and mode in rtx_def can keep rtx_def overall from exceeding 64 bits.
> > > But it seems that there is still a problem in tree_type_common and tree_decl_common, is that right?
> > >
> > > After several tries (removing all redundant TI/TF vector modes and the FP16 vector mode), there are now 252 modes in the RISC-V port. Basically, that lets me keep supporting the recent new RVV intrinsics features.
> > > However, we can't support more in the future, for example FP16 vector, BF16 vector, matrix modes, VLS modes, etc.
> > >
> > > From the RVV side, I think extending the machine mode by 1 more bit should be enough for RVV (overall 512 modes).
> > > Is it possible to make it happen in tree_type_common and tree_decl_common, Richards?
> > >
> > > Thank you so much for all comments.
> > > > > > > > > juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai> > > > > > > From: Kito Cheng > > > Date: 2023-04-12 17:31 > > > To: Richard Biener > > > CC: juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>; > > > richard.sandiford; jeffreyalaw; gcc-patches; palmer; jakub > > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size > > > from 8-bit to 16-bit > > > > > The concept of fractional LMUL is the same as the concept of > > > > > AArch64's partial SVE vectors, so they can only access the > > > > > lowest part, like SVE's partial vector. > > > > > > > > > > We want to spill/restore the exact size of those modes (1/2, > > > > > 1/4, 1/8), so adding dedicated modes for those partial vector > > > > > modes should be unavoidable IMO. > > > > > > > > > > And even if we use sub-vector, we still need to define those > > > > > partial vector types. > > > > > > > > Could you use integer modes for the fractional vectors? > > > > > > You mean using the scalar integer mode like using (subreg:SI > > > (reg:VNx4SI) 0) to represent > > > LMUL=1/4? > > > (Assume VNx4SI is mode for M1) > > > > > > If so I think it might not be able to model that right - it seems like we are using 32-bits but actually we are using poly_int16(1, 1) * 32 bits. > > > > > > > For computation you can always appropriately limit the LEN? 
> > > > > > RVV provide zvl*b extension like zvl<N>b (e.g.zvl128b or zvl256b) > > > to guarantee the vector length is at least larger than N bits, but > > > it's just guarantee the minimal length like SVE guarantee the > > > minimal vector length is 128 bits > > > > > > > > > > -- > > Richard Biener <rguenther@suse.de<mailto:rguenther@suse.de>> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 > > Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, > > Boudien Moerman; HRB 36809 (AG Nuernberg) > [-- Attachment #2: tmp.patch --] [-- Type: application/octet-stream, Size: 6684 bytes --] diff --git a/gcc/combine.cc b/gcc/combine.cc index 5aa0ec5c45a..3939c6cac66 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -200,7 +200,7 @@ struct reg_stat_type { unsigned HOST_WIDE_INT last_set_nonzero_bits; char last_set_sign_bit_copies; - ENUM_BITFIELD(machine_mode) last_set_mode : 8; + ENUM_BITFIELD(machine_mode) last_set_mode : 16; /* Set nonzero if references to register n in expressions should not be used. last_set_invalid is set nonzero when this register is being @@ -235,7 +235,7 @@ struct reg_stat_type { truncation if we know that value already contains a truncated value. */ - ENUM_BITFIELD(machine_mode) truncated_to_mode : 8; + ENUM_BITFIELD(machine_mode) truncated_to_mode : 16; }; diff --git a/gcc/cse.cc b/gcc/cse.cc index b10c9b0c94d..ea970c737dd 100644 --- a/gcc/cse.cc +++ b/gcc/cse.cc @@ -250,8 +250,8 @@ struct qty_table_elem unsigned int first_reg, last_reg; /* The sizes of these fields should match the sizes of the code and mode fields of struct rtx_def (see rtl.h). */ - ENUM_BITFIELD(rtx_code) comparison_code : 16; - ENUM_BITFIELD(machine_mode) mode : 8; + ENUM_BITFIELD(rtx_code) comparison_code : 8; + ENUM_BITFIELD(machine_mode) mode : 16; }; /* The table of all qtys, indexed by qty number. 
*/ @@ -406,7 +406,7 @@ struct table_elt int regcost; /* The size of this field should match the size of the mode field of struct rtx_def (see rtl.h). */ - ENUM_BITFIELD(machine_mode) mode : 8; + ENUM_BITFIELD(machine_mode) mode : 16; char in_memory; char is_const; char flag; @@ -4155,7 +4155,7 @@ struct set /* Original machine mode, in case it becomes a CONST_INT. The size of this field should match the size of the mode field of struct rtx_def (see rtl.h). */ - ENUM_BITFIELD(machine_mode) mode : 8; + ENUM_BITFIELD(machine_mode) mode : 16; /* Hash value of constant equivalent for SET_SRC. */ unsigned src_const_hash; /* A constant equivalent for SET_SRC, if any. */ diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc index 83cb7504fa1..3ca3e9fd946 100644 --- a/gcc/genopinit.cc +++ b/gcc/genopinit.cc @@ -182,7 +182,7 @@ main (int argc, const char **argv) progname = "genopinit"; - if (NUM_OPTABS > 0xffff || MAX_MACHINE_MODE >= 0xff) + if (NUM_OPTABS > 0xffff || MAX_MACHINE_MODE >= 0xffff) fatal ("genopinit range assumptions invalid"); if (!init_rtx_reader_args_cb (argc, argv, handle_arg)) diff --git a/gcc/ira-int.h b/gcc/ira-int.h index e2de47213b4..65ec1678146 100644 --- a/gcc/ira-int.h +++ b/gcc/ira-int.h @@ -281,10 +281,10 @@ struct ira_allocno int regno; /* Mode of the allocno which is the mode of the corresponding pseudo-register. */ - ENUM_BITFIELD (machine_mode) mode : 8; + ENUM_BITFIELD (machine_mode) mode : 16; /* Widest mode of the allocno which in at least one case could be for paradoxical subregs where wmode > mode. */ - ENUM_BITFIELD (machine_mode) wmode : 8; + ENUM_BITFIELD (machine_mode) wmode : 16; /* Register class which should be used for allocation for given allocno. NO_REGS means that we should use memory. 
*/ ENUM_BITFIELD (reg_class) aclass : 16; diff --git a/gcc/ree.cc b/gcc/ree.cc index 413aec7c8eb..e74b96cdfac 100644 --- a/gcc/ree.cc +++ b/gcc/ree.cc @@ -567,7 +567,7 @@ enum ext_modified_kind struct ATTRIBUTE_PACKED ext_modified { /* Mode from which ree has zero or sign extended the destination. */ - ENUM_BITFIELD(machine_mode) mode : 8; + ENUM_BITFIELD(machine_mode) mode : 16; /* Kind of modification of the insn. */ ENUM_BITFIELD(ext_modified_kind) kind : 2; diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h index c5180b9308a..3c928058490 100644 --- a/gcc/rtl-ssa/accesses.h +++ b/gcc/rtl-ssa/accesses.h @@ -254,7 +254,7 @@ private: unsigned int m_spare : 2; // The value returned by the accessor above. - machine_mode m_mode : 8; + machine_mode m_mode : 16; }; // A contiguous array of access_info pointers. Used to represent a diff --git a/gcc/rtl.h b/gcc/rtl.h index f634cab730b..4b1deb34840 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -310,10 +310,10 @@ struct GTY((desc("0"), tag("0"), chain_next ("RTX_NEXT (&%h)"), chain_prev ("RTX_PREV (&%h)"))) rtx_def { /* The kind of expression this is. */ - ENUM_BITFIELD(rtx_code) code: 16; + ENUM_BITFIELD(rtx_code) code: 8; /* The kind of value the expression has. */ - ENUM_BITFIELD(machine_mode) mode : 8; + ENUM_BITFIELD(machine_mode) mode : 16; /* 1 in a MEM if we should keep the alias set for this mem unchanged when we access a component. @@ -2164,7 +2164,7 @@ subreg_shape::operator != (const subreg_shape &other) const inline unsigned HOST_WIDE_INT subreg_shape::unique_id () const { - { STATIC_ASSERT (MAX_MACHINE_MODE <= 256); } + { STATIC_ASSERT (MAX_MACHINE_MODE <= 32768); } { STATIC_ASSERT (NUM_POLY_INT_COEFFS <= 3); } { STATIC_ASSERT (sizeof (offset.coeffs[0]) <= 2); } int res = (int) inner_mode + ((int) outer_mode << 8); diff --git a/gcc/rtlanal.h b/gcc/rtlanal.h index 5fbed816e20..bdd84e39c76 100644 --- a/gcc/rtlanal.h +++ b/gcc/rtlanal.h @@ -100,7 +100,7 @@ public: /* The mode of the reference. 
If IS_MULTIREG, this is the mode of REGNO - MULTIREG_OFFSET. */ - machine_mode mode : 8; + machine_mode mode : 16; /* If IS_MULTIREG, the offset of REGNO from the start of the register. */ unsigned int multireg_offset : 8; diff --git a/gcc/tree-core.h b/gcc/tree-core.h index 847f0b1e994..d7bb7d04d3f 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common { tree attributes; unsigned int uid; + ENUM_BITFIELD(machine_mode) mode : 16; + unsigned int precision : 10; unsigned no_force_blk_flag : 1; unsigned needs_constructing_flag : 1; @@ -1687,8 +1689,6 @@ struct GTY(()) tree_type_common { unsigned restrict_flag : 1; unsigned contains_placeholder_bits : 2; - ENUM_BITFIELD(machine_mode) mode : 8; - /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE. TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE. */ unsigned string_flag : 1; @@ -1712,7 +1712,7 @@ struct GTY(()) tree_type_common { unsigned empty_flag : 1; unsigned indivisible_p : 1; unsigned no_named_args_stdarg_p : 1; - unsigned spare : 15; + unsigned spare : 7; alias_set_type alias_set; tree pointer_to; @@ -1770,7 +1770,7 @@ struct GTY(()) tree_decl_common { struct tree_decl_minimal common; tree size; - ENUM_BITFIELD(machine_mode) mode : 8; + ENUM_BITFIELD(machine_mode) mode : 16; unsigned nonlocal_flag : 1; unsigned virtual_flag : 1; ^ permalink raw reply [flat|nested] 63+ messages in thread
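The tree-core.h hunk above widens `mode` to 16 bits, moves it in front of `precision`, and shrinks `spare` from 15 to 7 bits. A minimal sketch of why both the move and the smaller `spare` matter for the struct size is given below. The three structs are hypothetical stand-ins with invented field groups (`flags_a`, `flags_b`, `flags_c`), not the real `tree_type_common`, and the sizes assume a common ABI where `unsigned` is 32 bits and a bit-field never straddles a 32-bit unit (as with GCC or Clang on x86-64).

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the layout before the patch: 64 bits of flags.
struct type_flags_before {
  unsigned precision : 10;
  unsigned flags_a : 6;
  unsigned mode : 8;      // 8-bit machine mode
  unsigned flags_b : 8;   // first 32-bit unit is now full
  unsigned flags_c : 17;
  unsigned spare : 15;    // second 32-bit unit is now full
};

// Widening mode in place: the fields no longer pack into two units,
// so spare spills into a third one.
struct type_flags_naive {
  unsigned precision : 10;
  unsigned flags_a : 6;
  unsigned mode : 16;
  unsigned flags_b : 8;
  unsigned flags_c : 17;
  unsigned spare : 15;
};

// Same idea as the hunk: mode first, spare reduced by the 8 bits
// that mode gained, so the total stays at 64 bits.
struct type_flags_patched {
  unsigned mode : 16;
  unsigned precision : 10;
  unsigned flags_a : 6;
  unsigned flags_b : 8;
  unsigned flags_c : 17;
  unsigned spare : 7;
};

// Print the three sizes; exact values are ABI-dependent.
inline void report_sizes() {
  std::printf("before=%zu naive=%zu patched=%zu\n",
              sizeof(type_flags_before), sizeof(type_flags_naive),
              sizeof(type_flags_patched));
}
```

This mirrors the 128 / 136 / 128 byte figures quoted for `tree_type_common` in the thread only in spirit; the real struct has more members and its exact layout is not reproduced here.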
* RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-05-06  2:48 ` Li, Pan2
@ 2023-05-07  1:55 ` Li, Pan2
  2023-05-07 15:23 ` Jeff Law
  0 siblings, 1 reply; 63+ messages in thread
From: Li, Pan2 @ 2023-05-07  1:55 UTC (permalink / raw)
  To: Kito Cheng
  Cc: juzhe.zhong, rguenther, richard.sandiford, jeffreyalaw, gcc-patches, palmer, jakub

It looks like we cannot simply swap the code and mode in rtx_def: the code may have to use the same number of bits as the tree_code in tree_base, or we hit an ICE like the one below.

  rtx_def code 16 => 8 bits.
  rtx_def mode 8 => 16 bits.

  static inline decl_or_value
  dv_from_value (rtx value)
  {
    decl_or_value dv;
    dv = value;
    gcc_checking_assert (dv_is_value_p (dv));  <= ICE
    return dv;
  }

Thus we would also need to apply the same bit change to the tree_code, as below. Unfortunately, 8 bits may not be sufficient there, given the compile warning "../../gcc/tree-core.h:1034:28: warning: ‘tree_base::code’ is too small to hold all values of ‘enum tree_code’".

  tree_base code 16 => 8 bits.

So one possible approach for the bit adjustment may look like the following; I am not sure whether it is reasonable. Any ideas about this? Thank you all in advance 😉.

  rtx_def code 16 => 12 bits.
  rtx_def mode 8 => 12 bits.
  tree_base code 16 => 12 bits.

Pan

-----Original Message-----
From: Li, Pan2
Sent: Saturday, May 6, 2023 10:49 AM
To: 'Kito Cheng' <kito.cheng@gmail.com>
Cc: 'juzhe.zhong@rivai.ai' <juzhe.zhong@rivai.ai>; 'rguenther' <rguenther@suse.de>; 'richard.sandiford' <richard.sandiford@arm.com>; 'jeffreyalaw' <jeffreyalaw@gmail.com>; 'gcc-patches' <gcc-patches@gcc.gnu.org>; 'palmer' <palmer@dabbelt.com>; 'jakub' <jakub@redhat.com>
Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

I have collected all the changes mentioned previously into a single patch, attached. Please help review it for any mistakes.
Pan -----Original Message----- From: Li, Pan2 Sent: Saturday, May 6, 2023 10:20 AM To: Kito Cheng <kito.cheng@gmail.com> Cc: juzhe.zhong@rivai.ai; rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com> Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit Yes, that makes sense, will have a try and keep you posted. Pan -----Original Message----- From: Kito Cheng <kito.cheng@gmail.com> Sent: Saturday, May 6, 2023 10:19 AM To: Li, Pan2 <pan2.li@intel.com> Cc: juzhe.zhong@rivai.ai; rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit I think x86 first? The major thing we want to make sure is that this change won't affect those targets which do not really require 16 bit machine_mode too much. On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Sure thing, I will pick them all together and trigger(will send out the overall diff before start to make sure my understand is correct) the test again. BTW which target do we prefer first? X86 or RISC-V. 
> > Pan > > From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai> > Sent: Saturday, May 6, 2023 10:00 AM > To: kito.cheng <kito.cheng@gmail.com>; Li, Pan2 <pan2.li@intel.com> > Cc: rguenther <rguenther@suse.de>; richard.sandiford > <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; > gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; > jakub <jakub@redhat.com> > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from > 8-bit to 16-bit > > Yeah, you should also swap mode and code in rtx_def according to > Richard suggestion since it will not change the rtx_def data structure. > > I think the only problem is the mode in tree data structure. > ________________________________ > juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai> > > From: Kito Cheng<mailto:kito.cheng@gmail.com> > Date: 2023-05-06 09:53 > To: Li, Pan2<mailto:pan2.li@intel.com> > CC: Richard Biener<mailto:rguenther@suse.de>; > 钟居哲<mailto:juzhe.zhong@rivai.ai>; > richard.sandiford<mailto:richard.sandiford@arm.com>; Jeff > Law<mailto:jeffreyalaw@gmail.com>; > gcc-patches<mailto:gcc-patches@gcc.gnu.org>; > palmer<mailto:palmer@dabbelt.com>; jakub<mailto:jakub@redhat.com> > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from > 8-bit to 16-bit Hi Pan: > > Could you try to apply the following diff and measure again? This > makes tree_type_common size unchanged. 
> > > sizeof tree_type_common= 128 (mode = 8 bit) sizeof tree_type_common= > 136 (mode = 16 bit) sizeof tree_type_common= 128 (mode = 8 bit w/ this > diff) > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h index > af795aa81f98..b8ccfa407ed9 100644 > --- a/gcc/tree-core.h > +++ b/gcc/tree-core.h > @@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common { > tree attributes; > unsigned int uid; > > + ENUM_BITFIELD(machine_mode) mode : 16; > + > unsigned int precision : 10; > unsigned no_force_blk_flag : 1; > unsigned needs_constructing_flag : 1; @@ -1687,7 +1689,6 @@ struct > GTY(()) tree_type_common { > unsigned restrict_flag : 1; > unsigned contains_placeholder_bits : 2; > > - ENUM_BITFIELD(machine_mode) mode : 16; > > /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE. > TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE. */ @@ -1712,7 > +1713,7 @@ struct GTY(()) tree_type_common { > unsigned empty_flag : 1; > unsigned indivisible_p : 1; > unsigned no_named_args_stdarg_p : 1; > - unsigned spare : 15; > + unsigned spare : 7; > > alias_set_type alias_set; > tree pointer_to; > > On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches > <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>> wrote: > > > > Yes, totally agree the number cannot be very accurate up to a point. Update the correlated memory bytes allocated for the X86 target. 
> >
> > Bytes allocated with O2:
> > ------------------------------------------------------------
> > Benchmark       | upstream      | with this PATCH
> > ------------------------------------------------------------
> > 400.perlbench   | 25286185160   | 25176544846    ~0.0%
> > 401.bzip2       | 1429883731    | 1391040027     -2.7%
> > 403.gcc         | 55023568981   | 54798890746    ~0.0%
> > 429.mcf         | 1360975660    | 1321537710     -2.9%
> > 445.gobmk       | 12791636502   | 12666523431    -1.0%
> > 456.hmmer       | 9354433652    | 9279189174     ~0.0%
> > 458.sjeng       | 1991260562    | 1944031904     -2.4%
> > 462.libquantum  | 1725112078    | 1684213981     -2.4%
> > 464.h264ref     | 8597673515    | 8528855778     ~0.0%
> > 471.omnetpp     | 37613034778   | 37432278047    ~0.0%
> > 473.astar       | 3817295518    | 3772460508     -1.2%
> > 483.xalancbmk   | 149418776991  | 148545162207   ~0.0%
> >
> > Bytes allocated with Ofast + funroll-loops:
> > ------------------------------------------------------------
> > Benchmark       | upstream      | with this PATCH
> > ------------------------------------------------------------
> > 400.perlbench   | 30438407499   | 30574152897    ~0.0%
> > 401.bzip2       | 2277114519    | 2319432664     +1.9%
> > 403.gcc         | 64499664264   | 64781232731    ~0.0%
> > 429.mcf         | 1361486758    | 1399942116     +2.8%
> > 445.gobmk       | 15258056111   | 15396801542    +1.0%
> > 456.hmmer       | 10896615649   | 10936223486    ~0.0%
> > 458.sjeng       | 2592620709    | 2641687496     +1.9%
> > 462.libquantum  | 1814487525    | 1854518500     +2.2%
> > 464.h264ref     | 13528736878   | 13614517066    ~0.0%
> > 471.omnetpp     | 38721066702   | 38910524667    ~0.0%
> > 473.astar       | 3924015756    | 3968057027     +1.1%
> > 483.xalancbmk   | 165897692838  | 166843885880   ~0.0%
> >
> > Pan
> >
> >
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de<mailto:rguenther@suse.de>>
> > Sent: Friday, May 5, 2023 2:25 PM
> > To: Li, Pan2 <pan2.li@intel.com<mailto:pan2.li@intel.com>>
> > Cc: 钟居哲
<juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>>; > > kito.cheng <kito.cheng@gmail.com<mailto:kito.cheng@gmail.com>>; > > richard.sandiford > > <richard.sandiford@arm.com<mailto:richard.sandiford@arm.com>>; Jeff > > Law <jeffreyalaw@gmail.com<mailto:jeffreyalaw@gmail.com>>; > > gcc-patches > > <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>>; palmer > > <palmer@dabbelt.com<mailto:palmer@dabbelt.com>>; jakub > > <jakub@redhat.com<mailto:jakub@redhat.com>> > > Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size > > from 8-bit to 16-bit > > > > On Fri, 5 May 2023, Li, Pan2 wrote: > > > > > I tried the memory profiling by valgrind --tool=memcheck --trace-children=yes for this change, target the SPEC 2006 INT part with rv64gcv. Note we only count the bytes allocated from valgrind log like this "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated". > > > > > > Consider some variance of valgrind, it looks like the impact to > > > bytes allocated may be limited. However, I am still running this > > > for x86, it will take more than 30 hours for each iteration... > > > > I'm not sure I'd call +- 7% on memory use "limited" - but I fear the numbers are off. Note since various structures reside in GC memory there's also changes to GC overhead and fragmentation, so precise measurements are difficult. > > > > Richard. > > > > > RISC-V GCC Version: > > > >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc > > > >> --version > > > riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 > > > (experimental) Copyright (C) 2023 Free Software Foundation, Inc. > > > This is free software; see the source for copying conditions. > > > There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-05-07  1:55 ` Li, Pan2
@ 2023-05-07 15:23 ` Jeff Law
  2023-05-08  1:07 ` Li, Pan2
  2023-05-08  6:29 ` Richard Biener
  0 siblings, 2 replies; 63+ messages in thread
From: Jeff Law @ 2023-05-07 15:23 UTC (permalink / raw)
  To: Li, Pan2, Kito Cheng
  Cc: juzhe.zhong, rguenther, richard.sandiford, gcc-patches, palmer, jakub

On 5/6/23 19:55, Li, Pan2 wrote:
> It looks like we cannot simply swap the code and mode in rtx_def, the code may have to be the same bits as the tree_code in tree_base. Or we will meet ICE like below.
>
> rtx_def code 16 => 8 bits.
> rtx_def mode 8 => 16 bits.
>
> static inline decl_or_value
> dv_from_value (rtx value)
> {
>    decl_or_value dv;
>    dv = value;
>    gcc_checking_assert (dv_is_value_p (dv));  <= ICE
>    return dv;
Ugh. We really just need to fix this code. It assumes particular structure layouts and that's just wrong/dumb.

So I think the first step is to fix up this crap code in var-tracking. That should be a patch unto itself. Then we'd have the structure changes as a separate change.

Jeff

^ permalink raw reply [flat|nested] 63+ messages in thread
* RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-05-07 15:23 ` Jeff Law
@ 2023-05-08  1:07 ` Li, Pan2
  2023-05-08  6:29 ` Richard Biener
  1 sibling, 0 replies; 63+ messages in thread
From: Li, Pan2 @ 2023-05-08  1:07 UTC (permalink / raw)
  To: Jeff Law, Kito Cheng
  Cc: juzhe.zhong, rguenther, richard.sandiford, gcc-patches, palmer, jakub

I see. Thank you, will have a try soon.

Pan

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-05-07 15:23 ` Jeff Law
  2023-05-08  1:07 ` Li, Pan2
@ 2023-05-08  6:29 ` Richard Biener
  2023-05-08  6:41 ` Li, Pan2
  1 sibling, 1 reply; 63+ messages in thread
From: Richard Biener @ 2023-05-08  6:29 UTC (permalink / raw)
  To: Jeff Law
  Cc: Li, Pan2, Kito Cheng, juzhe.zhong, richard.sandiford, gcc-patches, palmer, jakub

On Sun, 7 May 2023, Jeff Law wrote:

>
>
> On 5/6/23 19:55, Li, Pan2 wrote:
> > It looks like we cannot simply swap the code and mode in rtx_def, the code
> > may have to be the same bits as the tree_code in tree_base. Or we will meet
> > ICE like below.
> >
> > rtx_def code 16 => 8 bits.
> > rtx_def mode 8 => 16 bits.
> >
> > static inline decl_or_value
> > dv_from_value (rtx value)
> > {
> >    decl_or_value dv;
> >    dv = value;
> >    gcc_checking_assert (dv_is_value_p (dv));  <= ICE
> >    return dv;
> Ugh. We really just need to fix this code. It assumes particular structure
> layouts and that's just wrong/dumb.

Well, it's a neat trick ... we just need to adjust it to

static inline bool
dv_is_decl_p (decl_or_value dv)
{
  return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE;
}

I think (and hope) that for the 'decl' case the bits inspected are never 'VALUE'. Of course the above stinks from a TBAA perspective ...

Any "real" fix would require allocating storage for a discriminator and thus hurt the resource constrained var-tracking a lot.

Richard.

^ permalink raw reply [flat|nested] 63+ messages in thread
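The trick discussed above tells a decl from a VALUE by peeking at the leading code bits of whatever the pointer refers to. A minimal sketch of why the check depends on which layout is used to read those bits follows. It models a node header as a plain 32-bit word; the bit positions (code in the low bits, mode directly above it) and all helper names are assumptions made for illustration, since real bit-field placement is ABI-dependent. Only the value `VALUE == 1` is taken from the thread.

```cpp
#include <cassert>
#include <cstdint>

// The thread states that VALUE is 1.
constexpr std::uint32_t VALUE_CODE = 1;

// Hypothetical tree header: a 16-bit code in bits 0..15.
constexpr std::uint32_t tree_header(std::uint32_t code) {
  return code & 0xffffu;
}

// Hypothetical rtx header after the swap: 8-bit code in bits 0..7,
// 16-bit mode in bits 8..23.
constexpr std::uint32_t rtx_header_swapped(std::uint32_t code,
                                           std::uint32_t mode) {
  return (code & 0xffu) | ((mode & 0xffffu) << 8);
}

// Reads the header as if it were a tree: 16 bits of code.
constexpr bool is_value_via_tree_code(std::uint32_t header) {
  return (header & 0xffffu) == VALUE_CODE;
}

// Reads the header as if it were an rtx with the swapped layout:
// only 8 bits of code.
constexpr bool is_value_via_rtx_code(std::uint32_t header) {
  return (header & 0xffu) == VALUE_CODE;
}

// With the swapped layout, the mode bits leak into a 16-bit read, so a
// VALUE rtx with a nonzero mode is no longer recognised.
static_assert(!is_value_via_tree_code(rtx_header_swapped(VALUE_CODE, 5)),
              "16-bit read misses VALUE once mode sits above an 8-bit code");

// Reading through the rtx layout recognises it again.
static_assert(is_value_via_rtx_code(rtx_header_swapped(VALUE_CODE, 5)),
              "8-bit read recognises VALUE");
```

Under this model an 8-bit read would also report VALUE for any tree code whose low byte equals 1 (for example a code of 257), which is one way to read the hope expressed above that the inspected bits are never 'VALUE' in the decl case.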
* RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
  2023-05-08  6:29 ` Richard Biener
@ 2023-05-08  6:41 ` Li, Pan2
  2023-05-08  6:59 ` Li, Pan2
  0 siblings, 1 reply; 63+ messages in thread
From: Li, Pan2 @ 2023-05-08  6:41 UTC (permalink / raw)
  To: Richard Biener, Jeff Law
  Cc: Kito Cheng, juzhe.zhong, richard.sandiford, gcc-patches, palmer, jakub

Oops. Actually I was patching a version like the one you mentioned, with storage allocation. Thank you Richard, I will try your suggestion and keep you posted.

Pan

^ permalink raw reply [flat|nested] 63+ messages in thread
* RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-05-08 6:41 ` Li, Pan2 @ 2023-05-08 6:59 ` Li, Pan2 2023-05-08 7:37 ` Richard Biener 0 siblings, 1 reply; 63+ messages in thread From: Li, Pan2 @ 2023-05-08 6:59 UTC (permalink / raw) To: Richard Biener, Jeff Law Cc: Kito Cheng, juzhe.zhong, richard.sandiford, gcc-patches, palmer, jakub

The suggested check

  return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE;

is able to fix this ICE after the mode bits change. I will re-run the bytes-allocated test on x86 with the changes below.

  rtx_def code 16 => 8 bits.
  rtx_def mode 8 => 16 bits.
  tree_base code unchanged.

Pan

^ permalink raw reply [flat|nested] 63+ messages in thread
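To make the trick under discussion concrete, here is a minimal sketch of a layout-dependent discriminator. It assumes two toy node types whose first 16 bits hold a code; `toy_rtx`, `toy_tree`, `kToyValue`, `kToyVarDecl` and the helper functions are hypothetical names for illustration, not the var-tracking code itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical codes. In the thread, VALUE is 1 on the RTL side.
constexpr std::uint16_t kToyValue = 1;
constexpr std::uint16_t kToyVarDecl = 40;  // arbitrary illustrative decl code

// Toy stand-ins: both node kinds begin with a 16-bit code field, which is
// the property the check silently depends on.
struct toy_rtx  { std::uint16_t code; std::uint8_t mode; };
struct toy_tree { std::uint16_t code; std::uint16_t flags; };

// A decl_or_value-like handle: an untyped pointer to either node kind.
using toy_dv = const void *;

// Read the leading 16 bits without knowing the real node type.
// memcpy sidesteps the aliasing (TBAA) concern raised in the thread.
inline std::uint16_t leading_code(toy_dv dv) {
  std::uint16_t code;
  std::memcpy(&code, dv, sizeof code);
  return code;
}

// Mirrors the shape of the suggested check: null or "not VALUE" means decl.
inline bool toy_dv_is_decl_p(toy_dv dv) {
  return !dv || leading_code(dv) != kToyValue;
}

inline bool toy_dv_is_value_p(toy_dv dv) { return !toy_dv_is_decl_p(dv); }
```

The check only works while no decl node can carry the code that means VALUE in the bits being read, which is why changing either field width needs care.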
* RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-05-08 6:59 ` Li, Pan2 @ 2023-05-08 7:37 ` Richard Biener 2023-05-08 8:05 ` Li, Pan2 0 siblings, 1 reply; 63+ messages in thread From: Richard Biener @ 2023-05-08 7:37 UTC (permalink / raw) To: Li, Pan2 Cc: Jeff Law, Kito Cheng, juzhe.zhong, richard.sandiford, gcc-patches, palmer, jakub

On Mon, 8 May 2023, Li, Pan2 wrote:

> return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; } is able to fix
> this ICE after mode bits change.

Can you check which bits this will inspect when 'dv' is a tree after your patch? VALUE is 1 and would map to IDENTIFIER_NODE on the tree side when there was a 1:1 overlap.

I think for all cases but struct loc_exp_dep we could find a bit to record whether we deal with a VALUE or a decl, but for loc_exp_dep it's going to be difficult (unless we start to take bits from pointer representations).

That said, I agree with Jeff that the code is ugly, but a simplistic conversion isn't what we want.

An alternative "solution" might be to also shrink tree_code when we shrink rtx_code and keep the 1:1 overlap.

Richard.

> I will re-trigger the memory allocate
> bytes test with below changes for X86.
>
> rtx_def code 16 => 8 bits.
> rtx_def mode 8 => 16 bits.
> tree_base code unchanged.
>
> Pan

-- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)

^ permalink raw reply [flat|nested] 63+ messages in thread
* RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-05-08 7:37 ` Richard Biener @ 2023-05-08 8:05 ` Li, Pan2 2023-05-09 6:13 ` Li, Pan2 2023-05-09 10:16 ` Richard Sandiford 0 siblings, 2 replies; 63+ messages in thread From: Li, Pan2 @ 2023-05-08 8:05 UTC (permalink / raw) To: Richard Biener Cc: Jeff Law, Kito Cheng, juzhe.zhong, richard.sandiford, gcc-patches, palmer, jakub

After the bit-width change below:

  rtx_def code 16 => 8 bits.
  rtx_def mode 8 => 16 bits.
  tree_base code unchanged.

the layouts of rtx_def and tree_base will look roughly as follows. As I understand it, the lower 8 bits of tree_base will be inspected when 'dv' is a tree that is converted to rtx.

  tree_base                 rtx_def
  code: 16                  code: 8
  side_effects_flag: 1      mode: 16
  constant_flag: 1
  addressable_flag: 1
  volatile_flag: 1
  readonly_flag: 1
  asm_written_flag: 1
  nowarning_flag: 1
  visited: 1
  used_flag: 1
  nothrow_flag: 1
  static_flag: 1
  public_flag: 1
  private_flag: 1
  protected_flag: 1
  deprecated_flag: 1
  default_def_flag: 1

I also tried an approach similar to the one you mentioned, i.e. shrinking tree_code so that it keeps the 1:1 overlap with rtx_code (see below), and completed one bytes-allocated test, reported in another email.

  rtx_def code 16 => 12 bits.
  rtx_def mode 8 => 12 bits.
  tree_base code 16 => 12 bits.

Pan

^ permalink raw reply [flat|nested] 63+ messages in thread
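The "which bits get inspected" question can be modelled with plain masks. The sketch below is a toy model only: it treats a node header as a 32-bit word with the code in the lowest bits, whereas real bit-field placement depends on the ABI. The tree code 257 is a hypothetical value picked because its low byte equals 1, not a claim about any actual GCC tree code.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kValue = 1;  // VALUE is 1 on the RTL side, per the thread

// Header word of a toy tree node: 16-bit code in the low bits, flags above.
constexpr std::uint32_t tree_header(std::uint32_t tree_code, std::uint32_t flags) {
  return (tree_code & 0xffffu) | (flags << 16);
}

// What an rtx-style code read sees when the rtx code field is 16 bits wide
// (the existing 1:1 overlap with the tree code).
constexpr std::uint32_t rtx_code_16(std::uint32_t header) {
  return header & 0xffffu;
}

// What it sees once the rtx code field is shrunk to 8 bits while the tree
// code stays at 16 bits: only the low byte of the tree code.
constexpr std::uint32_t rtx_code_8(std::uint32_t header) {
  return header & 0xffu;
}
```

In this model a tree code of 257 reads back as 257 through the 16-bit view but as 1 (VALUE) through the 8-bit view, so the VALUE test is only safe if no decl code can alias that way.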
* RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-05-08 8:05 ` Li, Pan2 @ 2023-05-09 6:13 ` Li, Pan2 2023-05-09 7:04 ` Richard Biener 2023-05-09 10:16 ` Richard Sandiford 1 sibling, 1 reply; 63+ messages in thread From: Li, Pan2 @ 2023-05-09 6:13 UTC (permalink / raw) To: Richard Biener Cc: Jeff Law, Kito Cheng, juzhe.zhong, richard.sandiford, gcc-patches, palmer, jakub

Here are the updated bytes-allocated numbers for both the all 12-bits patch and the 8-bit code + 16-bit mode patch.

Bytes allocated with O2:

Benchmark      | upstream     | with the all 12-bits patch | with 8 bits code and 16 bits mode patch
---------------+--------------+----------------------------+----------------------------------------
400.perlbench  | 25286185160  | 25286590847 ~0.0%          | 25286927562 ~0.0%
401.bzip2      | 1429883731   | 1430373103 ~0.0%           | 1430401245 ~0.0%
403.gcc        | 55023568981  | 55027574220 ~0.0%          | 55028727683 ~0.0%
429.mcf        | 1360975660   | 1360959361 ~0.0%           | 1360960745 ~0.0%
445.gobmk      | 12791636502  | 12789648370 ~0.0%          | 12789919097 ~0.0%
456.hmmer      | 9354433652   | 9353899089 ~0.0%           | 9353990523 ~0.0%
458.sjeng      | 1991260562   | 1991107773 ~0.0%           | 1991153851 ~0.0%
462.libquantum | 1725112078   | 1724972077 ~0.0%           | 1724983726 ~0.0%
464.h264ref    | 8597673515   | 8597748172 ~0.0%           | 8597931771 ~0.0%
471.omnetpp    | 37613034778  | 37614346380 ~0.0%          | 37614470890 ~0.0%
473.astar      | 3817295518   | 3817226365 ~0.0%           | 3817239631 ~0.0%
483.xalancbmk  | 149418776991 | 149405214817 ~0.0%         | 149405744428 ~0.0%

Bytes allocated with Ofast + funroll-loops:

Benchmark      | upstream     | with the all 12-bits patch | with 8 bits code and 16 bits mode patch
---------------+--------------+----------------------------+----------------------------------------
400.perlbench  | 30438407499  | 30568217795 +0.4%          | 30568869401 +0.4%
401.bzip2      | 2277114519   | 2318588280 +1.8%           | 2318659896 +1.8%
403.gcc        | 64499664264  | 64764400606 +0.4%          | 64766107560 +0.4%
429.mcf        | 1361486758   | 1399872438 +2.8%           | 1399876436 +2.8%
445.gobmk      | 15258056111  | 15392769408 +0.9%          | 15393305108 +0.9%
456.hmmer      | 10896615649  | 10934649010 +0.3%          | 10934858994 +0.4%
458.sjeng      | 2592620709   | 2641551464 +1.9%           | 2641641389 +1.9%
462.libquantum | 1814487525   | 1856446214 +2.3%           | 1856475555 +2.3%
464.h264ref    | 13528736878  | 13606989269 +0.6%          | 13607467432 +0.6%
471.omnetpp    | 38721066702  | 38908678658 +0.5%          | 38908940169 +0.5%
473.astar      | 3924015756   | 3967867190 +1.1%           | 3967897551 +1.1%
483.xalancbmk  | 165897692838 | 166818255397 +0.6%         | 166819397831 +0.6%

Pan

^ permalink raw reply [flat|nested] 63+ messages in thread
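For readers who want to check the percentage columns, the relative change can be recomputed from the raw byte counts. This is a small sketch under the assumption that the percentages are plain relative differences against upstream, rounded to one decimal; `pct_change` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Relative change in bytes allocated versus the baseline, in percent,
// rounded to one decimal place.
inline double pct_change(std::uint64_t baseline, std::uint64_t patched) {
  double delta = static_cast<double>(patched) - static_cast<double>(baseline);
  return std::round(delta / static_cast<double>(baseline) * 1000.0) / 10.0;
}
```

With the 429.mcf row from the Ofast table, `pct_change(1361486758, 1399872438)` comes out at 2.8, which matches the reported +2.8%.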
* RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-05-09 6:13 ` Li, Pan2 @ 2023-05-09 7:04 ` Richard Biener 0 siblings, 0 replies; 63+ messages in thread From: Richard Biener @ 2023-05-09 7:04 UTC (permalink / raw) To: Li, Pan2 Cc: Jeff Law, Kito Cheng, juzhe.zhong, richard.sandiford, gcc-patches, palmer, jakub

On Tue, 9 May 2023, Li, Pan2 wrote:

> Update the memory allocated bytes for both the all 12-bits patch and
> code 8-bits + mode 16-bits.

Just to throw in a comment here - for IL tree/GIMPLE is the more important part since the whole program will be in tree/GIMPLE while we only have a single function in RTL at a time.

Some host archs will have difficulties loading unaligned words so it is important to keep often accessed larger bitfields aligned to allow efficient access (aligned load + mask, no shifts). That means ideally machine_mode will be 16 bits and code 8 or 16 bits.

I think shrinking RTX code is a good idea, we'll unlikely run out of bits there. Shrinking RTX code means you have to re-order code and mode (see above about alignment), that will complicate the var-tracking "fixup".

We are going to run out of bits in tree_type_common, we've been handing them out without much care recently :/

Richard.
^ permalink raw reply [flat|nested] 63+ messages in thread
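The alignment point above ("aligned load + mask, no shifts") can be illustrated with two toy packings of a code and a 16-bit mode into one 32-bit word. Both layouts and all function names are hypothetical; the actual instructions generated depend on the host and the compiler.

```cpp
#include <cassert>
#include <cstdint>

// Layout A (hypothetical): mode in bits 0..15, code in bits 16..23.
// The 16-bit mode sits on a 16-bit boundary.
constexpr std::uint32_t pack_a(std::uint32_t code, std::uint32_t mode) {
  return (mode & 0xffffu) | ((code & 0xffu) << 16);
}
constexpr std::uint32_t mode_a(std::uint32_t word) {
  return word & 0xffffu;  // mask only
}

// Layout B (hypothetical): code in bits 0..7, mode in bits 8..23.
// The 16-bit mode straddles a 16-bit boundary.
constexpr std::uint32_t pack_b(std::uint32_t code, std::uint32_t mode) {
  return (code & 0xffu) | ((mode & 0xffffu) << 8);
}
constexpr std::uint32_t mode_b(std::uint32_t word) {
  return (word >> 8) & 0xffffu;  // shift, then mask
}
```

Both layouts round-trip the same values; the difference is only in how much work each read of the mode takes, which matters for a field accessed as often as the machine mode.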
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-05-08 8:05 ` Li, Pan2 2023-05-09 6:13 ` Li, Pan2 @ 2023-05-09 10:16 ` Richard Sandiford 2023-05-09 10:26 ` Richard Biener 1 sibling, 1 reply; 63+ messages in thread From: Richard Sandiford @ 2023-05-09 10:16 UTC (permalink / raw) To: Li, Pan2 Cc: Richard Biener, Jeff Law, Kito Cheng, juzhe.zhong, gcc-patches, palmer, jakub "Li, Pan2" <pan2.li@intel.com> writes: > After the bits patch like below. > > rtx_def code 16 => 8 bits. > rtx_def mode 8 => 16 bits. > tree_base code unchanged. > > The structure layout of both the rtx_def and tree_base will be something similar as below. As I understand, the lower 8-bits of tree_base will be inspected when 'dv' is a tree for the rtx conversion. > > tree_base rtx_def > code: 16 code: 8 > side_effects_flag: 1 mode: 16 I think we should try hard to avoid that though. The 16-bit value should be aligned to 16 bits if at all possible. decl_or_value doesn't seem like something that should be dictating our approach here. Perhaps we can use pointer_mux for decl_or_value instead? pointer_mux is intended to be a standands-compliant (hah!) way of switching between two pointer types in a reasonably efficient way. Thanks, Richard > constant_flag: 1 > addressable_flag: 1 > volatile_flag: 1 > readonly_flag: 1 > asm_written_flag: 1 > nowarning_flag: 1 > visited: 1 > used_flag: 1 > nothrow_flag: 1 > static_flag: 1 > public_flag: 1 > private_flag: 1 > protected_flag: 1 > deprecated_flag: 1 > default_def_flag: 1 > > I have a try a similar approach (as below) as you mentioned, aka shrink tree_code as 1:1 overlap to rtx_code. And completed one memory allocated bytes test in another email. > > rtx_def code 16 => 12 bits. > rtx_def mode 8 => 12 bits. > tree_base code 16 => 12 bits. 
> > Pan > > -----Original Message----- > From: Richard Biener <rguenther@suse.de> > Sent: Monday, May 8, 2023 3:38 PM > To: Li, Pan2 <pan2.li@intel.com> > Cc: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng <kito.cheng@gmail.com>; juzhe.zhong@rivai.ai; richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com> > Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit > > On Mon, 8 May 2023, Li, Pan2 wrote: > >> return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; } is able to >> fix this ICE after mode bits change. > > Can you check which bits this will inspect when 'dv' is a tree after your patch? VALUE is 1 and would map to IDENTIFIER_NODE on the tree side when there was a 1:1 overlap. > > I think for all cases but struct loc_exp_dep we could find a bit to record wheter we deal with a VALUE or a decl, but for loc_exp_dep it's going to be difficult (unless we start to take bits from pointer representations). > > That said, I agree with Jeff that the code is ugly, but a simplistic conversion isn't what we want. > > An alternative "solution" might be to also shrink tree_code when we shrink rtx_code and keep the 1:1 overlap. > > Richard. > >> I will re-trigger the memory allocate bytes test with below changes >> for X86. >> >> rtx_def code 16 => 8 bits. >> rtx_def mode 8 => 16 bits. >> tree_base code unchanged. >> >> Pan >> >> -----Original Message----- >> From: Li, Pan2 >> Sent: Monday, May 8, 2023 2:42 PM >> To: Richard Biener <rguenther@suse.de>; Jeff Law >> <jeffreyalaw@gmail.com> >> Cc: Kito Cheng <kito.cheng@gmail.com>; juzhe.zhong@rivai.ai; >> richard.sandiford <richard.sandiford@arm.com>; gcc-patches >> <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub >> <jakub@redhat.com> >> Subject: RE: [PATCH] machine_mode type size: Extend enum size from >> 8-bit to 16-bit >> >> Oops. 
Actually I am patching a version as you mentioned like storage allocation. Thank you Richard, will try your suggestion and keep you posted. >> >> Pan >> >> -----Original Message----- >> From: Richard Biener <rguenther@suse.de> >> Sent: Monday, May 8, 2023 2:30 PM >> To: Jeff Law <jeffreyalaw@gmail.com> >> Cc: Li, Pan2 <pan2.li@intel.com>; Kito Cheng <kito.cheng@gmail.com>; >> juzhe.zhong@rivai.ai; richard.sandiford <richard.sandiford@arm.com>; >> gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; >> jakub <jakub@redhat.com> >> Subject: Re: [PATCH] machine_mode type size: Extend enum size from >> 8-bit to 16-bit >> >> On Sun, 7 May 2023, Jeff Law wrote: >> >> > >> > >> > On 5/6/23 19:55, Li, Pan2 wrote: >> > > It looks like we cannot simply swap the code and mode in rtx_def, >> > > the code may have to be the same bits as the tree_code in tree_base. >> > > Or we will meet ICE like below. >> > > >> > > rtx_def code 16 => 8 bits. >> > > rtx_def mode 8 => 16 bits. >> > > >> > > static inline decl_or_value >> > > dv_from_value (rtx value) >> > > { >> > > decl_or_value dv; >> > > dv = value; >> > > gcc_checking_assert (dv_is_value_p (dv)); <= ICE >> > > return dv; >> > Ugh. We really just need to fix this code. It assumes particular >> > structure layouts and that's just wrong/dumb. >> >> Well, it's a neat trick ... we just need to adjust it to >> >> static inline bool >> dv_is_decl_p (decl_or_value dv) >> { >> return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; } >> >> I think (and hope for the 'decl' case the bits inspected are never 'VALUE'). Of course the above stinks from a TBAA perspective ... >> >> Any "real" fix would require allocating storage for a discriminator and thus hurt the resource constrained var-tracking a lot. >> >> Richard. 
>> > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 63+ messages in thread
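Editorial illustration of the question raised in the quoted exchange (which bits get inspected when 'dv' is really a tree). This is a small model, not GCC code. It assumes the code field is allocated from the low bits of the first header word, which is ABI-dependent. The names make_header, read_code and looks_like_value, and the tree code value 257, are hypothetical and chosen only for illustration; VALUE being 1 is taken from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Model of a node header word: the code sits in the low `code_bits` bits,
// everything else (mode or flags) is packed directly above it.
// Assumption: low-bits-first allocation; real bit-field layout is ABI-dependent.
constexpr std::uint32_t make_header(std::uint32_t code, unsigned code_bits,
                                    std::uint32_t rest) {
  return (code & ((1u << code_bits) - 1u)) | (rest << code_bits);
}

// Read the code back, pretending the field is `reader_bits` wide.
constexpr std::uint32_t read_code(std::uint32_t header, unsigned reader_bits) {
  return header & ((1u << reader_bits) - 1u);
}

// VALUE is 1 according to the thread.
constexpr std::uint32_t VALUE_CODE = 1;

// The layout-dependent check: "is the code I see equal to VALUE?"
constexpr bool looks_like_value(std::uint32_t header, unsigned reader_bits) {
  return read_code(header, reader_bits) == VALUE_CODE;
}

// An rtx-style header: 8-bit code VALUE, followed by a nonzero mode (5).
constexpr std::uint32_t rtx_value_header = make_header(VALUE_CODE, 8, 5);

// A tree-style header: 16-bit code whose low 8 bits happen to be 1.
// 257 is a hypothetical code picked to show the collision.
constexpr std::uint32_t tree_header = make_header(257, 16, 0);
```

In this model, reading rtx_value_header with a 16-bit reader pulls in the mode bits, so a genuine VALUE is no longer recognised. Reading tree_header with an 8-bit reader sees only the low byte, so a tree whose code ends in 0x01 would be mistaken for a VALUE. Both failures come from the reader and the writer disagreeing about the field width.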
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-05-09 10:16 ` Richard Sandiford @ 2023-05-09 10:26 ` Richard Biener 2023-05-09 11:50 ` Li, Pan2 0 siblings, 1 reply; 63+ messages in thread From: Richard Biener @ 2023-05-09 10:26 UTC (permalink / raw) To: Richard Sandiford Cc: Li, Pan2, Jeff Law, Kito Cheng, juzhe.zhong, gcc-patches, palmer, jakub

On Tue, 9 May 2023, Richard Sandiford wrote:

> "Li, Pan2" <pan2.li@intel.com> writes:
> > After the bits patch like below.
> >
> > rtx_def code 16 => 8 bits.
> > rtx_def mode 8 => 16 bits.
> > tree_base code unchanged.
> >
> > The structure layout of both the rtx_def and tree_base will be something similar as below. As I understand, the lower 8-bits of tree_base will be inspected when 'dv' is a tree for the rtx conversion.
> >
> > tree_base                 rtx_def
> > code: 16                  code: 8
> > side_effects_flag: 1      mode: 16
>
> I think we should try hard to avoid that though. The 16-bit value should
> be aligned to 16 bits if at all possible. decl_or_value doesn't seem
> like something that should be dictating our approach here.
>
> Perhaps we can use pointer_mux for decl_or_value instead? pointer_mux is
> intended to be a standards-compliant (hah!) way of switching between two
> pointer types in a reasonably efficient way.

Ah, I wasn't aware of that. Yes, that looks good to use, I think. Pan, can you prepare a patch that only converts the var-tracking decl_or_value type? That is, make it

  typedef pointer_mux<rtx_def, tree_node> decl_or_value;

and adjust the uses?

Thanks,
Richard.

> Thanks,
> Richard
>
> > constant_flag: 1
> > addressable_flag: 1
> > volatile_flag: 1
> > readonly_flag: 1
> > asm_written_flag: 1
> > nowarning_flag: 1
> > visited: 1
> > used_flag: 1
> > nothrow_flag: 1
> > static_flag: 1
> > public_flag: 1
> > private_flag: 1
> > protected_flag: 1
> > deprecated_flag: 1
> > default_def_flag: 1
> >
> > I have a try a similar approach (as below) as you mentioned, aka shrink tree_code as 1:1 overlap to rtx_code. And completed one memory allocated bytes test in another email.
> >
> > rtx_def code 16 => 12 bits.
> > rtx_def mode 8 => 12 bits.
> > tree_base code 16 => 12 bits.
> >
> > Pan

--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
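Editorial illustration of the tagged-pointer idea behind the pointer_mux suggestion in the message above. This is a simplified stand-in written for this note, not GCC's pointer_mux. The names tagged_ptr, fake_rtx, fake_tree and decl_or_value_sketch are hypothetical, and the sketch assumes both pointee types are at least 2-byte aligned so that bit 0 of any valid address is free to act as the discriminator.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for rtx_def and tree_node; only alignment matters.
struct alignas(8) fake_rtx { int code; };
struct alignas(8) fake_tree { int code; };

// One pointer-sized word that holds either a T1* or a T2*.
// Bit 0 is the tag: clear means "first type", set means "second type".
template <typename T1, typename T2>
class tagged_ptr {
  static_assert(alignof(T1) >= 2 && alignof(T2) >= 2,
                "bit 0 of the address must be free for the tag");
  std::uintptr_t bits_ = 0;
  explicit tagged_ptr(std::uintptr_t bits) : bits_(bits) {}

public:
  tagged_ptr() = default;  // null: neither first nor second

  static tagged_ptr first(T1 *p) {
    return tagged_ptr(reinterpret_cast<std::uintptr_t>(p));
  }
  static tagged_ptr second(T2 *p) {
    // Keep null as all-zero so is_null() stays a single comparison.
    return tagged_ptr(p ? (reinterpret_cast<std::uintptr_t>(p) | 1u) : 0u);
  }

  bool is_null() const { return bits_ == 0; }
  bool is_second() const { return (bits_ & 1u) != 0; }
  bool is_first() const { return !is_null() && !is_second(); }

  // Return the stored pointer if it has the requested type, else nullptr.
  T1 *as_first() const {
    return is_first() ? reinterpret_cast<T1 *>(bits_) : nullptr;
  }
  T2 *as_second() const {
    return is_second()
               ? reinterpret_cast<T2 *>(bits_ & ~std::uintptr_t(1))
               : nullptr;
  }
};

// Mirrors the shape of the typedef proposed in the thread.
typedef tagged_ptr<fake_rtx, fake_tree> decl_or_value_sketch;
```

The point of the design is that the discriminator lives in the pointer itself, so nothing depends on rtx_def and tree_base sharing a code field of the same width, and no extra storage is allocated per entry.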
* RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-05-09 10:26 ` Richard Biener @ 2023-05-09 11:50 ` Li, Pan2 2023-05-10 5:09 ` Li, Pan2 0 siblings, 1 reply; 63+ messages in thread From: Li, Pan2 @ 2023-05-09 11:50 UTC (permalink / raw) To: Richard Biener, Richard Sandiford Cc: Jeff Law, Kito Cheng, juzhe.zhong, gcc-patches, palmer, jakub

Sure thing, I will have a try and keep you posted.

Pan
* RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-05-09 11:50 ` Li, Pan2 @ 2023-05-10 5:09 ` Li, Pan2 2023-05-10 7:22 ` Li, Pan2 0 siblings, 1 reply; 63+ messages in thread From: Li, Pan2 @ 2023-05-10 5:09 UTC (permalink / raw) To: Richard Biener, Richard Sandiford Cc: Jeff Law, Kito Cheng, juzhe.zhong, gcc-patches, palmer, jakub

I have just migrated var-tracking to pointer_mux, and it works well even though the bit size of the tree_base code differs from that of the rtx_def code. I will prepare the PATCH if the X86 bootstrap test shows no surprises.

Thanks, Richard, for pointing out pointer_mux 😉!

Pan
* RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-05-10 5:09 ` Li, Pan2 @ 2023-05-10 7:22 ` Li, Pan2 0 siblings, 0 replies; 63+ messages in thread From: Li, Pan2 @ 2023-05-10 7:22 UTC (permalink / raw) To: Li, Pan2, Richard Biener, Richard Sandiford Cc: Jeff Law, Kito Cheng, juzhe.zhong, gcc-patches, palmer, jakub

Filed the PATCH with var-tracking only as below, please help to review. Thanks!

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617973.html

Pan
* RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-05-06 2:20 ` Li, Pan2 2023-05-06 2:48 ` Li, Pan2 @ 2023-05-08 1:35 ` Li, Pan2 1 sibling, 0 replies; 63+ messages in thread From: Li, Pan2 @ 2023-05-08 1:35 UTC (permalink / raw) To: Kito Cheng Cc: juzhe.zhong, rguenther, richard.sandiford, jeffreyalaw, gcc-patches, palmer, jakub

Here are the updated X86 bytes-allocated results for the changes below (including Kito's patch for the tree common part).

rtx_def code 16 => 12 bits.
rtx_def mode 8 => 12 bits.
tree_base code 16 => 12 bits.

Bytes allocated with O2:
-----------------------------------------------------------------
Benchmark      | upstream     | with the PATCH | change
-----------------------------------------------------------------
400.perlbench  | 25286185160  | 25286590847    | ~0.0%
401.bzip2      | 1429883731   | 1430373103     | ~0.0%
403.gcc        | 55023568981  | 55027574220    | ~0.0%
429.mcf        | 1360975660   | 1360959361     | ~0.0%
445.gobmk      | 12791636502  | 12789648370    | ~0.0%
456.hmmer      | 9354433652   | 9353899089     | ~0.0%
458.sjeng      | 1991260562   | 1991107773     | ~0.0%
462.libquantum | 1725112078   | 1724972077     | ~0.0%
464.h264ref    | 8597673515   | 8597748172     | ~0.0%
471.omnetpp    | 37613034778  | 37614346380    | ~0.0%
473.astar      | 3817295518   | 3817226365     | ~0.0%
483.xalancbmk  | 149418776991 | 149405214817   | ~0.0%

Bytes allocated with Ofast + funroll-loops:
-----------------------------------------------------------------
Benchmark      | upstream     | with the PATCH | change
-----------------------------------------------------------------
400.perlbench  | 30438407499  | 30568217795    | +0.4%
401.bzip2      | 2277114519   | 2318588280     | +1.8%
403.gcc        | 64499664264  | 64764400606    | +0.4%
429.mcf        | 1361486758   | 1399872438     | +2.8%
445.gobmk      | 15258056111  | 15392769408    | +0.9%
456.hmmer      | 10896615649  | 10934649010    | +0.3%
458.sjeng      | 2592620709   | 2641551464     | +1.9%
462.libquantum | 1814487525   | 1856446214     | +2.3%
464.h264ref    | 13528736878  | 13606989269    | +0.6%
471.omnetpp    | 38721066702  | 38908678658    | +0.5%
473.astar      | 3924015756   | 3967867190     | +1.1%
483.xalancbmk  | 165897692838 | 166818255397   | +0.6%

Pan

-----Original Message----- From: Li, Pan2 Sent: Saturday, May 6, 2023 10:20 AM To: Kito Cheng <kito.cheng@gmail.com> Cc: juzhe.zhong@rivai.ai; rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com> Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit Yes, that makes sense, will have a try and keep you posted. Pan -----Original Message----- From: Kito Cheng <kito.cheng@gmail.com> Sent: Saturday, May 6, 2023 10:19 AM To: Li, Pan2 <pan2.li@intel.com> Cc: juzhe.zhong@rivai.ai; rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit I think x86 first? The major thing we want to make sure is that this change won't affect those targets which do not really require 16 bit machine_mode too much. On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Sure thing, I will pick them all together and trigger(will send out the overall diff before start to make sure my understand is correct) the test again. BTW which target do we prefer first? X86 or RISC-V.
> > Pan > > From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai> > Sent: Saturday, May 6, 2023 10:00 AM > To: kito.cheng <kito.cheng@gmail.com>; Li, Pan2 <pan2.li@intel.com> > Cc: rguenther <rguenther@suse.de>; richard.sandiford > <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; > gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; > jakub <jakub@redhat.com> > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from > 8-bit to 16-bit > > Yeah, you should also swap mode and code in rtx_def according to > Richard suggestion since it will not change the rtx_def data structure. > > I think the only problem is the mode in tree data structure. > ________________________________ > juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai> > > From: Kito Cheng<mailto:kito.cheng@gmail.com> > Date: 2023-05-06 09:53 > To: Li, Pan2<mailto:pan2.li@intel.com> > CC: Richard Biener<mailto:rguenther@suse.de>; > 钟居哲<mailto:juzhe.zhong@rivai.ai>; > richard.sandiford<mailto:richard.sandiford@arm.com>; Jeff > Law<mailto:jeffreyalaw@gmail.com>; > gcc-patches<mailto:gcc-patches@gcc.gnu.org>; > palmer<mailto:palmer@dabbelt.com>; jakub<mailto:jakub@redhat.com> > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from > 8-bit to 16-bit Hi Pan: > > Could you try to apply the following diff and measure again? This > makes tree_type_common size unchanged. 
> > > sizeof tree_type_common= 128 (mode = 8 bit) sizeof tree_type_common= > 136 (mode = 16 bit) sizeof tree_type_common= 128 (mode = 8 bit w/ this > diff) > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h index > af795aa81f98..b8ccfa407ed9 100644 > --- a/gcc/tree-core.h > +++ b/gcc/tree-core.h > @@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common { > tree attributes; > unsigned int uid; > > + ENUM_BITFIELD(machine_mode) mode : 16; > + > unsigned int precision : 10; > unsigned no_force_blk_flag : 1; > unsigned needs_constructing_flag : 1; @@ -1687,7 +1689,6 @@ struct > GTY(()) tree_type_common { > unsigned restrict_flag : 1; > unsigned contains_placeholder_bits : 2; > > - ENUM_BITFIELD(machine_mode) mode : 16; > > /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE. > TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE. */ @@ -1712,7 > +1713,7 @@ struct GTY(()) tree_type_common { > unsigned empty_flag : 1; > unsigned indivisible_p : 1; > unsigned no_named_args_stdarg_p : 1; > - unsigned spare : 15; > + unsigned spare : 7; > > alias_set_type alias_set; > tree pointer_to; > > On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches > <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>> wrote: > > > > Yes, totally agree the number cannot be very accurate up to a point. Update the correlated memory bytes allocated for the X86 target. 
> >
> > Bytes allocated with O2:
> > ------------------------------------------------------------
> > Benchmark       |     upstream | with this PATCH
> > ------------------------------------------------------------
> > 400.perlbench   |  25286185160 |  25176544846 ~0.0%
> > 401.bzip2       |   1429883731 |   1391040027 -2.7%
> > 403.gcc         |  55023568981 |  54798890746 ~0.0%
> > 429.mcf         |   1360975660 |   1321537710 -2.9%
> > 445.gobmk       |  12791636502 |  12666523431 -1.0%
> > 456.hmmer       |   9354433652 |   9279189174 ~0.0%
> > 458.sjeng       |   1991260562 |   1944031904 -2.4%
> > 462.libquantum  |   1725112078 |   1684213981 -2.4%
> > 464.h264ref     |   8597673515 |   8528855778 ~0.0%
> > 471.omnetpp     |  37613034778 |  37432278047 ~0.0%
> > 473.astar       |   3817295518 |   3772460508 -1.2%
> > 483.xalancbmk   | 149418776991 | 148545162207 ~0.0%
> >
> > Bytes allocated with Ofast + funroll-loops:
> > ------------------------------------------------------------
> > Benchmark       |     upstream | with this PATCH
> > ------------------------------------------------------------
> > 400.perlbench   |  30438407499 |  30574152897 ~0.0%
> > 401.bzip2       |   2277114519 |   2319432664 +1.9%
> > 403.gcc         |  64499664264 |  64781232731 ~0.0%
> > 429.mcf         |   1361486758 |   1399942116 +2.8%
> > 445.gobmk       |  15258056111 |  15396801542 +1.0%
> > 456.hmmer       |  10896615649 |  10936223486 ~0.0%
> > 458.sjeng       |   2592620709 |   2641687496 +1.9%
> > 462.libquantum  |   1814487525 |   1854518500 +2.2%
> > 464.h264ref     |  13528736878 |  13614517066 ~0.0%
> > 471.omnetpp     |  38721066702 |  38910524667 ~0.0%
> > 473.astar       |   3924015756 |   3968057027 +1.1%
> > 483.xalancbmk   | 165897692838 | 166843885880 ~0.0%
> >
> > Pan
> >
> >
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de<mailto:rguenther@suse.de>>
> > Sent: Friday, May 5, 2023 2:25 PM
> > To: Li, Pan2 <pan2.li@intel.com<mailto:pan2.li@intel.com>>
> > Cc: 钟居哲
<juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>>; > > kito.cheng <kito.cheng@gmail.com<mailto:kito.cheng@gmail.com>>; > > richard.sandiford > > <richard.sandiford@arm.com<mailto:richard.sandiford@arm.com>>; Jeff > > Law <jeffreyalaw@gmail.com<mailto:jeffreyalaw@gmail.com>>; > > gcc-patches > > <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>>; palmer > > <palmer@dabbelt.com<mailto:palmer@dabbelt.com>>; jakub > > <jakub@redhat.com<mailto:jakub@redhat.com>> > > Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size > > from 8-bit to 16-bit > > > > On Fri, 5 May 2023, Li, Pan2 wrote: > > > > > I tried the memory profiling by valgrind --tool=memcheck --trace-children=yes for this change, target the SPEC 2006 INT part with rv64gcv. Note we only count the bytes allocated from valgrind log like this "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated". > > > > > > Consider some variance of valgrind, it looks like the impact to > > > bytes allocated may be limited. However, I am still running this > > > for x86, it will take more than 30 hours for each iteration... > > > > I'm not sure I'd call +- 7% on memory use "limited" - but I fear the numbers are off. Note since various structures reside in GC memory there's also changes to GC overhead and fragmentation, so precise measurements are difficult. > > > > Richard. > > > > > RISC-V GCC Version: > > > >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc > > > >> --version > > > riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 > > > (experimental) Copyright (C) 2023 Free Software Foundation, Inc. > > > This is free software; see the source for copying conditions. > > > There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
> > >
> > > Bytes allocated with O2:
> > > ------------------------------------------------------------
> > > Benchmark       |     upstream | with this PATCH
> > > ------------------------------------------------------------
> > > 400.perlbench   |  29699642875 |  29949876269 ~0.0%
> > > 401.bzip2       |   1641041659 |   1755563972 +6.95%
> > > 403.gcc         |  68447500516 |  68900883291 ~0.0%
> > > 429.mcf         |   1433156462 |   1433253373 ~0.0%
> > > 445.gobmk       |  14239225210 |  14463438465 ~0.0%
> > > 456.hmmer       |   9635955623 |   9808534948 +1.8%
> > > 458.sjeng       |   2419478204 |   2545478940 +5.4%
> > > 462.libquantum  |   1686404489 |   1800884197 +6.8%
> > > 464.h264ref 8j1 |  10190413900 |  10351134161 +1.6%
> > > 471.omnetpp     |  40814627684 |  41185864529 ~0.0%
> > > 473.astar       |   3807097529 |   3928428183 +3.2%
> > > 483.xalancbmk   | 152959418167 | 154201738843 ~0.0%
> > >
> > > Bytes allocated with Ofast + funroll-loops:
> > > ------------------------------------------------------------
> > > Benchmark       |     upstream | with this PATCH
> > > ------------------------------------------------------------
> > > 400.perlbench   |  39491184733 |  39223020267 ~0.0%
> > > 401.bzip2       |   2843871517 |   2730383463 ~0%
> > > 403.gcc         |  84195991898 |  83730632955 -4.0%
> > > 429.mcf         |   1481381164 |   1367309565 -7.7%
> > > 445.gobmk       |  20123943663 |  19886116394 -1.2%
> > > 456.hmmer       |  12302445139 |  12121745383 -1.5%
> > > 458.sjeng       |   3884712615 |   3755481930 -3.3%
> > > 462.libquantum  |   1966619940 |   1852274342 -5.8%
> > > 464.h264ref     |  19219365552 |  19050288201 ~0.0%
> > > 471.omnetpp     |  45701008325 |  45327805079 ~0.0%
> > > 473.astar       |   4118600354 |   3995943705 -3.0%
> > > 483.xalancbmk   | 179481305182 | 178160306301 ~0.0%
> > >
> > > Pan
> > >
> > >
> > > -----Original Message-----
> > > From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org<mailto:gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org>>
On Behalf Of 钟居哲
> > > Sent: Thursday, April 13, 2023 7:23 AM
> > > To: kito.cheng
> > > <kito.cheng@gmail.com<mailto:kito.cheng@gmail.com>>; rguenther
> > > <rguenther@suse.de<mailto:rguenther@suse.de>>
> > > Cc: richard.sandiford
> > > <richard.sandiford@arm.com<mailto:richard.sandiford@arm.com>>;
> > > Jeff Law <jeffreyalaw@gmail.com<mailto:jeffreyalaw@gmail.com>>;
> > > gcc-patches
> > > <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>>; palmer
> > > <palmer@dabbelt.com<mailto:palmer@dabbelt.com>>; jakub
> > > <jakub@redhat.com<mailto:jakub@redhat.com>>
> > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size
> > > from 8-bit to 16-bit
> > >
> > > Yeah, like kito said.
> > > It turns out the tuple type model in ARM SVE is the optimal solution for RVV.
> > > And we like the ARM SVE style implementation.
> > >
> > > And now we see that swapping rtx_code and mode in rtx_def can keep rtx_def overall from exceeding 64 bits.
> > > But it seems that there is still a problem in tree_type_common and tree_decl_common, is that right?
> > >
> > > After several tries (removing all redundant TI/TF vector modes and the FP16 vector modes), there are now 252 modes in the RISC-V port. Basically, I can keep supporting the recent new RVV intrinsic features.
> > > However, we can't support more in the future, for example, FP16 vector, BF16 vector, matrix modes, VLS modes, etc.
> > >
> > > From the RVV side, I think extending the machine mode by 1 more bit should be enough for RVV (512 modes overall).
> > > Is it possible to make that happen in tree_type_common and tree_decl_common, Richards?
> > >
> > > Thank you so much for all comments.
> > > > > > > > > juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai> > > > > > > From: Kito Cheng > > > Date: 2023-04-12 17:31 > > > To: Richard Biener > > > CC: juzhe.zhong@rivai.ai<mailto:juzhe.zhong@rivai.ai>; > > > richard.sandiford; jeffreyalaw; gcc-patches; palmer; jakub > > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size > > > from 8-bit to 16-bit > > > > > The concept of fractional LMUL is the same as the concept of > > > > > AArch64's partial SVE vectors, so they can only access the > > > > > lowest part, like SVE's partial vector. > > > > > > > > > > We want to spill/restore the exact size of those modes (1/2, > > > > > 1/4, 1/8), so adding dedicated modes for those partial vector > > > > > modes should be unavoidable IMO. > > > > > > > > > > And even if we use sub-vector, we still need to define those > > > > > partial vector types. > > > > > > > > Could you use integer modes for the fractional vectors? > > > > > > You mean using the scalar integer mode like using (subreg:SI > > > (reg:VNx4SI) 0) to represent > > > LMUL=1/4? > > > (Assume VNx4SI is mode for M1) > > > > > > If so I think it might not be able to model that right - it seems like we are using 32-bits but actually we are using poly_int16(1, 1) * 32 bits. > > > > > > > For computation you can always appropriately limit the LEN? > > > > > > RVV provide zvl*b extension like zvl<N>b (e.g.zvl128b or zvl256b) > > > to guarantee the vector length is at least larger than N bits, but > > > it's just guarantee the minimal length like SVE guarantee the > > > minimal vector length is 128 bits > > > > > > > > > > -- > > Richard Biener <rguenther@suse.de<mailto:rguenther@suse.de>> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 > > Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, > > Boudien Moerman; HRB 36809 (AG Nuernberg) > ^ permalink raw reply [flat|nested] 63+ messages in thread
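Illustrative note on the `tree_type_common` repacking quoted above (the diff that moves `mode` up and shrinks `spare` from 15 to 7 bits): the sketch below is a simplified model, not the real GCC declaration. The struct names and the merged flag groups `flags_a`, `flags_b`, `flags_c` are hypothetical, and the sizes in the comments assume a typical LP64 host such as x86-64 Linux with GCC, since bit-field layout is implementation-defined. It shows why widening `mode` in place spills into an extra 32-bit unit, while reordering the fields and giving up 8 spare bits keeps the size unchanged.

```cpp
#include <cstddef>

// Before the patch: two full 32-bit bit-field units after uid.
struct type_mode8 {
  void *attributes;
  unsigned uid;
  unsigned precision : 10;
  unsigned flags_a : 6;            // 16 bits so far
  unsigned mode : 8;               // 24
  unsigned flags_b : 8;            // 32: first unit is full
  unsigned align : 6;
  unsigned warn_if_not_align : 6;
  unsigned flags_c : 4;
  unsigned spare : 15;             // 31 bits in the second unit
  int alias_set;
  void *pointer_to;
};

// mode widened in place: the bit-fields no longer fit in two units.
struct type_mode16_in_place {
  void *attributes;
  unsigned uid;
  unsigned precision : 10;
  unsigned flags_a : 6;
  unsigned mode : 16;              // first unit is full here
  unsigned flags_b : 8;
  unsigned align : 6;
  unsigned warn_if_not_align : 6;
  unsigned flags_c : 4;            // 24 bits in the second unit
  unsigned spare : 15;             // does not fit, starts a third unit
  int alias_set;
  void *pointer_to;
};

// mode moved next to precision, spare reduced by 8 bits.
struct type_mode16_repacked {
  void *attributes;
  unsigned uid;
  unsigned mode : 16;
  unsigned precision : 10;
  unsigned flags_a : 6;            // 32: first unit is full
  unsigned flags_b : 8;
  unsigned align : 6;
  unsigned warn_if_not_align : 6;
  unsigned flags_c : 4;
  unsigned spare : 7;              // 31 bits in the second unit
  int alias_set;
  void *pointer_to;
};

// Growth in bytes relative to the 8-bit layout (8 and 0 on the assumed ABI).
constexpr std::size_t growth_in_place =
    sizeof(type_mode16_in_place) - sizeof(type_mode8);
constexpr std::size_t growth_repacked =
    sizeof(type_mode16_repacked) - sizeof(type_mode8);
```

On the assumed ABI the in-place variant grows by 8 bytes, which matches the 128 to 136 byte change reported for the real structure, and the repacked variant does not grow.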
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-10 14:54 ` Jeff Law 2023-04-10 15:02 ` juzhe.zhong 2023-04-10 15:14 ` juzhe.zhong @ 2023-04-10 15:18 ` Jakub Jelinek 2023-04-10 15:22 ` juzhe.zhong ` (2 more replies) 2 siblings, 3 replies; 63+ messages in thread From: Jakub Jelinek @ 2023-04-10 15:18 UTC (permalink / raw) To: Jeff Law Cc: juzhe.zhong, gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote: > This is likely going to be very controversial. It's going to increase the > size of two of most heavily used data structures in GCC (rtx and trees). > > The first thing I would ask is whether or not we really need the full matrix > in practice or if we can combine some of the modes. > > Why hasn't aarch64 stumbled over this problem? From what I can see, x86 has 130 modes and aarch64 178 right now. Jakub ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-10 15:18 ` Jakub Jelinek @ 2023-04-10 15:22 ` juzhe.zhong 2023-04-10 20:42 ` Jeff Law [not found] ` <20230410232205400970205@rivai.ai> 2023-04-10 20:36 ` Jeff Law 2 siblings, 1 reply; 63+ messages in thread From: juzhe.zhong @ 2023-04-10 15:22 UTC (permalink / raw) To: jakub, Jeff Law Cc: gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther [-- Attachment #1: Type: text/plain, Size: 1233 bytes --] Yeah, aarch64 already has 178, RVV has much more types than aarch64... You can see intrinsic doc: https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md api number explodes. As well as tuples types in RVV much more than aarch64. Maybe we need to ask RVV api doc maintainer to reduce types && api of RVV? Not sure. I think kito may help for this. juzhe.zhong@rivai.ai From: Jakub Jelinek Date: 2023-04-10 23:18 To: Jeff Law CC: juzhe.zhong; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote: > This is likely going to be very controversial. It's going to increase the > size of two of most heavily used data structures in GCC (rtx and trees). > > The first thing I would ask is whether or not we really need the full matrix > in practice or if we can combine some of the modes. > > Why hasn't aarch64 stumbled over this problem? From what I can see, x86 has 130 modes and aarch64 178 right now. Jakub ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-10 15:22 ` juzhe.zhong @ 2023-04-10 20:42 ` Jeff Law 2023-04-10 23:03 ` juzhe.zhong 2023-04-11 1:36 ` juzhe.zhong 0 siblings, 2 replies; 63+ messages in thread From: Jeff Law @ 2023-04-10 20:42 UTC (permalink / raw) To: juzhe.zhong, jakub Cc: gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther On 4/10/23 09:22, juzhe.zhong@rivai.ai wrote: > Yeah, aarch64 already has 178, RVV has much more types than aarch64... > You can see intrinsic doc: > https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md <https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md> > api number explodes. > > As well as tuples types in RVV much more than aarch64. > Maybe we need to ask RVV api doc maintainer to reduce types && api of RVV? > Not sure. > I think kito may help for this. I think it's a discussion we need to have. I really expect efforts to have > 256 modes are going to be very controversial. jeff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-10 20:42 ` Jeff Law @ 2023-04-10 23:03 ` juzhe.zhong 2023-04-11 1:36 ` juzhe.zhong 1 sibling, 0 replies; 63+ messages in thread From: juzhe.zhong @ 2023-04-10 23:03 UTC (permalink / raw) To: Jeff Law, jakub Cc: gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther [-- Attachment #1: Type: text/plain, Size: 1313 bytes --] Another feasible solution: Maybe we can drop supporting segment intrinsics in upstream GCC. We let the downstream companies support segment in their own downstream GCC ? juzhe.zhong@rivai.ai From: Jeff Law Date: 2023-04-11 04:42 To: juzhe.zhong; jakub CC: gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit On 4/10/23 09:22, juzhe.zhong@rivai.ai wrote: > Yeah, aarch64 already has 178, RVV has much more types than aarch64... > You can see intrinsic doc: > https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md <https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md> > api number explodes. > > As well as tuples types in RVV much more than aarch64. > Maybe we need to ask RVV api doc maintainer to reduce types && api of RVV? > Not sure. > I think kito may help for this. I think it's a discussion we need to have. I really expect efforts to have > 256 modes are going to be very controversial. jeff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-10 20:42 ` Jeff Law 2023-04-10 23:03 ` juzhe.zhong @ 2023-04-11 1:36 ` juzhe.zhong 1 sibling, 0 replies; 63+ messages in thread From: juzhe.zhong @ 2023-04-11 1:36 UTC (permalink / raw) To: jeffreyalaw, jakub Cc: gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther [-- Attachment #1: Type: text/plain, Size: 1356 bytes --] Hi, I have checked SDNode in LLVM, which is a data structure similar to RTX in GCC. An SDNode in LLVM occupies 80 bytes. Is there a tool we can use to measure the memory consumption of the whole of GCC with an extended-size RTX? juzhe.zhong@rivai.ai From: Jeff Law Date: 2023-04-11 04:42 To: juzhe.zhong; jakub CC: gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit On 4/10/23 09:22, juzhe.zhong@rivai.ai wrote: > Yeah, aarch64 already has 178, RVV has much more types than aarch64... > You can see intrinsic doc: > https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md <https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md> > api number explodes. > > As well as tuples types in RVV much more than aarch64. > Maybe we need to ask RVV api doc maintainer to reduce types && api of RVV? > Not sure. > I think kito may help for this. I think it's a discussion we need to have. I really expect efforts to have > 256 modes are going to be very controversial. jeff ^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <20230410232205400970205@rivai.ai>]
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit [not found] ` <20230410232205400970205@rivai.ai> @ 2023-04-10 15:33 ` juzhe.zhong 2023-04-10 20:39 ` Jeff Law 0 siblings, 1 reply; 63+ messages in thread From: juzhe.zhong @ 2023-04-10 15:33 UTC (permalink / raw) To: jakub, Jeff Law Cc: gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther [-- Attachment #1: Type: text/plain, Size: 2868 bytes --] I saw many redundant scalar modes: E_CDImode, /* machmode.def:267 */ #define HAVE_CDImode #ifdef USE_ENUM_MODES #define CDImode E_CDImode #else #define CDImode (complex_mode ((complex_mode::from_int) E_CDImode)) #endif E_CTImode, /* machmode.def:267 */ #define HAVE_CTImode #ifdef USE_ENUM_MODES #define CTImode E_CTImode #else #define CTImode (complex_mode ((complex_mode::from_int) E_CTImode)) #endif E_HCmode, /* machmode.def:269 */ #define HAVE_HCmode #ifdef USE_ENUM_MODES #define HCmode E_HCmode #else #define HCmode (complex_mode ((complex_mode::from_int) E_HCmode)) #endif E_SCmode, /* machmode.def:269 */ #define HAVE_SCmode #ifdef USE_ENUM_MODES #define SCmode E_SCmode #else #define SCmode (complex_mode ((complex_mode::from_int) E_SCmode)) #endif E_DCmode, /* machmode.def:269 */ #define HAVE_DCmode #ifdef USE_ENUM_MODES #define DCmode E_DCmode #else #define DCmode (complex_mode ((complex_mode::from_int) E_DCmode)) #endif E_TCmode, /* machmode.def:269 */ #define HAVE_TCmode #ifdef USE_ENUM_MODES #define TCmode E_TCmode #else #define TCmode (complex_mode ((complex_mode::from_int) E_TCmode)) #endif ... These scalar modes are redundant I think, can we forbid them? There are 40+ scalar modes that are not used. juzhe.zhong@rivai.ai From: juzhe.zhong@rivai.ai Date: 2023-04-10 23:22 To: jakub; Jeff Law CC: gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit Yeah, aarch64 already has 178, RVV has much more types than aarch64... 
You can see intrinsic doc: https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md api number explodes. As well as tuples types in RVV much more than aarch64. Maybe we need to ask RVV api doc maintainer to reduce types && api of RVV? Not sure. I think kito may help for this. juzhe.zhong@rivai.ai From: Jakub Jelinek Date: 2023-04-10 23:18 To: Jeff Law CC: juzhe.zhong; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote: > This is likely going to be very controversial. It's going to increase the > size of two of most heavily used data structures in GCC (rtx and trees). > > The first thing I would ask is whether or not we really need the full matrix > in practice or if we can combine some of the modes. > > Why hasn't aarch64 stumbled over this problem? From what I can see, x86 has 130 modes and aarch64 178 right now. Jakub ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-10 15:33 ` juzhe.zhong @ 2023-04-10 20:39 ` Jeff Law 0 siblings, 0 replies; 63+ messages in thread From: Jeff Law @ 2023-04-10 20:39 UTC (permalink / raw) To: juzhe.zhong, jakub Cc: gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther On 4/10/23 09:33, juzhe.zhong@rivai.ai wrote: > I saw many redundant scalar modes: > > E_CDImode, /* machmode.def:267 */ > #define HAVE_CDImode > #ifdef USE_ENUM_MODES > #define CDImode E_CDImode > #else > #define CDImode (complex_mode ((complex_mode::from_int) E_CDImode)) > #endif > E_CTImode, /* machmode.def:267 */ > #define HAVE_CTImode > #ifdef USE_ENUM_MODES > #define CTImode E_CTImode > #else > #define CTImode (complex_mode ((complex_mode::from_int) E_CTImode)) > #endif > E_HCmode, /* machmode.def:269 */ > #define HAVE_HCmode > #ifdef USE_ENUM_MODES > #define HCmode E_HCmode > #else > #define HCmode (complex_mode ((complex_mode::from_int) E_HCmode)) > #endif > E_SCmode, /* machmode.def:269 */ > #define HAVE_SCmode > #ifdef USE_ENUM_MODES > #define SCmode E_SCmode > #else > #define SCmode (complex_mode ((complex_mode::from_int) E_SCmode)) > #endif > E_DCmode, /* machmode.def:269 */ > #define HAVE_DCmode > #ifdef USE_ENUM_MODES > #define DCmode E_DCmode > #else > #define DCmode (complex_mode ((complex_mode::from_int) E_DCmode)) > #endif > E_TCmode, /* machmode.def:269 */ > #define HAVE_TCmode > #ifdef USE_ENUM_MODES > #define TCmode E_TCmode > #else > #define TCmode (complex_mode ((complex_mode::from_int) E_TCmode)) > #endif > ... > > These scalar modes are redundant I think, can we forbid them? > There are 40+ scalar modes that are not used. Those are fairly standard complex modes. Those are unlikely to go away. Some of those might be redundant with 2 element vector modes, but I'd hesitate to do something like using CDI to represent a 2XDI vector. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-10 15:18 ` Jakub Jelinek 2023-04-10 15:22 ` juzhe.zhong [not found] ` <20230410232205400970205@rivai.ai> @ 2023-04-10 20:36 ` Jeff Law 2023-04-10 22:53 ` juzhe.zhong 2 siblings, 1 reply; 63+ messages in thread From: Jeff Law @ 2023-04-10 20:36 UTC (permalink / raw) To: Jakub Jelinek Cc: juzhe.zhong, gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther On 4/10/23 09:18, Jakub Jelinek wrote: > On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote: >> This is likely going to be very controversial. It's going to increase the >> size of two of most heavily used data structures in GCC (rtx and trees). >> >> The first thing I would ask is whether or not we really need the full matrix >> in practice or if we can combine some of the modes. >> >> Why hasn't aarch64 stumbled over this problem? > > From what I can see, x86 has 130 modes and aarch64 178 right now. To put it another way. Why does RISC-V have so many more modes than AArch64. Jeff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-10 20:36 ` Jeff Law @ 2023-04-10 22:53 ` juzhe.zhong 0 siblings, 0 replies; 63+ messages in thread From: juzhe.zhong @ 2023-04-10 22:53 UTC (permalink / raw) To: Jeff Law, jakub Cc: gcc-patches, kito.cheng, palmer, richard.sandiford, rguenther [-- Attachment #1: Type: text/plain, Size: 1158 bytes --] I don't know. Maybe we can ask the rvv-intrinsic-doc maintainers why so many tuple types are defined, and whether they can reduce the API and the tuple types? I am going to remove all FP16 vector modes to see whether that brings the number of machine modes down to <= 256. I think that will probably help. juzhe.zhong@rivai.ai From: Jeff Law Date: 2023-04-11 04:36 To: Jakub Jelinek CC: juzhe.zhong; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit On 4/10/23 09:18, Jakub Jelinek wrote: > On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote: >> This is likely going to be very controversial. It's going to increase the >> size of two of most heavily used data structures in GCC (rtx and trees). >> >> The first thing I would ask is whether or not we really need the full matrix >> in practice or if we can combine some of the modes. >> >> Why hasn't aarch64 stumbled over this problem? > > From what I can see, x86 has 130 modes and aarch64 178 right now. To put it another way. Why does RISC-V have so many more modes than AArch64. Jeff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit 2023-04-10 14:48 [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit juzhe.zhong 2023-04-10 14:54 ` Jeff Law @ 2023-04-10 15:10 ` Jakub Jelinek 1 sibling, 0 replies; 63+ messages in thread From: Jakub Jelinek @ 2023-04-10 15:10 UTC (permalink / raw) To: juzhe.zhong Cc: gcc-patches, kito.cheng, palmer, jeffreyalaw, richard.sandiford, rguenther On Mon, Apr 10, 2023 at 10:48:08PM +0800, juzhe.zhong@rivai.ai wrote: > * rtl.h (struct GTY): Ditto. > --- a/gcc/rtl.h > +++ b/gcc/rtl.h > @@ -313,7 +313,7 @@ struct GTY((desc("0"), tag("0"), > ENUM_BITFIELD(rtx_code) code: 16; > > /* The kind of value the expression has. */ > - ENUM_BITFIELD(machine_mode) mode : 8; > + ENUM_BITFIELD(machine_mode) mode : 16; > > /* 1 in a MEM if we should keep the alias set for this mem unchanged > when we access a component. At least for struct rtx_def this is certainly unacceptable. The widely used structure is carefully laid out so that it doesn't waste any bits - there are 16 + 8 + 8 bits, then 32-bit union, and then union of something that needs on 64-bit hosts 64-bit alignment. So header nicely 64 bits before the variable sized payloads. The above change grows that to 16 + 16 + 8 bits, the 32-bit union needs 32-bit alignment, so that is already 96 bits, and then the payload which needs 64-bit alignment, so the above change grows the rtl header by 100%, from 64-bits to 128-bits. > --- a/gcc/tree-core.h > +++ b/gcc/tree-core.h > @@ -1693,7 +1693,7 @@ struct GTY(()) tree_type_common { > unsigned restrict_flag : 1; > unsigned contains_placeholder_bits : 2; > > - ENUM_BITFIELD(machine_mode) mode : 8; > + ENUM_BITFIELD(machine_mode) mode : 16; > > /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE. > TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE. */ This structure has 15 spare bits, so in theory it could be accomodated to handle more bits for mode, but the above change is insufficient for that. 
> @@ -1776,7 +1776,7 @@ struct GTY(()) tree_decl_common { > struct tree_decl_minimal common; > tree size; > > - ENUM_BITFIELD(machine_mode) mode : 8; > + ENUM_BITFIELD(machine_mode) mode : 16; > > unsigned nonlocal_flag : 1; > unsigned virtual_flag : 1; I think this one has 13 spare bits, but again one would need to adjust the structure more so that it doesn't grow unnecessarily. I think you should try hard to avoid having too many modes, there are a lot of arrays especially in RA sized by number of modes or even that times number of register classes (I thought we have some number of modes ^ 2 but can't find them right now), and if there is no way to avoid that, we should consider making those changes dependent on maximum number of modes and use current more compact compile time memory data structures unless the target has more than 256 modes. Jakub ^ permalink raw reply [flat|nested] 63+ messages in thread
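Illustrative note on the `rtx_def` header arithmetic in the message above: the sketch below is a simplified model, not the real `rtx_def`. The names `rtx_header_mode8`, `rtx_header_mode16`, `rtx_header_12_12`, `small_union` and `payload_union` are hypothetical, and the eight one-bit flags are merged into a single `flags : 8` field. The values in the comments assume a typical 64-bit host such as x86-64 Linux with GCC, where the payload union needs 64-bit alignment. It measures the header as the offset of the payload.

```cpp
#include <cstddef>

union small_union   { int rt_int; unsigned rt_uint; };   // 32-bit union
union payload_union { long long hwint; void *ptr; };     // 64-bit aligned payload

// Current layout: 16 + 8 + 8 bits, then the 32-bit union.
struct rtx_header_mode8 {
  unsigned code : 16;
  unsigned mode : 8;
  unsigned flags : 8;
  small_union u2;
  payload_union u;
};

// Patch as posted: 16 + 16 + 8 bits no longer fit in 32 bits.
struct rtx_header_mode16 {
  unsigned code : 16;
  unsigned mode : 16;
  unsigned flags : 8;
  small_union u2;
  payload_union u;
};

// Later proposal in this thread: code 12 bits, mode 12 bits.
struct rtx_header_12_12 {
  unsigned code : 12;
  unsigned mode : 12;
  unsigned flags : 8;
  small_union u2;
  payload_union u;
};

// Header size in bits, measured as the offset of the payload.
constexpr std::size_t header_bits_mode8  = offsetof(rtx_header_mode8, u) * 8;   // 64 on the assumed ABI
constexpr std::size_t header_bits_mode16 = offsetof(rtx_header_mode16, u) * 8;  // 128 on the assumed ABI
constexpr std::size_t header_bits_12_12  = offsetof(rtx_header_12_12, u) * 8;   // 64 on the assumed ABI
```

Under these assumptions the 16-bit `mode` doubles the header from 64 to 128 bits, as the message states, while the 12 + 12 + 8 split keeps it at 64 bits and still allows up to 4096 modes.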