That’s very cool, thanks Thomas for help! Let’s wait the AMD test running result for the final version of the patch. Pan From: juzhe.zhong@rivai.ai Sent: Friday, June 30, 2023 9:27 AM To: Thomas Schwinge ; Li, Pan2 ; gcc-patches ; rguenther ; jakub Cc: Robin Dapp ; jeffreyalaw ; Wang, Yanzhang ; kito.cheng ; Tobias Burnus Subject: Re: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode Thanks a lot! Really appreciate your help ! That's really helpful for RVV (RISC-V vector). Could you merge your patch after you tested? Thanks. ________________________________ juzhe.zhong@rivai.ai From: Thomas Schwinge Date: 2023-06-30 04:14 To: Pan Li; gcc-patches@gcc.gnu.org; Richard Biener; Jakub Jelinek CC: juzhe.zhong@rivai.ai; rdapp.gcc@gmail.com; jeffreyalaw@gmail.com; yanzhang.wang@intel.com; kito.cheng@gmail.com; Tobias Burnus Subject: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode Hi! On 2023-06-29T11:29:57+0200, I wrote: > On 2023-06-21T15:58:24+0800, Pan Li via Gcc-patches > wrote: >> We extend the machine mode from 8 to 16 bits already. But there still >> one placing missing from the streamer. It has one hard coded array >> for the machine code like size 256. >> >> In the lto pass, we memset the array by MAX_MACHINE_MODE count but the >> value of the MAX_MACHINE_MODE will grow as more and more modes are >> added. While the machine mode array in tree-streamer still leave 256 as is. >> >> Then, when the MAX_MACHINE_MODE is greater than 256, the memset of >> lto_output_init_mode_table will touch the memory out of range unexpected. > > Uh. :-O > >> This patch would like to take the MAX_MACHINE_MODE as the size of the >> array in streamer, to make sure there is no potential unexpected >> memory access in future. Meanwhile, this patch also adjust some place >> which has MAX_MACHINE_MODE <= 256 assumption. > > Thanks to Jakub and Richard for guidance re the offloading compilation > case, where we've got different 'MAX_MACHINE_MODE's between stream-out > and stream-in, and a modes mapping table. > > However, with this patch, there are ICEs all over the place... I'm > having a look. Your patch has all the right ideas, there are just a few additional changes necessary. Please merge in the attached "f into Streamer: Fix out of range memory access of machine mode", with 'Co-authored-by: Thomas Schwinge >'. This has already survived compiler-side 'lto.exp' testing and 'check-target-libgomp' with Nvidia GPU offloading; AMD GPU testing is now running (not expecting any bad surprises). Will let you know by (my) tomorrow morning in case there are any more problems. Explanation: >> --- a/gcc/lto-streamer-in.cc >> +++ b/gcc/lto-streamer-in.cc >> @@ -1985,8 +1985,6 @@ lto_input_mode_table (struct lto_file_decl_data *file_data) >> internal_error ("cannot read LTO mode table from %s", >> file_data->file_name); >> >> - unsigned char *table = ggc_cleared_vec_alloc (1 << 8); >> - file_data->mode_table = table; >> const struct lto_simple_header_with_strings *header >> = (const struct lto_simple_header_with_strings *) data; >> int string_offset; >> @@ -1998,16 +1996,22 @@ lto_input_mode_table (struct lto_file_decl_data *file_data) >> header->string_size, vNULL); >> bitpack_d bp = streamer_read_bitpack (&ib); >> >> + unsigned mode_bits = bp_unpack_value (&bp, 5); >> + unsigned char *table = ggc_cleared_vec_alloc (1 << mode_bits); >> + >> + file_data->mode_table = table; >> + file_data->mode_bits = mode_bits; Here, we set 'file_data->mode_bits' for the offloading case (where 'lto_input_mode_table' is called) -- but it's not set for the non-offloading case (where 'lto_input_mode_table' isn't called). (See my 'gcc/lto/lto-common.cc:lto_read_decls' change.) That's "not currently a problem", as 'file_data->mode_bits' isn't used anywhere... >> --- a/gcc/lto-streamer.h >> +++ b/gcc/lto-streamer.h >> @@ -604,6 +604,8 @@ struct GTY(()) lto_file_decl_data >> int order_base; >> >> int unit_base; >> + >> + unsigned mode_bits; >> }; >> inline machine_mode >> bp_unpack_machine_mode (struct bitpack_d *bp) >> { >> - return (machine_mode) >> - ((class lto_input_block *) >> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)]; >> + int last = 1 << ceil_log2 (MAX_MACHINE_MODE); >> + lto_input_block *input_block = (class lto_input_block *) bp->stream; >> + int index = bp_unpack_enum (bp, machine_mode, last); >> + >> + return (machine_mode) input_block->mode_table[index]; >> } ..., but 'file_data->mode_bits' needs to be considered here, in the stream-in for offloading, where 'file_data->mode_bits' -- that is, the host 'MAX_MACHINE_MODE' -- very likely is different from the offload device 'MAX_MACHINE_MODE'. Easiest is in 'gcc/lto-streamer.h:class lto_input_block' to capture 'lto_file_decl_data *file_data' instead of just 'unsigned char *mode_table', and adjust all users. That's it. :-) >> --- a/gcc/tree-streamer.h >> +++ b/gcc/tree-streamer.h >> @@ -108,15 +108,19 @@ inline void >> bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode) >> { >> streamer_mode_table[mode] = 1; >> - bp_pack_enum (bp, machine_mode, 1 << 8, mode); >> + int last = 1 << ceil_log2 (MAX_MACHINE_MODE); >> + >> + bp_pack_enum (bp, machine_mode, last, mode); >> } That use of 'MAX_MACHINE_MODE' is safe, as that only concerns the stream-out phase. >> --- a/gcc/tree-streamer.cc >> +++ b/gcc/tree-streamer.cc >> @@ -35,7 +35,7 @@ along with GCC; see the file COPYING3. If not see >> During streaming in, we translate the on the disk mode using this >> table. For normal LTO it is set to identity, for ACCEL_COMPILER >> depending on the mode_table content. */ >> -unsigned char streamer_mode_table[1 << 8]; >> +unsigned char streamer_mode_table[MAX_MACHINE_MODE]; Likewise. Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955