Date: Tue, 9 May 2023 07:04:07 +0000 (UTC)
From: Richard Biener
To: "Li, Pan2"
cc: Jeff Law, Kito Cheng, "juzhe.zhong@rivai.ai", "richard.sandiford", gcc-patches, palmer, jakub
Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
References: <20230410144808.324346-1-juzhe.zhong@rivai.ai> <436847c8-0c15-24de-5925-f56d78caf540@gmail.com>

On Tue, 9 May 2023, Li, Pan2 wrote:

> Update the memory allocated bytes for both the all 12-bits patch and
> code 8-bits + mode 16-bits.

Just to throw in a comment here - for IL, tree/GIMPLE is the more
important part since the whole program will be in tree/GIMPLE while
we only have a single function in RTL at a time.

Some host archs will have difficulties loading unaligned words, so it
is important to keep often accessed larger bitfields aligned to allow
efficient access (aligned load + mask, no shifts). That means ideally
machine_mode will be 16 bits and code 8 or 16 bits.

I think shrinking RTX code is a good idea, we're unlikely to run out
of bits there. Shrinking RTX code means you have to re-order code and
mode (see above about alignment), which will complicate the
var-tracking "fixup".

We are going to run out of bits in tree_type_common, we've been
handing them out without much care recently :/

Richard.

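The alignment point above can be illustrated with a minimal sketch. It is not GCC's rtx_def: the type names `HeaderAligned` and `HeaderPacked`, the spare `flags` byte, and the bit positions are all assumptions made for illustration. The first layout keeps a 16-bit mode on a naturally aligned halfword, so reading it is a plain load; the second packs 12-bit code and 12-bit mode into one word, so reading the mode needs a shift as well as a mask.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout A ("code 8 bits + mode 16 bits"): the 16-bit mode
// sits at byte offset 2, i.e. on its natural alignment.
struct HeaderAligned {
  std::uint8_t code;    // 8-bit code
  std::uint8_t flags;   // 8 spare bits (illustrative only)
  std::uint16_t mode;   // 16-bit mode, readable with one aligned load
};
static_assert(sizeof(HeaderAligned) == 4, "header fits in one 32-bit word");
static_assert(offsetof(HeaderAligned, mode) == 2, "mode is halfword aligned");

// Hypothetical layout B ("all 12 bits"): code in bits 0..11, mode in
// bits 12..23 of a single 32-bit word.
struct HeaderPacked {
  std::uint32_t word;
};

inline HeaderPacked make_packed(std::uint32_t code, std::uint32_t mode) {
  HeaderPacked h;
  h.word = (code & 0xFFFu) | ((mode & 0xFFFu) << 12);
  return h;
}

// Code starts at bit 0: a mask is enough.
inline std::uint32_t packed_code(const HeaderPacked &h) {
  return h.word & 0xFFFu;
}

// Mode straddles a byte boundary: shift, then mask.
inline std::uint32_t packed_mode(const HeaderPacked &h) {
  return (h.word >> 12) & 0xFFFu;
}
```

With layout A, `h.mode` compiles to a single halfword load on hosts that have one; with layout B every mode access goes through `packed_mode`, which is the extra shift the paragraph above wants to avoid for frequently accessed fields.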
> Bytes allocated with O2:
> ----------------------------------------------------------------------------------------------------
> Benchmark      | upstream     | with the all 12-bits patch | with 8 bits code and 16 bits mode patch
> ----------------------------------------------------------------------------------------------------
> 400.perlbench  |  25286185160 |  25286590847 ~0.0%         |  25286927562 ~0.0%
> 401.bzip2      |   1429883731 |   1430373103 ~0.0%         |   1430401245 ~0.0%
> 403.gcc        |  55023568981 |  55027574220 ~0.0%         |  55028727683 ~0.0%
> 429.mcf        |   1360975660 |   1360959361 ~0.0%         |   1360960745 ~0.0%
> 445.gobmk      |  12791636502 |  12789648370 ~0.0%         |  12789919097 ~0.0%
> 456.hmmer      |   9354433652 |   9353899089 ~0.0%         |   9353990523 ~0.0%
> 458.sjeng      |   1991260562 |   1991107773 ~0.0%         |   1991153851 ~0.0%
> 462.libquantum |   1725112078 |   1724972077 ~0.0%         |   1724983726 ~0.0%
> 464.h264ref    |   8597673515 |   8597748172 ~0.0%         |   8597931771 ~0.0%
> 471.omnetpp    |  37613034778 |  37614346380 ~0.0%         |  37614470890 ~0.0%
> 473.astar      |   3817295518 |   3817226365 ~0.0%         |   3817239631 ~0.0%
> 483.xalancbmk  | 149418776991 | 149405214817 ~0.0%         | 149405744428 ~0.0%
>
> Bytes allocated with Ofast + funroll-loops:
> ----------------------------------------------------------------------------------------------------
> Benchmark      | upstream     | with the all 12-bits patch | with 8 bits code and 16 bits mode patch
> ----------------------------------------------------------------------------------------------------
> 400.perlbench  |  30438407499 |  30568217795 +0.4%         |  30568869401 +0.4%
> 401.bzip2      |   2277114519 |   2318588280 +1.8%         |   2318659896 +1.8%
> 403.gcc        |  64499664264 |  64764400606 +0.4%         |  64766107560 +0.4%
> 429.mcf        |   1361486758 |   1399872438 +2.8%         |   1399876436 +2.8%
> 445.gobmk      |  15258056111 |  15392769408 +0.9%         |  15393305108 +0.9%
> 456.hmmer      |  10896615649 |  10934649010 +0.3%         |  10934858994 +0.4%
> 458.sjeng      |   2592620709 |   2641551464 +1.9%         |   2641641389 +1.9%
> 462.libquantum |   1814487525 |   1856446214 +2.3%         |   1856475555 +2.3%
> 464.h264ref    |  13528736878 |  13606989269 +0.6%         |  13607467432 +0.6%
> 471.omnetpp    |  38721066702 |  38908678658 +0.5%         |  38908940169 +0.5%
> 473.astar      |   3924015756 |   3967867190 +1.1%         |   3967897551 +1.1%
> 483.xalancbmk  | 165897692838 | 166818255397 +0.6%         | 166819397831 +0.6%
>
> Pan
>
>
> -----Original Message-----
> From: Li, Pan2
> Sent: Monday, May 8, 2023 4:06 PM
> To: Richard Biener
> Cc: Jeff Law; Kito Cheng; juzhe.zhong@rivai.ai; richard.sandiford; gcc-patches; palmer; jakub
> Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
>
> The bits after the patch are as below.
>
> rtx_def code 16 => 8 bits.
> rtx_def mode 8 => 16 bits.
> tree_base code unchanged.
>
> The structure layout of both rtx_def and tree_base will be similar to the following. As I understand it, the lower 8 bits of tree_base will be inspected when 'dv' is a tree for the rtx conversion.
>
> tree_base                 rtx_def
> code: 16                  code: 8
> side_effects_flag: 1      mode: 16
> constant_flag: 1
> addressable_flag: 1
> volatile_flag: 1
> readonly_flag: 1
> asm_written_flag: 1
> nowarning_flag: 1
> visited: 1
> used_flag: 1
> nothrow_flag: 1
> static_flag: 1
> public_flag: 1
> private_flag: 1
> protected_flag: 1
> deprecated_flag: 1
> default_def_flag: 1
>
> I have tried a similar approach (as below) to the one you mentioned, i.e. shrinking tree_code to keep the 1:1 overlap with rtx_code, and reported the memory allocated bytes for it in another email.
>
> rtx_def code 16 => 12 bits.
> rtx_def mode 8 => 12 bits.
> tree_base code 16 => 12 bits.
>
> Pan
>
> -----Original Message-----
> From: Richard Biener
> Sent: Monday, May 8, 2023 3:38 PM
> To: Li, Pan2
> Cc: Jeff Law; Kito Cheng; juzhe.zhong@rivai.ai; richard.sandiford; gcc-patches; palmer; jakub
> Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
>
> On Mon, 8 May 2023, Li, Pan2 wrote:
>
> > return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; } is able to
> > fix this ICE after mode bits change.
>
> Can you check which bits this will inspect when 'dv' is a tree after your patch? VALUE is 1 and would map to IDENTIFIER_NODE on the tree side when there was a 1:1 overlap.
>
> I think for all cases but struct loc_exp_dep we could find a bit to record whether we deal with a VALUE or a decl, but for loc_exp_dep it's going to be difficult (unless we start to take bits from pointer representations).
>
> That said, I agree with Jeff that the code is ugly, but a simplistic conversion isn't what we want.
>
> An alternative "solution" might be to also shrink tree_code when we shrink rtx_code and keep the 1:1 overlap.
>
> Richard.
>
> > I will re-trigger the memory allocate bytes test with below changes
> > for X86.
> >
> > rtx_def code 16 => 8 bits.
> > rtx_def mode 8 => 16 bits.
> > tree_base code unchanged.
> >
> > Pan
> >
> > -----Original Message-----
> > From: Li, Pan2
> > Sent: Monday, May 8, 2023 2:42 PM
> > To: Richard Biener; Jeff Law
> > Cc: Kito Cheng; juzhe.zhong@rivai.ai; richard.sandiford; gcc-patches; palmer; jakub
> > Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> >
> > Oops. Actually I am patching a version as you mentioned like storage allocation. Thank you Richard, will try your suggestion and keep you posted.
> >
> > Pan
> >
> > -----Original Message-----
> > From: Richard Biener
> > Sent: Monday, May 8, 2023 2:30 PM
> > To: Jeff Law
> > Cc: Li, Pan2; Kito Cheng; juzhe.zhong@rivai.ai; richard.sandiford; gcc-patches; palmer; jakub
> > Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> >
> > On Sun, 7 May 2023, Jeff Law wrote:
> >
> > > On 5/6/23 19:55, Li, Pan2 wrote:
> > > > It looks like we cannot simply swap the code and mode in rtx_def,
> > > > the code may have to be the same bits as the tree_code in tree_base.
> > > > Or we will meet ICE like below.
> > > >
> > > > rtx_def code 16 => 8 bits.
> > > > rtx_def mode 8 => 16 bits.
> > > >
> > > > static inline decl_or_value
> > > > dv_from_value (rtx value)
> > > > {
> > > >   decl_or_value dv;
> > > >   dv = value;
> > > >   gcc_checking_assert (dv_is_value_p (dv));  <= ICE
> > > >   return dv;
> > > Ugh.  We really just need to fix this code.  It assumes particular
> > > structure layouts and that's just wrong/dumb.
> >
> > Well, it's a neat trick ... we just need to adjust it to
> >
> > static inline bool
> > dv_is_decl_p (decl_or_value dv)
> > {
> >   return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE;
> > }
> >
> > I think (and hope) that for the 'decl' case the bits inspected are never 'VALUE'. Of course the above stinks from a TBAA perspective ...
> >
> > Any "real" fix would require allocating storage for a discriminator and thus hurt the resource constrained var-tracking a lot.
> >
> > Richard.
> >
>
> --
> Richard Biener
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)
>

--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
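
The discriminator trick discussed in the quoted thread can be sketched in isolation. This is a toy model under stated assumptions, not GCC's var-tracking code: `ToyRtx`, `ToyTree`, `toy_is_decl_p`, `TaggedDv` and the field widths are hypothetical, and the only fact taken from the thread is that VALUE has code 1. The sketch reads the leading code bits through `memcpy` rather than a pointer cast, which sidesteps the TBAA concern, and it only works while both node kinds keep their code in the same leading bits and no decl ever carries the VALUE code.

```cpp
#include <cstdint>
#include <cstring>

// VALUE is 1 according to the thread; everything else here is made up.
constexpr std::uint16_t kValueCode = 1;

// Two hypothetical node kinds that both start with a 16-bit code.
struct ToyRtx {
  std::uint16_t code;
  std::uint8_t mode;
};
struct ToyTree {
  std::uint16_t code;
  std::uint16_t flags;
};

// Read the leading 16 bits of either node kind. memcpy from the object
// representation is well defined regardless of the node's static type.
inline std::uint16_t leading_code(const void *node) {
  std::uint16_t c;
  std::memcpy(&c, node, sizeof c);
  return c;
}

// Same shape as the dv_is_decl_p test quoted above: anything whose
// leading code is not VALUE is treated as a decl.
inline bool toy_is_decl_p(const void *dv) {
  return dv == nullptr || leading_code(dv) != kValueCode;
}

// The "real" fix mentioned in the thread: an explicit discriminator.
// It removes the layout assumption but costs storage per entry.
struct TaggedDv {
  const void *ptr;
  bool is_value;
};
```

If `ToyRtx::code` were narrowed to 8 bits while `ToyTree::code` stayed at 16, `leading_code` would pick up mode bits for rtx nodes and the test would misfire, which is the kind of failure the quoted ICE reports. `TaggedDv` avoids that dependency at the price of a larger entry than a bare pointer.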