From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR06-AM7-obe.outbound.protection.outlook.com (mail-am7eur06olkn2020.outbound.protection.outlook.com [40.92.16.20]) by sourceware.org (Postfix) with ESMTPS id 648113857C44 for ; Fri, 21 May 2021 07:30:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 648113857C44 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=hotmail.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=bernd.edlinger@hotmail.de ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=SyLsfz2YrehdjI7VQqEY4RSMP6Jlcd5B1lc3NbO2YrV5DR25FjEnlzZwlU4dXjk7h1OHd5UhZOMJbcATXxOzl+0+1Zlj7uw8/guO/mHQFZURfGOovhY+54xGzUHysLAXQBhF/iRUiBJxpmoIsK9uHbIPOX+RdM8VMroTM5X5M/t6Cw7UbPHSGNfSh2QSJdiwFpo9AZMLkN9gyQ+kkqJnUsIg+V9vDdI8Tge72iIAo0mshZbPzMuEWUpnIJ55qzm4S+x33yrTm0ysRETbCKNz1ieFYKlL7oeVvIHx+wX8SJyqgVPAmzaSLOlWG7j/R/gUzB8pGpUlVglTb17mDgPunA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=NPxtRiHVqU/3gwBHo7vxTxG76jB0byRFwjnoaRVV7a0=; b=b1hld+H09pqfEWt7QFkPbkL2mmmmt+M0TMWA+ia3wW/Ajz1OPHpawy7yOX0SLXR1NxosXDiG3EAbR2HURuF/5ekUoL6tlmSMUCV8mqwLDxw4n+ER626jpOqeXjEWd+tXxe++WKRTWB5ezG8f4kN7Bf1mpRzLyvZjib8Cf1pQop5bq/EikpO2qUYBAGyhzT4iMe/Blxcgtcu32fN1JPKQMjF2S1SxQHuAIZPOC8auK82E98EC1PDMJpjvUfzZ0aQDz0LVkb1yiNpJdA+e6CIoNFU0Oqf3XZeWaCLFxJWiNMmWHSsjF39WHVKZCXJBHXKdTgfBZ+V4aIZlIcPELLAKow== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none Received: from DB8EUR06FT058.eop-eur06.prod.protection.outlook.com (2a01:111:e400:fc35::46) by DB8EUR06HT127.eop-eur06.prod.protection.outlook.com (2a01:111:e400:fc35::66) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4129.25; Fri, 21 May 2021 07:30:15 +0000 Received: from AM8PR10MB4708.EURPRD10.PROD.OUTLOOK.COM (2a01:111:e400:fc35::53) by DB8EUR06FT058.mail.protection.outlook.com (2a01:111:e400:fc35::431) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4129.25 via Frontend Transport; Fri, 21 May 2021 07:30:15 +0000 X-IncomingTopHeaderMarker: OriginalChecksum:BC01D0BA16223DE08C2E3357B323CE35CE7F534910D97FD7021EB900659520B7; UpperCasedChecksum:ECE5739F91EF142BDD2CCA1E216F53B165D0DADC58A341DD1BEF9E4D43B54832; SizeAsReceived:9222; Count:48 Received: from AM8PR10MB4708.EURPRD10.PROD.OUTLOOK.COM ([fe80::e41b:107f:af82:150a]) by AM8PR10MB4708.EURPRD10.PROD.OUTLOOK.COM ([fe80::e41b:107f:af82:150a%7]) with mapi id 15.20.4129.035; Fri, 21 May 2021 07:30:15 +0000 Subject: Re: [PATCH] constructor: Elide expand_constructor when can move by pieces is true To: Richard Biener , "H.J. Lu" Cc: GCC Patches , Richard Sandiford , Uros Bizjak References: <20210518191646.3071005-1-hjl.tools@gmail.com> <20210518191646.3071005-13-hjl.tools@gmail.com> From: Bernd Edlinger Message-ID: Date: Fri, 21 May 2021 09:30:13 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-TMN: [oL6lgZoDVCxQ+oSncQt3bwfgB4niMnu2] X-ClientProxiedBy: FR0P281CA0005.DEUP281.PROD.OUTLOOK.COM (2603:10a6:d10:15::10) To AM8PR10MB4708.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:20b:364::23) X-Microsoft-Original-Message-ID: MIME-Version: 1.0 X-MS-Exchange-MessageSentRepresentingType: 1 Received: from [192.168.1.101] (84.57.61.94) by FR0P281CA0005.DEUP281.PROD.OUTLOOK.COM (2603:10a6:d10:15::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4173.12 via Frontend Transport; Fri, 21 May 2021 07:30:15 +0000 X-MS-PublicTrafficType: Email X-IncomingHeaderCount: 48 X-EOPAttributedMessage: 0 X-MS-Office365-Filtering-Correlation-Id: a54d1e59-4e52-49fd-b144-08d91c2a46e8 X-MS-TrafficTypeDiagnostic: DB8EUR06HT127: X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: ATbWCFU81Mw6WExYzHIhqbJv/ZI+5FzsjytCaoCvQLdAoGDvMZoCbP4UafkajIZNKq0Vs1GtsX+NH14QBU1k7L/Nqx5dUWPM70hkYERRa8UEidmbGFo5PsI3q25vxVWbwWI9PdkP4jgKtfgsrYBM8D7GKx84sfk3u7fjgYnqJIna6mNb8zhiki1AXADW7SXBi+M7q/r0a7FSH208ND/Ebs/wcIxr6uFwJdJwjhN97Ml+uFbsyp7k0cMgbFyNvyk39EicZPvywWS+7SbXOLAwNP+556VhZEtt3VlM910o6HD3L266iddyO/skhxzeMg3dfEFMPH7b+dmm8KNeAwi6S3xdqz4D2HYTSGh57WRVMjIDBt8yFc5MYAzJ/j/54MhQjtRL7Q6OfLq9lc9wsE0Jb914sMPdUGjdR3ZA2T3BwAkHUiPTCi8U03DmorIBC2AxnnmHqP8L8MN3pSKiV39jB5b6iF/xZnxRenLDzdj2Ihc= X-MS-Exchange-AntiSpam-MessageData: FCzlfkquAMZJX3D49EQsmc8z2gm+COQtP9rGI5KeZOxWKRhmjyn3xwrAH05XuIhyVVzURMLHn38/b8su8VK1hDeVaP6SYcphk0v+qMLq2yBwP9oIopqSCoDwXogcUCm7unB/1EV5BjqtBuW4TExsJA== X-OriginatorOrg: outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: a54d1e59-4e52-49fd-b144-08d91c2a46e8 X-MS-Exchange-CrossTenant-OriginalArrivalTime: 21 May 2021 07:30:15.6899 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa X-MS-Exchange-CrossTenant-AuthSource: DB8EUR06FT058.eop-eur06.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: Internet X-MS-Exchange-CrossTenant-RMS-PersistedConsumerOrg: 00000000-0000-0000-0000-000000000000 X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB8EUR06HT127 X-Spam-Status: No, score=-6.7 required=5.0 tests=BAYES_00, FORGED_MUA_MOZILLA, FREEMAIL_FROM, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, LOTS_OF_MONEY, MONEY_NOHTML, MSGID_FROM_MTA_HEADER, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2021 07:30:20 -0000 On 5/21/21 8:57 AM, Richard Biener wrote: > On Thu, May 20, 2021 at 4:04 PM H.J. Lu wrote: >> >> On Thu, May 20, 2021 at 12:51 AM Richard Biener >> wrote: >>> >>> On Wed, May 19, 2021 at 3:22 PM H.J. Lu wrote: >>>> >>>> On Wed, May 19, 2021 at 2:33 AM Richard Biener >>>> wrote: >>>>> >>>>> On Tue, May 18, 2021 at 9:16 PM H.J. Lu wrote: >>>>>> >>>>>> When expanding a constant constructor, don't call expand_constructor if >>>>>> it is more efficient to load the data from the memory via move by pieces. >>>>>> >>>>>> gcc/ >>>>>> >>>>>> PR middle-end/90773 >>>>>> * expr.c (expand_expr_real_1): Don't call expand_constructor if >>>>>> it is more efficient to load the data from the memory. >>>>>> >>>>>> gcc/testsuite/ >>>>>> >>>>>> PR middle-end/90773 >>>>>> * gcc.target/i386/pr90773-24.c: New test. >>>>>> * gcc.target/i386/pr90773-25.c: Likewise. >>>>>> --- >>>>>> gcc/expr.c | 10 ++++++++++ >>>>>> gcc/testsuite/gcc.target/i386/pr90773-24.c | 22 ++++++++++++++++++++++ >>>>>> gcc/testsuite/gcc.target/i386/pr90773-25.c | 20 ++++++++++++++++++++ >>>>>> 3 files changed, 52 insertions(+) >>>>>> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-24.c >>>>>> create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-25.c >>>>>> >>>>>> diff --git a/gcc/expr.c b/gcc/expr.c >>>>>> index d09ee42e262..80e01ea1cbe 100644 >>>>>> --- a/gcc/expr.c >>>>>> +++ b/gcc/expr.c >>>>>> @@ -10886,6 +10886,16 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, >>>>>> unsigned HOST_WIDE_INT ix; >>>>>> tree field, value; >>>>>> >>>>>> + /* Check if it is more efficient to load the data from >>>>>> + the memory directly. FIXME: How many stores do we >>>>>> + need here if not moved by pieces? */ >>>>>> + unsigned HOST_WIDE_INT bytes >>>>>> + = tree_to_uhwi (TYPE_SIZE_UNIT (type)); >>>>> >>>>> that's prone to fail - it could be a VLA. >>>> >>>> What do you mean by fail? Is it ICE or missed optimization? >>>> Do you have a testcase? >>>> >>>>> >>>>>> + if ((bytes / UNITS_PER_WORD) > 2 >>>>>> + && MOVE_MAX_PIECES > UNITS_PER_WORD >>>>>> + && can_move_by_pieces (bytes, TYPE_ALIGN (type))) >>>>>> + goto normal_inner_ref; >>>>>> + >>>>> >>>>> It looks like you're concerned about aggregate copies but this also handles >>>>> non-aggregates (which on GIMPLE might already be optimized of course). >>>> >>>> Here I check if we copy more than 2 words and we can move more than >>>> a word in a single instruction. >>>> >>>>> Also you say "if it's cheaper" but I see no cost considerations. How do >>>>> we generally handle immed const vs. load from constant pool costs? >>>> >>>> This trades 2 (update to 8) stores with one load plus one store. Is there >>>> a way to check which one is faster? >>> >>> I'm not sure - it depends on whether the target can do stores from immediates >>> at all or what restrictions apply, what the immediate value actually is >>> (zero or all-ones should be way cheaper than sth arbitrary) and how the >>> pressure on the load unit is. can_move_by_pieces (bytes, TYPE_ALIGN (type)) >>> also does not guarantee it will actually move pieces larger than UNITS_PER_WORD, >>> that might depend on alignment. There's by_pieces_ninsns that might provide >>> some hint here. >>> >>> I'm sure it works well for x86. >>> >>> I wonder if the existing code is in the appropriate place and we >>> shouldn't instead >>> handle this somewhere upthread where we ask to copy 'exp' into some other >>> memory location. For your testcase that's expand_assignment but I can >>> imagine passing array[0] by value to a function resulting in similar copying. >>> Testing that shows we get >>> >>> pushq array+56(%rip) >>> .cfi_def_cfa_offset 24 >>> pushq array+48(%rip) >>> .cfi_def_cfa_offset 32 >>> pushq array+40(%rip) >>> .cfi_def_cfa_offset 40 >>> pushq array+32(%rip) >>> .cfi_def_cfa_offset 48 >>> pushq array+24(%rip) >>> .cfi_def_cfa_offset 56 >>> pushq array+16(%rip) >>> .cfi_def_cfa_offset 64 >>> pushq array+8(%rip) >>> .cfi_def_cfa_offset 72 >>> pushq array(%rip) >>> .cfi_def_cfa_offset 80 >>> call bar >>> >>> for that. We do have the by-pieces infrastructure to generally do this kind of >>> copying but in both of these cases we do not seem to use it. I also wonder >>> if the by-pieces infrastructure can pick up constant initializers automagically >>> (we could native_encode the initializer part and feed the by-pieces >>> infrastructure with an array of bytes). There for example might be easy to >>> immediate-store byte parts and difficult ones where we could decide on a >>> case-by-case basis whether to load+store or immediate-store them. >> >> I opened: >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100704 >> >>> For example if I change your testcase to have the array[] initializer >>> all-zero we currently emit >>> >>> pxor %xmm0, %xmm0 >>> movups %xmm0, (%rdi) >>> movups %xmm0, 16(%rdi) >>> movups %xmm0, 32(%rdi) >>> movups %xmm0, 48(%rdi) >>> ret >>> >>> will your patch cause us to emit 4 loads? OTHO if I do >>> >>> const struct S array[] = { >>> { 0, 0, 0, 7241, 124764, 48, 16, 33, 10, 96, 2, 0, 0, 4 } >>> }; >>> >>> we get >>> >>> movq $0, (%rdi) >>> movl $0, 8(%rdi) >>> movl $0, 12(%rdi) >>> movl $7241, 16(%rdi) >>> ... >>> >>> ideally we'd have sth like >>> >>> pxor %xmm0, %xmm0 >>> movups %xmm0, (%rdi) >>> movaps array+16(%rip), %xmm0 >>> movups %xmm0, 16(%rdi) >>> ... >>> >>> thus have the zeros written as immediates and the remaining pieces >>> with load+stores. >>> >>> The by-pieces infrastructure eventually get's to see >>> >>> (mem/u/c:BLK (symbol_ref:DI ("array") [flags 0x2] >> 0x7ffff7ff5b40 array>) [1 array+0 S64 A256]) >>> >>> where the MEM_EXPR should provide a way to access the constant initializer. >>> >>> That said I do agree the current code is a bit premature optimization >>> - but maybe >>> it should be fend off in expand_constructor which has the cheap clear_storage >>> first and which already does check can_move_by_pieces with some heuristics, >>> but that seems to be guarded by >>> >>> || (tree_fits_uhwi_p (TYPE_SIZE_UNIT (type)) >>> && (! can_move_by_pieces >>> (tree_to_uhwi (TYPE_SIZE_UNIT (type)), >>> TYPE_ALIGN (type))) >>> && ! mostly_zeros_p (exp)))) >>> >>> which is odd (we _can_ move by pieces, but how does this apply to >>> TREE_CONSTANT CTORs and avoid_temp_mem?). >>> >>> That said, I wonder if we want to elide expand_constructor when the >>> CTOR is TREE_STATIC && TREE_CONSTANT and !mostly_zeros_p >>> and we can_move_by_pieces. >>> >>> So sth like >>> >>> diff --git a/gcc/expr.c b/gcc/expr.c >>> index 7139545d543..76b3bdf0c01 100644 >>> --- a/gcc/expr.c >>> +++ b/gcc/expr.c >>> @@ -8504,6 +8504,12 @@ expand_constructor (tree exp, rtx target, enum >>> expand_modifier modifier, >>> && (! can_move_by_pieces >>> (tree_to_uhwi (TYPE_SIZE_UNIT (type)), >>> TYPE_ALIGN (type))) >>> + && ! mostly_zeros_p (exp)) >>> + || (TREE_CONSTANT (exp) >>> + && tree_fits_uhwi_p (TYPE_SIZE_UNIT (type)) >>> + && (can_move_by_pieces >>> + (tree_to_uhwi (TYPE_SIZE_UNIT (type)), >>> + TYPE_ALIGN (type))) >>> && ! mostly_zeros_p (exp)))) >>> || ((modifier == EXPAND_INITIALIZER || modifier == EXPAND_CONST_ADDRESS) >>> && TREE_CONSTANT (exp))) >>> >>> which handles your initializer and the all-zero one optimal? >>> >> >> It works. Here is the updated patch. > > So just looking at the code again I think we probably want to add > && avoid_temp_mem here, at least that's the case we're looking > at. Not sure if we ever arrive with TREE_CONSTANT CTORs > and !avoid_temp_mem but if so we'd create a temporary here > which of course would be pointless. > > So maybe it's then clearer to split the condition out as > > diff --git a/gcc/expr.c b/gcc/expr.c > index 7139545d543..ee8f25f9abd 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -8523,6 +8523,19 @@ expand_constructor (tree exp, rtx target, enum > expand_modifier modifier, > return constructor; > } > > + /* If the CTOR is available in static storage and not mostly > + zeros and we can move it by pieces prefer to do so since > + that's usually more efficient than performing a series of > + stores from immediates. */ > + if (avoid_temp_mem > + && TREE_STATIC (exp) > + && TREE_CONSTANT (exp) > + && tree_fits_uhwi_p (TYPE_SIZE_UNIT (type)) > + && can_move_by_pieces (tree_to_uhwi (TYPE_SIZE_UNIT (type)), > + TYPE_ALIGN (type)) > + && ! mostly_zeros_p (exp)) > + return NULL_RTX; > + > /* Handle calls that pass values in multiple non-contiguous > locations. The Irix 6 ABI has examples of this. */ > if (target == 0 || ! safe_from_p (target, exp, 1) > > > OK with that change. > Note however (I've been playing with the previous version) that the test case FAIL: gcc.target/i386/pr90773-25.c scan-assembler-times vmovdqu[\\\\t ]%ymm[0-9]+, \\\\(%[^,]+\\\\) 1 FAIL: gcc.target/i386/pr90773-25.c scan-assembler-times vmovdqu[\\\\t ]%ymm[0-9]+, 32\\\\(%[^,]+\\\\) 1 fails for --target_board=unix $ grep movdqu pr90773-25.s vmovdqu %xmm0, (%rdi) vmovdqu %xmm1, 16(%rdi) vmovdqu %xmm2, 32(%rdi) vmovdqu %xmm3, 48(%rdi) while the test expects %ymm /* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ /* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */ and FAIL: gcc.target/i386/pr90773-24.c scan-assembler-times movups[\\\\t ]%xmm[0-9]+, \\\\(%[^,]+\\\\) 1 FAIL: gcc.target/i386/pr90773-24.c scan-assembler-times movups[\\\\t ]%xmm[0-9]+, 16\\\\(%[^,]+\\\\) 1 FAIL: gcc.target/i386/pr90773-24.c scan-assembler-times movups[\\\\t ]%xmm[0-9]+, 32\\\\(%[^,]+\\\\) 1 FAIL: gcc.target/i386/pr90773-24.c scan-assembler-times movups[\\\\t ]%xmm[0-9]+, 48\\\\(%[^,]+\\\\) 1 FAIL: gcc.target/i386/pr90773-25.c scan-assembler-times vmovdqu[\\\\t ]%ymm[0-9]+, \\\\(%[^,]+\\\\) 1 FAIL: gcc.target/i386/pr90773-25.c scan-assembler-times vmovdqu[\\\\t ]%ymm[0-9]+, 32\\\\(%[^,]+\\\\) 1 FAIL: gcc.target/i386/pr90773-26.c scan-assembler-times pxor[\\\\t ]%xmm[0-9]+, %xmm[0-9]+ 1 FAIL: gcc.target/i386/pr90773-26.c scan-assembler-times movups[\\\\t ]%xmm[0-9]+, \\\\(%[^,]+\\\\) 1 FAIL: gcc.target/i386/pr90773-26.c scan-assembler-times movups[\\\\t ]%xmm[0-9]+, 16\\\\(%[^,]+\\\\) 1 FAIL: gcc.target/i386/pr90773-26.c scan-assembler-times movups[\\\\t ]%xmm[0-9]+, 32\\\\(%[^,]+\\\\) 1 FAIL: gcc.target/i386/pr90773-26.c scan-assembler-times movups[\\\\t ]%xmm[0-9]+, 48\\\\(%[^,]+\\\\) 1 fails for --target_board=unix/-m32 Bernd. > Thanks, > Richard. > >> Thanks. >> >> -- >> H.J.