From: David Brown
Date: Wed, 8 Feb 2023 16:32:32 +0100
Subject: Re: Why does this unrolled function write to the stack?
To: Jonathan Wakely, Gaelan Steele
Cc: gcc-help@gcc.gnu.org
Message-ID: <69488f61-fac0-0c1d-1034-d8d84779f8e2@westcontrol.com>

On 08/02/2023 14:53, Jonathan Wakely via Gcc-help wrote:
> On Wed, 8 Feb 2023 at 13:49, Jonathan Wakely wrote:
>>
>> On Wed, 8 Feb 2023 at 13:31, Gaelan Steele via Gcc-help wrote:
>>>
>>> Hi all,
>>>
>>> In a computer architecture class, we happened across a strange compilation choice by GCC that neither I nor my professor can make much sense of. The source is as follows:
>>>
>>> void foo(int *a, const int *__restrict b, const int *__restrict c)
>>> {
>>>     for (int i = 0; i < 16; i++) {
>>>         a[i] = b[i] + c[i];
>>>     }
>>> }
>>>
>>> I won't reproduce the full compiled output here, as it's rather long, but when compiled with -O3 -mno-avx -mno-sse, GCC 12.2 for x86-64 (via Compiler Explorer: https://godbolt.org/z/o9e4o7cj4) produces an unrolled loop that appears to write each sum into an array on the stack before copying it into the provided pointer a. This seems hugely inefficient - it's doing quite a few memory accesses - and I can't see why it would be necessary.
>>
>> I don't think it's *necessary*. If you use -Os or -O1 or -O2 you get a
>> loop. So it's just an optimization choice at -O3 presumably based on
>> cost estimates that say that fully unrolling the loop will make the
>> code faster than looping.
>>

There's nothing wrong with the loop unrolling. It's the use of space on the stack that's the problem.

> So it does look like GCC is making poor choices here.
>

It seems to be a regression between gcc 10 and gcc 11 (discovered by changing the compiler on godbolt.org). With gcc 11 onwards, the compiler seems to be using the stack to combine two 4-byte elements at a time into a single 8-byte element. It's easy to see the effect by changing the loop size to 2.

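
For reference, a reduced test case along those lines might look like the sketch below. The name foo2 is only for illustration; the body is the original function with the trip count changed from 16 to 2. I'm assuming the same flags as in the original post (-O3 -mno-avx -mno-sse) and comparing gcc 10 against gcc 11 or later on godbolt.org.

```c
/* Reduced test case: the function from the original post with the
   loop trip count changed from 16 to 2.  The name foo2 is hypothetical,
   chosen only to distinguish it from the original foo. */
void foo2(int *a, const int *__restrict b, const int *__restrict c)
{
    for (int i = 0; i < 2; i++) {
        a[i] = b[i] + c[i];   /* two 4-byte results, 8 bytes in total */
    }
}
```

With only two elements the generated code is short enough to read in full, so any store to the stack followed by a reload should be easy to spot in the output of the affected versions.
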
(I've no idea what causes the effect, or how to fix it - but knowing it is a regression might make it easier for you to find.)