From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id F105C3857BB2 for ; Fri, 10 Nov 2023 15:13:03 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F105C3857BB2 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=foss.arm.com Authentication-Results: sourceware.org; spf=none smtp.mailfrom=foss.arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org F105C3857BB2 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699629185; cv=none; b=NJS3FpbS84cGMNAs0vBkmE7iLBhNNQliLznmqiOvr+9KcVEJ03bRfixAHJQiZaiLLvu/rp6BshjjcgEDdMI6Bee5c9RsZPTHzsw/DMiOwkmK15ckfQOD8ciyD5vBjoR1G9N/yZWQm+7Mt72je0kSPZdkekLEsAsO7VY7GQwfZrI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699629185; c=relaxed/simple; bh=G2AP0iZVHGnFLSFhngFHFHW74oOaUQP7T5D+3ZpCV64=; h=Message-ID:Date:MIME-Version:Subject:To:From; b=wwL+fnFR7h7NG47KBWxURfVi9HJJaKcTcsPjalf70dbY3lsZ0j2zhiPziaZ5OsyQN75iCKJ3yMuzp1rixN/fygTLFOl3mu1y/lknzwq6QISM3nboAVeAR4EnAzdfLsJAO6Wf5aSV+3RalHf/YAnQ2L/VzP1PQTtvR9OzuMB8ZWQ= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5567112FC; Fri, 10 Nov 2023 07:13:48 -0800 (PST) Received: from [10.57.1.86] (unknown [10.57.1.86]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C079F3F6C4; Fri, 10 Nov 2023 07:13:02 -0800 (PST) Message-ID: <61c6e268-188c-4b35-956d-bd8927d705f2@foss.arm.com> Date: Fri, 10 Nov 2023 15:13:01 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] AArch64: Cleanup memset expansion Content-Language: en-GB To: Kyrylo Tkachov , Wilco Dijkstra , GCC Patches Cc: Richard Sandiford , Richard Earnshaw References: <372b9689-24b5-41f4-a990-5aee0226e15f@foss.arm.com> From: Richard Earnshaw In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3489.2 required=5.0 tests=BAYES_00,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 10/11/2023 14:46, Kyrylo Tkachov wrote: > > >> -----Original Message----- >> From: Richard Earnshaw >> Sent: Friday, November 10, 2023 11:31 AM >> To: Wilco Dijkstra ; Kyrylo Tkachov >> ; GCC Patches >> Cc: Richard Sandiford ; Richard Earnshaw >> >> Subject: Re: [PATCH] AArch64: Cleanup memset expansion >> >> >> >> On 10/11/2023 10:17, Wilco Dijkstra wrote: >>> Hi Kyrill, >>> >>>> +  /* Reduce the maximum size with -Os.  */ >>>> +  if (optimize_function_for_size_p (cfun)) >>>> +    max_set_size = 96; >>>> + >>> >>>> .... This is a new "magic" number in this code. It looks sensible, but how >> did you arrive at it? >>> >>> We need 1 instruction to create the value to store (DUP or MOVI) and 1 STP >>> for every 32 bytes, so the 96 means 4 instructions for typical sizes >>> (sizes not >>> a multiple of 16 can add one extra instruction). > > It would be useful to have that reasoning in the comment. > >>> >>> I checked codesize on SPECINT2017, and 96 had practically identical size. >>> Using 128 would also be a reasonable Os value with a very slight size >>> increase, >>> and 384 looks good for O2 - however I didn't want to tune these values >>> as this >>> is a cleanup patch. >>> >>> Cheers, >>> Wilco >> >> Shouldn't this be a param then? Also, manifest constants in the middle >> of code are a potential nightmare, please move it to a #define (even if >> that's then used as the default value for the param). > > I agree on making this a #define but I wouldn't insist on a param. > Code size IMO has a much more consistent right or wrong answer as it's statically determinable. > It this was a speed-related param then I'd expect the flexibility for the power user to override such heuristics would be more widely useful. > But for code size the compiler should always be able to get it right. > > If Richard would still like the param then I'm fine with having the param, but I'd be okay with the comment above and making this a #define. I don't immediately have a feel for how sensitive code would be to the precise value here. Is this value something that might affect individual benchmarks in different ways? Or something where a future architecture might want a different value? For either of those reasons a param might be useful, but if this is primarily a code size trade off and the variation in performance is small, then it's probably not worthwhile having an additional hook. R.