From: Alexander Monakov
To: Ralf Jung
cc: Paul Eggert, Adhemerval Zanella Netto, libc-alpha
Subject: Re: Support for memcpy with equal source and destination
Date: Mon, 27 Nov 2023 17:25:48 +0300 (MSK)
Message-ID: <9f8ea9e5-f158-3a8a-8585-1f3860700905@ispras.ru>

On Mon, 27 Nov 2023, Ralf Jung wrote:

> > Please note that GCC does not use memcpy for "sufficiently small" structure
> > copies at -O2, as it's faster to emit the necessary loads+stores inline.
> >
> > The threshold for "sufficiently small" varies with target and compiler
> > version; for instance, it is "above 64 bytes" for 32-bit arm and "above 8192
> > bytes" for x86_64 with current trunk (it also depends on default
> > -march/-mtune, etc.).
>
> Wow, 8 kilobytes?!?

It inlines 'rep movsq' for sizes between 32 and 8192 for generic tuning,
but it varies substantially for different CPU families. For example,
with -mtune=znver[234] the thresholds are:

* memcpy is used above 64 bytes, or if size is unknown
* 'rep movsq' above 16 bytes (up to 64 bytes)

> > So on x86 at least adding such a branch in memcpy is not a practical choice.
>
> Sorry, how does that follow from GCC not using memcpy for small copies?

If you add a branch into memcpy, every single invocation of memcpy will pay
a tiny cost for that branch, even though it matters only for copying huge
structs, which is vanishingly rare compared to all uses of memcpy.

Alexander
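
For illustration only (this is not part of the original message): a minimal C
sketch of the two things being discussed. The names `struct big`, `copy_big`
and `memcpy_checked` are hypothetical, and the 16384-byte size is an assumption
chosen to sit above the 8192-byte generic x86_64 threshold mentioned above, so
that the compiler is likely to emit a memcpy call for the assignment. Whether
it actually does depends on target, tuning and compiler version.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type, assumed large enough that the compiler may call memcpy
   for the assignment below instead of expanding the copy inline. */
struct big {
    unsigned char bytes[16384];
};

/* Plain structure assignment. If dst == src and the compiler lowers this to
   a memcpy call, that call has exactly equal source and destination. */
void copy_big(struct big *dst, const struct big *src)
{
    *dst = *src;
}

/* Hypothetical wrapper showing the kind of check under discussion: every
   call pays for the comparison, although it only matters when dst == src. */
void *memcpy_checked(void *dst, const void *src, size_t n)
{
    if (dst == src)   /* the added branch */
        return dst;
    return memcpy(dst, src, n);
}
```

The comparison in `memcpy_checked` runs on every call, including the common
small copies, while the equal-pointer case can only reach memcpy through
copies too large to be expanded inline.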