From: Adhemerval Zanella
Date: Thu, 10 Feb 2022 09:35:37 -0300
Subject: Re: [PATCH v2] x86-64: Optimize bzero
To: Noah Goldstein
Cc: "H.J. Lu", GNU C Library, Wilco Dijkstra
References: <20220208224319.40271-1-hjl.tools@gmail.com>

On 09/02/2022 19:14, Noah Goldstein wrote:
> On Wed, Feb 9, 2022 at 5:41 AM Adhemerval Zanella via Libc-alpha
> wrote:
>>
>>
>>
>> On 08/02/2022 19:43, H.J. Lu via Libc-alpha wrote:
>>> Rebase against the current master branch.
>>>
>>> --
>>> memset with zero as the value to set is by far the majority value (99%+
>>> for Python3 and GCC).
>>>
>>> bzero can be slightly more optimized for this case by using a zero-idiom
>>> xor for broadcasting the set value to a register (vector or GPR).
>>>
>>> Co-developed-by: Noah Goldstein
>>
>> Is it really worth resurrecting bzero with these multiple ifunc variants?
>> Would Python3/GCC or any programs start to replace memset with bzero for
>> the sake of this optimization?
>>
>> I agree with Wilco that the gains are marginal in this case.
>
> The cost is only 1 cache line and it doesn't interfere with memset at all
> so it's unlikely to cause any problems.
>
> The saving is in the lane-cross broadcast which is on the critical
> path for memsets in [VEC_SIZE, 2 * VEC_SIZE] (think 32-64).
>
> As well for EVEX + AVX512, because it uses predicate execution for
> [0, VEC_SIZE] there is a slight benefit there (although only in throughput
> because the critical path in mask construction is longer than the lane
> VEC setup).
>
> Agreed it's not clear if it's worth it to start replacing memset calls with
> bzero calls, but at the very least this will improve existing code that
> uses bzero.
>

My point is that this is a lot of code and infrastructure for a symbol
marked as legacy in POSIX.1-2001 and removed in POSIX.1-2008, for the sake
of marginal gains in specific cases.
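
For reference, a minimal sketch of the equivalence the thread relies on:
bzero (s, n) has the same observable effect as memset (s, 0, n), so existing
callers can be served by either entry point. The helper name zeroing_matches
and the 256-byte buffer size are hypothetical and only for illustration; the
sketch assumes a libc whose <strings.h> still declares bzero, as glibc does
under _DEFAULT_SOURCE. It says nothing about the performance of either call.

```c
/* Assumption: glibc-style feature test macro so that <strings.h>
   declares the legacy bzero even under strict -std=c11.  */
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>   /* memset, memcmp */
#include <strings.h>  /* bzero (legacy in POSIX.1-2001, removed in 2008) */

/* Hypothetical helper, not part of glibc: fill two buffers with a
   nonzero pattern, clear the first N bytes of one with bzero and of
   the other with memset, and return 1 if both buffers end up
   byte-for-byte identical (including the untouched tail).
   Returns 0 if N exceeds the buffer size.  */
static int
zeroing_matches (size_t n)
{
  unsigned char a[256], b[256];

  if (n > sizeof a)
    return 0;

  memset (a, 0xAA, sizeof a);
  memset (b, 0xAA, sizeof b);

  bzero (a, n);       /* legacy interface */
  memset (b, 0, n);   /* portable equivalent */

  return memcmp (a, b, sizeof a) == 0;
}
```

Calling zeroing_matches with sizes in the 32-64 byte range discussed above
(or any size up to 256) should return 1 on a conforming implementation.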