From: Adhemerval Zanella
To: Wilco Dijkstra, Noah Goldstein
Cc: "H.J. Lu", GNU C Library
Subject: Re: [PATCH v2] x86-64: Optimize bzero
Date: Thu, 10 Feb 2022 10:16:34 -0300
References: <20220208224319.40271-1-hjl.tools@gmail.com>
List-Id: Libc-alpha mailing list

On 10/02/2022 10:10, Adhemerval Zanella wrote:
>
>
> On 10/02/2022 10:01, Wilco Dijkstra wrote:
>> Hi,
>>
>>>> The saving is in the lane-cross broadcast which is on the critical
>>>> path for memsets in [VEC_SIZE, 2 * VEC_SIZE] (think 32-64).
>>
>> What is the speedup in eg. bench-memset? Generally the OoO engine will
>> be able to hide a small increase in latency, so I'd be surprised it shows up
>> as a significant gain.
>>
>> If you can show a good speedup in an important application (or benchmark
>> like SPEC2017) then it may be worth pursuing. However there are other
>> optimization opportunities that may be easier or give a larger benefit.
>>
>>>> Agreed it's not clear if it's worth it to start replacing memset calls with
>>>> bzero calls, but at the very least this will improve existing code that
>>>> uses bzero.
>>
>> No code uses bzero, no compiler emits bzero. It died 2 decades ago...
>>
>>> My point is this is a lot of code and infrastructure for a symbol marked
>>> as legacy for POSIX.1-2001 and removed on POSIX.1-2008 for the sake of
>>> marginal gains in specific cases.
>>
>> Indeed, what we really should discuss is how to remove the last traces of
>> bcopy and bcmp from GLIBC.
>> Do we need to keep a compatibility symbol
>> or could we just get rid of it altogether?
>
> We need to keep the symbols as-is afaiu, since callers might still target
> old POSIX where the symbol is defined as supported. We might add either
> compiler or linker warning stating the symbols is deprecated, but imho it
> would just be better if we stop trying to microoptimize it and just use
> the generic interface (which call memset).

And it makes even more sense since gcc either generates a memset call or
expands bzero itself, so there is even less point in adding either an
optimized version for it or ifunc variants that call memset (to avoid the
intra-PLT call).

I will probably clean up all the bzero optimizations we have. It is
unfortunate that this patch got in without further discussion.
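
For reference, the generic interface mentioned above amounts to roughly the
sketch below. This is an illustration only, not the actual glibc source, and
generic_bzero is a hypothetical name chosen so the example does not clash
with the libc symbol.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a generic bzero: no architecture-specific
   code and no ifunc selection, it simply forwards to memset.  */
void
generic_bzero (void *s, size_t n)
{
  /* Zero n bytes starting at s; bzero returns nothing, so the
     pointer returned by memset is discarded.  */
  memset (s, 0, n);
}
```

With this shape, any tuning done for memset is picked up by bzero callers as
well, at the cost of one extra call in code the compiler did not already
turn into memset.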