From: "H.J. Lu"
Date: Sun, 6 Feb 2022 16:20:34 -0800
Subject: Re: [PATCH] elf: Replace memcmp with __memcmpeq for variable size
To: Florian Weimer
Cc: "H.J. Lu via Libc-alpha"
References: <20220206210914.1593336-1-hjl.tools@gmail.com> <874k5b3afx.fsf@mid.deneb.enyo.de>
In-Reply-To: <874k5b3afx.fsf@mid.deneb.enyo.de>

On Sun, Feb 6, 2022 at 2:19 PM Florian Weimer wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > diff --git a/elf/dl-cache.c b/elf/dl-cache.c
> > index 88bf78ad7c..8574d4ded1 100644
> > --- a/elf/dl-cache.c
> > +++ b/elf/dl-cache.c
> > @@ -72,7 +72,7 @@ glibc_hwcaps_compare (uint32_t left_index, struct dl_hwcaps_priority *right)
> >      to_compare = left_name_length;
> >    else
> >      to_compare = right->name_length;
> > -  int cmp = memcmp (left_name, right->name, to_compare);
> > +  int cmp = __memcmpeq (left_name, right->name, to_compare);
> >    if (cmp != 0)
> >      return cmp;
> >    if (left_name_length < right->name_length)
>
> This change is not correct.

Fixed.

> The x86-specific optimization probably applies to other
> targets as well.  I also do not quite see where the performance

It is useful only if __memcmpeq isn't an alias of memcmp.  I certainly
don't mind moving x86-64 to generic.

> benefits come from.  None of the changed spots look particularly hot
> to me.

These are noises.
Here are the new data:

The cycles to run "elf/tst-relsort1 --direct" which calls __memcmpeq
24 times in ld.so:

Before:

     62704:
     62704: runtime linker statistics:
     62704:   total startup time in dynamic loader: 130771 cycles
     62704:             time needed for relocation: 32153 cycles (24.5%)
     62704:                  number of relocations: 97
     62704:       number of relocations from cache: 3
     62704:         number of relative relocations: 1347
     62704:            time needed to load objects: 43704 cycles (33.4%)
     62704:
     62704: runtime linker statistics:
     62704:            final number of relocations: 131
     62704: final number of relocations from cache: 3

After:

     62705:
     62705: runtime linker statistics:
     62705:   total startup time in dynamic loader: 117103 cycles
     62705:             time needed for relocation: 28327 cycles (24.1%)
     62705:                  number of relocations: 97
     62705:       number of relocations from cache: 3
     62705:         number of relative relocations: 1347
     62705:            time needed to load objects: 39550 cycles (33.7%)
     62705:
     62705: runtime linker statistics:
     62705:            final number of relocations: 131
     62705: final number of relocations from cache: 3

These numbers change for each run.  __memcmpeq has the lower cycles.

> And I thought the assumption was that GCC would perform this optimization?

Yes, but it will take a while to implement it in GCC.

-- 
H.J.
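
For readers following the thread, here is a minimal sketch of why the dl-cache.c hunk needed fixing: glibc_hwcaps_compare returns the comparison result as an ordering, and an equality-only compare guarantees nothing about the sign of a nonzero result. Everything below is illustrative and not taken from the patch. eq_only_compare is a hypothetical stand-in that models the weaker contract (it is not glibc's __memcmpeq), and name_order, name_order_broken and name_equal are made-up helpers that only loosely follow the shape of the quoted code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an equality-only compare: returns 0 when the
   buffers match and some nonzero value otherwise.  The sign of the
   nonzero value carries no meaning.  This is NOT glibc's __memcmpeq.  */
int
eq_only_compare (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n) != 0;
}

/* Three-way ordering of two counted names.  This needs memcmp, because
   the sign of the result is returned to the caller.  */
int
name_order (const char *l, size_t ll, const char *r, size_t rl)
{
  size_t n = ll < rl ? ll : rl;
  int cmp = memcmp (l, r, n);
  if (cmp != 0)
    return cmp;
  /* Common prefix is equal: the shorter name sorts first.  */
  return (ll > rl) - (ll < rl);
}

/* Same function with the equality-only compare swapped in.  Any
   mismatch now yields a positive value, so the ordering is lost.  */
int
name_order_broken (const char *l, size_t ll, const char *r, size_t rl)
{
  size_t n = ll < rl ? ll : rl;
  int cmp = eq_only_compare (l, r, n);
  if (cmp != 0)
    return cmp;
  return (ll > rl) - (ll < rl);
}

/* A pure equality test only checks the result against zero, so the
   equality-only compare is a safe replacement here.  */
int
name_equal (const char *l, size_t ll, const char *r, size_t rl)
{
  return ll == rl && eq_only_compare (l, r, ll) == 0;
}

/* Small self-check that exercises the three helpers.  */
void
self_check (void)
{
  assert (name_order ("abc", 3, "abd", 3) < 0);
  assert (name_order_broken ("abc", 3, "abd", 3) > 0); /* wrong sign */
  assert (name_equal ("abc", 3, "abc", 3));
  assert (!name_equal ("abc", 3, "abd", 3));
}
```

Calling self_check () from a main function runs the checks. The point of the sketch is that the substitution is only safe at call sites that test the result against zero and never use it as an ordering.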