From: Sunil Pandey
Date: Sun, 28 Apr 2024 09:12:23 -0700
Subject: Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
To: abush wang
Cc: "H.J. Lu", Noah Goldstein, abushwang via Libc-alpha

On Sat, Apr 27, 2024 at 7:13 PM abush wang wrote:

> Actually, I was handling a performance issue from libmicro in our distro OS.
> I found that the performance degradation of the localtime_r benchmark from
> libmicro is attributable to strlen.
> So I abstracted this test case.
>
>
Can you consistently reproduce the strlen perf behaviour by running multiple
times back-to-back? You can see a high swing from run to run.

> On Sat, Apr 27, 2024 at 12:54 AM Sunil Pandey wrote:
>
>>
>>
>> On Fri, Apr 26, 2024 at 6:30 AM H.J. Lu wrote:
>>
>>> On Thu, Apr 25, 2024 at 9:03 PM abush wang wrote:
>>> >
>>> > Hi, H.J.
>>> > When I test glibc performance between 2.28 and 2.38,
>>> > I found there is a performance degradation about strlen.
>>> > In fact, this difference comes from __strlen_avx2 and __strlen_evex
>>> >
>>> > ```
>>> > 2.28
>>> > __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:42
>>> > 42 ENTRY (STRLEN)
>>> >
>>> >
>>> > 2.38
>>> > __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:79
>>> > 79 ENTRY_P2ALIGN (STRLEN, 6)
>>> > ```
>>> >
>>> > This is my test:
>>> > ```
>>> > #include <stdint.h>
>>> > #include <stdio.h>
>>> > #include <string.h>
>>> >
>>> > #define MAX_STRINGS 100
>>> >
>>> > uint64_t rdtsc() {
>>> >     uint32_t lo, hi;
>>> >     __asm__ __volatile__ (
>>> >         "rdtsc" : "=a"(lo), "=d"(hi)
>>> >     );
>>> >     return ((uint64_t)hi << 32) | lo;
>>> > }
>>> >
>>> > int main(int argc, char *argv[]) {
>>> >     char *input_str[MAX_STRINGS];
>>> >     size_t lengths[MAX_STRINGS];
>>> >     int num_strings = 0; // Number of input strings
>>> >     uint64_t start_cycles, end_cycles;
>>> >
>>> >     // Parse command line arguments and store pointers in input_str array
>>> >     for (int i = 1; i < argc && num_strings < MAX_STRINGS; ++i) {
>>> >         input_str[num_strings] = argv[i];
>>> >         num_strings++;
>>> >     }
>>> >
>>> >     // Measure the strlen operation for each string
>>> >     start_cycles = rdtsc();
>>> >     for (int i = 0; i < num_strings; ++i) {
>>> >         lengths[i] = strlen(input_str[i]);
>>> >     }
>>> >     end_cycles = rdtsc();
>>> >
>>> >     unsigned long long total_cycle = end_cycles - start_cycles;
>>> >     unsigned long long av_cycle = total_cycle / num_strings;
>>> >     // Print the total cycles taken for the strlen operations
>>> >     printf("Total cycles: %llu av cycle: %llu \n", total_cycle, av_cycle);
>>> >
>>> >     // Print the recorded lengths
>>> >     printf("Lengths of the input strings:\n");
>>> >     for (int i = 0; i < num_strings; ++i) {
>>> >         printf("String %d length: %zu\n", i, lengths[i]);
>>> >     }
>>> >
>>> >     return 0;
>>> > }
>>> > ```
>>> >
>>> > This is result
>>> > ```
>>> > 2.28
>>> > ./strlen_test str1 str2 str3 str4 str5
>>> > Total cycles: 1468 av cycle: 293
>>> > Lengths of the input strings:
>>> > String 0 length: 4
>>> > String 1 length: 4
>>> > String 2 length: 4
>>> > String 3 length: 4
>>> > String 4 length: 4
>>> >
>>> > 2.38
>>> > ./strlen_test str1 str2 str3 str4 str5
>>> > Total cycles: 1814 av cycle: 362
>>> > Lengths of the input strings:
>>> > String 0 length: 4
>>> > String 1 length: 4
>>> > String 2 length: 4
>>> > String 3 length: 4
>>> > String 4 length: 4
>>> > ```
>>> >
>>> > Thanks,
>>> > abush
>>>
>>
>> I'm not sure how you are measuring the performance of strlen function.
>> Are you making performance conclusion based on these 2 runs?
>>
>> 2.28
>> Total cycles: 1468 av cycle: 293
>>
>> 2.38
>> Total cycles: 1814 av cycle: 362
>>
>> Please use glibc microbenchmark to see if you can reproduce perf drop.
>>
>>
>>>
>>> Which processors did you use? Sunil, Noah, can we reproduce it?
>>>
>>> --
>>> H.J.
>>>
>>
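
For anyone who wants to check the run-to-run swing before reaching for the glibc microbenchmark, here is a rough sketch of a back-to-back measurement. It is not the glibc benchtests harness and not what anyone in this thread ran. The names timing_summary, now_ns, summarize and time_strlen are made up for illustration, and it assumes a POSIX system and uses clock_gettime(CLOCK_MONOTONIC) in place of the rdtsc from the quoted test. Each sample times a whole batch of strlen calls, so a single cold call does not dominate, and the min, median and max across batches are reported so the spread is visible.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical summary type: nanoseconds per batch of strlen calls. */
struct timing_summary {
    uint64_t min_ns;
    uint64_t median_ns;
    uint64_t max_ns;
};

/* Monotonic clock in nanoseconds (POSIX clock_gettime). */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* qsort comparator for uint64_t, ascending. */
static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sort the samples in place and report min, median (upper median when n is
   even) and max. Returns all zeros when n == 0. */
static struct timing_summary summarize(uint64_t *samples, size_t n) {
    struct timing_summary s = {0, 0, 0};
    if (n == 0)
        return s;
    qsort(samples, n, sizeof samples[0], cmp_u64);
    s.min_ns = samples[0];
    s.median_ns = samples[n / 2];
    s.max_ns = samples[n - 1];
    return s;
}

/* Time `batches` back-to-back batches of `calls` strlen calls each. */
static struct timing_summary time_strlen(const char *str, size_t calls,
                                         size_t batches) {
    assert(calls > 0 && batches > 0);
    struct timing_summary s = {0, 0, 0};
    uint64_t *samples = malloc(batches * sizeof *samples);
    if (samples == NULL)
        return s;

    /* Call through a volatile function pointer and accumulate into a
       volatile sink so the compiler cannot fold or hoist the strlen call. */
    size_t (*volatile fn)(const char *) = strlen;
    volatile size_t sink = 0;

    for (size_t b = 0; b < batches; ++b) {
        uint64_t t0 = now_ns();
        for (size_t i = 0; i < calls; ++i)
            sink += fn(str);
        samples[b] = now_ns() - t0;
    }

    s = summarize(samples, batches);
    free(samples);
    return s;
}
```

Calling time_strlen("str1", 100000, 100) on both glibc builds and comparing the min and median, rather than one total from a single pass, should give a first idea of whether the difference is stable. If max is far above median, the numbers are mostly noise and a single 1468 vs 1814 comparison does not say much.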