From: abush wang
Date: Sun, 28 Apr 2024 10:13:04 +0800
Subject: Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
To: Sunil Pandey
Cc: "H.J. Lu", Noah Goldstein, abushwang via Libc-alpha

Actually, I was investigating a performance issue seen with libmicro on our
distro OS. I found that the performance degradation of the localtime_r
benchmark from libmicro is attributable to strlen, so I extracted this test
case.

On Sat, Apr 27, 2024 at 12:54 AM Sunil Pandey wrote:

>
>
> On Fri, Apr 26, 2024 at 6:30 AM H.J. Lu wrote:
>
>> On Thu, Apr 25, 2024 at 9:03 PM abush wang wrote:
>> >
>> > Hi, H.J.
>> > When I test glibc performance between 2.28 and 2.38,
>> > I found there is a performance degradation about strlen.
>> > In fact, this difference comes from __strlen_avx2 and __strlen_evex
>> >
>> > ```
>> > 2.28
>> > __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:42
>> > 42 ENTRY (STRLEN)
>> >
>> >
>> > 2.38
>> > __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:79
>> > 79 ENTRY_P2ALIGN (STRLEN, 6)
>> > ```
>> >
>> > This is my test:
>> > ```
>> > #include <stdio.h>
>> > #include <stdint.h>
>> > #include <string.h>
>> >
>> > #define MAX_STRINGS 100
>> >
>> > uint64_t rdtsc() {
>> >     uint32_t lo, hi;
>> >     __asm__ __volatile__ (
>> >         "rdtsc" : "=a"(lo), "=d"(hi)
>> >     );
>> >     return ((uint64_t)hi << 32) | lo;
>> > }
>> >
>> > int main(int argc, char *argv[]) {
>> >     char *input_str[MAX_STRINGS];
>> >     size_t lengths[MAX_STRINGS];
>> >     int num_strings = 0; // Number of input strings
>> >     uint64_t start_cycles, end_cycles;
>> >
>> >     // Parse command line arguments and store pointers in input_str array
>> >     for (int i = 1; i < argc && num_strings < MAX_STRINGS; ++i) {
>> >         input_str[num_strings] = argv[i];
>> >         num_strings++;
>> >     }
>> >
>> >     // Measure the strlen operation for each string
>> >     start_cycles = rdtsc();
>> >     for (int i = 0; i < num_strings; ++i) {
>> >         lengths[i] = strlen(input_str[i]);
>> >     }
>> >     end_cycles = rdtsc();
>> >
>> >     unsigned long long total_cycle = end_cycles - start_cycles;
>> >     unsigned long long av_cycle = total_cycle / num_strings;
>> >     // Print the total cycles taken for the strlen operations
>> >     printf("Total cycles: %llu av cycle: %llu \n", total_cycle, av_cycle);
>> >
>> >     // Print the recorded lengths
>> >     printf("Lengths of the input strings:\n");
>> >     for (int i = 0; i < num_strings; ++i) {
>> >         printf("String %d length: %zu\n", i, lengths[i]);
>> >     }
>> >
>> >     return 0;
>> > }
>> > ```
>> >
>> > This is result
>> > ```
>> > 2.28
>> > ./strlen_test str1 str2 str3 str4 str5
>> > Total cycles: 1468 av cycle: 293
>> > Lengths of the input strings:
>> > String 0 length: 4
>> > String 1 length: 4
>> > String 2 length: 4
>> > String 3 length: 4
>> > String 4 length: 4
>> >
>> > 2.38
>> > ./strlen_test str1 str2 str3 str4 str5
>> > Total cycles: 1814 av cycle: 362
>> > Lengths of the input strings:
>> > String 0 length: 4
>> > String 1 length: 4
>> > String 2 length: 4
>> > String 3 length: 4
>> > String 4 length: 4
>> > ```
>> >
>> > Thanks,
>> > abush
>> >
>
> I'm not sure how you are measuring the performance of strlen function.
> Are you making performance conclusion based on these 2 runs?
>
> 2.28
> Total cycles: 1468 av cycle: 293
>
> 2.38
> Total cycles: 1814 av cycle: 362
>
> Please use glibc microbenchmark to see if you can reproduce perf drop.
>
>
>>
>> Which processors did you use? Sunil, Noah, can we reproduce it?
>>
>> --
>> H.J.
>>
>
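
On the question of drawing a conclusion from a single timed pass: a minimal
sketch of a more repeatable measurement might look like the following. It is
not the glibc microbenchmark harness, only an illustration. It assumes a POSIX
system and times with clock_gettime(CLOCK_MONOTONIC) rather than rdtsc, so it
reports nanoseconds, not cycles. The names WARMUP, ROUNDS, INNER, now_ns,
median_u64 and bench_strlen are hypothetical and do not come from the test
case in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define WARMUP 1000    /* hypothetical: untimed calls before measuring */
#define ROUNDS 31      /* hypothetical: number of timed samples (odd) */
#define INNER  100000  /* hypothetical: strlen calls per timed sample */

/* Monotonic clock in nanoseconds (used here in place of rdtsc). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts v in place and returns the middle element
   (the upper median when n is even). */
static uint64_t median_u64(uint64_t *v, size_t n)
{
    assert(n > 0);
    qsort(v, n, sizeof *v, cmp_u64);
    return v[n / 2];
}

/* Returns the median time per strlen call in nanoseconds and stores
   the string length in *len_out. */
double bench_strlen(const char *s, size_t *len_out)
{
    /* Calling through a volatile function pointer keeps the compiler
       from folding or hoisting the strlen calls out of the loops. */
    size_t (*volatile fn)(const char *) = strlen;
    uint64_t samples[ROUNDS];
    size_t sink = 0;

    /* Warm-up: the first call pays for symbol resolution and cold caches. */
    for (int i = 0; i < WARMUP; ++i)
        sink += fn(s);

    for (int r = 0; r < ROUNDS; ++r) {
        uint64_t t0 = now_ns();
        for (int i = 0; i < INNER; ++i)
            sink += fn(s);
        samples[r] = now_ns() - t0;
    }

    /* Every call returned the same length, so the sum divides evenly. */
    *len_out = sink / ((size_t)WARMUP + (size_t)ROUNDS * INNER);
    return (double)median_u64(samples, ROUNDS) / INNER;
}
```

Calling bench_strlen(argv[i], &len) for each argument and printing the result
gives one figure per string. Taking the median over many samples, after a
warm-up, reduces the influence of first-call costs and scheduler noise, which
a single pass over five strings cannot separate from the cost of strlen
itself.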