From: abush wang
Date: Sun, 28 Apr 2024 10:06:05 +0800
Subject: Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
To: "H.J. Lu"
Cc: Sunil K Pandey, Noah Goldstein, abushwang via Libc-alpha

This is my env:

lscpu
...
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz
BIOS Model name: Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz CPU @ 2.5GHz
...

I think you can run my demo in this environment to reproduce it.

On Fri, Apr 26, 2024 at 9:30 PM H.J. Lu wrote:

> On Thu, Apr 25, 2024 at 9:03 PM abush wang wrote:
> >
> > Hi, H.J.
> > When I test glibc performance between 2.28 and 2.38,
> > I found there is a performance degradation about strlen.
> > In fact, this difference comes from __strlen_avx2 and __strlen_evex
> >
> > ```
> > 2.28
> > __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:42
> > 42 ENTRY (STRLEN)
> >
> >
> > 2.38
> > __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:79
> > 79 ENTRY_P2ALIGN (STRLEN, 6)
> > ```
> >
> > This is my test:
> > ```
> > #include <stdio.h>
> > #include <stdint.h>
> > #include <string.h>
> >
> > #define MAX_STRINGS 100
> >
> > uint64_t rdtsc() {
> >     uint32_t lo, hi;
> >     __asm__ __volatile__ (
> >         "rdtsc" : "=a"(lo), "=d"(hi)
> >     );
> >     return ((uint64_t)hi << 32) | lo;
> > }
> >
> > int main(int argc, char *argv[]) {
> >     char *input_str[MAX_STRINGS];
> >     size_t lengths[MAX_STRINGS];
> >     int num_strings = 0; // Number of input strings
> >     uint64_t start_cycles, end_cycles;
> >
> >     // Parse command line arguments and store pointers in input_str array
> >     for (int i = 1; i < argc && num_strings < MAX_STRINGS; ++i) {
> >         input_str[num_strings] = argv[i];
> >         num_strings++;
> >     }
> >
> >     // Measure the strlen operation for each string
> >     start_cycles = rdtsc();
> >     for (int i = 0; i < num_strings; ++i) {
> >         lengths[i] = strlen(input_str[i]);
> >     }
> >     end_cycles = rdtsc();
> >
> >     unsigned long long total_cycle = end_cycles - start_cycles;
> >     unsigned long long av_cycle = total_cycle / num_strings;
> >     // Print the total cycles taken for the strlen operations
> >     printf("Total cycles: %llu av cycle: %llu \n", total_cycle, av_cycle);
> >
> >     // Print the recorded lengths
> >     printf("Lengths of the input strings:\n");
> >     for (int i = 0; i < num_strings; ++i) {
> >         printf("String %d length: %zu\n", i, lengths[i]);
> >     }
> >
> >     return 0;
> > }
> > ```
> >
> > This is result
> > ```
> > 2.28
> > ./strlen_test str1 str2 str3 str4 str5
> > Total cycles: 1468 av cycle: 293
> > Lengths of the input strings:
> > String 0 length: 4
> > String 1 length: 4
> > String 2 length: 4
> > String 3 length: 4
> > String 4 length: 4
> >
> > 2.38
> > ./strlen_test str1 str2 str3 str4 str5
> > Total cycles: 1814 av cycle: 362
> > Lengths of the input strings:
> > String 0 length: 4
> > String 1 length: 4
> > String 2 length: 4
> > String 3 length: 4
> > String 4 length: 4
> > ```
> >
> > Thanks,
> > abush
>
> Which processors did you use? Sunil, Noah, can we reproduce it?
>
> --
> H.J.
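
The quoted test times only five strlen calls, including the very first one, so the reported cycles may include one-time costs such as lazy symbol resolution and cold caches rather than steady-state strlen cost alone. A minimal sketch of a steadier measurement is below. It is not part of the original test: the helper names `now_ns` and `avg_strlen_ns` are hypothetical, and it swaps rdtsc for POSIX `clock_gettime` so that it also builds on non-x86 machines, which means it reports nanoseconds per call rather than cycles.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: average nanoseconds per strlen call on `s`,
 * taken over `iters` timed calls that follow `warmup` untimed calls.
 * If sink_out is non-NULL, it receives the sum of the lengths returned
 * by the timed calls, so the results are observably used. */
static double avg_strlen_ns(const char *s, size_t warmup, size_t iters,
                            size_t *sink_out) {
    /* Calling through a volatile pointer keeps the compiler from folding
     * or inlining strlen, so the libc routine picked at run time is timed. */
    size_t (*volatile fn)(const char *) = strlen;
    size_t sink = 0;

    /* Untimed warm-up: absorbs first-call costs before measuring. */
    for (size_t i = 0; i < warmup; ++i)
        sink += fn(s);

    sink = 0;
    uint64_t start = now_ns();
    for (size_t i = 0; i < iters; ++i)
        sink += fn(s);
    uint64_t end = now_ns();

    if (sink_out)
        *sink_out = sink;
    return iters ? (double)(end - start) / (double)iters : 0.0;
}
```

Calling, for example, `avg_strlen_ns("str1", 1000, 1000000, &sink)` under both glibc versions and printing the returned value with `printf("%.2f ns\n", ...)` would show whether the gap persists once the warm-up calls are excluded. The iteration counts here are arbitrary choices, not values from the thread.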