From: "H.J. Lu"
Date: Fri, 26 Apr 2024 06:30:21 -0700
Subject: Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
To: abush wang, Sunil K Pandey, Noah Goldstein
Cc: abushwang via Libc-alpha

On Thu, Apr 25, 2024 at 9:03 PM abush wang wrote:
>
> Hi, H.J.
> When I test glibc performance between 2.28 and 2.38,
> I found there is a performance degradation about strlen.
> In fact, this difference comes from __strlen_avx2 and __strlen_evex
>
> ```
> 2.28
> __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:42
> 42 ENTRY (STRLEN)
>
>
> 2.38
> __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:79
> 79 ENTRY_P2ALIGN (STRLEN, 6)
> ```
>
> This is my test:
> ```
> #include <stdio.h>
> #include <stdint.h>
> #include <string.h>
>
> #define MAX_STRINGS 100
>
> uint64_t rdtsc() {
>     uint32_t lo, hi;
>     __asm__ __volatile__ (
>         "rdtsc" : "=a"(lo), "=d"(hi)
>     );
>     return ((uint64_t)hi << 32) | lo;
> }
>
> int main(int argc, char *argv[]) {
>     char *input_str[MAX_STRINGS];
>     size_t lengths[MAX_STRINGS];
>     int num_strings = 0; // Number of input strings
>     uint64_t start_cycles, end_cycles;
>
>     // Parse command line arguments and store pointers in input_str array
>     for (int i = 1; i < argc && num_strings < MAX_STRINGS; ++i) {
>         input_str[num_strings] = argv[i];
>         num_strings++;
>     }
>
>     // Measure the strlen operation for each string
>     start_cycles = rdtsc();
>     for (int i = 0; i < num_strings; ++i) {
>         lengths[i] = strlen(input_str[i]);
>     }
>     end_cycles = rdtsc();
>
>     unsigned long long total_cycle = end_cycles - start_cycles;
>     unsigned long long av_cycle = total_cycle / num_strings;
>     // Print the total cycles taken for the strlen operations
>     printf("Total cycles: %llu av cycle: %llu \n", total_cycle, av_cycle);
>
>     // Print the recorded lengths
>     printf("Lengths of the input strings:\n");
>     for (int i = 0; i < num_strings; ++i) {
>         printf("String %d length: %zu\n", i, lengths[i]);
>     }
>
>     return 0;
> }
> ```
>
> This is result
> ```
> 2.28
> ./strlen_test str1 str2 str3 str4 str5
> Total cycles: 1468 av cycle: 293
> Lengths of the input strings:
> String 0 length: 4
> String 1 length: 4
> String 2 length: 4
> String 3 length: 4
> String 4 length: 4
>
> 2.38
> ./strlen_test str1 str2 str3 str4 str5
> Total cycles: 1814 av cycle: 362
> Lengths of the input strings:
> String 0 length: 4
> String 1 length: 4
> String 2 length: 4
> String 3 length: 4
> String 4 length: 4
> ```
>
> Thanks,
> abush

Which processors did you use? Sunil, Noah, can we reproduce it?

-- 
H.J.
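
For anyone trying to reproduce this, a possible variation on the quoted test is sketched below. The quoted program times five strlen calls exactly once, so the totals may include one-time costs (cold caches, possibly lazy symbol resolution) rather than the steady-state cost of __strlen_avx2 or __strlen_evex. The sketch warms up first, times batches of calls, and reports the median. The names `median_strlen_ticks`, `fenced_rdtsc`, `SAMPLES` and `INNER` are illustrative and come from neither the thread nor glibc. It assumes GCC or Clang on x86-64. Note that rdtsc counts TSC reference ticks, which need not equal core clock cycles.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <x86intrin.h>  /* __rdtsc, _mm_lfence (GCC/Clang, x86 only) */

#define SAMPLES 1001    /* illustrative: number of timed batches */
#define INNER   1000    /* illustrative: strlen calls per batch */

/* qsort comparator for uint64_t values, ascending. */
static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Read the TSC with lfence on both sides so that earlier and later
 * instructions are not reordered across the read. */
static inline uint64_t fenced_rdtsc(void) {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Median TSC ticks per strlen call on s, measured over SAMPLES batches
 * of INNER calls each, after one untimed warm-up batch. */
double median_strlen_ticks(const char *s) {
    static uint64_t samples[SAMPLES];
    /* Call through a volatile pointer so the compiler cannot fold or
     * hoist strlen; the call still reaches the libc implementation. */
    size_t (*volatile fn)(const char *) = strlen;
    volatile size_t sink = 0;

    assert(s != NULL);

    for (int i = 0; i < INNER; i++)      /* warm-up, not timed */
        sink = fn(s);

    for (int k = 0; k < SAMPLES; k++) {
        uint64_t t0 = fenced_rdtsc();
        for (int i = 0; i < INNER; i++)
            sink = fn(s);
        uint64_t t1 = fenced_rdtsc();
        samples[k] = t1 - t0;            /* ticks for the whole batch */
    }
    (void)sink;

    qsort(samples, SAMPLES, sizeof samples[0], cmp_u64);
    return (double)samples[SAMPLES / 2] / INNER;
}
```

Calling `median_strlen_ticks(argv[i])` for each argument from a small `main` and printing the result with `%.2f` would give per-string numbers that can be compared between the two glibc builds on the same machine. The figures still include loop and call overhead, so only the difference between builds is meaningful.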