From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oo1-xc33.google.com (mail-oo1-xc33.google.com [IPv6:2607:f8b0:4864:20::c33]) by sourceware.org (Postfix) with ESMTPS id 4E2E3385AC09 for ; Sat, 11 Nov 2023 03:21:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4E2E3385AC09 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 4E2E3385AC09 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::c33 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699672881; cv=none; b=LvCXJJ7hQT3nyzTKsEetMFmtuCLqQW3X5cDzF/ixKK2ox8BR3R7mWF42IVCqyeoJOWu9I+T0FJq+vFvqcQhj8dvP1faSmKPcTdlmpN/HV+vVH8RLkVIH0YJ3hHxAnVkf2D3J1RdVHOjerlr14eQ3sAvq4o0lnAATqdQK45+9L0s= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699672881; c=relaxed/simple; bh=XL/GE2G4nOCHlPb54GFcVZsmv3InYnzVhbGEEJTUdYo=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=tsYyyg9hLY1l3pX+DvpQChLU16pHdeGnAc4yzj8sbjdyTQDA6m6QsDkRqgRZ4Jn0bsr9qJLI3nC7L8c5KPA65+RkZ8dsrEfE8IkvgfLWXoXAYFC+KNz74H/riXd14eA+9W1lXY/KUVYXV0CM6y8xLUHfGUP6uE0AeslwYJtQTKQ= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-oo1-xc33.google.com with SMTP id 006d021491bc7-5875c300becso1527707eaf.0 for ; Fri, 10 Nov 2023 19:21:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1699672879; x=1700277679; darn=sourceware.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=5ksYISSKiHtOOpwBhlpLwX3e98VwOV5CPp9giFPQ4rY=; b=l2DWonw8mQSlGrYIlKSk6mT8ESHBlbAIm3S0MJdiA2+tltCZjQ1dKpoERwsdDJCA4j FGKFIzgJJ+kclJDgQjbvU48WWf+RcQ9LXSbTDFeygH4D2d+ahiI3kbyMdx+83oN1ojGR yGzRsEzSW8VcoeQMxNWEpKgxTMPaWR9q5XrS1wGaW1r3/hE44T/g2hJ9UCII9krzDPK1 EDVybDrEBlNr8RaaiD5Td44uXR2MFmi6A+JuYxHx/nyUUOg1LZ01RAUHouzHb5XHJoJl JB29NwkkiyUVP9BE+iRpOwGM/swqv0oUzuVUQo94rrDOrtPa9uF+i9Hz1D3tIEwCmQO4 9zIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699672879; x=1700277679; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5ksYISSKiHtOOpwBhlpLwX3e98VwOV5CPp9giFPQ4rY=; b=cOO6VWjPmd8DAe6EmcgLY3A5CN9iv5yDf5Ih8RaBgXgJRxRU+9fKLR530e0++pKsIF dhNEPnZ+fbwwZqka6jO362Ey86gJGzT7gIPNi/kBITqWN3pCh6bfuia/e6tyyptG8oh8 3PE9453e1Hsqfh5yc+7oae+9yj0s/810uIs4uOOY98jT/skgYrGt+Fas4cCPAdnMST+P nUki530aZE9XyD56xag7jjL+dkbqIf7NYqF9Q53ytiGzIdntCtHMOq3DL+TCsIaFmL/r +kwdV++d+S04GwhEyorTY1Xyq+JdIPP+vOyB8DQIzrKYiSsNvXp/1SIztXsnjcBcexcw cn1A== X-Gm-Message-State: AOJu0YwCbP4XbkgfLtf/g31Bx7tBRL6W9Gkzj5RT3eLi8MDFbPjGawCm BG37z2ps/WG8X8njlwpwZJAZeVT4S8gbq7ddVMt0vemt X-Google-Smtp-Source: AGHT+IH8T2s8mnNy2xnb3GDbg9ZjkD1mJLNq8/iheBGz4V2Xk0Aza/6xh72iv0j26YnVGNQEp7undyvGGydUGc9175w= X-Received: by 2002:a05:6870:9e42:b0:1ef:b109:b8ca with SMTP id pt2-20020a0568709e4200b001efb109b8camr1310947oab.34.1699672879516; Fri, 10 Nov 2023 19:21:19 -0800 (PST) MIME-Version: 1.0 References: <20231101221657.311121-1-goldstein.w.n@gmail.com> <20231109224918.2036632-1-goldstein.w.n@gmail.com> In-Reply-To: From: Noah Goldstein Date: Fri, 10 Nov 2023 21:21:07 -0600 Message-ID: Subject: Re: [PATCH v3] x86: Fix unchecked AVX512-VBMI2 usage in strrchr-evex-base.S To: Sunil Pandey Cc: libc-alpha@sourceware.org, hjl.tools@gmail.com, carlos@systemhalted.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, Nov 9, 2023 at 9:11=E2=80=AFPM Sunil Pandey wro= te: > > > > On Thu, Nov 9, 2023 at 2:49=E2=80=AFPM Noah Goldstein wrote: >> >> strrchr-evex-base used `vpcompress{b|d}` in the page cross logic but >> was missing the CPU_FEATURE checks for VBMI2 in the >> ifunc/ifunc-impl-list. >> >> The fix is either to add those checks or change the logic to not use >> `vpcompress{b|d}`. Choosing the latter here so that the strrchr-evex >> implementation is usable on SKX. >> >> New implementation is a bit slower, but this is in a cold path so its >> probably okay. >> --- >> sysdeps/x86_64/multiarch/strrchr-evex-base.S | 61 ++++++++++++-------- >> 1 file changed, 37 insertions(+), 24 deletions(-) >> >> diff --git a/sysdeps/x86_64/multiarch/strrchr-evex-base.S b/sysdeps/x86_= 64/multiarch/strrchr-evex-base.S >> index cd6a0a870a..eb5f1f855a 100644 >> --- a/sysdeps/x86_64/multiarch/strrchr-evex-base.S >> +++ b/sysdeps/x86_64/multiarch/strrchr-evex-base.S >> @@ -35,7 +35,6 @@ >> # define CHAR_SIZE 4 >> # define VPCMP vpcmpd >> # define VPMIN vpminud >> -# define VPCOMPRESS vpcompressd >> # define VPTESTN vptestnmd >> # define VPTEST vptestmd >> # define VPBROADCAST vpbroadcastd >> @@ -46,7 +45,6 @@ >> # define CHAR_SIZE 1 >> # define VPCMP vpcmpb >> # define VPMIN vpminub >> -# define VPCOMPRESS vpcompressb >> # define VPTESTN vptestnmb >> # define VPTEST vptestmb >> # define VPBROADCAST vpbroadcastb >> @@ -71,7 +69,7 @@ ENTRY_P2ALIGN(STRRCHR, 6) >> andl $(PAGE_SIZE - 1), %eax >> cmpl $(PAGE_SIZE - VEC_SIZE), %eax >> jg L(cross_page_boundary) >> - >> +L(page_cross_continue): >> VMOVU (%rdi), %VMM(1) >> /* k0 has a 1 for each zero CHAR in YMM1. */ >> VPTESTN %VMM(1), %VMM(1), %k0 >> @@ -79,10 +77,11 @@ ENTRY_P2ALIGN(STRRCHR, 6) >> test %VGPR(rsi), %VGPR(rsi) >> jz L(aligned_more) >> /* fallthrough: zero CHAR in first VEC. */ >> -L(page_cross_return): >> + >> /* K1 has a 1 for each search CHAR match in VEC(1). */ >> VPCMPEQ %VMATCH, %VMM(1), %k1 >> KMOV %k1, %VGPR(rax) >> +L(page_cross_return): > > > Unused label. > >> >> /* Build mask up until first zero CHAR (used to mask of >> potential search CHAR matches past the end of the string). *= / >> blsmsk %VGPR(rsi), %VGPR(rsi) >> @@ -167,7 +166,6 @@ L(first_vec_x1_return): >> >> .p2align 4,, 12 >> L(aligned_more): >> -L(page_cross_continue): >> /* Need to keep original pointer incase VEC(1) has last match. = */ >> movq %rdi, %rsi >> andq $-VEC_SIZE, %rdi >> @@ -340,34 +338,49 @@ L(return_new_match_ret): >> leaq (VEC_SIZE * 2)(%rdi, %rax, CHAR_SIZE), %rax >> ret >> >> - .p2align 4,, 4 >> L(cross_page_boundary): >> + /* eax contains all the page offset bits of src (rdi). `xor rdi, >> + rax` sets pointer will all page offset bits cleared so >> + offset of (PAGE_SIZE - VEC_SIZE) will get last aligned VEC >> + before page cross (guaranteed to be safe to read). Doing this >> + as opposed to `movq %rdi, %rax; andq $-VEC_SIZE, %rax` saves >> + a bit of code size. */ >> xorq %rdi, %rax >> - mov $-1, %VRDX >> - VMOVU (PAGE_SIZE - VEC_SIZE)(%rax), %VMM(6) >> - VPTESTN %VMM(6), %VMM(6), %k0 >> + VMOVU (PAGE_SIZE - VEC_SIZE)(%rax), %VMM(1) >> + VPTESTN %VMM(1), %VMM(1), %k0 >> KMOV %k0, %VRSI >> >> + /* Shift out zero CHAR matches that are before the beginning of >> + src (rdi). */ >> # ifdef USE_AS_WCSRCHR >> movl %edi, %ecx >> - and $(VEC_SIZE - 1), %ecx >> + andl $(VEC_SIZE - 1), %ecx >> shrl $2, %ecx >> # endif >> - shlx %SHIFT_REG, %VRDX, %VRDX >> - >> -# ifdef USE_AS_WCSRCHR >> - kmovw %edx, %k1 >> -# else >> - KMOV %VRDX, %k1 >> -# endif >> - >> - VPCOMPRESS %VMM(6), %VMM(1){%k1}{z} >> - /* We could technically just jmp back after the vpcompress but >> - it doesn't save any 16-byte blocks. */ >> shrx %SHIFT_REG, %VRSI, %VRSI >> + >> test %VRSI, %VRSI >> - jnz L(page_cross_return) >> - jmp L(page_cross_continue) >> - /* 1-byte from cache line. */ >> + jz L(page_cross_continue) >> + >> + /* Found zero CHAR so need to test for search CHAR. */ >> + VPCMP $0, %VMATCH, %VMM(1), %k1 >> + KMOV %k1, %VRAX >> + /* Shift out search CHAR matches that are before the beginning o= f >> + src (rdi). */ >> + shrx %SHIFT_REG, %VRAX, %VRAX >> + /* Check if any search CHAR match in range. */ >> + blsmsk %VRSI, %VRSI >> + and %VRSI, %VRAX >> + jz L(ret2) >> + bsr %VRAX, %VRAX >> +# ifdef USE_AS_WCSRCHR >> + leaq (%rdi, %rax, CHAR_SIZE), %rax >> +# else >> + addq %rdi, %rax >> +# endif >> +L(ret2): >> + ret > > > Modified instructions to fit page crossing logic in cache line for strrch= r-evex512. > > 300: 48 31 f8 xor %rdi,%rax > 303: 62 e1 fe 48 6f 48 3f vmovdqu64 0xfc0(%rax),%zmm17 > 30a: 62 b2 76 40 26 c1 vptestnmb %zmm17,%zmm17,%k0 > 310: c4 e1 fb 93 f0 kmovq %k0,%rsi > 315: 89 f9 mov %edi,%ecx > 317: 48 d3 ee shr %cl,%rsi > 31a: 0f 84 f8 fc ff ff je 18 <__strrchr_evex512+0x18> > 320: 62 b1 75 40 74 c8 vpcmpeqb %zmm16,%zmm17,%k1 > 326: c4 e1 fb 93 c1 kmovq %k1,%rax > 32b: 48 d3 e8 shr %cl,%rax > 32e: c4 e2 c8 f3 d6 blsmsk %rsi,%rsi > 333: 48 21 f0 and %rsi,%rax > 336: 74 07 je 33f <__strrchr_evex512+0x33f> > 338: 48 0f bd c0 bsr %rax,%rax > 33c: 48 01 f8 add %rdi,%rax > 33f: c3 ret > Can update if you want, but its a alot of ifdefs, LMK. >> >> + /* 3 bytes from cache-line for evex. >> + Crosses cache line for evex512. */ >> END(STRRCHR) >> #endif >> -- >> 2.34.1 >>