From: "H.J. Lu"
Date: Wed, 5 May 2021 10:54:37 -0700
Subject: Re: [PATCH v1] x86: Add EVEX optimized memchr family not safe for RTM
To: Noah Goldstein
Cc: GNU C Library, "Carlos O'Donell"
References: <20210504233226.1514601-1-goldstein.w.n@gmail.com>

On Wed, May 5, 2021 at 9:25 AM Noah Goldstein wrote:
>
> On Wed, May 5, 2021 at 9:23 AM H.J. Lu wrote:
> >
> > On Tue, May 4, 2021 at 4:34 PM Noah Goldstein wrote:
> > >
> > > No bug.
> > >
> > > This commit adds a new implementation for EVEX memchr that is not safe
> > > for RTM because it uses vzeroupper. The benefit is that by using
> >
> > EVEX memchr won't cause RTM abort if YMM16-YMM31 are used
> > since there is no need to use vzeroupper. Please remove vzeroupper from
> > EVEX memchr and remove EVEX RTM functions.
>
> That's impossible for this implementation.
>
> The reason ymm0-ymm15 are used is so that we can use vpcmpeq which is
> not encodable with ymm16-ymm31.
>
> This implementation is optimized for CPUs which don't support RTM but
> do support EVEX.
>

Are you seeing something along the lines of Prefer_AVX2_STRCMP:

commit 1da50d4bda07f04135dca39f40e79fc9eabed1f8
Author: H.J. Lu
Date:   Fri Feb 26 05:36:59 2021 -0800

    x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP

    1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered
    by VZEROUPPER inside a transactionally executing RTM region.
    2. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2
    loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs,
    1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp.  Add
    Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.

-- 
H.J.
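
To make the trade-off in this thread concrete, here is a rough sketch of how
a selector could choose between the variants being discussed. It is only an
illustration: struct cpu_caps, enum memchr_variant, prefer_no_vzeroupper()
and select_memchr() are hypothetical names invented for this example, and the
logic is not glibc's actual ifunc selector. It assumes the only inputs are
whether AVX2, EVEX and RTM are usable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature summary (not glibc's struct cpu_features). */
struct cpu_caps {
    bool has_avx2;
    bool has_evex;  /* shorthand for "256-bit EVEX encodings are usable" */
    bool has_rtm;
};

/* Hypothetical variant labels, named after the register choice discussed. */
enum memchr_variant {
    VARIANT_BASELINE,
    VARIANT_AVX2,            /* ymm0-ymm15, ends with vzeroupper */
    VARIANT_AVX2_RTM_SAFE,   /* AVX2 variant that avoids aborting in RTM */
    VARIANT_EVEX_HIGH_REGS,  /* ymm16-ymm31, no vzeroupper needed */
    VARIANT_EVEX_LOW_REGS    /* ymm0-ymm15 for vpcmpeq, needs vzeroupper */
};

/* Mirrors item 1 of the commit message: if RTM is usable, avoid
   vzeroupper, since it can abort a transaction. */
bool prefer_no_vzeroupper(const struct cpu_caps *caps)
{
    return caps->has_rtm;
}

/* Assumed policy: the vzeroupper-using EVEX variant is chosen only when
   RTM is not usable; otherwise fall back to an RTM-safe variant. */
enum memchr_variant select_memchr(const struct cpu_caps *caps)
{
    assert(caps != NULL);

    if (caps->has_evex) {
        return prefer_no_vzeroupper(caps) ? VARIANT_EVEX_HIGH_REGS
                                          : VARIANT_EVEX_LOW_REGS;
    }
    if (caps->has_avx2) {
        return prefer_no_vzeroupper(caps) ? VARIANT_AVX2_RTM_SAFE
                                          : VARIANT_AVX2;
    }
    return VARIANT_BASELINE;
}
```

With this policy, a CPU that reports EVEX without RTM gets the ymm0-ymm15
variant, and any CPU with usable RTM never gets a variant that executes
vzeroupper.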