From: Adhemerval Zanella Netto
Organization: Linaro
Date: Mon, 6 May 2024 10:04:37 -0300
Subject: Re: [V3] powerpc: Optimized strncmp for power10
To: Peter Bergner, Amrita H S
Cc: libc-alpha@sourceware.org, Paul E Murphy
Message-ID: <63ec70c8-f522-4ad7-bb01-6a179bf61ac4@linaro.org>
References: <20240429095847.3541150-1-amritahs@linux.vnet.ibm.com>

On 03/05/24 18:31, Peter Bergner wrote:
> On 4/29/24 4:58 AM, Amrita H S wrote:
>> +++ b/sysdeps/powerpc/powerpc64/le/power10/strncmp.S
>> @@ -0,0 +1,271 @@
>> +/* Optimized strncmp implementation for PowerPC64/POWER10.
>> +   Copyright (C) 2021-2023 Free Software Foundation, Inc.
>
> This is a new file, so I believe the Copyright date should just be "2024".
>
>> +++ b/sysdeps/powerpc/powerpc64/multiarch/strncmp-power10.S
>> @@ -0,0 +1,25 @@
>> +/* Copyright (C) 2016-2023 Free Software Foundation, Inc.
>
> Likewise.
>
>> +ENTRY_TOCLESS (STRNCMP, 4)
>> +        /* Check if size is 0. */
>> +        cmpdi   cr0,r5,0
>> +        beq     cr0,L(ret0)
>> +        andi.   r7,r3,4095
>> +        andi.   r8,r4,4095
>> +        cmpldi  cr0,r7,4096-16
>> +        cmpldi  cr1,r8,4096-16
>> +        bgt     cr0,L(crosses)
>> +        bgt     cr1,L(crosses)
>> +        COMPARE_16(v4,v5,0)
>> +        addi    r3,r3,16
>> +        addi    r4,r4,16
>
> This code looks like it assumes the kernel is using a 4k page size.
> All distros that I know of, and the default kernel config for ppc64
> and ppc64le kernels, use a 64K HW page size.  Is there a reason
> we're not checking for a 64k cache boundary here?
>
> Adhemerval, you seem to have added the first power8 strncmp.S optimized
> routine (sysdeps/powerpc/powerpc64/power8/strncmp.S) and that also uses
> a 4k page boundary.  Do you remember the history of why we checked for
> a 4k page boundary rather than 64k?  Was it that using 64k showed no
> improvement over 4k, and using 4k meant we didn't have to worry about
> some system maybe running a 4k page size kernel?
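For reference, the page-cross test in the quoted entry code can be modelled in C. This is a minimal sketch, not code from the patch: the helper names are hypothetical, and it assumes 16-byte loads as done by COMPARE_16. It also checks exhaustively that a fixed 4K test stays safe on a 64K-page kernel: every 64K boundary is also a 4K boundary, so the only cost of the 4K test there is that loads near a 4K boundary that is not also a 64K boundary take the slow path unnecessarily.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a LEN-byte load starting at ADDR would
   touch two PAGE-sized pages.  PAGE must be a power of two.  With
   PAGE = 4096 and LEN = 16 this is the quoted
   "andi. rX,rY,4095 ; cmpldi rX,4096-16 ; bgt" sequence.  */
static bool
load_crosses_page (uintptr_t addr, uintptr_t len, uintptr_t page)
{
  return (addr & (page - 1)) > page - len;
}

/* Hypothetical helper mirroring the quoted entry code: the 16-byte
   fast path is taken only if neither string pointer is within 16 bytes
   of the end of a 4K page (otherwise it branches to L(crosses)).  */
static bool
fast_path_ok_4k (uintptr_t s1, uintptr_t s2)
{
  return !load_crosses_page (s1, 16, 4096)
         && !load_crosses_page (s2, 16, 4096);
}

/* Exhaustive check over two 64K pages: whenever the fixed 4K test says
   a 16-byte load is safe, that load does not cross a 64K page either.  */
static void
check_4k_test_is_conservative (void)
{
  for (uintptr_t addr = 0; addr < 2 * 65536; addr++)
    if (!load_crosses_page (addr, 16, 4096))
      assert (!load_crosses_page (addr, 16, 65536));
}
```

For example, fast_path_ok_4k (0, 4090) returns false, which corresponds to the quoted code taking the bgt cr1,L(crosses) branch, even though on a 64K-page kernel that load would not actually cross a page.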
If I recall correctly, the 4k check was chosen so as not to tie the implementation to a specific page size, since the ABI still allows 4k pages. Both branches are highly unlikely to be taken, so the branch predictor should almost always get them right.

We could also make the check dynamic if you think these checks are really costly, but that would mean adding two extra loads (one for the GLRO struct, another for dl_pagesize) and possibly an extra cache line hit. I don't think it is worth it.

Another question is whether these checks still make sense for POWER10: are cross-page reads still as costly there as they were on POWER8?
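To make the dynamic alternative concrete, here is a hedged C sketch. Inside glibc the page size would come from GLRO(dl_pagesize); since that is internal, the sketch uses the public POSIX sysconf (_SC_PAGESIZE) as a stand-in, and the function name is hypothetical. It only illustrates that the page size becomes a run-time value fetched on each call, which is where the extra loads would come from; the comparison itself is unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical dynamic variant of the page-cross test.  In glibc the
   page size would be read from GLRO(dl_pagesize), roughly one load for
   the GLRO structure and one for the field; sysconf is only a public
   stand-in here and is more expensive than two loads.  */
static bool
load_crosses_runtime_page (uintptr_t addr, uintptr_t len)
{
  uintptr_t page = (uintptr_t) sysconf (_SC_PAGESIZE);

  /* The mask trick below requires a power-of-two page size.  */
  assert (page != 0 && (page & (page - 1)) == 0);
  return (addr & (page - 1)) > page - len;
}
```

Compared with the fixed 4096-16 immediate in the quoted code, the only difference is where the page size comes from.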