Date: Fri, 4 Aug 2023 10:52:32 +0200
From: Jan Hubicka
To: Richard Biener
Cc: Aldy Hernandez, Jeff Law, gcc-patches@gcc.gnu.org
Subject: Re: Fix profile upate after vectorizer peeling

Hi,
so I found the problem.
We duplicate multiple paths and end up with:

;; basic block 6, loop depth 0, count 365072224 (estimated locally, freq 0.3400)
;;  prev block 12, next block 7, flags: (NEW, REACHABLE, VISITED)
;;  pred:       4 [never (guessed)]  count:0 (estimated locally, freq 0.0000) (TRUE_VALUE,EXECUTABLE)
;;              10 [always]  count:365072224 (estimated locally, freq 0.3400) (FALLTHRU,EXECUTABLE)
  # _18 = PHI <0(4), 0(10)>
  # d_39 = PHI
  if (_18 == 0)
    goto ; [97.06%]
  else
    goto ; [2.94%]
;;  succ:       8 [97.1% (guessed)]  count:354334801 (estimated locally, freq 0.3300) (TRUE_VALUE,EXECUTABLE)
;;              7 [2.9% (guessed)]  count:10737423 (estimated locally, freq 0.0100) (FALSE_VALUE,EXECUTABLE)

Here the goto to bb 7 is never taken (both PHI arguments of _18 are 0),
but the profile still gives that edge 2.94%, which is wrong.

Before threading we have a chain of conditionals:

  __asm__("pushf{l|d} pushf{l|d} pop{l} %0 mov{l} {%0, %1|%1, %0} xor{l} {%2, %0|%0, %2} push{l} %0 popf{l|d} pushf{l|d} pop{l} %0 popf{l|d} " : "=&r" __eax_19, "=&r" __ebx_20 : "i" 2097152);
  _21 = __eax_19 ^ __ebx_20;
  _22 = _21 & 2097152;
  if (_22 == 0)
    goto ; [34.00%]
  else
    goto ; [66.00%]

  [local count: 708669602 freq: 0.660000]:
  __asm__ __volatile__("cpuid " : "=a" __eax_24, "=b" __ebx_25, "=c" __ecx_26, "=d" __edx_27 : "0" 0);

  [local count: 1073741826 freq: 1.000000]:
  # _33 = PHI <0(2), __eax_24(3)>
  _16 = _33 == 0;
  if (_33 == 0)
    goto ; [34.00%]
  else
    goto ; [66.00%]

  [local count: 708669600 freq: 0.660000]:
  __asm__ __volatile__("cpuid " : "=a" a_44, "=b" b_45, "=c" c_46, "=d" d_47 : "0" 1, "2" 0);

  [local count: 1073741824 freq: 1.000000]:
  # _18 = PHI <0(4), 1(5)>
  # d_39 = PHI
  if (_18 == 0)
    goto ; [33.00%]
  else
    goto ; [67.00%]

If the first test finds _22 == 0, then also _33 == 0 and _18 == 0. But the
last conditional has probability 33% while the first has 34%, so the
profile guess is not consistent with the threaded path.

So threading is right to end up with a profile inconsistency, but it should
print the reason for it.

One option is to disable the optimization for the check. The other option is
to get the first conditional predicted right. Would this be OK?
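The bb 6 figures can be checked by hand. The sketch below only illustrates that arithmetic and is not GCC's profile code; the helper names leftover_count and percent_x100 are made up for this example. It assumes the threaded block keeps its incoming count (365072224, from the 34% guess), while the edge to bb 8 keeps the count derived from the 33% guess (354334801).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: count left over for the edge that can never be
   taken, when a block entered BLOCK_COUNT times has a successor edge
   that the original guess expects to be taken EDGE_COUNT times.  */
static int64_t
leftover_count (int64_t block_count, int64_t edge_count)
{
  assert (edge_count <= block_count);
  return block_count - edge_count;
}

/* Hypothetical helper: PART / WHOLE as a percentage scaled by 100 and
   rounded to nearest, so a result of 9706 means 97.06%.  */
static int64_t
percent_x100 (int64_t part, int64_t whole)
{
  assert (whole > 0);
  return (part * 10000 + whole / 2) / whole;
}
```

With the counts from the dump, leftover_count (365072224, 354334801) is 10737423, which is the count printed on the edge to bb 7, and percent_x100 gives 9706 and 294 for the two edges, matching the 97.06% and 2.94% in the dump. So the 2.94% is just the gap between the 34% and 33% guesses, assigned to an edge that cannot be taken.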
gcc/ChangeLog:

	* config/i386/cpuid.h: Mark CPUs not supporting cpuid as unlikely.

diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index 03fd6fc9478..9c768ac0b6d 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -295,7 +295,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig)
 	    : "i" (0x00200000));
 #endif
 
-  if (!((__eax ^ __ebx) & 0x00200000))
+  if (__builtin_expect (!((__eax ^ __ebx) & 0x00200000), 0))
     return 0;
 #endif
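For readers who want to try the changed condition outside cpuid.h, here is a minimal standalone sketch. It is not the header's code: level_if_cpuid, its parameters, and ID_FLAG_MASK are hypothetical names, and the two EFLAGS values are passed in as plain integers instead of being read with inline asm. Only the shape of the test and the use of __builtin_expect follow the patch.

```c
#include <assert.h>  /* only needed for the checks mentioned below */

/* Bit 21 of EFLAGS (the ID flag); same value as the mask in the patch.  */
#define ID_FLAG_MASK 0x00200000u

/* Hypothetical stand-in for the check in __get_cpuid_max.  BEFORE and
   AFTER are EFLAGS values read before and after trying to flip the ID
   bit.  Returns 0 when the bit did not change (no cpuid support),
   otherwise LEVEL.  */
static unsigned int
level_if_cpuid (unsigned int before, unsigned int after, unsigned int level)
{
  /* __builtin_expect (exp, 0) returns EXP unchanged and tells the
     compiler that EXP is expected to be 0, so the early return is
     predicted as the unlikely path.  */
  if (__builtin_expect (!((before ^ after) & ID_FLAG_MASK), 0))
    return 0;
  return level;
}
```

The hint changes only the branch prediction, not the result: level_if_cpuid (0, 0x00200000, 7) still returns 7, and passing the same value for both flags still returns 0.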