Date: Tue, 3 May 2022 12:09:32 +0300 (MSK)
From: Alexander Monakov
To: Paul Zimmermann
cc: gcc-help@gcc.gnu.org, stephane.glondu@inria.fr, sibid@uvic.ca
Subject: Re: slowdown with -std=gnu18 with respect to -std=c99
Message-ID: <9f7e3aa9-8d46-1fbb-75b-1c8ad9a667f@ispras.ru>

On Tue, 3 May 2022, Paul Zimmermann via Gcc-help wrote:

> Does anyone have a clue?

I can reproduce a difference, but in my case it's simply because in -std=gnuXX mode (as opposed to -std=cXX) GCC enables FMA contraction, enabling the last few steps in the benchmarked function to use fma instead of separate mul/add instructions.
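
To illustrate what contraction changes at the arithmetic level, here is a minimal sketch in Python that is independent of any compiler. It is not taken from the benchmarked code: the function names and input values are made up for illustration, and the fused variant is only emulated with exact rationals rather than executed as a hardware fma. A separate mul/add rounds twice, while a fused multiply-add rounds once, so the two can give different results besides differing in instruction count.

```python
from fractions import Fraction


def separate_mul_add(a: float, b: float, c: float) -> float:
    """Uncontracted form: the product is rounded, then the sum is rounded."""
    product = a * b        # first rounding to double
    return product + c     # second rounding to double


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulated fma: compute a*b + c exactly, then round a single time."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)    # one correctly rounded conversion to double


# Hypothetical inputs chosen so that a*b = 1 - 2**-54 exactly,
# which is a tie between two doubles and rounds to 1.0 when stored.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

print("separate mul/add:", separate_mul_add(a, b, c))
print("fused (emulated):", fused_mul_add(a, b, c))
```

With these inputs the separate form loses the low bits of the product and returns zero, while the fused form keeps them. On the GCC side, the documented default is -ffp-contract=fast in GNU dialects and -ffp-contract=off in ISO modes such as -std=c99, so passing -ffp-contract explicitly in both builds should be one way to check whether contraction accounts for the timing difference. Contraction also requires that the target has FMA instructions enabled.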
(Regarding __builtin_expect: it also makes a small difference in my case. It seems GCC generates some redundant code without it, but the difference is 10x smaller than what the presence or absence of FMA gives.)

I think you might be able to figure it out on your end if you run both variants under 'perf stat', note how the cycle and instruction counts change, and then look at the disassembly to see what changed. You can use 'perf record' and 'perf report' to easily see the hot code path; if you do that, I'd recommend running it with the same sampling period in both cases, e.g. like this:

    perf record -e instructions:P -c 500000 ./perf ...

Alexander