Date: Tue, 8 Aug 2023 08:07:57 +0000 (UTC)
From: Richard Biener
To: Uros Bizjak
cc: "gcc-patches@gcc.gnu.org", Jan Hubicka, Hongtao Liu
Subject: Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math
[PR110832]

On Mon, 7 Aug 2023, Uros Bizjak wrote:

> On Mon, Jul 31, 2023 at 11:40 AM Richard Biener wrote:
> >
> > On Sun, 30 Jul 2023, Uros Bizjak wrote:
> >
> > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> > > named patterns in order to avoid generation of partial vector V4SFmode
> > > trapping instructions.
> > >
> > > The new option is enabled by default, because even with sanitization,
> > > a small but consistent speed up of 2 to 3% with Polyhedron capacita
> > > benchmark can be achieved vs. scalar code.
> > >
> > > Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
> > > vs. scalar code. This is what clang does by default, as it defaults
> > > to -fno-trapping-math.
> >
> > I like the new option, note you lack invoke.texi documentation where
> > I'd also elaborate a bit on the interaction with -fno-trapping-math
> > and the possible performance impact then NaNs or denormals leak
> > into the upper halves and cross-reference -mdaz-ftz.
>
> The attached doc patch is invoke.texi entry for -mmmxfp-with-sse
> option. It is written in a way to also cover half-float vectors. WDYT?

"generate trapping floating-point operations"

I'd say "generate floating-point operations that might affect the set of floating point status flags", the word "trapping" is IMHO misleading. Not sure if "set of floating point status flags" is the correct term, but it's what the C standard seems to refer to when talking about things you get with fegetexceptflag. feraiseexcept refers to "floating-point exceptions".
Unfortunately the -fno-trapping-math documentation is similarly confusing (and maybe even wrong, I read it to conform to 'non-stop' IEEE arithmetic).

I'd maybe give an example of a FP operation that's _not_ affected by the flag (copysign?).

Otherwise it looks OK to me.

Thanks,
Richard.