Date: Wed, 5 Oct 2022 15:47:36 +0200
From: Jakub Jelinek
To: Jason Merrill
Cc: "Joseph S. Myers", Richard Biener, Jeff Law, Uros Bizjak, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support

On Tue, Oct 04, 2022 at 05:50:50PM -0400, Jason Merrill wrote:
> > Another question is the suffixes of the builtins.  For now I have added
> > bf16 suffix and enabled the builtins with !both_p, so one always needs to
> > use __builtin_* form for them.  None of the GCC builtins end with b,
> > so this isn't ambiguous with __builtin_*f16, but some libm functions do end
> > with b, in particular ilogb, logb and f{??,??x}sub.  ilogb and the subs
> > always have it, but is __builtin_logbf16 f16 suffixed logb or bf16 suffixed
> > log?  Shall the builtins use f16b suffixes instead like the mangling does?
>
> Do we want bf16 builtins at all?  The impression I've gotten is that users
> want computation to happen in SFmode and only later truncate back to BFmode.

As I wrote earlier, I think we need at least one, a __builtin_nans variant,
which would be used in the libstdc++
std::numeric_limits::signaling_NaN() implementation.
I think std::numeric_limits::infinity() can be implemented as
  return (__bf16) __builtin_huge_valf ();
and similarly std::numeric_limits::quiet_NaN() as
  return (__bf16) __builtin_nanf ("");
but
  return (__bf16) __builtin_nansf ("");
would lose the signaling NaN on the conversion and raise an exception, and as
the method is constexpr,
  union { unsigned short a; __bf16 b; } u = { 0x7f81 };
  return u.b;
wouldn't work (reading the inactive union member isn't allowed in a constant
expression).

I can certainly restrict the builtins to that single one, but I wonder whether
the suffix for that builtin shouldn't be chosen such that we could eventually
add more builtins if we need to, without running into the ambiguity between
log with bf16 suffix and logb with f16 suffix.

As you said, most of the libstdc++ overloads for std::bfloat16_t can then use
float builtins or library calls under the hood, but std::nextafter is another
case where I think we'll need something bfloat16_t specific, because the
float ulp isn't the bfloat16_t ulp; the latter is much larger.

Based on what Joseph wrote, I'll add bf16/BF16 suffix support for C too in the
next iteration (always with pedwarn in that case).

> > @@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_
> >      {
> >        machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
> >        icode = optab_handler (cstore_optab, optab_mode);
> > -      if (icode != CODE_FOR_nothing)
> > +      if (icode != CODE_FOR_nothing
> > +          /* Don't consider [BH]Fmode as usable wider mode, as neither is
> > +             a subset or superset of the other.  */
> > +          && (compare_mode == mode
> > +              || !SCALAR_FLOAT_MODE_P (compare_mode)
> > +              || maybe_ne (GET_MODE_PRECISION (compare_mode),
> > +                           GET_MODE_PRECISION (mode))))
>
> Why do you need to do this here (and in prepare_cmp_insn, and similarly in
> can_compare_p)?  Shouldn't get_wider skip over modes that are not actually
> wider?

I'm afraid too many places rely on all modes of a certain class being visible
when walking from the "narrowest" to the "widest" mode; say
FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE
etc. wouldn't work at all if
GET_MODE_WIDER_MODE (BFmode) == SFmode && GET_MODE_WIDER_MODE (HFmode) == SFmode.

Note, besides this
GET_MODE_PRECISION (HFmode) == GET_MODE_PRECISION (BFmode)
case, another set of modes which have the same size are the powerpc*
TFmode/IFmode/KFmode, but in that case the backend resorts to ugly hacks and
artificially lowers the precision of 2 of them:
rs6000-modes.h:#define FLOAT_PRECISION_IFmode 128
rs6000-modes.h:#define FLOAT_PRECISION_TFmode 127
rs6000-modes.h:#define FLOAT_PRECISION_KFmode 126
(and the middle-end then has to work around that mess).

Doing something similar wouldn't help the BFmode vs. HFmode case though.
One of them would have the wider precision, so e.g. the C FE would then prefer
it, but more importantly, as they are unordered modes where most of the optabs
aren't implemented, it is bad to pick optabs for the "wider" mode to handle
the "narrower" one.  I think powerpc works because they define optabs for all
3 modes when those modes are usable.

	Jakub
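
To illustrate two of the points above (the bfloat16_t ulp being much larger
than the float ulp, and HFmode/BFmode being unordered), here is a minimal
standalone sketch.  It is not GCC code: the Format struct, the kHalf/kBfloat/
kSingle constants and the helper names are hypothetical, and the parameters
are simply the usual ones for binary16, bfloat16 and binary32 (11, 8 and 24
significand bits including the implicit one).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical description of a binary floating-point format.
// This is an illustration only, not GCC's real_format.
struct Format {
    const char *name;
    int precision;  // significand bits, including the implicit leading 1
    int emax;       // largest exponent e for which 1.x * 2^e is finite
    int emin;       // smallest exponent of a normal number
};

constexpr Format kHalf{"binary16 (HFmode)", 11, 15, -14};
constexpr Format kBfloat{"bfloat16 (BFmode)", 8, 127, -126};
constexpr Format kSingle{"binary32 (SFmode)", 24, 127, -126};

// Distance between 1.0 and the next larger representable value.
double ulp_at_one(const Format &f) { return std::ldexp(1.0, 1 - f.precision); }

// Largest finite value: (2 - 2^(1-p)) * 2^emax.
double max_finite(const Format &f) {
    return std::ldexp(2.0 - ulp_at_one(f), f.emax);
}

// True if the finite double x is exactly representable in format f
// (normal or subnormal).
bool representable(const Format &f, double x) {
    if (x == 0.0) return true;
    double a = std::fabs(x);
    if (a > max_finite(f)) return false;
    int e;
    std::frexp(a, &e);          // a = m * 2^e with m in [0.5, 1)
    int exponent = e - 1;       // a = 1.x * 2^exponent
    if (exponent < f.emin) exponent = f.emin;  // subnormals share one spacing
    double quantum = std::ldexp(1.0, exponent - (f.precision - 1));
    double q = a / quantum;     // exact: quantum is a power of two
    return q == std::floor(q);
}

// Neither 16-bit format is a subset of the other; binary32 contains both.
void check_formats_unordered() {
    double fine = 1.0 + std::ldexp(1.0, -10);  // needs 11 significand bits
    double big = std::ldexp(1.0, 100);         // needs a wide exponent range
    assert(representable(kHalf, fine) && !representable(kBfloat, fine));
    assert(representable(kBfloat, big) && !representable(kHalf, big));
    assert(representable(kSingle, fine) && representable(kSingle, big));
}
```

With these parameters ulp_at_one (kBfloat) is 2^-7 while ulp_at_one (kSingle)
is 2^-23, so stepping a value by one float ulp and truncating back to
bfloat16_t would normally give back the starting value.  And since each of
the two 16-bit formats holds values the other cannot, treating either as the
"wider" mode of the other would not be value-preserving.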