From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id DD9113858D28; Fri, 10 Feb 2023 13:38:13 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DD9113858D28
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1676036293;
	bh=AMQQuqTwGFElQPbfB1AKWa/Q30q6owMI5siSAqODT4g=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=teFGAu+mrCy6LPMRVdfs5v135MSH5MHKUO1wdGAk+C1h820bFVKeNoI/ZdV9e5aQJ
	 UKGJz6DPE3ovQMOWEe/Ez65Gb6XZqREFjCFtlF4BAktKqC7Q/lAVGYzUReT13/hBK4
	 dcWbG2F41q5NznszKqyhq4yxh1K5qm5srbWKtad0=
From: "already5chosen at yahoo dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libgcc/108279] Improved speed for float128 routines
Date: Fri, 10 Feb 2023 13:38:09 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: libgcc
X-Bugzilla-Version: unknown
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: already5chosen at yahoo dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-108279-4-AVAwTrwATB@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-108279-4@http.gcc.gnu.org/bugzilla/>
References: <bug-108279-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279
--- Comment #24 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Michael_S from comment #22)
> (In reply to Michael_S from comment #8)
> > (In reply to Thomas Koenig from comment #6)
> > > And there will have to be a decision about 32-bit targets.
> > >
> >
> > IMHO, 32-bit targets should be left in their current state.
> > People that use them probably do not care deeply about performance.
> > Technically, I can implement 32-bit targets in the same sources, by means of
> > few ifdefs and macros, but resulting source code will look much uglier than
> > how it looks today. Still, not to the same level of horror that you have in
> > matmul_r16.c, but certainly uglier than how I like it to look.
> > And I am not sure at all that my implementation of 32-bit targets would be
> > significantly faster than current soft float.
>
> I explored this path (implementing 32-bit and 64-bit targets from the same
> source with few ifdefs) a little more:
> Now I am even more sure that it is not a way to go. gcc compiler does not
> generate good 32-bit code for this style of sources. This especially applies
> to i386, other supported 32-bit targets (RV32, SPARC32) are affected less.
>

I can't explain to myself why I am doing it, but I did continue the exploration
of 32-bit targets. Well, not quite "targets": I don't have SPARC32 or RV32 to
play with, so I continued the exploration of i386 only.
As said above, using the same code for 32-bit and 64-bit does not produce
acceptable results. But a pure 32-bit source did better than I expected.
So when I wrote on 2023-01-13 "And I am not sure at all that my implementation
of 32-bit targets would be significantly faster than current soft float", I was
wrong. My implementation for 32-bit targets (i.e. i386) is significantly faster
than the current soft float: up to 3 times faster on Zen3 and approximately
2 times faster on various oldish Intel CPUs.
Today I put the 32-bit sources into my GitHub repository.
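
For illustration only, and not taken from the repository mentioned above: a
minimal sketch of what "pure 32-bit" style can mean, assuming a 128-bit value
is kept as four 32-bit limbs and carries are propagated by hand instead of
relying on 64-bit or __int128 arithmetic. The type u128_limbs32 and the
function add128_limbs32 are hypothetical names invented for this example.

```c
#include <stdint.h>

/* Hypothetical type, not from the repository: a 128-bit unsigned value
   stored as four 32-bit limbs, least significant limb first. */
typedef struct { uint32_t w[4]; } u128_limbs32;

/* Add *b into *a using only 32-bit operations.
   Returns the carry out of the most significant limb (0 or 1). */
static unsigned add128_limbs32(u128_limbs32 *a, const u128_limbs32 *b)
{
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t s = a->w[i] + b->w[i];  /* wraps modulo 2^32 */
        unsigned c1 = s < b->w[i];       /* carry from the limb addition */
        uint32_t t = s + carry;
        unsigned c2 = t < s;             /* carry from adding the carry-in */
        a->w[i] = t;
        carry = c1 | c2;                 /* c1 and c2 are never both set */
    }
    return carry;
}
```

Whether a compiler turns such a loop into an add/adc chain on i386 is not
guaranteed; this sketch only shows the data layout and carry handling.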

I am still convinced that improving the performance of IEEE binary128 on 32-bit
targets is a waste of time, but since the time is already wasted, maybe the
results can be used.

And maybe the implementation can be used to bring IEEE binary128 to the Arm
Cortex-M, where it could be moderately useful in some situations.