From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 20 Apr 2020 14:31:58 +1000
From: Nicholas Piggin <npiggin@gmail.com>
Subject: Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2
To: Rich Felker <dalias@libc.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>,
 libc-alpha@sourceware.org, libc-dev@lists.llvm.org,
 linuxppc-dev@lists.ozlabs.org, musl@lists.openwall.com
References: <20200415225539.GL11469@brightrain.aerifal.cx>
 <c2612908-67f7-cceb-d121-700dea096016@linaro.org>
 <20200416153756.GU11469@brightrain.aerifal.cx>
 <4b2a7a56-dd2b-1863-50e5-2f4cdbeef47c@linaro.org>
 <20200416175932.GZ11469@brightrain.aerifal.cx>
 <4f824a37-e660-8912-25aa-fde88d4b79f3@linaro.org>
 <20200416183151.GA11469@brightrain.aerifal.cx>
 <1587344003.daumxvs1kh.astroid@bobo.none>
 <20200420013412.GZ11469@brightrain.aerifal.cx>
 <1587348538.l1ioqml73m.astroid@bobo.none>
 <20200420040926.GA11469@brightrain.aerifal.cx>
In-Reply-To: <20200420040926.GA11469@brightrain.aerifal.cx>
MIME-Version: 1.0
Message-Id: <1587356128.aslvdnmtbw.astroid@bobo.none>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <http://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <http://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Mon, 20 Apr 2020 04:33:25 -0000

Excerpts from Rich Felker's message of April 20, 2020 2:09 pm:
> On Mon, Apr 20, 2020 at 12:32:21PM +1000, Nicholas Piggin wrote:
>> Excerpts from Rich Felker's message of April 20, 2020 11:34 am:
>> > On Mon, Apr 20, 2020 at 11:10:25AM +1000, Nicholas Piggin wrote:
>> >> Excerpts from Rich Felker's message of April 17, 2020 4:31 am:
>> >> > Note that because lr is clobbered we need at least one normally
>> >> > call-clobbered register that's not syscall clobbered to save lr in.
>> >> > Otherwise stack frame setup is required to spill it.
>> >>
>> >> The kernel would like to use r9-r12 for itself. We could do with fewer
>> >> registers, but we have some delay establishing the stack (depends on a
>> >> load which depends on a mfspr), and entry code tends to be quite store
>> >> heavy whereas on the caller side you have r1 set up (modulo stack
>> >> updates), and the system call is a long delay during which time the
>> >> store queue has significant time to drain.
>> >>
>> >> My feeling is it would be better for kernel to have these scratch
>> >> registers.
>> >
>> > If your new kernel syscall mechanism requires the caller to make a
>> > whole stack frame it otherwise doesn't need and spill registers to it,
>> > it becomes a lot less attractive. Some of those 90 cycles saved are
>> > immediately lost on the userspace side, plus you either waste icache
>> > at the call point or require the syscall to go through a
>> > userspace-side helper function that performs the spill and restore.
>>
>> You would be surprised how few cycles that takes on a high end CPU. Some
>> might be a couple of %. I am one for counting cycles mind you, I'm not
>> being flippant about it. If we can come up with something faster I'd be
>> up for it.
>
> If the cycle count is trivial then just do it on the kernel side.

The cycle count is trivial for userspace, because it has r1 ready. The
kernel does not have its stack ready: it has to do
mfspr rX ; ld rY,N(rX) to get a stack to save into.

Which is also wasted work for userspace.

Now that I think about it, no stack frame is even required! lr is saved
into the caller's stack when it's clobbered by an asm, just as when
it's used for a function call.
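
To make that concrete, here is a rough sketch of what a userspace wrapper
could look like, not a definitive implementation. The helper name
raw_syscall1 is made up for illustration. The clobber list and the
"negative errno in r3" return convention are assumptions about how the ABI
under discussion ends up, and the PPC_FEATURE2_SCV value is taken from the
Linux uapi headers. On anything that is not powerpc64, or where the hwcap
bit is missing, it just falls back to syscall(2). The interesting part is
only that "lr" sits in the clobber list, so the compiler saves and restores
it itself wherever it needs to.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__powerpc64__)
#include <sys/auxv.h>
#ifndef PPC_FEATURE2_SCV
#define PPC_FEATURE2_SCV 0x00100000 /* assumed value, from Linux uapi cputable.h */
#endif
#endif

/*
 * Hypothetical helper: one-argument raw system call.
 * Returns the result, or a negative errno value on failure.
 */
static long raw_syscall1(long nr, long arg)
{
#if defined(__powerpc64__)
    if (getauxval(AT_HWCAP2) & PPC_FEATURE2_SCV) {
        register long r0 __asm__("r0") = nr;  /* syscall number */
        register long r3 __asm__("r3") = arg; /* first argument, also return value */

        __asm__ volatile(
            ".machine \"push\"\n\t"
            ".machine \"power9\"\n\t"
            "scv 0\n\t"
            ".machine \"pop\""
            : "+r"(r0), "+r"(r3)
            :
            /* Assumed clobber set. Listing "lr" here is what makes the
             * compiler save and restore lr around the asm on its own. */
            : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12",
              "cr0", "cr1", "cr5", "cr6", "cr7", "xer", "lr", "ctr", "memory");
        return r3; /* assumed convention: negative errno on error */
    }
#endif
    /* Portable fallback: libc syscall(2), converted to the same convention. */
    long ret = syscall(nr, arg);
    return ret == -1 ? -(long)errno : ret;
}
```

Something like raw_syscall1(SYS_close, fd) would then need no hand-written
spill code at the call site. Whether the compiler's own lr save is cheap
enough is the thing that would have to be measured.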

>> > The right way to do this is to have the kernel preserve enough
>> > registers that userspace can avoid having any spills. It doesn't have
>> > to preserve everything, probably just enough to save lr. (BTW are
>>
>> Again, the problem is the kernel doesn't have its dependencies
>> immediately ready to spill, and spilling (may be) more costly
>> immediately after the call because we're doing a lot of stores.
>>
>> I could try measure this. Unfortunately our pipeline simulator tool
>> doesn't model system calls properly so it's hard to see what's happening
>> across the user/kernel horizon, I might check if that can be improved
>> or I can hack it by putting some isync in there or something.
>
> I think it's unlikely to make any real difference to the total number
> of cycles spent which side it happens on, but putting it on the kernel
> side makes it easier to avoid wasting size/icache at each syscall
> site.
>
>> > syscall arg registers still preserved? If not, this is a major cost on
>> > the userspace side, since any call point that has to loop-and-retry
>> > (e.g. futex) now needs to make its own place to store the original
>> > values.)
>>
>> Powerpc system calls never did. We could have scv preserve them, but
>> you'd still need to restore r3. We could make an ABI which does not
>> clobber r3 but puts the return value in r9, say. I'd like to see what
>> the user side code looks like to take advantage of such a thing though.
>
> Oh wow, I hadn't realized that, but indeed the code we have now is
> allowing for the kernel to clobber them all. So at least this isn't
> getting any worse I guess. I think it was a very poor choice of
> behavior though and a disadvantage vs what other archs do (some of
> them preserve all registers; others preserve only normally call-saved
> ones plus the syscall arg ones and possibly a few other specials).

Well, we could change it. Does the generated code improve significantly
if we take those clobbers away?

Thanks,
Nick