From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by sourceware.org (Postfix) with ESMTPS id 52D9F385840C for ; Fri, 21 Apr 2023 07:28:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 52D9F385840C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-wr1-x429.google.com with SMTP id ffacd0b85a97d-2f7c281a015so846224f8f.1 for ; Fri, 21 Apr 2023 00:28:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1682062088; x=1684654088; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=3JcreIMHGTIvBDkY4Fc5IeGL1T3C5Zm+4jAwZ8DhA30=; b=sRLJD7a9ftJvcHSM7rJ0yuiwsESCzHH6FOAsO3BuUkPRKEF7MbpJw1kPNfGRcNyJde L1Vj7RX9pukv9P979JK4tJYR0kF7CtN/6hKjWs86beeFJocjOQzjJ6Baz8lfiXSzDx0X amO1DKS1QH/0/3txOSebiH6L024AnLQBeFQvUXmAOZc2QfbRAFHARLAo1OgKbb4WDDil 0O7s7xj4Vj4CVWf7phedRTNfv7Xcs+FE4uQnR1Y6sgoWrjCeqHLQRKjLmwFMUGo6I1lw I+jqF9Xbnrry9uALD0k6zU3pgkLYANbfjBUPV8rPVlRIFgDRdBvptwk0eX5aSFIUST7B KZ3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682062088; x=1684654088; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=3JcreIMHGTIvBDkY4Fc5IeGL1T3C5Zm+4jAwZ8DhA30=; b=Ai2ZgCAYhLVaJQiJTDBXy1Spp7uDufyjMxxl7iGGTYOpTXWMVjBE3ASz62nd2IXnjW wRcklTc38hah/fBIqrhWrqO7mZYXRMnEPZ0+F2CGlkCZIHIJGfYq+vn7DBT4k13IVwoR K0VgwAXGkHb9kSVW5Tsvwprw8HpBD976sEKg4sQ0ANTXNnqyJeE/fJJWk56SbTpqWp8h b/+UrIa2DkdfDXENwUlmqpQjAEojPsV77yKeSbtIEN8ix7BFCjYrHfZuw81Dwv3pqV2O j2KY6gSNhLVopdxNdZhAITr9soSGBJ3xgDFiBdOqN2fbeF8WrYe3xOX3jYguxZA6AeDK b6fg== X-Gm-Message-State: AAQBX9dZfinmP4/MWFUAn9yF7X2FZhfN8EsTidvTxff6KG4n5Qc06u50 3xoCMdFpQbj35cOefoQXkVWwoqEPUBoc10zJnnjosQ== X-Google-Smtp-Source: AKy350ba+F3Av+HJV43l2gli1j3Beei28ZVIyjJzKoKOub3FuoUz8FTPWF6TVMQOxvkcGCAWfw2ra18MgH/otk3qOuI= X-Received: by 2002:a5d:4525:0:b0:304:4460:11ed with SMTP id j5-20020a5d4525000000b00304446011edmr269524wra.11.1682062087901; Fri, 21 Apr 2023 00:28:07 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Prathamesh Kulkarni Date: Fri, 21 Apr 2023 12:57:32 +0530 Message-ID: Subject: Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector To: Prathamesh Kulkarni , Richard Biener , gcc Patches , richard.sandiford@arm.com Content-Type: multipart/mixed; boundary="0000000000003d80fd05f9d399a8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --0000000000003d80fd05f9d399a8 Content-Type: text/plain; charset="UTF-8" On Wed, 12 Apr 2023 at 14:29, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > On Thu, 6 Apr 2023 at 16:05, Richard Sandiford > > wrote: > >> > >> Prathamesh Kulkarni writes: > >> > On Tue, 4 Apr 2023 at 23:35, Richard Sandiford > >> > wrote: > >> >> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > >> >> > index cd9cace3c9b..3de79060619 100644 > >> >> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > >> >> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > >> >> > @@ -817,6 +817,62 @@ public: > >> >> > > >> >> > class svdupq_impl : public quiet > >> >> > { > >> >> > +private: > >> >> > + gimple * > >> >> > + fold_nonconst_dupq (gimple_folder &f, unsigned factor) const > >> >> > + { > >> >> > + /* Lower lhs = svdupq (arg0, arg1, ..., argN} into: > >> >> > + tmp = {arg0, arg1, ..., arg} > >> >> > + lhs = VEC_PERM_EXPR (tmp, tmp, {0, 1, 2, N-1, ...}) */ > >> >> > + > >> >> > + /* TODO: Revisit to handle factor by padding zeros. */ > >> >> > + if (factor > 1) > >> >> > + return NULL; > >> >> > >> >> Isn't the key thing here predicate vs. vector rather than factor == 1 vs. > >> >> factor != 1? Do we generate good code for b8, where factor should be 1? > >> > Hi, > >> > It generates the following code for svdup_n_b8: > >> > https://pastebin.com/ypYt590c > >> > >> Hmm, yeah, not pretty :-) But it's not pretty without either. > >> > >> > I suppose lowering to ctor+vec_perm_expr is not really useful > >> > for this case because it won't simplify ctor, unlike the above case of > >> > svdupq_s32 (x[0], x[1], x[2], x[3]); > >> > However I wonder if it's still a good idea to lower svdupq for predicates, for > >> > representing svdupq (or other intrinsics) using GIMPLE constructs as > >> > far as possible ? > >> > >> It's possible, but I think we'd need an example in which its a clear > >> benefit. > > Sorry I posted for wrong test case above. > > For the following test: > > svbool_t f(uint8x16_t x) > > { > > return svdupq_n_b8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], > > x[8], x[9], x[10], x[11], x[12], > > x[13], x[14], x[15]); > > } > > > > Code-gen: > > https://pastebin.com/maexgeJn > > > > I suppose it's equivalent to following ? > > > > svbool_t f2(uint8x16_t x) > > { > > svuint8_t tmp = svdupq_n_u8 ((bool) x[0], (bool) x[1], (bool) x[2], > > (bool) x[3], > > (bool) x[4], (bool) x[5], (bool) x[6], > > (bool) x[7], > > (bool) x[8], (bool) x[9], (bool) x[10], > > (bool) x[11], > > (bool) x[12], (bool) x[13], (bool) > > x[14], (bool) x[15]); > > return svcmpne_n_u8 (svptrue_b8 (), tmp, 0); > > } > > Yeah, this is essentially the transformation that the svdupq rtl > expander uses. It would probably be a good idea to do that in > gimple too. Hi, I tested the interleave+zip1 for vector init patch and it segfaulted during bootstrap while trying to build libgfortran/generated/matmul_i2.c. Rebuilding with --enable-checking=rtl showed out of bounds access in aarch64_unzip_vector_init in following hunk: + rtvec vec = rtvec_alloc (n / 2); + for (int i = 0; i < n; i++) + RTVEC_ELT (vec, i) = (even_p) ? XVECEXP (vals, 0, 2 * i) + : XVECEXP (vals, 0, 2 * i + 1); which is incorrect since it allocates n/2 but iterates and stores upto n. The attached patch fixes the issue, which passed bootstrap, however resulted in following fallout during testsuite run: 1] sve/acle/general/dupq_[1-4].c tests fail. For the following test: int32x4_t f(int32_t x) { return (int32x4_t) { x, 1, 2, 3 }; } Code-gen without patch: f: adrp x1, .LC0 ldr q0, [x1, #:lo12:.LC0] ins v0.s[0], w0 ret Code-gen with patch: f: movi v0.2s, 0x2 adrp x1, .LC0 ldr d1, [x1, #:lo12:.LC0] ins v0.s[0], w0 zip1 v0.4s, v0.4s, v1.4s ret It shows, fallback_seq_cost = 20, seq_total_cost = 16 where seq_total_cost determines the cost for interleave+zip1 sequence and fallback_seq_cost is the cost for fallback sequence. Altho it shows lesser cost, I am not sure if the interleave+zip1 sequence is better in this case ? 2] sve/acle/general/dupq_[5-6].c tests fail: int32x4_t f(int32_t x0, int32_t x1, int32_t x2, int32_t x3) { return (int32x4_t) { x0, x1, x2, x3 }; } code-gen without patch: f: fmov s0, w0 ins v0.s[1], w1 ins v0.s[2], w2 ins v0.s[3], w3 ret code-gen with patch: f: fmov s0, w0 fmov s1, w1 ins v0.s[1], w2 ins v1.s[1], w3 zip1 v0.4s, v0.4s, v1.4s ret It shows fallback_seq_cost = 28, seq_total_cost = 16 3] aarch64/ldp_stp_16.c's cons2_8_float test fails. Test case: void cons2_8_float(float *x, float val0, float val1) { #pragma GCC unroll(8) for (int i = 0; i < 8 * 2; i += 2) { x[i + 0] = val0; x[i + 1] = val1; } } which is lowered to: void cons2_8_float (float * x, float val0, float val1) { vector(4) float _86; [local count: 119292720]: _86 = {val0_11(D), val1_13(D), val0_11(D), val1_13(D)}; MEM [(float *)x_10(D)] = _86; MEM [(float *)x_10(D) + 16B] = _86; MEM [(float *)x_10(D) + 32B] = _86; MEM [(float *)x_10(D) + 48B] = _86; return; } code-gen without patch: cons2_8_float: dup v0.4s, v0.s[0] ins v0.s[1], v1.s[0] ins v0.s[3], v1.s[0] stp q0, q0, [x0] stp q0, q0, [x0, 32] ret code-gen with patch: cons2_8_float: dup v1.2s, v1.s[0] dup v0.2s, v0.s[0] zip1 v0.4s, v0.4s, v1.4s stp q0, q0, [x0] stp q0, q0, [x0, 32] ret It shows fallback_seq_cost = 28, seq_total_cost = 16 I think the test fails because it doesn't match: ** dup v([0-9]+)\.4s, .* Shall it be OK to amend the test assuming code-gen with patch is better ? 4] aarch64/pr109072_1.c s32x4_3 test fails: For the following test: int32x4_t s32x4_3 (int32_t x, int32_t y) { int32_t arr[] = { x, y, y, y }; return vld1q_s32 (arr); } code-gen without patch: s32x4_3: dup v0.4s, w1 ins v0.s[0], w0 ret code-gen with patch: s32x4_3: fmov s1, w1 fmov s0, w0 ins v0.s[1], v1.s[0] dup v1.2s, v1.s[0] zip1 v0.4s, v0.4s, v1.4s ret It shows fallback_seq_cost = 20, seq_total_cost = 16 I am not sure how interleave+zip1 cost is lesser than fallback seq cost for this case. I assume that the fallback sequence is better here ? PS: The patch for folding svdupq to ctor+vec_perm_expr passes bootstrap+test without any issues. Thanks, Prathamesh > > Thanks, > Richard > > > > > which generates: > > f2: > > .LFB3901: > > .cfi_startproc > > movi v1.16b, 0x1 > > ptrue p0.b, all > > cmeq v0.16b, v0.16b, #0 > > bic v0.16b, v1.16b, v0.16b > > dup z0.q, z0.q[0] > > cmpne p0.b, p0/z, z0.b, #0 > > ret > > > > Thanks, > > Prathamesh --0000000000003d80fd05f9d399a8 Content-Type: application/octet-stream; name="gnu-821-6.diff" Content-Disposition: attachment; filename="gnu-821-6.diff" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_lgq77bx20 ZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0LmNjIGIvZ2NjL2NvbmZpZy9h YXJjaDY0L2FhcmNoNjQuY2MKaW5kZXggNDI2MTdjZWQ3M2EuLmM2Yjg4OTQzODZiIDEwMDY0NAot LS0gYS9nY2MvY29uZmlnL2FhcmNoNjQvYWFyY2g2NC5jYworKysgYi9nY2MvY29uZmlnL2FhcmNo NjQvYWFyY2g2NC5jYwpAQCAtMjIwNDUsMTEgKzIyMDQ1LDEyIEBAIGFhcmNoNjRfc2ltZF9tYWtl X2NvbnN0YW50IChydHggdmFscykKICAgICByZXR1cm4gTlVMTF9SVFg7CiB9CiAKLS8qIEV4cGFu ZCBhIHZlY3RvciBpbml0aWFsaXNhdGlvbiBzZXF1ZW5jZSwgc3VjaCB0aGF0IFRBUkdFVCBpcwot ICAgaW5pdGlhbGlzZWQgdG8gY29udGFpbiBWQUxTLiAgKi8KKy8qIEEgc3Vicm91dGluZSBvZiBh YXJjaDY0X2V4cGFuZF92ZWN0b3JfaW5pdCwgd2l0aCB0aGUgc2FtZSBpbnRlcmZhY2UuCisgICBU aGUgY2FsbGVyIGhhcyBhbHJlYWR5IHRyaWVkIGEgZGl2aWRlLWFuZC1jb25xdWVyIGFwcHJvYWNo LCBzbyBkbworICAgbm90IGNvbnNpZGVyIHRoYXQgY2FzZSBoZXJlLiAgKi8KIAogdm9pZAotYWFy Y2g2NF9leHBhbmRfdmVjdG9yX2luaXQgKHJ0eCB0YXJnZXQsIHJ0eCB2YWxzKQorYWFyY2g2NF9l eHBhbmRfdmVjdG9yX2luaXRfZmFsbGJhY2sgKHJ0eCB0YXJnZXQsIHJ0eCB2YWxzKQogewogICBt YWNoaW5lX21vZGUgbW9kZSA9IEdFVF9NT0RFICh0YXJnZXQpOwogICBzY2FsYXJfbW9kZSBpbm5l cl9tb2RlID0gR0VUX01PREVfSU5ORVIgKG1vZGUpOwpAQCAtMjIxMDksMzggKzIyMTEwLDYgQEAg YWFyY2g2NF9leHBhbmRfdmVjdG9yX2luaXQgKHJ0eCB0YXJnZXQsIHJ0eCB2YWxzKQogICAgICAg cmV0dXJuOwogICAgIH0KIAotICAvKiBDaGVjayBmb3IgaW50ZXJsZWF2aW5nIGNhc2UuCi0gICAg IEZvciBlZyBpZiBpbml0aWFsaXplciBpcyAoaW50MTZ4OF90KSB7eCwgeSwgeCwgeSwgeCwgeSwg eCwgeX0uCi0gICAgIEdlbmVyYXRlIGZvbGxvd2luZyBjb2RlOgotICAgICBkdXAgdjAuaCwgeAot ICAgICBkdXAgdjEuaCwgeQotICAgICB6aXAxIHYwLmgsIHYwLmgsIHYxLmgKLSAgICAgZm9yICJs YXJnZSBlbm91Z2giIGluaXRpYWxpemVyLiAgKi8KLQotICBpZiAobl9lbHRzID49IDgpCi0gICAg ewotICAgICAgaW50IGk7Ci0gICAgICBmb3IgKGkgPSAyOyBpIDwgbl9lbHRzOyBpKyspCi0JaWYg KCFydHhfZXF1YWxfcCAoWFZFQ0VYUCAodmFscywgMCwgaSksIFhWRUNFWFAgKHZhbHMsIDAsIGkg JSAyKSkpCi0JICBicmVhazsKLQotICAgICAgaWYgKGkgPT0gbl9lbHRzKQotCXsKLQkgIG1hY2hp bmVfbW9kZSBtb2RlID0gR0VUX01PREUgKHRhcmdldCk7Ci0JICBydHggZGVzdFsyXTsKLQotCSAg Zm9yIChpbnQgaSA9IDA7IGkgPCAyOyBpKyspCi0JICAgIHsKLQkgICAgICBydHggeCA9IGV4cGFu ZF92ZWN0b3JfYnJvYWRjYXN0IChtb2RlLCBYVkVDRVhQICh2YWxzLCAwLCBpKSk7Ci0JICAgICAg ZGVzdFtpXSA9IGZvcmNlX3JlZyAobW9kZSwgeCk7Ci0JICAgIH0KLQotCSAgcnR2ZWMgdiA9IGdl bl9ydHZlYyAoMiwgZGVzdFswXSwgZGVzdFsxXSk7Ci0JICBlbWl0X3NldF9pbnNuICh0YXJnZXQs IGdlbl9ydHhfVU5TUEVDIChtb2RlLCB2LCBVTlNQRUNfWklQMSkpOwotCSAgcmV0dXJuOwotCX0K LSAgICB9Ci0KICAgZW51bSBpbnNuX2NvZGUgaWNvZGUgPSBvcHRhYl9oYW5kbGVyICh2ZWNfc2V0 X29wdGFiLCBtb2RlKTsKICAgZ2NjX2Fzc2VydCAoaWNvZGUgIT0gQ09ERV9GT1Jfbm90aGluZyk7 CiAKQEAgLTIyMjYyLDcgKzIyMjMxLDcgQEAgYWFyY2g2NF9leHBhbmRfdmVjdG9yX2luaXQgKHJ0 eCB0YXJnZXQsIHJ0eCB2YWxzKQogCSAgICB9CiAJICBYVkVDRVhQIChjb3B5LCAwLCBpKSA9IHN1 YnN0OwogCX0KLSAgICAgIGFhcmNoNjRfZXhwYW5kX3ZlY3Rvcl9pbml0ICh0YXJnZXQsIGNvcHkp OworICAgICAgYWFyY2g2NF9leHBhbmRfdmVjdG9yX2luaXRfZmFsbGJhY2sgKHRhcmdldCwgY29w eSk7CiAgICAgfQogCiAgIC8qIEluc2VydCB0aGUgdmFyaWFibGUgbGFuZXMgZGlyZWN0bHkuICAq LwpAQCAtMjIyNzYsNiArMjIyNDUsODEgQEAgYWFyY2g2NF9leHBhbmRfdmVjdG9yX2luaXQgKHJ0 eCB0YXJnZXQsIHJ0eCB2YWxzKQogICAgIH0KIH0KIAorLyogUmV0dXJuIGV2ZW4gb3Igb2RkIGhh bGYgb2YgVkFMUyBkZXBlbmRpbmcgb24gRVZFTl9QLiAgKi8KKworc3RhdGljIHJ0eAorYWFyY2g2 NF91bnppcF92ZWN0b3JfaW5pdCAobWFjaGluZV9tb2RlIG1vZGUsIHJ0eCB2YWxzLCBib29sIGV2 ZW5fcCkKK3sKKyAgaW50IG4gPSBYVkVDTEVOICh2YWxzLCAwKTsKKyAgbWFjaGluZV9tb2RlIG5l d19tb2RlCisgICAgPSBhYXJjaDY0X3NpbWRfY29udGFpbmVyX21vZGUgKEdFVF9NT0RFX0lOTkVS IChtb2RlKSwKKwkJCQkgICBHRVRfTU9ERV9CSVRTSVpFIChtb2RlKS50b19jb25zdGFudCAoKSAv IDIpOworICBydHZlYyB2ZWMgPSBydHZlY19hbGxvYyAobiAvIDIpOworICBmb3IgKGludCBpID0g MDsgaSA8IG4vMjsgaSsrKQorICAgIFJUVkVDX0VMVCAodmVjLCBpKSA9IChldmVuX3ApID8gWFZF Q0VYUCAodmFscywgMCwgMiAqIGkpCisJCQkJICA6IFhWRUNFWFAgKHZhbHMsIDAsIDIgKiBpICsg MSk7CisgIHJldHVybiBnZW5fcnR4X1BBUkFMTEVMIChuZXdfbW9kZSwgdmVjKTsKK30KKworLyog RXhwYW5kIGEgdmVjdG9yIGluaXRpYWxpc2F0aW9uIHNlcXVlbmNlLCBzdWNoIHRoYXQgVEFSR0VU IGlzCisgICBpbml0aWFsaXplZCB0byBjb250YWluIFZBTFMuICAqLworCit2b2lkCithYXJjaDY0 X2V4cGFuZF92ZWN0b3JfaW5pdCAocnR4IHRhcmdldCwgcnR4IHZhbHMpCit7CisgIC8qIFRyeSBk ZWNvbXBvc2luZyB0aGUgaW5pdGlhbGl6ZXIgaW50byBldmVuIGFuZCBvZGQgaGFsdmVzIGFuZAor ICAgICB0aGVuIFpJUCB0aGVtIHRvZ2V0aGVyLiAgVXNlIHRoZSByZXN1bHRpbmcgc2VxdWVuY2Ug aWYgaXQgaXMKKyAgICAgc3RyaWN0bHkgY2hlYXBlciB0aGFuIGxvYWRpbmcgVkFMUyBkaXJlY3Rs eS4KKworICAgICBQcmVmZXIgdGhlIGZhbGxiYWNrIHNlcXVlbmNlIGluIHRoZSBldmVudCBvZiBh IHRpZSwgc2luY2UgaXQKKyAgICAgd2lsbCB0ZW5kIHRvIHVzZSBmZXdlciByZWdpc3RlcnMuICAq LworCisgIG1hY2hpbmVfbW9kZSBtb2RlID0gR0VUX01PREUgKHRhcmdldCk7CisgIGludCBuX2Vs dHMgPSBYVkVDTEVOICh2YWxzLCAwKTsKKworICBpZiAobl9lbHRzIDwgNAorICAgICAgfHwgbWF5 YmVfbmUgKEdFVF9NT0RFX0JJVFNJWkUgKG1vZGUpLCAxMjgpKQorICAgIHsKKyAgICAgIGFhcmNo NjRfZXhwYW5kX3ZlY3Rvcl9pbml0X2ZhbGxiYWNrICh0YXJnZXQsIHZhbHMpOworICAgICAgcmV0 dXJuOworICAgIH0KKworICBzdGFydF9zZXF1ZW5jZSAoKTsKKyAgcnR4IGhhbHZlc1syXTsKKyAg dW5zaWduZWQgY29zdHNbMl07CisgIGZvciAoaW50IGkgPSAwOyBpIDwgMjsgaSsrKQorICAgIHsK KyAgICAgIHN0YXJ0X3NlcXVlbmNlICgpOworICAgICAgcnR4IG5ld192YWxzCisJPSBhYXJjaDY0 X3VuemlwX3ZlY3Rvcl9pbml0IChtb2RlLCB2YWxzLCAoaSAlIDIpID09IDApOworICAgICAgcnR4 IHRtcF9yZWcgPSBnZW5fcmVnX3J0eCAoR0VUX01PREUgKG5ld192YWxzKSk7CisgICAgICBhYXJj aDY0X2V4cGFuZF92ZWN0b3JfaW5pdCAodG1wX3JlZywgbmV3X3ZhbHMpOworICAgICAgaGFsdmVz W2ldID0gZ2VuX3J0eF9TVUJSRUcgKG1vZGUsIHRtcF9yZWcsIDApOworICAgICAgcnR4X2luc24g KnJlY19zZXEgPSBnZXRfaW5zbnMgKCk7CisgICAgICBlbmRfc2VxdWVuY2UgKCk7CisgICAgICBj b3N0c1tpXSA9IHNlcV9jb3N0IChyZWNfc2VxLCAhb3B0aW1pemVfc2l6ZSk7CisgICAgICBlbWl0 X2luc24gKHJlY19zZXEpOworICAgIH0KKworICBydHZlYyB2ID0gZ2VuX3J0dmVjICgyLCBoYWx2 ZXNbMF0sIGhhbHZlc1sxXSk7CisgIHJ0eF9pbnNuICp6aXAxX2luc24KKyAgICA9IGVtaXRfc2V0 X2luc24gKHRhcmdldCwgZ2VuX3J0eF9VTlNQRUMgKG1vZGUsIHYsIFVOU1BFQ19aSVAxKSk7Cisg IHVuc2lnbmVkIHNlcV90b3RhbF9jb3N0CisgICAgPSAoIW9wdGltaXplX3NpemUpID8gc3RkOjpt YXggKGNvc3RzWzBdLCBjb3N0c1sxXSkgOiBjb3N0c1swXSArIGNvc3RzWzFdOworICBzZXFfdG90 YWxfY29zdCArPSBpbnNuX2Nvc3QgKHppcDFfaW5zbiwgIW9wdGltaXplX3NpemUpOworCisgIHJ0 eF9pbnNuICpzZXEgPSBnZXRfaW5zbnMgKCk7CisgIGVuZF9zZXF1ZW5jZSAoKTsKKworICBzdGFy dF9zZXF1ZW5jZSAoKTsKKyAgYWFyY2g2NF9leHBhbmRfdmVjdG9yX2luaXRfZmFsbGJhY2sgKHRh cmdldCwgdmFscyk7CisgIHJ0eF9pbnNuICpmYWxsYmFja19zZXEgPSBnZXRfaW5zbnMgKCk7Cisg IHVuc2lnbmVkIGZhbGxiYWNrX3NlcV9jb3N0ID0gc2VxX2Nvc3QgKGZhbGxiYWNrX3NlcSwgIW9w dGltaXplX3NpemUpOworICBlbmRfc2VxdWVuY2UgKCk7CisKKyAgZW1pdF9pbnNuIChzZXFfdG90 YWxfY29zdCA8IGZhbGxiYWNrX3NlcV9jb3N0ID8gc2VxIDogZmFsbGJhY2tfc2VxKTsKK30KKwog LyogRW1pdCBSVEwgY29ycmVzcG9uZGluZyB0bzoKICAgIGluc3IgVEFSR0VULCBFTEVNLiAgKi8K IApkaWZmIC0tZ2l0IGEvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvaW50ZXJsZWF2 ZS1pbml0LTEuYyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L3ZlYy1pbml0LTE4 LmMKc2ltaWxhcml0eSBpbmRleCA4MiUKcmVuYW1lIGZyb20gZ2NjL3Rlc3RzdWl0ZS9nY2MudGFy Z2V0L2FhcmNoNjQvaW50ZXJsZWF2ZS1pbml0LTEuYwpyZW5hbWUgdG8gZ2NjL3Rlc3RzdWl0ZS9n Y2MudGFyZ2V0L2FhcmNoNjQvdmVjLWluaXQtMTguYwppbmRleCBlZTc3NTA0ODU4OS4uZTgxMmQz OTQ2ZGUgMTAwNjQ0Ci0tLSBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L2ludGVy bGVhdmUtaW5pdC0xLmMKKysrIGIvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvdmVj LWluaXQtMTguYwpAQCAtNyw4ICs3LDggQEAKIC8qCiAqKiBmb286CiAqKgkuLi4KLSoqCWR1cAl2 WzAtOV0rXC44aCwgd1swLTldKwotKioJZHVwCXZbMC05XStcLjhoLCB3WzAtOV0rCisqKglkdXAJ dlswLTldK1wuNGgsIHdbMC05XSsKKyoqCWR1cAl2WzAtOV0rXC40aCwgd1swLTldKwogKioJemlw MQl2WzAtOV0rXC44aCwgdlswLTldK1wuOGgsIHZbMC05XStcLjhoCiAqKgkuLi4KICoqCXJldApA QCAtMjMsOCArMjMsOCBAQCBpbnQxNng4X3QgZm9vKGludDE2X3QgeCwgaW50IHkpCiAvKgogKiog Zm9vMjoKICoqCS4uLgotKioJZHVwCXZbMC05XStcLjhoLCB3WzAtOV0rCi0qKgltb3ZpCXZbMC05 XStcLjhoLCAweDEKKyoqCWR1cAl2WzAtOV0rXC40aCwgd1swLTldKworKioJbW92aQl2WzAtOV0r XC40aCwgMHgxCiAqKgl6aXAxCXZbMC05XStcLjhoLCB2WzAtOV0rXC44aCwgdlswLTldK1wuOGgK ICoqCS4uLgogKioJcmV0CmRpZmYgLS1naXQgYS9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYWFy Y2g2NC92ZWMtaW5pdC0xOS5jIGIvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvdmVj LWluaXQtMTkuYwpuZXcgZmlsZSBtb2RlIDEwMDY0NAppbmRleCAwMDAwMDAwMDAwMC4uZTI4ZmRj ZGEyOWQKLS0tIC9kZXYvbnVsbAorKysgYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYWFyY2g2 NC92ZWMtaW5pdC0xOS5jCkBAIC0wLDAgKzEsMjEgQEAKKy8qIHsgZGctZG8gY29tcGlsZSB9ICov CisvKiB7IGRnLW9wdGlvbnMgIi1PMyIgfSAqLworLyogeyBkZy1maW5hbCB7IGNoZWNrLWZ1bmN0 aW9uLWJvZGllcyAiKioiICIiICIiIH0gfSAqLworCisjaW5jbHVkZSA8YXJtX25lb24uaD4KKwor LyoKKyoqIGZfczg6CisqKgkuLi4KKyoqCWR1cAl2WzAtOV0rXC44Yiwgd1swLTldKworKioJYWRy cAl4WzAtOV0rLCBcLkxDWzAtOV0rCisqKglsZHIJZFswLTldKywgXFt4WzAtOV0rLCAjOmxvMTI6 LkxDWzAtOV0rXF0KKyoqCXppcDEJdlswLTldK1wuMTZiLCB2WzAtOV0rXC4xNmIsIHZbMC05XStc LjE2YgorKioJcmV0CisqLworCitpbnQ4eDE2X3QgZl9zOChpbnQ4X3QgeCkKK3sKKyAgcmV0dXJu IChpbnQ4eDE2X3QpIHsgeCwgMSwgeCwgMiwgeCwgMywgeCwgNCwKKyAgICAgICAgICAgICAgICAg ICAgICAgeCwgNSwgeCwgNiwgeCwgNywgeCwgOCB9OworfQpkaWZmIC0tZ2l0IGEvZ2NjL3Rlc3Rz dWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvdmVjLWluaXQtMjAuYyBiL2djYy90ZXN0c3VpdGUvZ2Nj LnRhcmdldC9hYXJjaDY0L3ZlYy1pbml0LTIwLmMKbmV3IGZpbGUgbW9kZSAxMDA2NDQKaW5kZXgg MDAwMDAwMDAwMDAuLjkzNjZjYTM0OWI2Ci0tLSAvZGV2L251bGwKKysrIGIvZ2NjL3Rlc3RzdWl0 ZS9nY2MudGFyZ2V0L2FhcmNoNjQvdmVjLWluaXQtMjAuYwpAQCAtMCwwICsxLDIyIEBACisvKiB7 IGRnLWRvIGNvbXBpbGUgfSAqLworLyogeyBkZy1vcHRpb25zICItTzMiIH0gKi8KKy8qIHsgZGct ZmluYWwgeyBjaGVjay1mdW5jdGlvbi1ib2RpZXMgIioqIiAiIiAiIiB9IH0gKi8KKworI2luY2x1 ZGUgPGFybV9uZW9uLmg+CisKKy8qCisqKiBmX3M4OgorKioJLi4uCisqKglhZHJwCXhbMC05XSss IFwuTENbMC05XSsKKyoqCWR1cAl2WzAtOV0rXC44Yiwgd1swLTldKworKioJbGRyCWRbMC05XSss IFxbeFswLTldKywgIzpsbzEyOlwuTENbMC05XStcXQorKioJaW5zCXYwXC5iXFswXF0sIHcwCisq Kgl6aXAxCXZbMC05XStcLjE2YiwgdlswLTldK1wuMTZiLCB2WzAtOV0rXC4xNmIKKyoqCXJldAor Ki8KKworaW50OHgxNl90IGZfczgoaW50OF90IHgsIGludDhfdCB5KQoreworICByZXR1cm4gKGlu dDh4MTZfdCkgeyB4LCB5LCAxLCB5LCAyLCB5LCAzLCB5LAorICAgICAgICAgICAgICAgICAgICAg ICA0LCB5LCA1LCB5LCA2LCB5LCA3LCB5IH07Cit9CmRpZmYgLS1naXQgYS9nY2MvdGVzdHN1aXRl L2djYy50YXJnZXQvYWFyY2g2NC92ZWMtaW5pdC0yMS5jIGIvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFy Z2V0L2FhcmNoNjQvdmVjLWluaXQtMjEuYwpuZXcgZmlsZSBtb2RlIDEwMDY0NAppbmRleCAwMDAw MDAwMDAwMC4uZTE2NDU5NDg2ZDcKLS0tIC9kZXYvbnVsbAorKysgYi9nY2MvdGVzdHN1aXRlL2dj Yy50YXJnZXQvYWFyY2g2NC92ZWMtaW5pdC0yMS5jCkBAIC0wLDAgKzEsMjIgQEAKKy8qIHsgZGct ZG8gY29tcGlsZSB9ICovCisvKiB7IGRnLW9wdGlvbnMgIi1PMyIgfSAqLworLyogeyBkZy1maW5h bCB7IGNoZWNrLWZ1bmN0aW9uLWJvZGllcyAiKioiICIiICIiIH0gfSAqLworCisjaW5jbHVkZSA8 YXJtX25lb24uaD4KKworLyoKKyoqIGZfczg6CisqKgkuLi4KKyoqCWFkcnAJeFswLTldKywgXC5M Q1swLTldKworKioJbGRyCXFbMC05XSssIFxbeFswLTldKywgIzpsbzEyOlwuTENbMC05XStcXQor KioJaW5zCXYwXC5iXFswXF0sIHcwCisqKglpbnMJdjBcLmJcWzFcXSwgdzEKKyoqCS4uLgorKioJ cmV0CisqLworCitpbnQ4eDE2X3QgZl9zOChpbnQ4X3QgeCwgaW50OF90IHkpCit7CisgIHJldHVy biAoaW50OHgxNl90KSB7IHgsIHksIDEsIDIsIDMsIDQsIDUsIDYsCisgICAgICAgICAgICAgICAg ICAgICAgIDcsIDgsIDksIDEwLCAxMSwgMTIsIDEzLCAxNCB9OworfQpkaWZmIC0tZ2l0IGEvZ2Nj L3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvdmVjLWluaXQtMjItc2l6ZS5jIGIvZ2NjL3Rl c3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvdmVjLWluaXQtMjItc2l6ZS5jCm5ldyBmaWxlIG1v ZGUgMTAwNjQ0CmluZGV4IDAwMDAwMDAwMDAwLi44ZjM1ODU0YzAwOAotLS0gL2Rldi9udWxsCisr KyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L3ZlYy1pbml0LTIyLXNpemUuYwpA QCAtMCwwICsxLDI0IEBACisvKiB7IGRnLWRvIGNvbXBpbGUgfSAqLworLyogeyBkZy1vcHRpb25z ICItT3MiIH0gKi8KKy8qIHsgZGctZmluYWwgeyBjaGVjay1mdW5jdGlvbi1ib2RpZXMgIioqIiAi IiAiIiB9IH0gKi8KKworLyogVmVyaWZ5IHRoYXQgZmFsbGJhY2sgY29kZS1zZXF1ZW5jZSBpcyBj aG9zZW4gb3ZlcgorICAgcmVjdXJzaXZlbHkgZ2VuZXJhdGVkIGNvZGUtc2VxdWVuY2UgbWVyZ2Vk IHdpdGggemlwMS4gICovCisKKy8qCisqKiBmX3MxNjoKKyoqCS4uLgorKioJc3h0aAl3MCwgdzAK KyoqCWZtb3YJczAsIHcwCisqKglpbnMJdjBcLmhcWzFcXSwgdzEKKyoqCWlucwl2MFwuaFxbMlxd LCB3MgorKioJaW5zCXYwXC5oXFszXF0sIHczCisqKglpbnMJdjBcLmhcWzRcXSwgdzQKKyoqCWlu cwl2MFwuaFxbNVxdLCB3NQorKioJaW5zCXYwXC5oXFs2XF0sIHc2CisqKglpbnMJdjBcLmhcWzdc XSwgdzcKKyoqCS4uLgorKioJcmV0CisqLworCisjaW5jbHVkZSAidmVjLWluaXQtMjIuaCIKZGlm ZiAtLWdpdCBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L3ZlYy1pbml0LTIyLXNw ZWVkLmMgYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYWFyY2g2NC92ZWMtaW5pdC0yMi1zcGVl ZC5jCm5ldyBmaWxlIG1vZGUgMTAwNjQ0CmluZGV4IDAwMDAwMDAwMDAwLi4xNzJkNTZmZmRmMQot LS0gL2Rldi9udWxsCisrKyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L3ZlYy1p bml0LTIyLXNwZWVkLmMKQEAgLTAsMCArMSwyNyBAQAorLyogeyBkZy1kbyBjb21waWxlIH0gKi8K Ky8qIHsgZGctb3B0aW9ucyAiLU8zIiB9ICovCisvKiB7IGRnLWZpbmFsIHsgY2hlY2stZnVuY3Rp b24tYm9kaWVzICIqKiIgIiIgIiIgfSB9ICovCisKKy8qIFZlcmlmeSB0aGF0IHdlIHJlY3Vyc2l2 ZWx5IGdlbmVyYXRlIGNvZGUgZm9yIGV2ZW4gYW5kIG9kZCBoYWx2ZXMKKyAgIGluc3RlYWQgb2Yg ZmFsbGJhY2sgY29kZS4gVGhpcyBpcyBzbyBkZXNwaXRlIHRoZSBsb25nZXIgY29kZS1nZW4KKyAg IGJlY2F1c2UgaXQgaGFzIGZld2VyIGRlcGVuZGVuY2llcyBhbmQgdGh1cyBoYXMgbGVzc2VyIGNv c3QuICAqLworCisvKgorKiogZl9zMTY6CisqKgkuLi4KKyoqCXN4dGgJdzAsIHcwCisqKglzeHRo CXcxLCB3MQorKioJZm1vdglkMCwgeDAKKyoqCWZtb3YJZDEsIHgxCisqKglpbnMJdlswLTldK1wu aFxbMVxdLCB3MgorKioJaW5zCXZbMC05XStcLmhcWzFcXSwgdzMKKyoqCWlucwl2WzAtOV0rXC5o XFsyXF0sIHc0CisqKglpbnMJdlswLTldK1wuaFxbMlxdLCB3NQorKioJaW5zCXZbMC05XStcLmhc WzNcXSwgdzYKKyoqCWlucwl2WzAtOV0rXC5oXFszXF0sIHc3CisqKgl6aXAxCXZbMC05XStcLjho LCB2WzAtOV0rXC44aCwgdlswLTldK1wuOGgKKyoqCS4uLgorKioJcmV0CisqLworCisjaW5jbHVk ZSAidmVjLWluaXQtMjIuaCIKZGlmZiAtLWdpdCBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9h YXJjaDY0L3ZlYy1pbml0LTIyLmggYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYWFyY2g2NC92 ZWMtaW5pdC0yMi5oCm5ldyBmaWxlIG1vZGUgMTAwNjQ0CmluZGV4IDAwMDAwMDAwMDAwLi4xNWI4 ODlkNDA5NwotLS0gL2Rldi9udWxsCisrKyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJj aDY0L3ZlYy1pbml0LTIyLmgKQEAgLTAsMCArMSw3IEBACisjaW5jbHVkZSA8YXJtX25lb24uaD4K KworaW50MTZ4OF90IGZfczE2IChpbnQxNl90IHgwLCBpbnQxNl90IHgxLCBpbnQxNl90IHgyLCBp bnQxNl90IHgzLAorICAgICAgICAgICAgICAgICBpbnQxNl90IHg0LCBpbnQxNl90IHg1LCBpbnQx Nl90IHg2LCBpbnQxNl90IHg3KQoreworICByZXR1cm4gKGludDE2eDhfdCkgeyB4MCwgeDEsIHgy LCB4MywgeDQsIHg1LCB4NiwgeDcgfTsKK30K --0000000000003d80fd05f9d399a8--