From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x336.google.com (mail-wm1-x336.google.com [IPv6:2a00:1450:4864:20::336]) by sourceware.org (Postfix) with ESMTPS id C63F03858C5F for ; Fri, 3 Feb 2023 03:03:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C63F03858C5F Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-wm1-x336.google.com with SMTP id bg13-20020a05600c3c8d00b003d9712b29d2so5120260wmb.2 for ; Thu, 02 Feb 2023 19:03:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=saJcP6O0NyyYZFRaGdoO4EtSv6futNuCjv92uwOnp2w=; b=EP3F6NLa5emPmzqRf5kG+J7Qkfae409PZEgAsym/IVP/SbcSOs1/HVQpZdDRdaD6jf vGp+wjqTF/L4dCj93SAdiGI6OW6xzkIyb4HYKpDg2Y02jhSG+dj9J/wvcY4KEBvHccg7 J70VucQwLPkw6HvwM7BaY8KB84RZyVZZNPOM1zSaS7Q6f/sIe98jc+vWza+WZPtON8jv r399e5WpdDHSxvvaQHWb/O+5Vlvb7YRriZ9DnIVpQJH0QzbCMAUnqiZeQ5KVKzqeU7y2 4Zae19KMoE7LhLr5vqNQ78yah4HoUUwU9Wv8HtEBEM7VZrS5Crf/RVJEw2TBcqwTt/Ho UJCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=saJcP6O0NyyYZFRaGdoO4EtSv6futNuCjv92uwOnp2w=; b=iK7V3H/ht9/qdAgF+rM1KfC05YX3umCahJVcdBZgRWPVSedSHaEVxSyv8JLmHegacS e4iHhO5LqDLEUmb2zzDB+tiAWvQNH0OdoKNjWNgwCOSVKu1w3UC+nVeb9IxJuaiGUqYK hQIRgC+gX/OMpnqR3MtY39o5yyafBlFhrBLIDqqExbNadmwxduttJ2iStnxYJy5zQdBL Lyi0MjPyKh57bK8bMVCDH2Rlz9hzs3AraFZCeBRuXOJLoiyFNEgK5Dgk2KJI8TI4roen gBaTrG7Bc+YCIK6Haf9xKqSIP5X/zQflrx9057+kLHRaPXUyKtHmpYgh66S0EzZ4HlkR Dv0A== X-Gm-Message-State: AO0yUKVRJCfGhZ15uY+ksu/oHR6Nj5sFNL8a+E799DuirpZmA2H1Zc75 zpmtCU+7eIphPhG255rydSV/aWXeb7WeMnxKeQ1sNQ== X-Google-Smtp-Source: 
 AK7set8C7EPeq2fVSQm9M9mkdVt7MW7pnO/6r3YtXbJ/PB32XV+DzQKin9LBAkaoh0LhlrbQglyyW0g4dvJcPJa8yp8=
X-Received: by 2002:a05:600c:4408:b0:3df:e57d:f4c9 with SMTP id u8-20020a05600c440800b003dfe57df4c9mr87335wmn.36.1675393403350; Thu, 02 Feb 2023 19:03:23 -0800 (PST)
MIME-Version: 1.0
References:
In-Reply-To:
From: Prathamesh Kulkarni
Date: Fri, 3 Feb 2023 08:32:46 +0530
Message-ID:
Subject: Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector
To: Prathamesh Kulkarni, gcc Patches, richard.sandiford@arm.com
Content-Type: multipart/mixed; boundary="000000000000aac18505f3c2ec07"
X-Spam-Status: No, score=-9.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SCC_5_SHORT_WORD_LINES,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

--000000000000aac18505f3c2ec07
Content-Type: text/plain; charset="UTF-8"

On Fri, 3 Feb 2023 at 07:10, Prathamesh Kulkarni wrote:
>
> On Thu, 2 Feb 2023 at 20:50, Richard Sandiford
> wrote:
> >
> > Prathamesh Kulkarni writes:
> > >> >> > I have attached a patch that extends the transform if one half is dup
> > >> >> > and other is set of constants.
> > >> >> > For eg:
> > >> >> > int8x16_t f(int8_t x)
> > >> >> > {
> > >> >> >   return (int8x16_t) { x, 1, x, 2, x, 3, x, 4, x, 5, x, 6, x, 7, x, 8 };
> > >> >> > }
> > >> >> >
> > >> >> > code-gen trunk:
> > >> >> > f:
> > >> >> >         adrp    x1, .LC0
> > >> >> >         ldr     q0, [x1, #:lo12:.LC0]
> > >> >> >         ins     v0.b[0], w0
> > >> >> >         ins     v0.b[2], w0
> > >> >> >         ins     v0.b[4], w0
> > >> >> >         ins     v0.b[6], w0
> > >> >> >         ins     v0.b[8], w0
> > >> >> >         ins     v0.b[10], w0
> > >> >> >         ins     v0.b[12], w0
> > >> >> >         ins     v0.b[14], w0
> > >> >> >         ret
> > >> >> >
> > >> >> > code-gen with patch:
> > >> >> > f:
> > >> >> >         dup     v0.16b, w0
> > >> >> >         adrp    x0, .LC0
> > >> >> >         ldr     q1, [x0, #:lo12:.LC0]
> > >> >> >         zip1    v0.16b, v0.16b, v1.16b
> > >> >> >         ret
> > >> >> >
> > >> >> > Bootstrapped+tested on aarch64-linux-gnu.
> > >> >> > Does it look OK ?
> > >> >>
> > >> >> Looks like a nice improvement.  It'll need to wait for GCC 14 now though.
> > >> >>
> > >> >> However, rather than handle this case specially, I think we should instead
> > >> >> take a divide-and-conquer approach: split the initialiser into even and
> > >> >> odd elements, find the best way of loading each part, then compare the
> > >> >> cost of these sequences + ZIP with the cost of the fallback code (the code
> > >> >> later in aarch64_expand_vector_init).
> > >> >>
> > >> >> For example, doing that would allow:
> > >> >>
> > >> >>   { x, y, 0, y, 0, y, 0, y, 0, y }
> > >> >>
> > >> >> to be loaded more easily, even though the even elements aren't wholly
> > >> >> constant.
> > >> > Hi Richard,
> > >> > I have attached a prototype patch based on the above approach.
> > >> > It subsumes specializing for above {x, y, x, y, x, y, x, y} case by generating
> > >> > same sequence, thus I removed that hunk, and improves the following cases:
> > >> >
> > >> > (a)
> > >> > int8x16_t f_s16(int8_t x)
> > >> > {
> > >> >   return (int8x16_t) { x, 1, x, 2, x, 3, x, 4,
> > >> >                        x, 5, x, 6, x, 7, x, 8 };
> > >> > }
> > >> >
> > >> > code-gen trunk:
> > >> > f_s16:
> > >> >         adrp    x1, .LC0
> > >> >         ldr     q0, [x1, #:lo12:.LC0]
> > >> >         ins     v0.b[0], w0
> > >> >         ins     v0.b[2], w0
> > >> >         ins     v0.b[4], w0
> > >> >         ins     v0.b[6], w0
> > >> >         ins     v0.b[8], w0
> > >> >         ins     v0.b[10], w0
> > >> >         ins     v0.b[12], w0
> > >> >         ins     v0.b[14], w0
> > >> >         ret
> > >> >
> > >> > code-gen with patch:
> > >> > f_s16:
> > >> >         dup     v0.16b, w0
> > >> >         adrp    x0, .LC0
> > >> >         ldr     q1, [x0, #:lo12:.LC0]
> > >> >         zip1    v0.16b, v0.16b, v1.16b
> > >> >         ret
> > >> >
> > >> > (b)
> > >> > int8x16_t f_s16(int8_t x, int8_t y)
> > >> > {
> > >> >   return (int8x16_t) { x, y, 1, y, 2, y, 3, y,
> > >> >                        4, y, 5, y, 6, y, 7, y };
> > >> > }
> > >> >
> > >> > code-gen trunk:
> > >> > f_s16:
> > >> >         adrp    x2, .LC0
> > >> >         ldr     q0, [x2, #:lo12:.LC0]
> > >> >         ins     v0.b[0], w0
> > >> >         ins     v0.b[1], w1
> > >> >         ins     v0.b[3], w1
> > >> >         ins     v0.b[5], w1
> > >> >         ins     v0.b[7], w1
> > >> >         ins     v0.b[9], w1
> > >> >         ins     v0.b[11], w1
> > >> >         ins     v0.b[13], w1
> > >> >         ins     v0.b[15], w1
> > >> >         ret
> > >> >
> > >> > code-gen patch:
> > >> > f_s16:
> > >> >         adrp    x2, .LC0
> > >> >         dup     v1.16b, w1
> > >> >         ldr     q0, [x2, #:lo12:.LC0]
> > >> >         ins     v0.b[0], w0
> > >> >         zip1    v0.16b, v0.16b, v1.16b
> > >> >         ret
> > >>
> > >> Nice.
> > >>
> > >> > There are a couple of issues I have come across:
> > >> > (1) Choosing element to pad vector.
> > >> > For eg, if we are initializing a vector say { x, y, 0, y, 1, y, 2, y }
> > >> > with mode V8HI.
> > >> > We split it into { x, 0, 1, 2 } and { y, y, y, y}
> > >> > However since the mode is V8HI, we would need to pad the above split vectors
> > >> > with 4 more elements to match up to vector length.
> > >> > For {x, 0, 1, 2} using any constant is the obvious choice while for {y, y, y, y}
> > >> > using 'y' is the obvious choice thus making them:
> > >> > {x, 0, 1, 2, 0, 0, 0, 0} and {y, y, y, y, y, y, y, y}
> > >> > These would be then merged using zip1 which would discard the upper half
> > >> > of both vectors.
> > >> > Currently I encoded the above two heuristics in
> > >> > aarch64_expand_vector_init_get_padded_elem:
> > >> > (a) If split portion contains a constant, use the constant to pad the vector.
> > >> > (b) If split portion only contains variables, then use the most
> > >> > frequently repeating variable
> > >> > to pad the vector.
> > >> > I suppose tho this could be improved ?
> > >>
> > >> I think we should just build two 64-bit vectors (V4HIs) and use a subreg
> > >> to fill the upper elements with undefined values.
> > >>
> > >> I suppose in principle we would have the same problem when splitting
> > >> a 64-bit vector into 2 32-bit vectors, but it's probably better to punt
> > >> on that for now.  Eventually it would be worth adding full support for
> > >> 32-bit Advanced SIMD modes (with necessary restrictions for FP exceptions)
> > >> but it's quite a big task.  The 128-bit to 64-bit split is the one that
> > >> matters most.
> > >>
> > >> > (2) Setting cost for zip1:
> > >> > Currently it returns 4 as cost for following zip1 insn:
> > >> > (set (reg:V8HI 102)
> > >> >     (unspec:V8HI [
> > >> >             (reg:V8HI 103)
> > >> >             (reg:V8HI 108)
> > >> >         ] UNSPEC_ZIP1))
> > >> > I am not sure if that's correct, or if not, what cost to use in this case
> > >> > for zip1 ?
> > >>
> > >> TBH 4 seems a bit optimistic.  It's COSTS_N_INSNS (1), whereas the
> > >> generic advsimd_vec_cost::permute_cost is 2 insns.  But the costs of
> > >> inserts are probably underestimated to the same extent, so hopefully
> > >> things work out.
> > >>
> > >> So it's probably best to accept the costs as they're currently given.
> > >> Changing them would need extensive testing.
> > >>
> > >> However, one of the advantages of the split is that it allows the
> > >> subvectors to be built in parallel.  When optimising for speed,
> > >> it might make sense to take the maximum of the subsequence costs
> > >> and add the cost of the zip to that.
> > > Hi Richard,
> > > Thanks for the suggestions.
> > > In the attached patch, it recurses only if nelts == 16 to punt for 64
> > > -> 32 bit split,
> >
> > It should be based on the size rather than the number of elements.
> > The example we talked about above involved building V8HIs from two
> > V4HIs, which is also valid.
> Right, sorry, I got mixed up. The attached patch punts if the vector size is
> 64 bits, by resorting to the fallback, and so handles V8HI cases.
> For eg:
> int16x8_t f(int16_t x)
> {
>   return (int16x8_t) { x, 1, x, 2, x, 3, x, 4 };
> }
>
> code-gen with patch:
> f:
>         dup     v0.4h, w0
>         adrp    x0, .LC0
>         ldr     d1, [x0, #:lo12:.LC0]
>         zip1    v0.8h, v0.8h, v1.8h
>         ret
>
> Just to clarify: we punt on the 64-bit vector size because there is no
> 32-bit vector mode available in which to build the two 32-bit vectors for
> the even and odd halves and then "extend" them with a subreg ?
>
> It also punts if n_elts < 8, because I am not sure
> if it's profitable to do recursion+merging for 4 or fewer elements.
> Does it look OK ?
> >
> > > and uses std::max(even_init, odd_init) + insn_cost (zip1_insn) for
> > > computing total cost of the sequence.
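As an aside, the cost rule quoted above can be modelled in a few lines of plain C. This is only an illustrative sketch, not code from the patch: split_sequence_cost and prefer_split_sequence are hypothetical names, and the costs are passed in as plain numbers rather than computed from RTL sequences.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of the patch: combine the costs of the
   even-half and odd-half sequences with the cost of the final zip1.  */
static unsigned
split_sequence_cost (unsigned even_cost, unsigned odd_cost,
		     unsigned zip_cost, bool optimize_for_speed)
{
  unsigned halves;

  if (optimize_for_speed)
    /* The two halves do not depend on each other, so model them as
       being built in parallel: only the more expensive one counts.  */
    halves = even_cost > odd_cost ? even_cost : odd_cost;
  else
    /* When optimising for size, every instruction counts.  */
    halves = even_cost + odd_cost;

  return halves + zip_cost;
}

/* Hypothetical helper: true if the split+zip1 sequence is strictly
   cheaper than the fallback sequence; a tie keeps the fallback.  */
static bool
prefer_split_sequence (unsigned even_cost, unsigned odd_cost,
		       unsigned zip_cost, unsigned fallback_cost,
		       bool optimize_for_speed)
{
  return split_sequence_cost (even_cost, odd_cost, zip_cost,
			      optimize_for_speed) < fallback_cost;
}
```

With made-up costs of 16 for each half, 4 for the zip and 36 for the fallback, the split wins when optimising for speed (16 + 4 = 20 < 36), while the fallback is kept when optimising for size (16 + 16 + 4 = 36, which is not less than 36).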
> > >
> > > So, for following case:
> > > int8x16_t f_s8(int8_t x)
> > > {
> > >   return (int8x16_t) { x, 1, x, 2, x, 3, x, 4,
> > >                        x, 5, x, 6, x, 7, x, 8 };
> > > }
> > >
> > > it now generates:
> > > f_s8:
> > >         dup     v0.8b, w0
> > >         adrp    x0, .LC0
> > >         ldr     d1, [x0, #:lo12:.LC0]
> > >         zip1    v0.16b, v0.16b, v1.16b
> > >         ret
> > >
> > > Which I assume is correct, since zip1 will merge the lower halves of
> > > two vectors while leaving the upper halves undefined ?
> >
> > Yeah, it looks valid, but I would say that zip1 ignores the upper halves
> > (rather than leaving them undefined).
> Yes, sorry for mis-phrasing.
>
> For the following test:
> int16x8_t f_s16 (int16_t x0, int16_t x1, int16_t x2, int16_t x3,
>                  int16_t x4, int16_t x5, int16_t x6, int16_t x7)
> {
>   return (int16x8_t) { x0, x1, x2, x3, x4, x5, x6, x7 };
> }
>
> it chose the recursive+zip1 route, since we take max (cost (odd_init),
> cost (even_init)) and add the cost of the zip1 insn, which turns out to be
> less than the cost of the fallback:
>
> f_s16:
>         sxth    w0, w0
>         sxth    w1, w1
>         fmov    d0, x0
>         fmov    d1, x1
>         ins     v0.h[1], w2
>         ins     v1.h[1], w3
>         ins     v0.h[2], w4
>         ins     v1.h[2], w5
>         ins     v0.h[3], w6
>         ins     v1.h[3], w7
>         zip1    v0.8h, v0.8h, v1.8h
>         ret
>
> I assume that's OK since it has fewer dependencies compared to the
> fallback code-gen, even if it's longer ?
> With -Os the cost for the sequence is taken as cost(odd_init) +
> cost(even_init) + cost(zip1_insn),
> which turns out to be the same as the cost for the fallback sequence, so it
> generates the fallback code-sequence:
>
> f_s16:
>         sxth    w0, w0
>         fmov    s0, w0
>         ins     v0.h[1], w1
>         ins     v0.h[2], w2
>         ins     v0.h[3], w3
>         ins     v0.h[4], w4
>         ins     v0.h[5], w5
>         ins     v0.h[6], w6
>         ins     v0.h[7], w7
>         ret
>
Forgot to remove the hunk handling the interleaving case; that is done in the attached patch.
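To illustrate why a dedicated interleaving hunk is not needed once the initialiser is split generically, here is a scalar C model of the even/odd split followed by zip1. It is only a sketch under stated assumptions: split_vals and zip1_model are hypothetical helpers working on plain int16_t arrays, not the RTL routines in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy the even-indexed (EVEN_P nonzero) or
   odd-indexed (EVEN_P zero) elements of VALS[0..N-1] into
   HALF[0..N/2-1].  N is assumed to be even.  */
static void
split_vals (const int16_t *vals, size_t n, int even_p, int16_t *half)
{
  for (size_t i = 0; i < n / 2; i++)
    half[i] = vals[2 * i + (even_p ? 0 : 1)];
}

/* Scalar model of ZIP1 on N-element vectors: interleave the low N/2
   elements of A and B into OUT.  The upper halves of A and B are
   never read, so their contents do not matter.  */
static void
zip1_model (const int16_t *a, const int16_t *b, size_t n, int16_t *out)
{
  for (size_t i = 0; i < n / 2; i++)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
}
```

For { x, y, x, y, x, y, x, y } the two halves come out as all-x and all-y, which is what a single dup each can build, and zip1_model applied to the two halves gives back the original initialiser.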
Thanks, Prathamesh > Thanks, > Prathamesh > > > > Thanks, > > Richard --000000000000aac18505f3c2ec07 Content-Type: text/plain; charset="US-ASCII"; name="gnu-821-4.txt" Content-Disposition: attachment; filename="gnu-821-4.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_ldnxxppo0 ZGlmZiAtLWdpdCBhL2djYy9jb25maWcvYWFyY2g2NC9hYXJjaDY0LmNjIGIvZ2NjL2NvbmZpZy9h YXJjaDY0L2FhcmNoNjQuY2MKaW5kZXggYWNjMGNmZTVmOTQuLmRkMmE2NGQyZTRlIDEwMDY0NAot LS0gYS9nY2MvY29uZmlnL2FhcmNoNjQvYWFyY2g2NC5jYworKysgYi9nY2MvY29uZmlnL2FhcmNo NjQvYWFyY2g2NC5jYwpAQCAtMjE5NzYsNyArMjE5NzYsNyBAQCBhYXJjaDY0X3NpbWRfbWFrZV9j b25zdGFudCAocnR4IHZhbHMpCiAgICBpbml0aWFsaXNlZCB0byBjb250YWluIFZBTFMuICAqLwog CiB2b2lkCi1hYXJjaDY0X2V4cGFuZF92ZWN0b3JfaW5pdCAocnR4IHRhcmdldCwgcnR4IHZhbHMp CithYXJjaDY0X2V4cGFuZF92ZWN0b3JfaW5pdF9mYWxsYmFjayAocnR4IHRhcmdldCwgcnR4IHZh bHMpCiB7CiAgIG1hY2hpbmVfbW9kZSBtb2RlID0gR0VUX01PREUgKHRhcmdldCk7CiAgIHNjYWxh cl9tb2RlIGlubmVyX21vZGUgPSBHRVRfTU9ERV9JTk5FUiAobW9kZSk7CkBAIC0yMjAzNiwzOCAr MjIwMzYsNiBAQCBhYXJjaDY0X2V4cGFuZF92ZWN0b3JfaW5pdCAocnR4IHRhcmdldCwgcnR4IHZh bHMpCiAgICAgICByZXR1cm47CiAgICAgfQogCi0gIC8qIENoZWNrIGZvciBpbnRlcmxlYXZpbmcg Y2FzZS4KLSAgICAgRm9yIGVnIGlmIGluaXRpYWxpemVyIGlzIChpbnQxNng4X3QpIHt4LCB5LCB4 LCB5LCB4LCB5LCB4LCB5fS4KLSAgICAgR2VuZXJhdGUgZm9sbG93aW5nIGNvZGU6Ci0gICAgIGR1 cCB2MC5oLCB4Ci0gICAgIGR1cCB2MS5oLCB5Ci0gICAgIHppcDEgdjAuaCwgdjAuaCwgdjEuaAot ICAgICBmb3IgImxhcmdlIGVub3VnaCIgaW5pdGlhbGl6ZXIuICAqLwotCi0gIGlmIChuX2VsdHMg Pj0gOCkKLSAgICB7Ci0gICAgICBpbnQgaTsKLSAgICAgIGZvciAoaSA9IDI7IGkgPCBuX2VsdHM7 IGkrKykKLQlpZiAoIXJ0eF9lcXVhbF9wIChYVkVDRVhQICh2YWxzLCAwLCBpKSwgWFZFQ0VYUCAo dmFscywgMCwgaSAlIDIpKSkKLQkgIGJyZWFrOwotCi0gICAgICBpZiAoaSA9PSBuX2VsdHMpCi0J ewotCSAgbWFjaGluZV9tb2RlIG1vZGUgPSBHRVRfTU9ERSAodGFyZ2V0KTsKLQkgIHJ0eCBkZXN0 WzJdOwotCi0JICBmb3IgKGludCBpID0gMDsgaSA8IDI7IGkrKykKLQkgICAgewotCSAgICAgIHJ0 eCB4ID0gZXhwYW5kX3ZlY3Rvcl9icm9hZGNhc3QgKG1vZGUsIFhWRUNFWFAgKHZhbHMsIDAsIGkp KTsKLQkgICAgICBkZXN0W2ldID0gZm9yY2VfcmVnIChtb2RlLCB4KTsKLQkgICAgfQotCi0JICBy 
dHZlYyB2ID0gZ2VuX3J0dmVjICgyLCBkZXN0WzBdLCBkZXN0WzFdKTsKLQkgIGVtaXRfc2V0X2lu c24gKHRhcmdldCwgZ2VuX3J0eF9VTlNQRUMgKG1vZGUsIHYsIFVOU1BFQ19aSVAxKSk7Ci0JICBy ZXR1cm47Ci0JfQotICAgIH0KLQogICBlbnVtIGluc25fY29kZSBpY29kZSA9IG9wdGFiX2hhbmRs ZXIgKHZlY19zZXRfb3B0YWIsIG1vZGUpOwogICBnY2NfYXNzZXJ0IChpY29kZSAhPSBDT0RFX0ZP Ul9ub3RoaW5nKTsKIApAQCAtMjIxODksNyArMjIxNTcsNyBAQCBhYXJjaDY0X2V4cGFuZF92ZWN0 b3JfaW5pdCAocnR4IHRhcmdldCwgcnR4IHZhbHMpCiAJICAgIH0KIAkgIFhWRUNFWFAgKGNvcHks IDAsIGkpID0gc3Vic3Q7CiAJfQotICAgICAgYWFyY2g2NF9leHBhbmRfdmVjdG9yX2luaXQgKHRh cmdldCwgY29weSk7CisgICAgICBhYXJjaDY0X2V4cGFuZF92ZWN0b3JfaW5pdF9mYWxsYmFjayAo dGFyZ2V0LCBjb3B5KTsKICAgICB9CiAKICAgLyogSW5zZXJ0IHRoZSB2YXJpYWJsZSBsYW5lcyBk aXJlY3RseS4gICovCkBAIC0yMjIwMyw2ICsyMjE3MSw5MSBAQCBhYXJjaDY0X2V4cGFuZF92ZWN0 b3JfaW5pdCAocnR4IHRhcmdldCwgcnR4IHZhbHMpCiAgICAgfQogfQogCitERUJVR19GVU5DVElP Tgorc3RhdGljIHZvaWQKK2FhcmNoNjRfZXhwYW5kX3ZlY3Rvcl9pbml0X2RlYnVnX3NlcSAocnR4 X2luc24gKnNlcSwgY29uc3QgY2hhciAqcykKK3sKKyAgZnByaW50ZiAoc3RkZXJyLCAiJXM6ICV1 XG4iLCBzLCBzZXFfY29zdCAoc2VxLCAhb3B0aW1pemVfc2l6ZSkpOworICBmb3IgKHJ0eF9pbnNu ICppID0gc2VxOyBpOyBpID0gTkVYVF9JTlNOIChpKSkKKyAgICB7CisgICAgICBkZWJ1Z19ydHgg KFBBVFRFUk4gKGkpKTsKKyAgICAgIGZwcmludGYgKHN0ZGVyciwgImNvc3Q6ICVkXG4iLCBwYXR0 ZXJuX2Nvc3QgKFBBVFRFUk4gKGkpLCAhb3B0aW1pemVfc2l6ZSkpOworICAgIH0KK30KKworc3Rh dGljIHJ0eAorYWFyY2g2NF9leHBhbmRfdmVjdG9yX2luaXRfc3BsaXRfdmFscyAobWFjaGluZV9t b2RlIG1vZGUsIHJ0eCB2YWxzLCBib29sIGV2ZW5fcCkKK3sKKyAgaW50IG4gPSBYVkVDTEVOICh2 YWxzLCAwKTsKKyAgbWFjaGluZV9tb2RlIG5ld19tb2RlCisgICAgPSBhYXJjaDY0X3NpbWRfY29u dGFpbmVyX21vZGUgKEdFVF9NT0RFX0lOTkVSIChtb2RlKSwgNjQpOworICBydHZlYyB2ZWMgPSBy dHZlY19hbGxvYyAobiAvIDIpOworICBmb3IgKGludCBpID0gMDsgaSA8IG47IGkrKykKKyAgICBS VFZFQ19FTFQgKHZlYywgaSkgPSAoZXZlbl9wKSA/IFhWRUNFWFAgKHZhbHMsIDAsIDIgKiBpKQor CQkJCSAgOiBYVkVDRVhQICh2YWxzLCAwLCAyICogaSArIDEpOworICByZXR1cm4gZ2VuX3J0eF9Q QVJBTExFTCAobmV3X21vZGUsIHZlYyk7Cit9CisKKy8qCitUaGUgZnVuY3Rpb24gZG9lcyB0aGUg 
Zm9sbG93aW5nOgorKGEpIEdlbmVyYXRlcyBjb2RlIHNlcXVlbmNlIGJ5IHNwbGl0dGluZyBWQUxT IGludG8gZXZlbiBhbmQgb2RkIGhhbHZlcywKKyAgICBhbmQgcmVjdXJzaXZlbHkgY2FsbGluZyBp dHNlbGYgdG8gaW5pdGlhbGl6ZSB0aGVtIGFuZCB0aGVuIG1lcmdlIHVzaW5nCisgICAgemlwMS4K KyhiKSBHZW5lcmF0ZSBjb2RlIHNlcXVlbmNlIGRpcmVjdGx5IHVzaW5nIGFhcmNoNjRfZXhwYW5k X3ZlY3Rvcl9pbml0X2ZhbGxiYWNrLgorKGMpIENvbXBhcmUgdGhlIGNvc3Qgb2YgY29kZSBzZXF1 ZW5jZXMgZ2VuZXJhdGVkIGJ5IChhKSBhbmQgKGIpLCBhbmQgY2hvb3NlCisgICAgdGhlIG1vcmUg ZWZmaWNpZW50IG9uZS4KKyovCisKK3ZvaWQKK2FhcmNoNjRfZXhwYW5kX3ZlY3Rvcl9pbml0IChy dHggdGFyZ2V0LCBydHggdmFscykKK3sKKyAgbWFjaGluZV9tb2RlIG1vZGUgPSBHRVRfTU9ERSAo dGFyZ2V0KTsKKyAgaW50IG5fZWx0cyA9IFhWRUNMRU4gKHZhbHMsIDApOworCisgIGlmIChuX2Vs dHMgPCA4CisgICAgICB8fCBrbm93bl9lcSAoR0VUX01PREVfQklUU0laRSAobW9kZSksIDY0KSkK KyAgICB7CisgICAgICBhYXJjaDY0X2V4cGFuZF92ZWN0b3JfaW5pdF9mYWxsYmFjayAodGFyZ2V0 LCB2YWxzKTsKKyAgICAgIHJldHVybjsKKyAgICB9CisKKyAgc3RhcnRfc2VxdWVuY2UgKCk7Cisg IHJ0eCBkZXN0WzJdOworICB1bnNpZ25lZCBjb3N0c1syXTsKKyAgZm9yIChpbnQgaSA9IDA7IGkg PCAyOyBpKyspCisgICAgeworICAgICAgc3RhcnRfc2VxdWVuY2UgKCk7CisgICAgICBkZXN0W2ld ID0gZ2VuX3JlZ19ydHggKG1vZGUpOworICAgICAgcnR4IG5ld192YWxzCisJPSBhYXJjaDY0X2V4 cGFuZF92ZWN0b3JfaW5pdF9zcGxpdF92YWxzIChtb2RlLCB2YWxzLCAoaSAlIDIpID09IDApOwor ICAgICAgcnR4IHRtcF9yZWcgPSBnZW5fcmVnX3J0eCAoR0VUX01PREUgKG5ld192YWxzKSk7Cisg ICAgICBhYXJjaDY0X2V4cGFuZF92ZWN0b3JfaW5pdCAodG1wX3JlZywgbmV3X3ZhbHMpOworICAg ICAgZGVzdFtpXSA9IGdlbl9ydHhfU1VCUkVHIChtb2RlLCB0bXBfcmVnLCAwKTsKKyAgICAgIHJ0 eF9pbnNuICpyZWNfc2VxID0gZ2V0X2luc25zICgpOworICAgICAgZW5kX3NlcXVlbmNlICgpOwor ICAgICAgY29zdHNbaV0gPSBzZXFfY29zdCAocmVjX3NlcSwgIW9wdGltaXplX3NpemUpOworICAg ICAgZW1pdF9pbnNuIChyZWNfc2VxKTsKKyAgICB9CisKKyAgcnR2ZWMgdiA9IGdlbl9ydHZlYyAo MiwgZGVzdFswXSwgZGVzdFsxXSk7CisgIHJ0eF9pbnNuICp6aXAxX2luc24KKyAgICA9IGVtaXRf c2V0X2luc24gKHRhcmdldCwgZ2VuX3J0eF9VTlNQRUMgKG1vZGUsIHYsIFVOU1BFQ19aSVAxKSk7 CisgIHVuc2lnbmVkIHNlcV90b3RhbF9jb3N0CisgICAgPSAoIW9wdGltaXplX3NpemUpID8gc3Rk 
OjptYXggKGNvc3RzWzBdLCBjb3N0c1sxXSkgOiBjb3N0c1swXSArIGNvc3RzWzFdOworICBzZXFf dG90YWxfY29zdCArPSBpbnNuX2Nvc3QgKHppcDFfaW5zbiwgIW9wdGltaXplX3NpemUpOworCisg IHJ0eF9pbnNuICpzZXEgPSBnZXRfaW5zbnMgKCk7CisgIGVuZF9zZXF1ZW5jZSAoKTsKKworICBz dGFydF9zZXF1ZW5jZSAoKTsKKyAgYWFyY2g2NF9leHBhbmRfdmVjdG9yX2luaXRfZmFsbGJhY2sg KHRhcmdldCwgdmFscyk7CisgIHJ0eF9pbnNuICpmYWxsYmFja19zZXEgPSBnZXRfaW5zbnMgKCk7 CisgIHVuc2lnbmVkIGZhbGxiYWNrX3NlcV9jb3N0ID0gc2VxX2Nvc3QgKGZhbGxiYWNrX3NlcSwg IW9wdGltaXplX3NpemUpOworICBlbmRfc2VxdWVuY2UgKCk7CisKKyAgZW1pdF9pbnNuIChzZXFf dG90YWxfY29zdCA8IGZhbGxiYWNrX3NlcV9jb3N0ID8gc2VxIDogZmFsbGJhY2tfc2VxKTsKK30K KwogLyogRW1pdCBSVEwgY29ycmVzcG9uZGluZyB0bzoKICAgIGluc3IgVEFSR0VULCBFTEVNLiAg Ki8KIApkaWZmIC0tZ2l0IGEvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvaW50ZXJs ZWF2ZS1pbml0LTEuYyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L3ZlYy1pbml0 LTE4LmMKc2ltaWxhcml0eSBpbmRleCA4MiUKcmVuYW1lIGZyb20gZ2NjL3Rlc3RzdWl0ZS9nY2Mu dGFyZ2V0L2FhcmNoNjQvaW50ZXJsZWF2ZS1pbml0LTEuYwpyZW5hbWUgdG8gZ2NjL3Rlc3RzdWl0 ZS9nY2MudGFyZ2V0L2FhcmNoNjQvdmVjLWluaXQtMTguYwppbmRleCBlZTc3NTA0ODU4OS4uZTgx MmQzOTQ2ZGUgMTAwNjQ0Ci0tLSBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L2lu dGVybGVhdmUtaW5pdC0xLmMKKysrIGIvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQv dmVjLWluaXQtMTguYwpAQCAtNyw4ICs3LDggQEAKIC8qCiAqKiBmb286CiAqKgkuLi4KLSoqCWR1 cAl2WzAtOV0rXC44aCwgd1swLTldKwotKioJZHVwCXZbMC05XStcLjhoLCB3WzAtOV0rCisqKglk dXAJdlswLTldK1wuNGgsIHdbMC05XSsKKyoqCWR1cAl2WzAtOV0rXC40aCwgd1swLTldKwogKioJ emlwMQl2WzAtOV0rXC44aCwgdlswLTldK1wuOGgsIHZbMC05XStcLjhoCiAqKgkuLi4KICoqCXJl dApAQCAtMjMsOCArMjMsOCBAQCBpbnQxNng4X3QgZm9vKGludDE2X3QgeCwgaW50IHkpCiAvKgog KiogZm9vMjoKICoqCS4uLgotKioJZHVwCXZbMC05XStcLjhoLCB3WzAtOV0rCi0qKgltb3ZpCXZb MC05XStcLjhoLCAweDEKKyoqCWR1cAl2WzAtOV0rXC40aCwgd1swLTldKworKioJbW92aQl2WzAt OV0rXC40aCwgMHgxCiAqKgl6aXAxCXZbMC05XStcLjhoLCB2WzAtOV0rXC44aCwgdlswLTldK1wu OGgKICoqCS4uLgogKioJcmV0CmRpZmYgLS1naXQgYS9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQv 
YWFyY2g2NC92ZWMtaW5pdC0xOS5jIGIvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQv dmVjLWluaXQtMTkuYwpuZXcgZmlsZSBtb2RlIDEwMDY0NAppbmRleCAwMDAwMDAwMDAwMC4uZTI4 ZmRjZGEyOWQKLS0tIC9kZXYvbnVsbAorKysgYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYWFy Y2g2NC92ZWMtaW5pdC0xOS5jCkBAIC0wLDAgKzEsMjEgQEAKKy8qIHsgZGctZG8gY29tcGlsZSB9 ICovCisvKiB7IGRnLW9wdGlvbnMgIi1PMyIgfSAqLworLyogeyBkZy1maW5hbCB7IGNoZWNrLWZ1 bmN0aW9uLWJvZGllcyAiKioiICIiICIiIH0gfSAqLworCisjaW5jbHVkZSA8YXJtX25lb24uaD4K KworLyoKKyoqIGZfczg6CisqKgkuLi4KKyoqCWR1cAl2WzAtOV0rXC44Yiwgd1swLTldKworKioJ YWRycAl4WzAtOV0rLCBcLkxDWzAtOV0rCisqKglsZHIJZFswLTldKywgXFt4WzAtOV0rLCAjOmxv MTI6LkxDWzAtOV0rXF0KKyoqCXppcDEJdlswLTldK1wuMTZiLCB2WzAtOV0rXC4xNmIsIHZbMC05 XStcLjE2YgorKioJcmV0CisqLworCitpbnQ4eDE2X3QgZl9zOChpbnQ4X3QgeCkKK3sKKyAgcmV0 dXJuIChpbnQ4eDE2X3QpIHsgeCwgMSwgeCwgMiwgeCwgMywgeCwgNCwKKyAgICAgICAgICAgICAg ICAgICAgICAgeCwgNSwgeCwgNiwgeCwgNywgeCwgOCB9OworfQpkaWZmIC0tZ2l0IGEvZ2NjL3Rl c3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvdmVjLWluaXQtMjAuYyBiL2djYy90ZXN0c3VpdGUv Z2NjLnRhcmdldC9hYXJjaDY0L3ZlYy1pbml0LTIwLmMKbmV3IGZpbGUgbW9kZSAxMDA2NDQKaW5k ZXggMDAwMDAwMDAwMDAuLjkzNjZjYTM0OWI2Ci0tLSAvZGV2L251bGwKKysrIGIvZ2NjL3Rlc3Rz dWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvdmVjLWluaXQtMjAuYwpAQCAtMCwwICsxLDIyIEBACisv KiB7IGRnLWRvIGNvbXBpbGUgfSAqLworLyogeyBkZy1vcHRpb25zICItTzMiIH0gKi8KKy8qIHsg ZGctZmluYWwgeyBjaGVjay1mdW5jdGlvbi1ib2RpZXMgIioqIiAiIiAiIiB9IH0gKi8KKworI2lu Y2x1ZGUgPGFybV9uZW9uLmg+CisKKy8qCisqKiBmX3M4OgorKioJLi4uCisqKglhZHJwCXhbMC05 XSssIFwuTENbMC05XSsKKyoqCWR1cAl2WzAtOV0rXC44Yiwgd1swLTldKworKioJbGRyCWRbMC05 XSssIFxbeFswLTldKywgIzpsbzEyOlwuTENbMC05XStcXQorKioJaW5zCXYwXC5iXFswXF0sIHcw CisqKgl6aXAxCXZbMC05XStcLjE2YiwgdlswLTldK1wuMTZiLCB2WzAtOV0rXC4xNmIKKyoqCXJl dAorKi8KKworaW50OHgxNl90IGZfczgoaW50OF90IHgsIGludDhfdCB5KQoreworICByZXR1cm4g KGludDh4MTZfdCkgeyB4LCB5LCAxLCB5LCAyLCB5LCAzLCB5LAorICAgICAgICAgICAgICAgICAg ICAgICA0LCB5LCA1LCB5LCA2LCB5LCA3LCB5IH07Cit9CmRpZmYgLS1naXQgYS9nY2MvdGVzdHN1 
aXRlL2djYy50YXJnZXQvYWFyY2g2NC92ZWMtaW5pdC0yMS5jIGIvZ2NjL3Rlc3RzdWl0ZS9nY2Mu dGFyZ2V0L2FhcmNoNjQvdmVjLWluaXQtMjEuYwpuZXcgZmlsZSBtb2RlIDEwMDY0NAppbmRleCAw MDAwMDAwMDAwMC4uZTE2NDU5NDg2ZDcKLS0tIC9kZXYvbnVsbAorKysgYi9nY2MvdGVzdHN1aXRl L2djYy50YXJnZXQvYWFyY2g2NC92ZWMtaW5pdC0yMS5jCkBAIC0wLDAgKzEsMjIgQEAKKy8qIHsg ZGctZG8gY29tcGlsZSB9ICovCisvKiB7IGRnLW9wdGlvbnMgIi1PMyIgfSAqLworLyogeyBkZy1m aW5hbCB7IGNoZWNrLWZ1bmN0aW9uLWJvZGllcyAiKioiICIiICIiIH0gfSAqLworCisjaW5jbHVk ZSA8YXJtX25lb24uaD4KKworLyoKKyoqIGZfczg6CisqKgkuLi4KKyoqCWFkcnAJeFswLTldKywg XC5MQ1swLTldKworKioJbGRyCXFbMC05XSssIFxbeFswLTldKywgIzpsbzEyOlwuTENbMC05XStc XQorKioJaW5zCXYwXC5iXFswXF0sIHcwCisqKglpbnMJdjBcLmJcWzFcXSwgdzEKKyoqCS4uLgor KioJcmV0CisqLworCitpbnQ4eDE2X3QgZl9zOChpbnQ4X3QgeCwgaW50OF90IHkpCit7CisgIHJl dHVybiAoaW50OHgxNl90KSB7IHgsIHksIDEsIDIsIDMsIDQsIDUsIDYsCisgICAgICAgICAgICAg ICAgICAgICAgIDcsIDgsIDksIDEwLCAxMSwgMTIsIDEzLCAxNCB9OworfQpkaWZmIC0tZ2l0IGEv Z2NjL3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvdmVjLWluaXQtMjItc2l6ZS5jIGIvZ2Nj L3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvdmVjLWluaXQtMjItc2l6ZS5jCm5ldyBmaWxl IG1vZGUgMTAwNjQ0CmluZGV4IDAwMDAwMDAwMDAwLi44ZjM1ODU0YzAwOAotLS0gL2Rldi9udWxs CisrKyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L3ZlYy1pbml0LTIyLXNpemUu YwpAQCAtMCwwICsxLDI0IEBACisvKiB7IGRnLWRvIGNvbXBpbGUgfSAqLworLyogeyBkZy1vcHRp b25zICItT3MiIH0gKi8KKy8qIHsgZGctZmluYWwgeyBjaGVjay1mdW5jdGlvbi1ib2RpZXMgIioq IiAiIiAiIiB9IH0gKi8KKworLyogVmVyaWZ5IHRoYXQgZmFsbGJhY2sgY29kZS1zZXF1ZW5jZSBp cyBjaG9zZW4gb3ZlcgorICAgcmVjdXJzaXZlbHkgZ2VuZXJhdGVkIGNvZGUtc2VxdWVuY2UgbWVy Z2VkIHdpdGggemlwMS4gICovCisKKy8qCisqKiBmX3MxNjoKKyoqCS4uLgorKioJc3h0aAl3MCwg dzAKKyoqCWZtb3YJczAsIHcwCisqKglpbnMJdjBcLmhcWzFcXSwgdzEKKyoqCWlucwl2MFwuaFxb MlxdLCB3MgorKioJaW5zCXYwXC5oXFszXF0sIHczCisqKglpbnMJdjBcLmhcWzRcXSwgdzQKKyoq CWlucwl2MFwuaFxbNVxdLCB3NQorKioJaW5zCXYwXC5oXFs2XF0sIHc2CisqKglpbnMJdjBcLmhc WzdcXSwgdzcKKyoqCS4uLgorKioJcmV0CisqLworCisjaW5jbHVkZSAidmVjLWluaXQtMjIuaCIK 
ZGlmZiAtLWdpdCBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L3ZlYy1pbml0LTIy LXNwZWVkLmMgYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYWFyY2g2NC92ZWMtaW5pdC0yMi1z cGVlZC5jCm5ldyBmaWxlIG1vZGUgMTAwNjQ0CmluZGV4IDAwMDAwMDAwMDAwLi4xNzJkNTZmZmRm MQotLS0gL2Rldi9udWxsCisrKyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9hYXJjaDY0L3Zl Yy1pbml0LTIyLXNwZWVkLmMKQEAgLTAsMCArMSwyNyBAQAorLyogeyBkZy1kbyBjb21waWxlIH0g Ki8KKy8qIHsgZGctb3B0aW9ucyAiLU8zIiB9ICovCisvKiB7IGRnLWZpbmFsIHsgY2hlY2stZnVu Y3Rpb24tYm9kaWVzICIqKiIgIiIgIiIgfSB9ICovCisKKy8qIFZlcmlmeSB0aGF0IHdlIHJlY3Vy c2l2ZWx5IGdlbmVyYXRlIGNvZGUgZm9yIGV2ZW4gYW5kIG9kZCBoYWx2ZXMKKyAgIGluc3RlYWQg b2YgZmFsbGJhY2sgY29kZS4gVGhpcyBpcyBzbyBkZXNwaXRlIHRoZSBsb25nZXIgY29kZS1nZW4K KyAgIGJlY2F1c2UgaXQgaGFzIGZld2VyIGRlcGVuZGVuY2llcyBhbmQgdGh1cyBoYXMgbGVzc2Vy IGNvc3QuICAqLworCisvKgorKiogZl9zMTY6CisqKgkuLi4KKyoqCXN4dGgJdzAsIHcwCisqKglz eHRoCXcxLCB3MQorKioJZm1vdglkMCwgeDAKKyoqCWZtb3YJZDEsIHgxCisqKglpbnMJdlswLTld K1wuaFxbMVxdLCB3MgorKioJaW5zCXZbMC05XStcLmhcWzFcXSwgdzMKKyoqCWlucwl2WzAtOV0r XC5oXFsyXF0sIHc0CisqKglpbnMJdlswLTldK1wuaFxbMlxdLCB3NQorKioJaW5zCXZbMC05XStc LmhcWzNcXSwgdzYKKyoqCWlucwl2WzAtOV0rXC5oXFszXF0sIHc3CisqKgl6aXAxCXZbMC05XStc LjhoLCB2WzAtOV0rXC44aCwgdlswLTldK1wuOGgKKyoqCS4uLgorKioJcmV0CisqLworCisjaW5j bHVkZSAidmVjLWluaXQtMjIuaCIKZGlmZiAtLWdpdCBhL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdl dC9hYXJjaDY0L3ZlYy1pbml0LTIyLmggYi9nY2MvdGVzdHN1aXRlL2djYy50YXJnZXQvYWFyY2g2 NC92ZWMtaW5pdC0yMi5oCm5ldyBmaWxlIG1vZGUgMTAwNjQ0CmluZGV4IDAwMDAwMDAwMDAwLi4x NWI4ODlkNDA5NwotLS0gL2Rldi9udWxsCisrKyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRhcmdldC9h YXJjaDY0L3ZlYy1pbml0LTIyLmgKQEAgLTAsMCArMSw3IEBACisjaW5jbHVkZSA8YXJtX25lb24u aD4KKworaW50MTZ4OF90IGZfczE2IChpbnQxNl90IHgwLCBpbnQxNl90IHgxLCBpbnQxNl90IHgy LCBpbnQxNl90IHgzLAorICAgICAgICAgICAgICAgICBpbnQxNl90IHg0LCBpbnQxNl90IHg1LCBp bnQxNl90IHg2LCBpbnQxNl90IHg3KQoreworICByZXR1cm4gKGludDE2eDhfdCkgeyB4MCwgeDEs IHgyLCB4MywgeDQsIHg1LCB4NiwgeDcgfTsKK30K --000000000000aac18505f3c2ec07--