From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id 4A39E3857023 for ; Fri, 30 Apr 2021 06:30:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 4A39E3857023 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-510-m-LFxbloM5C4qvTvfv7n5g-1; Fri, 30 Apr 2021 02:30:23 -0400 X-MC-Unique: m-LFxbloM5C4qvTvfv7n5g-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 4156F18B9ED1; Fri, 30 Apr 2021 06:30:22 +0000 (UTC) Received: from oldenburg.str.redhat.com (ovpn-115-124.ams2.redhat.com [10.36.115.124]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 1D38910016F8; Fri, 30 Apr 2021 06:30:20 +0000 (UTC) From: Florian Weimer To: Feng Xue OS via Gcc Cc: Feng Xue OS , JiangNing OS Subject: Re: [RFC] A memory gathering optimization for loop References: Date: Fri, 30 Apr 2021 08:30:33 +0200 In-Reply-To: (Feng Xue's message of "Thu, 14 Jan 2021 04:28:25 +0000") Message-ID: <87mttgb7xy.fsf@oldenburg.str.redhat.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-Spam-Status: No, score=-6.5 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Apr 2021 06:30:29 -0000 * Feng Xue: > To simplify explanation of this memory gathering optimization, pseudo > code on original nested loops is given as: > > outer-loop ( ) { /* data in inner loop are also invariant in outer loop. */ > ... > inner-loop (iter, iter_count) { /* inner loop to apply MGO */ > > Type1 v1 = LOAD_FN1 (iter); > Type2 v2 = LOAD_FN2 (v1); > Type3 v3 = LOAD_FN3 (v2); > ... > iter = NEXT_FN (iter); > } > > ... > } > > "LOAD_FNx()" is a conceptual function that translates its argument to a > result, and is composed of a sequence of operations in which there is > only one memory load. It is capable of representing simple memory > dereference like "iter->field", or more complicated expression like > "array[iter * 2 + cst].field". > > Likewise, "NEXT_FN()" is also a conceptual function which defines how an > iterator is advanced. It is able to describe simple integer iterator, > list iterator, or even pure-function-based iterator on advanced stuff > like hashset/tree. > > Then the code will be transformed to: > > typedef struct cache_elem { > bool init; > Type1 c_v1; > Type2 c_v2; > Type3 c_v3; > } cache_elem; > > cache_elem *cache_arr = calloc (iter_count, sizeof(cache_elem)); > > outer-loop () { > ... > size_t cache_idx = 0; > > inner-loop (iter, iter_count) { > > if (!cache_arr[cache_idx]->init) { > v1 = LOAD_FN1 (iter); > v2 = LOAD_FN2 (v1); > v3 = LOAD_FN3 (v2); > > cache_arr[cache_idx]->init = true; > cache_arr[cache_idx]->c_v1 = v1; > cache_arr[cache_idx]->c_v2 = v2; > cache_arr[cache_idx]->c_v3 = v3; > } > else { > v1 = cache_arr[cache_idx]->c_v1; > v2 = cache_arr[cache_idx]->c_v2; > v3 = cache_arr[cache_idx]->c_v3; > } > ... > cache_idx++; > iter = NEXT_FN (iter); > } > ... > } > > free (cache_arr); Why do you need the init field? Isn't the array initialized sequentially? You need to keep the original loop and use it if calloc fails. The transformation is not valid in general because calloc is not an async-signal-safe function, and GCC shouldn't introduce such calls if they are not present in the original source code. For -fnon-call-exceptions, you need to introduce unwinding code that calls free. Can you change this optimization so that it can use a fixed-size buffer? This would avoid all issues around calling calloc. If iter_count can be very large, allocating that much extra memory might not be beneficial anyway. Thanks, Florian