From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 6C42B38582B0 for ; Fri, 17 Jun 2022 21:39:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6C42B38582B0 Received: from mail-qk1-f197.google.com (mail-qk1-f197.google.com [209.85.222.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-443-YD2-hmd0OsGQVx_ptxHZMw-1; Fri, 17 Jun 2022 17:39:48 -0400 X-MC-Unique: YD2-hmd0OsGQVx_ptxHZMw-1 Received: by mail-qk1-f197.google.com with SMTP id bq34-20020a05620a46a200b006a9793b0cd3so6317615qkb.10 for ; Fri, 17 Jun 2022 14:39:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:subject:from:to:cc:date:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=A9f8FN5G/NZBtZ5JEWyyPmXrH3m1qeX6CJCTfzFETD0=; b=jRQm0lQcMIhNmwxfXF7Xek8YMAU428mMZJQbWmCzDGJUWykiJXIHSU+jIpGf8HGhh0 h7pYJ4sKidYuRvceNQEt/IALzMDgCPq6obA9UjKeHmUnEZw6pk3Dh7ehN50cqv5GPyRA Ku1jpRuP+FAtJ0S1QwkYGwKdYOZuN0yvp5civPoKCWEoHb1j9UhtyhLytkbpg1G/r6++ ozRcyaYVwz5RE+iJ82ta6P1vgkWdOsbueO2qWP+dvVh5lY5KosT8oRTfeVDtCLpH6Nf0 GZJsJnhXG4yL3agj90P3M327uJ0sMDPVu3q44i0ZPfcarmZJk1r4fbo+jxerMmqkaL63 EOEA== X-Gm-Message-State: AJIora9/z9nq6N0QJX6VN4G89dCtGIxCyjL/5kTomCI0kvfaRcs+pH11 75w+tIeItB2/DngeuzsCgwrI8yx+2LyXLBeC5atSFJOoQodJrRbmSnreH02evPsJqIrD1lsY7ET mxPPhWb8= X-Received: by 2002:a05:620a:13fa:b0:6a7:7eb2:5780 with SMTP id h26-20020a05620a13fa00b006a77eb25780mr8434790qkl.387.1655501987607; Fri, 17 Jun 2022 14:39:47 -0700 (PDT) X-Google-Smtp-Source: AGRyM1tqUTGpd8G6jCWd2E3uZMx2pWS+1QE+4oeBRnIGVfy7Ogvjl3+DH/UmAFxx4j70PoG7TVaIwg== X-Received: by 2002:a05:620a:13fa:b0:6a7:7eb2:5780 with SMTP id h26-20020a05620a13fa00b006a77eb25780mr8434775qkl.387.1655501987357; Fri, 17 Jun 2022 14:39:47 -0700 (PDT) Received: from t14s.localdomain (c-73-69-212-193.hsd1.ma.comcast.net. [73.69.212.193]) by smtp.gmail.com with ESMTPSA id c26-20020ac8111a000000b003051e3b49c2sm4919532qtj.61.2022.06.17.14.39.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 14:39:46 -0700 (PDT) Message-ID: <1b6a6d3e570b4e945db8b7e452cb47f384140272.camel@redhat.com> Subject: Re: [RFC] analyzer: allocation size warning From: David Malcolm To: Tim Lange , Prathamesh Kulkarni Cc: GCC Mailing List Date: Fri, 17 Jun 2022 17:39:44 -0400 In-Reply-To: References: User-Agent: Evolution 3.38.4 (3.38.4-1.fc33) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2022 21:39:51 -0000 On Fri, 2022-06-17 at 21:23 +0200, Tim Lange wrote: > > > On Fr, Jun 17 2022 at 22:45:42 +0530, Prathamesh Kulkarni > wrote: > > On Fri, 17 Jun 2022 at 21:25, Tim Lange wrote: > > > > > >  Hi everyone, > > Hi Tim, > > Thanks for posting the POC patch! > > Just a couple of comments (inline) > Hi Prathamesh, > thanks for looking at it. > > > > > >  tracked in PR105900 [0], I'd like to add support for a new warning > > > on > > >  dubious allocation sizes. The new checker emits a warning when the > > >  allocation size is not a multiple of the type's size. With the > > > checker, > > >  following mistakes are detected: > > >    int *arr = malloc(3); // forgot to multiply by sizeof > > >    arr[0] = ...; > > >    arr[1] = ...; > > >  or > > >    int *buf = malloc (n + sizeof(int)); // probably should be * > > > instead > > >  of + > > >  Because it is implemented inside the analyzer, it also emits > > > warnings > > >  when the buffer is first of type void* and later on casted to > > > something > > >  else. Though, this also inherits a limitation. The checker can not > > >  distinguish 2 * sizeof(short) from sizeof(int) because sizeof is > > >  resolved and constants are folded at the point when the analyzer > > > runs. > > >  As a mitigation, I plan to implement a check in the frontend that > > > emits > > >  a warning if sizeof(lhs pointee type) is not part of the malloc > > >  argument. > > IMHO, warning if sizeof(lhs pointee_type) is not present inside > > malloc, might not be a good idea because it > > would reject valid calls to malloc. > > For eg: > > (1) > > size_t size = sizeof(int); > > int *p = malloc (size); > > > > (2) > > void *p = malloc (sizeof(int)); > > int *q = p; > Hm, that's right. Maybe only warn when there is a sizeof(type) in the > argument and the lhs pointee_type != type (except for void*, maybe > char* and "inherited" structs)? That sounds plausible. [...snip...] > > > > > Won't the warning be incorrect if 'n' is a multiple of sizeof(int) ? > > I assume by symbolic buffer size, 'n' is not known at compile time. > * VLAs are resolved to n * `sizeof(type) when the analyzer runs and > work > fine. Great - and please make sure the test suite has test coverage for anything we're talking about! IIRC, VLAs work using __builtin_alloca, rather than malloc, and the new diagnostic is currently implemented as an extension of the sm-malloc.cc code, so I don't think it could fire anyway for a __builtin_alloca. Does it fire for: int *ptr = alloca (sizeof (short)); // BUG: sizeof(short) != sizeof(int), probably There are two places in the analyzer that are tracking memory allocations: (a) the sm-malloc.cc code, and (b) in the region_model's m_dynamic_extents hash_map, which tracks a symbolic value for the size of each reachable dynamically-allocated region (and inhibits merging of exploded_nodes that have different dynamic extents). See region_model::set_dynamic_extents and region_model::get_dynamic_extents These are called by: region_model::create_region_for_alloca and region_model::create_region_for_heap_alloc and: region_model::impl_call_realloc My gut feeling is that the diagnostic would work better implemented in terms of (b), rather than (a). If you test in region_model::set_value, rather than in a state machine, the test could use the dynamic-extent tracking of region_model (and thus work e.g. with alloca and realloc, see the example above), and could also be extended to catch assignments to pointers from statically-allocated regions of the wrong size e.g.: char buf[2]; int *ptr = (int *)buf; // BUG: sizeof(buf) is only 2 bytes short s; int *ptr = (int *)&s; // BUG: sizeof(short) != sizeof(int), probably diagnostic_manager::add_events_for_eedge already has some code that will add a region_creation_event to the checker_path for the case where a region is determined to be the one of interest to the diagnostic, See e.g. poisoned_value_diagnostic::mark_interesting_stuff, which indicates the region of interest, which is how we get event (1) in the following for an alloca: ../../src/gcc/testsuite/gcc.dg/analyzer/uninit-alloca.c: In function ‘test_1’: ../../src/gcc/testsuite/gcc.dg/analyzer/uninit-alloca.c:6:10: warning: use of uninitialized  value ‘*p’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value] 6 | return *p; /* { dg-warning "use of uninitialized value '\\*p'" } */ | ^~ ‘test_1’: events 1-2 | | 5 | int *p = __builtin_alloca (sizeof (int)); | | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | | | | (1) region created on stack here | 6 | return *p; | | ~~ | | | | | (2) use of uninitialized value ‘*p’ here | and event (1) for a statically-sized variable here: ../../src/gcc/testsuite/gcc.dg/analyzer/uninit-1.c: In function ‘test_1’: ../../src/gcc/testsuite/gcc.dg/analyzer/uninit-1.c:7:10: warning: use of uninitialized  value ‘i’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value] 7 | return i; | ^ ‘test_1’: events 1-2 | | 6 | int i; | | ^ | | | | | (1) region created on stack here | 7 | return i; | | ~ | | | | | (2) use of uninitialized value ‘i’ here | So I think you could implement this warning inside region-model.cc instead of sm-malloc.cc, and by implementing the mark_interesting_stuff vfunc to set the region of interest to the buffer with the wrong size, you'd get the allocation event "for free", and could handle assignment to pointers from the address of statically-sized objects with incompatible size. That said, I haven't tried implementing this, and there may be some snag I've forgotten :/ * Flows with if (cond) n = ...; else n = ...; are tracked by the analyzer with a widening_svalue and can be handled (While thinking about this answer, I noticed my patch is missing this case. Thanks!) Great! Please add test coverage :) * In case of more complicated flows, the analyzer's buffer size tracking resorts to unknown_svalue. If any variable in an expression is unknown, no warning will be emitted. That's OK, I think. * Generally, when requesting memory for a variable type, accepting an arbitrary number doesn't sound right. I do warn, e.g. if 'n' is a conjured_svalue (e.g. a from scanf call). scanf should probably mark its arguments as tainted, and thus -Wanalyzer-tainted-allocation-size should complain if you do a malloc based on scanf-supplied data without sanitization (otherwise it's a denial-of-service attack waiting to happen, I think). I've filed this as: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106021 I think only the last case could in theory be a false-positive. I've noticed that this is the case when 'n' is guarded by an if making sure n is only a multiple of sizeof(type). In theory, I can fix this case too as the analysis is path-sensitive. Sounds like this should be in the test suite also :) Do you know of some other case where 'n' might be an unknown value neither guarded an if condition nor resorted to 'unknown' by a complicated flow but still correct? - Tim > > Thanks, > Prathamesh > Hope this makes sense and is constructive Thanks Dave