[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] Fix error: array subscript has type 'char'
On Thu, Jan 14, 2021 at 02:25:05PM +0100, Jan Beulich wrote: > On 14.01.2021 13:29, Manuel Bouyer wrote: > > On Thu, Jan 14, 2021 at 11:53:20AM +0100, Jan Beulich wrote: > >> On 12.01.2021 19:12, Manuel Bouyer wrote: > >>> From: Manuel Bouyer <bouyer@xxxxxxxxxx> > >>> > >>> Use unsigned char variable, or cast to (unsigned char), for > >>> tolower()/islower() and friends. Fix compiler error > >>> array subscript has type 'char' [-Werror=char-subscripts] > >> > >> Isn't this something that wants changing in your ctype.h instead? > >> the functions (or macros), as per the C standard, ought to accept > >> plain char aiui, and if they use the input as an array subscript, > >> it should be their implementation suitably converting type first. > > > > I asked for inputs from NetBSD developers familiar with this part. > > > > Although the parameter is an int, only a subset of values is valid, > > as stated in ISO C 2018 (Section 7.4 paragrah 1): > >> In all cases the argument is an int, the value of which shall be > >> representable as an unsigned char or shall equal the value of the > >> macro EOF. If the argument has any other value, the behavior is > >> undefined. > > Which means you're shifting the undefined-ness from the implementation > (using the value as array index) to the callers (truncating values, or > converting value range). In particular I do not think that the > intention behind the standard's wording is for every caller to need to > cast to unsigned char, when e.g. character literals are of type char > and string literals are of type const char[]. If you don't cast you fall into the undefined behavior case for non-ascii characters. For example, "é" in iso-latin-1 is 0xe9. In a signed char, this is -23 (decimal). without the cast, tolower() will see -23. If it is casted to (unsigned char) first, tolower() will see 233, as expected. The following test program illustrates this: armandeche:/tmp>cat toto.c #include <stdio.h> int main(int _c, const char *_v[]) { char c = 0xe9; printf("%d %d\n", (int)c, (int)(unsigned char)c); } armandeche:/tmp>./toto -23 233 > > > As stated by NetBSD's ctype(3) manual page, NetBSD and glibc took different > > approach. NetBSD emits a compile-time warning if the input may lead to > > undefined behavior. quoting the man page: > > Some implementations of libc, such as glibc as of 2018, attempt to > > avoid > > the worst of the undefined behavior by defining the functions to work > > for > > all integer inputs representable by either unsigned char or char, and > > suppress the warning. However, this is not an excuse for avoiding > > conversion to unsigned char: if EOF coincides with any such value, as > > it > > does when it is -1 on platforms with signed char, programs that pass > > char > > will still necessarily confuse the classification and mapping of EOF > > with > > the classification and mapping of some non-EOF inputs. > > > > > > So, although no warning is emmited on linux, it looks like to me that the > > cast to unsigned char is needed anyway, and relying on glibc's behavior > > is not portable. > > I fully agree we shouldn't rely on glibc's behavior (I'm sure > you've looked at xen/include/xen/ctype.h to see how we address > this it Xen itself, but I will admit this is to a degree comparing > apples and oranges, not the least because the lack of a need to > consider EOF in Xen). At least in xen/tools/symbols.c I don't > think we're at risk of running into "undefined" space. Casts are as long as there's only ascii characters. Anyway NetBSD won't change its ctype.h -- Manuel Bouyer <bouyer@xxxxxxxxxxxxxxx> NetBSD: 26 ans d'experience feront toujours la difference --
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |