The C Programming Language

The Core Language

C provides no memory safety. Most recommendations in this section deal with this aspect of the language.

Undefined Behavior

Some C constructs are defined to be undefined by the C standard. This does not only mean that the standard does not describe what happens when the construct is executed. It also allows optimizing compilers such as GCC to assume that this particular construct is never reached. In some cases, this has caused GCC to optimize security checks away. (This is not a flaw in GCC or the C language. But C certainly has some areas which are more difficult to use than others.)

Common sources of undefined behavior are:

  • out-of-bounds array accesses

  • null pointer dereferences

  • overflow in signed integer arithmetic

Recommendations for Pointers and Array Handling

Always keep track of the size of the array you are working with. Often, code is more obviously correct when you keep a pointer past the last element of the array, and calculate the number of remaining elements by subtracting the current position from that pointer. The alternative, updating a separate variable every time the position is advanced, is usually less obviously correct.

Array processing in C shows how to extract Pascal-style strings from a character buffer. The two pointers kept for length checks are inend and outend. inp and outp are the respective positions. The number of input bytes is checked using the expression len > (size_t)(inend - inp). The cast silences a compiler warning; inend is never smaller than inp, so the difference is never negative.

Example 1. Array processing in C
ssize_t
extract_strings(const char *in, size_t inlen, char **out, size_t outlen)
{
  const char *inp = in;
  const char *inend = in + inlen;
  char **outp = out;
  char **outend = out + outlen;

  while (inp != inend) {
    size_t len;
    char *s;
    if (outp == outend) {
      errno = ENOSPC;
      goto err;
    }
    len = (unsigned char)*inp;
    ++inp;
    if (len > (size_t)(inend - inp)) {
      errno = EINVAL;
      goto err;
    }
    s = malloc(len + 1);
    if (s == NULL) {
      goto err;
    }
    memcpy(s, inp, len);
    inp += len;
    s[len] = '\0';
    *outp = s;
    ++outp;
  }
  return outp - out;
err:
  {
    int errno_old = errno;
    while (out != outp) {
      free(*out);
      ++out;
    }
    errno = errno_old;
  }
  return -1;
}

It is important that the length checks always have the form len > (size_t)(inend - inp), where len is a variable of type size_t which denotes the total number of bytes which are about to be read or written next. In general, it is not safe to fold multiple such checks into one, as in len1 + len2 > (size_t)(inend - inp), because the expression on the left can overflow or wrap around (see Recommendations for Integer Arithmetic), and it no longer reflects the number of bytes to be processed.
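
As a minimal sketch of checking two lengths without folding them into one sum, each length can be compared against the remaining space in turn. The helper name fits_two_fields is hypothetical and not part of the original example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if len1 bytes followed by len2 bytes
   fit between inp and inend.  Each length is compared against the
   remaining space on its own, so no sum is computed that could wrap. */
static bool
fits_two_fields(const char *inp, const char *inend, size_t len1, size_t len2)
{
  size_t remaining = (size_t)(inend - inp);
  if (len1 > remaining) {
    return false;
  }
  remaining -= len1; /* Cannot wrap around because len1 <= remaining. */
  return len2 <= remaining;
}
```

With a 10-byte buffer, the pair ((size_t)-1, 2) is rejected here, whereas the folded check len1 + len2 > 10 would wrap around to 1 and accept it.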

Recommendations for Integer Arithmetic

Overflow in signed integer arithmetic is undefined. This means that it is not possible to check for overflow after it has happened (see Incorrect overflow detection in C).

Example 2. Incorrect overflow detection in C
void report_overflow(void);

int
add(int a, int b)
{
  int result = a + b;
  if (a < 0 || b < 0) {
    return -1;
  }
  // The compiler can optimize away the following if statement.
  if (result < 0) {
    report_overflow();
  }
  return result;
}

The following approaches can be used to check for overflow, without actually causing it.

  • Use a wider type to perform the calculation, check that the result is within bounds, and convert the result to the original type. All intermediate results must be checked in this way.

  • Perform the calculation in the corresponding unsigned type and use bit fiddling to detect the overflow. Overflow checking for unsigned addition shows how to perform an overflow check for unsigned integer addition. For three or more terms, all the intermediate additions have to be checked in this way.
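
The first approach in the list above (a wider type) could look roughly like the following sketch. It assumes fixed-width types from stdint.h so that the intermediate type is guaranteed to be wider; the function name add_int32_checked is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Adds a and b in a 64-bit intermediate, which cannot overflow for
   32-bit operands.  Returns false if the sum does not fit in int32_t;
   *result is only written on success. */
static bool
add_int32_checked(int32_t a, int32_t b, int32_t *result)
{
  int64_t wide = (int64_t)a + (int64_t)b;
  if (wide < INT32_MIN || wide > INT32_MAX) {
    return false;
  }
  *result = (int32_t)wide;
  return true;
}
```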

Example 3. Overflow checking for unsigned addition
void report_overflow(void);

unsigned
add_unsigned(unsigned a, unsigned b)
{
  unsigned sum = a + b;
  if (sum < a) { // or sum < b
    report_overflow();
  }
  return sum;
}

Example 4. Overflow checking for unsigned multiplication
unsigned
mul(unsigned a, unsigned b)
{
  if (b && a > ((unsigned)-1) / b) {
    report_overflow();
  }
  return a * b;
}

Basic arithmetic operations are commutative, so for bounds checks, there are two different but mathematically equivalent expressions. Sometimes, one of the expressions results in better code because parts of it can be reduced to a constant. This applies to overflow checks for multiplication a * b involving a constant a, where the expression is reduced to b > C for some constant C determined at compile time. The other expression, b && a > ((unsigned)-1) / b, is more difficult to optimize at compile time.

When a value is converted to a signed integer, GCC always chooses the result based on 2’s complement arithmetic. This GCC extension (which is also implemented by other compilers) helps a lot when implementing overflow checks.

Sometimes, it is necessary to compare unsigned and signed integer variables. This results in a compiler warning, comparison between signed and unsigned integer expressions, because the comparison often gives unexpected results for negative values. When adding a cast, make sure that negative values are covered properly. If the bound is unsigned and the checked quantity is signed, you should cast the checked quantity to an unsigned type at least as wide as either operand type. As a result, negative values will fail the bounds check. (You can still check for negative values separately for clarity, and the compiler will optimize away this redundant check.)
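
A minimal sketch of such a bounds check follows. It assumes the checked quantity is a long and the bound a size_t, and it uses uintmax_t as a type that is at least as wide as both; the function name index_in_bounds is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if 0 <= index < bound.  A negative index converts to a
   very large unsigned value and therefore fails the comparison. */
static bool
index_in_bounds(long index, size_t bound)
{
  return (uintmax_t)index < (uintmax_t)bound;
}
```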

Legacy code should be compiled with the -fwrapv GCC option. As a result, GCC will provide 2’s complement semantics for integer arithmetic, including defined behavior on integer overflow.

Global Variables

Global variables should be avoided because they usually lead to thread safety hazards. In any case, they should be declared static, so that access is restricted to a single translation unit.

Global constants are not a problem, but declaring them can be tricky. Declaring a constant array of constant strings shows how to declare a constant array of constant strings. The second const is needed to make the array constant, and not just the strings. It must be placed after the *, and not before it.

Example 5. Declaring a constant array of constant strings
static const char *const string_list[] = {
  "first",
  "second",
  "third",
  NULL
};

Sometimes, static variables local to functions are used as a replacement for proper memory management. Unlike non-static local variables, it is possible to return a pointer to static local variables to the caller. Such variables are well-hidden, but they are effectively global (just as static variables at file scope). It is difficult to add thread safety afterwards if such interfaces are used. Merely dropping the static keyword in such cases leads to undefined behavior.
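
To illustrate the difference, the following sketch contrasts an interface that returns a pointer to a static local buffer with one where the caller supplies the storage. Both function names and the "id-%d" format are invented for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Problematic: every caller (and every thread) shares one hidden buffer,
   so a later call overwrites the result of an earlier one. */
static const char *
format_id_static(int id)
{
  static char buf[32];
  snprintf(buf, sizeof(buf), "id-%d", id);
  return buf;
}

/* Alternative: the caller owns the buffer.  Returns 0 on success and -1
   if formatting failed or the result did not fit. */
static int
format_id(int id, char *buf, size_t size)
{
  int ret = snprintf(buf, size, "id-%d", id);
  if (ret < 0 || (size_t)ret >= size) {
    return -1;
  }
  return 0;
}
```

The second interface needs no shared state, so it can be called from several threads as long as each thread passes its own buffer.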

Another source for static local variables is a desire to reduce stack space usage on embedded platforms, where the stack may span only a few hundred bytes. If this is the only reason why the static keyword is used, it can just be dropped, unless the object is very large (larger than 128 kilobytes on 32-bit platforms). In the latter case, it is recommended to allocate the object using malloc, to obtain proper array checking, for the same reasons outlined in alloca and Other Forms of Stack-based Allocation.

The C Standard Library

Parts of the C standard library (and the UNIX and GNU extensions) are difficult to use, so you should avoid them.

Please check the applicable documentation before using the recommended replacements. Many of these functions allocate buffers using malloc which your code must deallocate explicitly using free.

Absolutely Banned Interfaces

The functions listed below must not be used because they are almost always unsafe. Use the indicated replacements instead.

  • gets ⟶ fgets

  • getwd ⟶ getcwd or get_current_dir_name

  • readdir_r ⟶ readdir

  • realpath (with a non-NULL second parameter) ⟶ realpath with NULL as the second parameter, or canonicalize_file_name

The constants listed below must not be used, either. Instead, code must allocate memory dynamically and use interfaces with length checking.

  • NAME_MAX (limit not actually enforced by the kernel)

  • PATH_MAX (limit not actually enforced by the kernel)

  • _PC_NAME_MAX (This limit, returned by the pathconf function, is not enforced by the kernel.)

  • _PC_PATH_MAX (This limit, returned by the pathconf function, is not enforced by the kernel.)

The following structure members must not be used.

  • f_namemax in struct statvfs (limit not actually enforced by the kernel, see _PC_NAME_MAX above)
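
As one possible sketch of dynamic allocation with length checking in place of PATH_MAX, the current directory can be obtained by retrying getcwd with a growing buffer. The function name current_directory and the initial size of 128 bytes are arbitrary choices for this illustration; on GNU systems, get_current_dir_name does the same job directly.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns the current directory in a malloc'ed buffer which the caller
   must free, or NULL on error (with errno set). */
static char *
current_directory(void)
{
  size_t size = 128;
  for (;;) {
    char *buf = malloc(size);
    if (buf == NULL) {
      return NULL;
    }
    if (getcwd(buf, size) != NULL) {
      return buf;
    }
    int saved_errno = errno;
    free(buf);
    if (saved_errno != ERANGE) {
      /* A real error, not just a buffer that was too small. */
      errno = saved_errno;
      return NULL;
    }
    if (size > (size_t)-1 / 2) {
      /* Doubling would wrap around. */
      errno = ENOMEM;
      return NULL;
    }
    size *= 2;
  }
}
```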

Functions to Avoid

The following string manipulation functions can be used securely in principle, but their use should be avoided because they are difficult to use correctly. Calls to these functions can be replaced with asprintf or vasprintf. (For non-GNU targets, these functions are available from Gnulib.) In some cases, the snprintf function might be a suitable replacement, see String Functions with Explicit Length Arguments.

  • sprintf

  • strcat

  • strcpy

  • vsprintf

Use the indicated replacements for the functions below.

String Functions with Explicit Length Arguments

The C run-time library provides string manipulation functions which do not just look for NUL characters for string termination, but also honor explicit lengths provided by the caller. However, these functions evolved over a long period of time, and the lengths mean different things depending on the function.

snprintf

The snprintf function provides a way to construct a string in a statically-sized buffer. (If the buffer is allocated on the heap, consider using asprintf instead.)

char fraction[30];
snprintf(fraction, sizeof(fraction), "%d/%d", numerator, denominator);

The second argument to the snprintf call should always be the size of the buffer in the first argument (which should be a character array). Elaborate pointer and length arithmetic can introduce errors and nullify the security benefits of snprintf.

In particular, snprintf is not well-suited to constructing a string iteratively, by appending to an existing buffer. snprintf returns either -1 on errors, or the number of characters which would have been written to the buffer if the buffer were large enough. This means that adding the result of snprintf to the buffer pointer to skip over the characters just written is incorrect and risky. However, as long as the length argument is not zero, the buffer will remain null-terminated. Repeatedly writing to a buffer using snprintf works because end - current > 0 is a loop invariant. After the loop, the result string is in the buf variable.

Example 6. Repeatedly writing to a buffer using snprintf
char buf[512];
char *current = buf;
const char *const end = buf + sizeof(buf);
for (struct item *it = data; it->key; ++it) {
  snprintf(current, end - current, "%s%s=%d",
           current == buf ? "" : ", ", it->key, it->value);
  current += strlen(current);
}

If you want to avoid the call to strlen for performance reasons, you have to check for a negative return value from snprintf and also check whether the return value is equal to the specified buffer length or larger. Only if neither condition applies may you advance the write pointer by the number returned by snprintf. However, this optimization is rarely worthwhile.
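
The return value checks described above might be sketched as follows. The helper append_pair and its "%s=%d" format are assumptions made for this illustration only.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats "key=value" at *current, which must point
   into a buffer ending at end.  *current is advanced only if the output
   was written completely.  Returns 0 on success, -1 on error or
   truncation. */
static int
append_pair(char **current, const char *end, const char *key, int value)
{
  size_t avail = (size_t)(end - *current);
  int ret = snprintf(*current, avail, "%s=%d", key, value);
  if (ret < 0 || (size_t)ret >= avail) {
    /* Error, or the output (plus terminating NUL) did not fit. */
    return -1;
  }
  *current += ret;
  return 0;
}
```

On truncation, the buffer still holds a null-terminated prefix of the output as long as avail was not zero, but the write position is left unchanged so the caller can decide how to proceed.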

Note that it is not permitted to use the same buffer both as the destination and as a source argument.

vsnprintf and Format Strings

If you use vsnprintf (or vasprintf or even snprintf) with a format string which is not a constant, but a function argument, it is important to annotate the function with a format function attribute, so that GCC can warn about misuse of your function (see The format function attribute).

Example 7. The format function attribute
void log_format(const char *format, ...) __attribute__((format(printf, 1, 2)));

void
log_format(const char *format, ...)
{
  char buf[1000];
  va_list ap;
  va_start(ap, format);
  vsnprintf(buf, sizeof(buf), format, ap);
  va_end(ap);
  log_string(buf);
}

strncpy

The strncpy function does not ensure that the target buffer is null-terminated. A common idiom for ensuring NUL termination is:

char buf[10];
strncpy(buf, data, sizeof(buf));
buf[sizeof(buf) - 1] = '\0';

Another approach uses the strncat function for this purpose:

buf[0] = '\0';
strncat(buf, data, sizeof(buf) - 1);

strncat

The length argument of the strncat function specifies the maximum number of characters copied from the source buffer, excluding the terminating NUL character. This means that the required number of bytes in the destination buffer is the length of the original string, plus the length argument in the strncat call, plus one. Consequently, this function is rarely appropriate for performing a length-checked string operation, with the notable exception of the strcpy emulation described in strncpy.

To implement a length-checked string append, you can use an approach similar to Repeatedly writing to a buffer using snprintf:

char buf[10];
snprintf(buf, sizeof(buf), "%s", prefix);
snprintf(buf + strlen(buf), sizeof(buf) - strlen(buf), "%s", data);

In many cases, including this one, the string concatenation can be avoided by combining everything into a single format string:

snprintf(buf, sizeof(buf), "%s%s", prefix, data);

But you must not dynamically construct format strings to avoid concatenation because this would prevent GCC from type-checking the argument lists.

It is not possible to use format strings like "%s%s" to implement concatenation, unless you use separate buffers. snprintf does not support overlapping source and target strings.

strlcpy and strlcat

Some systems support strlcpy and strlcat functions, which truncate the result to fit the destination buffer and always null-terminate it, but these functions were not part of GNU libc before version 2.38. strlcpy is often replaced with snprintf with a "%s" format string. See snprintf for a caveat related to the snprintf return value.

To emulate strlcat, use the approach described in strncat.

ISO C11 Annex K *_s functions

ISO C11 adds another set of length-checking functions, but GNU libc currently does not implement them.

Other strn* and stpn* functions

GNU libc contains additional functions with different variants of length checking. Consult the documentation before using them to find out what the length actually means.

Using tricky syscalls or library functions

readlink

This is the hardest system call to use correctly because of everything you have to do:

  • The buf should be of PATH_MAX length, that includes space for the terminating NUL character.

  • The bufsize should be sizeof(buf) - 1

  • readlink return value should be caught as a signed integer (ideally type ssize_t).

  • It should be checked for < 0 for indication of errors.

  • The caller needs to '\0' -terminate the buffer using the returned value as an index.
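
Put together, the points above could be sketched like this. The wrapper name read_link_target is hypothetical, and the sketch takes the buffer and its size from the caller rather than fixing the size itself.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads the target of the symbolic link at path into buf and
   null-terminates it.  Returns 0 on success, -1 on error.  If the target
   is longer than size - 1 bytes, readlink silently truncates it. */
static int
read_link_target(const char *path, char *buf, size_t size)
{
  if (size == 0) {
    return -1;
  }
  /* Reserve one byte for the terminating NUL. */
  ssize_t len = readlink(path, buf, size - 1);
  if (len < 0) {
    return -1;
  }
  /* readlink does not terminate the buffer itself. */
  buf[len] = '\0';
  return 0;
}
```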

chroot

  • Target dir should be writable only by root (this implies owned by).

  • Must call chdir immediately after chroot or you are not really in the changed root.

stat, lstat, fstatat

  • These functions have an inherent race in that you operate on the path name which could change in the meantime. Using fstat is recommended when stat is used.

  • If S_ISLNK macro is used, the stat buffer MUST come from lstat or from fstatat with AT_SYMLINK_NOFOLLOW

  • If you are doing something really important, call fstat after opening and compare the before and after stat buffers before trusting them.

setgid, setuid

  • Call these in the right order: groups and then uid.

  • Always check the return code.

  • If setgid and setuid are used, supplementary groups are not reset. This must be done with setgroups or initgroups before the uid change.

Memory Allocators

The C library interfaces for memory allocation are provided by malloc, free and realloc, and the calloc function. In addition to these generic functions, there are derived functions such as strdup which perform allocation using malloc internally, but do not return untyped heap memory (which could be used for any object).

The C compiler knows about these functions and can use their expected behavior for optimizations. For instance, the compiler assumes that an existing pointer (or a pointer derived from an existing pointer by arithmetic) will not point into the memory area returned by malloc.

If the allocation fails, realloc does not free the old pointer. Therefore, the idiom ptr = realloc(ptr, size); is wrong because the memory pointed to by ptr leaks in case of an error.
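
A minimal sketch of a safer pattern keeps the result of realloc in a temporary variable and updates the original pointer only on success. The helper name resize_buffer is hypothetical, and the sketch rejects a size of zero to sidestep the special case of realloc with size zero.

```c
#include <stdlib.h>

/* Resizes *bufp to new_size bytes.  Returns 0 on success.  On failure,
   returns -1 and leaves *bufp untouched, so the caller still owns the
   old allocation and can free it later. */
static int
resize_buffer(char **bufp, size_t new_size)
{
  if (new_size == 0) {
    return -1;
  }
  char *tmp = realloc(*bufp, new_size);
  if (tmp == NULL) {
    return -1;
  }
  *bufp = tmp;
  return 0;
}
```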

Memory leaks

After a memory area has been allocated with functions like malloc, calloc, etc. and it is no longer necessary, it must be freed in order for the system to release the memory region and re-use it if necessary. Failing to do so may lead to the application using more memory than necessary and, in some cases, crashing due to no more memory being available.

If portability is not important in your program, an alternative way of automatic memory management is to leverage the cleanup attribute supported by the recent versions of GCC and Clang. If a local variable is declared with the attribute, the specified cleanup function will be called when the variable goes out of scope.

static inline void freep(void *p) {
        free(*(void**) p);
}

void somefunction(const char *param, size_t size) {
	if (strcmp(param, "do_something_complex") == 0) {
		__attribute__((cleanup(freep))) char *ptr = NULL;

		/* Allocate a temporary buffer */
		ptr = malloc(size);

		/* Do something on it, but do not need to manually call free() */
	}
}

Use-after-free errors

After free, the pointer is invalid. Further pointer dereferences are not allowed (and are usually detected by valgrind). Less obvious is that any use of the old pointer value is not allowed, either. In particular, comparisons with any other pointer (or the null pointer) are undefined according to the C standard.

The same rules apply to realloc if the memory area cannot be enlarged in-place. For instance, the compiler may assume that a comparison between the old and new pointer will always return false, so it is impossible to detect movement this way.

On a related note, realloc frees the memory area if the new size is zero. If the size unintentionally becomes zero, as a result of unsigned integer wrap-around for instance, the following idiom causes a double-free.

new_size = size + x; /* 'x' is a very large value and the result wraps around to zero */
new_ptr = realloc(ptr, new_size);
if (!new_ptr) {
	free(ptr);
}
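
One way to avoid this, sketched under the assumption that growing by zero bytes is never intended, is to reject wrap-around and a zero result before calling realloc. The function name grow_by is made up for this illustration.

```c
#include <stdlib.h>

/* Grows the allocation at ptr from size to size + x bytes.  Returns the
   new pointer, or NULL on failure.  In every failure case the old
   allocation is still valid, so a subsequent free(ptr) is correct. */
static void *
grow_by(void *ptr, size_t size, size_t x)
{
  if (x > (size_t)-1 - size) {
    /* size + x would wrap around. */
    return NULL;
  }
  size_t new_size = size + x;
  if (new_size == 0) {
    /* Never pass zero to realloc. */
    return NULL;
  }
  return realloc(ptr, new_size);
}
```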

Handling Memory Allocation Errors

Recovering from out-of-memory errors is often difficult or even impossible. In these cases, malloc and other allocation functions return a null pointer. Dereferencing this pointer leads to a crash. Such dereferences can even be exploitable for code execution if the dereference is combined with an array subscript.

In general, if you cannot check all allocation calls and handle failure, you should abort the program on allocation failure, and not rely on the null pointer dereference to terminate the process. See Recommendations for Manually-written Decoders for related memory allocation concerns.

alloca and Other Forms of Stack-based Allocation

Allocation on the stack is risky because stack overflow checking is implicit. There is a guard page at the end of the memory area reserved for the stack. If the program attempts to read from or write to this guard page, a SIGSEGV signal is generated and the program typically terminates.

This is sufficient for detecting typical stack overflow situations such as unbounded recursion, but it fails when the stack grows in increments larger than the size of the guard page. In this case, it is possible that the stack pointer ends up pointing into a memory area which has been allocated for a different purpose. Such misbehavior can be exploitable.

Common sources of large stack growth are calls to alloca and related functions such as strdupa. These functions should be avoided because of the lack of error checking. (They can be used safely if the allocated size is less than the page size (typically, 4096 bytes), but this case is relatively rare.) Additionally, relying on alloca makes it more difficult to reorganize the code because it is not allowed to use the pointer after the function calling alloca has returned, even if this function has been inlined into its caller.

Similar concerns apply to variable-length arrays (VLAs), a feature of the C99 standard which started as a GNU extension. For large objects exceeding the page size, there is no error checking, either.

In both cases, negative or very large sizes can trigger a stack-pointer wraparound, and the stack pointer ends up pointing into caller stack frames, which is fatal and can be exploitable.

If you want to use alloca or VLAs for performance reasons, consider using a small on-stack array (less than the page size, large enough to fulfill most requests). If the requested size is small enough, use the on-stack array. Otherwise, call malloc. When exiting the function, check if malloc had been called, and free the buffer as needed.
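
The following sketch shows this pattern for a function that needs a scratch copy of its input. The task (finding the median byte), the names median_byte and compare_bytes, and the 256-byte threshold are all invented for this example.

```c
#include <stdlib.h>
#include <string.h>

enum { STACK_BUF_SIZE = 256 }; /* Well below the typical page size. */

static int
compare_bytes(const void *a, const void *b)
{
  unsigned char x = *(const unsigned char *)a;
  unsigned char y = *(const unsigned char *)b;
  return (x > y) - (x < y);
}

/* Stores the median byte of data in *median.  Small inputs use the
   on-stack array, larger ones fall back to malloc.  Returns 0 on
   success, -1 on empty input or allocation failure. */
static int
median_byte(const unsigned char *data, size_t len, unsigned char *median)
{
  unsigned char stack_buf[STACK_BUF_SIZE];
  unsigned char *heap_buf = NULL;
  unsigned char *scratch = stack_buf;

  if (len == 0) {
    return -1;
  }
  if (len > sizeof(stack_buf)) {
    heap_buf = malloc(len);
    if (heap_buf == NULL) {
      return -1;
    }
    scratch = heap_buf;
  }
  memcpy(scratch, data, len);
  qsort(scratch, len, 1, compare_bytes);
  *median = scratch[len / 2];
  free(heap_buf); /* No-op if the on-stack array was used. */
  return 0;
}
```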

Remember that memory allocated on the stack through alloca is released at the end of the function and not at the end of the block where it is defined, thus it is recommended not to call alloca inside a loop. In this regard, VLAs behave better, considering that the memory allocated for a VLA is released at the end of the block that defines it. Do not mix VLAs and alloca though, otherwise this behaviour is not guaranteed for VLAs either!

Array Allocation

When allocating arrays, it is important to check for overflows. The calloc function performs such checks.

If malloc or realloc is used, the size check must be written manually. For instance, to allocate an array of n elements of type T, check that the requested size is not greater than ((size_t) -1) / sizeof(T). See Recommendations for Integer Arithmetic.
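
A manually written size check for malloc might look like the following sketch, where struct point stands in for the element type T and the function name alloc_points is hypothetical.

```c
#include <stdlib.h>

struct point {
  double x;
  double y;
};

/* Allocates an uninitialized array of n points.  Returns NULL if the
   total size would overflow size_t or if malloc fails. */
static struct point *
alloc_points(size_t n)
{
  if (n > (size_t)-1 / sizeof(struct point)) {
    return NULL;
  }
  return malloc(n * sizeof(struct point));
}
```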

GNU libc provides a dedicated function reallocarray that allocates an array with those checks performed internally. However, care must be taken if portability is important: while the interface originated in OpenBSD and has been adopted in many other platforms, NetBSD exposes an incompatible behavior with the same interface.

Custom Memory Allocators

Custom memory allocators come in two forms: replacements for malloc, and completely different interfaces for memory management. Both approaches can reduce the effectiveness of valgrind and similar tools, and the heap corruption detection provided by GNU libc, so they should be avoided.

Memory allocators are difficult to write and contain many performance and security pitfalls.

  • When computing array sizes or rounding up allocation requests (to the next allocation granularity, or for alignment purposes), checks for arithmetic overflow are required.

  • Size computations for array allocations need overflow checking. See Array Allocation.

  • It can be difficult to beat well-tuned general-purpose allocators. In micro benchmarks, pool allocators can show huge wins, and size-specific pools can reduce internal fragmentation. But often, utilization of individual pools is poor, and external fragmentation increases the overall memory usage.

Conservative Garbage Collection

Garbage collection can be an alternative to explicit memory management using malloc and free. The Boehm-Demers-Weiser allocator can be used from C programs, with minimal type annotations. Performance is competitive with malloc on 64-bit architectures, especially for multi-threaded programs. The stop-the-world pauses may be problematic for some real-time applications, though.

However, using a conservative garbage collector may reduce opportunities for code reuse because once one library in a program uses garbage collection, the whole process memory needs to be subject to it, so that no pointers are missed. The Boehm-Demers-Weiser collector also reserves certain signals for internal use, so it is not fully transparent to the rest of the program.

Other C-related Topics

Wrapper Functions

Some libraries provide wrappers for standard library functions. Common cases include allocation functions such as xmalloc which abort the process on allocation failure (instead of returning a NULL pointer), or alternatives to relatively recent library additions such as snprintf (along with implementations for systems which lack them).

In general, such wrappers are a bad idea, particularly if they are not implemented as inline functions or preprocessor macros. The compiler lacks knowledge of such wrappers outside the translation unit which defines them, which means that some optimizations and security checks are not performed. Adding attribute annotations to function declarations can remedy this to some extent, but these annotations have to be maintained carefully for feature parity with the standard implementation.

At the minimum, you should apply these attributes:

  • If you wrap a function which accepts a GCC-recognized format string (for example, a printf-style function used for logging), you should add a suitable format attribute, as in The format function attribute.

  • If you wrap a function which carries a warn_unused_result attribute and you propagate its return value, your wrapper should be declared with warn_unused_result as well.

  • Duplicating the buffer length checks based on the __builtin_object_size GCC builtin is desirable if the wrapper processes arrays. (This functionality is used by the -D_FORTIFY_SOURCE=2 checks to guard against static buffer overflows.) However, designing appropriate interfaces and implementing the checks may not be entirely straightforward.

For other attributes (such as malloc), careful analysis and comparison with the compiler documentation is required to check if propagating the attribute is appropriate. Incorrectly applied attributes can result in undesired behavioral changes in the compiled code.

Common mistakes

Mistakes in macros

A macro is a name given to a block of C statements by a preprocessor directive. The preprocessor replaces each use of the macro with that block before the code is compiled.

A macro starts with the preprocessor directive, #define. It can define a single value or any 'substitution', syntactically valid or not.

A common mistake when working with macros is that programmers treat macro arguments like function arguments. This becomes an issue when the argument is expanded multiple times in a macro.

For example:

macro-misuse.c

#define simple(thing) do { \
                    if (thing < 1) { \
                     y = thing; \
                    } \
                    else if (thing > 100) { \
                     y = thing * 2 + thing; \
                    } \
                    else  { \
                     y = 200; \
                    } \
                    } while (0)

int main(void) {
     int x = 200;
     int y = 0;
     simple(x++);

     return 0;
}

The preprocessor substitutes the argument text x++ in place for every occurrence of 'thing' in the body of the simple() macro.

The 'main' function would be processed and expanded as follows:

macro-misuse-post-processing.c

  int main(void) {
     int x = 200;
     int y = 0;
     do {
                    if (x++ < 1) {
                     y = x++;
                    }
                    else if (x++ > 100) {
                     y = x++ * 2 + x++;
                    }
                    else  {
                     y = 200;
                    }
    } while (0);

    return 0;
  }

The argument to 'simple' (x++) is evaluated every time it is referenced, so x is incremented several times by a single use of the macro. In addition, the expression x++ * 2 + x++ modifies x twice without an intervening sequence point, which is undefined behavior.

While this may be 'expected' behaviour for the original author, large projects may have programmers who are unaware of how the macro expands, and this may introduce unexpected behaviour, especially if the value is later used as an array index or can overflow.
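
One way to avoid the problem, sketched here with the hypothetical name simple_fn, is to replace the macro with a static inline function. Function arguments are evaluated exactly once before the call, so side effects such as x++ happen only once.

```c
/* Same logic as the simple() macro, but as a function that returns the
   value the macro would have assigned to y. */
static inline int
simple_fn(int thing)
{
  if (thing < 1) {
    return thing;
  } else if (thing > 100) {
    return thing * 2 + thing;
  } else {
    return 200;
  }
}
```

With int x = 200, the call y = simple_fn(x++) passes the value 200, leaves x at 201, and sets y to 600.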