The C++ Programming Language

The Core Language

C++ includes a large subset of the C language. As far as the C subset is used, the recommendations in Defensive Coding in C apply.

Array Allocation with operator new[]

For very large values of n, an expression like new T[n] can return a pointer to a heap region which is too small. In other words, not all array elements are actually backed with heap memory reserved for the array. Current GCC versions generate code that performs a computation of the form sizeof(T) * size_t(n) + cookie_size, where cookie_size is currently at most 8. This computation can overflow, and GCC versions prior to 4.8 generated code which did not detect this. (Fedora 18 was the first release which fixed this in GCC.)

The std::vector template can be used instead of an explicit array allocation. (The GCC implementation detects overflow internally.)
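As a minimal sketch of this replacement, the hypothetical helper make_buffer below (the name is an illustration, not part of any library) lets std::vector perform the size computation. With an impossible element count, the constructor is expected to throw (std::length_error or std::bad_alloc, depending on the implementation) rather than return storage that is too small.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a buffer of n value-initialized ints.
// The vector constructor does the size arithmetic itself and reports
// failure by throwing, so no manual overflow check is needed here.
std::vector<int> make_buffer(std::size_t n)
{
    return std::vector<int>(n);
}
```

A caller that processes untrusted sizes would still want to cap n before calling such a helper, because a very large but valid allocation can succeed.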

If there is no alternative to operator new[] and the sources will be compiled with older GCC versions, code which allocates arrays with a variable length must check for overflow manually. For the new T[n] example, the size check could be n < 0 || (n > 0 && n > (size_t(-1) - 8) / sizeof(T)). (See Recommendations for Integer Arithmetic.) If there are additional dimensions (which must be constants according to the C++ standard), these should be included as factors in the divisor.
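The manual check can be wrapped in a small function template. The following is only a sketch: checked_array_new is a hypothetical name, the signed count type std::ptrdiff_t is an assumption made so that negative values can be rejected, and the constant 8 is the cookie size bound quoted earlier in this section.

```cpp
#include <cstddef>
#include <limits>
#include <new>

// Hypothetical helper: allocates n elements of T with operator new[],
// refusing counts for which sizeof(T) * n + 8 would not fit in size_t.
template <typename T>
T *checked_array_new(std::ptrdiff_t n)
{
    // Largest byte count that still leaves room for an 8-byte cookie.
    const std::size_t max_bytes = std::numeric_limits<std::size_t>::max() - 8;
    if (n < 0 || (n > 0 && static_cast<std::size_t>(n) > max_bytes / sizeof(T))) {
        throw std::bad_alloc();
    }
    return new T[n];
}
```

The result has to be released with delete[], exactly as for a plain new T[n] expression.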

These countermeasures prevent out-of-bounds writes and potential code execution. Very large memory allocations can still lead to a denial of service. Recommendations for Manually-written Decoders contains suggestions for mitigating this problem when processing untrusted data.

See Array Allocation for array allocation advice for C-style memory allocation.

Overloading

Do not overload functions with versions that have different security characteristics. For instance, do not implement a function strcat which works on std::string arguments. Similarly, do not name methods after such functions.

ABI compatibility and preparing for security updates

A stable binary interface (ABI) is vastly preferred for security updates. Without a stable ABI, all reverse dependencies need recompiling, which can be a lot of work and could even be impossible in some cases. Ideally, a security update only updates a single dynamic shared object, and is picked up automatically after restarting affected processes.

Outside of extremely performance-critical code, you should ensure that a wide range of changes is possible without breaking ABI. Some very basic guidelines are:

  • Avoid inline functions.

  • Use the pointer-to-implementation idiom.

  • Try to avoid templates. Use them if the increased type safety provides a benefit to the programmer.

  • Move security-critical code out of templated code, so that it can be patched in a central place if necessary.

The KDE project publishes a document with more extensive guidelines on ABI-preserving changes to C++ code, Policies/Binary Compatibility Issues With C++ (d-pointer refers to the pointer-to-implementation idiom).
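The pointer-to-implementation idiom can be sketched as follows. The class name Parser, its member parse and the field max_length are hypothetical and exist only for illustration. In a real library, the first part would go into the installed header and the second part into a source file compiled into the shared object, so that data members can change without altering the size or layout of Parser as seen by callers.

```cpp
#include <cstddef>
#include <string>

// --- Would live in the public header ---
class Parser {
public:
    Parser();
    ~Parser();
    bool parse(const std::string &input);  // defined out of line, not inline
private:
    Parser(const Parser &);                // copying is disabled in this sketch
    Parser &operator=(const Parser &);
    struct Impl;                           // only declared in the header
    Impl *d;                               // the d-pointer
};

// --- Would live in the implementation file ---
struct Parser::Impl {
    std::size_t max_length;                // can be changed or extended later
    Impl() : max_length(1024) {}
};

Parser::Parser() : d(new Impl) {}
Parser::~Parser() { delete d; }

bool Parser::parse(const std::string &input)
{
    // Placeholder logic: accept input up to the configured length.
    return input.size() <= d->max_length;
}
```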

C++0X and C++11 Support

GCC offers different language compatibility modes:

  • -std=c++98 for the original 1998 C++ standard

  • -std=c++03 for the 1998 standard with the corrections from the 2003 technical corrigendum

  • -std=c++11 for the 2011 C++ standard. This option should not be used.

  • -std=c++0x for several different versions of C++11 support in development, depending on the GCC version. This option should not be used.

For each of these flags, there are variants which also enable GNU extensions (mostly language features also found in C99 or C11):

  • -std=gnu++98

  • -std=gnu++03

  • -std=gnu++11

Again, -std=gnu++11 should not be used.

If you enable C++11 support, the ABI of the standard C++ library libstdc++ will change in subtle ways. Currently, no C++ libraries are compiled in C++11 mode, so if you compile your code in C++11 mode, it will be incompatible with the rest of the system. Unfortunately, this is also the case if you do not use any C++11 features. Currently, there is no safe way to enable C++11 mode (except for freestanding applications).

The meaning of C++0X mode changed from GCC release to GCC release. Earlier versions were still ABI-compatible with C++98 mode, but in the most recent versions, switching to C++0X mode activates C++11 support, with its compatibility problems.

Some C++11 features (or approximations thereof) are available with TR1 support, that is, with -std=c++03 or -std=gnu++03 and in the <tr1/*> header files. This includes std::tr1::shared_ptr (from <tr1/memory>) and std::tr1::function (from <tr1/functional>). For other C++11 features, the Boost C++ library contains replacements.

The C++ Standard Library

The C++ standard library includes most of its C counterpart by reference; see Defensive Coding in C.

Functions That Are Difficult to Use

This section collects functions and function templates which are part of the standard library and are difficult to use.

Unpaired Iterators

Functions that use output operators or iterators that do not come in pairs (denoting ranges) cannot perform iterator range checking (see Iterators). Function templates which involve output iterators are particularly dangerous:

  • std::copy

  • std::copy_backward

  • std::copy_if

  • std::move (three-argument variant)

  • std::move_backward

  • std::partition_copy

  • std::remove_copy

  • std::remove_copy_if

  • std::replace_copy

  • std::replace_copy_if

  • std::swap_ranges

  • std::transform

In addition, std::copy_n, std::fill_n and std::generate_n do not perform iterator checking, either, but there is an explicit count which has to be supplied by the caller, as opposed to an implicit length indicator in the form of a pair of forward iterators.

These output-iterator-expecting functions should only be used with unlimited-range output iterators, such as iterators obtained with the std::back_inserter function.
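A minimal sketch of this recommendation, using a hypothetical helper named append_all: because std::back_inserter calls push_back for every element written, the destination grows as needed and std::copy cannot write past its end.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: appends all elements of src to dst.
// The output iterator grows dst on each write, so no destination
// size has to be computed or trusted by the caller.
void append_all(const std::vector<int> &src, std::vector<int> &dst)
{
    std::copy(src.begin(), src.end(), std::back_inserter(dst));
}
```

By contrast, passing dst.begin() as the third argument would be correct only if dst already held at least src.size() elements.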

Other functions use single input or forward iterators, which can read beyond the end of the input range if the caller is not careful:

  • std::equal

  • std::is_permutation

  • std::mismatch
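For the three-argument form of std::equal, one way to be careful is to compare the lengths first. The helper name same_contents below is hypothetical, and the sketch assumes both ranges come from containers whose sizes are known.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: element-wise comparison that never reads past
// the end of b, because std::equal only runs when the sizes match.
bool same_contents(const std::vector<int> &a, const std::vector<int> &b)
{
    return a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin());
}
```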

String Handling with std::string

The std::string class provides a convenient way to handle strings. Unlike C strings, std::string objects have an explicit length (and can contain embedded NUL characters), and storage for their characters is managed automatically. This section discusses std::string, but these observations also apply to other instances of the std::basic_string template.

The pointer returned by the data() member function does not necessarily point to a NUL-terminated string. To obtain a C-compatible string pointer, use c_str() instead, which adds the NUL terminator.

The pointers returned by the data() and c_str() functions and iterators are only valid until certain events happen. It is required that the exact std::string object still exists (even if it was initially created as a copy of another string object). Pointers and iterators are also invalidated when non-const member functions are called, or functions with a non-const reference parameter. The behavior of the GCC implementation deviates from that required by the C++ standard if multiple threads are present. In general, only the first call to a non-const member function after a structural modification of the string (such as appending a character) is invalidating, but this also applies to member functions such as the non-const version of begin(), in violation of the C++ standard.

Particular care is necessary when invoking the c_str() member function on a temporary object. This is convenient for calling C functions, but the pointer will turn invalid as soon as the temporary object is destroyed, which generally happens when the outermost expression enclosing the expression on which c_str() is called completes evaluation. Passing the result of c_str() to a function which does not store or otherwise leak that pointer is safe, though.
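The difference between the safe and the unsafe pattern can be sketched like this. Both make_path and path_length are hypothetical helpers introduced for illustration; std::strlen stands in for any C function that does not keep the pointer.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper which returns a temporary std::string.
std::string make_path(const std::string &dir, const std::string &name)
{
    return dir + "/" + name;
}

std::size_t path_length(const std::string &dir, const std::string &name)
{
    // Unsafe (shown only as a comment): the temporary is destroyed at
    // the end of the first statement, so p dangles in the second one.
    //   const char *p = make_path(dir, name).c_str();
    //   return std::strlen(p);

    // Safe: the temporary lives until the whole expression has been
    // evaluated, and std::strlen does not store the pointer.
    return std::strlen(make_path(dir, name).c_str());
}
```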

Like with std::vector and std::array, subscripting with operator[] does not perform bounds checks. Use the at(size_type) member function instead. See Containers and operator[]. Furthermore, accessing the terminating NUL character using operator[] is not possible. (In some implementations, the c_str() member function writes the NUL character on demand.)

Never write to the pointers returned by data() or c_str() after casting away const. If you need a C-style writable string, use a std::vector<char> object and its data() member function. In this case, you have to explicitly add the terminating NUL character.
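A writable copy could be produced along the following lines. The helper name to_writable is hypothetical, and the sketch assumes the string contains no embedded NUL characters that matter to the C function receiving the buffer.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies the characters of s into a vector and
// appends the terminating NUL explicitly, as std::vector does not add one.
std::vector<char> to_writable(const std::string &s)
{
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');
    return buf;
}
```

The pointer obtained from data() on the returned vector can then be handed to a C function that modifies the string in place, as long as it writes no more than size() bytes.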

GCC’s implementation of std::string is currently based on reference counting. It is expected that a future version will remove the reference counting, due to performance and conformance issues. As a result, code that implicitly assumes sharing by holding on to pointers or iterators for too long will break, resulting in run-time crashes or worse. On the other hand, non-const iterator-returning functions will no longer give other threads an opportunity for invalidating existing iterators and pointers because iterator invalidation does not depend on sharing of the internal character array object anymore.

Containers and operator[]

Many sequence containers similar to std::vector provide both operator[](size_type) and a member function at(size_type). This applies to std::vector itself, std::array, std::string and other instances of std::basic_string.

operator[](size_type) is not required by the standard to perform bounds checking (and the implementation in GCC does not). In contrast, at(size_type) must perform such a check. Therefore, in code which is not performance-critical, you should prefer at(size_type) over operator[](size_type), even though it is slightly more verbose.

The front() and back() member functions have undefined behavior if the vector object is empty. You can use vec.at(0) and vec.at(vec.size() - 1) as checked replacements. For an empty vector, data() is defined; it returns an arbitrary pointer, but not necessarily the null pointer.
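The checked replacements can be sketched as two hypothetical helpers, first_or_throw and last_or_throw. For an empty vector, vec.size() - 1 wraps around to the largest size_type value, which at() rejects by throwing std::out_of_range.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical checked counterpart of front().
int first_or_throw(const std::vector<int> &vec)
{
    return vec.at(0);
}

// Hypothetical checked counterpart of back(). With an empty vector,
// the index wraps to a huge value and at() throws std::out_of_range.
int last_or_throw(const std::vector<int> &vec)
{
    return vec.at(vec.size() - 1);
}
```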

Iterators

Iterators do not perform any bounds checking. Therefore, all functions that work on iterators should accept them in pairs, denoting a range, and make sure that iterators are not moved outside that range. For forward iterators and bidirectional iterators, you need to check for equality before moving the first or last iterator in the range. For random-access iterators, you need to compute the difference before adding or subtracting an offset. It is not possible to perform the operation and check for an invalid iterator afterwards.
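For random-access iterators, the check-before-move rule might look like the following sketch. The function checked_advance is a hypothetical helper, and it assumes that the caller passes a valid range in which it does not lie beyond last.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: moves it forward by n elements only if at least
// n elements remain before last. Returns false and leaves it unchanged
// otherwise. The distance is computed before the iterator is touched.
bool checked_advance(std::vector<int>::const_iterator &it,
                     std::vector<int>::const_iterator last,
                     std::size_t n)
{
    assert(it <= last);  // precondition assumed by this sketch
    if (n > static_cast<std::size_t>(last - it)) {
        return false;
    }
    it += n;
    return true;
}
```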

Output iterators cannot be compared for equality. Therefore, it is impossible to write code that detects that it has been supplied an output area that is too small, and their use should be avoided.

These issues make some of the standard library functions difficult to use correctly; see Unpaired Iterators.