size_t motivation

When reading the code of programmers who understand the given language better, it's a great idea to learn from what they do. At the same time, I don't like blindly doing what others are doing without understanding why. Today, we'll be looking at why using size_t for indices into arrays is a good idea.

std::size_t can store the maximum size of a theoretically possible object of any type (including arrays).

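As previously mentioned, sizeof(T) is not the size we're talking about here. Instead, we mean the total amount of memory owned by the object: the part reported by sizeof plus any heap memory it manages, i.e., its entire memory footprint. Let's call that mem_fp(obj), and let max_size(size_t) be the largest value a size_t can hold (SIZE_MAX). Strictly speaking, the language only guarantees that size_t can hold the sizeof of any single object, but every heap buffer is itself an array object bounded the same way, and on mainstream platforms size_t is typically as wide as the address space. So in practice we can say that mem_fp(obj) <= max_size(size_t).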
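To make the difference between sizeof and the memory footprint concrete, here is a minimal sketch. The helper approx_mem_fp is a hypothetical name introduced for illustration, not a standard function, and it only estimates the footprint of a std::vector under the assumptions stated in its comments.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>
#include <vector>

// sizeof itself yields a std::size_t.
static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "sizeof is expected to yield std::size_t");

// Hypothetical helper (not a standard function): a rough estimate of
// mem_fp(v) as the vector object itself plus its heap buffer.
// Assumption: allocator bookkeeping and anything the elements own on
// the heap are ignored.
template <typename T>
std::size_t approx_mem_fp(const std::vector<T>& v) {
    const std::size_t limit = std::numeric_limits<std::size_t>::max();
    // Guard the arithmetic below against wrapping around.
    assert(v.capacity() <= (limit - sizeof(v)) / sizeof(T));
    return sizeof(v) + v.capacity() * sizeof(T);
}
```

For two vectors of the same element type, sizeof gives the same number no matter how many elements they hold, while approx_mem_fp grows with the capacity of the heap buffer.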

Why it's used for indexing

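If you write a for loop like this: for (int i = 0; i < vec.size(); ++i), then i is an int, while vec.size() returns an unsigned type (std::vector<T>::size_type, which is std::size_t in practice). To compare them, the compiler has to convert one operand, and on typical platforms it converts i to the unsigned type. This is not ideal: a negative int silently becomes a huge unsigned value, compilers commonly warn about the signed/unsigned comparison, and the conversion can add a small cost. Going the other way, storing a size_t in an int, can lose data when the value does not fit.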
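The effect of that conversion can be sketched as follows. The function names are hypothetical and chosen for illustration. The explicit static_cast spells out what typically happens implicitly in i < vec.size(), assuming a platform where size_t is at least as wide as int.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: spells out the conversion that typically happens
// implicitly in `i < vec.size()` when i is an int.
bool int_index_in_range(int i, const std::vector<int>& vec) {
    return static_cast<std::size_t>(i) < vec.size();
}

// Bounds-checked access with a std::size_t index: both operands of the
// comparison already have the same type, so nothing is converted.
int element_at(const std::vector<int>& vec, std::size_t i) {
    assert(i < vec.size());
    return vec[i];
}

// The same loop as in the text, written with a std::size_t index.
long long sum_elements(const std::vector<int>& vec) {
    long long total = 0;
    for (std::size_t i = 0; i < vec.size(); ++i) {
        total += element_at(vec, i);
    }
    return total;
}
```

Calling int_index_in_range(-1, vec) returns false for any vector, because -1 turns into the largest size_t value before the comparison. That is the correct answer here, but only by accident of the conversion.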

Additionally, how much memory a program can address differs between platforms, most visibly between 32-bit and 64-bit ones. size_t is platform-dependent for exactly this reason: on each platform it is guaranteed to be large enough to represent the size of any object in memory. Since any vector, regardless of its mem_fp, is an object, size_t can represent its mem_fp.

Specifically, for v a std::vector<T>, the footprint is at least the storage of its elements: mem_fp(v) >= v.size() * mem_fp(T), with the vector's own bookkeeping and any unused capacity making up the rest. Since mem_fp(T) >= 1 for every type, this implies v.size() <= mem_fp(v) <= max_size(size_t). Any valid index lies in the range 0, ..., v.size() - 1, and v.size() - 1 < max_size(size_t), so no matter which valid index is computed, it is representable as a size_t.
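The bound on v.size() can be checked in code. This is a minimal sketch: size_bound_holds is a hypothetical helper, and the static_assert encodes an assumption that holds on the mainstream standard libraries with the default allocator rather than a guarantee of the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>
#include <vector>

// Assumption: true on the mainstream standard libraries with the default
// allocator; the standard itself only requires an unsigned integer type.
static_assert(std::is_same<std::vector<int>::size_type, std::size_t>::value,
              "vector<int>::size_type is expected to be std::size_t here");

// Hypothetical helper: checks that the element storage of v, in bytes,
// fits into a std::size_t, which forces v.size() to fit as well.
template <typename T>
bool size_bound_holds(const std::vector<T>& v) {
    const std::size_t limit = std::numeric_limits<std::size_t>::max();
    assert(v.size() <= v.max_size());  // guaranteed by the container
    // Dividing avoids computing v.size() * sizeof(T), which could wrap.
    return v.size() <= limit / sizeof(T);
}
```

For any vector that actually exists in memory this should return true, which is the code-level counterpart of the inequality in the text.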

Because of this, if you were to use int or unsigned int instead, the largest value of those types might be smaller than the last index of a sufficiently large array. An int counter that is incremented past its maximum overflows, which is undefined behavior, and an unsigned int counter wraps around to 0 and never reaches the end. This is why size_t is preferred.
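Whether an index type can reach the last element is easy to test without allocating a huge array. The sketch below uses two hypothetical helpers, can_index_all and last_index, and a deliberately tiny index type (std::uint8_t) so that the limit is hit at a few hundred elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: can every valid index 0, ..., n - 1 of a container
// with n elements be represented by the integer type Index?
template <typename Index>
bool can_index_all(std::size_t n) {
    if (n == 0) return true;  // no indices to represent
    const std::uintmax_t max_index =
        static_cast<std::uintmax_t>(std::numeric_limits<Index>::max());
    return n - 1 <= max_index;
}

// Hypothetical helper: the last valid index as an Index, refusing (via
// assert) sizes for which the narrowing cast would change the value.
template <typename Index>
Index last_index(std::size_t n) {
    assert(n > 0 && can_index_all<Index>(n));
    return static_cast<Index>(n - 1);
}
```

A std::uint8_t index covers 256 elements but not 257. In the same way, on a platform where int is 32 bits, can_index_all<int>(3000000000) would be false, while can_index_all<std::size_t> holds for every possible size.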

Note that in practice using int or unsigned int is probably fine as long as you don't have huge objects, which is probably why you haven't encountered errors related to this. The code is still not safe for the cases that have not occurred yet.

