Arrays and Vectors in C++

September 26, 2023

If you have been programming before, you have most likely used arrays. They can be found in most of the modern programming languages. This post is going to give you an overview of arrays and then dive into the C++ vector class.

The Array Data Structure

In computer science, an array is a data structure consisting of a collection of elements (values or variables), of same memory size, each identified by at least one array index or key. An array is stored such that the position of each element can be computed from its index tuple by a mathematical formula. The simplest type of data structure is a linear array, also called one-dimensional array. Wikipedia

Thanks, Wikipedia! One key point is that an array is stored in a continuous space in memory. That, combined with the fact that the types of data stored in the values must be of the same data type, means that it becomes very easy to access array indices or iterate over them. Arrays store the base address and together with the fixed size of values, so the memory address of a value at index i can be easily computed with base-addr + i * size.

That means arrays have constant lookup time, no overhead per array item, almost no overhead for the array itself and make it very fast to access, iterate over or copy parts of the array. For more complex tasks like inserting items, sorting and searching, there are of course more efficient data structures, most of which are based on arrays.

Arrays are sometimes referred to as:

tuple
vector
matrix
tensor
table

Arrays in C/C++

Arrays in C use a zero-based index. They are a core data structure with a fixed size and values stored in a continuous memory location. C++ can use C-based arrays, but it also has an own array class with more functionality similar to vectors.

Declaration, Initialization and Assignment of Basic Arrays

// simple declaration
int a[10];
// initialization
int b[] = { 1, 2, 3, 4, 5 };
// initializing all elements to zero
int c[10] = {};
// assigning values to elements
a[0] = 1;
a[1] = 2;

Declaration can be done by specifying the type of the array values, followed by the variable name, followed by square brackets containing the array size. The size can be omitted when initializing the array with values.

When declaring an array without initializing the values, then the values will not be zero. They will have the value that has been on that memory address before. The array variable is a pointer to the address of the first element, so let’s have a look.

int a[1];

// This prints the address of the pointer to the first array element.
std::cout << a << endl; // 0x7ffcc66069c0

// This prints the value of the first element. It can be any integer value.
// The value will be whatever value had been on that memory address range before.
std::cout << a[0] << endl; // 72704
// We could also dereference the pointer to get the value of the first element.
srd::cout << *a << endl; // 72704

Array Bounds

C/C++ does not check array bounds when accessing values, it simply does not care.

int a[] = { 5, 6 };

cout << a[0] << ", " << a[1] << ", " << a[2] << endl;

This will print 5, 6, -81237 with the last value being any random integer that happens to be at the memory location right after the last array element.

Vectors in C++

Vectors are very similar to arrays in C++, with the key difference that they can grow and shrink dynamically. They are based on arrays, which means they are also stored in a continuous memory location and can be addressed by calculating the pointer based on the type and index.

One might think that adding items to vectors would mean the whole array has to be moved to a different, larger memory location, but that’s actually not always the case. Vectors usually allocate a bit more space than they need, so they can grow without moving the contents around.

The name “vector” is a bit misleading. Mathematically, a vector is like a matrix with only a single row and a fixed size. Vectors are sized dynamically and also heap allocated. Apparently the goal with the naming was to strongly indicate the difference between regular arrays and vectors.

Performance Considerations

Vectors are relatively efficient when adding or removing elements at their end or accessing elements. They are not very efficient when adding or removing elements that are not at the end or the vector. For efficient iteration, insertion or deletion of any middle element there are other types that are better suited.

Using Vectors

Vectors are part of the C++ Standard Library, more specifically of the Containers Library.

There are many ways of creating vectors.

#include <vector>

int main() {
    vector<int> v1; // creates an empty vector
    vector<int> v2(3); // creates a vector with size 3
    vector<int> v3{1, 2, 3}; // creates a vector with an initializer list
    vector<int> v4(v3); // creates a vector, copying the elements from v3
    vector<int> v5(v4.begin(), v4.end()); // creates a vector, copying the elements from v4
    vector<int> v6(3, 4); // creats a vector with size 3 and fills it with copies of the value 4
}

Vectors can be iterated over just like arrays.

#include <vector>

int main() {
    vector<int> v{1,2,3};

    for (auto i : v) {
        std::cout << i;
    }

    for (int i = 0; i < v.size(); i++) {
        std::cout << v[i];
    }
}

Useful Vector Methods

The vector class provides a lot of methods to interact with an instance. I will go over a few of the methods as provided on the C++ reference page.

Capacity

Vectors have both a capacity and a size. The system will reserve a certain amount of memory that can be larger than the actual size, so the vector doesn’t have to be reallocated constantly when adding new elements.

size_t size() const noexcept;
size_t capacity() const noexcept;

size() returns the number of items in the vector while capacity() returns the number of items that can fit in the allocated space. Capacity is always equal or greater than size.

void resize(size_t n);
void resize(size_t n, const value_type& val);

resize() changes the size of the vector to n, either dropping elements at the end or filling the missing space with copies of val or the default constructor, if val is not provided.

bool empty() const noexcept;

empty() returns true if the vector is empty and false if it’s not. This does not modify the vector, for that you can use clear();

void reserve(size_t n);

reserve() allocates space for n vector items. The resulting vector capacity can be equal to or greater than n.

Element Access

Vectors provide a few additional methods to access elements besides the commonly used operator[].

reference at(size_t n);
const_reference at(size_t n) const;

at() is similar to operator[] operator, it returns the element at the given index. The key difference is that at() performs a boundary check and will raise an exception if the index is out of bounds.

reference front();
const_reference front() const;
reference back();
const_reference back() const;

Quite simple, front() returns the first element in the vector and back() returns the last element. The docs state that using this on an empty vector causes undefined behavior, which sounds a tiny bit scary.

value_type* data() noexcept;
const value_type* data() const noexcept;

Earlier I stated that vectors are based on arrays and data() gives you a pointer to the first value of the underlying array. It can be used just like any other array.

Modifiers

void assign(iterator first, iterator last); // range
void assign(size_t n, const value_type& val); // fill
void assign(initializer_list<value_type> il); // initializer-list

assign() destroys the current content of the vector and replaces it with new content. The three versions work similar to the constructor versions used in the example above.

void push_back(const value_type& val);
void push_back(value_type&& val);

push_back() lets you add an element to the end of the vector at amortized constant complexity. Amortized because reallocation may happen, and it happens with complexity linear to the size of the vector.

void pop_back();

pop_back() simply removes the last element of the vector and destroys it, reducing the vector size by one. This happens with constant complexity. Using this on an empty vector is another case of undefined behavior.

iterator insert(iterator position, const value_type &val); // copy
iterator insert(iterator position, value_type &&val); // move
iterator insert(iterator position, iterator first, iterator last); // range
iterator insert(iterator position, size_t n, const value_type& val); // fill
iterator insert(iterator position, initializer_list<value_type> il); // initializer-list

insert() inserts one or multiple elements into the vector before the element at position and returns an iterator pointing to the first of the inserted elements. Inserting at any place other than the end of the vector is deemed inefficient compared to other types of storage. It’s quite similar to the constructor call signatures, or assign() (minus the position parameter). You can pass a value as copy or move, two iterators, an initializer list or an amount and fill value.

iterator erase(iterator position);
iterator erase(iterator start, iterator end);

erase() removes either a single element at position or a range of elements from first to last. It returns an iterator pointing to the element after the removed element(s), which can be the end() of the vector.