If you are reading this, you want to know more about C pointers. That’s a good thing. Even if you don’t program in C very often, understanding pointers gives you a deeper understanding of how programs and memory work “under the hood”. Learning pointers will make you a better programmer. In this post we will start with variables and memory and look at how they relate to pointers. We will talk about the “why” behind pointers and discuss pointer operations. Then we will finish up with the different types of pointers you will encounter.
Let’s start simple. What is a variable? Most programmers will say a variable is a name for a piece of data that can change in a program. That’s true but it’s also just scratching the surface.
    int main(int argc, char **argv) {
        // some variables
        int anum = 1;
        char achar = 'a';
        return 0;
    }

When a variable is declared, memory to hold a value of that type is allocated at an unused memory location. That location is the variable’s memory address. For a compiler, a variable is a symbol for a starting memory address. The compiler knows two things about any variable: the name and the type. For int anum above, anum is a symbol that gets translated to a memory address. The type, int, tells the compiler how much memory to reserve starting at that address.
A C compiler converts C source code to assembly source code. During that conversion variable names are converted to relative memory addresses. Here is an example in assembly of the code above. Don’t worry, you don’t need to know assembly to know pointers. This is just an example to show what happens.
Assembly (x86-64)

    mov rbp, rsp
    mov DWORD PTR [rbp-20], edi
    mov QWORD PTR [rbp-32], rsi
    mov DWORD PTR [rbp-4], 1
    mov BYTE PTR [rbp-5], 97
    mov eax, 0

Three things to notice: the DWORD and BYTE labels, the [rbp-4] and [rbp-5] pieces, and the values 1 and 97. The rbp register is a base pointer. For our discussion, think of it as a starting point, a starting memory address. The [rbp-4] and [rbp-5] pieces are relative offsets, minus 4 and minus 5, from that starting point. DWORD and BYTE are sizes, the number of bytes to store. On my machine, a DWORD is 4 bytes (32 bits) and a BYTE is 1 byte (8 bits).
Put this all together and mov DWORD PTR [rbp-4], 1 says store 4 bytes with the value 1 starting at the relative offset [rbp-4], and mov BYTE PTR [rbp-5], 97 says store 1 byte with the value 97, the ASCII value for ‘a’, starting at the offset [rbp-5]. When the program runs, offsets like [rbp-4] resolve to actual memory addresses. The key takeaway is this. To a compiler all variables are just memory addresses and sizes.
C programs have different types of variables including ints, floats, arrays, chars, structs, and pointers. An int holds an integer number, a float holds a floating-point number. Arrays hold multiple values. A pointer is a variable that holds the memory address of another variable. It’s that simple. The int variable anum above holds the number 1, which the compiler stores as 4 bytes starting at the relative offset [rbp-4]. When the program runs, that offset might be the real memory address 0x1234. A pointer to anum would hold the value 0x1234.
Why do pointers exist? Why do we need them? The simple answer is efficiency. Back when C was created, computers were much slower. Most programs were written in assembly. Programmers needed to be much more efficient at solving problems.
The more detailed answer has to do with call semantics. The C language is call-by-value. When you call a function in C, the values of its arguments are literally copied into the function’s stack frame. Pass an int and, typically, 4 bytes are copied into the function. Pass a char and 1 byte is copied. What happens when you need to pass a 100,000-element int array into a function? You don’t want to copy 400,000 bytes into the function. That is really inefficient. Instead you pass a pointer that references the array. The pointer, all 4 or 8 bytes of it, is copied into the function, where it can be dereferenced to access the array. The same goes for large structs. Don’t pass a copy of a large struct; pass a pointer to the struct.
There are two main operators for working with pointers: the * operator and the & operator. There is also the -> operator, but we will get to that later.
The * operator is used when declaring a pointer and when dereferencing a pointer. Declaring a pointer is like declaring any other variable. The compiler allocates space for the pointer. The size of a pointer, the number of bytes used to store it, depends on the architecture of the machine. On 32-bit systems, pointers are typically 4 bytes (32 bits). On 64-bit systems, which is what most machines are these days, pointers are typically 8 bytes (64 bits).
The & operator is used to get the address of a variable, which is how a pointer is usually given a value. Putting the & operator in front of a variable returns a pointer to that variable, typed according to the variable’s type. For example, applying & to an int gives a pointer to int.
Take the following code which shows some simple usage of the * and & operators.