A walk on memory
Python memory walk
Today, for many programmers, memory is simply the place where we store our variables. In most very high-level languages memory management is automatic, handled by the language itself via an internal process called garbage collection. Memory management sounds like an advanced, fancy concept. Honestly, it is: a lot of computer science literature is dedicated to the topic. But while the field is extensive, and complex if we delve into how memory is implemented at its lowest level, the idea itself is straightforward. Memory management simply means a set of techniques to reserve and dispose of memory in a computer system.
Automatic memory management simply means that the programmer only needs to reserve memory when they need it; the language is responsible for freeing that memory once it is no longer used.
For this reason many programmers look at memory as a big bag into which they can keep tossing new data as they need.
As an example of a very high-level language, consider Python:
class Point:
    _x = 0
    _y = 0

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __str__(self):
        return "{x: " + str(self._x) + ",y:" + str(self._y) + "}"

p1 = Point(1, 1)
print(p1)
This produces the following output:
python memory.py
{x: 1,y:1}
You may ask: well, OK, that makes sense, but what's your point? Glad you asked. In the context of memory management this is an example of memory allocation without the corresponding cleanup. That does not mean, however, that no cleanup takes place. It just means that you, as the programmer, don't need to care.
Now let's add the following method to the body of the Point class and run it again:
def __del__(self):
    print("You don't need to clean your mess. Python will do it for you")
Now you'll notice the following:
python memory.py
{x: 1,y:1}
You don't need to clean your mess. Python will do it for you
As expected, Python, being a very nice guy, cleans up your mess and manages the memory for you.
Interesting. If you have had contact with the design of operating systems (OS) you'll know that the concept of a process is a core mechanism. A process, in a nutshell, is the representation of our program in memory, and it is created by the kernel. Also from OS design we know that memory is far from being a simple black box where we put our data; it is a complex machine. In particular, the memory of a process is divided into segments, which define memory intervals where data is stored. For simplicity we usually think of two main categories: the heap and the stack of the process.
From the classic computer-science diagrams we know what layout to expect: text and data segments at low addresses, a heap above them growing upward, and a stack near the top of the address space growing downward.
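We can make that picture tangible with a quick C probe before we go hunting in Python. A minimal sketch (variable names are ours), printing one address from the data segment, one from the heap, and one from the stack:
#include <stdio.h>
#include <stdlib.h>

int global_var = 42;                           // data segment

int main(void)
{
    int stack_var = 1;                         // stack
    int *heap_var = malloc(sizeof *heap_var);  // heap

    printf("data : %p\n", (void *)&global_var);
    printf("heap : %p\n", (void *)heap_var);
    printf("stack: %p\n", (void *)&stack_var);

    free(heap_var);
    return 0;
}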
So an obvious question arises: where are our variables being stored?
Let's create some data in memory:
a=12
b=24
c=1231231
hello="Hello"
world="World"
p1=Point(1,32)
Now we just need to see where they are being stored. But how? Well, it turns out that Python has a built-in function, id, which in CPython returns the memory address of the object a variable refers to (the language itself only promises a unique identifier; the address interpretation is a CPython implementation detail).
Let's create a utility function to help us visualize the memory addresses of Python variables:
def p_mem(l, v):
    print(l + ":\t" + hex(id(v)) + "\t" + str(id(v)))
The first argument is a label and the second is the variable we want to inspect. The function prints the address to stdout in both hexadecimal and decimal representation.
But where do those addresses live? To answer that I'll assume you are able to inspect the memory layout of your process. For that we need to ask the kernel, and on Linux there is an easy way to do it: /proc/PID/maps. You just need to find the process id of your program and use it in place of PID.
To find the PID:
ps aux | grep memory.py
balhau 20199 0.0 0.0 15936 7308 pts/10 S+ 11:27 0:00 python memory.py
So in my particular case PID=20199. The following command lists the memory segments of the process, filtering out those that map executable code and loaded libraries, which don't interest us for now.
cat /proc/20199/maps | grep -v lib | grep -v bin
56361173f000-563611763000 rw-p 00000000 00:00 0
563612362000-563612449000 rw-p 00000000 00:00 0 [heap]
7f8eeaedc000-7f8eeaf1c000 rw-p 00000000 00:00 0
7f8eeb48d000-7f8eeb603000 rw-p 00000000 00:00 0
7f8eeb798000-7f8eeb79c000 rw-p 00000000 00:00 0
7f8eeb98a000-7f8eeb990000 rw-p 00000000 00:00 0
7f8eeb9e8000-7f8eeb9e9000 rw-p 00000000 00:00 0
7ffc9a4a4000-7ffc9a4c6000 rw-p 00000000 00:00 0 [stack]
7ffc9a50d000-7ffc9a510000 r--p 00000000 00:00 0 [vvar]
7ffc9a510000-7ffc9a511000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
In an attempt to answer where our data is being stored, we create some variables:
# Some small integers
a = 1
b = 2
c = 3
d = 5
e = 1
# Some not small integers
x = 1231231
y = 1231232
z = 1231233
t = 1231233
Now we can use the function created previously to inspect the memory addresses of all these variables:
p_mem("a", a)
p_mem("b", b)
p_mem("c", c)
p_mem("d", d)
p_mem("e", e)
p_mem("x", x)
p_mem("y", y)
p_mem("z", z)
p_mem("t", t)
This gives us the following output:
a: 0x56361237bcd8 94790233865432
b: 0x56361237bcc0 94790233865408
c: 0x56361237bca8 94790233865384
d: 0x56361237bc78 94790233865336
e: 0x56361237bcd8 94790233865432
x: 0x5636123d2c90 94790234221712
y: 0x5636123d2c78 94790234221688
z: 0x5636123d2c60 94790234221664
t: 0x5636123d2c60 94790234221664
So the answer is: the heap. (Compare the addresses with the [heap] range in the maps output above.)
If we look closely we'll find some interesting patterns in the addresses. From a to e all the references fall in the range 0x56361237bcXX, while the other set of variables lives in a different range, 0x5636123d2cXX. But it gets even more interesting. Create two new functions:
def one_stack_frame():
    a = 1
    b = 2
    c = 3
    d = 5
    e = 1
    print("Stack Frame One")
    p_mem("a", a)
    p_mem("b", b)
    p_mem("c", c)
    p_mem("d", d)
    p_mem("e", e)
    print("Stack Frame One ended\n")

def two_stack_frame():
    x = 1231231
    y = 1231232
    z = 1231233
    t = 1231233
    print("Stack Frame Two")
    p_mem("x", x)
    p_mem("y", y)
    p_mem("z", z)
    p_mem("t", t)
    print("Stack Frame Two ended\n")
And call them after the previous code; you'll end up with:
Main Stack Frame - Small numbers
a: 0x55e7081becd8 94450761854168
b: 0x55e7081becc0 94450761854144
c: 0x55e7081beca8 94450761854120
d: 0x55e7081bec78 94450761854072
e: 0x55e7081becd8 94450761854168
Main Stack Frame - Big numbers
x: 0x55e708215c30 94450762210352
y: 0x55e708215cf0 94450762210544
z: 0x55e708215c18 94450762210328
t: 0x55e708215c18 94450762210328
Stack Frame One
a: 0x55e7081becd8 94450761854168
b: 0x55e7081becc0 94450761854144
c: 0x55e7081beca8 94450761854120
d: 0x55e7081bec78 94450761854072
e: 0x55e7081becd8 94450761854168
Stack Frame One ended
Stack Frame Two
x: 0x55e708215c90 94450762210448
y: 0x55e708215c78 94450762210424
z: 0x55e708215c60 94450762210400
t: 0x55e708215c60 94450762210400
Stack Frame Two ended
If you look closely you'll notice that the small numbers have exactly the same references as the ones defined outside, while the bigger numbers get an entirely new set of references. If we go digging in the Python source code we find this:
#define PYLONG_FROM_UINT(INT_TYPE, ival) \
    do { \
        if (IS_SMALL_UINT(ival)) { \
            return get_small_int((sdigit)(ival)); \
        } \
        /* Count the number of Python digits. */ \
        Py_ssize_t ndigits = 0; \
        INT_TYPE t = (ival); \
        while (t) { \
            ++ndigits; \
            t >>= PyLong_SHIFT; \
        } \
        PyLongObject *v = _PyLong_New(ndigits); \
        if (v == NULL) { \
            return NULL; \
        } \
        digit *p = v->ob_digit; \
        while ((ival)) { \
            *p++ = (digit)((ival) & PyLong_MASK); \
            (ival) >>= PyLong_SHIFT; \
        } \
        return (PyObject *)v; \
    } while(0)
So, when ival satisfies IS_SMALL_UINT, CPython uses the get_small_int function to construct the object. Let's keep digging:
#define IS_SMALL_UINT(ival) ((ival) < NSMALLPOSINTS)
...
#define NSMALLPOSINTS _PY_NSMALLPOSINTS
...
#define _PY_NSMALLPOSINTS 257
Interesting. So when the number is less than 257 (i.e. at most 256), the object representing it comes from the get_small_int function. Let's dig into it:
static PyObject *
get_small_int(sdigit ival)
{
    assert(IS_SMALL_INT(ival));
    PyObject *v = __PyLong_GetSmallInt_internal(ival);
    Py_INCREF(v);
    return v;
}
...
static inline PyObject* __PyLong_GetSmallInt_internal(int value)
{
    PyInterpreterState *interp = _PyInterpreterState_GET();
    assert(-_PY_NSMALLNEGINTS <= value && value < _PY_NSMALLPOSINTS);
    size_t index = _PY_NSMALLNEGINTS + value;
    PyObject *obj = (PyObject*)interp->small_ints[index];
    // _PyLong_GetZero(), _PyLong_GetOne() and get_small_int() must not be
    // called before _PyLong_Init() nor after _PyLong_Fini().
    assert(obj != NULL);
    return obj;
}
And this explains a lot. If you pay attention you'll notice that get_small_int ultimately returns an already constructed object from an internal cache.
PyObject *obj = (PyObject*)interp->small_ints[index];
...
PyLongObject* small_ints[_PY_NSMALLNEGINTS + _PY_NSMALLPOSINTS];
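To make the caching idea concrete, here is a toy version of the same interning trick in C; a sketch of the concept (get_int and small_cache are our own names), not CPython's actual code:
#include <stdio.h>
#include <stdlib.h>

#define NSMALL 257

// Toy interning cache: small values share one lazily allocated object,
// so repeated requests return the very same pointer (same identity).
static int *small_cache[NSMALL];

int *get_int(int value)
{
    if (value >= 0 && value < NSMALL) {
        if (small_cache[value] == NULL) {
            small_cache[value] = malloc(sizeof(int));
            *small_cache[value] = value;
        }
        return small_cache[value];    // cached: same pointer every time
    }
    int *big = malloc(sizeof(int));   // big values: a fresh allocation
    *big = value;
    return big;
}

int main(void)
{
    printf("get_int(5)    : %p\n", (void *)get_int(5));
    printf("get_int(5)    : %p\n", (void *)get_int(5));     // same address
    printf("get_int(99999): %p\n", (void *)get_int(99999));
    printf("get_int(99999): %p\n", (void *)get_int(99999)); // different address
    return 0;
}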
Now a question remains: is small_ints stored on the stack or on the heap?
To answer that let's build a small program in C:
#include <stdio.h>

void p_mem(const char *label, void *pointer)
{
    printf("%s:\t%p\t%ld\n", label, pointer, (long)pointer);
}

int main(void)
{
    int *small_ints[265];   // a local array, so it lives on the stack
    p_mem("small_ints", small_ints);
}
The output
small_ints: 0x7ffff5ebb030 140737319252016
belongs to the stack segment, so an array declared like this is stored on the stack. We also found
int
_PyLong_Init(PyInterpreterState *interp)
{
    for (Py_ssize_t i=0; i < NSMALLNEGINTS + NSMALLPOSINTS; i++) {
        sdigit ival = (sdigit)i - NSMALLNEGINTS;
        int size = (ival < 0) ? -1 : ((ival == 0) ? 0 : 1);
        PyLongObject *v = _PyLong_New(1);
        if (!v) {
            return -1;
        }
        Py_SET_SIZE(v, size);
        v->ob_digit[0] = (digit)abs(ival);
        interp->small_ints[i] = v;
    }
    return 0;
}
and internally _PyLong_New uses
result = PyObject_Malloc(offsetof(PyLongObject, ob_digit) +
                         size*sizeof(digit));
i.e. malloc. So, in short: the small_ints array itself is just a container of pointers (it sits on the stack in our toy program), while the PyLongObject elements it points to are heap allocated. That explains why numbers in Python end up in a process segment different from the stack.
But let's step back a bit. We noticed something interesting when we call functions inside other functions: the stack seems to grow toward lower addresses with each call, and that memory appears to be released again when we exit a function.
Let's find out what happens when we define a function like this:
def my_rec_one(level):
    sf = sys._getframe(1)
    if level > 0:
        my_rec_one(level - 1)
    p_mem("rec_" + str(level), sf)
Let's call my_rec_one(10):
python memory.py
rec_0: 0x7fe70190e3f0 140630140445680
rec_1: 0x7fe70190e220 140630140445216
rec_2: 0x7fe70190e050 140630140444752
rec_3: 0x7fe7018f2d00 140630140333312
rec_4: 0x7fe7018f2b30 140630140332848
rec_5: 0x7fe7018f2960 140630140332384
rec_6: 0x7fe7018f2790 140630140331920
rec_7: 0x7fe7018f25c0 140630140331456
rec_8: 0x7fe7018f23f0 140630140330992
rec_9: 0x7fe7018f2220 140630140330528
rec_10: 0x7fe7018eb210 140630140301840
By using sf = sys._getframe(1) we grab the caller's interpreter stack frame, so rec_10 labels the outermost call and rec_0 the innermost. Note that the addresses fall in the interpreter's own anonymous memory regions rather than the [stack] segment (CPython frame objects are heap allocated, even though they model a call stack). The important pattern is that each recursion level gets its own frame at a distinct address: every call consumes additional memory.
What if, instead of 10, we call this method with 1000 as the argument?
...
my_rec_one(level - 1)
RuntimeError: maximum recursion depth exceeded
So it appears we can't recurse to infinity: every call consumes a new frame, and the Python interpreter enforces a recursion limit (see sys.getrecursionlimit()) precisely so that, as in lower-level languages like C/C++, the finite stack of the process does not overflow.
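In C the limit shows up directly on the process stack. A minimal sketch (probe is our own name; compile without optimization, or the compiler may collapse the frames) that prints how much stack each call frame consumes:
#include <stdio.h>
#include <stdint.h>

// Recurse a few levels and print how far below its caller each frame sits.
static void probe(int level, uintptr_t caller_local)
{
    int local;                        // any local variable marks this frame
    uintptr_t here = (uintptr_t)&local;
    if (caller_local)
        printf("level %d: %ld bytes below caller\n",
               level, (long)(caller_local - here));
    if (level < 5)
        probe(level + 1, here);
}

int main(void)
{
    probe(0, 0);
    return 0;
}
Every level eats a fixed slice of the stack, so deep enough recursion must eventually exhaust it.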
So in the end recursion sucks, right? Well, not so fast. While general recursive algorithms do suffer from stack overflow problems as the stack grows, there is hope.
In a very nice post from Chris Penner we can see an elegant solution for a particular class of recursion: tail-recursive functions. It turns out that if we adapt the code to inspect the stack
@tail_recursive
def factorial(n, accumulator=1):
    sf = sys._getframe(1)
    p_mem("rec_" + str(n), sf)
    if n == 0:
        return accumulator
    recurse(n - 1, accumulator=accumulator * n)
We will get
python memory.py
rec_10: 0x7f98b910d620 140293916644896
rec_9: 0x7f98b910d620 140293916644896
rec_8: 0x7f98b910d620 140293916644896
rec_7: 0x7f98b910d620 140293916644896
rec_6: 0x7f98b910d620 140293916644896
rec_5: 0x7f98b910d620 140293916644896
rec_4: 0x7f98b910d620 140293916644896
rec_3: 0x7f98b910d620 140293916644896
rec_2: 0x7f98b910d620 140293916644896
rec_1: 0x7f98b910d620 140293916644896
rec_0: 0x7f98b910d620 140293916644896
As we can see, not all recursive algorithms are born equal; in particular, tail-recursive ones can be very efficient with the proper optimization, because a single frame is reused instead of a new one being stacked per call.
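C compilers play the same trick: with optimization enabled, gcc and clang typically compile a call in tail position into a plain jump, so the stack does not grow per iteration. A sketch (fact_acc is our own name):
#include <stdio.h>

// Tail-recursive factorial: nothing remains to do after the recursive
// call, so with -O2 the compiler can turn it into a loop.
static unsigned long fact_acc(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return fact_acc(n - 1, acc * n);  // tail call
}

int main(void)
{
    printf("%lu\n", fact_acc(10, 1));
    return 0;
}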
Walking lower in C
After a nice walk through the high-level world of Python, let's jump into another realm and consider C. In C we have much more control over how and where we store our data in memory.
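The snippets that follow use the p_mem helper from before together with a p_diff helper that prints two addresses and the distance between them. We haven't defined p_diff yet; a minimal version consistent with the output below:
#include <stdio.h>

// Print two pointers and their distance in bytes.
void p_diff(const char *label, void *p1, void *p2)
{
    printf("%s: p1:%p p2:%p diff: %ld\n",
           label, p1, p2, (long)((char *)p2 - (char *)p1));
}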
Consider the following piece of code
int main(int argc, char **argv)
{
    int a = 12; p_mem("a", &a);
    int b = 24; p_mem("b", &b);
    int c = 48; p_mem("c", &c);
    p_diff("b-a", &a, &b);
    p_diff("c-b", &b, &c);
    p_diff("c-a", &a, &c);

    int x = 21; p_mem("x", &x);
    int y = 42; p_mem("y", &y);
    int z = 84; p_mem("z", &z);
    p_diff("y-x", &x, &y);
    p_diff("z-y", &y, &z);
    p_diff("z-x", &x, &z);
}
This will print the following
a: 0x7fffb05fb860 140736152451168
b: 0x7fffb05fb864 140736152451172
c: 0x7fffb05fb868 140736152451176
b-a: p1:0x7fffb05fb860 p2:0x7fffb05fb864 diff: 4
c-b: p1:0x7fffb05fb864 p2:0x7fffb05fb868 diff: 4
c-a: p1:0x7fffb05fb860 p2:0x7fffb05fb868 diff: 8
x: 0x7fffb05fb86c 140736152451180
y: 0x7fffb05fb870 140736152451184
z: 0x7fffb05fb874 140736152451188
y-x: p1:0x7fffb05fb86c p2:0x7fffb05fb870 diff: 4
z-y: p1:0x7fffb05fb870 p2:0x7fffb05fb874 diff: 4
z-x: p1:0x7fffb05fb86c p2:0x7fffb05fb874 diff: 8
Let's pause a moment. Didn't we say the stack grows toward lower addresses? Looking closely, we got the opposite: each successive variable sits at a higher address. At first sight we could think we are allocating on the heap, but by checking the memory segments of the process we can verify that we are indeed operating on the stack. So what is going on?
The devil is in the details. The stack does grow toward lower addresses, but it grows and shrinks at stack-frame granularity. What exactly does this mean? Good question, lad. It means the whole frame is reserved when the function is entered (typically a single stack-pointer adjustment in the prologue), and within that frame the compiler lays out the locals in whatever order it likes. Let's change the previous code slightly.
{
    int a = 12; p_mem("a", &a);
    int b = 24; p_mem("b", &b);
    int c = 48; p_mem("c", &c);
    p_diff("b-a", &a, &b);
    p_diff("c-b", &b, &c);
    p_diff("c-a", &a, &c);
}
{
    int x = 21; p_mem("x", &x);
    int y = 42; p_mem("y", &y);
    int z = 84; p_mem("z", &z);
    p_diff("y-x", &x, &y);
    p_diff("z-y", &y, &z);
    p_diff("z-x", &x, &z);
}
Albeit odd, this is valid C code, and an interesting trick. Let's look at the output again:
a: 0x7fffc751b13c 140736537407804
b: 0x7fffc751b140 140736537407808
c: 0x7fffc751b144 140736537407812
b-a: p1:0x7fffc751b13c p2:0x7fffc751b140 diff: 4
c-b: p1:0x7fffc751b140 p2:0x7fffc751b144 diff: 4
c-a: p1:0x7fffc751b13c p2:0x7fffc751b144 diff: 8
x: 0x7fffc751b13c 140736537407804
y: 0x7fffc751b140 140736537407808
z: 0x7fffc751b144 140736537407812
y-x: p1:0x7fffc751b13c p2:0x7fffc751b140 diff: 4
z-y: p1:0x7fffc751b140 p2:0x7fffc751b144 diff: 4
z-x: p1:0x7fffc751b13c p2:0x7fffc751b144 diff: 8
If you look closely, the memory used by the a, b, c variables is exactly the same as that used by x, y, z. Each pair of braces delimits a scope, so the compiler can reuse the same stack slots: it is equivalent to allocating the first set of variables on the stack, releasing them at the end of the block, and repeating the process for the second block.
But wait a minute, we haven't yet proved that the stack grows downward. Fair enough, let's try this then:
void stack_frame2(void)
{
    int b = 2;
    p_mem("b", &b);
}

void stack_frame1(void)
{
    int a = 1;
    p_mem("a", &a);
    stack_frame2();
}
stack_frame1();
This will output the following
a: 0x7fff8b570904 140735531124996
b: 0x7fff8b5708e4 140735531124964
And now we see it: the second allocation has a lower address than the first, as expected. This exercise shows us several interesting behaviors:
- Creating a new stack frame decreases the stack pointer.
- Inside a stack frame the compiler is free to order the variables however it likes.
- Together these can give the false impression that the stack does not grow downward, or that it behaves non-deterministically.
Heap/Stack speed
There is an old saying that the stack is faster than the heap. But is it? Well, let's see.
First let's build a small benchmark utility:
#define MEASURE(label, function) \
    { \
        printf("----Start [%s]----\n", label); \
        clock_t start, end; \
        double cpu_time_used; \
        start = clock(); \
        function(); \
        end = clock(); \
        cpu_time_used = ((double)(end - start)) / CLOCKS_PER_SEC; \
        printf("----End [%s]----\n", label); \
        printf("----Elapsed %f\n\n", cpu_time_used); \
    }
Here we use the C preprocessor to group all the timing code under a macro called MEASURE.
Now a simple test. We will:
* Heap version: MAX_ITER_LOOP_ARRAY times, allocate a heap array of SIZE_ARRAY ints, write every entry, and free it.
* Stack version: create a stack-based array once and, for the same number of iterations, write every entry.
#define MAX_ITER_LOOP_ARRAY 1000
#define SIZE_ARRAY 10000

int arrayOnHeap()
{
    int i, j = 0;
    int *array;
    int sum = 0, aux = 0;
    for (i = 0; i < MAX_ITER_LOOP_ARRAY; i++)
    {
        array = (int *)malloc(sizeof(int) * SIZE_ARRAY);
        // arithmetic noise so the compiler cannot optimize the loops away
        aux = sum - 12;
        sum = (sum * i) ^ sum + aux;
        for (j = 0; j < SIZE_ARRAY; j++)
        {
            *(array + j) = sum + j;
        }
        free(array);
    }
    return sum;
}
int arrayOnStack()
{
    int array[SIZE_ARRAY];
    int i, j = 0;
    int sum = 0;
    int aux = 0;
    for (i = 0; i < MAX_ITER_LOOP_ARRAY; i++)
    {
        aux = sum - 12;
        sum = (sum * i) ^ sum + aux;
        for (j = 0; j < SIZE_ARRAY; j++)
        {
            array[j] = sum + j;
        }
    }
    return sum;
}
MEASURE("Array On Stack", arrayOnStack);
MEASURE("Array On Heap", arrayOnHeap);
If the old saying is true we should expect the stack approach to be faster, right?
----Start [Array On Stack]----
----End [Array On Stack]----
----Elapsed 0.036032
----Start [Array On Heap]----
----End [Array On Heap]----
----Elapsed 0.034358
Well, this is unexpected, right? If heap allocation is a more complex process, how the heck is it just as fast? Let's probe those heap-based arrays. For that we just add the following line:
...
p_mem("Heap Array: ",array);
free(array);
...
Heap Array:: 0x7f2dc2d93010 139834519269392
Heap Array:: 0x55d2b51df6b0 94363470132912
Heap Array:: 0x55d2b51df6b0 94363470132912
Heap Array:: 0x55d2b51df6b0 94363470132912
Heap Array:: 0x55d2b51df6b0 94363470132912
Heap Array:: 0x55d2b51df6b0 94363470132912
Heap Array:: 0x55d2b51df6b0 94363470132912
Heap Array:: 0x55d2b51df6b0 94363470132912
Heap Array:: 0x55d2b51df6b0 94363470132912
Heap Array:: 0x55d2b51df6b0 94363470132912
Oh, I see. So malloc is clever enough to notice that the chunk we just freed fits the next allocation request and simply hands it back. Well well, sneaky malloc (for more details check the source code).
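The behavior is easy to reproduce in isolation; a small sketch:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *a = malloc(1024);
    printf("first : %p\n", a);
    free(a);

    void *b = malloc(1024);   // very likely the chunk we just freed
    printf("second: %p\n", b);
    free(b);
    return 0;
}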
This explains why the performance is comparable, here even slightly better. Let's make a small tweak and create a huge memory leak by removing the free call. If we remove an instruction the code should be even faster, right? Well, let's see.
----Start [Array On Stack]----
----End [Array On Stack]----
----Elapsed 0.361855
----Start [Array On Heap]----
----End [Array On Heap]----
----Elapsed 0.523437
Wow. By removing code we made it slower? What the heck? Let's probe:
Heap Array: : 0x7f3f62cbc010 139910217187344
Heap Array: : 0x7f3f628eb010 139910213185552
Heap Array: : 0x7f3f6251a010 139910209183760
Heap Array: : 0x7f3f62149010 139910205181968
Heap Array: : 0x7f3f61d78010 139910201180176
Heap Array: : 0x7f3f619a7010 139910197178384
Heap Array: : 0x7f3f615d6010 139910193176592
Heap Array: : 0x7f3f61205010 139910189174800
Heap Array: : 0x7f3f60e34010 139910185173008
Heap Array: : 0x7f3f60a63010 139910181171216
So, by removing the free we force malloc to actually allocate a new chunk of memory each time instead of sneakily recycling the old one. That's why, counter-intuitively, removing code made the program slower, not faster.
The key lesson here is that, due to the complexity of computer systems and the mechanisms underneath, the way we construct our tests can change the results significantly. That sensitivity is valuable, since it gives us insight into the underlying mechanisms, but we should keep a conservative spirit in the face of benchmark results: they can be as insightful as they are deceitful.
Lifting lower tools
It is common knowledge that lower-level languages like C don't ship with the modern memory-management machinery that very high-level languages like Python or Java give us.
That much is true. It is, however, sometimes confused with the stronger statement that with lower-level languages automatic memory management is out of the equation. This second proposition, while appealingly similar to the first, is not true. Automatic memory management can be built in lower-level as well as higher-level languages; the question is not if, but who is responsible for doing it.
Let's consider the C language. In terms of memory allocation we know that we can use the stack or the heap for our data. If we want to implement automatic cleanup we only need to consider the heap case, since the stack is cleaned up automatically when a stack frame ends.
For this task we can enlist the help of a friend: the compiler. If we take some time to read through the compiler documentation we notice some advanced features. One worth considering is the variable attribute.
If you check the GNU compiler documentation you'll find this:
__attribute__((cleanup(callback_function)))
While a bit cryptic, this is easy to explain. The attribute means:
- When the variable goes out of scope, call callback_function, passing it a pointer to the variable.
Why is this helpful?
Let's consider the following program:
typedef struct Point
{
    int x;
    int y;
} Point;

int main(int argc, char **argv)
{
    Point *p1 = (Point *)malloc(sizeof(Point));
    p_mem("P1", p1);
    Point *p2 = (Point *)malloc(sizeof(Point));
    p_mem("P2", p2);
}
This program has an obvious problem: both allocations end up as memory leaks. If we run it under valgrind, this is the output:
==273121== LEAK SUMMARY:
==273121== definitely lost: 16 bytes in 2 blocks
==273121== indirectly lost: 0 bytes in 0 blocks
==273121== possibly lost: 0 bytes in 0 blocks
==273121== still reachable: 0 bytes in 0 blocks
==273121== suppressed: 0 bytes in 0 blocks
==273121== Rerun with --leak-check=full to see details of leaked memory
We are leaking memory because we created two Point objects and never cleaned them up. To sort this out we need to adapt the code:
Point *p1 = (Point *)malloc(sizeof(Point));
p_mem("P1",p1);
Point *p2 = (Point *)malloc(sizeof(Point));
p_mem("P2",p2);
free(p1);
free(p2);
Now valgrind will happily report
==273337== HEAP SUMMARY:
==273337== in use at exit: 0 bytes in 0 blocks
==273337== total heap usage: 3 allocs, 3 frees, 1,040 bytes allocated
But yes, this work is being done by us, and this is what most programmers, understandably, dislike.
This is the part where the compiler comes to the rescue. The attribute above can be used to create something like this macro:
// The cleanup callback receives a pointer to the variable that is going
// out of scope, hence the extra level of indirection before free.
void scoped_free(void *pointer)
{
    printf("Scoped free invoked %p\n", pointer);
    void **pp = (void **)pointer;
    free(*pp);
}
#define AUTO_FREE __attribute__((cleanup(scoped_free)))
...
AUTO_FREE Point *p1 = (Point *)malloc(sizeof(Point));
p_mem("P1",p1);
AUTO_FREE Point *p2 = (Point *)malloc(sizeof(Point));
p_mem("P2",p2);
If we run the previous program we will end up with
P1: 0x565280e472a0 94912349762208
P2: 0x565280e476d0 94912349763280
Scoped free invoked 0x7ffcf11f0210
Scoped free invoked 0x7ffcf11f0208
(Note that scoped_free prints stack addresses: it receives a pointer to the pointer variable, not the heap block itself.) And valgrind will be happy with us as well:
==274435== HEAP SUMMARY:
==274435== in use at exit: 0 bytes in 0 blocks
==274435== total heap usage: 3 allocs, 3 frees, 1,040 bytes allocated
We just implemented the most basic automatic memory collection possible. It has several drawbacks (hence basic). The first is that the variable is always cleaned up the moment it goes out of scope, whether or not we still need the memory (here you can see a simple GC implemented in C).
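To see why that is a real constraint, consider a function that tries to hand the allocation back to its caller; a hypothetical sketch reusing the Point and AUTO_FREE definitions above:
// BUG on purpose: the cleanup fires when make_point returns, so the
// caller receives a pointer to already-freed memory (use-after-free).
Point *make_point(int x, int y)
{
    AUTO_FREE Point *p = (Point *)malloc(sizeof(Point));
    p->x = x;
    p->y = y;
    return p;   // scoped_free runs here, freeing p before the caller sees it
}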