In C++ allocators are special classes that practically do what their name suggests, they allocate and deallocate memory. They are used mainly by STL containers (vector, map etc) and they act as an interface/guide for the container’s memory management.
Allocators is a somewhat hidden feature that most C++ programmers don’t bother messing around with. For the most cases the default STL allocator (std::allocator) is enough and it works just fine. For some specific cases though where performance is essential developers create their own allocators that work around some design problems and limitations the default allocator has.
This article presents some consepts relevant to game development in C++11. More specifically:
1. Create a custom allocator that resembles std::allocator
2. Why replace the default allocator?
3. Stack allocator
Create a custom allocator that resembles std::allocator
The first section is introductory and will present a way to create an custom allocator that is similar to std::allocator and adds some debugging information on top of that.
An STL compatible allocator is a template class of the form:
template<typename T> class Allocator
it must contain some typedefs on public scope
typedef size_t size_type; typedef ptrdiff_t difference_type; typedef T* pointer; typedef const T* const_pointer; typedef T& reference; typedef const T& const_reference; typedef T value_type;
and it must contain all of the following methods
/// Default constructor Allocator() throw(); /// Copy constructor Allocator(const Allocator& other) throw(); /// Copy constructor with another type template<typename U> Allocator(const Allocator<U>&) throw(); /// Copy Allocator<T>& operator=(const Allocator& other); /// Copy with another type template<typename U> Allocator& operator=(const Allocator<U>& other); /// Get address of a reference pointer address(reference x) const; /// Get const address of a const reference const_pointer address(const_reference x) const; /// Allocate n elements of type T pointer allocate(size_type n, const void* hint = 0); /// Free memory of pointer p void deallocate(void* p, size_type n); /// Call the constructor of p void construct(pointer p, const T& val); /// Call the constructor of p with many arguments. C++11 template<typename U, typename... Args> void construct(U* p, Args&&... args); /// Call the destructor of p void destroy(pointer p); /// Call the destructor of p of type U template<typename U> void destroy(U* p); /// Get the max allocation size size_type max_size() const /// A struct to rebind the allocator to another allocator of type U template<typename U> struct rebind { typedef Allocator<U> other; };
Now that we know how an allocator should look like we can create one that wraps malloc() and free(). It also tracks the allocations and informs the user if he forgot to deallocate some space.
/// Define a global that holds the allocated size extern size_t gAllocatedSize; /// Default constructor Allocator() throw() {} /// Copy constructor Allocator(const Allocator&) throw() {} /// Copy constructor with another type template<typename U> Allocator(const Allocator<U>&) throw() {} /// Destructor ~Allocator() {} /// Copy Allocator<T>& operator=(const Allocator&) { return *this; } /// Copy with another type template<typename U> Allocator& operator=(const Allocator<U>&) { return *this; } /// Get address of reference pointer address(reference x) const { return &x; } /// Get const address of const reference const_pointer address(const_reference x) const { return &x; } /// Allocate memory pointer allocate(size_type n, const void* = 0) { size_type size = n * sizeof(value_type); gAllocatedSize += size; return (pointer)::malloc(size); } /// Deallocate memory void deallocate(void* p, size_type n) { size_type size = n * sizeof(T); gAllocatedSize -= size; ::free(p); } /// Call constructor void construct(pointer p, const T& val) { // Placement new new ((T*)p) T(val); } /// Call constructor with more arguments template<typename U, typename... Args> void construct(U* p, Args&&... args) { // Placement new ::new((void*)p) U(std::forward<Args>(args)...); } /// Call the destructor of p void destroy(pointer p) { p->~T(); } /// Call the destructor of p of type U template<typename U> void destroy(U* p) { p->~U(); } /// Get the max allocation size size_type max_size() const { return size_type(-1); } /// A struct to rebind the allocator to another allocator of type U template<typename U> struct rebind { typedef Allocator<U> other; };
An example of usage for Allocator is the following:
std::vector<Foo, Allocator<Foo>> vec; vec.resize(10, Foo(something)); std::cout << "Allocated size: " << gAllocatedSize << std::endl;
You can check the gAllocatedSize at the end of the application and report any memory wasted.
Why replace the default allocator?
The answers on that question are many but for a game engine the rationale is simple. To understand the problem better we need first to understand that malloc and free work optimally when we allocate big chunks and we don’t call them that often. Calling malloc many times for small blocks will result on memory fragmentation. Fragmentation means that data will be scattered all over place a fact that affects the memory locality. Increased fragmentation implies more cache misses and thus slower reads from memory. On top of that the system will need more system memory because of the unused memory holes.
Another problem comes from game consoles needs. On consoles the memory is pretty limited and requires strict management. Wasting memory is not an option. Also allocating more memory than we have is not an option as well. For those reasons console game engines can trigger various events when memory is running low like freeing non essential assets. Other reasons may be super optimal alighment etc.
Another reason may be memory debugging. In that case all the allocations go through a single allocator and at some point (maybe at program’s end) we can identify memory leaks. This is what we described on the Allocator example above.
In AnKi the scene consists of a number of scene nodes. The actual number of nodes is high and each node is a relativity small sized class. Once it’s time to load a level AnKi allocates a big number of objects and this implies a high number of allocations. When the scene is no longer needed then a number of deallocations must happen to free memory which is slow.
To optimize the allocation/deallocation of scene nodes a new concept must take in place. What if we allocate a big chunk of memory up front and use that for all the small allocations? I sounds reasonable. When the scene is going to get freed we expect all scene nodes to be freed. That property of the scene implies a single deallocation as well.
For that we created a stack allocator…
Stack allocator
The stack allocator is essentially a prellocated chunk of memory that has an (annoying) characteristic: the allocations/deallocations must be in a LIFO order. This characteristic obviously bind us to the fact that we cannot deallocate whenever we want but on the other hand its extremely fast.
Before we create an allocator we need to create a memory pool. This memory pool is a class that holds:
1. the preallocated memory,
2. a pointer to the top of the stack and
3. a few other things.
To make things more interesting the memory pool is fully thread safe by using atomic operations, curtesy of C++11 as well.
The class definition (as it is now in AnKi’s source code tree) is the following:
/// Thread safe memory pool class StackMemoryPool: public NonCopyable { public: /// Default constructor StackMemoryPool(PtrSize size, U32 alignmentBits = 16); /// Move StackMemoryPool(StackMemoryPool&& other) { *this = std::move(other); } /// Destroy ~StackMemoryPool(); /// Move StackMemoryPool& operator=(StackMemoryPool&& other); /// Access the total size PtrSize getSize() const { return memsize; } /// Get the allocated size PtrSize getAllocatedSize() const; /// Allocate memory /// @return The allocated memory or nullptr on failure void* allocate(PtrSize size) throw(); /// Free memory in StackMemoryPool. If the ptr is not the last allocation /// then nothing happens and the method returns false /// /// @param[in] ptr Memory block to deallocate /// @return True if the deallocation actually happened and false otherwise Bool free(void* ptr) throw(); /// Reinit the pool. All existing allocated memory will be lost void reset(); private: /// Pre-allocated memory chunk U8* memory = nullptr; /// Size of the pre-allocated memory chunk PtrSize memsize = 0; /// Points to the memory and more specifically to the top of the stack std::atomic<U8*> top = {nullptr}; /// Alignment of allocations U32 alignmentBits; /// Calculate tha aligned size of an allocation PtrSize calcAlignSize(PtrSize size) const; };
The class is non-copyable, to be more precise it’s the definition of non-copyable. The most important methods are the allocate() and free(). Both of them do not throw exceptions for reasons that are out of the scope of this article. The alignment is very important and it’s been used so the allocate() outputs correctly allocated memory. If the memory was not allocated there is a high risk for the application to crash. Also for some systems it can be useful for speed.
The implementation of the class is more interesting:
//============================================================================== struct MemoryBlockHeader { U32 size; }; //============================================================================== StackMemoryPool::StackMemoryPool(PtrSize size_, U32 alignmentBits_) : memsize(size_), alignmentBits(alignmentBits_) { ANKI_ASSERT(memsize > 0); memory = (U8*)::malloc(memsize); if(memory != nullptr) { top = memory; } else { throw ANKI_EXCEPTION("malloc() failed"); } } //============================================================================== StackMemoryPool::~StackMemoryPool() { if(memory != nullptr) { ::free(memory); } } //============================================================================== StackMemoryPool& StackMemoryPool::operator=(StackMemoryPool&& other) { if(memory != nullptr) { ::free(memory); } memory = other.memory; memsize = other.memsize; top.store(other.top.load()); alignmentBits = other.alignmentBits; other.memory = nullptr; other.memsize = 0; other.top = nullptr; return *this; } //============================================================================== PtrSize StackMemoryPool::getAllocatedSize() const { return top.load() - memory; } //============================================================================== void* StackMemoryPool::allocate(PtrSize size_) throw() { // memory is nullptr if moved ANKI_ASSERT(memory != nullptr); PtrSize size = calcAlignSize(size_ + sizeof(MemoryBlockHeader)); ANKI_ASSERT(size < std::numeric_limits<U32>::max() && "Too big allocation"); U8* out = top.fetch_add(size); if(out + size <= memory + memsize) { // Write the block header ((MemoryBlockHeader*)out)->size = size; // Set the correct output out += sizeof(MemoryBlockHeader); } else { // Error out = nullptr; } return out; } //============================================================================== Bool StackMemoryPool::free(void* ptr) throw() { // memory is nullptr if moved ANKI_ASSERT(memory != nullptr); // Correct the p U8* realptr = (U8*)ptr - sizeof(MemoryBlockHeader); // realptr should be inside the pool's preallocated memory ANKI_ASSERT(realptr >= memory && realptr < memory + memsize); // Get block size U32 size = ((MemoryBlockHeader*)realptr)->size; // Atomic stuff U8* expected = realptr + size; U8* desired = realptr; // if(top == expected) { // top = desired; // exchange = true; // } else { // expected = top; // exchange = false; // } Bool exchange = top.compare_exchange_strong(expected, desired); return exchange; } //============================================================================== void StackMemoryPool::reset() { // memory is nullptr if moved ANKI_ASSERT(memory != nullptr); top = memory; } //============================================================================== PtrSize StackMemoryPool::calcAlignSize(PtrSize size) const { return size + (size % (alignmentBits / 8)); }
The constructor and destructor are pretty trivial.
When allocate() is been called the pool calculates the size of the block to allocate. The block consists from a header (see MemoryBlockHeader) plus the requested memory. The header in this case consists from just a number, the size of the allocated bytes (including the header). The size is kept because the free() will not be called with size just like the standard free(). At this moment take a breath. Why call free() without size when the allocator’s deallocate() method passes the size? The truth is that the StackMemoryPool does not need the block header if it’s been used only in allocators but in AnKi it’s been used for other stuff as well. You can ignore the block header from now on.
The free() decrements the top of the stack but in a thread-safe way. We mentioned before that the allocations and deallocations should happen in LIFO order. If the requested deallocation in free() is not the last then the top will not change and false will be returned. The atomic operation compare_exchange_strong() is explained in the comments and it’s essetial for making this operation thread safe.
The calcAlignSize() is been used to align the given size for reasons mentioned before.
Now that we have the memory pool we can create the stack allocator. The code is the following:
/// Stack based allocator /// /// @tparam T The type /// @tparam deallocationFlag If true then the allocator will try to deallocate /// the memory. This is extremely important to /// understand when it should be true. See notes /// @tparam alignmentBits Set the alighment in bits /// /// @note The deallocationFlag can brake the allocator when the deallocations /// are not in the correct order. For example when deallocationFlag==true /// and the allocator is used in vector it is likely to fail. /// /// @note Don't ever EVER remove the double copy constructor and the double /// operator=. The compiler will create defaults template<typename T, Bool deallocationFlag = false, U32 alignmentBits = 16> class StackAllocator { template<typename U, Bool deallocationFlag_, U32 alignmentBits_> friend class StackAllocator; public: // Typedefs typedef size_t size_type; typedef ptrdiff_t difference_type; typedef T* pointer; typedef const T* const_pointer; typedef T& reference; typedef const T& const_reference; typedef T value_type; /// Default constructor StackAllocator() throw() {} /// Copy constructor StackAllocator(const StackAllocator& b) throw() { *this = b; } /// Copy constructor template<typename U> StackAllocator( const StackAllocator<U, deallocationFlag, alignmentBits>& b) throw() { *this = b; } /// Constuctor with size StackAllocator(size_type size) throw() { mpool.reset(new StackMemoryPool(size, alignmentBits)); } /// Destructor ~StackAllocator() {} /// Copy StackAllocator& operator=(const StackAllocator& b) { mpool = b.mpool; return *this; } /// Copy template<typename U> StackAllocator& operator=( const StackAllocator<U, deallocationFlag, alignmentBits>& b) { mpool = b.mpool; return *this; } /// Get the address of a reference pointer address(reference x) const { return &x; } /// Get the const address of a const reference const_pointer address(const_reference x) const { return &x; } /// Allocate memory pointer allocate(size_type n, const void* hint = 0) { ANKI_ASSERT(mpool != nullptr); (void)hint; size_type size = n * sizeof(value_type); void* out = mpool->allocate(size); if(out != nullptr) { // Everything ok } else { throw ANKI_EXCEPTION("Allocation failed. There is not enough room"); } return (pointer)out; } /// Deallocate memory void deallocate(void* p, size_type n) { ANKI_ASSERT(mpool != nullptr); (void)p; (void)n; if(deallocationFlag) { Bool ok = mpool->free(p); if(!ok) { throw ANKI_EXCEPTION("Freeing wrong pointer. " "The deallocations on StackAllocator should be in order"); } } } /// Call constructor void construct(pointer p, const T& val) { // Placement new new ((T*)p) T(val); } /// Call constructor with many arguments template<typename U, typename... Args> void construct(U* p, Args&&... args) { // Placement new ::new((void *)p) U(std::forward<Args>(args)...); } /// Call destructor void destroy(pointer p) { p->~T(); } /// Call destructor template<typename U> void destroy(U* p) { p->~U(); } /// Get the max allocation size size_type max_size() const { ANKI_ASSERT(mpool != nullptr); return mpool->getSize(); } /// A struct to rebind the allocator to another allocator of type U template<typename U> struct rebind { typedef StackAllocator<U, deallocationFlag, alignmentBits> other; }; /// Reinit the allocator. All existing allocated memory will be lost void reset() { ANKI_ASSERT(mpool != nullptr); mpool->reset(); } const StackMemoryPool& getMemoryPool() const { ANKI_ASSERT(mpool != nullptr); return *mpool; } private: std::shared_ptr<StackMemoryPool> mpool; }; /// Another allocator of the same type can deallocate from this one template<typename T1, typename T2, Bool deallocationFlag, U32 alignmentBits> inline bool operator==( const StackAllocator<T1, deallocationFlag, alignmentBits>&, const StackAllocator<T2, deallocationFlag, alignmentBits>&) { return true; } /// Another allocator of the another type cannot deallocate from this one template<typename T1, typename AnotherAllocator, Bool deallocationFlag, U32 alignmentBits> inline bool operator==( const StackAllocator<T1, deallocationFlag, alignmentBits>&, const AnotherAllocator&) { return false; } /// Another allocator of the same type can deallocate from this one template<typename T1, typename T2, Bool deallocationFlag, U32 alignmentBits> inline bool operator!=( const StackAllocator<T1, deallocationFlag, alignmentBits>&, const StackAllocator<T2, deallocationFlag, alignmentBits>&) { return false; } /// Another allocator of the another type cannot deallocate from this one template<typename T1, typename AnotherAllocator, Bool deallocationFlag, U32 alignmentBits> inline bool operator!=( const StackAllocator<T1, deallocationFlag, alignmentBits>&, const AnotherAllocator&) { return true; }
The StackAllocator contains lots of things but most of them are standard allocator things. StackAllocator is essentialy a wrapper on top of StackMemoryPool. The only interesting thing about it is the deallocationFlag. If the allocator is used in the propper LIFO way then we can set this flag to 1, if we set it to 0 then the allocator will ignore deallocations. Why we ignore deallocations? This happens because the STL containers don’t work in a LIFO order and trying to deallocate like that is for the most cases useless.
One other thing to mention is that the mpool member variable is shared_ptr, this makes the StackAllocator totaly thread safe.
An example of StackAllocator usage follows:
// With strings StackAllocator<char, false> alloc(128); typedef std::basic_string<char, std::char_traits<char>, StackAllocator<char, false>> Str; Str str(alloc); str = "lalala"; str = "lalalalo"; // With Foo class StackAllocator<Foo> fooalloc(128); std::vector<Foo, StackAllocator<Foo>> vec(fooalloc); for(int i = 0; i < 10; i++) { vec.push_back(Foo(i)); }
That’s all for now. Comments and suggestions on the comments section bellow.