Algebrical Data Types in C++
Update: next post is here
The subtitle of this post could be C++11 is cool, but let's start from the beginning.
The starting point is the long standing argument on reporting error states of functions by means of exceptions or by multivalued functions. Recent languages such as Go by Google and Swift by Apple prefer to manage error states in the second form, while C++ and Java are biased toward exceptions (See [6] for a discussion).
Multivalued functions are based on the concept of sum-types, also called algebrical types in other languages such as Haskell, or tagged unions. A sum-type is defined by a list of types, and a sum-type variable contains a typed value from such a list. A sum-type is similar to unions, except that the current type of the union is managed automatically; also a sum-type can be considered complementary to C++11 tuples. In particular we can say:
The post uses several C++11 features, in particular:
Before describing the variadic solution to sum-types, we will look into a similar concept that is optional types. An optional variable is associated to a single type value, or to none. Optional is present natively in other languages such as C# and Swift, and it is provided in C++ by the boost library.
The subtitle of this post could be C++11 is cool, but let's start from the beginning.
The starting point is the long standing argument on reporting error states of functions by means of exceptions or by multivalued functions. Recent languages such as Go by Google and Swift by Apple prefer to manage error states in the second form, while C++ and Java are biased toward exceptions (See [6] for a discussion).
Multivalued functions are based on the concept of sum-types, also called algebrical types in other languages such as Haskell, or tagged unions. A sum-type is defined by a list of types, and a sum-type variable contains a typed value from such a list. A sum-type is similar to unions, except that the current type of the union is managed automatically; also a sum-type can be considered complementary to C++11 tuples. In particular we can say:
- multi-type, single-value: sum-type/variant
- multi-type, multi-value: tuple
- single-type, multi-value: vector or array
- boost variant (stored in heap or stack)
- new cpp post (using C++11)
The following implementation is independent of the newcpp post except for the alignment management.
The post uses several C++11 features, in particular:
- variadic templates
- union with any content
- delegation of constructors
Before describing the variadic solution to sum-types, we will look into a similar concept that is optional types. An optional variable is associated to a single type value, or to none. Optional is present natively in other languages such as C# and Swift, and it is provided in C++ by the boost library.
Optional
Optional is obtained by two concepts that will be reused in the sum-types. First the new placement operator that allows to construct an object over a previously allocated location, and the direct invocation of the destructor.
The storage of optional is:
bool valid;
union {
T value; // C++11: non trivial constructor
};
union {
T value; // C++11: non trivial constructor
};
Then the allocation and destructions are:
new (&value) T(x);
if (valid)
value.~T();
Sum-types
A sum-type is defined by a list of possible types that can be assumed by the sum-type and this is the perfect case for C++ Variadic Templates. The different types will be called subtypes.
template
class sumtype;
We want to provide the following features:
- storage on the stack, no heap
- type-safe access
- optional variant of sumtype
The interface of the sumtype will be:
template
class sumtype
{
public:
template
struct byindex : find_index {}; // returns the type by index
template
struct bytype : find_type {}; // returns -1 at compile time if not fond
template
struct bytypeex: find_type {}; // raises compile time exception if not found
~sumtype();
sumtype(const sumtype & other);
template
sumtype(const T & what);
template
sumtype(T && what);
sumtype & operator= (const sumtype & other);
sumtype & operator= (sumtype && other);
template
T & gettype();
template
typename byindex::type & getindex();
template
bool is() const;
int size() const;
int index() const;
};
class sumtype
{
public:
template
struct byindex : find_index {}; // returns the type by index
template
struct bytype : find_type {}; // returns -1 at compile time if not fond
template
struct bytypeex: find_type {}; // raises compile time exception if not found
~sumtype();
sumtype(const sumtype & other);
template
sumtype(const T & what);
template
sumtype(T && what);
sumtype & operator= (const sumtype & other);
sumtype & operator= (sumtype && other);
template
T & gettype();
template
typename byindex::type & getindex();
template
bool is() const;
int size() const;
int index() const;
};
For obtaining the above features we need to provide the following basic elements at compile and run-time:
- Size of the biggest subtype, for the storage (compile time)
- From subtype to index (compile time)
- From index to subtype (compile time)
- From index to some operation over subtype (runtime)
The last operation is important to implement:
- destruction
- new placement, copy constructor and assignment
- size of the current associated variable
Compile Time
The variadic templates are managed by means of a recursive pattern that allows to obtain the above features.
The simplest example is the biggest subtype:
template
struct find_biggest;
template
struct find_biggest
{
typedef First type;
};
template
struct find_biggest
{
typedef typename find_biggest::type next;
typedef typename std::conditional::value >= Pred::value,First,next>::type type;
};
This code takes advantage of the std::conditional that based on a boolean known at compile time selects among two types.
The recursive scheme allows to implement other two compile time operations: the type at a given index is obtained by recursively descending:
template
struct find_index;
template
struct find_index
{
typedef First type;
};
template
struct find_index
{
typedef typename find_index::type next;
typedef typename std::conditional::type type;
};
Finally a similar approach is used for obtaining the index based a type:
template
struct find_type;
template
struct find_type
{
enum { index = std::is_same::value ? current : -1 };
};
template
struct find_type
{
enum { index = std::is_same::value ? current : find_type::index };
};
Generalized Max (Updated)
The biggest subtype selection implementation works only with sizeof. The newcpp post provided the generalized approach based on a templated predicate:
template class Pred, typename... Args>
struct tmax;
template
struct Sizeof
{
static constexpr size_t value = sizeof(T);
};
typename tmax::type biggestsubtype;
Runtime
Runtime operations are obtained using a recursive pattern of the variadic but verifying the index at run-time. In this way any operation can be performed, such as new placement, copy construction and destruction. Depending on the operation requested we can throw an exception if there is no match.
We report here the interface of the recursive class, exemplifying only one method:
template
struct find_indexrun
{
typedef find_indexrun next;
static void destroy(int index, unsigned char * value);
static void ctorcopy(int index, unsigned char * value, const unsigned char * other);
static void ctormove(int index, unsigned char * value, const unsigned char * other);
static void assign(int index, unsigned char * value, const unsigned char * other);
static size_t size(int index);
};
For example the destruction is obtained as follows:
template
struct find_indexrun;
template
struct find_indexrun
{
static void destroy(int index, unsigned char * value)
{
if(index == current)
((First*)value)->~First();
}
...
};
template
struct find_indexrun
{
typedef find_indexrun next;
static void destroy(int index, unsigned char * value)
{
if(index == current)
((First*)value)->~First();
else
{
next::destroy(index,value);
}
}
....
};
Storage
The storage is provided by a char array sized with the biggest subtype.
typedef typename find_biggest::type bigtype;
typedef typename max::type bigtype;
unsigned char value[sizeof(bigtype)];
In my initial implementation alignment was not considered. I improved it using the newcpp solution (thanks): the subtype with the biggest alignment is found, and then the storage value is placed inside a union.
union
{
unsigned char value[sizeof(bigtype)];
typename tmax::type _align;
};
Construction
The construction employes copy/move constructor from any subtype or from the sumtype. The construction from subtype T is obtained by extracting the associated index and running the appropriate constructor. In the case of the sumtype constructor the action depends at run-time.
Access
Access is performed by type or by index with compile-time resolution of both the index or the type.
Optional Sum-type
We can combine sum-type with optional, allowing the case of empty sum-type. The following elements are atted to the interface of sumtype:
optional_sumtype(); // now we have an empty constructor
void clear(); // clears the content
operator bool(); // tells if there is any content
The original idea was to implement it by deriving from sumtype but the compiler does not like it, so we had to re-implement sumtype, or to add a template option to the sumtype class.
template
struct optional_sumtype: public sumtype
{
...
};
Limitations and future
There are two elements not addressed in this post. The first is the visitor pattern that allow to match the state of the sum-type with some function, in a way equivalent to the typed case of Haskell. This has been implemented in this post.
The second is the recursive pattern: the recursive pattern is only possible when the storage is performed through a pointer because the overall size of the data structure is not known apriori. This feature is covered by boost and newcpp implementation.
Conclusions
The code has been tested with clang 5.1 and g++ 4.9 from MacPort.
The source code is available on github.
Update: see the C++ standard proposal on Any types http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3804.html
Update: see the C++ standard proposal on Any types http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3804.html
References
[1] http://www.boost.org/doc/libs/1_55_0/doc/html/variant.html
[2] http://thenewcpp.wordpress.com/2012/02/15/variadic-templates-part-3-or-how-i-wrote-a-variant-class/
[3] http://www.boost.org/doc/libs/1_55_0/libs/optional/doc/html/index.html
[4] http://www.stroustrup.com/OOPSLA-typeswitch-draft.pdf
[5] http://ecn.channel9.msdn.com/events/GoingNative12/GN12VariadicTemplatesAreFunadic.pdf
[5] http://ecn.channel9.msdn.com/events/GoingNative12/GN12VariadicTemplatesAreFunadic.pdf
[6] http://dave.cheney.net/2012/01/18/why-go-gets-exceptions-right
Comments