Algebrical Data Types in C++

June 11, 2014

Algebrical Data Types in C++

Update: next post is here

The subtitle of this post could be C++11 is cool, but let's start from the beginning.

The starting point is the long standing argument on reporting error states of functions by means of exceptions or by multivalued functions. Recent languages such as Go by Google and Swift by Apple prefer to manage error states in the second form, while C++ and Java are biased toward exceptions (See [6] for a discussion).

Multivalued functions are based on the concept of sum-types, also called algebrical types in other languages such as Haskell, or tagged unions. A sum-type is defined by a list of types, and a sum-type variable contains a typed value from such a list. A sum-type is similar to unions, except that the current type of the union is managed automatically; also a sum-type can be considered complementary to C++11 tuples. In particular we can say:

multi-type, single-value: sum-type/variant
multi-type, multi-value: tuple
single-type, multi-value: vector or array

Existing implementation of sum-types:

boost variant (stored in heap or stack)
new cpp post (using C++11)

The following implementation is independent of the newcpp post except for the alignment management.

The post uses several C++11 features, in particular:

variadic templates
union with any content
delegation of constructors

Before describing the variadic solution to sum-types, we will look into a similar concept that is optional types. An optional variable is associated to a single type value, or to none. Optional is present natively in other languages such as C# and Swift, and it is provided in C++ by the boost library.

Optional

Optional is obtained by two concepts that will be reused in the sum-types. First the new placement operator that allows to construct an object over a previously allocated location, and the direct invocation of the destructor.

The storage of optional is:

bool valid;
union {
T value; // C++11: non trivial constructor
};

Then the allocation and destructions are:

new (&value) T(x);

if (valid)

value.~T();

Sum-types

A sum-type is defined by a list of possible types that can be assumed by the sum-type and this is the perfect case for C++ Variadic Templates. The different types will be called subtypes.

template

class sumtype;

We want to provide the following features:

storage on the stack, no heap
type-safe access
optional variant of sumtype

The interface of the sumtype will be:

template
class sumtype
{
public:
template
struct byindex : find_index {}; // returns the type by index
template
struct bytype : find_type {}; // returns -1 at compile time if not fond
template

struct bytypeex: find_type {}; // raises compile time exception if not found

~sumtype();
sumtype(const sumtype & other);

template
sumtype(const T & what);

template
sumtype(T && what);

sumtype & operator= (const sumtype & other);

sumtype & operator= (sumtype && other);

template
T & gettype();

template
typename byindex::type & getindex();

template
bool is() const;
int size() const;
int index() const;
};

For obtaining the above features we need to provide the following basic elements at compile and run-time:

Size of the biggest subtype, for the storage (compile time)
From subtype to index (compile time)
From index to subtype (compile time)
From index to some operation over subtype (runtime)

The last operation is important to implement:

destruction
new placement, copy constructor and assignment
size of the current associated variable

Compile Time

The variadic templates are managed by means of a recursive pattern that allows to obtain the above features.

The simplest example is the biggest subtype:

template

struct find_biggest;

template

struct find_biggest

{

typedef First type;

};

template

struct find_biggest

{

typedef typename find_biggest::type next;

typedef typename std::conditional::value >= Pred::value,First,next>::type type;

};

This code takes advantage of the std::conditional that based on a boolean known at compile time selects among two types.

The recursive scheme allows to implement other two compile time operations: the type at a given index is obtained by recursively descending:

template

struct find_index;

template

struct find_index

{

typedef First type;

};

template

struct find_index

{

typedef typename find_index::type next;

typedef typename std::conditional::type type;

};

Finally a similar approach is used for obtaining the index based a type:

template

struct find_type;

template

struct find_type

{

enum { index = std::is_same::value ? current : -1 };

};

template

struct find_type

{

enum { index = std::is_same::value ? current : find_type::index };

};

Generalized Max (Updated)

The biggest subtype selection implementation works only with sizeof. The newcpp post provided the generalized approach based on a templated predicate:

template

struct tmax;

template

struct Sizeof

{

static constexpr size_t value = sizeof(T);

};

typename tmax::type biggestsubtype;

Runtime

Runtime operations are obtained using a recursive pattern of the variadic but verifying the index at run-time. In this way any operation can be performed, such as new placement, copy construction and destruction. Depending on the operation requested we can throw an exception if there is no match.

We report here the interface of the recursive class, exemplifying only one method:

template

struct find_indexrun

{

typedef find_indexrun next;

static void destroy(int index, unsigned char * value);

static void ctorcopy(int index, unsigned char * value, const unsigned char * other);

static void ctormove(int index, unsigned char * value, const unsigned char * other);

static void assign(int index, unsigned char * value, const unsigned char * other);

static size_t size(int index);

};

For example the destruction is obtained as follows:

template

struct find_indexrun;

template

struct find_indexrun

{

static void destroy(int index, unsigned char * value)

{

if(index == current)

((First*)value)->~First();

}

...

};

template

struct find_indexrun

{

typedef find_indexrun next;

static void destroy(int index, unsigned char * value)

{

if(index == current)

((First*)value)->~First();

else

{

next::destroy(index,value);

}

....

};

Storage

The storage is provided by a char array sized with the biggest subtype.

typedef typename find_biggest::type bigtype;

typedef typename max::type bigtype;

unsigned char value[sizeof(bigtype)];

In my initial implementation alignment was not considered. I improved it using the newcpp solution (thanks): the subtype with the biggest alignment is found, and then the storage value is placed inside a union.

union

{

unsigned char value[sizeof(bigtype)];

typename tmax::type _align;

};

Construction

The construction employes copy/move constructor from any subtype or from the sumtype. The construction from subtype T is obtained by extracting the associated index and running the appropriate constructor. In the case of the sumtype constructor the action depends at run-time.

Access

Access is performed by type or by index with compile-time resolution of both the index or the type.

Optional Sum-type

We can combine sum-type with optional, allowing the case of empty sum-type. The following elements are atted to the interface of sumtype:

optional_sumtype(); // now we have an empty constructor

void clear(); // clears the content

operator bool(); // tells if there is any content

The original idea was to implement it by deriving from sumtype but the compiler does not like it, so we had to re-implement sumtype, or to add a template option to the sumtype class.

template

struct optional_sumtype: public sumtype

{

...

};

Limitations and future

There are two elements not addressed in this post. The first is the visitor pattern that allow to match the state of the sum-type with some function, in a way equivalent to the typed case of Haskell. This has been implemented in this post.

The second is the recursive pattern: the recursive pattern is only possible when the storage is performed through a pointer because the overall size of the data structure is not known apriori. This feature is covered by boost and newcpp implementation.

Conclusions

The code has been tested with clang 5.1 and g++ 4.9 from MacPort.

The source code is available on github.

Update: see the C++ standard proposal on Any types http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3804.html

References

[1] http://www.boost.org/doc/libs/1_55_0/doc/html/variant.html

[2] http://thenewcpp.wordpress.com/2012/02/15/variadic-templates-part-3-or-how-i-wrote-a-variant-class/

[3] http://www.boost.org/doc/libs/1_55_0/libs/optional/doc/html/index.html

[4] http://www.stroustrup.com/OOPSLA-typeswitch-draft.pdf
[5] http://ecn.channel9.msdn.com/events/GoingNative12/GN12VariadicTemplatesAreFunadic.pdf

[6] http://dave.cheney.net/2012/01/18/why-go-gets-exceptions-right

Search This Blog

Emanuele Ruffaldi

Algebrical Data Types in C++

Optional

Sum-types

Compile Time

Generalized Max (Updated)

Runtime

Storage

In my initial implementation alignment was not considered. I improved it using the newcpp solution (thanks): the subtype with the biggest alignment is found, and then the storage value is placed inside a union.

Construction

Access

Optional Sum-type

Limitations and future

Conclusions

References

Comments

Popular Posts

cmakego: Simpler access to external libraries in CMake

Starting with UEFI with CMake and VirtualBox