Graph of Programming Language Tools
For some time I have been collecting a list of programming tools that can translate one language into another or convert between different types of bytecode. Now that the number of tools has become sizable, and it keeps increasing over time, I've decided to represent this collection as a graph that shows the different conversion steps that are possible.
The starting point is to represent the list in some semantic form, and then to produce from it a graph that presents these relationships clearly. Although an XML/RDF solution could be appealing, I've decided to use Python as a Domain Specific Language (DSL) for representing entities and their relationships. In this way the language itself provides some basic checking: for example, referring to an entity that has not been defined raises an error.
The set under investigation comprises two types of entities. On one side there is the content, which can be a programming language, a bytecode or native code, such as C, Java bytecode or x86 machine code respectively. On the other side there are the programming tools, which convert an entity X into another entity Y and are themselves possibly written in a language Z. A compiler translates a language X into machine code or bytecode Y, while an interpreter executes a programming language X or a bytecode Y directly, possibly adopting some Just-in-Time strategy.
Mapping this information to a Python DSL is easy thanks to variable introspection. In the DSL I distinguish between kinds of targets (Lang, Bytecode, Native) and kinds of tools (Compiler, Runner, Translator), although the kind of a tool could be derived directly from its inputs and outputs.
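To make the idea concrete, here is a minimal sketch of how such a DSL might be organized, assuming a plain class hierarchy. Only the class names (Lang, Bytecode, Native, Compiler, Runner, Translator) come from the post; the `Entity` and `Tool` base classes, the constructor signature (inferred from the Gcc, CH and TinyC definitions in this post) and the `name_tools` and `derive_kind` helpers are hypothetical. `name_tools` illustrates one way variable introspection could work: tools take no name argument, so each one is named after the variable bound to it in a namespace such as `globals()`. `derive_kind` shows how the tool kind could instead be derived from inputs and outputs.

```python
# Hypothetical sketch of the DSL classes. The class names come from the post;
# the base classes, helpers and constructor signature are assumptions.

class Entity:
    """Content a tool can consume or produce."""
    def __init__(self, label):
        self.label = label

class Lang(Entity): pass
class Bytecode(Entity): pass
class Native(Entity): pass

def _as_list(x):
    """Accept a single entity, a list/tuple of entities, or None."""
    if x is None:
        return []
    return list(x) if isinstance(x, (list, tuple)) else [x]

class Tool:
    """Converts (or runs) input entities; implemented in language `impl`."""
    def __init__(self, inputs, to=None, impl=None, url=None):
        self.inputs = _as_list(inputs)
        self.outputs = _as_list(to)
        self.impl = impl
        self.url = url
        self.name = None  # assigned later from the variable name

class Compiler(Tool): pass
class Runner(Tool): pass
class Translator(Tool): pass

def name_tools(namespace):
    """Variable introspection: name each tool after the variable bound to it."""
    for var, value in namespace.items():
        if isinstance(value, Tool):
            value.name = var

def derive_kind(tool):
    """Hypothetical rule deriving a tool's kind from its inputs and outputs."""
    if not tool.outputs:
        return "Runner"       # executes its input, produces no new entity
    if all(isinstance(out, Lang) for out in tool.outputs):
        return "Translator"   # source-to-source
    return "Compiler"         # produces bytecode, IR or native code

C = Lang("C")
Cpp = Lang("C++")
x86 = Native("x86")  # stands in for the post's x86all
TinyC = Compiler(C, to=x86, impl=C, url="http://bellard.org/tcc/")
CH = Runner([C, Cpp], impl=C, url="http://www.softintegration.com/")

name_tools(globals())
print(TinyC.name, derive_kind(TinyC))  # TinyC Compiler
print(CH.name, derive_kind(CH))        # CH Runner
```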
For example, we first define the languages:
C = Lang("C")
Cpp = Lang("C++")
Java = Lang("Java")
Then some compilers and interpreters (runners):
Gcc = Compiler([C,Cpp,ObjC,Java,Ada,Fortran],impl=C,to=GccIR,url="http://gcc.gnu.org")
GccGen = Compiler(GccIR,impl=C,to=[x86,MIPS,x86_64,ARM],url="http://gcc.gnu.org")
CH = Runner([C,Cpp],impl=C,url="http://www.softintegration.com/")
CINT = Runner([C,Cpp],impl=Cpp,url="http://root.cern.ch/drupal/content/cint")
TinyC = Compiler(C,to=x86all,impl=C,url="http://bellard.org/tcc/")
From the Python DSL, a graph is obtained by means of the pydot package, which invokes the Graphviz tool. The output can be a bitmap PNG or, better, an SVG with links to the tools. The mapping from the class of an entity to its style in the graph is a simple lookup on the class name:
# Graphviz node attributes keyed by class name (tools keep the default ellipse)
styles = {}
styles["Lang"] = dict(shape='rect')
styles["Bytecode"] = dict(shape='rect')
styles["Native"] = dict(shape='rect')
styles["Runner"] = dict(fillcolor='orange',style='filled')
styles["Compiler"] = dict(fillcolor='green',style='filled')
styles["Translator"] = dict(fillcolor='green',style='filled')
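The post drives Graphviz through pydot; as a standard-library-only sketch of the same class-to-style mapping, the code here builds the DOT text directly instead (a swapped-in technique, not the author's code). The entity and tool tuples are a hypothetical flattened form of the DSL objects, and classifying GccIR as Bytecode is an assumption. Nodes without a `shape` fall back to the Graphviz default, an ellipse, which is how tools end up drawn as ellipses; the Graphviz `URL` attribute turns a node into a link in SVG output.

```python
# Sketch: emit Graphviz DOT text using the class-name -> style table.
# Plain string building replaces pydot so this runs with the standard library.

styles = {
    "Lang": dict(shape="rect"),
    "Bytecode": dict(shape="rect"),
    "Native": dict(shape="rect"),
    "Runner": dict(fillcolor="orange", style="filled"),
    "Compiler": dict(fillcolor="green", style="filled"),
    "Translator": dict(fillcolor="green", style="filled"),
}

def attrs(d):
    """Format a dict as a DOT attribute list, e.g. [shape="rect"]."""
    return "[" + ", ".join(f'{k}="{v}"' for k, v in sorted(d.items())) + "]"

def to_dot(entities, tools):
    """entities: {name: class_name}; tools: [(name, class_name, inputs, outputs, url)]."""
    lines = ["digraph tools {"]
    for name, kind in entities.items():
        lines.append(f'  "{name}" {attrs(styles[kind])};')
    for name, kind, inputs, outputs, url in tools:
        node = dict(styles[kind])
        if url:
            node["URL"] = url  # becomes a clickable link in SVG output
        lines.append(f'  "{name}" {attrs(node)};')
        for src in inputs:
            lines.append(f'  "{src}" -> "{name}";')
        for dst in outputs:
            lines.append(f'  "{name}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical data: GccIR is assumed to be a Bytecode entity.
entities = {"C": "Lang", "GccIR": "Bytecode", "x86": "Native"}
tools = [
    ("Gcc", "Compiler", ["C"], ["GccIR"], "http://gcc.gnu.org"),
    ("GccGen", "Compiler", ["GccIR"], ["x86"], "http://gcc.gnu.org"),
]
print(to_dot(entities, tools))
```

Saving the string to a file such as `tools.dot` and running `dot -Tsvg tools.dot -o tools.svg` (or `-Tpng`) renders it with Graphviz.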
The following graph is the full one, covering C, C++, Java, C# and JavaScript. Entities such as languages, intermediate representations and native assembly languages are drawn as rectangles, while tools are drawn as ellipses, colored according to the kind of tool. The Gcc compiler has been split in two parts to highlight the clean separation between its front end and back end.
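Since the point of the graph is to show which conversion steps are possible, a natural query is to enumerate the tool chains between two entities. Here is a minimal sketch, assuming a hypothetical flattened dictionary that maps each tool to its inputs and outputs: a breadth-first search that never revisits an entity within a chain, so shorter chains come first and cycles cannot loop forever. With Gcc split into front end and back end, C reaches x86 either directly through TinyC or through Gcc and GccGen. Runners have no outputs, so they never appear inside a chain.

```python
from collections import deque

# Hypothetical flattened view of a few tools: name -> (inputs, outputs).
# Gcc is split into front end (Gcc) and back end (GccGen), as in the graph.
TOOLS = {
    "Gcc": (["C", "C++"], ["GccIR"]),
    "GccGen": (["GccIR"], ["x86", "ARM"]),
    "TinyC": (["C"], ["x86"]),
}

def conversion_chains(tools, source, target):
    """Breadth-first search for every tool chain from source to target.

    A chain is a list of (tool, produced_entity) steps; entities are never
    revisited within a chain."""
    chains = []
    queue = deque([(source, [])])
    while queue:
        entity, chain = queue.popleft()
        if entity == target:
            chains.append(chain)
            continue
        seen = {source} | {out for _, out in chain}
        for tool, (inputs, outputs) in tools.items():
            if entity not in inputs:
                continue
            for out in outputs:
                if out not in seen:
                    queue.append((out, chain + [(tool, out)]))
    return chains

def describe(source, chain):
    """Render a chain as 'C -[Gcc]-> GccIR -[GccGen]-> x86'."""
    text = source
    for tool, out in chain:
        text += f" -[{tool}]-> {out}"
    return text

for chain in conversion_chains(TOOLS, "C", "x86"):
    print(describe("C", chain))
# C -[TinyC]-> x86
# C -[Gcc]-> GccIR -[GccGen]-> x86
```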
The source code containing both the entities and the graph-generating code is available here.
Known limitations:
- some languages (Perl, Python, Scala ...) have not been listed, otherwise the graph could explode
- there is no distinction based on the maturity of the tools
- interpreters/runners capable of Just-in-Time compilation cannot be expressed with the current semantics
Other ideas:
- present the graph interactively on the Web
- remove some languages/tools to make it easier to read
- add a legend