A problem with working on the syntax, though: any keyword change must first be made in the clang fork.
That makes development harder, since you have to know the workings of [part of] clang as well as C2's source. Some of C2Parser seems to be doing C/C++ parsing (commented out in places). Cleaning out those remnants would make further refactoring easier.
I've taken a stab at wrapping DiagnosticsEngine in "our own" class as a first refactoring step. I had to cheat in some places, and obviously the diag:: messages had to be left as is. It would help to have some specification of how C2 uses the Preprocessor, Lexer, DiagnosticsEngine and SourceManager, so it's easier to know what can be stripped out and in what order.
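To make the wrapping idea concrete, here is a rough C++ sketch of the shape I mean. It is not the actual change and it deliberately doesn't touch clang: `Diagnostics`, `DiagBackend`, `RecordingBackend` and `Severity` are made-up names for illustration, and a real backend would forward to the clang fork's DiagnosticsEngine instead of recording messages.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace c2sketch {

enum class Severity { Warning, Error };

// Hypothetical seam: the one interface behind which clang would live.
class DiagBackend {
public:
    virtual ~DiagBackend() = default;
    virtual void emit(Severity sev, unsigned line, const std::string& msg) = 0;
};

// Stand-in backend that only records messages. A real backend would
// forward to the clang fork instead (not shown; that part is an assumption).
class RecordingBackend : public DiagBackend {
public:
    struct Record {
        Severity sev;
        unsigned line;
        std::string msg;
    };

    void emit(Severity sev, unsigned line, const std::string& msg) override {
        records.push_back(Record{sev, line, msg});
    }

    std::vector<Record> records;
};

// Facade that the parser and analyser would call instead of clang directly.
class Diagnostics {
public:
    explicit Diagnostics(DiagBackend& backend) : backend_(backend) {}

    void warning(unsigned line, const std::string& msg) {
        backend_.emit(Severity::Warning, line, msg);
    }

    void error(unsigned line, const std::string& msg) {
        ++errorCount_;
        backend_.emit(Severity::Error, line, msg);
    }

    unsigned errorCount() const { return errorCount_; }
    bool hasErrors() const { return errorCount_ != 0; }

private:
    DiagBackend& backend_;
    unsigned errorCount_ = 0;
};

// Small usage check: callers only ever see the facade.
inline void usageExample() {
    RecordingBackend backend;
    Diagnostics diags(backend);
    diags.warning(3, "unused variable");
    diags.error(7, "unknown keyword");
    assert(diags.errorCount() == 1);
    assert(backend.records.size() == 2);
}

} // namespace c2sketch
```

The point of the seam is that once every caller goes through something like `Diagnostics`, the clang-specific part shrinks to a single backend class that can be replaced piecemeal.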
Obviously, since you wrote the whole thing in the first place, you would probably have an easier time than me ripping out the Clang code, but for me (and any other contributor) it has to be done piecemeal so nothing gets broken by accident.
Maybe write up some specs? That would be useful for later anyway.