Author Topic: one big design item: Macros  (Read 20137 times)

bas

one big design item: Macros
« on: April 11, 2015, 10:27:17 PM »
One of the big remaining design issues (next to countless smaller ones  :) ) is the design
of a new macro system that is not based on textual substitution by the preprocessor.

Let's start by looking at the goals of macros:
  • Feature selection
Code: [Select]
#ifdef HAVE_FOO_FEATURE
..
#endif
  • Constants
Code: [Select]
#define MAX_ITEMS 10
  • Code expansion
Code: [Select]
#define print_if_positive(x) \
   if (x > 0) printf("value is %d\n", x);

I think each of these is a valid goal, so the new macro system should provide
a solution for each (in some way or another). For feature selection, C2 can
work the same way as C: since the C2 compiler also has a preprocessor, it's completely
identical.

The Constants goal is met in C2 by declaring a const of a numeric type:
Code: [Select]
const int32 MAX_ITEMS = 10;
This will just 'replace' all references with 10.

The Code-expansion is the hardest. Since there is no textual replacement, the macro system
has to be language aware. This means that when parsing the macro definition, the parser must
understand what's happening. This results in 2 types of macros: local and non-local. Local macros
can be used inside functions, while non-local macros can only be used outside function
bodies. So local macros are parsed as a series of Statements, while non-local macros are parsed
as a list of global declarations.

The syntax I currently think of is:
Code: [Select]
local macro(x) {
   io.printf("value of "$$x" = %d", x);
}
macro(x) {
   func gen_$x() {
   }
}

Open issues:
  • Q: are public macros in module X allowed to access non-public Decls in X?
  • Q: what to allow as macro arguments?
  • Q: what syntax to use for argument replacement, concatenation and stringify?

« Last Edit: April 19, 2015, 08:51:59 PM by bas »

kyle

Re: one big design item: Macros
« Reply #1 on: April 17, 2015, 09:38:05 PM »
Macros are a very interesting topic to me.

Your list of uses for macros is good, but you are missing one extra thing that arrived with C11: generic functions/behavior using the new _Generic keyword.
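
For readers who have not used it, here is a minimal sketch of C11 generic selection. The helper names (max_int, max_double, generic_max) are made up for illustration and do not come from any library.

```c
/* Hypothetical per-type helpers; the names are illustrative only. */
static inline int max_int(int a, int b) { return a < b ? b : a; }
static inline double max_double(double a, double b) { return a < b ? b : a; }

/* C11 _Generic selects a function from the static type of the first
   argument. The controlling expression is not evaluated, so each
   argument is still evaluated exactly once in the final call. */
#define generic_max(a, b) \
    _Generic((a), int: max_int, double: max_double)((a), (b))
```

Note that the selection looks only at the first argument, so generic_max(2, 3.5) would pick max_int and convert 3.5 to int. That kind of surprise is part of why a typed, language-level solution is attractive.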

My thought was to try to do something with limited, but similar functionality to C++ templates.  I don't like the C++ syntax though.  In order to do that, you would need to find a replacement for all the existing uses of macros.

I'll take each of these in turn.

Feature Selection

Though you can use #ifdef etc. to include and exclude things, code that uses it is now generally considered bad code.    From what I have seen in the past few years, most of this has moved to the build system. 

This is sometimes used for debugging that can be compiled out:

Code: [Select]
if(DEBUG) printf("Foutje!  Bedankt.\n");

The assert macros work in a similar way.
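
A small C sketch of why this pattern can be compiled out, assuming DEBUG is a compile-time constant (here a hypothetical enum constant, and debug_log is an illustrative name):

```c
#include <stdio.h>

/* Hypothetical build-time switch; an enum constant is a true
   constant expression in C. */
enum { DEBUG = 0 };

/* Returns 1 if the message was printed, 0 if the branch was skipped. */
static int debug_log(const char *msg)
{
    if (DEBUG) {
        /* Constant condition: an optimizing compiler can drop this
           dead branch, yet the code is still parsed and type-checked. */
        printf("%s\n", msg);
        return 1;
    }
    return 0;
}
```

Unlike an #ifdef block, the disabled branch is still seen by the compiler, so it cannot silently rot.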

As far as I know, that is about the only remaining case where feature selection is really used any more.  I don't have a good idea here other than using the build system to select different source files.  That can work, and I use it for platform-specific code.  It would be a bit strange for something like assert though.  I think more thought needs to go into this before it could work.

One way to do the assert idea would be to use one .c file for debug (that defined assert() with a body that did something) and another for release (that defined assert() as a function with nothing in the body).  The build system would figure out which file to use based on the build target.

Constants

As you mentioned, just use const instead.  The compiler is smart enough to figure it out.

Code Expansion

This really should be something more like a template so that it can be typesafe.

Code: [Select]
generic T max`T(T a, T b) { return (a<b ? b : a); }

Note that this will work with calls like:

Code: [Select]
int a=2;
int b=3;
int c = max`int(a++,b++);

It won't evaluate the ++ multiple times.
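
To make the difference concrete, here is a plain C comparison of a text-substitution macro and an inline function, using the same a=2, b=3 values. The names MAX_MACRO, max_int, outcome, with_macro and with_function are illustrative only.

```c
/* Classic text-substitution macro: the selected argument is evaluated twice. */
#define MAX_MACRO(a, b) ((a) < (b) ? (b) : (a))

/* Function version: each argument is evaluated once, before the call. */
static inline int max_int(int a, int b) { return a < b ? b : a; }

typedef struct { int result; int a; int b; } outcome;

static outcome with_macro(void)
{
    int a = 2, b = 3;
    int c = MAX_MACRO(a++, b++);   /* b++ runs in the test and again in the result */
    outcome o = { c, a, b };
    return o;
}

static outcome with_function(void)
{
    int a = 2, b = 3;
    int c = max_int(a++, b++);     /* receives 2 and 3 */
    outcome o = { c, a, b };
    return o;
}
```

With the macro the result is 4 and b ends at 5; with the function the result is 3 and b ends at 4.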

Modern compilers can figure out that max() should be inlined by themselves, so this is not any extra overhead, but it is much safer and cleaner.  I just threw in some random syntax for variable types. 

Some code expansion is a lot more difficult to emulate.  For instance X Macros.  I have used things like that in the past.  I would like to get away from all of the pure text replacement types of macros.  While they are powerful, they are also dangerous and fragile.  Often there are better ways to do things.

I am not sure I understand where you are going with local and non-local macros...

Another alternate syntax I thought of would be like this:

Code: [Select]
macro pdebug(const char *msg, ...args) = if(debug) vaprintf(msg, args);

Visually, it makes a macro look like a declaration/definition.  It also puts types on the arguments. 
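
For comparison, roughly the same thing written as an ordinary variadic C function. This is only a sketch: the debug flag is a hypothetical global, and the standard vfprintf stands in for the vaprintf used in the line above.

```c
#include <stdarg.h>
#include <stdio.h>

static int debug = 1;   /* hypothetical runtime switch */

/* Prints to stderr only when debug is set; returns the number of
   characters written (0 when disabled). */
static int pdebug(const char *msg, ...)
{
    int written = 0;
    if (debug) {
        va_list args;
        va_start(args, msg);
        written = vfprintf(stderr, msg, args);
        va_end(args);
    }
    return written;
}
```

The function form gets a typed first argument for free, but the variadic tail is still unchecked, which is where a typed macro syntax could do better.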

bas

Re: one big design item: Macros
« Reply #2 on: April 19, 2015, 09:08:57 PM »
In C2, at the Module level there are 3 types of Decls: Types, (global) vars and functions. Macros would be
a fourth type of Decl. Just like the other 3, they can be public or not.

The part about local/non-local is pretty important, so maybe I can explain it better. ANSI C macros are just
plain text expansion, so whether you type
Code: [Select]
if (x > 10) { .. }
or
Code: [Select]
typedef struct { .. } my_##x;
it's all the same to the preprocessor.
The first example is only valid inside a function body; the second belongs outside function bodies.
So when parsing macros in C2, the parser needs to know what to parse: global syntax or function body syntax.
Global syntax is a bag of Declarations (types, functions, etc.). Inside functions it's basically a list of Stmts.
The local keyword tells the parser which one to expect. If local is present, the parser will call ParseStmt()
repeatedly; otherwise it will call ParseDecl() repeatedly. This also means we cannot use the same macro both
inside and outside a function. Not a real problem, I think.
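
As a toy illustration of that dispatch (not C2's actual parser; every name here is hypothetical), the choice could be made from the macro header alone:

```c
#include <string.h>

typedef enum { PARSE_DECLS, PARSE_STMTS } macro_body_kind;

/* Hypothetical helper: decide how to parse a macro body by looking
   for a leading "local" keyword in the macro header text. */
static macro_body_kind macro_body_kind_for(const char *header)
{
    static const char keyword[] = "local";
    size_t n = sizeof keyword - 1;

    /* "local" must be a whole word, so "localize(x)" does not count. */
    if (strncmp(header, keyword, n) == 0 &&
        (header[n] == ' ' || header[n] == '\t'))
        return PARSE_STMTS;   /* body is a list of statements */
    return PARSE_DECLS;       /* body is a list of declarations */
}
```

The point is that the decision is made once, up front, so the parser never has to guess from the body itself.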

In your max(..) example, it's easy to see that inlined functions are generally superior to macros, since
you get type checking etc. In C2, max() could almost always be done with an inline function, since the
compilation unit is the whole program.

Maybe we can get a more concrete design by just creating a set of use-cases for macros and validating the design
on them. A few I can think of are: (only code-expansion macros, since the features/constants are covered)
  • debug:
Code: [Select]
debug("I'm in this function over here");
assert(ptr != 0);

  • enum list (keeping enum in sync with other list)
Code: [Select]
addValue(STATE1, 10, "begin", func_begin);
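
For reference, this is how the enum-list use case is usually handled in C today with an X-macro. Only the STATE1 row comes from the example above; STATE2, func_end and all macro and type names are made up for illustration.

```c
#include <stddef.h>

static void func_begin(void) {}
static void func_end(void) {}

/* Single source of truth: name, value, label, handler. */
#define STATE_LIST(X)                    \
    X(STATE1, 10, "begin", func_begin)   \
    X(STATE2, 20, "end",   func_end)

/* Expansion 1: the enum itself. */
#define AS_ENUM(name, value, label, handler) name = value,
typedef enum { STATE_LIST(AS_ENUM) } state;
#undef AS_ENUM

/* Expansion 2: a lookup table that stays in sync automatically. */
typedef struct { state id; const char *label; void (*handler)(void); } state_info;
#define AS_ROW(name, value, label, handler) { name, label, handler },
static const state_info state_table[] = { STATE_LIST(AS_ROW) };
#undef AS_ROW

/* Returns the label for a state, or NULL if the state is unknown. */
static const char *state_label(state s)
{
    for (size_t i = 0; i < sizeof state_table / sizeof state_table[0]; i++)
        if (state_table[i].id == s)
            return state_table[i].label;
    return NULL;
}
```

Note that the first expansion lands inside a type declaration and the second inside an initializer, which is neither a list of statements nor a list of top-level declarations. A language-aware macro system would need some answer for that.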




bas

Re: one big design item: Macros
« Reply #3 on: May 01, 2015, 08:33:03 AM »
For generics, let me go back to one of the design principles:
C2 tries to be an evolution of C, not a completely new language. Therefore it should
not stray too far from C. I think generics is one of those features that might be a really
good option but simply doesn't fit the C domain. I don't think it could be added as
a simple extra, because it influences a lot of design decisions.

On the macro-part, I think you have some really nice ideas:
  • any - I have to think about it more, but it would be nice to always 'type' the argument
  • only code expansion - yes, constants/feature selection should not use these macros
  • no nesting - check
  • macros called like functions - yes; one idea is to show some difference at the call site by using mymacro!(..) like Rust
  • public macros - whether public macros can access non-public decls is something to think about. A macro is part of the interface, but different compilation units could cause problems indeed, well spotted!
  • same indentation level - yes this would be required for parsing it well

One other issue to solve, described previously in this thread, is whether a macro is meant to be
used at file scope or at function scope, because parsing it would be different. For example,
at file scope, if-statements are invalid.


bas

Re: one big design item: Macros
« Reply #4 on: May 03, 2015, 11:03:20 AM »
The issue about macros for use in function and for use outside functions remains tricky to understand.
Maybe this post can clear this up.

In C, parsing a macro is easy, just treat it as text and only look for some symbols like arguments etc. Replace
those and you're done.

In C2, parsing macros is a completely different story. The main difference is that the parser is used to
parse them, not the preprocessor. This offers advantages like better checking etc. I think most people agree
on the semantic part.

But a parser cannot simply treat the macro body as text; it needs to really understand the syntax, just like
any other part of the program. As a (pseudo) example, when the parser starts on a function definition, it is in a
state where it expects globals, e.g. in its own function parseFile().
The call stack might look like this:
Code: [Select]
parseGlobal();
  parseFileDecl();
    parseType();   // the return type
    parseFunctionName();
    parseFunctionArguments();
    parseFunctionBody() {
       while (..) {
         parseStatement();
       }
    }

So to parse a C2 macro, the parser needs to know what to expect: can it expect a sequence of Statements (like if, while,
calls, etc.), or should it expect top-level Declarations (like stuff at file scope)? It could try to detect this, but that would make
the parser difficult and error-prone again. So a solution would be that the programmer tells the parser what to expect, so it doesn't have to guess. In Rust this is less of an issue, since most things are Expressions.

I don't think adding this requirement would make the language less simple to use. In C, most macros can only be used either at function scope or at file scope. But since the preprocessor doesn't care, it's up to the parser to come up with good error messages.
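
A short C illustration of that split, with hypothetical names throughout (foo, DEFINE_GETTER, CLAMP_TO_TEN, clamped_foo): one macro only makes sense at file scope, the other only inside a function body, and the preprocessor accepts both definitions without complaint.

```c
static int foo = 42;   /* hypothetical global */

/* File scope only: expands to a function definition. */
#define DEFINE_GETTER(x) static int get_##x(void) { return x; }

/* Function scope only: expands to an if-statement. */
#define CLAMP_TO_TEN(v) if ((v) > 10) { (v) = 10; }

DEFINE_GETTER(foo)       /* OK here; inside a function body this is not standard C */

static int clamped_foo(void)
{
    int v = get_foo();
    CLAMP_TO_TEN(v)      /* OK here; at file scope this would be a syntax error */
    return v;
}
```

Swapping the two uses only fails after expansion, with an error that points at the expanded text rather than at the macro's intent.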
« Last Edit: May 03, 2015, 11:09:03 AM by bas »

bas

Re: one big design item: Macros
« Reply #5 on: May 06, 2015, 08:15:53 PM »
1) C2 already has its own AST, independent of Clang's. LLVM IR code is generated from that AST.
2) I'm currently looking into the include-what-you-use (iwyu) tool. With some modifications to iwyu,
I can extract definitions for functions, types and variables and convert them into C2 automatically by
walking clang's AST. I'm hoping to show some demo functionality this month. As always,
however, macros are hard. It is possible to find them in the AST, and the left-hand side (declaration
part) is usable. The right-hand side is almost useless. Only if it's a constant would it be usable. If it's
something like the code below, it's very hard to use.
Code: [Select]
#define myprint(x) if (x > 10) print_##x()
Of course we could just generate a warning on conversion and an error on use of such macros.

3) Since C2 uses clang's pre-processor, old-style macros already work. Already you can specify
the following in the recipe.txt file:
Code: [Select]
target foo
  $config FEATURE_A, FEATURE_C
  foo.c2
end
And the 2 names are passed to the preprocessor as if they had been defined in the source (e.g. #define FEATURE_A).
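
The effect on the source side can be sketched in plain C. The two #define lines stand in for what the recipe would supply, and FEATURE_B and enabled_features are made-up names used only for contrast.

```c
/* Stand-ins for what `$config FEATURE_A, FEATURE_C` would hand
   to the preprocessor. */
#define FEATURE_A
#define FEATURE_C

/* Counts how many of the three features were compiled in. */
static int enabled_features(void)
{
    int count = 0;
#ifdef FEATURE_A
    count++;
#endif
#ifdef FEATURE_B   /* not listed in the recipe, so this branch is skipped */
    count++;
#endif
#ifdef FEATURE_C
    count++;
#endif
    return count;
}
```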

My first goal would be to automatically parse C header files, ignoring the macros. This would allow
usage of stdio.h, stdlib.h, string.h, etc.
The second goal would be to parse simple macros like "#define MAX_BUFFER 10".
After that, we'll see.

kyle

Re: one big design item: Macros
« Reply #6 on: May 18, 2015, 08:59:07 PM »
It seems like there are only a few cases where existing macro use is not directly replaceable with something easier/better:

  • X-Macros.
  • Code snippets designed to be inlined into existing code.

Now, let's look at why you need those.

The first is really for using the compiler for code generation (in this case to avoid repetition).  The second is more often used for adding a little syntactic sugar.

For code generation, I'll take an example I currently have pending.  I am trying to write a mini-RPC library that is a single .h file in C.  Don't ask why, there is not a good answer :-)

But, to do that, I need to write macros that generate function stubs to marshal/unmarshal arguments.  And, I would like those macros to look a lot like functions themselves to avoid mixing in any extra syntax. 

So, RTTI would allow me to do that.  But RTTI is heavy and requires quite a bit of overhead.  If I can generate code at compile time, I can possibly work around that.

Now, maybe there is a way to do something about this.  Looking around at other languages, I note that Java is using annotations more and more.  Perhaps (the following is not fully fleshed out) there is a way to hook into the parser with something like that.  Suppose we allow you to do annotations:

Code: [Select]
@RPC
func remote_add_nums(a:int, b:int): int

And somewhere else you define a compile-time function @RPC that takes some sort of arguments.  In the case of Java, you get a fair amount of information, but it is all handled by run-time libraries.  In C2, this is not desirable. 

I can see that you could, theoretically, have @RPC be compiled and then run at compile time.  I am not sure what the arguments should be.  In the above example, there should probably be some additional arguments to @RPC for the server etc.  Passing the function information as strings is probably not ideal since we are trying to get away from uncontrolled string handling.

Is there a way to do this usefully?  Exposing the AST seems like a problem since it would force the AST to become a fixed, external API.

Now, take the second of my major cases, syntactic sugar.  I have some macros I use to help me visually see things like mutex-protected blocks

Code: [Select]
synchronized_block(my_mutex) {
      ... do some protected things...
}

That is not natively in C.  It is just a couple of for loops and some C99 inline variable declaration magic in a macro.  If you return from the middle of the block, you lose.  But, it is much, much easier for me to see what is in the block, whether I remembered to close the block etc.  I got rid of a lot of bugs in a couple of programs when I did this.  Very handy.

I would really like to be able to introduce some things like this.  Sure, this example may be bad because it should be built into the language anyway, but hopefully you get the idea.
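
For anyone who has not seen the trick, here is one way such a macro can be built in C99. It is a sketch, not my actual macro: toy_mutex and its helpers stand in for a real mutex, and every name is illustrative.

```c
/* Toy lock standing in for a real mutex. */
typedef struct { int locked; } toy_mutex;
static void toy_lock(toy_mutex *m)   { m->locked = 1; }
static void toy_unlock(toy_mutex *m) { m->locked = 0; }

/* Outer loop: lock on entry, unlock in the increment step.
   Inner loop: runs the body once, so a break in the body only
   leaves the inner loop and the unlock still happens. */
#define synchronized_block(m)                                  \
    for (int sync_outer_ = (toy_lock(m), 1); sync_outer_;      \
         toy_unlock(m), sync_outer_ = 0)                       \
        for (int sync_inner_ = 1; sync_inner_; sync_inner_ = 0)

static toy_mutex my_mutex;

/* Returns 1 if the lock was held while inside the block. */
static int locked_inside_block(void)
{
    int seen = 0;
    synchronized_block(&my_mutex) {
        seen = my_mutex.locked;
    }
    return seen;
}
```

A return from inside the block skips the unlock step entirely, which is the "you lose" case described above.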

Perhaps we can use something like this:

Code: [Select]
sugar synchronized_block(m:*mutex)=for(...) for(...)

I'm not very happy with that, but there are some possibilities of combining annotation/compile time functions and this.

Sorry this is not well thought out.   I had a few ideas and wanted to throw them out there.

Best,
Kyle