Topic: Weird ideas #1: primitive data type sizes

kyle

Weird ideas #1: primitive data type sizes
« on: June 19, 2014, 02:41:27 AM »
Having ported C code back and forth between platforms, I find that one of the most frustrating things is how vaguely "int", "short", "char", "long", "float" and "double" are defined.  While it is true that there were CPUs where a machine word was not 8, 16, 32 or 64 bits, that is extremely rare now.  I can't think of a single processor in actual use where that is not the case.  Chuck Moore's FORTH CPUs are the only ones I can think of, and those are not exactly mainstream.

I find myself never using int, short, or char, and instead I always include stdint.h and use int32_t, int16_t and int8_t.  I need to know what size those integral types are.

Java made this mandatory.  "int" in Java is a signed 32-bit int.  Period.  No exceptions.  "long" is a signed 64-bit int.  Again, no exceptions.  While it is very annoying that Java decided that unsigned integers were not interesting, this conformity across platforms is quite handy.  I can use an int in a for loop in Java without worrying about what might happen if it is actually 16 bits on that platform.

I think it would be interesting if C2 would define char, short, int and long (and float and double) to be what people usually think they are (unless you are Microsoft): 8, 16, 32 and 64-bit words.  The translation to IR is straightforward if I understood the LLVM docs correctly (very possibly not!).  Any translation to C would be a simple mapping to one of the intX_t types.

Thoughts?

Best,
Kyle



bas

Re: Weird ideas #1: primitive data type sizes
« Reply #1 on: June 19, 2014, 08:08:06 AM »
Hi Kyle,

You are already on the right track; I couldn't agree more. C2 has these built in:
int8, int32, int64, uint64 etc., and also bool.
The keywords unsigned, long, short etc. are removed, so a base type is always a single
word.
What is still unclear is how to handle uintptr_t and similar types, that is, an integer type
that can hold a pointer value. I get a lot of these when building or patching code to work on
both 32-bit and 64-bit systems. Do you have any experience with this?

kyle

Re: Weird ideas #1: primitive data type sizes
« Reply #2 on: June 20, 2014, 09:51:32 PM »
The stdint.h pointer-sized integers (intptr_t is the signed one, uintptr_t the unsigned one) are a problem because they must be platform specific.  I do not see a way around it.  A 32-bit platform is a 32-bit platform.  Pointers are 32 bits (x86 PAE mode widens physical addresses beyond 32 bits, but pointers stay 32 bits; joy, happiness, sunshine!).  64-bit platforms tend to be more sane.  But there is the x32 ABI on AMD64 platforms, with 32-bit pointers on a 64-bit CPU, which gets weird.

This brings up the point that sometimes you actually want to use the platform specific machine word size.  Perhaps this could be "word".  That could be defined as "whatever the CPU likes best."  If you use it, you get a warning about platform dependencies. 

Maybe you have:

int8 - 8-bit integer, signed.
uint8 - 8-bit integer, unsigned.
int16 - 16-bit integer, signed.
...
uint64 - 64-bit integer, unsigned.
int128 - 128-bit integer, signed.
uint128 - 128-bit integer, unsigned.

float32 - 32-bit IEEE floating point.
float64 - 64-bit IEEE floating point.

iword - CPU integer register size.
fword - CPU floating point register size (also lends itself to jokes in English).
pword - CPU address size.

The last three are very much platform specific and clearly defined as such.

I would not worry about things like real mode, far/near pointers and all the other garbage that x86 systems accumulated over the years (even ARM briefly had a 48-bit addressing mode, but no one used it that I know of).  That is rapidly dying out.  I would treat it like CPUs that do not have power-of-2 word sizes: ignore it.