This code that reads from `ptr` is probably already broken anyway, unless the initial address is 3 mod 4 (maybe even 7 mod 8). This kind of unaligned read works on x86 but probably won't on some ARM-based platforms such as Android, where it causes a SIGBUS.
Yes, same thing for SPARC and other RISC platforms. However, doing it in a portable way would just make the article less clear.
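For reference, a portable read could look roughly like the sketch below. The helper name load_u32_unaligned is made up for illustration and does not come from the article. It copies the bytes into a properly aligned local variable instead of dereferencing a converted pointer, and the result is in host byte order, so endianness still has to be handled separately.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 32-bit value in host byte order from an
   address that is not necessarily 4-byte aligned. */
static uint32_t load_u32_unaligned(const uint8_t *ptr)
{
    uint32_t value;

    /* memcpy has no alignment requirement on its source, so this avoids
       the unaligned word access that can raise SIGBUS. */
    memcpy(&value, ptr, sizeof value);
    return value;
}
```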
Your network protocol example is not convincing. In fact, it is just broken as it relies on pointer conversions between unrelated types. This kind of code would not pass any safety audit, but even if you don't plan for verification, using such crappy code to justify that magic numbers are good for readability is just nonsense. If you start with crap, you will end up with crappy conclusions.
Now, try rewriting your network protocol parser with some real, well… parsing. You know, by consuming bytes and using them to compute higher-level values. Use some real coding standard as a guideline (MISRA-C, perhaps?) and then you will discover that you don't need any magic numbers to keep it afloat. And, as a side-effect, you might even end up with code that can defend itself at safety reviews.
(And no, it will not be any slower. Pointer conversions between single bytes and words don't buy any performance over properly indexed buffer iteration.)
Well, this is the experience from parsing 3GPP protocols: they are so irregular that whatever abstraction you use (uint16_t state = read_short (stream); or whatever) you eventually end up looking directly at the underlying buffer and messing with individual bytes.
Also, even if the above was not true, the problem is just shifted to the parsing tool. It has to extract 2 bytes from the buffer. Should it use a symbolic name or the constant 2? Another option, as mentioned in the article, is sizeof(uint16_t), which is just a fancy and somewhat less readable way of writing the constant 2.
Do you need to extract half-word (two bytes) from the buffer? Try this:
const uint8_t lsb = buffer[current_position++];
const uint8_t msb = buffer[current_position++];
const uint16_t value = ((uint16_t)msb << 8) | (uint16_t)lsb;
I leave it up to you to decide whether magic number 8 deserves to be named, but certainly there is no need to mess with number 2 - it is implicit in the code as a natural consequence of *parsing* two consecutive bytes. Obviously, if you parse two bytes, then you are two bytes further down the stream, aren't you? Note also that if you change the protocol to read a full word (4 bytes) instead, there is no need to mess with that number or with sizeof(T). But what is most important, endianness is covered by means of proper computation, so the code will work correctly no matter what the CPU architecture is, without any preprocessor conditional shit. Certainly this is more robust and more portable.
And as a bonus there are no pointer conversions between unrelated types, it is all just, well… parsing.
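Wrapped into a small bounds-checked reader, the same idea might look like the sketch below. The names reader, read_u8 and read_u16_le are made up for illustration and are not taken from any particular library. Little-endian wire order is assumed, as in the snippet above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over a received buffer. */
typedef struct {
    const uint8_t *data;
    size_t size;  /* total number of bytes in data */
    size_t pos;   /* index of the next byte to consume */
} reader;

/* Consumes one byte; returns false if the buffer is exhausted. */
static bool read_u8(reader *r, uint8_t *out)
{
    if (r->pos >= r->size) {
        return false;
    }
    *out = r->data[r->pos];
    r->pos++;
    return true;
}

/* Consumes two bytes (least significant first) and combines them.
   On failure the position is restored, so nothing is half-consumed. */
static bool read_u16_le(reader *r, uint16_t *out)
{
    const size_t start = r->pos;
    uint8_t lsb = 0u;
    uint8_t msb = 0u;

    if (!read_u8(r, &lsb) || !read_u8(r, &msb)) {
        r->pos = start;
        return false;
    }
    *out = (uint16_t)(((uint16_t)msb << 8) | (uint16_t)lsb);
    return true;
}
```

The number 2 never appears: the position advances because two bytes were consumed, and an out-of-bounds read is reported instead of performed.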
If you want to see a complete non-trivial serializer/deserializer written this way, see here:
http://www.inspirel.com/yami4/misra.html
In any case, I recommend a good and established coding standard before doing such work. Low-level programming does not have to mean crappy programming.
Obviously, if you parse two bytes, then you are two bytes further down the stream, aren't you?
Yes, but try reading multiple such fields, then try to pad the whole thing to a 4-byte boundary. Oops, suddenly you need to know how many bytes were written up to now. The stream abstraction is suddenly broken.
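To make the padding case concrete, here is a rough sketch of a writer that tracks how many bytes it has emitted. All names (writer, write_u8, write_u16_le, pad_to_boundary) are invented for illustration, and zero is assumed as the padding byte. Whether a cursor that exposes its position still counts as a stream abstraction is exactly the question raised above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over an output buffer. */
typedef struct {
    uint8_t *data;
    size_t capacity;
    size_t pos;  /* number of bytes written so far */
} writer;

/* Appends one byte; returns false if the buffer is full. */
static bool write_u8(writer *w, uint8_t value)
{
    if (w->pos >= w->capacity) {
        return false;
    }
    w->data[w->pos] = value;
    w->pos++;
    return true;
}

/* Appends a 16-bit value, least significant byte first. */
static bool write_u16_le(writer *w, uint16_t value)
{
    return write_u8(w, (uint8_t)(value & 0xFFu))
        && write_u8(w, (uint8_t)(value >> 8));
}

/* Appends zero bytes until pos is a multiple of boundary.
   boundary must be nonzero. */
static bool pad_to_boundary(writer *w, size_t boundary)
{
    while ((w->pos % boundary) != 0u) {
        if (!write_u8(w, 0u)) {
            return false;
        }
    }
    return true;
}
```

Writing a 16-bit field and a single byte, then calling pad_to_boundary(&w, 4), leaves the position at 4 with one zero byte of padding. The boundary itself is still a number that someone has to write down somewhere.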
Anyway, the whole parser discussion is irrelevant. We are dealing with computers and computers deal with numbers. At some point down the stack you just have to start using numbers (e.g. number 8 in your example). So, when reaching that point, what are you going to do? Use literals, symbolic names or what?
That will depend on the number itself. Some of them are self-descriptive or are heavily coupled to foundational concepts, like 0, 1, 8 (num of bits in a byte), but some are not (2, 100, 1000, etc.). I tend to use literal values for the former and named values for the latter group.
In both cases the idea is that the reader should have no additional questions when reading the code and the meaning of each number (whether named or not) should be clear from the place where it is used.
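A small illustration of that split, with names and values that are purely hypothetical: the shift by 8 stays a literal because it is tied to the number of bits in a byte, while the retry limit gets a name because 100 says nothing on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit: 100 by itself would raise the question "100 what?". */
#define RETRY_LIMIT 100u

/* The literal 8 is left as-is: it is the number of bits in a byte,
   and the surrounding expression makes that obvious. */
static uint16_t combine_bytes(uint8_t msb, uint8_t lsb)
{
    return (uint16_t)(((uint16_t)msb << 8) | (uint16_t)lsb);
}

/* The named value explains itself at the point of use. */
static bool should_retry(uint32_t attempts)
{
    return attempts < RETRY_LIMIT;
}
```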
Different language, same conclusions:
http://wiki.laptop.org/go/Forth_Lesson_6 (see Stylistic Point - Too Many Constants)