Hard Things in Computer Science: Naming things
Previous: A Microstory
- In natural languages we use an existing dictionary to express our ideas. We never invent new words. That makes it easy for the listener to understand what we are saying.
- In programming languages we are inventing new names all the time. To solve a problem you invent a new language, then use that language to describe the solution. Often this is done in multiple layers: Language A is constructed to describe language B, which in turn describes the solution. This makes it super hard for another person to understand a program.
- Natural languages have tens of thousands of words, which we learn as kids when the brain is still malleable. Learning a new language at a later age is extremely hard. We can't mimic natural languages in computer science unless we are able to reduce the number of words to a manageable number.
- Enter Semitic roots. Almost every Semitic word is based on a root of three consonants. The root conveys the basic meaning. So, for instance, in Arabic, the root KTB has to do with writing. Then there are different "augmentations" of the root. Type I "kataba" means "to write". Type II "kattaba" means "make someone write something". Type III "kaataba" means "correspond with someone". Type IV "aktaba" means "dictate". And so on. More examples can be found here. Each of these types can also be changed to its passive version. Also, for each type there are derived nouns. From nouns you can derive adjectives. From verbs you can derive adverbs. Even prepositions are mostly derived from the three-letter roots.
- Apply the above to programming languages. With a standardized system of name derivations one would, when writing a program, have to invent only the names corresponding to the core concepts of the problem domain. All the other names (function names, object names, argument names, and so on) could be derived from those in a relatively deterministic manner.
- When reading a program you would have to internalize the core concepts, say two or three of them, but after that you would be able to read the program without having to figure out what individual internally-used names mean. No more "How the hell does 'parse' function differ from 'parse2' function?"
- It should be said that this is already used to some small extent. 'parse' is a function, 'parser' is an object. The relationship is relatively clear. Unfortunately, the dictionaries of programs are typically based on English, which is not very good at forming derived words.
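To make the idea a bit more concrete, here is a minimal sketch in Python of what deterministic name derivation could look like. Everything in it is an assumption made for illustration: the `derive` helper, the four form names and the English suffixes are hypothetical, not an existing standard or library.

```python
# Hypothetical sketch: derive program names from a single root concept.
# The form names and suffixes below are illustrative assumptions only.

def _join(root: str, suffix: str) -> str:
    """Attach a suffix, dropping a trailing 'e' before a vowel-initial suffix."""
    if root.endswith("e") and suffix[:1] in "aeiou":
        root = root[:-1]
    return root + suffix


# Each "form" plays the role of a Semitic augmentation: same root, new role.
FORMS = {
    "action": lambda root: root,                            # function name
    "agent": lambda root: _join(root, "er").capitalize(),   # class name
    "result": lambda root: _join(root, "ed"),               # name of the output
    "input": lambda root: _join(root, "able"),              # name of the argument
}


def derive(root: str, form: str) -> str:
    """Return the name for the given root in the given form."""
    return FORMS[form](root)


if __name__ == "__main__":
    for form in FORMS:
        print(form, "->", derive("parse", form))
```

With the root "parse" this yields `parse`, `Parser`, `parsed` and `parsable`. With the root "send" the same rules yield `sended` for the result, which shows how irregular English morphology gets in the way of mechanical derivation.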
Martin Sústrik, October 30th, 2017
"Learning a new language at later age is extremely hard"
I'm not convinced. It really doesn't seem very difficult. There's a lot of it, but it's not difficult.
The naming of things is super important. It was one of the main theses of John Day's book "Patterns in Network Architecture: A Return to Fundamentals." It's priced like a college textbook but well worth the money. It changed how I think of networks and protocols.
Essentially, the name of something should map to its identity, which maps to its location (address), which maps to the underlying routing to get you there. From this architecture, multi-homing and mobility fit quite naturally. With a better understanding of these fundamentals we wouldn't have many of the problems we currently have with IPv4, IPv6, Ethernet broadcast domains, DHCP, security, DNS, etc.
An awesome read.
P.S. Really enjoy the blog!
Yes, I have the book in my library :)
Oh, it's fantastic how it reorganizes your thoughts about layers. Except, it does ruin your life because now everything sucks. IPv4 is a mess. IPv6 is a mess. UDP and TCP are a mess. DHCP is a mess. Ethernet is a mess! All the minor protocols that we don't really have to deal with! :-)
I'll just leave this here:
https://en.wikipedia.org/wiki/Hungarian_notation
In practice, though, I've always seen Hungarian notation degenerate into type annotations. (Even in statically-typed languages. Yuck!) Even in the Wikipedia article, if you look at the examples section, it's all about types. Simonyi's original idea of semantic annotations very much failed. No idea what lesson there is to learn from that.
A commenter in the lobste.rs thread gives a good example: In Java, if you see an entity called FooFactory, it's likely to be an object factory that produces objects of type Foo. In that case, the -Factory suffix has acquired a pretty strict semantic meaning and, importantly, the understanding of the suffix is shared by the entire Java community.
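As a rough sketch of why such a shared suffix is useful, the relationship between the two names can be recovered mechanically. The example is in Python rather than Java, and `Foo`, `FooFactory` and `product_name` are hypothetical names made up for illustration.

```python
# Hypothetical sketch of the "-Factory" convention: the factory's name
# is derived from the name of the type it produces.

class Foo:
    def __init__(self, label: str) -> None:
        self.label = label


class FooFactory:
    """By convention, produces objects of type Foo."""

    def create(self, label: str) -> Foo:
        return Foo(label)


def product_name(factory_cls: type) -> str:
    """Recover the product type's name from a factory class name."""
    suffix = "Factory"
    name = factory_cls.__name__
    if not name.endswith(suffix) or name == suffix:
        raise ValueError(f"{name} does not follow the -Factory convention")
    return name[: -len(suffix)]


if __name__ == "__main__":
    print(product_name(FooFactory))
```

A reader who knows the convention does the same thing in their head: seeing `FooFactory`, they expect `Foo` objects to come out of it without reading the implementation.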