The reason all those problems exist in the code you presented is that you are doing things after a callback.
void A::foo ()
{
    int tmp = i;
    b->foo ();
    assert (tmp == i); // Bad practice! There should be nothing here other than "return".
}
So you ask, how do I deal with cases when I do need to do something after a callback?
Push it into a global LIFO callback queue! The LIFO queue is managed by the event loop, and the event loop gives the events in the LIFO queue maximum priority.
NOTE: I use some pseudo-callbacks in this code, real code would use std::function or similar mechanisms.
struct A {
    ...
    // Represents a queued (or not) callback. It can either be in "pushed" or "not pushed" state.
    LifoEvent after_foo_event;
    ...
};
void A::init () // or constructor if you can stand those
{
    // set up for calling our afterFooCallback method
    after_foo_event.init(this, &A::afterFooCallback);
}
void A::deinit () // or destructor if you can stand those
{
    // de-queue the LIFO event in case it was queued, to prevent a crash if A is destroyed from the b->foo() callback!
    after_foo_event.deinit();
}
void A::foo ()
{
    int tmp = i;
    after_foo_event.push(); // push to LIFO event queue
    return b->foo (); // return immediately after callback
}
void A::afterFooCallback ()
{
    // HERE do what you need to do after the b->foo() callback!
    // No state confusion. If the b->foo() callback changed the state of this object,
    // then this was handled by our own methods. From those methods we can even
    // unschedule this callback.
    // This includes B destroying A!
}
Note that the LIFO nature of the queue ensures that the event we push before calling b->foo() is executed not only after the b->foo() callback itself, but after all the LIFO events that it pushes, recursively. The LIFO queue can be viewed as a kind of secondary stack for callbacks. You can even do recursive callbacks via the LIFO queue without overflowing the stack.
If you want to know more about this pattern, check out my SO answer here http://stackoverflow.com/questions/10064229/c-force-stack-unwinding-inside-function/10065950#10065950 . Also, please do note that I'm not just giving you theory here - I've used this pattern extensively in my software for years, with great results.
Note: the LIFO is best implemented as a doubly linked list. A singly linked list or an array won't suffice, because sometimes we want to de-queue already-pushed events.
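For concreteness, here is a minimal sketch of what such a doubly-linked LIFO queue could look like (the names are hypothetical, not the real BPending API):
typedef struct LifoEvent {
    void (*handler) (struct LifoEvent *ev);
    struct LifoEvent *prev, *next;
    int queued;
} LifoEvent;

typedef struct {
    LifoEvent *top; // most recently pushed event
} LifoQueue;

void lifo_push (LifoQueue *q, LifoEvent *ev)
{
    ev->prev = NULL;
    ev->next = q->top;
    if (q->top) q->top->prev = ev;
    q->top = ev;
    ev->queued = 1;
}

// O(1) removal from anywhere in the queue; this is why the doubly
// linked list is needed. Safe to call even if the event isn't queued.
void lifo_remove (LifoQueue *q, LifoEvent *ev)
{
    if (!ev->queued) return;
    if (ev->prev) ev->prev->next = ev->next; else q->top = ev->next;
    if (ev->next) ev->next->prev = ev->prev;
    ev->queued = 0;
}

// Called by the event loop before polling for I/O: drain all queued
// events, always taking the most recently pushed one first.
void lifo_dispatch (LifoQueue *q)
{
    while (q->top) {
        LifoEvent *ev = q->top;
        lifo_remove(q, ev);
        ev->handler(ev); // the handler may push or remove further events
    }
}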
Yes. An event queue is the first step towards fully formalised state machines.
I think you're misunderstanding my design pattern as some kind of formalism you need to deal with on top of the problem you're solving. You don't need to prove any theorems or do anything else formal to use this pattern. The goal is event-driven code that's easier to write and maintain, not an extra burden on development.
You can check out some of my code to get a better idea. This piece of code is especially simple:
https://code.google.com/p/badvpn/source/browse/trunk/ncd/extra/BEventLock.c
It implements an asynchronous "lock". The BEventLock represents a resource, and the BEventLockJob represents an operation that requires exclusive access to a resource. BEventLockJob_Wait() is called to request access. When access is granted, a callback is invoked. Access can either be granted immediately, or can be delayed (put in a queue), depending on whether the resource is currently locked. In the former case, the invocation of the callback is done by pushing to the LIFO (BPending_Set()) directly from _Wait(); in the latter, the pushing is done from BEventLockJob_Release() when the resource is released.
See how, from the perspective of someone who wants to lock the resource, it doesn't look any different. There's no return code indicating whether access was granted immediately or whether a wait is necessary. This avoids a lot of hard-to-reproduce bugs (in the code that would handle the "need to wait" return code, since that path is taken rarely).
And there's no formalism here, just nice and correct code ;) As far as formalism is concerned, the clear separation of event processing into individual LIFO events actually makes proofs about the behavior of code easier, whether they are formal or informal.
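To make this concrete, here is a condensed re-implementation sketch of the same idea (hypothetical names, not the real BEventLock API), reusing the lifo_* queue sketched above. Note that the requester's callback always arrives the same way, via the LIFO, whether access was immediate or delayed:
typedef struct Job {
    LifoEvent granted;       // pushed when access is granted
    struct Job *next_waiter; // intrusive wait-list link
} Job;

typedef struct {
    Job *holder;  // NULL when the resource is free
    Job *waiters; // FIFO list of jobs waiting their turn
} Lock;

void Lock_Wait (LifoQueue *q, Lock *l, Job *j)
{
    j->next_waiter = NULL;
    if (!l->holder) {
        l->holder = j;
        lifo_push(q, &j->granted); // granted immediately, but still async
    } else {
        Job **p = &l->waiters; // append to the end of the wait list
        while (*p) p = &(*p)->next_waiter;
        *p = j;
    }
}

void Lock_Release (LifoQueue *q, Lock *l)
{
    l->holder = l->waiters; // hand the lock to the next waiter, if any
    if (l->holder) {
        l->waiters = l->holder->next_waiter;
        lifo_push(q, &l->holder->granted);
    }
}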
Out of interest, two questions:
1. Why LIFO? Traditionally FIFO is used for event queues.
2. I've never seen de-queueing used in cases like this. Is there any systemic reason for having it there?
I guess we are both speaking about the same thing. You call it LIFO, I call it event queue. The goal is to keep individual "steps" in processing fully atomic.
I agree that using the term "state machine" implies a bit more than an event queue (specifically, an explicit state transition diagram), so forget about what I said about state machines and substitute the term "event queue" instead.
A LIFO because it provides useful guarantees about the order of event processing.
EventA {
    push EventAfterB;
    push EventB;
};
EventAfterB {
    print "This happens not only after B, but also after C!";
};
EventB {
    push EventC;
};
EventC {
    print "EventC";
};
So if we start with EventA pushed, the event queue will change like this:
EventA
EventAfterB EventB
EventAfterB EventC
EventAfterB
[empty, event loop proceeds to poll()]
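For completeness, here is a runnable sketch reproducing that trace, reusing the hypothetical lifo_* queue from earlier:
#include <stdio.h>

static LifoQueue queue;
static LifoEvent event_a, event_after_b, event_b, event_c;

static void on_c (LifoEvent *ev) { printf("EventC\n"); }
static void on_b (LifoEvent *ev) { lifo_push(&queue, &event_c); }
static void on_after_b (LifoEvent *ev) { printf("This happens not only after B, but also after C!\n"); }
static void on_a (LifoEvent *ev)
{
    lifo_push(&queue, &event_after_b);
    lifo_push(&queue, &event_b); // pushed last, so it runs first
}

int main (void)
{
    event_a.handler = on_a;
    event_after_b.handler = on_after_b;
    event_b.handler = on_b;
    event_c.handler = on_c;
    lifo_push(&queue, &event_a);
    lifo_dispatch(&queue); // prints "EventC", then the EventAfterB line
    return 0;
}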
You could claim that a LIFO is bad because you could get caught in an infinite loop of event processing and stop processing new events. That can certainly happen, but the same can happen if you "just do things after calling callbacks", and it is your responsibility to write code that doesn't get caught in such loops (such as counting invocations where there is potential for infinite loops). The advantage of using the LIFO is that such a loop will not crash your program via stack overflow.
So the LIFO is basically just a safe replacement for the usual "do something after calling a callback". Instead of doing it after the callback, you push it before the callback.
De-queuing is very useful when the callback does something that removes the need for the "do after callback" code to run. The best use of it is when the callback destroys the object that is calling it, and the "destructor" of that object simply unqueues the callback (if it didn't, the program would crash when the queued event is dispatched). And this is something you need to do quite often. For example, in a Client you have a Socket, and the Socket calls the Client to tell it that the connection broke, and the Client in turn destroys itself, including the Socket (so the Socket is being destroyed while it's calling into the Client).
"The program will fail trying to access invalid memory location when doing executing=false"
Pretty much the problem I've mentioned above. In the delete, the destructor would de-queue any pending LIFO events, preventing their execution and avoiding a crash.
"To simply make a note that it has to be executed and execute it later on when call to A::foo() exits"
This is the exact reason a LIFO is useful. Pushing A before calling B makes sure that A executes after B is done. There's no need to complicate things with partial orders and a "full-blown state machine approach".
Another thing I noticed about your examples is that the objects (A and B for instance) are coupled to each other. This coupling is one factor that makes the system harder to maintain. You can use runtime dispatch (std::function or virtual methods or function pointers) or C++ templates to decouple them (unless they really have to be coupled).
I beg to differ. With straight functions you can at least check what's going to happen when you invoke it. On the other hand, virtual functions being "abstract" in a sense, you have no idea what the call graph looks like, whether invoking the function can possibly result in a cycle etc.
That's one way to look at it. But I usually design software by breaking problems down into smaller ones, resulting in a "dependency graph" that is a tree. After you define what the behavior of a component needs to be, you can implement and verify it independently of components "above" it. If your dependency graph has cycles, it's harder to look at components in isolation.
Oh, and there's an effect directly relevant to the callback problem. If you have coupling all around your codebase, then the question "will calling this callback call me back" can indeed be a very hard one. But what if you fixed your coupling? Well, if you are the owner of the set of objects {A,B,C}, and you call into A, then you *know* only the callbacks you gave to A could be invoked in response. If B and C have nothing to do with A, they couldn't possibly call you back in response to you calling into A. Not so much if A, B and C were coupled deeply in your circular dependency graph.
"Now imagine that at some point in the future some random developer adds a call from C to D".
In a properly decoupled program you would first have to restructure the code a bit, because C has no way to access and doesn't know about D. The restructuring would very likely reveal the new circular calls.
Is there any specific reason why you think C won't know about D?
"Is there any specific reason why you think C won't know about D?"
In your first call graph C doesn't ever communicate with D. The only way C communicates with other components is by being called by the component above it. So in a properly designed system, C would only be aware of this component above, and maybe not even that, if it doesn't need to call back to it.
Here we come to your previous problem about "Contexts". If you turn objects (like D) into global state, that makes it much easier for C to get access to D, possibly going against the design. On the other hand if you always require a pointer to D in order to access it, the problem is more obvious, because C wouldn't have a pointer to D - it didn't need it, before someone had the idea to make it talk to D.
Right. Avoiding global state would help to some extent.
Still there is another problem: any network stack is entered from two distinct directions. From above (user API, such as BSD sockets) and from below (network interrupts). In the first case the call sequence is, say, TCP->IP->Ethernet. In the latter case it is Ethernet->IP->TCP. That in turn means that there's no way to create a simple one-directional hierarchy of references. The upper layer has to have a pointer to the lower layer and vice versa.
So what? In my (C) code, the upper layer owns the lower layer (so it calls it directly), and the lower layer has callbacks (function pointers) to the upper layer. It looks like this:
typedef struct Socket_s Socket;
typedef void (*Socket_recv_done_handler) (Socket *, size_t length);

struct Socket_s {
    Socket_recv_done_handler recv_done_handler;
    BPending done_pending; // LIFO event
    size_t read_size;
};

static void done_pending_handler (BPending *done_pending); // forward declaration

void Socket_Init (Socket *o, ..., Socket_recv_done_handler recv_done_handler)
{
    o->recv_done_handler = recv_done_handler;
    BPending_Init(&o->done_pending, done_pending_handler);
}

void Socket_Free (Socket *o)
{
    BPending_Free(&o->done_pending); // make sure LIFO event is unqueued
}

void Socket_StartRecv (Socket *o, char *dst, size_t max_length)
{
    ssize_t bytes = recv(... dst, max_length);
    if (bytes < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        // Request file descriptor notification with event loop.
        // Remember dst and max_length.
        return;
    }
    if (bytes <= 0) {
        // Fatal error. Use the LIFO event mechanism to report it.
        // (in real code that would be another function pointer)
        return;
    }
    // We got some data right away.
    // Queue invocation of recv_done_handler via the LIFO mechanism.
    o->read_size = bytes;
    BPending_Set(&o->done_pending);
}

static void done_pending_handler (BPending *done_pending)
{
    Socket *o = container_of(done_pending, Socket, done_pending);
    o->recv_done_handler(o, o->read_size);
}
The lower layer can't just do anything it wants - the only way it can communicate with the upper layer is via defined callbacks, and of course the invocation of the callbacks is subject to contracts. Here, the Socket is only allowed to call the recv_done_handler once after a StartRecv.
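For illustration, a hypothetical usage sketch from the Client's side (eliding the same arguments the example above elides); the Client owns the Socket and only ever hears from it through the handler it installed:
typedef struct {
    Socket socket;
    char buf[4096];
} Client;

static void client_recv_done (Socket *s, size_t length)
{
    Client *c = container_of(s, Client, socket);
    // consume `length` bytes from c->buf, then issue the next read
    Socket_StartRecv(s, c->buf, sizeof(c->buf));
}

void Client_Init (Client *c)
{
    Socket_Init(&c->socket, ..., client_recv_done);
    Socket_StartRecv(&c->socket, c->buf, sizeof(c->buf));
}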
All my C code looks like that, and it's very maintainable.
P.S. When I said the upper layer calls the lower layer directly, that's not always the case. Sometimes thin interface classes using function pointers may be involved, such as the Read/Done interface used here. Because sometimes you don't want to couple your code to a Socket; you want it to work with anything that can receive data.
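Such a thin interface could look like this (a hypothetical sketch):
typedef struct Reader {
    // filled in by the implementation (Socket, pipe, test stub, ...)
    void (*start_recv) (struct Reader *r, char *dst, size_t max_length);
    // filled in by the consumer; invoked via the LIFO queue when data arrives
    void (*recv_done) (struct Reader *r, size_t length);
} Reader;
The consumer calls r->start_recv() and waits for its recv_done callback, never knowing what is actually behind the pointer.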
The problem here is not the direction of function calls, there sure can be cycles in the call graph. What matters more from the design perspective is the dependency graph. My Client class knows about Socket, but the Socket doesn't really know anything about the Client other than the defined callbacks it has provided.
Yes. That's the correct and maintainable approach. But I wouldn't call it callback anymore. It's an event queue.
Depending on the complexity of problem, you may decide to stick with simple event queues, or move further towards fully formalised state machines.
Btw, I don't know much about your solution, but it may be that using LIFO instead of FIFO and dequeuing events is in your case an alternative way to deal with the problems normally addressed by state machines.
Hi Martin !
At the very beginning, you wrote :
void A::foo ()
{
    int tmp = i;
    b->foo ();
    assert (tmp == i);
}
Shouldn't it be :
void A::foo ()
{
    int tmp = i;
    b->baz();
    assert (tmp == i);
}
?
I'm not convinced. You have started with a conclusion (callbacks are evil) and then you have shown some very contrived and unrealistic examples to prove that conclusion.
In a more reasonable scenario there are no callbacks between peer components at the same level of the software stack, but they will rather exist across layers, which makes them easier to manage.
Also, circularity can be approached in other ways than detection - you could anticipate it and execute the callback in the state where subsequent actions on the same component would be perfectly OK and even not distinguishable from first-level calls. This approach was taken in the YAMI4 messaging library and guess what? No bugs in this area since the beginning of the project. So - it can be done properly with proper design in place. There's no hell there.
That's pretty much my opinion when I mentioned that he should decouple the components. You should read my comment about the LIFO queue, it makes callbacks "not distinguishable from first-level calls" easier to implement.
The examples are short to make the post readable.
In the real world the problem happens when a single component is "steered" from multiple directions. A typical example is a network stack. There are calls originating in the user-facing API and there are calls originating from interrupts by networking hardware. The former traverse the layers from top to bottom, the latter from bottom to top. Unless you keep the two call graphs completely separate, it's very easy to form cycles in such an environment.
And yes, the solution you are proposing is an event queue delaying the execution of actions till it's safe to call them. Event queues are a principal component of state machine implementations.
If you have to search for evidence like "ZeroMQ doesn't make it easy to add transports" as proof that callbacks are bad, your argument is broken. The difficulty in adding transports to libzmq stems from poor internal design from the start, resulting in lack of internal abstractions. This was the design you built in there from the very start. Callbacks may be involved but are not the cause. The cause was rather more profound, the notion that a tiny team could accurately predict the needs of a widely-used product.
It is most certainly not due to increasing technical debt. The libzmq engine is getting cleaner and more extensible over time, not worse. For example it now supports multiple protocol versions very nicely.
I agree that callbacks are a pain but you should not use broken argumentation, it doesn't help your overall thesis. Also if you step aside from the "ZeroMQ is crap and Nano is fantastic" theme that seems to drive your thinking, and consider where the design problems in ZeroMQ actually came from (your vision of engineering, largely), you might realize how to save Nano from being an interesting experiment that finally, no-one really uses. I'm surely not the only person who wants Nano to succeed.
What's wrong with the argument? Callbacks cause local changes to have global repercussions. Which is a maintenance nightmare. In the end they are pretty similar to using gotos. I guess Dijkstra explained the problem better than I did.
BTW, your experience with OpenAMQ would be valuable in this context. What are the problems of state-machine-based approach? Can they be solved by introducing callbacks? Etc.
Your argument was (unless I misunderstood) that callbacks caused technical debt, i.e. a build up of bad code, in libzmq that made it impossible to extend six years later. Whereas in fact adding transports was hard from day one, due to the lack of internal abstractions (i.e. internal APIs designed to be extended).
Callbacks can be entirely local, this is how CZMQ/zreactor works (passing object instances around). The resulting code is easier to maintain in some ways but harder to understand because it's fractured.
State machines can provide extreme leverage, but are poorly implemented by most people. And even when well implemented they create a barrier to entry that is the real problem. You can win on engineering but lose on participation. OpenAMQ was a prime case of this.
If you look at how we used e.g. tools like Libero in the past you will see very clean, ultra-maintainable code, all callback based. I've used the same style in FileMQ, partly to show how to use state machines in protocol engines. But it's only maintainable once you've learned the model, and that's a barrier.
Perhaps it's safe to say that callbacks in general lead to arbitrary one-off abstractions that are very hard to learn. It was the same problem with GOTOs. Nothing to do with local vs. global. GOTOs let you create arbitrary structures that follow no patterns at all. Replacing them with WHILE, IF, DO meant we could learn a small fixed set of patterns instead.
The callbacks were there from day one. Interestingly, I opted for callbacks because of the barrier to entry we experienced with OpenAMQ. However, it seems that avoiding state machines altogether is not a good option either. There must be some kind of middle ground. What I was thinking about was that instead of using a special language for state machines (ragel, libero, mbeddr, etc.) we could just explicitly document a small set of rules that the state machines have to adhere to. Anyway, it's hard to say in advance what kind of compromise would work best.
Perhaps part of the barrier to entry with OpenAMQ was the XML/C DSL and code generation it uses, not sure at what version this started, but it looks like it's always used a DSL+CodeGenerator, possibly Libero.
Are you considering a code generator for nanomsg or a spec/api for the state machine?
I've already written it by hand — i.e. no code generation.
Although it means a lot of boilerplate code in the codebase (yuck!), on the other hand it allows any developer to peek directly at the source code and understand what's going on without having to learn a new language.
I took a look at the state machine in cipc.c on the aio2 branch; it's very clean and the transitions are easy to understand. I prefer the boilerplate to code-gen or macros any day.
One thing I did find a little confusing was the nested "switch (type)"; it looks like type is the state of another state machine, "switch (sock->state)".
There is also the possibility of using a state transition table and/or function lookup tables to remove some of the boilerplate. With this approach your default: handles all the common cases, like making a call and setting the next state, while more complex cases get their own case statement ("case NN_COMPLICATED_STATE:"). If you can make the lookup arrays easy to read and maintain, this could be a win. You could even case the boilerplate states and let them all fall through to the boilerplate lookup->call->set case, and still keep the default for trapping invalid states. Lookup tables can also eliminate a lot of branching.
In even more complex state transitions a stack can be used to pass information between nested states, not sure that applies here, but it's useful in lex/parse.
Another nice advantage to the state machine approach is debugging, just log the states or even keep the last n states in a ring buffer.
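A minimal sketch of that debugging aid (illustrative names, not from the nanomsg codebase):
#define STATE_HISTORY 16

struct state_log {
    int states[STATE_HISTORY];
    unsigned pos;
};

// Record every transition; the buffer always holds the last 16 states.
void state_log_record (struct state_log *log, int state)
{
    log->states[log->pos++ % STATE_HISTORY] = state;
}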
I don't see any barrier to entry here; all the states and transitions are clearly defined in a hundred lines of very readable code. If you need any clarification, email or reply. I have built quite a few DFSMs.
Several good points here. Let me comment on them one by one.
I don't feel good about the name 'type' myself. It's not another state. It's the type of the event being processed. So a single source object can emit different kinds of events; say, a socket can emit SENT and RECEIVED events. The 'type' argument is used to distinguish between the two. Suggestions for a better-fitting name are welcome.
As for state transition tables, I have deliberately not used them. The rationale is the same as the rationale for not using code generation: state transition tables make the code very hard to follow. Instead, I've opted for a single transition function with three nested levels of switches (in this order): the state, the source of the event, the type of the event. That makes it relatively easy to find your way through the state machine by simply scrolling the source code up and down.
As for nested state machines, I actually use them. For example, when the cipc state machine is in ACTIVE state it hands the execution to the sipc sub-state-machine. When the sipc state machine terminates it hands the execution back to the cipc state machine. The same trick can be applied recursively, getting an arbitrary stack depth.
The debug log is a pretty neat idea. It would be worth implementing, I guess.
Great to hear you don't see any barrier to entry. That's what I was trying to achieve. Typically, when code generation or macros are used, people feel there's a barrier to overcome.
A source (usock) initiates an event (NN_USOCK_CONNECTED) on the target (cipc). The target state machine looks at the current state, looks at the source of the event and then acts on a valid event. target->state->source->event
If this is correct you might consider a convention such as NN_USOCK_EVENT_CONNECTED, and it might even be useful to distinguish between an action and an event, where an action ACTION_CONNECT is initiated and EVENT_CONNECTED moves the machine into STATE_CONNECTED; action and event would still reside at the same level in the state machine.
case NN_CIPC_STATE_CONNECTING:
    if (source == &cipc->usock) {
        switch (type) {
        case NN_USOCK_CONNECTED:
            nn_sipc_start (&cipc->sipc, &cipc->usock);
            cipc->state = NN_CIPC_STATE_ACTIVE;
            return;
        case NN_USOCK_ERROR:
            nn_usock_stop (&cipc->usock);
            cipc->state = NN_CIPC_STATE_STOPPING_USOCK;
            return;
        default:
            nn_assert (0);
        }
    }
    nn_assert (0);
Regarding the nested state machines:
"For example, when the cipc state machine is in ACTIVE state it hands the execution to the sipc sub-state-machine."
I could not find the hand-off in cipc.c's NN_CIPC_STATE_ACTIVE; can you point me to the src?
Good suggestion about distinguishing the events and the actions. Currently, events have no specific prefix and actions have, confusingly, the prefix _EVENT_. Let me fix that.
As for handing control from cipc to sipc, it's done here:
nn_sipc_start (&cipc->sipc, &cipc->usock);
What happens is that sipc object takes ownership of the usock object and redirects any events from it to itself. When the connection breaks, it hands ownership of the usock back to cipc object.
One unrelated thought: I am considering adding one parameter to the handler function. Currently it is (target, source_ptr, event_type). I am thinking of extending it to (target, source_type, source_ptr, event_type). The problem it solves is when the state machine owns an unlimited number of source objects, for example a list of sockets. In such a case, checking that source_ptr is one of the objects in the list would require list traversal (an O(n) operation). Any thoughts about that?
Martin, why does the concept of an FSM exist in your code at all? For example, in cipc.c, you do nn_fsm_init_root and pass nn_cipc_handler to that, and by passing that to other inits you set up nn_cipc_handler to handle events from three different sources. Why not just have three callback functions? What benefit does this indirection provide?
Even more, I would use different handlers for different event types, instead of using just one handler per event source. This way you can add event-specific arguments to the handler.
Regarding adding source_type: without knowing more about the number of source_types and whether they could be organized into groups based on behavior, here are some thoughts…
Reserve a few bytes at the beginning of the source object, the owner could then attach the type once before the state machine executes. This could also be used for other data such as the owner setting a flag on the source.
Create a state machine for a single type in the O(n) cases and have the source call into the handler for that machine, for example nn_cipc_handler_sipc(…). If possible it would be nice to eliminate the need for "if (source == &cipc->sipc)" without overcomplicating the state machine.
If source_type cannot be avoided and the owner cannot write to the source object even one time during the wiring/initialization phase, consider some type of context object that the owner hands to the source that the source must pass with each handler call, that context object could contain the source_type, source_ptr etc.
Reserving a few bytes in the source seems the most appealing. You could have an issue where a source object might have more than one owner; then you would have to consider how many owners a source object could have, but I could still see this being workable.
If you can explain why some of these options won't work it might provide some more insight.
@Ambroz: The reason is to keep the hierarchy of the handler this way: state=>source=>event_type.
If you have different handler functions for different events the code will be structured (presumably) like this: event_type=>source=>state. That kind of code is extremely hard to follow.
@mike: As for nn_cipc_handler_sipc(…), I don't like it for the same reason stated above: having multiple handlers makes code hard to follow (i.e. code belonging to a single state is suddenly scattered among different places in the source file).
"Reserving the few bytes" thing seems more resonable. I thought of just reserving a single integer. You would initialise it when intialising the source and get it back once the source fires an event.
For example:
nn_timer_init (&self->retry_timer, NN_REQ_RETRY_TIMER);
And then, in the handler:
if (source_type == NN_REQ_RETRY_TIMER && event_type == NN_TIMER_TIMEOUT) {
    timer = (struct nn_timer*) source;
    ….
}
Four bytes seems reasonable due to alignment; you could even just use, for example, one byte if there would never be more than 255 types, or two bytes etc., and still have a couple of free bytes for edge cases or anything else.
I'm not familiar enough with the code base and how initialization works yet, so put another way, the requirement could be that the first four bytes of any object participating in the state machine belong to the owner so I think we are on the same page.
Can a source object ever have more than one owner?
Sure. But there's already a base class for all state machines (nn_fsm). Storing the integer there seems to be a cleaner solution.
"Can a source object ever have more than one owner?"
No. Not allowed.
It's actually a pretty crucial requirement. The idea is to have objects arranged in a tree (thus one owner for each one). Then we have two ways of communication:
- Down the tree (direction from root to leaves): this is implemented as simple function calls
- Up the tree (direction from leaves to root): this is implemented as state machine events
That makes interactions between the components relatively easy to grasp. With multiple owners (i.e. a graph instead of a tree) it would be much harder.
I was looking over some of the new state machine code and wanted to make a clarification on the naming. Keep in mind I'm not a network programmer so these might not be the best examples.
STATE:
The internal state of the machine
NN_USOCK_STATE_CONNECTED
ACTION:
A command placing the machine in a new state and possibly raising an event
NN_USOCK_ACTION_CONNECT
EVENT:
Raised when the machine enters a new state
NN_USOCK_EVENT_CONNECTED
The distinction between ACTION and EVENT may or may not be necessary; I don't yet know enough about the internals, and you might just be using STATE instead of having an EVENT fire. An event, I believe, is your up-stream function call.
I hope this makes sense.
Yes.
The only thing different is that EVENTS have no EVENT prefix, i.e. NN_USOCK_CONNECTED rather than NN_USOCK_EVENT_CONNECTED. The reason is that outgoing events are the only entities visible to the user of the object — states and actions are private to the state machine. Thus, as far as the state machine API goes, an EVENT prefix would serve no meaningful purpose and just make the identifiers longer.
(This comment talks about callbacks among separate processes)
I can totally see the point of callback hell. Just last week I made an audit of our complete codebase to search for cycles. In my case it was a combination of HTTP calls and ZeroMQ REQ/REP. Found one. It can come back any time. And I still have it easy, since I have complete control over the codebase. Imagine you have to prepare for situations where you don't have control over parts of the system, as I suppose nanomsg doesn't.
Anyway, if I had a list of communication protocols (state machines) that is explicit and testable I would feel much better. Especially the internal HTTP calls are dangerous, since I don't control the URL. Stable connections from A to B are auditable. HTTP calls are not.
Agreed. State machines would alleviate the problem. However, it's hard to enforce consistent usage of state machines, especially if the development team is large and distributed. There are no widely used software tools to enforce the rules (except maybe for Erlang) and after all, it takes just one person to screw it up.
So, Martin, can you provide a reference or example of good (and/or bad) state machines implementations?
I thought I knew what I was doing with them. Now I'm not so sure.
I don't really see a problem with state machines per se. Consistent usage of rules and control of source code are problems with software development in general.
Nice one Ambroz, really enjoyed reading your comments!
Callbacks and their after-effects are a common problem, and their separation is essential for any reactive system.
And that was one of the clearest explanations of LIFO in this case.
Thanks, sla!
"And that was one of the clearest explanations of LIFO in this case."
Have you seen any other project using a LIFO event queue like that? For all I know, I'm the first one to invent that design pattern ;)
Funny how this stuff goes around and around to be reinvented over and over. 20 years ago, I implemented a TCP stack and rapidly came to the conclusion it needed to be an FSM if it was going to be maintainable and extensible.
I was fascinated by the discussion of events versus 'safe' callback handling. The former puts event handlers in a (single) queue; the latter puts the event handlers in two distinct queues: the LIFO and the stack. They are both fundamentally the same paradigm; they differ only in how the queue(s) is (are) managed.
I would argue that the 'safe' callback method is actually more dangerous for two reasons:
a) because it splits event handlers between the LIFO and the stack, and
b) because it depends on a high level of dev discipline to keep track.
Point (b) violates the fundamental rule that, despite all best intentions, everyone is fallible.
I have not reviewed the code, but you will find that the 'cleaner' (i.e. cleanly abstracted) your event paradigm is, the easier it will be for devs to use the pattern, and thus they will be less likely to spend energy implementing something else. 'Clean' abstraction means easy setup, simple APIs, transparent architecture, event queue management (typically at least 3 levels) and queue visibility (debugging and system handling).
I have since taken the (event) paradigm to the ultimate conclusion of eliminating the need for threads (RTOS) altogether in many systems.
fwiw…
+1 for not using threads at all.
I wonder how the stuff even got so widespread in the first place. We had a perfectly viable model (processes & pipes) even before threads, so there was no obvious reason... except maybe that, when you are coding a boring CRUD application, using threads generates a lot of fun. Not even speaking of job security :)