This isn't so much a design flaw in C++ as STL. Using templates to embed the data type in the nodes (creating an intrusive style list) removes the problem of allocation count and allows the address of the data to be turned into a node address to fix up pointers without any fancy iterator stuff. e.g. template < class DataType > class ListNode { ListNode* next; ListNode* prev; DataType data; /* … */ };
No it's not. It's an STL compromise, where the simplest possible case becomes slightly more complex and inefficient in C++ versus a custom-coded C solution. And for a linked-list this works.
For a non-trivial datastructure, like any kind of binary tree, a map, or even a simple vector, I would argue that C++ gains the advantage.
Plus look at maintainability afterwards. Modifying List<Person*> into Vector<Person*> in C++ … that's 5 minutes work, tops. In C you'll be coding all afternoon to make this change unless you're extremely practiced, and you still won't get it in 5 minutes. Making a red-black tree out of it in C is very likely to contain at least one hard-to-find bug, whilst for C++ you'll be done in minutes.
Macros can solve this, but that's like smearing yourself with honey and diving butt-naked in an ant's nest.
I just realized that the "standard" linked list implementation in C : glib's double linked list (google for the docs, I can't post links) … does exactly the same thing as the C++ implementation, losing type safety. Glib uses a helper object with a void* pointer in there.
GLib basically just craps itself when it runs of memory, making it laughable from the perspective of resilient code. Their attitude is:
"If any call to allocate memory fails, the application is terminated. This also means that there is no need to check if the call succeeded."
"no need" - OMG. So, every single program using GLib can fail at any time, either through its own stupidity or any *other* program's attempt to grab too much RAM. The irony is that the other program may not be using GLib, and could easily survive and go on to use the memory freed up by the GLib program imploding. Which offers an easy to to get more RAM - grab all that's left and wait for a bit voila! More memory.
GLib's memory exhaustion handling is a joke.
There's nothing else they can do. Modern OSes allow for memory overcommitment anyway, which means that allocation may succeed even if there isn't sufficient memory available.
Even if that wasn't the case, what would you do if out-of-memory condition happened? Any attempt to recover may run out of memory itself. For example, years ago I've seen closing a file fail in low-memory conditions.
Additionally, when you are running out of memory, OOM killer is probably already in progress and it may very well choose your application to be killed.
In short, memory allocation errors are the primary candidates for "fail fast" strategy.
This is a practical work-around, not a theoretical solution, but FWIW, you could allocate a "spare" memory pool you can borrow from when memory exhaustion becomes a problem.
It will not protect against all conditions, especially not a memory leak in another running process, but it could keep things going long enough to print dire warnings to the console, stderr, etc, and optimistically, allow your app to survive long enough to realloc() to restore the size of the "spare" memory pool.
What it will do is give you something useful to do to recover from malloc() when it returns a null - namely realloc() "spare" to something smaller to free up the memory you need. Hopefully.
To cut reasoning short, if you managed in your C++ part of the code achieve O(N) where O(1) is expected, you got something badly wrong. Correct your design, and for free you will get rid of the nonsense "not-really-C++" code above.
As a writer of low-level C++ code which often avoids std:: containers in places where it matters, I think the problem is somewhat correctable. Boost (and I don't count my self as a big fan) intrusive attempts to solve this problem by allowing some fields into your structures. My attempts to do this (yes, involving templates) are often less sophisticated than boost intrusive, but perhaps a little closer to the metal and simpler.
Often complaints about a language really boil down to complaints about the commonly used libraries in that language. I think its fair to say that std::list is a mainstream choice for solving the problem. But if you really care about counting your allocations and using thread aware allocators, memory pools, and optimization — looking beyond the mainstream seems necessary. In my opinion C++ is not the problem, its the libraries. If the libraries for language are awful, that is a great reason to not use the language (since we don't have time to contribute libraries which fix the problem — right?).
Quite honestly, std::list should be avoided in most code, as other containers will usually outperform it nowadays in all aspects, and even when it doesn't (like when coding a networking library, or applications requiring heavy I/O), then you'd be better off using your own linked list.
Conclusion… never use std::list. But to go as backwards as to use C… I think it's taking it a bit far.
The problem is not having used C++ instead of C, but having used some containers when they weren't appropriate. The main difference between your two approaches here are intrusive vs non intrusive containers, and that doesn't depend on the language.
In your C case, you built an intrusive list. If you build it like that, you'll have to reimplement a list every time you want to make a list of something other than a person.
In your C++ example you used a non-intrusive list. They have their advantages (if you want to use the same pointers to people in multiple lists, it's easier to do so in a non-intrusive list.
You could have used an intrusive list in C++ (see boost for an implementation if you do not want to roll your own), and with the help of templates you could have an intrusive implementation that wouldn't have to be rewritten for each listable class. So using C++ would have been an actual advantage, while maintaining the performance/memory usage of your C example.
The mere fact that std::list exists makes it easier for one to use it for all cases, and it leads to possible wrong design decisions such as the one you pointed in the article. They are appropriate for some use cases, and inappropriate for others. So I wouldn't blame it on C++, not in this case.
my crystal ball says : in 99% of the cases using std::list for making arguments about C++ vs C is trolling or ignorance.
llvm dot org/docs/ProgrammersManual.html#dss_list
But as a C++ dev I can give you examples of C++ sucking compared to c:
vector<vector<» :…
The solution to the first problem is to use a list of Persons, not Person pointers (i.e. std::list<Person>). Then the template-generated linked list node will look something like this:
struct Node {
Person person;
Node* prev;
Node* next;
};
No double memory allocation. In fact, it's almost exactly like the "private in X" construct that you're proposing.
Then, to refer to list elements, use a std::list<Person>::iterator instead of a Person*. You can pass this to the list erase function for constant-time erasure. If you think about it, an iterator is equivalent to a pointer (* and -> get you a Person), but it also lets the implementation access the linked list node.
Thanks. I've already edited the text to make it clear that "person" is non-assignable.
As for using iterators instead of pointers, imagine the object is contained in two lists. You would have to pass pair of iterators around. If it's contained in 10 lists, you would have to pass structure containing 10 iterators around. And you are back to encapsulation problem: For every new list you would have to adjust the "iterator tuple" object.
Yep. That's the way it is done in Linux kernel for example.
Yes, I think it was definitely a defect in the STL that list elements had to be assignable. They could have had a version of push_back() that took no arguments and instead default constructed a new element (then you could use back() to get to it) but they didn't.
C++11 fixed this by adding emplace_back(), which also uses new C++11 features to let it take any number of arguments which are forwarded to the element's constructor. Given how new C++11 is, I can understand why this might not be an option for most people.
Regarding the iterators, I don't think the C version would have better encapsulation - the encapsulation problem is just elsewhere. I suppose it's a matter of taste where you'd rather have that problem.
The "private in X" idea could solve the encapsulation problem, I suppose, but only if you made a distinct type for each of the 10 lists which a person could be part of - otherwise the compiler wouldn't know to add 10 prev/next pointers.
And that's the cause in most cases, isn't it? Having same object in the same list twice is pretty rare.
I'm talking types, not instances of a type. Having the same object twice in the same list instance is rare, but if you have the same object in 10 different list instances, what would the type of each list instance be? You probably want it to be people in every case, but then the compiler would add only one next/prev pointer to the person type, which isn't enough. There's no way for the compiler to know that at runtime the person will be stored on 10 different lists.
I'm not sure there's a good way to store an element on multiple lists without breaking encapsulation somewhere.
Ah. I see. What we are representing then is an relation between two types of entities. Some kind of 2D bitmap or similar would be more appropriate then rather then a set of lists.
This is exactly the point of the Boost "Intrusive" containers, which are both delicious and exceedingly simple to use. Check them out!
Yes. They break the encapsulation principle as well. My point was that in C++ you can't have an "intrusive" container and proper encapsulation at the same time.
What you're talking about is containment, and it's entirely possible to implement containment without breaking encapsulation, but your code looks more like composition.
You and other parties don't have to know about each others' implementations if you agree that containment might be important and agree on a containment protocol.
Read the GoF! ;)
If you believe it's possible, please post a short code snippet of person & people classes that doesn't break the encapsulation.
Listen, if you're going to write C++ in the manner one would write C, you're going to run into these problems. C and C++, though they share similar syntax and semantics, are fundamentally different languages. You appear to be writing C++ as one would write C, and this appears to have caused you an unnecessary amount of grief.
Excessive allocations for a list of people.
Simply put, a sane C++ programmer would simply use a `list<Person>` instead of a `list<Person*>`. This yields a memory layout that is almost exactly equivalent to the C version. Dereferencing an iterator compiles down to an add instruction.
Element removal from a list is an O(n) operation
Yeah, I can do it incorrectly in C as well.
In C++, you should iterate through lists using the appropriate const-ness iterator, e.g.
for (ListT::const_iterator i = list.begin() ; i != list.end() ; ++i) {
…
}
You can then use ‘list.remove(i)` to remove a specific element in constant time. This is equivalent to your C code. You shouldn’t be wrapping it up into a function which discards all context (and if you must remove elements by-value, at least use std::list<>::remove).
Yes, this means you cannot wrap your C++ into C-like interfaces and expect good performance. You can't wrap your C into Python-like interfaces and expect good performance either.
To conclude, ZeroMQ shouldn't have been written in C++, not because of the performance characteristics of the language, but because the author refuses to write C++ in a performant manner. If you're going to write in C, by all means, actually use C.
The complexity of std::list<>::remove is not constant, but linear in container size (comparisons).
Linked lists almost always suck compared to vectors because the way they lay out memory causes a lot of cache misses and because iterating through them is so expensive.
http://www.youtube.com/watch?v=YQs6IC-vgmo
Yes. But vectors are pretty bad a resizing. It's a trade-off.
It sounds like you're pretty much against C++ regardless of any actual corrections to misunderstandings you may have about it (C++11 fixes quite a few defects). I agree that it's a complicated language and it's easier to shoot yourself in the foot than with other languages (however this is even more true with C).
First, I'm not sure what you mean by bad at resizing. What particularly about it is bad?
Re vector vs list, cache misses are a drastically significantly more important performance impact than anything else these days & you're guaranteed them when using a list unless you do lots of jumping through hoops to allocate everything in a contiguous array (but then why not vector?).
I highly recommend you watch http://channel9.msdn.com/Events/Build/2014/2-661 (the performance stuff starts roughly around the 20 minute mark) & is language agnostic (true for C++/C/Java, etc).
Additionally, in C++11, if you declare your move constructors noexcept (or use the default one or omit destructor, move/copy contructors & move/copy assignment operators), then resizes of vectors can be much faster (if the move constructor is faster than the copy one) since the container doesn't need to copy your objects, it'll just move them.
C++11 lets you also express your non-assignable types without resorting to pointers by declaring the type to be move only (thus Person can live on the stack or moved to live in the heap):
class Person {
public:
~Person();
Person(const Person& o) = delete;
Person(Person&& o) noexcept
: // initialize member variables with the resources from o
{
// modify o so that it no longer owns any resources
// but leave it in a destructible state
}
Person& operator=(const Person&) = delete;
Person& operator=(Person&& o) noexcept
{
Person tmp(std::move(o));
swap(*this, tmp);
return *this;
}
private:
friend void swap(Person&p1, Person& p2) noexcept;
};
Additionally, it's unlikely that Person would actually own anything. Instead you can omit all the constructors:
class Person {
public:
Person();
// don't specify destructor or move/copy constructors/assignment operators
// & this type is implicitly move-only due to _data being move-only, it's exception-safe
// & everything is automatically annotated noexcept for you
private:
std::unique_ptr<PersonData> _data;
};
This post again shows that you are the problem, not the language. Why are you using "std::list <person*> people;" instead of much simpler "std::list <person> people;"? If you are worried about construction cost, use emplace_back.
You should rather:
std::list <person> people;
The equivalent of the pointer in your C program is an iterator std::list<person>::iterator which *does* allow erasing in O(1). What you don't show in your C code is how you got the pointer to the element in the first place, traversing the list?
Now you get the same or better performance with more safety/higher level of abstraction.
Your cpp and c snippets not equivallent. Results of using them are different in case when list contains same pointer twice(for whatever reason).
This statement is incorrect: In C++ solution erasing an object from the list has O(n) complexity
This one correct: In C++ solution erasing an object from the list has O(n) complexity by object value
In C++ solution you should use iterator, not pointer. Erasing an object from list has O(1) complexity by iterator
P.S. you have an error in C++ snippet. people.erase(it) method invalidates iterator, so ++it it is undefined behavior
Nope, but it looks like this:
struct person
{
struct person *list1_prev;
struct person *list1_next;
struct person *list2_prev;
struct person *list2_next;
int age;
int eight;
}
1. That multiply-linked list element is truly horrible and non-scalable, if you need to later add another list, you're going to have to keep adding next/prev pairs.
2. If you've got an object in 10 lists, why on earth are you iterating all of them together? That sounds horribly like a contrived reply to valid criticisms.
I come away with the feeling that you've not invested the time to learn C++ design properly, and then blame the language for your horrible implementation.
1. is the point of the article. Due to the described encapsulation problem there's no way to scale the solution in C++.
The "private in" construct would solve that. I am also said that Scala has a similar construct.
sure you can keep encapsulation, write idiomatic C++ and not be left with the overhead of any superfluous run-time overhead:
Find code on codepad (dot) org (slash) MqAloPgJ
and whats more you get list elements that remove themselves when they are deleted resulting in cleaner, safer code.
From the article: "The real reason why any C++ programmer won't design the list in the C way is that the design breaks the encapsulation principle: The implementer of the "person" class has to know that its instances will be eventually stored in "people" list. Moreover, if a 3rd party chooses to store it in a different list, the implementation of the person would have to be change. This is an anti-pattern that OO programmers learned to avoid."
See, the object needs to know that it will be part of the list:
class important_object: public list_node<important_object>
I was wary about using the word "encapsulation" to describe the OO-ness that the code preserves.
The point is, you can write an intrusive list that is not intrusive in your client code. Yes you have to inherit from the node class but (my implementation is not full, but Google it someone has probably done it), but the important point is if you were to need to change the design and re-factor your code it would be relatively easy, depending what you are doing of course you will probably only have to touch the class definitions not the client code. It does not "degenerate to C style"
Talking about what "a C++ programmer" would do is really a straw man argument. The point is C++ generics provide the tools to represent any abstraction you might think of and do so without any run-time overhead. Yes the implementation of the patterns can look pretty scary (in large part to the verbose template syntax) but usage usually has acceptable to good syntax.
I have been reading a lot of language specifications and papers on different type systems and programming paradigms recently and the biggest surprise that I have had from the whole project is that it turns out most of the ideas can be implemented in a usable way as a C++ library. Think very carefully before you say something can't be done.
Finally if you have a case where your code is easier to read and maintain and more efficient in C style then do it, it certainly doesn't mean that your whole project has to "degenerate".
Why do I care enough to write this?
answer 1: C++ provides the tools to let you deal with abstractions at compile-time, C doesn't. In C you must therefore either do without the abstraction or use run-time abstractions leading to just the sort of indirection this article seeks to avoid.
answer 2: I'm a geek
answer 3: someone is wrong on the internet ;¬)
PS I have very much enjoyed using ZeroMQ
:)
Ok, I guess you are into a more theoretical discussion. Here's my take then:
What OOP delivers is a way to decompose the universe into well-delimited objects. The appeal is that human brain is wired to deal with well-delimited objects and thus OOP allows common sense to kick-in (rather than pseudo-scientific reasoning required by non-object-oriented code).
In other words, if there's a box of chocolates, common sense assumes that these are two well-delimited objects and there's nothing chocolate-like about the box and nothing box-like about the chocolates.
So far so good. There's nothing wrong about OOP paradigm.
Enter C++. C++ have evolved from C and re-used C "struct" construct to define classes. What that means is that "well-delimited object" maps to "continuous chunk of memory" in C++.
This tight coupling between "object" and "memory allocation" works well in most cases, however, in few special cases it goes astray. When thinking about lists, the only way to separate box properties from chocolate properties is to allocate 2 memory chunks instead of a single one per chocolate in the box. That's basically what STL does.
Intrusive containers merge the two chunks into a single one, but then some of the properties of the box (prev,next) leak into chocolate object. Which, of course, violates the basic promise of the OOP paradigm (well-delimited objects).
I would say there's no way out, unless the language doesn't enforce tight coupling between objects and memory allocations.
"but then some of the properties of the box (prev,next) leak into chocolate object"
Intrusive containers' helper objects (the equivalent to my list_node) can quite easily make their methods private but share them by friendship with the list and iterator classes so that the appear from the outside as a normal container class. say "intrusive but not obtrusive"
"I would say there's no way out, unless the language doesn't enforce tight coupling between objects and memory allocations."
Not necessarily but it is the easiest way of telling the compiler how the data should be laid-out. If you break the coupling between object and memory structure you will end up having to look up every property like Ruby and its dynamic bretheran, and that is a _lot_ slower.
Could you point me to the actual code where this is a problem. Talking in the abstract and with toy examples can end up going 'round in circles.
I wager I can find a C++ solution that will deliver better encapsulation, making your code safer, more expressive and easier to maintain, without compromising runtime performance when compared to the bare C solution you describe. And that even in this case to which you feel C++ is particularly unsuited.
I say this with all humility, knowing I am not an equal to you as a software engineer.
Ok, let's go for it! :)
My problem is as follows:
I have many classes in the project, each has many members. I have also few mutexes. Each mutex synchronises some (but not all) members in some classes. Say mutex 1 synchronises A::a, A::b, B::i and C::x, while mutex 2 synchronises A::c and C::y etc.
Now, the challenge is:
- How to keep track of which mutex synchronises which variables
- If possible, make compile complain when synchronisation is done incorrectly
"can quite easily make their methods private but share them by friendship" — I've actually written a blog post about this: http://www.250bpm.com/blog:9
"My problem is as follows:…."
Can I see the actual code? I searched zeroMQ for use of std:list and came up empty handed.
"I've actually written a blog post about this"
Something like that but the next and previous pointers in the helper class need to point to person objects because you can't get to the person class from the helper class and all you are left with is a linked list of helpers (I guess strictly you could probably do some black magic where you find the offset of the helper data I'm assuming that's not what you wanted to do).
0MQ code is to convoluted to be discussed here, but try adding proper encapsulation to this:
class A
{
mutex m; /* uses and synchronises C::i and D::i */
};
class B
{
mutex m; /* uses and synchronises C::j and D::j */
};
class C /* doesn't use C::i or C::j */
{
int i;
int j;
};
class D /* doesn't use D::i or D::j */
{
int i;
int j;
};
As for the black magic, casting from member to the container is a standard way of doing things in C. Check e.g. Linux kernel sources.
black magic: that's interesting - does that allow you to access a member of a struct via a pointer to another member as efficiently as if you had a pointer to the struct itself or is there another step involved? I would also be interested to find out whether in any circumstances this sort of trick prevents optimizations by obfuscating what we are doing to the compiler… very interesting.
"If possible, make compile complain when synchronisation is done incorrectly"
that sounds like a job for typestate. Implementing some approximation of this with C++ metaprogramming sounds like a very interesting project (and as I think of it, some new C++11 features, in particular type erasure and variadic templates might help a lot)
"0MQ code is to convoluted to be discussed here, but try adding proper encapsulation to this"
I'm sorry I just don't have the context to understand the problems with this sort of example. If you can point me to where I can find the code, which classes in particular are problematic (what corresponds to A B C and D) then I will try to deal with the complexity or admit defeat.
black magic: It compiles into a single subtraction operation, so it's very quick. Also, it's done on regular basis in linux kernel so my guess is there is no serious performance problem like messing with optimiser. If there was, it would have been fixed long time ago.
"that sounds like a job for typestate. Implementing some approximation of this with C++ metaprogramming sounds like a very interesting project (and as I think of it, some new C++11 features, in particular type erasure and variadic templates might help a lot)"
Go for it. It would be interesting to see what C++11 has to offer in terms of decoupling memory layout from accessibility settings.
"I'm sorry I just don't have the context to understand the problems with this sort of example. If you can point me to where I can find the code, which classes in particular are problematic (what corresponds to A B C and D) then I will try to deal with the complexity or admit defeat."
0MQ goes to great lengths to avoid this kind of problem. See the convoluted algorithm for managing list of pipes in lb_t and fq_t classes. The list stores pipe index (position in the list) inside of the pipe, however, pipe is not necessarily living in the same thread as the list. What you get is members that are living in different threads packed without any safeguards into a single class. There are lock-less algorithms involved etc. so it's really top complex system to demostrate theoretical issues on.
The ABCD snippet in the previous post is much cleaner example. The only goal is to ensure that individual variables are accessed only by their owners, i's by A, j's by B. Attempt to access say C::i from B or C should yield a compile-time error.
The only solution I can think of is along the lines of enclave pattern, I've described in my blog. However, that's hardly an idiomatic C++ and leaves me asking whether using plain old C is not a better solution.
…ok this is interesting.
I thought we were talking about a linked list and that the problem was simply going to be one of choosing which intrusive linked-list implementation best fitted the problem.
This however is a much less well defined problem, but quite interesting.
What is going on in the code now is clearly not ideal.
That array class is crazy, just some thoughts so far, no need to respond:
1) why do we want random access to an array which does not
guarantee order is preserved? certainly it doesn't seem like a priority
that this is O(1)
2) pointers can be invalidated because deletion of objects does not remove them
from the array - this leads to a need to always check the pointers.
3) pointers can be invalidated because we are allowed access to the pointers
by reference
4) the array indexes can be invalidated because set_array_index is public.
5) The comments claim that insertion in O(1) but this is only average time.
Worst case time is bad because push may require new memory allocation and
copying of the array.
An intrusive linked list could actually solve all of these problems, though iteration would obviously be slower and if there is some reason why you need O(1) random access via index then that would be a problem too. Intrusive linked lists do not require any allocations to add existing nodes to a list, for this reason they are good for "real time" applications.
On the broader question of how to design the locking properly I am going to have to study the code more and review the literature in this area. I have a few ideas.
I'm gonna stop myself babbling, I will probably go quiet for a while but I will try to come back to you with something useful.
OK. Give it a stab.
However, I believe the problem is inherent: You can improve the code until it is cleanly implemented intrusive list. Then you have parts of the list object leaking into item object (next, prev). You need to ensure that these won't be used by anyone other than the list. So you'll probably go with something like the enclave pattern. At that point you should look on the code and ask: Are all these hacks used to get something that C++ promises out of the box (member access control) really worth it? Wouldn't it be cleaner to use raw C instead?
Actually you're wrong about the extra allocations with stl containers/iterators. Using vector<person> (not the lack of *) I had exactly 0 overhead. The class is exactly 8 bytes http://ideone.com/C4j0BG
Using list I get 24 bytes. I'm not sure where the extra few bytes comes from but that may be solved using a different implementation. Templates/oops doesn't cause extra allocations http://ideone.com/qGp2Am
The assumption is that the object is not assignable (imagine object that contains a running thread). Thus, there's need to store it in the list by reference, rather than be copying it.