<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wikidot="http://www.wikidot.com/rss-namespace">

	<channel>
		<title>Comments for page &quot;Implementing Network Protocols in User Space&quot;</title>
		<link>http://250bpm.com/blog:16/comments/show</link>
		<description></description>
				<copyright></copyright>
		<lastBuildDate>Sat, 01 Aug 2015 21:43:14 +0000</lastBuildDate>
		
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1791134</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1791134</link>
				<description></description>
				<pubDate>Thu, 06 Jun 2013 11:59:54 +0000</pubDate>
				<wikidot:authorName>Martin Sustrik</wikidot:authorName>								<content:encoded>
					<![CDATA[
						 <p>Hi,</p> <p>The patch can be found at LKML (<a href="https://lkml.org/lkml/2013/2/8/67">https://lkml.org/lkml/2013/2/8/67</a>). It has not yet been merged into the mainline kernel as I don't have much time to actively push it. If you want to help with that, it would be great.</p> <p>So yes, at the moment you have to apply the patch by hand and build the kernel yourself.</p> <p>The patch was developed against a 3.x kernel, but I guess backporting it to 2.6.38 should be easy, maybe even with no work needed except for applying the patch.</p> <p>No, nothing special has to be done with .config.</p> <p>Send, setsockopt etc. are not covered by the patch as you can implement those in user space easily with no special support from the kernel: just use geonetworking_send(), geonetworking_setsockopt() etc.</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1791128</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1791128</link>
				<description></description>
				<pubDate>Thu, 06 Jun 2013 11:47:41 +0000</pubDate>
				<wikidot:authorName>Mohamed</wikidot:authorName>								<content:encoded>
					<![CDATA[
						 <p>Hi Martin,</p> <p>Actually I'm trying to implement a network protocol, GeoNetworking, in user space and to make it work over the 802.11 MAC layer. I found your post very helpful. Still, I would like to know:<br /> 1- Whether the &quot;EFD_MASK patch to Linux kernel&quot; has to be installed first.<br /> 2- Whether the patch is compatible with the linux-2.6.38 kernel.<br /> 3- If yes, whether there are any recommended parameters to enable in the .config file before compiling the kernel.<br /> 4- Whether you could provide the remaining functions not implemented in this post (send, setsockopt, connect, bind).<br /> 5- A short description of how send() and receive() communicate with the low-level handlers of the sk_buff structure at the PHY and MAC level.<br /> 6- Finally, it would be very helpful if you could share an already implemented example.</p> <p>Thanks a lot.</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1711198</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1711198</link>
				<description></description>
				<pubDate>Fri, 15 Feb 2013 09:47:19 +0000</pubDate>
				<wikidot:authorName>martin_sustrik</wikidot:authorName>				<wikidot:authorUserId>939</wikidot:authorUserId>				<content:encoded>
					<![CDATA[
						 <p>Ah, OK, I see.</p> <p>Anyway, we've gone pretty far away from the topic of the original article. It was only about slightly extending POSIX to make implementing protocols in user space easier. Implementing protocols in a non-POSIX environment is a different, although extremely interesting, topic.</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1710352</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1710352</link>
				<description></description>
				<pubDate>Thu, 14 Feb 2013 13:02:02 +0000</pubDate>
				<wikidot:authorName>Ambroz Bizjak</wikidot:authorName>								<content:encoded>
					<![CDATA[
						 <p>Kernel implementations of protocols utilizing such abstract libraries also come to mind.</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1710351</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1710351</link>
				<description></description>
				<pubDate>Thu, 14 Feb 2013 12:59:49 +0000</pubDate>
				<wikidot:authorName>Ambroz Bizjak</wikidot:authorName>								<content:encoded>
					<![CDATA[
						 <blockquote> <p>Why so? File descriptors are defined in POSIX.</p> </blockquote> <p>Yes, I have systems other than POSIX in mind. Including Windows, and systems with no operating systems at all like microcontrollers. A library that can't work on top of abstract interfaces is useless in such cases.</p> <blockquote> <p>AFAICS the only way to make user-space implementation of the protocol behave exactly like TCP</p> </blockquote> <p>It isn't the point to make it appear *exactly* like TCP. After all SSL isn't defined to work on top of TCP. From RFC 5246: &quot;At the lowest level, layered on top of some reliable transport protocol (e.g., TCP), is the TLS Record Protocol.&quot;</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1710295</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1710295</link>
				<description></description>
				<pubDate>Thu, 14 Feb 2013 09:37:23 +0000</pubDate>
				<wikidot:authorName>martin_sustrik</wikidot:authorName>				<wikidot:authorUserId>939</wikidot:authorUserId>				<content:encoded>
					<![CDATA[
						 <p>First of all, this post is about an *API* problem. I am not arguing that the thing cannot be done with pipe/socketpair/eventfd. What I am saying is that when doing so the API is convoluted and confusing. Check the ZeroMQ mailing list for messages from users confused by the API.</p> <p>What I want to achieve is to enable the user of the library to write the following code:</p> <div class="code"> <pre> <code>int s = myproto_socket ();
struct pollfd fd = {s, POLLIN, 0};
poll (&amp;fd, 1, -1);
assert (fd.revents &amp; POLLIN);
...</code> </pre></div> <p>Note that the above is pure POSIX, something that everyone is familiar with.</p> <p>To put it in a different way: Thanks to this patch, user-space implementations of network protocols can expose exactly the same API to the user as kernel-space implementations.</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1710287</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1710287</link>
				<description></description>
				<pubDate>Thu, 14 Feb 2013 09:13:52 +0000</pubDate>
				<wikidot:authorName>Emil</wikidot:authorName>								<content:encoded>
					<![CDATA[
						 <p>I see. So basically the idea is to cram 3 or 4 different kinds of events into a single fd instead of 1. I just don't see why this code:</p> <div class="code"> <pre> <code>void my_read_cb(int fd, void *data)
{
    struct myproto *s = data;
    /* handle POLLIN */
}

void my_write_cb(int fd, void *data)
{
    struct myproto *s = data;
    /* handle POLLOUT */
}

void my_err_cb(int fd, void *data)
{
    struct myproto *s = data;
    /* handle POLLERR */
}</code> </pre></div> <p>is much easier than this:</p> <div class="code"> <pre> <code>void my_cb(int fd, void *data)
{
    struct myproto *s = data;

    while (1) {
        switch (myproto_what(s)) {
        case MYPROTO_READ:
            /* handle POLLIN */
            break;
        case MYPROTO_WRITE:
            /* handle POLLOUT */
            break;
        case MYPROTO_ERR:
            /* handle POLLERR */
            break;
        case MYPROTO_EAGAIN:
            return;
        }
    }
}</code> </pre></div> <p>And surely if you have performance problems with the last bit of code, you'll have the same problems with the first bit. It'll just be the event loop's job to fan out the code paths depending on the event flags, rather than the switch's. Right?</p> <p>Also, the second version can easily be extended to protocols that have more interesting events than the TCP/stream-derived POLLIN, POLLERR, POLLPRI and POLLHUP.</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1710183</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1710183</link>
				<description></description>
				<pubDate>Thu, 14 Feb 2013 04:52:19 +0000</pubDate>
				<wikidot:authorName>martin_sustrik</wikidot:authorName>				<wikidot:authorUserId>939</wikidot:authorUserId>				<content:encoded>
					<![CDATA[
						 <p>I think we are not on the same page here.</p> <p>Specifics aside, the problem discussed is about a generic interface for communication between different protocol implementations and different applications. It's a standardisation problem. It only works when all parties agree on the same interface.</p> <p>I've opted for file descriptors because that's what everybody uses anyway, and enhanced them to support one corner case that was unsupported until now (user-space to user-space signaling). You can, of course, go for a different interface, but the more obscure it is, the less useful it will be.</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1710179</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1710179</link>
				<description></description>
				<pubDate>Thu, 14 Feb 2013 04:47:02 +0000</pubDate>
				<wikidot:authorName>martin_sustrik</wikidot:authorName>				<wikidot:authorUserId>939</wikidot:authorUserId>				<content:encoded>
					<![CDATA[
						 <p>&quot; On the other hand I just want working software, now, on existing systems.&quot;</p> <p>I have the working solution now. No problem with that. The problem is that it's ugly and confusing for the end users. That's what I am trying to solve.</p> <p>&quot;Yes, you need a kernel patch - on every operating system you want to support, and every user would need to have it.&quot;</p> <p>Exactly. Linux seems to be a good starting point as it is rather widely deployed.</p> <p>&quot;For that you would need a higher I/O level interface instead of file descriptors, since file descriptors are OS-specific.&quot;</p> <p>Why so? File descriptors are defined in POSIX. (Unless you are speaking of Windows, but nothing works as expected on Windows, so there's little point in caring.)</p> <p>&quot;For example take NSS/NSPR - you can use the NSS library to do SSL on top of absolutely anything as long as you can make it appear like TCP (reliable stream), in non-blocking mode (I've done it, it works).&quot;</p> <p>AFAICS the only way to make user-space implementation of the protocol behave exactly like TCP is to use the kernel patch. OK, there is still another option: Re-implement the whole BSD socket API in a library and overload the system functions by linking with library. Some products do that (SDP, OpenOnload) but it's kind of a brute-force approach.</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1709847</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1709847</link>
				<description></description>
				<pubDate>Wed, 13 Feb 2013 22:03:27 +0000</pubDate>
				<wikidot:authorName>Ambroz Bizjak</wikidot:authorName>								<content:encoded>
					<![CDATA[
						 <blockquote> <p>The only problem is that for it to be useful it has to be standardised and widely adopted.</p> </blockquote> <p>I strongly disagree with this. Sure, you need to implement the interface - but this is actually pretty easy as long as your program is not broken by design. If you use an existing event loop like libevent, implementing this interface is literally trivial.</p> <p>Having such an interface standardized sure would be nice, but any interface is much better than no interface at all. It could be the difference between using an existing library by writing a hundred or so lines of glue code, and reinventing whatever you need just because the developers of the library didn't have such abstract usage in mind.</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1709842</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1709842</link>
				<description></description>
				<pubDate>Wed, 13 Feb 2013 21:56:47 +0000</pubDate>
				<wikidot:authorName>Ambroz Bizjak</wikidot:authorName>								<content:encoded>
					<![CDATA[
						 <p>Yes, you need a kernel patch - on every operating system you want to support, and every user would need to have it. On the other hand I just want working software, now, on existing systems.</p> <p>In the end, if you want your library to be completely portable, i.e. independent of any particular operating system and using only the C/C++ language, such user-implementable interfaces are the only way. For that you would need a higher I/O level interface instead of file descriptors, since file descriptors are OS-specific.</p> <p>Some real libraries use such interfaces. For example take NSS/NSPR - you can use the NSS library to do SSL on top of absolutely anything as long as you can make it appear like TCP (reliable stream), in non-blocking mode (I've done it, it works).</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1709557</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1709557</link>
				<description></description>
				<pubDate>Wed, 13 Feb 2013 16:13:56 +0000</pubDate>
				<wikidot:authorName>martin_sustrik</wikidot:authorName>				<wikidot:authorUserId>939</wikidot:authorUserId>				<content:encoded>
					<![CDATA[
						 <p>Well, that's what everybody is doing. I've implemented that several times myself&#8230;</p> <p>So it starts with this:</p> <p>struct myproto_socket {<br /> int in_pipe [2];<br /> int out_pipe [2];<br /> int err_pipe [2];<br /> int hup_pipe [2];<br /> &#8230;<br /> };</p> <p>One problem with that is the number of fds used. Users start to hit the system fd limit pretty quickly.</p> <p>So you change the implementation to use just one pipe and a set of functions to test readability/writeability/error/hangup etc. That's a rather weird API, but users are mostly still able to cope with it.</p> <p>Then you start hitting performance limits, so you start using the pipe in edge-triggered mode rather than level-triggered mode. So, the users have to cope with edge-triggered mode. At this point most users are profoundly confused.</p> <p>I can point out many discussions on the ZeroMQ mailing list where people are confused about this API. I think I just saw one such email yesterday.</p> 
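						 <p>As a rough sketch of the single-pipe variant described above: the library keeps a mask of pending events and writes one byte to a pipe whenever the mask becomes non-empty, so the user polls the read end and then has to ask the library what actually happened. All names here (myproto_init, myproto_signal, myproto_events, the MYPROTO_* bits) are hypothetical and are not taken from ZeroMQ or any other real library.</p>

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical event bits; a real library would define its own. */
#define MYPROTO_IN  1
#define MYPROTO_OUT 2
#define MYPROTO_ERR 4

struct myproto {
    int pipefd[2];    /* pipefd[0] is handed to the user for polling */
    unsigned pending; /* events raised but not yet collected */
};

/* Returns 0 on success, -1 if the pipe cannot be created. */
int myproto_init(struct myproto *s)
{
    s->pending = 0;
    return pipe(s->pipefd);
}

/* The only thing the user can poll on; it can only ever report POLLIN. */
int myproto_getfd(const struct myproto *s)
{
    return s->pipefd[0];
}

/* Library side: raise events and wake the poller if nothing was pending. */
void myproto_signal(struct myproto *s, unsigned events)
{
    if (s->pending == 0) {
        char token = 1;
        ssize_t n = write(s->pipefd[1], &token, 1);
        assert(n == 1);
    }
    s->pending |= events;
}

/* User side: drain the wake-up byte and fetch the real event mask. */
unsigned myproto_events(struct myproto *s)
{
    unsigned events = s->pending;
    if (events != 0) {
        char token;
        ssize_t n = read(s->pipefd[0], &token, 1);
        assert(n == 1);
        s->pending = 0;
    }
    return events;
}

/* Helper for experiments: non-blocking check of the pollable fd. */
int myproto_is_readable(const struct myproto *s)
{
    struct pollfd pfd = { myproto_getfd(s), POLLIN, 0 };
    return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN) != 0;
}
```

						 <p>Note that a pending MYPROTO_OUT or MYPROTO_ERR still shows up as POLLIN on the fd, which is the kind of mismatch that confuses users. The sketch is single-threaded; a real implementation would need locking around the pending mask.</p>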
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1709451</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1709451</link>
				<description></description>
				<pubDate>Wed, 13 Feb 2013 13:02:39 +0000</pubDate>
				<wikidot:authorName>Emil</wikidot:authorName>								<content:encoded>
					<![CDATA[
						 <p>Yes, but this still doesn't explain why you need to signal !POLLIN &amp; !POLLOUT in the first place.</p> <p>In other words my question is this: Why isn't it enough to have POLLIN mean &quot;please call my library as soon as you can&quot; and !POLLIN mean &quot;chill&quot;? Then after the user calls your library you can return all sorts of messages to the caller, like &quot;EAGAIN&quot; or &quot;you've got a new priority message&quot; or &quot;the other end hung up unexpectedly&quot; etc.</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1709448</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1709448</link>
				<description></description>
				<pubDate>Wed, 13 Feb 2013 12:56:42 +0000</pubDate>
				<wikidot:authorName>martin_sustrik</wikidot:authorName>				<wikidot:authorUserId>939</wikidot:authorUserId>				<content:encoded>
					<![CDATA[
						 <p>Yet one more try:</p> <p>Imagine that your protocol implementation needs to both read and write more. Thus, the first function in your example returns MYPROTO_WANT_READ|MYPROTO_WANT_WRITE. The user retrieves the file descriptor (which is a pipe under the covers) and polls for POLLIN|POLLOUT.</p> <p>However, there's no way to signal !POLLIN &amp;&amp; !POLLOUT on the pipe. So, at least one of the two events must be signaled at any time. Thus, the poll(POLLIN|POLLOUT) is never going to block!</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1709438</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1709438</link>
				<description></description>
				<pubDate>Wed, 13 Feb 2013 12:40:03 +0000</pubDate>
				<wikidot:authorName>martin_sustrik</wikidot:authorName>				<wikidot:authorUserId>939</wikidot:authorUserId>				<content:encoded>
					<![CDATA[
						 <p>There's no problem with read and write themselves. They are fully implemented by the library and can do whatever the library implementer wants them to do.</p> <p>The problem is with poll/select. These functions are global, not protocol-specific, i.e. they span all the possible protocols: TCP, UDP, SCTP, myproto, yourproto etc.</p> <p>What that means is that you, as the library implementer, have no control over how they work. Still, you need your protocol implementation to be pollable.</p> <p>To do that you need a native file descriptor (poll/select won't accept anything else).</p> <p>The only ways to create file descriptors in user space are pipe, socketpair and eventfd (Linux-only). All of them suffer from the same problems:</p> <ol> <li>There's no way to signal !POLLIN &amp; !POLLOUT</li> <li>There's no way to signal special conditions, such as POLLHUP or POLLERR</li> </ol> 
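						 <p>The first limitation can be observed with a small Linux-only experiment on an unpatched eventfd. This is a minimal sketch, not code from the patch; the helper names readiness, bump and count_quiet_states are made up for illustration. It walks the eventfd counter through three states (zero, non-zero, and the maximum value) and checks which of POLLIN and POLLOUT poll() reports in each.</p>

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Returns the POLLIN/POLLOUT bits currently reported for fd, without blocking. */
int readiness(int fd)
{
    struct pollfd pfd = { fd, POLLIN | POLLOUT, 0 };
    int rc = poll(&pfd, 1, 0);
    assert(rc >= 0);
    return pfd.revents & (POLLIN | POLLOUT);
}

/* Adds value to the eventfd counter; returns 0 on success, -1 on failure. */
int bump(int efd, uint64_t value)
{
    return write(efd, &value, sizeof value) == (ssize_t)sizeof value ? 0 : -1;
}

/* Counts the counter states in which neither POLLIN nor POLLOUT is reported. */
int count_quiet_states(void)
{
    int quiet = 0;
    int efd = eventfd(0, EFD_NONBLOCK);
    assert(efd >= 0);

    quiet += (readiness(efd) == 0);   /* counter == 0: writable only */

    assert(bump(efd, 1) == 0);
    quiet += (readiness(efd) == 0);   /* counter == 1: readable and writable */

    assert(bump(efd, UINT64_MAX - 2) == 0);
    quiet += (readiness(efd) == 0);   /* counter at its maximum: readable only */

    close(efd);
    return quiet;
}
```

						 <p>In this sketch count_quiet_states() finds no state where both bits are clear: turning POLLOUT off requires filling the counter, which in turn makes the fd readable.</p>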
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1709432</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1709432</link>
				<description></description>
				<pubDate>Wed, 13 Feb 2013 12:31:15 +0000</pubDate>
				<wikidot:authorName>Emil</wikidot:authorName>								<content:encoded>
					<![CDATA[
						 <p>Right, so what I can't seem to find in your post is why it is important that poll() returns POLLERR or POLLPRI and it is not enough that myproto_recv() or myproto_write() returns an error or something like MYPROTO_GOT_PRIORITY_MESSAGE.</p> <p>That's just another case in the switch above.</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1709420</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1709420</link>
				<description></description>
				<pubDate>Wed, 13 Feb 2013 11:52:33 +0000</pubDate>
				<wikidot:authorName>martin_sustrik</wikidot:authorName>				<wikidot:authorUserId>939</wikidot:authorUserId>				<content:encoded>
					<![CDATA[
						 <p>Well, you are definitely free to define such abstraction. The only problem is that for it to be useful it has to be standardised and widely adopted. Which, of course, is hard to achieve :)</p> <p>Anyway, we have one such standardised and widely adopted abstraction called &quot;file descriptors&quot;. They work perfectly OK and there's no technical need to replace them by a different standard. There's only a little piece missing, namely the ability to create fully functional file descriptors in user space. My kernel patch solves this problem.</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1709417</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1709417</link>
				<description></description>
				<pubDate>Wed, 13 Feb 2013 11:42:26 +0000</pubDate>
				<wikidot:authorName>Ambroz Bizjak</wikidot:authorName>								<content:encoded>
					<![CDATA[
						 <p>I think the best approach is to define an abstraction of a notification API, like a simplified version of libevent etc, and allow the user of your library to implement it. For example, the library would define this:</p> <div class="code"> <pre> <code>#define EVENT_READ (1 &lt;&lt; 0)
#define EVENT_WRITE (1 &lt;&lt; 1)

struct MyLibEventLoop {
    /**
     * Registers a new file descriptor for notification of the given events.
     * When events are detected, the implementation calls handler. The returned
     * value is an abstract handle for use in modify_fd and remove_fd.
     */
    void * (*add_fd) (struct MyLibEventLoop *el, int fd, int events,
                      void (*handler) (void *arg, int events), void *arg);

    /**
     * Modifies the notification events for a file descriptor registered using add_fd.
     */
    void (*modify_fd) (struct MyLibEventLoop *el, void *fd_handle, int events);

    /**
     * Unregisters a file descriptor registered using add_fd.
     */
    void (*remove_fd) (struct MyLibEventLoop *el, void *fd_handle);

    /**
     * Registers a timer. When it expires, the handler is called, and at this point
     * the timer is automatically removed. The timer can additionally be removed
     * before it has expired using remove_timer. Returns an abstract handle for use
     * in remove_timer.
     */
    void * (*add_timer) (struct MyLibEventLoop *el, int milliseconds,
                         void (*handler) (void *arg), void *arg);

    /**
     * Removes a running timer added using add_timer. After this is called it is
     * guaranteed that the timer's handler will not be called.
     */
    void (*remove_timer) (struct MyLibEventLoop *el, void *timer_handle);
};</code> </pre></div> <p>Then make everything in your library do I/O via this abstract API. On the other side, the user of the library is free to implement the API however he wishes, be it with an existing event loop library (libevent, libev, glib, Qt) or his own creation.</p> 
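						 <p>To give an idea of the glue code involved, here is a minimal sketch of the user's side backed by plain poll(). It covers only the file descriptor callbacks (timers are left out), ignores POLLERR/POLLHUP, and uses a fixed-size table. The names poll_loop, poll_loop_init, poll_loop_run_once and record_events are assumptions made for this example, not part of any existing library.</p>

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define EVENT_READ (1 << 0)
#define EVENT_WRITE (1 << 1)
#define MAX_FDS 16 /* arbitrary limit for this sketch */

/* Fd-related part of the interface above; the timer callbacks are omitted. */
struct MyLibEventLoop {
    void *(*add_fd)(struct MyLibEventLoop *el, int fd, int events,
                    void (*handler)(void *arg, int events), void *arg);
    void (*modify_fd)(struct MyLibEventLoop *el, void *fd_handle, int events);
    void (*remove_fd)(struct MyLibEventLoop *el, void *fd_handle);
};

struct slot {
    int used, fd, events;
    void (*handler)(void *arg, int events);
    void *arg;
};

struct poll_loop {
    struct MyLibEventLoop base; /* first member, so the casts below are valid */
    struct slot slots[MAX_FDS];
};

static void *pl_add_fd(struct MyLibEventLoop *el, int fd, int events,
                       void (*handler)(void *arg, int events), void *arg)
{
    struct poll_loop *pl = (struct poll_loop *)el;
    for (int i = 0; i < MAX_FDS; i++) {
        if (!pl->slots[i].used) {
            pl->slots[i] = (struct slot){ 1, fd, events, handler, arg };
            return &pl->slots[i]; /* the slot doubles as the abstract handle */
        }
    }
    return NULL; /* table full */
}

static void pl_modify_fd(struct MyLibEventLoop *el, void *fd_handle, int events)
{
    (void)el;
    ((struct slot *)fd_handle)->events = events;
}

static void pl_remove_fd(struct MyLibEventLoop *el, void *fd_handle)
{
    (void)el;
    ((struct slot *)fd_handle)->used = 0;
}

void poll_loop_init(struct poll_loop *pl)
{
    memset(pl, 0, sizeof *pl);
    pl->base.add_fd = pl_add_fd;
    pl->base.modify_fd = pl_modify_fd;
    pl->base.remove_fd = pl_remove_fd;
}

/* One iteration: poll all registered fds, dispatch handlers, return poll()'s result. */
int poll_loop_run_once(struct poll_loop *pl, int timeout_ms)
{
    struct pollfd pfds[MAX_FDS];
    struct slot *owner[MAX_FDS];
    int n = 0;
    for (int i = 0; i < MAX_FDS; i++) {
        struct slot *sl = &pl->slots[i];
        if (!sl->used)
            continue;
        pfds[n].fd = sl->fd;
        pfds[n].events = (short)(((sl->events & EVENT_READ) ? POLLIN : 0) |
                                 ((sl->events & EVENT_WRITE) ? POLLOUT : 0));
        pfds[n].revents = 0;
        owner[n] = sl;
        n++;
    }
    int rc = poll(pfds, (nfds_t)n, timeout_ms);
    for (int i = 0; rc > 0 && i < n; i++) {
        int ev = ((pfds[i].revents & POLLIN) ? EVENT_READ : 0) |
                 ((pfds[i].revents & POLLOUT) ? EVENT_WRITE : 0);
        if (ev != 0 && owner[i]->used) /* skip fds removed by an earlier handler */
            owner[i]->handler(owner[i]->arg, ev);
    }
    return rc;
}

/* Example handler: ORs the reported events into the int that arg points to. */
void record_events(void *arg, int events)
{
    *(int *)arg |= events;
}
```

						 <p>A library written against struct MyLibEventLoop would be handed &amp;pl.base and would never see poll() itself; swapping in libevent or another loop only means providing different function pointers.</p>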
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1709410</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1709410</link>
				<description></description>
				<pubDate>Wed, 13 Feb 2013 11:27:49 +0000</pubDate>
				<wikidot:authorName>martin_sustrik</wikidot:authorName>				<wikidot:authorUserId>939</wikidot:authorUserId>				<content:encoded>
					<![CDATA[
						 <p>&quot;just return a pipe or an eventfd filedescriptor that you control the other end of&quot;</p> <p>Well, exactly. However, there's no way for pipe or eventfd to signal !POLLIN &amp; !POLLOUT.</p> <p>Also there's no way to signal POLLERR, POLLHUP, POLLPRI etc.</p> <p>That's what the kernel patch allows you to do.</p> 
				 	]]>
				</content:encoded>							</item>
					<item>
				<guid>http://250bpm.com/blog:16/comments/show#post-1709408</guid>
				<title>(no title)</title>
				<link>http://250bpm.com/blog:16/comments/show#post-1709408</link>
				<description></description>
				<pubDate>Wed, 13 Feb 2013 11:19:42 +0000</pubDate>
				<wikidot:authorName>Emil</wikidot:authorName>								<content:encoded>
					<![CDATA[
						 <p>What is wrong with the approach from OpenSSL, Postgres and probably more libraries:</p> <p>For non-blocking I/O you basically do this:</p> <div class="code"> <pre> <code>ssize_t ret = myproto_recv(s, buf, BUFLEN);

if (ret &gt;= 0) {
    /* yay! everything went well. now do something with the first
       ret bytes of buf, 0 means connection closed. */
    return;
}

switch (ret) {
case MYPROTO_WANT_READ:
    wait_for(myproto_getfd(s), POLLIN);
    break;
case MYPROTO_WANT_WRITE:
    wait_for(myproto_getfd(s), POLLOUT);
    break;
default:
    /* ohh noes! a protocol error occurred */
    break;
}</code> </pre></div> <p>Once you detect POLLIN or POLLOUT you just run the code above again. This isn't much different from using raw sockets in non-blocking mode.</p> <p>Also this should be doable in userspace. If your myproto implementation uses threads in the background, you just return a pipe or an eventfd filedescriptor that you control the other end of.<br /> For blocking I/O it should be even simpler since myproto_recv() will only return<br /> something non-negative or a protocol error.</p> 
				 	]]>
				</content:encoded>							</item>
				</channel>
</rss>