[kwlug-disc] Massage Passing stuffs?

Mon Sep 18 13:18:16 EDT 2017

Good afternoon, William,

> On Sep 14, 2017, at 12:18 AM, William Park via kwlug-disc <kwlug-disc at kwlug.org> wrote:
> I posted this to GTALUG, so my apology if you've seen it before.
> I'm posting to KWLUG, because there may be more "robotic" people here.
> 
> I have various peripherals that I need to read and write.  At the
> moment, they are all local and don't need to talk to each other.  I
> think of 3 approaches from top of my head:
> 
>    1. Use select(2) (and friends) to round-robin the peripherals.  This
>    is "classic".
> 
>    2. Each peripheral is serviced by a separate thread, and main thread
>    does the business logics.
> 
>    /3/. Each peripheral is serviced by a separate process, and they pass
>    "messages".  This option is what I want to investigate.
> 
> So, do you know any "message-passing" scheme, framework, or library that
> I can look up?

I’m actually looking into this myself for an application.

I am *not* an expert at this. Take anything you read with that in mind.

I think we’d need more information in order to answer your question:

1. What platform is this? You mention select() and threads, so I assume it’s a PC platform of some type, probably Linux given the nature of the mailing list.

2. How much data are we talking? Bytes? kBytes? MBytes?

3. How often are you exchanging data? every few seconds, or many hundreds or thousands of times per second?

4. Do you have specific latency or delivery requirements? Can exchanged data be delayed or lost without issue? What happens in the case of a lost or corrupted transmission?

I’ll give you an example using my own problem. I have a hardware data generator that spits out smallish (between 32 and 400 bytes) packets of data relatively quickly (100-240 times per second depending on the data type). I have a processing program that reads the data over the serial (USB) link, figures out which packet it just received, and sends it off on one of 5 “pipes” - one for each data type. One of these pipes is bidirectional; it’s a control channel so that another process can send messages back to the data generator hardware, but the rest are unidirectional.

I don’t have any specific requirement for the data to be shared off-computer, but being able to transmit the data over a network would be a bonus. I do, however, have multi-platform concerns. I want this data processor and its data consumers to be able to run on Linux, OSX and Windows.

Until things started getting complex, I was sending the data from the processing program to the consuming program via a regular stdio stream (e.g. I would fwrite() to stdout and then redirect or pipe it to the data consumer’s stdin). I use stderr to print debug/diagnostic stuff so it doesn’t corrupt the data stream.
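A minimal sketch of that stdout/stderr split, assuming binary packets written with fwrite(). The function name emit_packet() is hypothetical and only for illustration; it is not taken from the actual program.

```c
#include <stdio.h>

/* Write one binary packet to the data stream (normally stdout) and
 * report what happened on stderr, so diagnostics never mix with data.
 * Returns 0 on success, -1 on a short write or a failed flush.
 * emit_packet() is a hypothetical name used for this sketch. */
static int emit_packet(FILE *out, const unsigned char *pkt, size_t len)
{
    /* fwrite() is used rather than fputs() because packets may contain
     * zero bytes; flush right away so the consumer is not kept waiting. */
    if (fwrite(pkt, 1, len, out) != len || fflush(out) != 0) {
        fprintf(stderr, "emit_packet: write failed\n");
        return -1;
    }
    fprintf(stderr, "emit_packet: sent %zu bytes\n", len);
    return 0;
}
```

In the piped setup this would be called as emit_packet(stdout, pkt, len), with the consumer reading the same bytes from its stdin.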

Now that the hardware is generating five different data types I want to be able to send one data type to one data consumer, another few to another consumer, etc. These consumers could be different processes (not threads) in my application.

Things I thought to use:
* pipes (mkfifo()) - kind of hokey, doesn’t work nicely on Windows
* shared memory - very different API on Windows
* unix domain socket - not sure how this works on Windows
* TCP or UDP - not sure of overhead, but straightforward
* third-party library (RabbitMQ, 0MQ, etc.) - additional dependencies

In the end, I chose TCP. I thought UDP would be better since I do *not* have delivery guarantee requirements (they’re all sensor streams) but some brief googling tells me that this is not the case, as the stacks seem to all be geared to maximizing TCP throughput. In fact, Windows even has the SIO_LOOPBACK_FAST_PATH IOCTL which helps as well. It’s working out well: the latency is very acceptable, and it has no additional library requirements on any platform. Setting TCP_NODELAY keeps latency low by disabling Nagle’s algorithm, at the cost of some bandwidth efficiency.
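For reference, a minimal sketch of setting TCP_NODELAY on a POSIX system. The helper names set_nodelay() and get_nodelay() are hypothetical; the Windows call is similar but uses the Winsock headers and is not shown here.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Disable Nagle's algorithm on a TCP socket so small writes are sent
 * immediately. Returns 0 on success, -1 on error (errno is set).
 * set_nodelay() is a hypothetical helper name. */
static int set_nodelay(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}

/* Read the option back: 1 if set, 0 if clear, -1 on error. */
static int get_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

The option is per socket, so it would need to be applied to each connected client socket that should send small packets without delay.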

I couldn’t find any specific reason to choose one of the messaging libraries; my data rates are not that high and I don’t have complex message distribution needs, so none seemed to have any real advantage over regular old socket programming.

So now, my processing application is more or less a data reflector. It opens and configures the serial port, creates five sockets, bind()s and listen()s on each, and then the main loop consists of a poll(), watching the fds of the serial port, those five listeners and however many connected clients there are. If the serial port has data, it parses it and, if a full data packet has been received, sends it out to whoever is listening for that data type. If a listening fd is readable, I accept() the pending connection and add the new connection's fd to the pool of fds that poll() watches. Four out of the five data types are one-way (from the hardware to the listeners), but all client fds get added to the pool so I can clean up when/if they disconnect. The POLLIN code for the bidirectional data type processes commands to send back to the hardware in addition to watching for disconnections.
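A rough sketch of one pass of such a loop, under several assumptions: POSIX sockets, listeners bound to loopback only, and no serial port or per-data-type bookkeeping (both omitted for brevity). The names make_listener(), poll_once() and MAX_FDS are hypothetical and not from the actual program.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_FDS 64 /* hypothetical cap on listeners + clients */

/* Create a TCP listener on 127.0.0.1:port (port 0 lets the kernel pick).
 * Returns the listening fd, or -1 on error. */
static int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* One pass of the loop. fds[0..nlisten-1] are listeners, the rest are
 * clients; fds must have room for MAX_FDS entries. Returns the new
 * number of entries, or -1 if poll() fails. */
static int poll_once(struct pollfd *fds, int nfds, int nlisten, int timeout_ms)
{
    if (poll(fds, (nfds_t)nfds, timeout_ms) < 0)
        return -1;

    int count = nfds;
    for (int i = 0; i < nfds; i++) {
        if (i < nlisten) {
            if (!(fds[i].revents & POLLIN))
                continue;
            /* A client is waiting: accept() it and start watching it. */
            int c = accept(fds[i].fd, NULL, NULL);
            if (c >= 0 && count < MAX_FDS) {
                fds[count].fd = c;
                fds[count].events = POLLIN;
                fds[count].revents = 0;
                count++;
            } else if (c >= 0) {
                close(c); /* pool is full: drop the connection */
            }
        } else if (fds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
            /* Client fd is readable: either data (e.g. commands on the
             * control channel) or EOF/error, meaning it disconnected. */
            char buf[256];
            ssize_t n = read(fds[i].fd, buf, sizeof buf);
            if (n <= 0) {
                close(fds[i].fd);
                fds[i].fd = -1; /* poll() ignores negative fds */
            }
        }
    }
    return count;
}
```

A real loop would call poll_once() repeatedly, put the serial port fd in the same array, and remember which listener each client came from so packets can be sent to the right set of clients. Slots freed by disconnects are left as -1 here rather than compacted.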

It actually worked out really nicely. While I only need one connection per data type, it was trivial to allow multiple connections per data type without any significant increase in code complexity. Being able to stream the data over the network (or even Internet) is a nice bonus, with the latency being a function of network quality.

-A.