How hard can I expect to lean on network activity for Netduino Plus 2? (Sample code attached)

#1 stewlg

New Member

Members
6 posts

Posted 14 March 2014 - 04:49 PM

I've been working on an implementation of NRPE for Netduino Plus 2.

NRPE is a protocol spoken by Nagios for system monitoring. So far I've hooked up a flood sensor (the reason I started the project) and a temperature/humidity sensor (so I could get the mechanics of multiple checks figured out early, and allow for easier expansion).

I feel like I'm near the end of the project, but during development I've had some threading/resource-type crashes, so I've made an effort to test the code under circumstances that go way beyond what will happen during deployment. I do NOT want to be going down to my basement every few weeks or days to manually reboot the device if it hangs. I am OK with it getting into a bad state briefly, SO LONG as it can recover in a few minutes by itself and keep going.

I've been testing under three scenarios.

Normal Usage: In deployment, the device will be polled every 1-5 minutes per service. This is very light duty and it is certainly possible that the code I have may already sustain this load indefinitely without crashing.

In my testing, though, I have stepped up the load two more levels to try to find any lurking networking/threading/resource bugs:

Stiff Breeze: One copy of this shell script running against the device. You will need check_nrpe, which is part of NRPE, to make this work:

#!/bin/bash
COUNTER=0
while [ $COUNTER -lt 100 ]; do
    ./check_nrpe -n -H noah.doodle.local -c check_temp
    ./check_nrpe -n -H noah.doodle.local -c check_flood
done

[ Yes I'm not incrementing the counter, and this runs forever, deliberately. ]

Gale: Multiple copies of the above shell script running in parallel against the device.
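
As a rough sketch of the Gale scenario, the parallel copies could be launched from a single script instead of by hand. The host name and check names are taken from the script above; CHECK_CMD, poll_loop and run_gale are illustrative names only, and each loop here is bounded so the run ends on its own, unlike the original.

```shell
#!/bin/bash
# Sketch of the "Gale" scenario: several copies of the polling loop in parallel.
# CHECK_CMD, poll_loop and run_gale are hypothetical names, not part of NRPE.

CHECK_CMD=${CHECK_CMD:-./check_nrpe}
HOST=${HOST:-noah.doodle.local}

# Run both checks back to back, $1 times (bounded, unlike the original loop).
poll_loop() {
    local iterations=$1 i
    for ((i = 0; i < iterations; i++)); do
        "$CHECK_CMD" -n -H "$HOST" -c check_temp
        "$CHECK_CMD" -n -H "$HOST" -c check_flood
    done
}

# Start $1 background copies of poll_loop, each doing $2 iterations,
# then block until every copy has finished.
run_gale() {
    local copies=$1 iterations=$2 n
    for ((n = 0; n < copies; n++)); do
        poll_loop "$iterations" &
    done
    wait
}
```

For example, `run_gale 4 1000` would start four copies of the loop, each issuing 1000 pairs of checks.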

Under Stiff Breeze, I can consistently get the device to crash in 2-4 hours. When it crashes, I get a network 10054 error, probably when some incoming connection that the Netduino is trying to respond to times out. Then I get some 10048 errors on the aborted stream. And finally I bounce back to the main loop, but evidently the network stack is dead by then, as the Netduino is no longer responsive to ping. The Netduino must be power cycled/rebooted to become responsive to ping.

Under Gale, I can get the same thing to happen in a matter of minutes.

(I haven't done long-term Normal Usage testing, as it seems likely that it will take days for an error to occur, if it is going to occur at all.)
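
A long Normal Usage run could be left unattended with something like the sketch below, which polls at a fixed interval and appends a timestamped line to a log for every check that returns a non-zero status. soak_test, CHECK_CMD and the log format are assumptions made for illustration. Note that a non-zero exit also covers legitimate WARNING or CRITICAL results (Nagios plugins return 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN), so the log still needs a human to read it.

```shell
#!/bin/bash
# Sketch of a "Normal Usage" soak test that logs every non-OK poll.
# soak_test, CHECK_CMD and the log line format are hypothetical.

CHECK_CMD=${CHECK_CMD:-./check_nrpe}
HOST=${HOST:-noah.doodle.local}

# soak_test INTERVAL_SECONDS MAX_POLLS LOGFILE
# Runs both checks once per poll, sleeps INTERVAL_SECONDS between polls,
# and prints the total number of non-zero results when it finishes.
soak_test() {
    local interval=$1 max_polls=$2 logfile=$3
    local i check status failures=0
    for ((i = 1; i <= max_polls; i++)); do
        for check in check_temp check_flood; do
            status=0
            "$CHECK_CMD" -n -H "$HOST" -c "$check" > /dev/null 2>&1 || status=$?
            if [ "$status" -ne 0 ]; then
                failures=$((failures + 1))
                # One log line per non-OK result, with the time it was seen.
                printf '%s poll=%d check=%s exit=%d\n' \
                    "$(date '+%Y-%m-%d %H:%M:%S')" "$i" "$check" "$status" >> "$logfile"
            fi
        done
        sleep "$interval"
    done
    echo "$failures"
}
```

For example, `soak_test 60 10080 soak.log` would poll about once a minute for roughly a week; the first timestamp in soak.log then shows when things started to go wrong.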

Am I being unrealistic about what the device can do? Or is there a coding error here I can correct?

You do not need my sensor setup to simulate the crash (although I've included a Fritzing screenshot in the zip for the interested). All you need is the ability to run check_nrpe in a loop as shown above.

Thanks very much!

Attached Files

Flood Sensor 3-15-2014.zip (506.01KB)

#2 wendo

Advanced Member

Members
85 posts

Posted 14 March 2014 - 10:06 PM

I have no idea about your question, but integrating a Netduino into Nagios monitoring is something that never even crossed my mind, and it opens up so many possibilities for environment monitoring. It's awesome!

Also, your wiring image has the wrong devices on the end of each set of wires; it's the DHT22 that needs multiple pins, not the water contact.

#3 stewlg

Posted 14 March 2014 - 10:55 PM

I have no idea about your question, but integrating a Netduino into Nagios monitoring is something that never even crossed my mind, and it opens up so many possibilities for environment monitoring. It's awesome!

I totally agree. You need to have something of a Nagios-centric perspective already to embrace it, but I am hoping others will find it useful too. I just couldn't bear to pay hundreds of pounds/dollars for a flood monitor for Nagios:

http://www.sensormet...nd_flooding_s_6

Also, your wiring image has the wrong devices on the end of each set of wires; it's the DHT22 that needs multiple pins, not the water contact.

Crumbs. I did that late at night. I will revise before "final" publication, and this time I'll use different colors for the wires.

In the meantime, I'm seeing several others suggesting that the Netduino network stack is inherently somewhat unreliable:

http://forums.netdui...uino#entry49132

http://forums.netdui...vailable-in-42/

http://forums.netdui...ch=1#entry23819

#4 wendo

Posted 15 March 2014 - 01:02 AM

Just as a counter to those, I've had a Netduino running a web server under very light usage for many weeks at a time, so it may not be the time involved so much as the amount of data arriving at a single point in time that pushes it over the edge, so to speak. Of course, it may also be that it dies after a certain amount of data and I never got there with my multi-week test.