The Netduino forums have been replaced by new forums at community.wildernesslabs.co. This site has been preserved for archival purposes only and the ability to make new accounts or posts has been turned off.

stewlg

Member Since 01 Mar 2014
Offline Last Active Jun 10 2014 02:38 PM
-----

Topics I've Started

Netduino NRPE monitor

01 April 2014 - 04:58 AM

Full post here:

 

http://www.skyscratc...r-for-netduino/


How hard can I expect to lean on network activity for Netduino Plus 2? (Sample code att...

14 March 2014 - 04:49 PM

I've been working on an implementation of NRPE for Netduino Plus 2.

 

NRPE is a protocol spoken by Nagios for system monitoring. So far I've hooked up a flood sensor (the reason I started the project) and a temperature/humidity sensor (so I could get the mechanics of multiple checks figured out early, and allow for easier expansion).

 

I feel like I'm near the end of the project, but during development I've had some threading/resource-type crashes, so I've made an effort to test the code under circumstances that go way beyond what will happen during deployment. I do NOT want to be going down to my basement every few days or weeks to manually reboot the device if it hangs. I am OK with it getting into a bad state briefly, SO LONG as it can recover by itself within a few minutes and keep going.

 

I've been testing under three scenarios.

 

Normal Usage: In deployment, the device will be polled every 1-5 minutes per service. This is very light duty and it is certainly possible that the code I have may already sustain this load indefinitely without crashing.

 

In my testing, though, I have stepped up the load two more levels to try to find any lurking networking/threading/resource bugs:

 

Stiff Breeze: One copy of this shell script running against the device. To make this work you will need check_nrpe, which is part of NRPE:

 

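#!/bin/bash
# COUNTER is never incremented, so this loop runs until interrupted (deliberate).
COUNTER=0
while [ "$COUNTER" -lt 100 ]; do
    # Poll both checks back to back with no delay between requests.
    ./check_nrpe -n -H noah.doodle.local -c check_temp
    ./check_nrpe -n -H noah.doodle.local -c check_flood
done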

 

[ Yes I'm not incrementing the counter, and this runs forever, deliberately. ]
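To put a number on "how long until it crashes", a variant of the loop above could stop at the first failed check and print a timestamp. This is only a sketch: the function name `stress_until_failure` and the `CHECK_CMD` and `NRPE_HOST` variables are made up for illustration, and it treats any non-zero exit status as a failure, which also catches WARNING/CRITICAL results returned by the check itself.

```shell
#!/bin/bash
# Sketch: repeat the two checks and report when the first failure occurs.
# CHECK_CMD, NRPE_HOST and the function name are illustrative assumptions.

CHECK_CMD="${CHECK_CMD:-./check_nrpe}"
NRPE_HOST="${NRPE_HOST:-noah.doodle.local}"

# stress_until_failure MAX_ROUNDS
# Runs check_temp and check_flood in turn. MAX_ROUNDS=0 means no limit.
# Returns 0 if every round passed, 1 at the first failed check.
stress_until_failure() {
    local max_rounds="${1:-0}"
    local round=0
    local start now check
    start=$(date +%s)
    while [ "$max_rounds" -eq 0 ] || [ "$round" -lt "$max_rounds" ]; do
        for check in check_temp check_flood; do
            # Any non-zero exit status counts as a failure in this sketch.
            if ! "$CHECK_CMD" -n -H "$NRPE_HOST" -c "$check" >/dev/null 2>&1; then
                now=$(date +%s)
                echo "$(date '+%F %T') FAIL $check round=$round elapsed=$((now - start))s"
                return 1
            fi
        done
        round=$((round + 1))
    done
    echo "completed $round rounds without failure"
    return 0
}
```

Calling `stress_until_failure 0` reproduces the run-forever behaviour, except that it ends at the first failure, so the elapsed time can be compared between runs.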

 

Gale: Multiple copies of the above shell script running in parallel against the device.
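One way to start the parallel copies from a single place is a small launcher like the one below. It is a sketch under assumptions: `WORKER` is a hypothetical path to the stress script, and `run_gale` and `GALE_PIDS` are names invented here. If the worker never exits (as with the loop above), the launcher blocks until it is interrupted, at which point the trap kills the workers.

```shell
#!/bin/bash
# Sketch: run several copies of a stress script in parallel ("Gale").
# WORKER, GALE_PIDS and the function name are illustrative assumptions.

WORKER="${WORKER:-./stiff_breeze.sh}"
GALE_PIDS=""

# Stop the background workers if the launcher itself is interrupted.
trap 'kill $GALE_PIDS 2>/dev/null' INT TERM

# run_gale COPIES
# Starts COPIES background instances of $WORKER, waits for all of them,
# and prints how many exited with a non-zero status.
run_gale() {
    local copies="${1:-4}"
    local failed=0
    local i=0
    local pid
    GALE_PIDS=""
    while [ "$i" -lt "$copies" ]; do
        "$WORKER" &
        GALE_PIDS="$GALE_PIDS $!"
        i=$((i + 1))
    done
    for pid in $GALE_PIDS; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "workers=$copies failed=$failed"
}
```

For example, `run_gale 4` would start four workers. Pairing it with a worker that exits at the first failed check makes the `failed` count meaningful.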

 

Under Stiff Breeze, I can consistently get the device to crash in 2-4 hours. When it crashes, I first get a network 10054 error (connection reset by peer), probably when some incoming connection that the Netduino is trying to respond to times out. Then I get some 10048 errors (address already in use) on the aborted stream. Finally I bounce back to the main loop, but evidently the network stack is dead by then, as the Netduino no longer responds to ping. It must be power cycled/rebooted before it responds to ping again.
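Since the goal is a device that recovers on its own within a few minutes, it may help to measure that from the test machine: after a failure, keep probing until the device answers again or a deadline passes. The sketch below assumes Linux iputils `ping` (where `-c` is the packet count and `-W` the reply timeout in seconds); `PROBE_CMD`, `NRPE_HOST` and `wait_for_recovery` are hypothetical names.

```shell
#!/bin/bash
# Sketch: poll the device until it answers again or a deadline passes.
# PROBE_CMD, NRPE_HOST and the function name are illustrative assumptions.
# The default probe assumes Linux iputils ping (-c count, -W timeout in seconds).

NRPE_HOST="${NRPE_HOST:-noah.doodle.local}"
PROBE_CMD="${PROBE_CMD:-ping -c 1 -W 2}"

# wait_for_recovery DEADLINE_SECONDS [INTERVAL_SECONDS]
# Returns 0 as soon as the probe succeeds, 1 if the deadline passes first.
wait_for_recovery() {
    local deadline="${1:-300}"
    local interval="${2:-10}"
    local start now
    start=$(date +%s)
    while :; do
        # PROBE_CMD is intentionally unquoted so its options split into words.
        if $PROBE_CMD "$NRPE_HOST" >/dev/null 2>&1; then
            now=$(date +%s)
            echo "recovered after $((now - start))s"
            return 0
        fi
        now=$(date +%s)
        if [ $((now - start)) -ge "$deadline" ]; then
            echo "no response after ${deadline}s; manual power cycle needed"
            return 1
        fi
        sleep "$interval"
    done
}
```

`wait_for_recovery 300 10` would give the device five minutes to come back, probing every ten seconds.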

 

Under Gale, I can get the same thing to happen in a matter of minutes.

 

(I haven't done long-term Normal Usage testing, as it seems likely that it will take days for an error to occur, if it is going to occur at all.)

 

Am I being unrealistic about what the device can do? Or is there a coding error here I can correct?

 

You do not need my sensor setup to simulate the crash (although I've included a Fritzing screenshot in the zip for the interested).  All you need is the ability to run check_nrpe in a loop as shown above.

 

Thanks very much!

 

 

 


Copyright © 2016 Wilderness Labs Inc.
This webpage is licensed under a Creative Commons Attribution-ShareAlike License.