Watchdog redux: Available in 4.2?
#1
Posted 13 February 2012 - 01:33 AM
Issue: Our project is really making progress, but we gotta have a real watchdog to reboot the device when we hang.
Sounds like we aren't the only ones:
- http://forums.netdui...h__1#entry23819
- http://forums.netdui...h__1#entry18239
- http://forums.netdui...ch__1#entry6876
- http://forums.netdui...ch__1#entry5125
So the question: is a real watchdog available in 4.2?
#2
Posted 13 February 2012 - 03:34 AM
#3
Posted 13 February 2012 - 04:04 AM
We can look at adding an extra watchdog feature with 4.2, but it'll largely be a space issue. We're hoping to have the final 4.2 firmware ready within the next 60 days...so please ping us again then.
I will be happy to.
I am trying to find the right platform for my project, which will ultimately use 10-50 boards. (Up to three now, and counting.) Netduino is really compelling because of the embedded networking and the overall price point and functionality. However, I am suffering from these issues:
- dhcp shortcomings (discussed in other threads): In theory fixed in 4.2
- lack of watchdog
- limited memory when using System.Net.HttpWebRequest
My evolution has been, so far:
- Get single netduino, ran into issues
- Switched to Panda, ramped up to a few boards, ran into bigger issues
- Came back to netduino, up to three boards, about to purchase a few more
But my project requires a manual reset every several hours, which is very frustrating!
It seems to me that netduino can't be used for anything real without a watchdog. What am I missing?
[You could also build this feature yourself if you want to dive down into the lower layers of NETMF...or perhaps a few of us could work on it as a customization...and then swap out another feature for it.]
Thank you for the suggestion, but this is completely beyond my ability.
I can pay for a few hours of consulting time, if it helps get a watchdog into the base platform.
Thank you for the great support!
#4
Posted 13 February 2012 - 04:22 AM
#5
Posted 13 February 2012 - 04:53 AM
It's also possible to add an external watchdog chip, and "kick" it every few seconds. Then it can simply drive your RESET pin low to reset your board.
I saw that discussion. For me, that is a LOT of work. I may have no choice, but I would like to avoid it (fwiw, it increases the effort of my project by 100%, and it is far beyond my comfort zone).
The thing I'd like to investigate is why your board is locking up. If there's a bug in the .NET MF lwIP networking stack which is causing this, we'd like to fix the core issue so that it doesn't need rebooting in the first place...
I do not have any useful way to repro.
Sometimes my stuff runs for 24 hrs fine, sometimes just for an hr.
I usually do not have the debugger attached, so I can't really figure out what is going on.
A watchdog would let me get this project to nearly release stage.
(It pains me to say this) Without a watchdog, I may have to abandon the project, or find another platform that has it. The project just isn't worth anything if it needs physical interaction sometimes.
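For reference, the software side of the external-watchdog suggestion quoted above is small. The following is a minimal sketch, not a tested design: it assumes a Netduino Plus, a watchdog chip whose input is wired to D7 (an arbitrary choice), and a chip that resets the board when it stops seeing edges on that input. `DoWork` is a hypothetical placeholder for the application's own loop body.

```csharp
using System.Threading;
using Microsoft.SPOT.Hardware;
using SecretLabs.NETMF.Hardware.NetduinoPlus;

public class Program
{
    // Assumption: the external watchdog's input is wired to D7, and the chip
    // drives RESET low if it sees no edge for longer than its timeout.
    private static readonly OutputPort WatchdogKick =
        new OutputPort(Pins.GPIO_PIN_D7, false);

    public static void Main()
    {
        while (true)
        {
            DoWork(); // hypothetical placeholder for the application's work

            // Toggle the line only after the work has completed, so a hang
            // inside DoWork() stops the kicks and lets the chip reset the board.
            WatchdogKick.Write(!WatchdogKick.Read());
            Thread.Sleep(1000);
        }
    }

    private static void DoWork()
    {
        // Application logic (for example the periodic HTTP request) goes here.
    }
}
```

The kick is deliberately placed in the same loop as the work rather than in a separate timer, because a timer could keep firing while the main thread is stuck. The sleep interval has to stay well below the chip's timeout.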
#6
Posted 13 February 2012 - 06:10 AM
Sometimes my stuff runs for 24 hrs fine, sometimes just for an hr.
Questions:
How busy is your network traffic? ie How many devices are on the network. If you run wireshark, do the records fly past?
Does your board hang completely, or just the network side of things?
#7
Posted 13 February 2012 - 03:02 PM
The thing I'd like to investigate is why your board is locking up. If there's a bug in the .NET MF lwIP networking stack which is causing this, we'd like to fix the core issue so that it doesn't need rebooting in the first place...
Do you have a repro for the lockup by any chance?
OK, this happened again overnight (on two boards). It has been discussed on other threads also:
- wired ethernet
- using a System.Net.HttpWebRequest call every 15 seconds for an HTTP GET (sleeping in between)
- system is up and working fine in the PM
- system is wired to wifi router (wifi router -> cable modem)
- cable modem is disconnected overnight
- in the AM, cable modem is reconnected
- SOL, netduinos are both hung... require physical reboot
I have seen this many times, and today on two boards, side by side.
There are other reports on threads here of issues like this: hangs when Ethernet is physically disconnected and reconnected.
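The polling pattern described in the steps above can be sketched as follows, for anyone who wants to try the repro. This is a sketch under assumptions, not the poster's actual code: the URL is hypothetical, the project is assumed to reference the System.Http assembly, and the explicit timeouts and `try`/`catch` are additions. The thread does not establish whether timeouts prevent this particular hang; a lockup below the managed layer would not surface as an exception.

```csharp
using System;
using System.Net;
using System.Threading;
using Microsoft.SPOT;

public class Program
{
    public static void Main()
    {
        while (true)
        {
            try
            {
                // Hypothetical URL; substitute the real endpoint.
                HttpWebRequest request =
                    (HttpWebRequest)WebRequest.Create("http://example.com/status");
                request.Method = "GET";
                request.KeepAlive = false;        // do not reuse a possibly dead connection
                request.Timeout = 10000;          // ms allowed for the response to arrive
                request.ReadWriteTimeout = 10000; // ms allowed for stream reads and writes

                // Dispose the response so its socket is released every cycle.
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                {
                    Debug.Print("Status: " + ((int)response.StatusCode).ToString());
                }
            }
            catch (Exception ex)
            {
                // A failed request is logged and retried on the next cycle.
                Debug.Print("Request failed: " + ex.Message);
            }

            Thread.Sleep(15000); // one request every 15 seconds, as in the report
        }
    }
}
```

Unplugging the upstream link for a while and reconnecting it, as in the report, would show whether the loop recovers or stops printing.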
#8
Posted 13 February 2012 - 03:04 PM
How busy is your network traffic? ie How many devices are on the network. If you run wireshark, do the records fly past?
Network is not so busy. Just a home network.
Does your board hang completely, or just the network side of things?
I can't tell, because no debugger is attached. (Or if there is a way to tell, I don't know it! Pls suggest!)
#9
Posted 14 February 2012 - 05:26 AM
I can't tell, because no debugger is attached. (Or if there is a way to tell, I don't know it! Pls suggest!)
Make your LED flash, so if the network hangs, but the LED continues flashing, your code is still executing.
If your main runs in a loop, just add:
// Outside loop:
private static OutputPort LED = new OutputPort(Pins.ONBOARD_LED, true);

// Inside loop:
LED.Write(!LED.Read());
Thread.Sleep(1000);
or if you just have a Thread.Sleep(Timeout.Infinite); then add:
private static OutputPort LED = new OutputPort(Pins.ONBOARD_LED, true);

Timer flashLED = new Timer(new TimerCallback((object data) =>
{
    LED.Write(!LED.Read());
}), null, 1000, 1000);
Note: code typed without compiler, so excuse any mistakes
Edit: added sleep to main loop code, otherwise there would be no flash!
#10
Posted 14 February 2012 - 05:47 AM
Make your LED flash, so if the network hangs, but the LED continues flashing, your code is still executing.
That's a clever diagnostic solution.
#11
Posted 14 February 2012 - 02:45 PM
#12
Posted 21 February 2012 - 04:23 AM
#13
Posted 19 August 2012 - 09:10 PM
Make your LED flash, so if the network hangs, but the LED continues flashing, your code is still executing.
If your main runs in a loop, just add:
// Outside loop:
private static OutputPort LED = new OutputPort(Pins.ONBOARD_LED, true);

// Inside loop:
LED.Write(!LED.Read());
Thread.Sleep(1000);
or if you just have a Thread.Sleep(Timeout.Infinite); then add:
private static OutputPort LED = new OutputPort(Pins.ONBOARD_LED, true);

Timer flashLED = new Timer(new TimerCallback((object data) =>
{
    LED.Write(!LED.Read());
}), null, 1000, 1000);
Note: code typed without compiler, so excuse any mistakes
Edit: added sleep to main loop code, otherwise there would be no flash!
And if you are able to read the LED with some external hardware, you know the program is still running; if the LED stops blinking, that hardware can reset the board. (This is a strange kind of watchdog, but in this case it can help.)
#14
Posted 10 September 2012 - 02:01 PM
For the test project, we're using the web client example from the netmftoolbox project, updated for 4.2. We then changed the GET to a POST to test inserting data into a database via PHP. This works well.
We then wrapped the code in a while to test longevity as this needs to be running for at least a few weeks at a time.
public static void Main()
{
    OutputPort led = new OutputPort(Pins.ONBOARD_LED, false);

    // Creates a new web session
    HTTP_Client WebSession = new HTTP_Client(new IntegratedSocket("www.SERVERNAME.com", 80));

    int numTests = 720;
    while (numTests > 0)
    {
        // Requests the latest source
        HTTP_Client.HTTP_Response Response = WebSession.Post("/InsertTest/InsertTest.php", "ID='',data=TestData");

        // Did we get the expected response? (a "200 OK")
        if (Response.ResponseCode != 200)
            throw new ApplicationException("Unexpected HTTP response code: " + Response.ResponseCode.ToString());

        led.Write(true);
        Thread.Sleep(2500);
        led.Write(false);
        Thread.Sleep(2500);

        numTests--;
    }
}
Regardless of how long I sleep (tested between 250 ms and 10000 ms), the application on the N+ becomes unresponsive (the LED stops blinking), but no exception is thrown and the debugger never disconnects. I also tried this without the debugger attached and got the same result.
I know I should aggregate the data and upload in batches instead, but this should still work, which leads me to believe there is some sort of memory leak in the firmware.
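One way to check the memory-leak suspicion is to log free managed memory on every pass and watch whether it trends downward before the hang. The sketch below assumes a debugger is attached to see the output. `DoPost` is a hypothetical placeholder for the `WebSession.Post` call in the loop above. `Debug.GC(true)` forces a collection and returns the free bytes on the managed heap, so a leak in native firmware memory would not necessarily show up here.

```csharp
using System.Threading;
using Microsoft.SPOT;

public class Program
{
    public static void Main()
    {
        int iteration = 0;
        while (true)
        {
            DoPost(); // hypothetical placeholder for the HTTP POST under test

            // Force a garbage collection and record how much managed heap is free.
            uint freeBytes = Debug.GC(true);
            Debug.Print("Iteration " + iteration.ToString() + ": " +
                        freeBytes.ToString() + " bytes free");

            iteration++;
            Thread.Sleep(2500);
        }
    }

    private static void DoPost()
    {
        // The request being tested goes here.
    }
}
```

A steadily shrinking number points at managed allocations that are never released. A flat number followed by a hang suggests the problem sits below the managed layer.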