Watchdog redux: Available in 4.2?


13 replies to this topic

#1 samjones

samjones

    Advanced Member

  • Members
  • 105 posts

Posted 13 February 2012 - 01:33 AM

I don't want to beat a dead horse... so if this is a dead end, pls say so and I will move to other things!

Issue: Our project is really making progress, but we gotta have a real watchdog to reboot the device when we hang.

Sounds like we aren't the only ones:

- http://forums.netdui...h__1#entry23819

- http://forums.netdui...h__1#entry18239

- http://forums.netdui...ch__1#entry6876

- http://forums.netdui...ch__1#entry5125


So the question: is a real watchdog available in 4.2?

#2 Chris Walker

Chris Walker

    Secret Labs Staff

  • Moderators
  • 7767 posts
  • Location: New York, NY

Posted 13 February 2012 - 03:34 AM

Hi samjones,

We can look at adding an extra watchdog feature with 4.2, but it'll largely be a space issue. We're hoping to have the final 4.2 firmware ready within the next 60 days...so please ping us again then.

[You could also build this feature yourself if you want to dive down into the lower layers of NETMF...or perhaps a few of us could work on it as a customization...and then swap out another feature for it.]

Chris

#3 samjones

samjones

    Advanced Member

  • Members
  • 105 posts

Posted 13 February 2012 - 04:04 AM

Chris Walker wrote:
    We can look at adding an extra watchdog feature with 4.2, but it'll largely be a space issue. We're hoping to have the final 4.2 firmware ready within the next 60 days...so please ping us again then.


I will be happy to.

I am trying to find the right platform for my project, which will ultimately use 10-50 boards. (Up to three now, and counting.) Netduino is really compelling because of the embedded networking and the overall price point and functionality. However, I am suffering from these issues:

  • dhcp shortcomings (discussed in other threads): In theory fixed in 4.2
  • lack of watchdog
  • limited memory when using System.Net.HttpWebRequest


My evolution has been, so far:

  • Got a single netduino, ran into issues
  • Switched to Panda, ramped up to a few boards, ran into bigger issues
  • Came back to netduino, now up to three boards, about to purchase a few more

But my project requires a manual reset every several hours, which is very frustrating!

It seems to me that netduino can't be used for anything real without a watchdog. What am I missing?


Chris Walker wrote:
    [You could also build this feature yourself if you want to dive down into the lower layers of NETMF...or perhaps a few of us could work on it as a customization...and then swap out another feature for it.]


Thank you for the suggestion, but this is completely beyond my ability.

I can pay for a few hours of consulting time, if it helps get a watchdog into the base platform.


Thank you for the great support!

#4 Chris Walker

Chris Walker

    Secret Labs Staff

  • Moderators
  • 7767 posts
  • Location: New York, NY

Posted 13 February 2012 - 04:22 AM

Hi samjones,

It's also possible to add an external watchdog chip, and "kick" it every few seconds. Then it can simply drive your RESET pin low to reset your board.

The thing I'd like to investigate is why your board is locking up. If there's a bug in the .NET MF lwIP networking stack which is causing this, we'd like to fix the core issue so that it doesn't need rebooting in the first place...

Do you have a repro for the lockup by any chance?

Chris
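A minimal sketch of the "kick" side of the external-watchdog approach, assuming a watchdog chip whose input is wired to digital pin D7 and whose reset output drives the board's RESET pin. The pin choice, the pulse width, the 2-second interval, and the DoWork placeholder are all assumptions for illustration; the real timing requirements come from the datasheet of whichever chip is used.

```csharp
using System.Threading;
using Microsoft.SPOT.Hardware;
// Assumption: a Netduino Plus. On a plain Netduino the namespace is
// SecretLabs.NETMF.Hardware.Netduino instead.
using SecretLabs.NETMF.Hardware.NetduinoPlus;

public class Program
{
    // Assumption: D7 is wired to the external watchdog's input pin.
    private static readonly OutputPort WatchdogKick =
        new OutputPort(Pins.GPIO_PIN_D7, false);

    public static void Main()
    {
        while (true)
        {
            DoWork(); // hypothetical placeholder for the application's real work

            // Pulse the pin once per pass through the main loop.
            WatchdogKick.Write(true);
            Thread.Sleep(1);
            WatchdogKick.Write(false);

            // Assumed to be comfortably shorter than the chip's timeout.
            Thread.Sleep(2000);
        }
    }

    private static void DoWork()
    {
        // Application logic goes here.
    }
}
```

The kick is issued from the main loop rather than from a Timer callback on purpose: if the loop hangs inside DoWork, the pulses stop and the chip can reset the board, whereas a timer might keep kicking while the application itself is stuck.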

#5 samjones

samjones

    Advanced Member

  • Members
  • 105 posts

Posted 13 February 2012 - 04:53 AM

Chris Walker wrote:
    It's also possible to add an external watchdog chip, and "kick" it every few seconds. Then it can simply drive your RESET pin low to reset your board.


I see that discussion. For me, that is a LOT of work. I may have no choice, but I would like to avoid it (FWIW, it increases the effort of my project by 100% and is far beyond my comfort zone).

Chris Walker wrote:
    The thing I'd like to investigate is why your board is locking up. If there's a bug in the .NET MF lwIP networking stack which is causing this, we'd like to fix the core issue so that it doesn't need rebooting in the first place...


I do not have any useful way to repro.

Sometimes my stuff runs for 24 hrs fine, sometimes just for an hr.

I usually do not have the debugger attached, so I can't really figure out what is going on.


A watchdog would let me get this project to nearly release stage.

(It pains me to say this) Without a watchdog, I may have to abandon the project, or find another platform that has it. The project just isn't worth anything if it needs physical interaction sometimes.

#6 ColinR

ColinR

    Advanced Member

  • Members
  • 142 posts
  • Location: Cape Town, South Africa

Posted 13 February 2012 - 06:10 AM

samjones wrote:
    Sometimes my stuff runs for 24 hrs fine, sometimes just for an hr.


Questions:

How busy is your network traffic? ie How many devices are on the network. If you run wireshark, do the records fly past?

Does your board hang completely, or just the network side of things?

#7 samjones

samjones

    Advanced Member

  • Members
  • 105 posts

Posted 13 February 2012 - 03:02 PM

Chris Walker wrote:
    The thing I'd like to investigate is why your board is locking up. If there's a bug in the .NET MF lwIP networking stack which is causing this, we'd like to fix the core issue so that it doesn't need rebooting in the first place...

    Do you have a repro for the lockup by any chance?


OK, this happened again overnight (on two boards). It has been discussed on other threads also:

  • wired ethernet
  • using a System.Net.HttpWebRequest call every 15 seconds for an HTTP GET (sleeping in between)
  • system is up and working fine in PM
  • system is wired to wifi router (wifi router -> cable modem)
  • cable modem is disconnected overnight
  • in the AM, cable modem is reconnected
  • SOL, netduinos are both hung... require physical reboot


I have seen this many times, and today on two boards, side by side.

There are other reports in threads here of similar hangs when Ethernet is physically disconnected and reconnected.
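For reference, the polling pattern described in this post can be sketched roughly as below, with explicit timeouts and cleanup added. The URL and the 10-second timeout values are hypothetical, and whether setting timeouts avoids the hang reported here is not established anywhere in this thread; the sketch only shows where such settings would go.

```csharp
using System;
using System.Net;
using System.Threading;
using Microsoft.SPOT;

public class Program
{
    // Hypothetical URL, for illustration only.
    private const string Url = "http://www.example.com/status";

    public static void Main()
    {
        while (true)
        {
            PollOnce();
            Thread.Sleep(15000); // 15 seconds between requests, as in the post
        }
    }

    private static void PollOnce()
    {
        WebResponse response = null;
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
            request.Method = "GET";
            request.KeepAlive = false;        // do not reuse a possibly dead connection
            request.Timeout = 10000;          // assumed: ms to wait for the response
            request.ReadWriteTimeout = 10000; // assumed: ms allowed for stream reads/writes

            response = request.GetResponse();
            int status = (int)((HttpWebResponse)response).StatusCode;
            Debug.Print("HTTP status: " + status.ToString());
        }
        catch (Exception ex)
        {
            // A failed request is logged and retried on the next pass.
            Debug.Print("Request failed: " + ex.Message);
        }
        finally
        {
            // Release the connection whether or not the request succeeded.
            if (response != null)
                response.Close();
        }
    }
}
```

Catching the exception inside PollOnce keeps a single failed request (for example, while the cable modem is disconnected) from terminating the main loop.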

#8 samjones

samjones

    Advanced Member

  • Members
  • 105 posts

Posted 13 February 2012 - 03:04 PM

ColinR wrote:
    How busy is your network traffic? ie How many devices are on the network. If you run wireshark, do the records fly past?


Network is not so busy. Just a home network.


ColinR wrote:
    Does your board hang completely, or just the network side of things?


I can't tell, because no debugger is attached. (Or if there is a way to tell, I don't know it! Pls suggest!)

#9 ColinR

ColinR

    Advanced Member

  • Members
  • 142 posts
  • Location: Cape Town, South Africa

Posted 14 February 2012 - 05:26 AM

samjones wrote:
    I can't tell, because no debugger is attached. (Or if there is a way to tell, I don't know it! Pls suggest!)


Make your LED flash, so if the network hangs, but the LED continues flashing, your code is still executing.


If your main runs in a loop, just add:
//Outside loop:
private static OutputPort LED = new OutputPort(Pins.ONBOARD_LED, true);

//Inside loop:
LED.Write(!LED.Read());
Thread.Sleep(1000);

or if you just have a Thread.Sleep(Timeout.Infinite); then add:
private static OutputPort LED = new OutputPort(Pins.ONBOARD_LED, true);
Timer flashLED = new Timer(new TimerCallback((object data) =>
{
    LED.Write(!LED.Read());
}), null, 1000, 1000);


Note: code typed without compiler, so excuse any mistakes :mellow:
Edit: added sleep to main loop code, otherwise there would be no flash!
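If the LED test shows that the runtime keeps running while only the application loop is stuck, a software-only supervisor becomes an option. The following is a rough sketch under that assumption: the main loop bumps a counter, and a timer reboots the board when the counter stops changing. It assumes PowerState.RebootDevice works on the firmware in use; the 60-second check interval and the DoWork placeholder are hypothetical. It cannot help if the whole runtime locks up, since the timer would stop as well.

```csharp
using System.Threading;
using Microsoft.SPOT.Hardware;

public class Program
{
    private static int heartbeat;    // incremented by the main loop
    private static int lastSeen;     // value observed at the previous check
    private static Timer supervisor; // held in a field so it is not garbage collected

    public static void Main()
    {
        // Assumed interval: it must be longer than the slowest legitimate loop pass.
        supervisor = new Timer(CheckHeartbeat, null, 60000, 60000);

        while (true)
        {
            DoWork(); // hypothetical placeholder for the application's real work
            Interlocked.Increment(ref heartbeat);
            Thread.Sleep(15000);
        }
    }

    private static void CheckHeartbeat(object state)
    {
        int current = heartbeat;
        if (current == lastSeen)
        {
            // No progress since the last check: request a full (not CLR-only) reboot.
            PowerState.RebootDevice(false);
        }
        lastSeen = current;
    }

    private static void DoWork()
    {
        // Application logic goes here.
    }
}
```

If a single pass through DoWork can legitimately take longer than the check interval, the supervisor would reboot a healthy board, so the interval needs to be chosen with the slowest expected request in mind.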

#10 Chris Walker

Chris Walker

    Secret Labs Staff

  • Moderators
  • 7767 posts
  • Location: New York, NY

Posted 14 February 2012 - 05:47 AM

ColinR wrote:
    Make your LED flash, so if the network hangs, but the LED continues flashing, your code is still executing.

That's a clever diagnostic solution :)

#11 samjones

samjones

    Advanced Member

  • Members
  • 105 posts

Posted 14 February 2012 - 02:45 PM

Thanks! Will add this tonight.

#12 samjones

samjones

    Advanced Member

  • Members
  • 105 posts

Posted 21 February 2012 - 04:23 AM

Chris,

How can we collaborate to get a real watchdog into 4.2?

Were you able to repro the hang when there is no internet connectivity? (It does seem to be the disconnected internet connection that is the issue; with it connected, my netduino code runs 40-80 hours or more with no problem.)

#13 plakias

plakias

    New Member

  • Members
  • 5 posts

Posted 19 August 2012 - 09:10 PM

ColinR wrote:
    Make your LED flash, so if the network hangs, but the LED continues flashing, your code is still executing.




And if you can read the LED with some external hardware, you know the program is still running; if the LED stops blinking, that hardware can reset the board. (This is a strange watchdog, but in this case it can help.)

#14 Patrick

Patrick

    Advanced Member

  • Members
  • 54 posts
  • Location: Tampa

Posted 10 September 2012 - 02:01 PM

I wanted to chime in here as I'm seeing the same thing with my board (N+, Rev A, 4.2) and I have something that is quite reproducible.

For the test project, we're using the web client example from the netmftoolbox project, updated for 4.2. We then changed the GET to a POST to test inserting data into a database via PHP. This works well.

We then wrapped the code in a while loop to test longevity, as this needs to run for at least a few weeks at a time.
public static void Main()
{
    OutputPort led = new OutputPort(Pins.ONBOARD_LED, false);

    // Creates a new web session
    HTTP_Client WebSession = new HTTP_Client(new IntegratedSocket("www.SERVERNAME.com", 80));

    int numTests = 720;

    while (numTests > 0)
    {
        // Posts the test data to the server
        HTTP_Client.HTTP_Response Response = WebSession.Post("/InsertTest/InsertTest.php", "ID='',data=TestData");

        // Did we get the expected response? (a "200 OK")
        if (Response.ResponseCode != 200)
            throw new ApplicationException("Unexpected HTTP response code: " + Response.ResponseCode.ToString());

        // Blink the LED once per iteration to show the loop is still running
        led.Write(true);
        Thread.Sleep(2500);
        led.Write(false);
        Thread.Sleep(2500);

        numTests--;
    }
}

Regardless of how long I sleep (tested between 250 ms and 10000 ms), the application on the N+ becomes unresponsive (the LED stops blinking), but no exception is thrown and the debugger never disconnects. I also tried this without the debugger attached and got the same result.

I know I should aggregate the data and upload it in batches instead, but this should still work, which leads me to believe there is some sort of memory leak in the firmware.
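One way to probe the memory-leak suspicion is to log free managed memory on every pass. The sketch below assumes a debugger is attached so that Debug.Print output is visible; DoRequest is a hypothetical placeholder for the POST in the loop above, and the 5-second delay is arbitrary.

```csharp
using System.Threading;
using Microsoft.SPOT;

public class Program
{
    public static void Main()
    {
        int iteration = 0;

        while (true)
        {
            DoRequest(); // hypothetical placeholder for the HTTP POST

            // Debug.GC(true) forces a garbage collection and returns the
            // number of free bytes on the managed heap.
            uint freeBytes = Debug.GC(true);
            Debug.Print("Iteration " + iteration.ToString() + ": " +
                        freeBytes.ToString() + " bytes free");

            iteration++;
            Thread.Sleep(5000);
        }
    }

    private static void DoRequest()
    {
        // The request under test goes here.
    }
}
```

A steadily falling number would be consistent with a leak on the managed side. A flat number would not rule out a problem in native firmware code, which this measurement does not cover.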




Copyright © 2016 Wilderness Labs Inc. | CC BY-SA
This webpage is licensed under a Creative Commons Attribution-ShareAlike License.