The Netduino forums have been replaced by new forums at community.wildernesslabs.co. This site has been preserved for archival purposes only and the ability to make new accounts or posts has been turned off.
Netduino Plus stops running after a while


9 replies to this topic

#1 mohammad


Posted 16 April 2013 - 08:10 PM

Hi there,

 

My Netduino Plus is programmed to send a heartbeat request to a data service every minute. Unfortunately, after one (or sometimes two) days it stops running, with no error and no apparent reason. Are there any limitations in the Netduino Plus hardware that could cause this?

I tested the data service with my PC as the client (instead of the Netduino Plus) sending the heartbeat requests, so I am convinced the problem is not in the data service.

 

FYI, power should not be a limitation: the board is powered over a USB cable connected to my computer.

 

Any comments, advice and solutions are more than welcome.

 

Thanks for your time and attention,

Mohammad



#2 hanzibal


Posted 18 April 2013 - 03:00 PM

Hi!

 

The problem may be one that doesn't show up as an exception but simply locks the device up. It could be that you run out of memory in such a way that there isn't even enough memory left to throw an exception.

 

Here are a few questions whose answers might help you in troubleshooting:

 

1. Are you debugging the Netduino for the whole duration of the test?

2. Does it run for a longer time or about the same when not debugging?

3. Does it run for a longer time when not attached to USB but using an external power source?

4. What do you have to do in order for the Netduino to run again - is a soft reset enough or do you have to cycle power (i.e. USB)?

5. Can you talk to the Netduino from MFDeploy when it has stopped?

 

For this kind of long-running "critical" stuff you could use a watchdog. I don't know if you can access the integrated watchdog from managed code, but if not, there are watchdog modules/chips that you can connect externally. A watchdog could automatically restart your Netduino every two hours, or whatever is appropriate, by power cycling (most reliable) or by issuing a reset.



#3 ahmedfme


Posted 18 April 2013 - 03:15 PM

I was facing the same problem in one of my projects, in which I implemented a heartbeat signal sent to a web service.

It was all about how you declare your timer.

Try declaring it in the class Program, like below:

 

 

 

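using System.Threading;

partial class Program
{
    // Static fields keep the timers and callbacks reachable for the life of the program.
    public static Timer clkTimer = null;
    public static TimerCallback clkTimerCallback = new TimerCallback(onclkTimer);
    public static Timer srvTimer = null;
    public static TimerCallback srvTimerCallback = new TimerCallback(onsrvTimer);

    public static void Main()
    {
        clkTimer = new Timer(clkTimerCallback, null, 0, 1000);   // fires every second
        srvTimer = new Timer(srvTimerCallback, null, 0, 60000);  // fires every minute
        Thread.Sleep(Timeout.Infinite);  // assumed addition: keep Main from returning
    }

    // The callback bodies were not shown in the post; these are empty placeholders.
    static void onclkTimer(object state) { }
    static void onsrvTimer(object state) { }
}

By doing this you guarantee that the garbage collector will not collect your timers later.

Try it and let us know.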

 



#4 mohammad


Posted 30 April 2013 - 12:35 AM

Hi Hanzibal,

 

Thanks a lot for your valuable reply. The reason I am replying late is that I was testing my Netduino+ based on your comments.

1. It does not depend on debugging. I have the same problem when I run it in a non-debug environment.

2. Almost the same (the running time is not constant in either situation).

3. I didn't try.

4. Soft reset is enough.

5. I haven't tried it yet.

 

Honestly, I have no watch dog module. How does it understand when it should restart? Can I implement it in my code? Any advice would be appreciated.

 

Thanks again,

Mohammad



#5 ziggurat29


Posted 30 April 2013 - 02:02 AM

fwiw, mohammad,  I have had a similar problem; in my case with ethernet.  an easy way to stimulate it:

 

*  have a persistent connection to my server.  this runs in a separate thread.

*  in the absence of other traffic, periodically I send heartbeat, and if connection is lost or non-responsive, it loops around to restore the connection, with appropriate backoff, etc.  this code is all thoroughly tested when running on desktop.  I can unplug cables, take interceding routers and server process up and down, and things are resilient.

*  if, however, I am running the same code on the netduino, and the server process is taken down

*  the netduino will detect it, try to reconnect (to the not-yet-back-up server process)

*  and hang forever

*  but only the thread that is handling the ethernet server connection is affected.  my other threads are lively.

*  I confirmed that, while in the fail mode with the locked ethernet thread, I can successfully issue a PowerState.RebootDevice() and at least restart the system.

 

The lockup you are experiencing seems to be similar, but I don't know what stimulates the problem in your case, so maybe it is different.  Anyway, here is what I have done to cope with it -- and mind you I haven't finished implementing yet because I'm on other things just now, so fair warning, but initial testing makes me confident that it works:

 

*  the netduino chip has a hardware watchdog, but it is not really available to us, alas.  so, in lieu of that....

*  I made a software watchdog.  (I also added a MAX824 (-compatible; I think it is actually a diodes inc chip) to my PCB, but that was last minute and I haven't written code for it)

*  the software watchdog works like this:

  *  all my various 'app services' (worker threads that do stuff, like handling the ethernet connection to my infrastructure)

  *  call a 'checkin' function with my 'watchdog service'.  that checkin() says 'reboot if you don't hear back from me in XXX ms, and oh please call this method if you reboot because someone else failed'

  *  a checkin discards any previous outstanding checkin

  *  the watchdog service periodically checks for expired checkin()s (Kentucky Fried Checkins?) and if any were found, it will issue the reboot

 

In my case, I did it this way because I already had all these app services coded, so I needed to retrofit to that impl.  They did handily have idle() methods which I could use to do the checkin(), but the duration needed to vary because some activities would normally block for up to 3 min in some cases (like cranking up GSM).  The particulars of your design may mean you can do it more simply than I was able to do or think of.
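The check-in scheme described above could be sketched roughly as follows. This is a minimal illustration, not ziggurat29's actual code: the names `SoftwareWatchdog`, `CheckIn`, `Poll` and `RebootAction` are hypothetical, the per-service "someone else failed" callback is left out, and the reboot itself is passed in as a delegate so the logic can be tried on a desktop before running it on the board.

```csharp
using System;
using System.Collections;

// Hypothetical delegate: on a Netduino this would wrap PowerState.RebootDevice(false).
public delegate void RebootAction();

public class SoftwareWatchdog
{
    // Maps a service name to the DateTime by which it must check in again.
    private readonly Hashtable deadlines = new Hashtable();
    private readonly object sync = new object();
    private readonly RebootAction reboot;

    public SoftwareWatchdog(RebootAction reboot)
    {
        this.reboot = reboot;
    }

    // "Reboot if you don't hear back from me in timeoutMs."
    // A new check-in replaces any outstanding one from the same service.
    public void CheckIn(string service, int timeoutMs)
    {
        lock (sync)
        {
            deadlines[service] = DateTime.UtcNow.AddMilliseconds(timeoutMs);
        }
    }

    // Called periodically (for example from a timer callback).
    // Requests a reboot and returns true if any check-in has expired.
    public bool Poll(DateTime now)
    {
        lock (sync)
        {
            foreach (DictionaryEntry entry in deadlines)
            {
                if ((DateTime)entry.Value < now)
                {
                    reboot();
                    return true;
                }
            }
        }
        return false;
    }
}
```

Each worker thread would call `CheckIn` from its idle path with a timeout suited to its slowest normal operation (a few seconds for most, several minutes for something like GSM start-up), and a separate timer would call `Poll(DateTime.UtcNow)` every second or so.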

 

hth; dave



#6 GrZeCh


Posted 30 April 2013 - 06:46 AM

Hello,

 

I just want to let you know that there is firmware with software watchdog added:

 

http://forums.netdui...o-plus/?p=47334

 

Maybe it will solve your problem.

 

Regards



#7 hanzibal


Posted 01 May 2013 - 01:35 AM

  Honestly, I have no watch dog module. How does it understand when it should restart? Can I implement it in my code? Any advice would be appreciated.

Hello again Mohammad,

I really think you should read carefully what the other two guys just wrote, but in answer to your question, a watchdog normally listens to some kind of "alive beacon", "heartbeat" or "check-in" (like the others wrote). When the watchdog has not seen/heard the beacon for some pre-defined time (timeout), it issues a reset causing a "reboot". This way, a watchdog could keep your application from locking up.

Watchdogs are for the situations where you can't actually figure out why things eventually go wrong and so instead of trying to solve the actual issue, you simply restart the whole thing. This is a "when all else has failed" type of thing to do but sometimes it's the only thing to do. Personally, I would never trust a software watchdog, instead I want someone to "pull the plug" and then reinsert it again afterwards.

#8 mohammad


Posted 03 May 2013 - 06:55 AM


Hi All,

 

Thanks for your valuable comments. I am doing exactly the same as Ziggurat29. Based on my understanding, I should reboot my Netduino after some time (e.g. 10 hours). So far, the only thing I have done is a software reboot by means of PowerState.RebootDevice(false). Is that right?

 

Thanks,

Mohammad



#9 ziggurat29


Posted 03 May 2013 - 12:42 PM

'yes' on the API (actually, I use the one with the timeout param)

 

'not exactly' on the understanding.  What you described is 'unconditional reboot after 10 hours', which is not exactly the same as a watchdog.  It might still help you though.  The watchdog is more like 'reboot the system if your timer runs out', with the timeout being on the order of seconds.  Like 5.  The theory is that your product runs in a loop, doing its work, but also resetting watchdog timer.  In a perfect world, the timer is continually reset, and you never reboot.  But if your work loop hangs for whatever reason, a reset will be missed, and the watchdog will timeout, and will issue the reboot command.
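The difference from an unconditional periodic reboot can be made concrete with a small sketch. It assumes a one-shot `System.Threading.Timer` is used as the countdown; `LoopWatchdog`, `Kick` and `RebootAction` are hypothetical names, and the reboot is injected as a delegate (on the Netduino it would call `PowerState.RebootDevice(false)`).

```csharp
using System;
using System.Threading;

// Hypothetical delegate: on a Netduino this would wrap PowerState.RebootDevice(false).
public delegate void RebootAction();

public class LoopWatchdog
{
    private readonly Timer timer;   // held in a field so it stays reachable
    private readonly int timeoutMs;

    public LoopWatchdog(int timeoutMs, RebootAction reboot)
    {
        this.timeoutMs = timeoutMs;
        // One-shot timer: the callback runs only if Kick() is not called
        // again within timeoutMs.
        timer = new Timer(delegate(object state) { reboot(); },
                          null, timeoutMs, Timeout.Infinite);
    }

    // Called once per pass of the work loop; restarts the countdown.
    public void Kick()
    {
        timer.Change(timeoutMs, Timeout.Infinite);
    }
}
```

The work loop would call `Kick()` once per iteration, with a timeout of a few seconds. As long as the loop keeps turning, the timer never fires; if the loop hangs, the countdown runs out and the reboot is issued. The `LoopWatchdog` instance itself should be kept in a static field so that it is not garbage collected. Being software, it shares the weakness noted elsewhere in this thread: it only helps while the timer thread is still running.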

 

my long-winded explanation in the previous post was to relate my experience and give you some ideas that you might or might not care about in your case.

 

GrZeCh's post on the modified firmware that exposes a watchdog to NETMF is interesting.

 

And as Hanzi pointed out, the One True Watchdog is a hardware watchdog, because what's to say that the software watchdog is not the thing that locks up?!  But we have to work with what we've got, and at this time the on-chip hardware watchdog isn't that, alas...



#10 mohammad


Posted 22 May 2013 - 12:13 AM

Hi all,

 

Just to keep you updated, my problem was fixed by using a watchdog solution: http://forums.netdui...lus/#entry49735

My device has now been running well and continuously for 10 days.

 

Thanks all, especially Ziggurat29, for your valuable comments.






This webpage is licensed under a Creative Commons Attribution-ShareAlike License.