We have reproduced a lockup (when the internet fails) with a video - Netduino 2 (and Netduino 1) - Netduino Forums
   


The Netduino forums have been replaced by new forums at community.wildernesslabs.co. This site has been preserved for archival purposes only and the ability to make new accounts or posts has been turned off.

25 replies to this topic

#1 JoopC

    Advanced Member

  • Members
  • 148 posts

Posted 19 February 2013 - 09:39 PM

Hello to all,

There is a nasty bug in the Netduino core software that has not been solved, even after many reports from users.

Sometimes, when the Internet connection fails (for any reason; it could be the Internet provider or something else) at the same moment the Netduino wants to send data over the Internet, the Netduino locks up. The only solution is then to reset the Netduino and hope that the connection is up again.

We have reproduced the lockup and made a video to make it more visible and easier to understand. We hope that the Microsoft core software programmers come up with a patch/solution quickly.

Imports System.Net
Imports System.Net.Sockets
Imports System.Text
Imports System.Threading
Imports Microsoft.SPOT
Imports Socket = System.Net.Sockets.Socket
Imports Microsoft.SPOT.Hardware
Imports SecretLabs.NETMF.Hardware
Imports SecretLabs.NETMF.Hardware.NetduinoPlus
Imports Microsoft.SPOT.Net
Imports Microsoft.VisualBasic

Module Module1

    Const cnstApiKeyCosm As String = "u3V0nHMw3Huxx8HODSdGjKZ3GwSAKxyK1ZDRnl3OHZ0ND0g"
    Const cnstFeedIdCosm As String = "105586"
    Const cnstHOSTCosm As String = "api.pachube.com"
    Const cnstField1Cosm As String = "netduinoTEST,"

    Sub Main()
        Dim sbfeeds As New StringBuilder
        Dim OnboardLed As New OutputPort(Pins.ONBOARD_LED, False)
        Dim Counter As Integer = 0
        Do While True
            Counter += 1
            ' CSV body for the Cosm feed, e.g. "netduinoTEST,42"
            sbfeeds.Clear()
            sbfeeds.Append(cnstField1Cosm & Counter.ToString & Constants.vbCrLf)
            Try
                Debug.Print("begin " & DateTime.Now.ToString & "   try: " & Counter.ToString)
                OnboardLed.Write(True)
                Dim endPoint As New IPEndPoint(Dns.GetHostEntry(cnstHOSTCosm).AddressList(0), 80)
                Using Host As Socket = New Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
                    Host.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, True)
                    Host.SendTimeout = 3000
                    Host.ReceiveTimeout = 1000
                    Debug.Print("Try to send....to Cosm dataProvider")
                    Host.Connect(endPoint)
                    ' The socket is already connected, so Send is used instead of SendTo
                    Host.Send(Encoding.UTF8.GetBytes("PUT /v2/feeds/" & cnstFeedIdCosm & ".csv HTTP/1.1" & Constants.vbCrLf))
                    Host.Send(Encoding.UTF8.GetBytes("Host: " & cnstHOSTCosm & Constants.vbCrLf & _
                                                     "X-PachubeApiKey: " & cnstApiKeyCosm & Constants.vbCrLf & _
                                                     "Content-Type: text/csv" & Constants.vbCrLf & _
                                                     "Content-Length: " & sbfeeds.Length.ToString & Constants.vbCrLf & Constants.vbCrLf))
                    Host.Send(Encoding.UTF8.GetBytes(sbfeeds.ToString))
                    ' Wait up to 500000 microseconds (0.5 s) for the server's reply
                    Host.Poll(500000, SelectMode.SelectRead)
                    Debug.Print("OK, still connected")
                End Using
            Catch ex As Exception
                Debug.Print("error " & ex.Message)
            End Try
            OnboardLed.Write(False)
            Debug.Print("sleep")
            Thread.Sleep(15000)
        Loop
    End Sub

End Module

#2 NooM

    Advanced Member

  • Members
  • 490 posts
  • Location: Austria

Posted 21 February 2013 - 09:13 AM

First: the socket code/drivers come from an external source, lwIP I think, so it's not Microsoft-based :(

Then: I had the same problem with my Arduino, also with Cosm.

I fixed it this way:

I just restart the device every 10 minutes -.- (in software; later I'm going to add an ATtiny85 as an external "watchdog" to do that).

Since then I have had no more problems.
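A minimal VB.NET sketch of the timed-restart workaround above, assuming .NET MF 4.2 on a Netduino Plus 2: a one-shot System.Threading.Timer calls PowerState.RebootDevice(False) (a hard reboot) after 10 minutes. The module and member names are hypothetical, and the hard reboot has not been tested against the lockup discussed in this thread. Because the timer runs inside the same runtime that may be stuck, this is weaker than the external ATtiny85 watchdog mentioned above.

```vb
Imports System.Threading
Imports Microsoft.SPOT
Imports Microsoft.SPOT.Hardware

Module PeriodicRestartSketch

    ' Restart interval from the post above: 10 minutes, in milliseconds.
    Public Const RestartAfterMs As Integer = 10 * 60 * 1000

    ' Keep a reference so the timer is not garbage-collected.
    Private restartTimer As Timer

    Sub Main()
        ' One-shot timer: fires once after RestartAfterMs, then never again
        ' (period = Timeout.Infinite).
        restartTimer = New Timer(AddressOf RestartNow, Nothing, RestartAfterMs, Timeout.Infinite)

        ' ... the normal upload loop would run here (hypothetical) ...
        Thread.Sleep(Timeout.Infinite)
    End Sub

    Private Sub RestartNow(ByVal state As Object)
        Debug.Print("Scheduled restart")
        ' False = hard reboot (a soft reboot is reported later in this thread not to work).
        PowerState.RebootDevice(False)
    End Sub

End Module
```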



#3 JoopC

    Advanced Member

  • Members
  • 148 posts

Posted 21 February 2013 - 09:27 AM

Are you kidding with this solution, resetting every 10 minutes? Over the next 10 years, 7 million electricity meters in the Netherlands will be replaced with smart meters, and 10% of their owners will want to read out (log) these devices. I estimate 25% of those will use a Netduino.

And sending to Cosm was just an example.



#4 GeBrander

    Member

  • Members
  • 29 posts

Posted 21 February 2013 - 09:43 AM

NooM wrote:

    First: the socket code/drivers come from an external source, lwIP I think, so it's not Microsoft-based :(

    Then: I had the same problem with my Arduino, also with Cosm.

    I fixed it this way:

    I just restart the device every 10 minutes -.- (in software; later I'm going to add an ATtiny85 as an external "watchdog" to do that).

    Since then I have had no more problems.

1) If I buy a Netduino Plus, I expect it to work correctly, regardless of what lies behind the development. Of course we can help find bugs, but when we do find bugs, I expect we can post them here so they can be picked up by the creators of the Netduino and fixed... Or when your TV doesn't work properly, are you going to contact the capacitor manufacturer because of a broken capacitor? Or are you contacting the TV manufacturer? ;)

2) That is a workaround, not a fix ;)

Furthermore, I see the same behaviour with my Netduino Plus... So can this please be fixed?



#5 Chris Walker

    Secret Labs Staff

  • Moderators
  • 7767 posts
  • Location: New York, NY

Posted 21 February 2013 - 09:48 AM

Hi JoopC,

A few quick questions for you, to help understand the scenario:

1. You are revoking the DHCP address of the Netduino Plus 2, so when it tries to send a request, the router is basically rejecting it (as if it were a hacker on the network). When you do this, is the Netduino reflecting the loss of address? Also, when you "re-enable" DHCP on the router, is the Netduino getting that address back immediately, or does it need to re-request the address? I'm not sure if there's a good way for the Netduino to know when the router will let it back into the network (or else it would be sending constant ARPs etc.)
2. In the network lockup scenario, can you reboot the Netduino using its pushbutton?
3. When you reboot (or unplug and then repower) the Netduino, does it automatically get an IP address again from the router, and then connect to COSM as usual? In other words, does this issue only happen when the network connection is invalid?
4. Do you have a timeout set on your socket connection?

Now, in parallel to researching the scenario more through the above questions...

It's generally a good idea with any electronics project to include a watchdog. With a watchdog, your board reboots if you run into trouble. Basically it counts down a timer and resets the board automatically after a certain amount of time; "kicking the watchdog" makes that timer start over so that the board doesn't reset. As long as you kick it every few seconds, your application will continue to run. But if your application locks up, the board will automatically reboot after the timeout (set somewhere between 1 ms and 30 s) and your application will restart.

If your application restarts consistently, we should be able to provide you with a watchdog feature which lets your board restart whenever it runs into an issue with disconnection. You'll also want to check for a connection in your code so that it doesn't just sit there rebooting over and over again. Some routers take offense at devices which request DHCP addresses thousands of times in a loop :)

Would a watchdog help in your application? With Netduino gen2, we have enough room to add this in. There's an independent watchdog in the microcontroller which should be perfect. You can of course also add an external watchdog today that does the same thing with the /RESET pin; lots of commercial projects took this approach with Netduino gen1.

BTW, there are a few glitches in the lwIP network stack. The new .NET MF 4.3 is moving to a much newer version with lots of enhancements and some bug fixes. There's a good chance that this issue will be taken care of with that update, but regardless of that, and in the meantime, we should narrow down the issue and also investigate some good software/hardware backup plans (like a watchdog).

Chris

P.S. This may be an odd thing to say, but I kind of enjoyed the music in the video too :) Thank you for taking the time to illustrate and detail the issue. The video really helped us understand and diagnose the scenario a bit better.
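A minimal VB.NET sketch of the external-watchdog approach described above, under hypothetical wiring: GPIO pin D2 pulses the input of an external watchdog (for example the ATtiny85 mentioned earlier, or a dedicated supervisor chip) whose output drives /RESET. OutputPort and Pins.GPIO_PIN_D2 come from the .NET MF / Netduino SDK; the module name, UploadOnce, and the assumption that the watchdog's timeout is longer than one pass through the loop are illustrative only.

```vb
Imports System.Threading
Imports Microsoft.SPOT.Hardware
Imports SecretLabs.NETMF.Hardware.NetduinoPlus

Module ExternalWatchdogSketch

    ' Hypothetical wiring: D2 drives the input of an external watchdog whose
    ' reset output is tied to the Netduino /RESET pin. Its timeout is assumed
    ' to be longer than one full pass through the loop in Main.
    Private ReadOnly watchdogKick As New OutputPort(Pins.GPIO_PIN_D2, False)

    ' One short pulse on D2; the external watchdog restarts its countdown.
    Private Sub KickWatchdog()
        watchdogKick.Write(True)
        Thread.Sleep(1)
        watchdogKick.Write(False)
    End Sub

    Sub Main()
        Do
            ' Kick only after the work has returned. If UploadOnce hangs, no
            ' more kicks arrive and the watchdog pulls /RESET when it expires.
            UploadOnce()
            KickWatchdog()
            Thread.Sleep(15000)
        Loop
    End Sub

    ' Hypothetical placeholder for the socket code from the first post,
    ' which catches its own exceptions and returns.
    Private Sub UploadOnce()
    End Sub

End Module
```

The kick sits in the work loop rather than on a separate timer thread: a later post in this thread reports that a network hang can be confined to one thread while other threads keep running, so a timer-driven kick could keep a stuck application alive. Because a failed upload that returns normally still gets kicked, a dead uplink alone does not cause the endless reboot loop Chris warns about.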

#6 NooM

    Advanced Member

  • Members
  • 490 posts
  • Location: Austria

Posted 21 February 2013 - 10:09 AM

JoopC: no, I'm not kidding. I offered a solution (or workaround, or whatever you want to call it).

GeBrander: your 1), I won't even try to answer, whatever man...

2) Yes, a workaround, because it will take time to fix the real issue (which has to do with DHCP, I assume).

If you're lucky, it will be fixed with the .NET MF 4.3 version (new lwIP version).

If that isn't the case, it will take a long time to get this fix, so what to do?

Cry? Or try to "fix" or work around it yourself?

You can also do something useful and support .NET MF / lwIP with better code.

Keep in mind that SL is not a billion-dollar company with hundreds of developers, and there is a lot of stuff that needs fixes/attention.

Also, if you change too much in the core, it causes trouble with further .NET MF updates.



#7 Stefan

    Moderator

  • Members
  • 1965 posts
  • Location: Breda, the Netherlands

Posted 21 February 2013 - 10:27 AM

Hi NooM,

Your provided solution is indeed a workaround. For most home applications that's no big deal, but I feel Joop is looking for a stable solution so it can be used in an actual product, which is good!

I think if there's an issue, it should at least be addressed and sent to the right person. That said, there are a few parties involved in this: Secret Labs, who provides the hardware and a port of .NETMF for that hardware; Microsoft, for actually releasing .NETMF; and then there's the open-source TCP/IP stack.

The good thing: it's all open source. So if someone has the time and knowledge to dive into this issue, he or she can do so freely. I wish I had the knowledge, though ;)


"Fact that I'm a moderator doesn't make me an expert in things." Stefan, the eternal newb!
My .NETMF projects: .NETMF Toolbox / Gadgeteer Light / Some PCB designs

#8 NooM

    Advanced Member

  • Members
  • 490 posts
  • Location: Austria

Posted 21 February 2013 - 11:17 AM

-.- I never said that I don't want it all fixed or anything like that. I like working stuff :P

But the stupid answers about me kidding and that TV stuff did make me angry. I mean, I tried to offer some help and my experience. I'm in no way affiliated with SL or any company.

//edit: if I knew a better solution (for now), I'd tell you.



#9 emg

    Advanced Member

  • Members
  • 129 posts

Posted 21 February 2013 - 12:32 PM

Chris, watchdog please! It won't help with N+1 issues, but it will help JoopC and other 'black box' installations (like when you have a Netduino installed at your mother-in-law's house and it locks up...) if using an N+2.

BTW, I thought that with the N+2 and its 'offloaded' Ethernet chip, this was supposed to allow you to recover from networking issues by simply power-cycling the Ethernet chip?

JoopC, can you try another test with your router? Instead of disabling DHCP, what happens if you unplug the router from the ADSL line and reconnect it? Does it still lock up?



#10 jp73

    New Member

  • Members
  • 4 posts

Posted 21 February 2013 - 09:10 PM

Hi all. I have been following JoopC's development of a program to log data from solar panels and the Dutch smart meter since he started it, and I'm pretty sure that it's all set up to use a fixed IP address, so no DHCP. (JoopC's engagement was a reason for me to buy a Netduino Plus 2.) Maybe JoopC can confirm this.

#11 Chris Walker

    Secret Labs Staff

  • Moderators
  • 7767 posts
  • Location: New York, NY

Posted 21 February 2013 - 10:11 PM

emg: the ENC28J60 chip on Netduino Plus 2 could be power cycled. We would need to re-architect NETMF a bit to re-initialize the chip after rebooting it. We'll look into this and see what the best approach would be.

It sounds like the next new feature you guys would like added to Netduino gen2 is a watchdog? That's something we could put together and release as a firmware update within 3-4 weeks. I can't guarantee any specifics about how the ST chip will behave until it's finished and tested, but the IWDG (independent watchdog peripheral) in the STM32 should be designed to enable scenarios like this.

Chris

#12 Chris Walker

    Secret Labs Staff

  • Moderators
  • 7767 posts
  • Location: New York, NY

Posted 21 February 2013 - 10:13 PM

JoopC, to confirm: are you using a static IP? And in your router you're simply disallowing all traffic from that static IP/MAC address?

If we can get to the root of the problem, we may be able to pull a few specific lines of code from the latest lwIP stack to remedy the issue. Once we get all the specifics, we can see what's happening in Wireshark and then in the native code. It can be tricky to debug all of that at the same time, but collectively there's no reason we can't solve this.

Thank you for your enthusiasm and your test details.

Chris

#13 JoopC

    Advanced Member

  • Members
  • 148 posts

Posted 22 February 2013 - 10:16 AM

Hello Chris, great to have your attention.

We can confirm that we use a static IP address or in some cases DHCP with a MAC reservation.

We have provided constants in the Solar program to set the IP inside the program, for example:

 

Const cnstStaticIP As String = "192.168.1.68"
Const cnstSubSetMask As String = "255.255.255.0"
Const cnstGateWay As String = "192.168.1.1"

Dim NetinfoStaticIP = Microsoft.SPOT.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces()(0)
NetinfoStaticIP.EnableStaticIP(cnstStaticIP, cnstSubSetMask, cnstGateWay)

 

In our simulation, we disconnected the router from the Internet and then reconnected it (while leaving the local network working).

The behaviour is the same in version 4.2.2.2 and version 4.3.

We would like the same behaviour as when no network cable is connected, or when we pull the cable from the Netduino/router: in that case an error is thrown. (That works correctly; after we put the cable back into the Netduino, the program continues working like a champ.)

And of course, we can wait a month for that solution.

Thanks in advance,
Joop

@NooM:

It was not intended to be disrespectful (if it came across that way, sorry), but when we have a project like this on a commercial basis, we get plenty of repair claims.



#14 cdn

    New Member

  • Members
  • 3 posts

Posted 25 February 2013 - 09:51 PM

I did something similar to JoopC with the ND+2.

In my case, talking to Cosm would hang a Cisco cable modem (in bridge mode) within 6 hours.

I tried 3 different units and 2 modem types.

The weird thing is that with a router (FritzBox 7390) in between, the modem would not be affected.

The Cisco could only be reactivated with a power cycle.

Sometimes the ND+2 would also die with a steady blue LED.

I used static IP addresses, and I don't think it is the connect/disconnect/DHCP release.

I did some Wireshark sniffing and found weird incomplete packets.

My guess is that the TCP/IP stack has some flaws, and the Cisco modem hang might be caused by an overflow of unreleased ARP table entries.



#15 emg

    Advanced Member

  • Members
  • 129 posts

Posted 25 February 2013 - 11:04 PM

Check whether the router or the Cisco has any DoS blocking filters, and enable them. My Draytek 2820 has these and rejects large/malformed packets.



#16 cdn

    New Member

  • Members
  • 3 posts

Posted 26 February 2013 - 09:12 PM

emg wrote:

    Check and see if the router or Cisco has any DoS blocking filters and enable them. My Draytek 2820 has these and rejects large/malformed packets.

The sniffing tests were done on an isolated network without a router/switch etc.

So the ND+2 was directly connected to a PC running EchoTool in server mode and Wireshark.

So the malformed packets could not be caused by any filter (firewall on the PC turned off).



#17 emg

    Advanced Member

  • Members
  • 129 posts

Posted 26 February 2013 - 10:41 PM

cdn,

Sorry, you misunderstood me. Enable the DoS (Denial of Service) packet filters on your router (if it has them) to >prevent< malformed packets from locking up your router; I was not suggesting they were the cause.

Also, I would be wary about testing via a direct crossover cable between 2 NICs, as I've seen this cause issues, depending on the NIC. It's better to connect both devices to a switch and use port mirroring to sniff packets.



#18 ziggurat29

    Advanced Member

  • Members
  • 244 posts

Posted 24 March 2013 - 03:18 PM

(Sorry to resurrect this thread, but it seemed better to continue it than to start a new one on the same topic.)

I just wanted to add this experience if it is useful in some way:

 

scenarios:

*  NP2 on 4.2.2.1, official firmware

*  my NP2 makes an outbound connection to a server and maintains it for some time

*  if the server process is taken down (e.g. for maintenance), the Netduinos hang

*  if the Netduino is power cycled and the server process is not yet back up, the Netduinos hang

*  if the server process is restored, the Netduinos do not magically unhang

*  if the server process is restored and the Netduino is power cycled, no hang (expected)

*  I seem to be able to unplug and plug the ethernet cable and have correct behaviour, but I probably need to test that more thoroughly

 

some other things:

*  when I kill the server process, the Netduino detects this properly. I have a recovery loop that will back off and attempt to restore the connection. (This same code works properly when run from the desktop, so I consider it to be implemented correctly.)

*  when it attempts to restore the connection with Connect() to the (now not running) server process, this is when the 'hang' occurs

*  my Netduino application has several 'services' in it, amongst which is a diagnostic port over a COM port. When the Netduino is in the failure mode, I can still interact over the diagnostic port. I can start and stop other services, but not the one that involves the Ethernet. So the hang seems localized to the thread, rather than the whole board.

*  I have a 'reboot' command, which does

  PowerState.RebootDevice(true, 30000);

  but this does not work. It's a 'soft' reboot; I haven't tried 'hard' yet. But it does not time out and it does not reboot the board.

 

I came across this post while searching for 'watchdog', motivated by trying to cope with this issue, so hopefully we will get a watchdog in the firmware. But I can't wait indefinitely, so I'll probably have to implement an external one on my board. I think I still have a GPIO pin free...

 

-dave



#19 Lunddahl

    Advanced Member

  • Members
  • 152 posts
  • Location: Europe, Denmark

Posted 27 March 2013 - 06:12 PM

I have spent endless hours trying to get my Netduino Plus 2 to talk to Cosm for more than 24-48 hours; it simply hangs, and it happens in random places in the code.

Now I have tried posting the data to a local web server, and the Netduino has been stable for 7+ days.

This, combined with the fact that only 10-25% of the packets I send to Cosm actually get there (or should I say, get logged in the Cosm database), tells me that there is something wrong with Cosm, or with the combination of me being in Europe and Cosm being in the States.

The local web server logged the requests into a text file; not a single packet was missed.

However, a C# PC console application also posting to Cosm works exactly as expected, with no datapoints missing, and the program ran for days before I terminated it.

I have no clue as to why, and no chance of debugging with external debuggers, but to replicate it one only needs a Netduino Plus 2, a single DS18B20 sensor, and my code. I will be happy to supply my code to SL.

I have 3 NP2s, and all behave the same.

 

- Ulrik Lunddahl



#20 JoopC

    Advanced Member

  • Members
  • 148 posts

Posted 28 March 2013 - 09:32 AM

Lunddahl, this software bug, with a hanging Netduino, is a nightmare for many users. We hope that Secret Labs comes up with a solution quickly.







Copyright © 2016 Wilderness Labs Inc. | CC BY-SA
This webpage is licensed under a Creative Commons Attribution-ShareAlike License.