The Netduino forums have been replaced by new forums at community.wildernesslabs.co.
This site has been preserved for archival purposes only
and the ability to make new accounts or posts has been turned off.
We have reproduced a lockup (when the Internet fails), with a video
There is a nasty bug in the Netduino core software that has not been solved, even after many reports from users.
Sometimes the Internet connection fails (for any reason; it could be the Internet provider or something else) at the same moment the Netduino wants to send data over the Internet. The Netduino then locks up, and the only solution is to reset the Netduino and pray that the connection is up again.
We have reproduced the lockup and made a video to make it more visible and easier to understand. We hope that the Microsoft core software programmers quickly come up with a patch/solution.
Imports System.Net
Imports System.Net.Sockets
Imports System.Text
Imports System.Threading ' needed for Thread.Sleep
Imports Microsoft.SPOT
Imports Socket = System.Net.Sockets.Socket
Imports Microsoft.SPOT.Hardware
Imports SecretLabs.NETMF.Hardware
Imports SecretLabs.NETMF.Hardware.NetduinoPlus
Imports Microsoft.SPOT.Net
Imports Microsoft.VisualBasic

Module Module1
    Const cnstApiKeyCosm As String = "u3V0nHMw3Huxx8HODSdGjKZ3GwSAKxyK1ZDRnl3OHZ0ND0g"
    Const cnstFeedIdCosm As String = "105586"
    Const cnstHOSTCosm As String = "api.pachube.com"
    Const cnstField1Cosm As String = "netduinoTEST,"

    Sub Main()
        Dim sbfeeds As New StringBuilder
        sbfeeds.Clear()
        Dim OnboardLed As New OutputPort(Pins.ONBOARD_LED, False)
        Dim Counter As Integer = 0
        Do While True
            ' Build one CSV datapoint: "netduinoTEST,<counter>"
            Counter += 1
            sbfeeds.Clear()
            sbfeeds.Append(cnstField1Cosm & Counter.ToString & Constants.vbCrLf)
            Try
                Debug.Print("begin " & DateTime.Now.ToString & " try: " & Counter.ToString)
                OnboardLed.Write(True) ' LED on while a send is in progress
                Dim IPEndPoint As New IPEndPoint(Dns.GetHostEntry("api.pachube.com").AddressList(0), 80)
                Using Host As Socket = New Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
                    Host.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, True)
                    Host.SendTimeout = 3000
                    Host.ReceiveTimeout = 1000
                    Debug.Print("Try to send....to Cosm dataProvider")
                    Host.Connect(IPEndPoint)
                    ' Request line, headers, then the CSV body
                    Host.SendTo(Encoding.UTF8.GetBytes("PUT /v2/feeds/" & cnstFeedIdCosm & ".csv HTTP/1.1" & Constants.vbCrLf), IPEndPoint)
                    Host.SendTo((Encoding.UTF8.GetBytes(("Host: api.pachube.com" & Constants.vbCrLf & "X-PachubeApiKey: " & cnstApiKeyCosm & _
                        Constants.vbCrLf & "Content-Type: text/csv" & Constants.vbCrLf & _
                        "Content-Length: " & sbfeeds.Length & Constants.vbCrLf & Constants.vbCrLf))), IPEndPoint)
                    Host.SendTo(Encoding.UTF8.GetBytes(sbfeeds.ToString), IPEndPoint)
                    ' Wait up to 0.5 s (value is in microseconds) for a response
                    Host.Poll(500000, SelectMode.SelectRead)
                    Debug.Print("Oké, stil connection")
                End Using
            Catch ex As Exception
                Debug.Print("error " & ex.Message)
            End Try
            OnboardLed.Write(False)
            Debug.Print("sleep")
            Thread.Sleep(15000)
        Loop
    End Sub
End Module
Are you kidding with this solution, resetting every 10 minutes?
Over the next 10 years, 7 million electricity meters in the Netherlands will be replaced with smart meters, and about 10% of those users will want to read out the device. I estimate 25% of them will do so with a Netduino.
first: the socket code/drivers are from an external source, lwIP I think, so it's not Microsoft-based
then: I had the same problem with my Arduino, also with Cosm.
I fixed it this way:
I just restart the device every 10 minutes -.- (in software; later I'm going to add an ATtiny85 as an external "watchdog" to do that)
since then I've had no more problems.
1) If I buy a Netduino Plus, I expect it to work correctly, regardless of what lies behind the development. Of course we can help find bugs, but when we do find bugs I expect we can post them here so they can be picked up and fixed by the creators of the Netduino... Or, when your TV doesn't work properly because of a broken capacitor, are you going to contact the capacitor manufacturer? Or are you contacting the TV manufacturer?
2) That is a workaround, not a fix
Further I see the same behaviour with my Netduino Plus... So please can this be fixed?
Hi JoopC,
A few quick questions for you, to help understand the scenario:
1. You are revoking the DHCP address of the Netduino Plus 2. So when it tries to send a request, the router is basically rejecting it (as if it was a hacker on the network). When you do this is the Netduino reflecting the loss of address? Also, when you "re-enable" DHCP on the router...is the Netduino getting that address back immediately...or does it need to re-request the address? I'm not sure if there's a good way for the Netduino to know when the router will let it back into the network (or else it would be sending constant ARPs etc.)
2. In the network lockup scenario, can you reboot the Netduino using its pushbutton?
3. When you reboot (or unplug and then repower) the Netduino, does it automatically get an IP address again from the router? And then connect to COSM as usual? In other words...does this issue only happen when the network connection is invalid?
4. Do you have a timeout set on your socket connection?
Now, in parallel to researching the scenario more through the above questions... It's generally a good idea with any electronics project to include a watchdog. With a watchdog, your board reboots if you run into trouble. Basically it counts down a timer and then resets the board automatically after a certain amount of time; by "kicking the watchdog" you make that timer start over so that the board doesn't reset. As long as you kick it every few seconds, your application will continue to run. But if your application locks up, the board will automatically reboot after the timeout (set somewhere between 1 ms and 30 s) and your application will restart.
If your application restarts consistently, we should be able to provide you with a watchdog feature which lets your board restart whenever it runs into an issue with disconnection. You'll also want to check for a connection in your code so that it doesn't just sit there rebooting over and over again. Some routers take offense to devices which request DHCP addresses thousands of times in a loop.
Would a watchdog help in your application? With Netduino gen2, we have enough room to add this in. There's an independent watchdog in the microcontroller which should be perfect. You can of course also add an external watchdog today that does the same thing with the /RESET pin; lots of commercial projects took this approach with Netduino gen1.
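To illustrate the external-watchdog idea, here is a minimal sketch of the software side only. It assumes a hypothetical external circuit (for example a small microcontroller or supervisor chip) that watches pin D7 and pulls /RESET low when the pin stops toggling. The pin choice, the module name, and the 1-second interval are illustrative assumptions, not something specified in this thread.

```vbnet
Imports System.Threading
Imports Microsoft.SPOT
Imports Microsoft.SPOT.Hardware
Imports SecretLabs.NETMF.Hardware.NetduinoPlus

Module WatchdogHeartbeatSketch
    ' Assumption: D7 is wired to the input of an external watchdog circuit
    ' that resets the board via /RESET when the pulses stop.
    Private Heartbeat As New OutputPort(Pins.GPIO_PIN_D7, False)
    Private HeartbeatState As Boolean = False

    ' "Kick the watchdog": toggle the pin so the external timer starts over.
    Sub KickWatchdog()
        HeartbeatState = Not HeartbeatState
        Heartbeat.Write(HeartbeatState)
    End Sub

    Sub Main()
        Do While True
            KickWatchdog()
            ' ... application work goes here (read sensors, send data) ...
            ' If this loop ever blocks, the kicks stop and the external
            ' circuit resets the board after its own timeout.
            Thread.Sleep(1000)
        Loop
    End Sub
End Module
```

The kick is issued from the same loop that does the network work, so a hang in that loop stops the heartbeat. Kicking from a separate timer thread would defeat the purpose, because that thread could keep running while the network thread is stuck.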
BTW--there are a few glitches in the lwIP network stack. The new .NET MF 4.3 is moving to a much newer version with lots of enhancements and some bug fixes. There's a good chance that this issue will be taken care of with that update...but regardless of that and in the meantime we should narrow down the issue and also investigate some good software/hardware backup plans (like a watchdog).
Chris
P.S. This may be an odd thing to say, but I kind of enjoyed the music in the video too. Thank you for taking time to illustrate and detail the issue. The video really helped us understand and diagnose the scenario a bit better.
Your provided solution is indeed a workaround. For most home applications that's no big deal, but I feel Joop is looking for a stable solution so it can be used in an actual product, which is good!
I think if there's an issue, it should at least be addressed, and sent to the right person. This said; there are a few parties involved in this: Secret Labs who provides hardware and a port of the .NETMF for that hardware. Microsoft for actually releasing the .NETMF, and then there's the open source tcp/ip stack.
The good thing: it's all open source. So if someone has the time and knowledge to dive into this issue, he or she can do so freely. I wish I had the knowledge, though.
Chris, Watchdog please! It won't help N+1 issues but this will help JoopC and other 'black box' installations (like when you have a Netduino installed at your mother-in-law's house and it locks up...) if using N+2.
BTW, I thought that with N+2 and its 'offloaded' Ethernet chip, this was supposed to allow you to recover from networking issues by simply power-cycling the Ethernet chip?
JoopC, can you try another test with your router? Instead of disabling DHCP, what happens if you unplug the router from the ADSL line and reconnect. Does it still lock up?
Hi all.
I have been following JoopC's development of a program to log data from solar panels and the Dutch smart meter since he started it, and I'm pretty sure that it's all set up to use a fixed IP address, so no DHCP. (JoopC's engagement was a reason for me to buy a Netduino Plus v2.)
Maybe JoopC can confirm this.
emg -- the ENC28J60 chip on Netduino Plus 2 could be power cycled. We would need to re-architect NETMF a bit to re-initialize the chip after rebooting it. We'll look into this and see what the best approach would be.
It sounds like the next new feature you guys would like added to Netduino gen2 is a watchdog? That's something we could put together and release as a firmware update within 3-4 weeks. I can't guarantee any specifics about how the ST chip will behave until it's finished and tested--but the IWDG (independent watchdog peripheral) in the STM32 should be designed to enable scenarios like this.
Chris
JoopC--to confirm, are you using static IP? And in your router you're simply disallowing all traffic from that static IP/MAC address?
If we can get to the root of the problem, we may be able to pull a few specific lines of code from the latest lwIP stack to remedy the issue. Once we get all the specifics we can see what's happening in WireShark and then in the native code. It can be tricky to debug all of that at the same time--but collectively there's no reason we can't solve this.
Thank you for your enthusiasm and your test details.
Chris
We can confirm that we use a static IP address or, in some cases, DHCP with a MAC reservation.
We have provided constants in the Solar program to set the IP inside the program, for example:
Const cnstStaticIP As String = "192.168.1.68"
Const cnstSubSetMask As String = "255.255.255.0"
Const cnstGateWay As String = "192.168.1.1"

Dim NetinfoStaticIP = Microsoft.SPOT.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces()(0)
NetinfoStaticIP.EnableStaticIP(cnstStaticIP, cnstSubSetMask, cnstGateWay)
The simulation we made consisted of disconnecting the router from the Internet and reconnecting it (while leaving the local network working).
The behaviour is the same in version 4.2.2.2 and version 4.3.
We would like to have the same behaviour as when no network cable is connected, or when we pull the cable from the Netduino/router: an error is thrown then. (That all works correctly; after we put the cable back in the Netduino, the program continues working like a champ.)
And of course, we can wait a month for that solution.
Thanks in advance Joop.
@NooM:
It was not intended to be disrespectful (if it came across that way, sorry), but when we run a project like this on a commercial basis we get plenty of repair claims.
Sorry, you misunderstood me. Enable the DoS (Denial of Service) packet filters on your router (if it has them) to >prevent< malformed packets from locking up your router; I was not suggesting they were the cause.
Also, I would be wary about testing via a direct cross-over cable between two NICs, as I've seen this cause issues, depending on the NIC. It's better to connect both devices to a switch and use port mirroring to sniff packets.
(Sorry to resurrect this thread, but it seemed better to continue this than to start a new one on the same topic.)
I just wanted to add this experience if it is useful in some way:
scenarios:
* NP2 on 4.2.2.1, official firmware
* my NP2 makes an outbound connection to a server and maintains it for some time
* if the server process is taken down (e.g. for maintenance), the Netduinos hang
* if the Netduino is power cycled, and the server process is not yet back up, the Netduinos hang
* if the server process is restored, the Netduinos do not magically unhang
* if the server process is restored, and the Netduino power cycled, no hang (expected)
* I seem to be able to unplug and plug the ethernet cable and have correct behaviour, but I probably need to test that more thoroughly
some other things:
* when I kill the server process, the Netduino detects this properly. I have a recovery loop that will back off and attempt to restore the connection. (This same code works properly when run from the desktop, so I consider it to be implemented correctly.)
* when it attempts to restore with Connect() to the (now not running) server process, this is when the 'hang' occurs
* my netduino application has several 'services' in it, amongst which is a diagnostic port over a COM port. When the netduino is in the failure mode, I can still interact over the diagnostic port. I can start and stop other services, but not the one that involves the ethernet. So the hang seems localized to the thread, rather than the whole board.
* I have a 'reboot' command, which does
PowerState.RebootDevice(true, 30000);
but this does not work. It's a 'soft' reboot, I haven't tried 'hard' yet. But it does not timeout and it does not reboot the board.
I came across this post while searching for 'watchdog', motivated by trying to cope with this issue, so hopefully we will get a watchdog in the firmware. But I can't wait indefinitely, so I'll probably have to implement an external one on my board. I think I still have a GPIO pin free...
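Since the hang described above appears to be confined to the thread that calls Connect(), one possible way to keep the rest of the application responsive is to run Connect() on a worker thread and wait for it with a timeout. The following is only a sketch under that assumption: the module and member names are hypothetical, and it assumes that Thread.Join with a millisecond timeout behaves on NETMF as it does on desktop .NET. It does not free a worker that is stuck inside the network stack; it only lets the caller notice the timeout and decide what to do next (for example, stop kicking an external watchdog).

```vbnet
Imports System.Net
Imports System.Net.Sockets
Imports System.Threading
Imports Microsoft.SPOT

Module ConnectWithTimeoutSketch
    ' State shared between the caller and the worker thread (hypothetical names).
    ' Not re-entrant: only one connection attempt at a time.
    Private PendingSocket As Socket
    Private PendingEndPoint As IPEndPoint
    Private ConnectSucceeded As Boolean

    ' Runs on the worker thread; this is the call that was observed to hang.
    Private Sub ConnectWorker()
        Try
            PendingSocket.Connect(PendingEndPoint)
            ConnectSucceeded = True
        Catch ex As Exception
            Debug.Print("connect failed: " & ex.Message)
        End Try
    End Sub

    ' Returns True only if Connect() completed within timeoutMs milliseconds.
    Function TryConnect(ByVal sock As Socket, ByVal endPoint As IPEndPoint, ByVal timeoutMs As Integer) As Boolean
        PendingSocket = sock
        PendingEndPoint = endPoint
        ConnectSucceeded = False

        Dim worker As New Thread(AddressOf ConnectWorker)
        worker.Start()

        ' Join(Integer) returns False if the worker is still running after the timeout.
        If Not worker.Join(timeoutMs) Then
            Debug.Print("connect timed out; worker thread is still blocked")
            Return False
        End If
        Return ConnectSucceeded
    End Function
End Module
```

A caller would create the socket as usual and call TryConnect(Host, endPoint, 5000) in place of Host.Connect(endPoint). After a timeout the socket should be treated as unusable, and a fresh one created for the next attempt.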
I have spent endless hours trying to get my Netduino Plus 2 to talk to Cosm for more than 24-48 hours; it simply hangs, and it happens at random places in the code.
Now I have tried posting the data to a local web server, and the Netduino has been stable for 7+ days.
This, combined with the fact that only 10-25% of the packets I send to Cosm actually get there, or should I say get logged in the Cosm database, tells me that there is something wrong with Cosm, or with the combination of me being in Europe and Cosm being in the States.
The local web server logged the requests into a text file; not a single packet was missed.
However, a C# PC console application also posting to Cosm works exactly as expected: no datapoints missing, and the program ran for days before I terminated it.
I have no clue as to why, and no chance of debugging with external debuggers, but to replicate it one only needs a Netduino Plus 2, a single DS18B20 sensor, and my code. I will be happy to supply my code to SL.