The Netduino forums have been replaced by new forums at community.wildernesslabs.co. This site has been preserved for archival purposes only and the ability to make new accounts or posts has been turned off.
Netduino Plus Reliability


25 replies to this topic

#1 Robert L.

Robert L.

    Advanced Member

  • Members
  • 100 posts

Posted 11 May 2012 - 04:16 PM

I just had a Netduino Plus based product returned to me for failure to run. The product had been operational at the customer's site for several months before this failure occurred. In troubleshooting the problem, I determined that the fault was with the Netduino Plus board. Strangely, the board was running and performing most of the functions programmed into it, but was not able to make connections over the Ethernet port.

I was able to get the N+ fully functional again by erasing everything, then installing the TinyBooterDecompressor, ER_CONFIG, ER_FLASH, and my managed code. Just re-installing the managed code alone was not enough to fix the board, so I am assuming that something happened to the N+ that affected only the network connectivity.

Has anyone else experienced this type of problem? Is there anything I can do to avoid this happening again in the future? Should I suspect that there is something special about the board that failed, making it likely to fail again?

Version 4.2.0 RC4

#2 Mario Vernari

Mario Vernari

    Advanced Member

  • Members
  • 1768 posts
  • LocationVenezia, Italia

Posted 11 May 2012 - 04:27 PM

Could you explain better what was the trouble?
Biggest fault of Netduino? It runs by electricity.

#3 Robert L.

Posted 11 May 2012 - 04:39 PM

Mario, thank you for the reply.

I am not sure what more I can say. The N+ board is programmed to act as a client to a web server. It is supposed to do a "POST" transaction every few minutes. After several months of operation, this one machine simply stopped making the connections to the web server, although all other functionality apparently remained. There was nothing special happening at the time. The last successful message was sent at 1am, and the customer reports that the machine was not disturbed for several hours before or after. The same goes for the server: it was not disturbed before or after the failure. Other customers' N+ products continue to operate.

I had the customer return the failing unit to me because I was concerned that his network was the cause of the trouble, but it failed the same way at my location. Functionality was also restored by swapping in a fresh N+ board. Since erasing everything on the bad board and reinstalling everything restored it to working order, I think the failure was not due to damaged hardware.

#4 Stefan

Stefan

    Moderator

  • Members
  • 1965 posts
  • LocationBreda, the Netherlands

Posted 11 May 2012 - 04:44 PM

The only thing I can think of is that the MAC address got reset to 0. If that were the case, it would be unable to network until you restored it. Does this sound plausible?
"Fact that I'm a moderator doesn't make me an expert in things." Stefan, the eternal newb!
My .NETMF projects: .NETMF Toolbox / Gadgeteer Light / Some PCB designs

#5 Chris Walker

Chris Walker

    Secret Labs Staff

  • Moderators
  • 7767 posts
  • LocationNew York, NY

Posted 11 May 2012 - 05:06 PM

Hi Robert,

Very interesting. So the hardware appears to be fine, but you had to erase and reflash it to get it back up and working?

If that's the case, then the most likely scenario is that the network settings somehow got changed (MAC address as Stefan mentioned, or IP address). Erasing and reflashing the board and redeploying the app shouldn't change anything other than your network config.

Also, I just want to confirm...after erasing and reflashing, the hardware seems to be working fine?

Chris

#6 Robert L.

Posted 11 May 2012 - 05:31 PM

Yes, after erasing and reflashing, the hardware seems to be working fine. After I did the reflashing, I sort of "kicked" myself for not checking the MAC address before doing so. The IP in this case is obtained dynamically, so I doubt that was the trouble. But the MAC could have been.

#7 JerseyTechGuy

JerseyTechGuy

    Advanced Member

  • Members
  • 870 posts

Posted 11 May 2012 - 05:34 PM

I had this happen to a board also. For some unknown reason the MAC address went all zeros. At that time I was running 4.1 on it. Haven't had it happen but that one time, but one time is strange enough.

#8 ColinR

ColinR

    Advanced Member

  • Members
  • 142 posts
  • LocationCape Town, South Africa

Posted 11 May 2012 - 06:08 PM

Being able to set the MAC address in code would be a nifty fix if it is the case.

#9 Geancarlo2

Geancarlo2

    Advanced Member

  • Members
  • 70 posts

Posted 11 May 2012 - 06:22 PM

NetworkInterface.PhysicalAddress :)

#10 Robert L.

Posted 12 May 2012 - 01:16 AM

I cannot say for sure that the MAC address was wiped out in my case, but it seems a likely cause. To make sure I can recover, should that happen (again), it would seem this code would let my machine recover:

byte[] MACaddress = { 0x5C, 0x86, 0x4A, 0x00, 0x28, 0x29 };

// Fetch the board's network interfaces
Microsoft.SPOT.Net.NetworkInformation.NetworkInterface[] IF = Microsoft.SPOT.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces();
IF[0].PhysicalAddress = MACaddress;    // make sure MAC address is set
IF[0].EnableStaticIP("192.168.5.100", "255.255.255.0", "192.168.5.1");    // set an initial static IP before enabling DHCP
IF[0].EnableDhcp();       // make sure DHCP is enabled



When I tried setting the MAC without setting the (static/initial) IP address, the IP address was wiped out, and DHCP failed.

This code also has the advantage that there is no need to set the network configuration in MFDeploy after flashing ER_CONFIG and ER_FLASH; I find it easier to set in the source code. YMMV.

#11 Geancarlo2

Posted 12 May 2012 - 03:35 AM

Yes, in case your MAC has somehow been wiped and you have to set it once again, don't forget you also gotta reboot the Netduino so as to make it reconnect.

#12 Chris Walker

Posted 12 May 2012 - 10:59 AM

Hi Robert,

If you're going to set the MAC address and/or IP address from code, I would recommend first _checking_ the current MAC/IP settings and only rewriting them if they've changed.

Due to the implementation of config in NETMF (single-sector, possible chaining of config settings), you could theoretically run out of space and/or rewrite cycles...

Chris
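A minimal sketch of the check-before-write idea, written as plain C# so it can be tried off the board. The class and method names (`NetConfigGuard`, `MacEquals`, `NeedsRewrite`) are hypothetical and not part of NETMF; the sketch only decides whether a write is needed and leaves the actual write to the caller.

```csharp
using System;

public static class NetConfigGuard
{
    // Byte-wise comparison. Using == or != on two byte[] values would only
    // compare the array references, not the six address bytes.
    public static bool MacEquals(byte[] a, byte[] b)
    {
        if (a == null || b == null) return a == b;
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // Returns true only when the stored settings differ from the wanted ones,
    // so the caller can skip the write (and the config-sector usage) otherwise.
    public static bool NeedsRewrite(byte[] storedMac, byte[] wantedMac,
                                    bool storedDhcpEnabled, bool wantedDhcpEnabled)
    {
        return !MacEquals(storedMac, wantedMac)
            || storedDhcpEnabled != wantedDhcpEnabled;
    }
}
```

On the board, `storedMac` would presumably come from `NetworkInterface.PhysicalAddress` and `storedDhcpEnabled` from `IsDhcpEnabled`, with the setters called only when `NeedsRewrite` returns true.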

#13 Robert L.

Posted 12 May 2012 - 01:05 PM

Hmmm. Would that include the flag to enable DHCP? I notice that there is a check box for enabling DHCP on the network configuration page in MFDeploy. Whenever I reboot, either due to a power cycle or my watchdog, I issue this call:

IF[0].EnableDhcp();

Would that call cause a rewriting of the config area, possibly running out of space or write/rewrite cycles? If so, perhaps that is the cause of the loss of the MAC address that I experienced? I was under the (apparently mistaken) impression that changing the network configuration values was changing a working copy rather than the values stored in FLASH. I would guess that this call to EnableDhcp() occurred between 500 and 1000 times before the failure occurred.


Here is the adjusted code that checks to see if updating the configuration is necessary before doing so:

        // NOTE: != on byte[] compares references, not contents, so the first test is always true
        if(( IF[0].PhysicalAddress != Global.MACaddress )
            || (IF[0].IPAddress != "192.168.5.100" )
            || (IF[0].GatewayAddress != "192.168.5.1" )
            || (IF[0].SubnetMask != "255.255.255.0" )
            || (IF[0].IsDhcpEnabled != true ) ) {
            IF[0].PhysicalAddress = Global.MACaddress;    // make sure MAC address is set
            IF[0].EnableStaticIP("192.168.5.100", "255.255.255.0", "192.168.5.1");
            IF[0].EnableDhcp();       // make sure DHCP is enabled
            }


#14 Robert L.

Posted 12 May 2012 - 03:22 PM

Hmm, I find the above code does not work as desired. It seems that the values returned by the DHCP server are recorded in the FLASH, and survive a power cycle or reboot. So anytime the DHCP server provides an address change, the network configuration area will be rewritten (apparently).

I wonder if that is a good idea if what Chris says is correct about limitations on writing/rewriting to the network configuration values.

If I want to have the application code determine if a re-write is needed based on the IPAddress, GatewayAddress, or SubnetMask, the best I will be able to do is some kind of sanity check on these values, like this:

        if(( MAC(IF[0].PhysicalAddress) != MAC(Global.MACaddress) ) 
            || (IF[0].IPAddress == "0.0.0.0" ) 
            || (IF[0].GatewayAddress == "0.0.0.0" ) 
            || (IF[0].IsDhcpEnabled != true ) ) {
            IF[0].PhysicalAddress = Global.MACaddress;    // make sure MAC address is set
            IF[0].EnableStaticIP("192.168.5.100", "255.255.255.0", "192.168.5.1");
            IF[0].EnableDhcp();       // make sure DHCP is enabled
            }


    public string MAC( byte[] mac ) {
        int i;
        string str = "";
        for( i=0; i<6; i++ ) {
            str += mac[i].ToString("X2");
            if( i != 5 )  str += ":";
            }
        return str;
        }




#15 Robert L.

Posted 14 May 2012 - 12:44 PM

I have learned a bit more about this issue. I have another unit, one that was never shipped, but was kept powered up as a long term test. I installed the updated managed code in it this morning and discovered it could not update the PhysicalAddress value. The MAC was stuck at a value of: 00-00-00-00-00-01 and could not be changed by managed code.

After completely erasing the chip and reinstalling everything, the MAC was able to be changed once again. For this test, I did not update the Network Configuration using MFDeploy.

My conclusions at this point are:


1) Changing network values via PhysicalAddress, EnableStaticIP(), or EnableDhcp() does cause the values to be written into the Network Configuration portion of FLASH, even if the values being written match the values already there.

2) Repeated updating of the Network Configuration portion of FLASH will cause the device to be unable to accept further updates.

3) It seems that once the Network Configuration portion of FLASH is "full", the MAC is likely to revert to a value of "00-00-00-00-00-01". Other values may also be possible; further examples are needed to determine this.

4) Some networks will let the N+ operate with a MAC of "00-00-00-00-00-01", assuming it is the only device having that address, while others may not.

5) Network Configuration values that are set by DHCP do get written into the Network Configuration portion of FLASH. I am not sure if they are written every time, or just when the values change. If they are written every time a lease is renewed, that could result in the N+ MAC being reset.


Recommendations:

I think the documentation of certain functions should include a warning about the limitations of updating the Network Configuration portion of FLASH.

I worry that using DHCP will eventually fill up the Network Configuration portion of FLASH. If the FLASH is rewritten every time a lease is renewed, the FLASH will likely fill up over the course of months. If rewritten only when values change, the FLASH will likely fill up over the course of years. It would be better if DHCP values did not update the Network Configuration portion of FLASH.
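The symptom described in this post could be detected at startup with a check along these lines. This is a sketch in plain C#; `MacSanity` and `IsSuspectMac` are hypothetical names, and the check only covers the two values reported in this thread (all zeros and 00-00-00-00-00-01), so other corrupted values would pass unnoticed.

```csharp
using System;

public static class MacSanity
{
    // True when the MAC is missing, has the wrong length, is all zeros,
    // or is the 00-00-00-00-00-01 value reported in this thread.
    public static bool IsSuspectMac(byte[] mac)
    {
        if (mac == null || mac.Length != 6) return true;
        for (int i = 0; i < 5; i++)
        {
            if (mac[i] != 0x00) return false;
        }
        return mac[5] == 0x00 || mac[5] == 0x01;
    }
}
```

An application could log or signal the condition when `IsSuspectMac` returns true rather than rewriting the configuration on every boot.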

#16 Robert L.

Posted 25 May 2012 - 12:17 PM


I have been tracking one of my customers' units since posting in this thread. I do see that his DHCP server is occasionally assigning different IP addresses to the N+ after restarts. From previous tests, I know that dynamic addresses are written into the FLASH network configuration area. From Chris we learn that this area has some limitations on the number of updates that can be done.

If the limit is the (likely) 100,000 rewrites that FLASH typically allows, that will not be a problem for me, as I am seeing fewer than 4 watchdog restarts per day: 100,000 rewrites at 4 per day gives a product life of about 68 years. However, if there are other limitations, for example if writing network configuration values does not overwrite existing data but instead begins to fill up some fixed-size table, then the DHCP updates could mean a much shorter life for the board between "factory" erases.

Does anyone know how many DHCP address changes the N+ can absorb before it will stop working?
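The arithmetic behind the 68-year figure can be written out as a small helper. This is only a sketch of the estimate made in this post; `FlashLifeEstimate` and `YearsUntilExhausted` are hypothetical names, and the write budget is whatever limit actually applies, which is the open question here.

```csharp
using System;

public static class FlashLifeEstimate
{
    // Years until a write budget is used up at a steady number of writes per day.
    public static double YearsUntilExhausted(double writeBudget, double writesPerDay)
    {
        if (writesPerDay <= 0) throw new ArgumentOutOfRangeException("writesPerDay");
        return writeBudget / writesPerDay / 365.0;
    }
}
```

With a budget of 100,000 rewrites and 4 writes per day this gives roughly 68.5 years. If the limit were instead a fixed table with, say, a purely hypothetical 1,000 entries, the same rate would fill it in 250 days.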

#17 Chris Walker

Posted 26 May 2012 - 01:55 AM

I believe that the IP settings are only written if you use the .EnableDhcp() or .EnableStaticIP() features (or otherwise change the IP config settings via code or GUI). If the DHCP-allocated IP address is rewritten into flash every time it changes, we need to bring that up with Microsoft and change that behavior.

The config sector can be chained (i.e. new settings written one after another), so the rewrite cycles aren't the issue...it's running out of space that is the potential issue.

Chris

#18 Arbiter

Arbiter

    Advanced Member

  • Members
  • 132 posts
  • LocationBrisbane, Australia

Posted 26 May 2012 - 02:28 PM

So the chaining rather than rewriting is a deliberate longevity strategy?

If so, a more sophisticated strategy would rewrite when there was no more room for chaining.
One day, all this too shall parse.

#19 Chris Walker

Posted 26 May 2012 - 04:32 PM

I believe that the strategy relates to how flash works...

To write new data in an empty section of flash, you can simply write the data. To write new data over old data you have to read all the data you want to keep out of a sector, erase the entire sector, and then write everything back you want (including the new data). Or have two sectors and then switch which one is the "current" sector by erasing the unused one.

Since the config sector is only one sector, and NETMF doesn't want to create a big buffer to read in all that data, and because NETMF doesn't want to have such an important sector get corrupted by a power outage or code failure in the middle of writing it...chaining may have been implemented. By writing new data after old data and looking for the newest data by browsing the whole sector, enhanced reliability and lower memory consumption are achieved.

The config sector was really designed, way-back-when, as something that would rarely if ever be changed. If NETMF writes to it frequently, then that's an inconsistent design that we need to figure out how to fix in a contribution back to the core...

We can analyze a lot of these things more easily on the new Netduino Go hardware, where we can interactively browse the Flash while running NETMF on the board (using the JTAG connector). I'm really looking forward to getting the first hand-built samples of the new Ethernet module for Netduino Go, and digging into the details of the config sector usage.

Chris
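The chaining scheme described in this post can be illustrated with a toy model. This is a sketch under stated assumptions and not the actual NETMF implementation: the class name, the one-byte "in use" marker, and the sector and record sizes are all hypothetical. It shows the two behaviors discussed in the thread: the newest record is found by scanning the sector, and writes fail once no free slot remains until the sector is erased.

```csharp
using System;

// Toy model of an append-only ("chained") config sector.
public class ChainedConfigSector
{
    private const byte Erased = 0xFF;   // erased flash reads back as 0xFF
    private const byte Written = 0xA5;  // hypothetical "slot in use" marker
    private readonly byte[] sector;
    private readonly int payloadSize;
    private readonly int slotSize;      // marker byte plus payload

    public ChainedConfigSector(int sectorSize, int payloadSize)
    {
        this.payloadSize = payloadSize;
        this.slotSize = payloadSize + 1;
        this.sector = new byte[sectorSize];
        Erase();
    }

    // Resets the whole sector, like a full erase of the board.
    public void Erase()
    {
        for (int i = 0; i < sector.Length; i++) sector[i] = Erased;
    }

    // Appends a record after the existing ones; old data is never overwritten.
    // Returns false when the sector has no free slot left.
    public bool TryWrite(byte[] payload)
    {
        if (payload == null || payload.Length != payloadSize)
            throw new ArgumentException("payload has the wrong length");
        for (int offset = 0; offset + slotSize <= sector.Length; offset += slotSize)
        {
            if (sector[offset] == Erased)
            {
                Array.Copy(payload, 0, sector, offset + 1, payloadSize);
                sector[offset] = Written;   // mark the slot last, after the data
                return true;
            }
        }
        return false;
    }

    // Scans from the start and returns the last written record, or null if none.
    public byte[] ReadNewest()
    {
        int newest = -1;
        for (int offset = 0; offset + slotSize <= sector.Length; offset += slotSize)
        {
            if (sector[offset] != Written) break;
            newest = offset;
        }
        if (newest < 0) return null;
        byte[] result = new byte[payloadSize];
        Array.Copy(sector, newest + 1, result, 0, payloadSize);
        return result;
    }
}
```

For example, a 64-byte sector with 7-byte payloads holds eight records in this model; the ninth `TryWrite` returns false until `Erase` is called.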

#20 Arbiter

Posted 27 May 2012 - 01:44 PM

The more I learn about this, the less I seem to know. It's very disturbing.





Copyright © 2016 Wilderness Labs Inc.  |  CC BY-SA
This webpage is licensed under a Creative Commons Attribution-ShareAlike License.