The Netduino forums have been replaced by new forums at community.wildernesslabs.co. This site has been preserved for archival purposes only and the ability to make new accounts or posts has been turned off.

Netduino Plus Reliability


25 replies to this topic

#21 Robert L. (Advanced Member, 100 posts)

Posted 27 May 2012 - 02:01 PM

The config sector was really designed, way-back-when, as something that would rarely if ever be changed. If NETMF writes to it frequently, then that's an inconsistent design that we need to figure out how to fix in a contribution back to the core...

We can analyze a lot of these things more easily on the new Netduino Go hardware, where we can interactively browse the Flash while running NETMF on the board (using the JTAG connector). I'm really looking forward to getting the first hand-built samples of the new Ethernet module for Netduino Go, and digging into the details of the config sector usage.

Chris



Sounds like a good plan. If the N+ is not erasing and re-writing the configuration sector once all the chained "positions" are used up, then the behavior I saw would make sense. Here are some statistics from a system installed at a customer site that might help:

Over a 5-day period of continuous operation, I saw:

14 times the N+ froze and was reset by the watchdog timer.
3 times, following a watchdog reset, the DHCP server (provided by a DSL modem) assigned a different IP address.
1 time, following a watchdog reset, the configuration was updated by my managed code, due to finding bogus data there. Specifically, the IP and gateway addresses had been changed to 0.0.0.0.

Note that my N+ operates as a client. The freezes usually occur when creating a socket (on the N+, outgoing sockets unfortunately cannot be reused), but sometimes occur when calling Connect().

I think that means the configuration sector was written to or updated approximately once per day (the 3 address changes plus 1 managed-code update over 5 days), if we assume a DHCP operation does not cause a configuration write when the IP address is unchanged. It would be about three times per day if every DHCP operation causes a rewrite, since a DHCP transaction occurs each time there is a reset (14 resets over 5 days). And it would be even more if a write occurs behind the scenes each time the DHCP client does a "RENEW" transaction.

#22 Chris Walker (Secret Labs Staff, Moderator, 7767 posts, New York, NY)

Posted 27 May 2012 - 05:10 PM

Hi Arbiter,

The more I learn about this, the less I seem to know. It's very disturbing.

I'm just glad that this is all open source, where we can dig in and see exactly what's going on. .NET Micro Framework has proven to be a strong platform in-field and by understanding and addressing any issues that pop up...we all make it an even better platform. With millions of .NET MF-powered devices in the field, these kinds of issues have really shrunk over time.

Here's some simple guidance for writing the config sector:
  • Config info is meant to be changed infrequently
  • Network configuration changes can rewrite the config sector
  • Avoid changing the network config in code unless responding to a user request (or unless there's an infrequent change you need to make).

If needed it's also possible to create a managed code method (with native interop) which reads out the config sector, erases the config sector, and rewrites it in compact fashion. But if there is frequently-changing config information which needs to be saved in flash, the better solution might be to figure out a way to allocate two config sectors instead (or store the data elsewhere).
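
The read-out, erase, and compact-rewrite idea can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class `ConfigSectorModel`, its slot layout, and the key/value records are hypothetical and do not reflect the real NETMF config sector format or any native interop API. It assumes updates are appended and that the newest record for a key supersedes older ones.

```csharp
using System;
using System.Collections;

// Toy model only (hypothetical): NOT the real NETMF config sector layout.
public class ConfigSectorModel
{
    private readonly string[] keys;
    private readonly string[] values;
    private int used;        // slots written since the last erase
    public int EraseCount;   // number of erase cycles performed

    public ConfigSectorModel(int slots)
    {
        keys = new string[slots];
        values = new string[slots];
    }

    public int Used { get { return used; } }

    // Returns the newest value for a key, or null if the key was never written.
    public string Read(string key)
    {
        for (int i = used - 1; i >= 0; i--)
        {
            if (keys[i] == key) return values[i];
        }
        return null;
    }

    // Appends a record, compacting first when no free slot remains.
    public bool Write(string key, string value)
    {
        if (used == keys.Length) Compact();
        if (used == keys.Length) return false; // every slot holds a distinct live key
        keys[used] = key;
        values[used] = value;
        used++;
        return true;
    }

    // Reads out the live records, erases the sector, and rewrites them compactly.
    public void Compact()
    {
        Hashtable seen = new Hashtable();
        ArrayList liveKeys = new ArrayList();
        ArrayList liveValues = new ArrayList();
        for (int i = used - 1; i >= 0; i--)
        {
            if (seen.Contains(keys[i])) continue; // an older, superseded record
            seen[keys[i]] = true;
            liveKeys.Insert(0, keys[i]);
            liveValues.Insert(0, values[i]);
        }
        for (int i = 0; i < keys.Length; i++) { keys[i] = null; values[i] = null; }
        EraseCount++;
        for (int i = 0; i < liveKeys.Count; i++)
        {
            keys[i] = (string)liveKeys[i];
            values[i] = (string)liveValues[i];
        }
        used = liveKeys.Count;
    }
}
```

In this model, a 4-slot sector that has seen three IP updates and one gateway update compacts down to two live records on the next write. With a single real sector, a power loss between the erase and the rewrite would lose the data, which is one reason a second sector or a different storage location may be the safer choice.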

Chris

#23 Arbiter (Advanced Member, 132 posts, Brisbane, Australia)

Posted 28 May 2012 - 01:36 AM

But if there is frequently-changing config information which needs to be saved in flash, the better solution might be to figure out a way to allocate two config sectors instead (or store the data elsewhere).


I gave up and used static IP. For a device in the field with its own Wireless 3G modem this solution is trivial and reliable, so in that respect this is for me a non-issue. What bothers me is realising that there are whole chunks of the N+ execution cycle that are, for me, magic black boxes. Once upon a time when I built a Z80 system I knew how the boot sequence worked right down to the resistor/cap arrangement that pulled reset low while the DRAM was getting a charge up. Now, I just have to have faith until the framework calls Main() and that freaks me out.

I know it's all open source, but I have yet to find, in my travels, a "How it works" narrative that goes from hardware behaviour at power on to the invocation of Main(), describing the complete bootstrap mechanism, the model for loading and initialisation of device drivers, and the loading and initialisation of the framework right up to the loading of an assembly and the call to Main().

I'm sure this is all in the documentation, but a unified narrative with links to supporting documents would be worth its weight in iridium (which is a great deal rarer and more useful than gold).
One day, all this too shall parse.

#24 Valkyrie-MT (Advanced Member, 315 posts, Indiana, USA)

Posted 28 May 2012 - 04:12 AM

Wow, that's great info. I've had very similar findings with my 4.2 firmware. I basically have accumulated a bunch of work-arounds for these things. I am hopeful that the reliability improvements in NETMF 4.3, together with the Netduino Go allowing anyone to compile with GCC, will allow the platform to improve greatly.

14 times the N+ froze and was reset by the watchdog timer.

On the lockups, I have found that some of the socket methods cannot be trusted to return, and there is no timeout! They may block forever, specifically GetHostEntry and Connect. So, the best solution is to run the call on its own thread. I have found that this works. To pretty it up, I wrap them in a nice extension method, so my calls look fairly normal :) Here is an example:

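    // Runs Connect on a worker thread and polls for up to about 2 seconds
    // (10 checks x 200 ms), because Connect may block forever.
    // Note: on timeout the worker thread is left running and may still connect later.
    public static bool TryConnect(this Socket s, EndPoint ep)
    {
        bool connected = false;
        new Thread(delegate
            {
                try
                {
                    s.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.NoDelay, true);
                    s.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.KeepAlive, false);
                    s.SendTimeout = 3000;
                    s.ReceiveTimeout = 3000;
                    s.Connect(ep);

                    connected = true;
                }
                catch { } // any failure is reported to the caller as "not connected"
            }).Start();
        int checks = 10;
        while (checks-- > 0 && connected == false) Thread.Sleep(200);
        // if (connected == false) throw new Exception("Failed to connect");
        return connected;
    }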


...the IP and gateway addresses had been changed to 0.0.0.0

Now, I and others have seen the MAC address get reset to 00:00:00:00:00:01, but until about 3 weeks ago, I had not seen the IP address spontaneously get set to 0.0.0.0. I wanted to see if it happened again before posting about it, but since you saw it too, I am writing about it. I think this is a VERY rare event, on the order of once in a month or two. For me, this resulted in my main thread continuing to run while network access was unresponsive. So, the Netduino did not lock up; it continued to run with that IP (0.0.0.0). Here is the event straight from my logs:

22:29:11 Networking has a bad address: 0.0.0.0


I have an event handler on NetworkAvailabilityChanged and NetworkAddressChanged, which checks the MAC address and IP address and in this case the IP was bad. There had been no loss of network connectivity or power and there was nothing in the 30 minutes prior in the log. So, it's a mystery, and now I have another work-around in my code... Something like if IP address or MAC is bad, then reboot.
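
The "if IP address or MAC is bad" check might look something like the following minimal sketch. The class and method names (`NetworkSanity`, `IsBadIp`, `IsBadMac`, `NeedsReboot`) are hypothetical and not from my actual code, and the set of "bad" values is an assumption based only on the two symptoms described in this thread (IP 0.0.0.0 and MAC 00:00:00:00:00:01), plus an all-zero MAC.

```csharp
using System;

// Hypothetical helpers: a sketch of the sanity check, with no board dependencies.
public static class NetworkSanity
{
    // True when the address is missing or is the all-zero address from the log.
    public static bool IsBadIp(string ip)
    {
        return ip == null || ip.Length == 0 || ip == "0.0.0.0";
    }

    // True when the MAC is missing, is not 6 bytes long, is all zero,
    // or is the 00:00:00:00:00:01 value reported in this thread.
    public static bool IsBadMac(byte[] mac)
    {
        if (mac == null || mac.Length != 6) return true;
        for (int i = 0; i < 5; i++)
        {
            if (mac[i] != 0) return false; // any non-zero leading byte is treated as valid
        }
        return mac[5] == 0 || mac[5] == 1;
    }

    // True when either value looks corrupted and a reboot is the chosen work-around.
    public static bool NeedsReboot(string ip, byte[] mac)
    {
        return IsBadIp(ip) || IsBadMac(mac);
    }
}
```

On the board, the inputs would presumably come from the network interface's IPAddress and PhysicalAddress properties inside the address-changed and availability-changed handlers, with a device reboot when NeedsReboot returns true. That wiring is left out here. With DHCP enabled, 0.0.0.0 can also appear briefly before a lease is obtained, so a real handler may want to wait a little before acting.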

3 times, following a watchdog reset, the DHCP server (provided by a DSL modem) assigned a different IP address.

DHCP may give you a different address when the lease expires. I don't know that this is a bad thing...


Ultimately, I do not have a hardware watchdog (although it would be awesome to have the firmware enable the onboard one). But I have had my Netduino Plus operate for several months straight at times. All the issues I have seen really stem from the networking.

Thanks for the great info Robert!
-Valkyrie-MT

#25 Robert L. (Advanced Member, 100 posts)

Posted 29 May 2012 - 11:43 AM

DHCP may give you a different address when the lease expires. I don't know that this is a bad thing...


It could be if being issued different addresses fills up the configuration sector. Something I hope Chris will let us know more about once he has the tools to see how DHCP activity affects the configuration sector.

Robert

#26 pkobl (New Member, 5 posts)

Posted 06 December 2012 - 10:44 PM

It could be if being issued different addresses fills up the configuration sector. Something I hope Chris will let us know more about once he has the tools to see how DHCP activity affects the configuration sector.

Robert


Hi,

Does anybody know if any of the problems covered in this thread were corrected in 4.2?

Or does anyone have a better understanding of how DHCP (IP lease renewals or IP changes) affects the configuration sector? Can the configuration sector still fill up after a couple of hundred updates? I am mostly concerned with the simple case with DHCP enabled in config.

Thanks for any new light on this issue.

..Peter




Copyright © 2016 Wilderness Labs Inc.
This webpage is licensed under a Creative Commons Attribution-ShareAlike License.