The Netduino forums have been replaced by new forums at community.wildernesslabs.co. This site has been preserved for archival purposes only and the ability to make new accounts or posts has been turned off.
SerialPort Receive Data Corruption

Netduino Plus 2 Netduino 3 WiFi SerialPort Data Receive

17 replies to this topic

#1 Hawkez

Hawkez

    New Member

  • Members
  • 7 posts

Posted 20 July 2015 - 01:39 PM

Hello,

 

I am using a Netduino Plus 2 and a Netduino 3 WiFi to interact serially.  When receiving data over a SerialPort I was having issues getting consistent data back.  So, in order to test the SerialPort functionality, I created a loopback test that reads a file from an SD card and sends it over COM2.  I've jumpered the Tx (D3) and Rx (D2) pins on the header so that the data is then received.  I then take that data and write it to another file on the SD card.

 

A diff of the original file and the newly created file reveals lots of differences no matter what baud rate is used.  Lower baud rates are worse than higher baud rates.  The Netduino 3 WiFi never completes the test due to lost data, which may be a separate issue.

 

I've attached a file with the application and example data.

 

Is there something I am doing wrong?

 

I've also posted the same issue on the .NetMF codeplex site with no resolution.

https://netmf.codepl...m/workitem/2508

 

Thanks!



#2 Chris Walker

Chris Walker

    Secret Labs Staff

  • Moderators
  • 7767 posts
  • LocationNew York, NY

Posted 22 July 2015 - 02:59 AM   Best Answer

Hi Hawkez,

Some diagnostic steps to simplify this a bit and figure out what's going on:

1. First of all, use hardware flow control. COM2 on pins D2/D3 (with RTS/CTS on D7/D8) will make sure that data doesn't overflow on the serial lines (i.e. buffer corruption).
2. Before creating an app which reads SD card data for loopback testing, try sending the numbers 0-9 repeatedly (or bytes 0-255). Then add in the SD card. If/when the test breaks, we have a path for resolution.
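
As a rough illustration of the second step, the following sketch generates a repeating 0-255 byte pattern and locates the first difference between what was sent and what came back. The class and method names are hypothetical and this is not code from the thread; it deliberately avoids generics so that it should also be usable on NETMF.

```csharp
using System;

// Illustrative helpers for a pattern-based loopback test (hypothetical names).
public static class LoopbackPattern
{
    // Fills a buffer with 0, 1, 2, ... 255, 0, 1, ... so that any dropped,
    // duplicated or altered byte shows up as a break in the sequence.
    public static byte[] Create(int length)
    {
        byte[] pattern = new byte[length];
        for (int i = 0; i < length; i++)
            pattern[i] = (byte)(i % 256);
        return pattern;
    }

    // Returns the index of the first byte that differs, or -1 if the received
    // data matches the sent data exactly (including its length).
    public static int FirstMismatch(byte[] sent, byte[] received)
    {
        int common = Math.Min(sent.Length, received.Length);
        for (int i = 0; i < common; i++)
        {
            if (sent[i] != received[i])
                return i;
        }
        // Same prefix: either identical, or one side is shorter than the other.
        return sent.Length == received.Length ? -1 : common;
    }
}
```

On the board, the pattern would be written to a SerialPort whose Handshake property is set to Handshake.RequestToSend (RTS/CTS) before Open(), read back, and passed to FirstMismatch. Because the pattern is predictable, the index of the first mismatch shows where the stream went wrong.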

BTW, we have done a lot of loop back tests with Netduino hardware (with flow control on) and have never seen data corruption in a production environment. If NETMF is losing/corrupting data somewhere...we want to fix that.

Welcome to the Netduino community,

Chris

#3 Hawkez

Hawkez

    New Member

  • Members
  • 7 posts

Posted 23 July 2015 - 12:06 PM

Hello Chris,

 

Thanks for the input.  I ran my test with flow control turned on and it worked perfectly. :)

 

I did not expect flow control to be required because of the structure of the test: send some data and do not send more until you get back the data sent.

 

The issue that took me down the loopback test road was that I saw corruption of data when using an XBee on the SparkFun XBee shield.  I am excited to try that again now that I know I need to turn flow control on.

 

I appreciate the help!



#4 NameOfTheDragon

NameOfTheDragon

    Advanced Member

  • Members
  • 112 posts
  • LocationCanterbury, Kent, UK

Posted 23 July 2015 - 12:32 PM

Hawkez, I replied to your issue on codeplex.

 

For what it's worth, I have had good results using COM1, with a device that doesn't support flow control, at 9600 baud. I think the trick is that you have to service the DataReceived event as quickly as possible and not introduce any unnecessary delay. The serial port sometimes raises the event for every character, so you have to service it really quickly: at 9600 baud with 8N1 framing a character takes about a millisecond (10 bits at roughly 104 microseconds per bit), so that is all the time you have before the next event could arrive.
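
To make that timing budget concrete, here is a small sketch of the arithmetic. It assumes 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits per character), which the post does not state explicitly, and the helper names are illustrative only.

```csharp
// Illustrative helper (not part of NETMF): how long one bit and one framed
// character occupy the wire at a given baud rate.
public static class SerialTiming
{
    // Microseconds taken by a single bit.
    public static double BitTimeMicroseconds(int baudRate)
    {
        return 1000000.0 / baudRate;
    }

    // Microseconds taken by one framed character; 8N1 framing is 10 bits.
    public static double CharTimeMicroseconds(int baudRate, int bitsPerFrame)
    {
        return bitsPerFrame * BitTimeMicroseconds(baudRate);
    }
}
```

With these assumptions, SerialTiming.BitTimeMicroseconds(9600) is about 104 microseconds and SerialTiming.CharTimeMicroseconds(9600, 10) is about 1042 microseconds, so a handler that fires per character has roughly one millisecond to finish.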

 

My event handler looks like this:

        void HandleDataReceivedEvent(object sender, SerialDataReceivedEventArgs serialDataReceivedEventArgs)
            {
            if (serialDataReceivedEventArgs.EventType == SerialData.Eof)
                {
                Dbg.Trace("SerialData.Eof event - ignoring", Source.SerialPort);
                return;
                }
            try
                {
                lock (receiveBuffer)
                    {
                    var bytesToRead = serialPort.BytesToRead;
                    if (bytesToRead < 1) 
                        return;
                    var bufferAvailable = bufferSize - bufferHead;
                    if (bytesToRead > bufferAvailable)
                        throw new IOException("Serial read buffer overflow. Consider increasing the buffer size");
                    var bytesReceived = serialPort.Read(receiveBuffer, bufferHead, bufferAvailable);
                    bufferHead += bytesReceived;
                    Dbg.Trace("Received " + bytesReceived + " bytes, " + bufferHead + " bytes in buffer",
                        Source.SerialPort);
                    }
                dataReceivedSignal.Set();
                }
            catch (Exception ex)
                {
                Dbg.Trace("Exception handling DataReceivedEvent: "+ex.Message, Source.SerialPort);
                }
            }

dataReceivedSignal is a ManualResetEvent; Dbg.Trace() is essentially a wrapper around Debug.Print() - and yes the code works, even with a Debug.Print in there.

 

When it comes to getting the data out of the buffer, I do it like this (this happens on a different thread, which is why I need to lock the buffer):

        /// <summary>
        ///     Receives raw bytes from the console up to the maximum specified, optionally skipping bytes from the start,
        ///     until no data is received for the specified quiet time.
        /// </summary>
        /// <param name="bytesToReceive">
        ///     The number of bytes to receive from the serial port, including any skipped
        ///     bytes. Additional data received by the serial port will be discarded.
        /// </param>
        /// <param name="timeout">
        ///     The timeout period, after which an exception is thrown if the required number of bytes
        ///     have not been received.
        /// </param>
        /// <param name="skipBytes">
        ///     The number of bytes to be dropped from the front of the received data. Optional;
        ///     default is 0.
        /// </param>
        /// <returns>
        ///     System.Byte[] containing the received data. The size of this array will be
        ///     <paramref name="bytesToReceive" /> - <paramref name="skipBytes" />.
        /// </returns>
        /// <exception cref="SerialReadTimeoutException">
        ///     Timed out while waiting for <paramref name="bytesToReceive" /> bytes.
        /// </exception>
        byte[] ReceiveRawBytes(int bytesToReceive, Timeout timeout, int skipBytes = 0)
            {
            bool receiving = true;
            while (receiving)
                {
                receiving = dataReceivedSignal.WaitOne(timeout, false);
                dataReceivedSignal.Reset();
                if (bufferHead >= bytesToReceive)
                    break; // Got enough data, break out of the receive loop.
                }
            /*
             * We have either received enough data, or timed out waiting.
             */
            if (!receiving)
                throw new SerialReadTimeoutException("Timed out while waiting for " + bytesToReceive.ToString() +
                                                     " bytes");
            var bytesToCopy = bytesToReceive - skipBytes;
            byte[] rawBytes = CopyAndClearReceiveBuffer(bytesToCopy, skipBytes);
            return rawBytes;
            }

        /// <summary>
        ///     Copies the receive buffer into a new byte array, and clears receive buffer.
        ///     This is done in a thread-safe manner.
        /// </summary>
        /// <param name="maxBytesToCopy">
        ///     The maximum number of bytes to receive. Any bytes in the receive buffer beyond the maximum
        ///     are discarded.
        /// </param>
        /// <param name="skipBytes">
        ///     Number of bytes to skip from the start of the receive buffer. Useful for ignoring some leading
        ///     character such as ACK.
        /// </param>
        /// <returns>System.Byte[] containing the contents of the receive buffer.</returns>
        byte[] CopyAndClearReceiveBuffer(int maxBytesToCopy = Int32.MaxValue, int skipBytes = 0)
            {
            lock (receiveBuffer)
                {
                var byteCount = Math.Min(maxBytesToCopy, bufferHead);
                int bytesToCopy;
                if (byteCount + skipBytes <= bufferHead)
                    {
                    // After skipping bytes, there are still enough bytes in the receive buffer to return the requested number of bytes.
                    // We'll return the requested number of bytes and discard the rest.
                    bytesToCopy = byteCount;
                    }
                else
                    {
                    // After skipping bytes, there are too few bytes in the buffer to meet the requested size.
                    // We'll return as many as possible.
                    bytesToCopy = bufferHead - skipBytes;
                    }
                byte[] copy = new byte[bytesToCopy];
                Array.Copy(receiveBuffer, skipBytes, copy, 0, bytesToCopy);
                bufferHead = 0; // Truncate the buffer
                return copy;
                }
            }


#5 Hawkez

Hawkez

    New Member

  • Members
  • 7 posts

Posted 28 July 2015 - 02:12 PM

As a follow-up, the loopback test with flow control was successful using the Netduino Plus 2.  I was able to loop ~18K of data over the serial port and repeatedly get back exactly what I sent.

 

However, when I tried the same test on the Netduino 3 WiFi, I only got back the first two chunks of data, and then it stopped receiving the data that was sent.

 

I'll investigate further and post something in the Netduino 3 WiFi forum when I am able to.  At that point I will post a link in this discussion.

 

Thanks!



#6 Chris Walker

Chris Walker

    Secret Labs Staff

  • Moderators
  • 7767 posts
  • LocationNew York, NY

Posted 28 July 2015 - 04:33 PM

Hi Hawkez,

Interesting. How simple is your loopback code? Please post a link back to this discussion when you are able to...loopback should always "just work" unless the serialport buffers are overflowing (i.e. more data sent faster than the UART can keep up).

Chris

#7 NameOfTheDragon

NameOfTheDragon

    Advanced Member

  • Members
  • 112 posts
  • LocationCanterbury, Kent, UK

Posted 28 July 2015 - 11:01 PM

For what it's worth, I saw something similar on N+2; the DataReceived events would just stop happening and no amount of incoming data (as observed on a logic analyzer) would make them start again. My application sends tiny amounts of command data and receives up to 100 bytes in response.

 

In the end I refactored my code to close the port and re-open it between each transaction, and that has fixed the problem for me - but it shouldn't be necessary to do that really and it could be a showstopper when larger amounts of data are involved. I wouldn't be surprised if there is a bug in there somewhere and it seems like the ports are not stable over long periods of time.

--Tim Long



#8 NameOfTheDragon

NameOfTheDragon

    Advanced Member

  • Members
  • 112 posts
  • LocationCanterbury, Kent, UK

Posted 29 July 2015 - 02:00 AM

I have just observed this issue again today - so it looks like closing and re-opening the port has *not* fixed this issue. I can see data arriving on my logic analyzer, but no DataReceived events are firing and my receiver times out waiting for data to arrive.

 

I open the port like this before sending a command...

        public void Open()
            {
            Dbg.Trace("Console Opening", Source.WeatherConsole);
            if (serialPort.IsOpen)
                {
                Dbg.Trace("ERROR - Open() called but the serial port was already open. Throwing.", Source.WeatherConsole);
                throw new InvalidOperationException("Serial port can only be opened once");
                }
            receiveBuffer = new byte[bufferSize];
            bufferHead = 0;
            serialPort.DataReceived += HandleDataReceivedEvent;
            serialPort.Open();
            }

And then after I receive a response (or time out) I close it again like this:

        public void Close()
            {
            if (!serialPort.IsOpen)
                {
                Dbg.Trace("WARN - Close() called but the serial port was already closed. Continuing.", Source.WeatherConsole);
                }
            serialPort.DataReceived -= HandleDataReceivedEvent;
            serialPort.Close();
            Dbg.Trace("Console Closed", Source.WeatherConsole);
            }

Since I re-subscribe to the event each time I open the port, I don't think there is any risk that I'm not correctly subscribed to the event. Nevertheless, the events are not happening.

 

--Tim



#9 Chris Walker

Chris Walker

    Secret Labs Staff

  • Moderators
  • 7767 posts
  • LocationNew York, NY

Posted 29 July 2015 - 11:46 PM

Hi Tim,

If, instead of using DataReceived, you put a blocking SerialPort.Read(...) call on another thread, do you see the same issue?

Also--are there any other events which stop firing?

I know that there used to be a bug in NETMF's SerialPort.DataReceived event where it wouldn't always fire when data was received. We might need to revisit that (once we have a simple repro to test against).

Chris

#10 NameOfTheDragon

NameOfTheDragon

    Advanced Member

  • Members
  • 112 posts
  • LocationCanterbury, Kent, UK

Posted 01 August 2015 - 10:49 PM

Chris, it isn't trivial for me to restructure my application but I think what I might do is spin up a separate repro solution with just minimal code in it.

 

What I did try yesterday was this: for each transaction of a small command and 100-byte response, I created a new serial port instance, subscribed to its DataReceived event, performed the transaction, then unsubscribed from the event, closed the port, disposed the instance and forced garbage collection. When I do that, I find that things work on every other transaction; on the alternate transactions I get no DataReceived events (even though I can see data flowing on the logic analyzer) and so the transaction times out. Why it would work every other time is a bit puzzling.

 

I'm still seeing the 1-minute-past-the-hour issue that I previously mentioned and it's not conclusive but on at least one occasion I saw my DataReceived events stop at 1 minute past the hour. I will get to the bottom of this one way or another!

 

Other events - I am subscribed to the NetworkAvailabilityChanged event, but I don't think I ever see that fire more than once at startup.

 

I use several timers to flash LEDs and I use some Thread.Sleep() calls to time how often I request data over the serial port, currently I do that about every 10 seconds.

 

I'm not running out of memory or anything, I print out the free memory at the end of every transaction and it holds steady at about 40K.

--Tim



#11 NameOfTheDragon

NameOfTheDragon

    Advanced Member

  • Members
  • 112 posts
  • LocationCanterbury, Kent, UK

Posted 02 August 2015 - 03:53 AM

I've produced a cut down version of my code that omits everything except the serial handling, and some diagnostics so I can see what is going on. The code is here: https://bitbucket.or...rial-diagnostic

 

The code basically polls a device for some data every 5 seconds, in a never ending loop. I will leave it running overnight to see if it fails at all.

 

--Tim



#12 NameOfTheDragon

NameOfTheDragon

    Advanced Member

  • Members
  • 112 posts
  • LocationCanterbury, Kent, UK

Posted 02 August 2015 - 03:05 PM

The result of running that code overnight was:

[Device      ] Serial Port Opening
[Device      ] ERROR - Open() called but the serial port was already open. Throwing.
[Main Loop   ] Exception caught in Main loop (attempting to continue):
[Main Loop   ] System.InvalidOperationException: Serial port can only be opened once
[Main Loop   ] Iteration 7880; successful=1459; failed=6421

It looks like it failed more than it worked, but actually there is a flaw in the test code in that once it fails it never closes the port and will always fail. This problem wasn't in the original code because the open/close operations were in a try/finally block. I will modify the test code and try again.
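
The try/finally arrangement mentioned above can be sketched roughly as follows. This is not the code from the repository: the delegate and class names are hypothetical, and the open, body and close steps are passed in as delegates so the shape can be shown without hardware.

```csharp
using System;

// Hypothetical parameterless callback; a custom delegate is used instead of
// System.Action so the sketch does not rely on anything missing from NETMF.
public delegate void PortAction();

public static class Transaction
{
    // Opens the port, runs the transaction body, and always closes the port
    // again, even if the body throws (for example on a read timeout).
    public static void Run(PortAction open, PortAction body, PortAction close)
    {
        open();
        try
        {
            body();
        }
        finally
        {
            // Runs on both the success and the failure path, so a failed
            // transaction cannot leave the port open for the next iteration.
            close();
        }
    }
}
```

Note that open() sits outside the try block on purpose: if opening fails, there is nothing to close. The exception from the body still propagates to the caller, which can count it as a failed transaction and go around the loop again.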

 

--Tim



#13 NameOfTheDragon

NameOfTheDragon

    Advanced Member

  • Members
  • 112 posts
  • LocationCanterbury, Kent, UK

Posted 02 August 2015 - 06:44 PM

OK, I think we have a "smoking gun". I ran my test code and it ran for 1135 transactions without any issues. There was 1 failure in that time but that's because the device failed to wake up, and this is expected behaviour. The device goes into low power mode and sometimes takes several attempts to wake it up. I send it a <LF> line feed and it may or may not respond with <LF><CR>. It usually wakes up within 3 tries, but not always. If it doesn't wake up, I count that as a failed transaction and simply go around the loop again.
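
The wake-up retry described above could be structured along these lines. This is only a sketch under assumptions: the delegate stands in for "send a line feed and wait for the reply", and all names are hypothetical rather than taken from the diagnostic app.

```csharp
using System;

// Hypothetical callback: sends one wake-up character and reports whether the
// expected reply arrived before the caller's timeout expired.
public delegate bool WakeAttempt();

public static class DeviceWakeup
{
    // Tries up to maxAttempts times. Returns the 1-based number of the
    // attempt that succeeded, or 0 if the device never answered.
    public static int TryWake(WakeAttempt attempt, int maxAttempts)
    {
        for (int i = 1; i <= maxAttempts; i++)
        {
            if (attempt())
                return i;
        }
        return 0;
    }
}
```

A caller would treat a return value of 0 as a failed transaction (for instance by throwing an IOException, as the log output suggests the diagnostic app does) and simply try again on the next pass of the main loop.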

 

On transaction 1136, the device apparently did not respond, and never responded again. However, crucially, I can see it responding on my logic analyzer, so the device IS responding but the DataReceived events have just stopped working. Here is the key part of the log output (the full file is attached):

[Device      ] Serial Port Opening
[Device      ] Wakeup - try 1
A first chance exception of type 'TA.NetMF.SerialDiagnostic.SerialReadTimeoutException' occurred in TA.NetMF.SerialDiagnostic.exe
[Device      ] Device did not respond
[Device      ] Wakeup - try 2
A first chance exception of type 'TA.NetMF.SerialDiagnostic.SerialReadTimeoutException' occurred in TA.NetMF.SerialDiagnostic.exe
[Device      ] Device did not respond
[Device      ] Wakeup - try 3
A first chance exception of type 'TA.NetMF.SerialDiagnostic.SerialReadTimeoutException' occurred in TA.NetMF.SerialDiagnostic.exe
[Device      ] Device did not respond
[Device      ] Wakeup failed after 3 attempts; throwing IOException.
A first chance exception of type 'System.IO.IOException' occurred in TA.NetMF.SerialDiagnostic.exe
[Main Loop   ] Exception caught in Main loop (attempting to continue):
[Main Loop   ] System.IO.IOException: Unable to wake up the device
[Device      ] Serial Port Closed
[Main Loop   ] Iteration 1137; successful=1135; failed=2
[Main Loop   ] First fail: 08/02/2015 16:23:12 last fail: 08/02/2015 18:01:30

May I draw your attention to the last line of that log snippet - the 'last fail' time -- one minute past the hour. This is not a coincidence as I think I have soundly demonstrated. This thing always fails between one and two minutes past the hour, yet there is no time dependent code anywhere in the diagnostic app.

 

Here's a screen shot of the logic analyzer output taken during the failure, somewhere near the end of the log, clearly showing that serial data is present on the physical pins:

Attached File: Logic-serial-2015-08-02-1918.PNG (101.49KB)

 

The exact code used to produce this result is: https://bitbucket.or...134b5b8de4aa79e

 

I do not believe that my code is causing this failure. I am solidly convinced there is an issue with the DataReceived event from the serial port, which is clearly failing to happen even though data is present on the serial pins. The significance of one-minute-past-the-hour is a curious mystery, but it must be a clue as to what is happening.

 

Best regards,

Tim Long

 

PS. One difficulty in reproducing this is that you are unlikely to have the device I have (it is a Davis Vantage Pro2 weather station that has been modified to expose a serial port). However, a second Netduino could be used fairly easily to simulate it. Unfortunately I don't have any other serial devices available, and I didn't want to over-complicate the diagnostic app by making it be both the master and slave devices.



#14 NameOfTheDragon

NameOfTheDragon

    Advanced Member

  • Members
  • 112 posts
  • LocationCanterbury, Kent, UK

Posted 02 August 2015 - 07:10 PM

One more observation: when connecting to my Netduino Plus 2 using MFDeploy, it seems to have changed its name to something unutterable:

Attached File: Netduino-unutterable-name.PNG (6.64KB)

The original (correct) name is still there in the list, but this strange new one has appeared and is the one that MFDeploy seems to prefer. It responds and behaves correctly... odd!!

--Tim



#15 NameOfTheDragon

NameOfTheDragon

    Advanced Member

  • Members
  • 112 posts
  • LocationCanterbury, Kent, UK

Posted 03 August 2015 - 02:39 AM

As requested in an earlier post, I have made a version of the diagnostic that doesn't use the DataReceived event but instead runs a thread that uses SerialPort.Read(). I have pushed it to the public repository on the no-events branch. Leaving it to run overnight to see what happens.

 

First observation: It's throwing System.Exception internally when there's no data available, about every 2 seconds.

A first chance exception of type 'System.Exception' occurred in Microsoft.SPOT.Hardware.SerialPort.dll
[DataReceived] Exception in receive thread: Exception was thrown: System.Exception

So Read() will time out after a little while with no data. I countered this by setting SerialPort.Timeout to 60 seconds. My loop executes every 5 seconds so I should never see a timeout from the serial port - and sure enough I don't, so I've pushed that change to the repo.
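
For readers who want the general shape of a thread-based receiver, here is a minimal sketch. It is not the code on the no-events branch: the BlockingRead delegate is a hypothetical stand-in for a call such as SerialPort.Read, and the class name and counters are invented for illustration. A timed-out read is assumed to surface as an exception, as described above.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a blocking read such as SerialPort.Read: returns
// the number of bytes read, and may throw when the read times out.
public delegate int BlockingRead(byte[] buffer, int offset, int count);

public class PollingReceiver
{
    private readonly BlockingRead read;
    private readonly byte[] chunk = new byte[64];
    private readonly object sync = new object();
    private volatile bool running;
    private Thread worker;
    private int totalBytes;
    private int timeouts;

    public PollingReceiver(BlockingRead read)
    {
        this.read = read;
    }

    public int TotalBytes { get { lock (sync) return totalBytes; } }
    public int Timeouts { get { lock (sync) return timeouts; } }

    public void Start()
    {
        running = true;
        worker = new Thread(ReceiveLoop);
        worker.Start();
    }

    // Asks the loop to finish and waits for it. This only returns once the
    // read in progress has completed or timed out.
    public void Stop()
    {
        running = false;
        worker.Join();
    }

    private void ReceiveLoop()
    {
        while (running)
        {
            try
            {
                int n = read(chunk, 0, chunk.Length);
                // A real receiver would append the first n bytes of chunk to
                // its own buffer here; this sketch only counts them.
                lock (sync) totalBytes += n;
            }
            catch (Exception)
            {
                // Treat a timed-out read as "no data yet" and keep polling.
                lock (sync) timeouts++;
            }
        }
    }
}
```

On hardware the delegate would simply wrap the port's Read method. Since Stop() has to wait for the current read to return, a very long read timeout also means a slow shutdown.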

 

Exceptions that derive from NetMF are singularly unhelpful, because more often than not it throws System.Exception, which is meaningless, and the message is something unhelpful like "An exception was thrown". Duh! That is compounded by the documentation being even worse, as in most cases it doesn't even bother to document exceptions. That all adds up to making it very difficult to work through this type of issue!



#16 NameOfTheDragon

NameOfTheDragon

    Advanced Member

  • Members
  • PipPipPip
  • 112 posts
  • LocationCanterbury, Kent, UK

Posted 03 August 2015 - 12:40 PM

The diagnostic version that doesn't use events has run for over 5,000 iterations and not missed a beat. So it looks like it's only the DataReceived event that is affected.

 

I think I have seen a similar problem affecting the network sockets interface, so I'm moving on to investigate that next. Can I leave the serial issue with SecretLabs for further investigation?

 

Best regards,

Tim



#17 Hawkez

Hawkez

    New Member

  • Members
  • 7 posts

Posted 03 August 2015 - 06:08 PM

I have finally been able to post the results of a simple loopback test on the Netduino 3.

 

See the post here: http://forums.netdui...-loopback-test/

 

This test also has interesting findings for Netduino Plus 2.



#18 NameOfTheDragon

NameOfTheDragon

    Advanced Member

  • Members
  • 112 posts
  • LocationCanterbury, Kent, UK

Posted 05 August 2015 - 07:02 AM

If, instead of using DataReceived, you put a blocking SerialPort.Read(...) call on another thread, do you see the same issue?

 

I've re-architected my application to work without the DataReceived event. The verdict: it still hangs, but I think no longer because the serial port is getting stuck. The problem now moves back to the network stack and the original one-minute-past-the-hour bug that I was seeing earlier. I've posted debugging details over at http://forums.netdui...-after/?p=63774 and I will resume the discussion there, since that's where it originally started.

 

--Tim







Copyright © 2016 Wilderness Labs Inc.
This webpage is licensed under a Creative Commons Attribution-ShareAlike License.