Netduino speed vs native 48Mhz Atmel code
#1
Posted 15 June 2012 - 08:41 PM
#2
Posted 16 June 2012 - 03:55 AM
#3
Posted 16 June 2012 - 11:43 AM
#4
Posted 16 June 2012 - 07:38 PM
When you talk about Arduino, you can say it's an MCU running at 12 MHz (or whatever clock speed it has). But in the same way, it would be misleading to say that the Netduino is an MCU running at 48 MHz.
IMHO in order not to be misleading you'd need to say something like "Arduino has an Atmel AVR microcontroller running at 8 or 16 MHz, Netduino has an Atmel ARM7TDMI microcontroller running at 48 MHz. Despite their completely different architectures (cores, instruction sets, pipelines etc.), they have similar performance per MHz: according to the respective datasheets the AVR achieves throughput approaching 1 MIPS per MHz, the ARM7TDMI a little lower at 0.9 MIPS/MHz. So the Netduino microcontroller has a claimed theoretical performance of 0.9*48 = 43.2 MIPS, which is 2.7× (~3×) more than the Arduino AVR at 1*16 = 16 MIPS."
Yes, I know that a single line of C# could result in many hundreds of instructions being executed by the CPU. I just wonder how to describe the speed of Netduino (or maybe of .NET MF in general) to people who are familiar with MCUs but not with the concept of running the .NET MF CLR in firmware on them.
You could compare the .NET MF runtime to a BASIC interpreter (the difference is that the CLR does not perform any source code parsing; it directly interprets the CIL bytecode from the flash, which has been deployed from a PC). Because of the managed code execution overhead and several abstraction layers (PAL, HAL), a method call such as OutputPort.Write() takes about 50 µs, which translates to 20 kHz (pin toggling); the runtime is able to handle managed (!) interrupt events at about 10 kHz (the limitation is the length of the event queue and the duration of the managed event handler). Unfortunately, these are weak points of the current implementation of .NET MF, especially when compared to micros like AVRs. The execution of managed code that does not access hardware peripherals should be faster; you'd probably need to measure the code using, for example, the Stopwatch class, but keep in mind the resolution of the Netduino system timer is ~21 µs.
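To put the 50 µs figure in perspective, it can help to convert it into a call rate and into CPU clock cycles. The sketch below is plain desktop C#, not .NET MF code; the class and method names are made up for illustration, and the only inputs are the 50 µs and 48 MHz figures quoted in this thread.

```csharp
using System;

public static class CallCost
{
    // Calls per second for an operation that takes the given number of microseconds.
    public static double RateHz(double durationMicroseconds)
    {
        return 1000000.0 / durationMicroseconds;
    }

    // CPU clock cycles that elapse during the given duration at the given clock frequency.
    public static double ClockCycles(double durationMicroseconds, double clockMHz)
    {
        return durationMicroseconds * clockMHz;
    }

    public static void Main()
    {
        const double writeCallMicroseconds = 50.0; // OutputPort.Write() duration quoted above
        const double clockMHz = 48.0;              // Netduino MCU clock

        Console.WriteLine("Calls per second: " + RateHz(writeCallMicroseconds));
        Console.WriteLine("Clock cycles per call: " + ClockCycles(writeCallMicroseconds, clockMHz));
    }
}
```

With those inputs, one managed OutputPort.Write() call spans roughly 2,400 clock cycles of the 48 MHz core, which is probably an easier number to explain to someone used to native MCU code than a MIPS rating.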
#5
Posted 17 June 2012 - 06:25 PM
#6
Posted 17 June 2012 - 07:32 PM
Thanks for clarifying CW2.
Given the theoretical 43.2 MIPS of the MCU, can you roughly say how many MIPS the CLR on a Netduino can manage?
According to my measurement of a trivial "i++;", it takes about 10 µs to execute 4 CIL instructions, which translates to about 400,000 CIL instructions per second. I am not sure whether or how I should convert this to MIPS, as the number of 'real' instructions varies significantly for different CIL instructions.
Test case:
public static void Main()
{
    int i = 0;
    Debug.Print("Ticks per ms = " + TimeSpan.TicksPerMillisecond.ToString());

    // Get the method call overhead
    var ts0 = Utility.GetMachineTime();
    var ts1 = Utility.GetMachineTime();
    var ts2 = Utility.GetMachineTime();
    var ts3 = Utility.GetMachineTime();
    Debug.Print((ts1.Ticks - ts0.Ticks).ToString());
    Debug.Print((ts2.Ticks - ts1.Ticks).ToString());
    Debug.Print((ts3.Ticks - ts2.Ticks).ToString());

    // The measurement
    ts0 = Utility.GetMachineTime();
    i++;
    ts1 = Utility.GetMachineTime();
    i++;
    ts2 = Utility.GetMachineTime();
    i++;
    ts3 = Utility.GetMachineTime();
    Debug.Print((ts1.Ticks - ts0.Ticks).ToString());
    Debug.Print((ts2.Ticks - ts1.Ticks).ToString());
    Debug.Print((ts3.Ticks - ts2.Ticks).ToString());

    ts0 = Utility.GetMachineTime();
    i++;
    ts1 = Utility.GetMachineTime();
    i++;
    ts2 = Utility.GetMachineTime();
    i++;
    ts3 = Utility.GetMachineTime();
    Debug.Print((ts1.Ticks - ts0.Ticks).ToString());
    Debug.Print((ts2.Ticks - ts1.Ticks).ToString());
    Debug.Print((ts3.Ticks - ts2.Ticks).ToString());

    Thread.Sleep(Timeout.Infinite);
}
IL disassembly of "i++;" in Release:
    ldloc.0
    ldc.i4.1
    add
    stloc.0
Results: average GetMachineTime() method call overhead = 650 ticks, average "i++;" execution time without the overhead = 102 ticks. Ticks per microsecond = 10. Used firmware with increased timer resolution (2.667 µs).
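The conversion from these tick counts to a rough "CIL instructions per second" figure can be sketched as follows. This is plain desktop C# with made-up helper names. It uses only the numbers reported above (102 ticks per "i++;", 10 ticks per microsecond, 4 CIL instructions) plus the theoretical 43.2 MIPS mentioned earlier in the thread. The last step assumes the core actually sustains its datasheet peak, so it is an upper-bound estimate rather than a measurement.

```csharp
using System;

public static class CilThroughput
{
    // Microseconds spent on one CIL instruction, given the measured ticks for a statement.
    public static double MicrosecondsPerCil(double ticks, double ticksPerMicrosecond, int cilInstructions)
    {
        return ticks / ticksPerMicrosecond / cilInstructions;
    }

    // CIL instructions interpreted per second at that per-instruction cost.
    public static double CilPerSecond(double ticks, double ticksPerMicrosecond, int cilInstructions)
    {
        return 1000000.0 / MicrosecondsPerCil(ticks, ticksPerMicrosecond, cilInstructions);
    }

    // Native instructions available per CIL instruction if the core ran at its peak MIPS
    // (an assumption; real throughput is lower, so this is an upper bound).
    public static double NativePerCil(double peakNativeMips, double cilPerSecond)
    {
        return peakNativeMips * 1000000.0 / cilPerSecond;
    }

    public static void Main()
    {
        double cilPerSecond = CilPerSecond(102.0, 10.0, 4);
        Console.WriteLine("CIL instructions per second: " + cilPerSecond.ToString("F0"));
        Console.WriteLine("Native instructions per CIL instruction (upper bound): "
            + NativePerCil(43.2, cilPerSecond).ToString("F0"));
    }
}
```

For this one trivial statement that works out to a little under 400,000 CIL instructions per second, or on the order of 110 native instructions' worth of peak throughput per CIL instruction. Other CIL instructions (calls, allocations, field access) will cost very different amounts, so this is one data point, not a general MIPS rating.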
#7
Posted 18 June 2012 - 02:53 PM
#8
Posted 19 June 2012 - 08:35 AM
Yes, I now understand completely why the speed ratio can't be expressed in a simple number.
1. There are lots of things that are going on behind the scenes, like asynchronous sending/receiving of TCP packets. These happen mostly in native code, at the full speed of the microcontroller.
2. Managed code instructions are interpreted...and many of them call into libraries which execute dozens of native code instructions (those required to accomplish the task). For example, the PWM constructor is one line of code...but there are a dozen registers and such which need to be configured.
Thanks guys!
#9
Posted 07 July 2012 - 02:32 AM
#10
Posted 07 July 2012 - 02:38 AM
#11
Posted 09 July 2012 - 10:49 AM
Along these lines, although maybe a bit off topic, I am wondering if the NetDuino can be programmed with AVR Studio?
Netduino does not expose the JTAG pins of the CPU but as you can upload new firmware via serial, it would surprise me if you cannot do this via AVR, especially since AVR is an Atmel development tool.
Found this howto on programming using AVR and serial programmers are mentioned but could be they are merely serial interfaces for JTAG headers:
http://www.ladyada.n...rogrammers.html
If I'm not mistaken Secret Labs used/use AVR when porting tinyCLR.
See also this thread discussing toolchains for native programming of the Netduino:
http://forums.netdui...ools-do-i-need/
#12
Posted 09 July 2012 - 05:22 PM
Netduino does not expose the JTAG pins of the CPU but as you can upload new firmware via serial, it would surprise me if you cannot do this via AVR, especially since AVR is an Atmel development tool.
Found this howto on programming using AVR and serial programmers are mentioned but could be they are merely serial interfaces for JTAG headers:
http://www.ladyada.n...rogrammers.html
If I'm not mistaken Secret Labs used/use AVR when porting tinyCLR.
See also this thread discussing toolchains for native programming of the Netduino:
http://forums.netdui...ools-do-i-need/
Thanks very much! That thread is very interesting. I'm glad I'm not the only one who is overwhelmed by all this ARM stuff.
AVR Studio 6 seems to support Atmel's Mega, XMega and Cortex M offerings. That last part I'm not so sure about, "Cortex M." I think that means the SAM7x is not supported since it's not listed. Back to Wikipedia I guess... I gotta learn more about the ARM variations.
#13
Posted 17 July 2012 - 01:02 AM
#14
Posted 17 July 2012 - 01:44 AM
In my own tests a while ago, I determined that the C# code runs roughly 100 to 1000 TIMES slower than the equivalent C++ code. The bottom line is that it is very slow but you get an excellent development environment and a managed language that makes coding up complex tasks much easier and faster, as long as you don't need super fast execution.
Not sure whether that helps or not.
I guess I could always throw more hardware at it! What's nice about .NET is the threading. It would be very easy to code up my custom web server. If I want to do this as native I probably need a threaded RTOS.
#15
Posted 18 July 2012 - 09:56 PM
In my own tests a while ago, I determined that the C# code runs roughly 100 to 1000 TIMES slower than the equivalent C++ code.
Were these tests made on equivalent hardware and regardless of that, which hardware(s)?
#16
Posted 18 July 2012 - 10:16 PM
Were these tests made on equivalent hardware and regardless of that, which hardware(s)?
I don't remember the details but yes, it was the same hardware. I think it was on a Netduino. IIRC the test was to see how fast a pin could be toggled in C vs C#.
#17
Posted 18 July 2012 - 10:26 PM
#18
Posted 19 July 2012 - 07:00 AM
Your results sound reasonable and pretty much what I expected, just wanted to make sure we were talking about the same thing here.
Btw, what toolchain etc. did you use for native C programming on the Netduino? I've been meaning to do a little native stuff for some time now but never actually got around to doing so.
Well, that whole project ended up going into a completely different direction. Now remember that most of this was done as a pet project just to see whether it could be done or not, so it might not make sense why I did it this way...
The first idea was to have a .Net decompiler that recompiles the .Net IL into C++. First it would do a static code path analysis so it only decompiles the code that is actually used by your application, then create the equivalent C++ headers and source files based on that IL. It includes a garbage collector and everything else you need that makes it "managed". The only thing it didn't do was reflection. This actually worked quite well and I was getting very encouraging performance results. But this was always on my desktop computer, not on a microcontroller. The idea was that once reliable C++ code could be created, that a standard ARM C++ compiler could then be used to compile that code to native ARM code.
But after looking into C++ ARM compilers, I realized that was a bad idea. Most were really expensive and the cheap/free ones were not worth the setup/maintenance headaches. I then started looking into compiling the .Net IL code into native ARM code. This project also progressed well and I ended up making a .Net decompiler, ARM compiler and ARM emulator. So basically I could take any .Net assembly (it didn't even need to be an MF assembly, it could be a full .Net assembly, as long as it didn't use things like WCF, WPF, etc), and this would then be decompiled to IL, re-compiled into ARM assembly and then executed by the ARM emulator. This also worked quite well but I got stuck on a particularly difficult to track down bug, and like most pet projects, before I could complete it, another shiny object attracted my attention. I still plan to get back to it at some point and turn it into something really useful.
What made this complex as well is that while it could cross-compile a lot of the full .Net classes into native ARM code, as soon as any of those methods makes a native call, all bets are off. For instance, combining two strings in the .Net framework is not done in C# code by the String class; instead it does a PInvoke to native code. So I had to add special checks for any IL code that makes native calls, and throw an error. Since many of these classes are critical to most applications, I had a system where there would be specially created substitution classes that do implement the full functionality in either C# or in "inline" ARM code. So if the decompiler came across a String.Concat(string, string) method, it would know to substitute it for the explicitly implemented one.
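The substitution idea described above could look something like the following lookup. This is only a sketch of the general technique, not the poster's actual code: the class name, the signature strings and the replacement method name are all hypothetical, and a real cross-compiler would work on method metadata rather than strings.

```csharp
using System;
using System.Collections.Generic;

public static class NativeCallSubstitution
{
    // Hypothetical table: signature of a framework method that calls into native code
    // mapped to a hand-written replacement. The single entry is illustrative only.
    private static readonly Dictionary<string, string> Table =
        new Dictionary<string, string>
        {
            { "System.String::Concat(string,string)", "Substitutes.StringImpl::Concat(string,string)" }
        };

    // Returns the method the cross-compiler should emit a call to.
    // Methods without native calls pass through unchanged; methods with native calls
    // must have a registered substitute, otherwise compilation fails with an error.
    public static string Resolve(string signature, bool makesNativeCall)
    {
        if (!makesNativeCall)
        {
            return signature;
        }

        string replacement;
        if (Table.TryGetValue(signature, out replacement))
        {
            return replacement;
        }

        throw new NotSupportedException("No substitute for native call in " + signature);
    }

    public static void Main()
    {
        Console.WriteLine(Resolve("System.String::Concat(string,string)", true));
        Console.WriteLine(Resolve("MyApp.Program::Add(int,int)", false));
    }
}
```

Failing loudly on an unknown native call, rather than silently skipping it, matches the "throw an error" behaviour described in the post.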
That was probably more info than you wanted to know, so to answer your actual question... I never ended up using any ARM C++ compilers.
EDIT: I just realized you were talking about the native vs C# performance results. Well, I never did that testing myself, but someone else posted performance results of toggling the pin with C/C++ code and then with C# code. So I base the "100 to 1000 times" values on results I have seen from other people's tests.
#19
Posted 20 July 2012 - 09:44 PM