


The Netduino forums have been replaced by new forums at community.wildernesslabs.co. This site has been preserved for archival purposes only and the ability to make new accounts or posts has been turned off.

Netduino speed vs native 48MHz Atmel code


18 replies to this topic

#1 hanzibal

hanzibal

    Advanced Member

  • Members
  • 1287 posts
  • Location: Sweden

Posted 15 June 2012 - 08:41 PM

I wonder, could anyone make a statement as to what virtual speed, measured in MHz (or kHz), a 48MHz Atmel-based Netduino running C# achieves in comparison to running the equivalent native C and/or assembler code on the same hardware? I'm fully aware that the sought-after number would be highly dependent on what sort of code we're talking about, but for now I'm just looking for a rough, general X:48 kind of ratio. What is X, anyone?

#2 Mario Vernari

Mario Vernari

    Advanced Member

  • Members
  • 1768 posts
  • Location: Venezia, Italia

Posted 16 June 2012 - 03:55 AM

Not sure I understand, but... I guess you're trying to compare oranges with knives. The C# IL code is ultimately running as CPU ops anyway, so what would the number tell you? The comparison might rather be: "what's the ratio between the development time for a certain app in C# and in C/C++?" Cheers
Biggest fault of Netduino? It runs by electricity.

#3 hanzibal

hanzibal

    Advanced Member

  • Members
  • 1287 posts
  • Location: Sweden

Posted 16 June 2012 - 11:43 AM

Hi Mario! Yes, I know that a single line of C# could result in many hundreds of instructions being executed by the CPU. I just wonder how to describe the speed of the Netduino (or maybe of .NET MF in general) to people who are familiar with MCUs but not with the concept of running the .NET MF CLR in firmware on them. When you talk about Arduino, you can say it's an MCU running at 12MHz (or whatever clock speed it has). But in the same way, it would be misleading to say that the Netduino is an MCU running at 48MHz. In this context, should I say that the Netduino is an MCU running at 200kHz, or should I say 2MHz? This is what I'm talking about.

#4 CW2

CW2

    Advanced Member

  • Members
  • 1592 posts
  • Location: Czech Republic

Posted 16 June 2012 - 07:38 PM

When you talk about Arduino, you can say it's an MCU running at 12MHz (or whatever clock speed it has). But in the same way, it would be misleading to say that the Netduino is an MCU running at 48MHz.

IMHO, in order not to be misleading you'd need to say something like: "Arduino has an Atmel AVR microcontroller running at 8 or 16 MHz; Netduino has an Atmel ARM7TDMI microcontroller running at 48 MHz. Despite their completely different architectures (cores, instruction sets, pipelines, etc.), they have similar per-clock performance: according to the respective datasheets, the AVR achieves throughput approaching 1 MIPS per MHz, the ARM7TDMI slightly lower at 0.9 MIPS/MHz. So the Netduino microcontroller has a claimed theoretical performance of 0.9 × 48 = 43.2 MIPS, which is 2.7× (~3×) that of the Arduino AVR's 1 × 16 = 16 MIPS."
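The arithmetic above is easy to retrace; here is a back-of-the-envelope sketch in Python (for illustration only, since the thread's C# runs on the device). The 0.9 and 1.0 MIPS/MHz throughput figures are the datasheet claims cited in the post, not measurements:

```python
# Back-of-the-envelope check of the quoted MIPS figures.
def theoretical_mips(mips_per_mhz: float, clock_mhz: float) -> float:
    """Peak claimed instruction throughput for a core at a given clock."""
    return mips_per_mhz * clock_mhz

netduino_mips = theoretical_mips(0.9, 48)  # ARM7TDMI @ 48 MHz
arduino_mips = theoretical_mips(1.0, 16)   # AVR @ 16 MHz

print(round(netduino_mips, 1))                 # 43.2
print(round(netduino_mips / arduino_mips, 1))  # 2.7
```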

Yes, I know that a single line of C# could result in many hundreds of instructions being executed by the CPU. I just wonder how to describe the speed of Netduino (or maybe of .NETMF in general) to people who are familiar with MCUs but not with the concept on running .NETMF clr in firmware on them.

You could compare the .NET MF runtime to a BASIC interpreter (the difference being that the CLR does not perform any source code parsing; it directly interprets the CIL bytecode from the flash, which has been deployed from a PC). The managed code execution overhead and several abstraction layers (PAL, HAL) mean that the duration of a method call such as OutputPort.Write() is about 50 µs, which translates to 20 kHz (pin toggling); the runtime is able to handle managed (!) interrupt events at about 10 kHz (the limitation being the length of the event queue and the duration of the managed event handler). Unfortunately, these are weak points of the current implementation of .NET MF, especially when compared to micros like AVRs. The execution of managed code that does not access hardware peripherals should be faster; you'd probably need to measure the code using, for example, the Stopwatch class - but keep in mind the resolution of the Netduino system timer is ~21 µs.
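The 50 µs / 20 kHz figures above are just reciprocals of each other; a quick conversion sketch (Python, purely illustrative):

```python
# Convert a managed method-call duration into the maximum rate at which
# that call can be issued: e.g. OutputPort.Write() taking ~50 us caps
# pin writes at 20,000 per second (a 20 kHz toggle rate, i.e. a ~10 kHz
# square wave, since one full period needs two writes).
def max_call_rate_hz(call_duration_us: float) -> float:
    return 1e6 / call_duration_us

print(max_call_rate_hz(50))  # 20000.0
```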

#5 hanzibal

hanzibal

    Advanced Member

  • Members
  • 1287 posts
  • Location: Sweden

Posted 17 June 2012 - 06:25 PM

Thanks for clarifying, CW2. Given the theoretical 43.2 MIPS of the MCU, can you roughly say how many MIPS the CLR on a Netduino can manage?

#6 CW2

CW2

    Advanced Member

  • Members
  • 1592 posts
  • Location: Czech Republic

Posted 17 June 2012 - 07:32 PM

Thanks for clarifying, CW2.
Given the theoretical 43.2 MIPS of the MCU, can you roughly say how many MIPS the CLR on a Netduino can manage?

According to my measurement of a trivial "i++;", it takes about 10 µs to execute its 4 CIL instructions, which translates to 400,000 per second. I am not sure whether or how I should convert this to MIPS, as the number of 'real' instructions varies significantly between different CIL instructions.

Test case:

// Requires: using System; using System.Threading;
// using Microsoft.SPOT; using Microsoft.SPOT.Hardware;
public static void Main()
{
  int i = 0;
  Debug.Print("Ticks per ms = "+ TimeSpan.TicksPerMillisecond.ToString());

  // Get the method call overhead
  var ts0 = Utility.GetMachineTime();
  var ts1 = Utility.GetMachineTime();
  var ts2 = Utility.GetMachineTime();
  var ts3 = Utility.GetMachineTime();
  Debug.Print((ts1.Ticks - ts0.Ticks).ToString());
  Debug.Print((ts2.Ticks - ts1.Ticks).ToString());
  Debug.Print((ts3.Ticks - ts2.Ticks).ToString());

  // The measurement
  ts0 = Utility.GetMachineTime();
  i++;
  ts1 = Utility.GetMachineTime();
  i++;
  ts2 = Utility.GetMachineTime();
  i++;
  ts3 = Utility.GetMachineTime();

  Debug.Print((ts1.Ticks - ts0.Ticks).ToString());
  Debug.Print((ts2.Ticks - ts1.Ticks).ToString());
  Debug.Print((ts3.Ticks - ts2.Ticks).ToString());

  ts0 = Utility.GetMachineTime();
  i++;
  ts1 = Utility.GetMachineTime();
  i++;
  ts2 = Utility.GetMachineTime();
  i++;
  ts3 = Utility.GetMachineTime();

  Debug.Print((ts1.Ticks - ts0.Ticks).ToString());
  Debug.Print((ts2.Ticks - ts1.Ticks).ToString());
  Debug.Print((ts3.Ticks - ts2.Ticks).ToString());

  Thread.Sleep(Timeout.Infinite);
}
IL disassembly of "i++;" in Release:

ldloc.0 
ldc.i4.1 
add 
stloc.0 

Results: average GetMachineTime() method call overhead = 650 ticks; average "i++;" execution time without the overhead = 102 ticks; ticks per microsecond = 10. Measured using firmware with increased timer resolution (2.667 µs).
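For readers who want to retrace the numbers, the arithmetic behind the conclusion (102 net ticks at 10 ticks/µs for the 4 CIL instructions of "i++;") can be sketched in Python (for illustration only):

```python
# Reproduce the result above: net execution time of "i++;" was 102
# ticks after subtracting the GetMachineTime() call overhead, with the
# system timer running at 10 ticks per microsecond.
TICKS_PER_US = 10

def cil_per_second(net_ticks: float, cil_instructions: int) -> float:
    duration_s = net_ticks / TICKS_PER_US / 1e6  # ticks -> seconds
    return cil_instructions / duration_s

# 4 CIL instructions in ~10.2 us is roughly 400,000 CIL instructions/s.
print(round(cil_per_second(102, 4)))  # 392157
```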

#7 Chris Walker

Chris Walker

    Secret Labs Staff

  • Moderators
  • 7767 posts
  • Location: New York, NY

Posted 18 June 2012 - 02:53 PM

Hi hanzibal, It's a good question, but a bit of a tough one to answer with a simple number. A few thoughts:

1. There are lots of things that are going on behind the scenes, like asynchronous sending/receiving of TCP packets. These happen mostly in native code, at the full speed of the microcontroller.
2. Managed code instructions are interpreted... and many of them call into libraries which execute dozens of native code instructions (those required to accomplish the task). For example, the PWM constructor is one line of code... but there are a dozen registers and such which need to be configured.

Chris

#8 hanzibal

hanzibal

    Advanced Member

  • Members
  • 1287 posts
  • Location: Sweden

Posted 19 June 2012 - 08:35 AM

CW2, thank you so much for the extensive explanation and testing; I couldn't ask for more.

1. There are lots of things that are going on behind the scenes, like asynchronous sending/receiving of TCP packets. These happen mostly in native code, at the full speed of the microcontroller.
2. Managed code instructions are interpreted...and many of them call into libraries which execute dozens of native code instructions (those required to accomplish the task). For example, the PWM constructor is one line of code...but there are a dozen registers and such which need to be configured.

Yes, I now understand completely why the speed ratio can't be expressed in a simple number.

Thanks guys!

#9 tree frog

tree frog

    Member

  • Members
  • 19 posts

Posted 07 July 2012 - 02:32 AM

Hello, pure C runs much faster; that's why the Netduino has a much faster clock speed etc. If it did not, the result with C# would be very slow compared to pure C code...

#10 skyjumper

skyjumper

    Member

  • Members
  • 15 posts

Posted 07 July 2012 - 02:38 AM

Along these lines, although maybe a bit off topic, I am wondering if the Netduino can be programmed with AVR Studio?

#11 hanzibal

hanzibal

    Advanced Member

  • Members
  • 1287 posts
  • Location: Sweden

Posted 09 July 2012 - 10:49 AM

Along these lines, although maybe a bit off topic, I am wondering if the Netduino can be programmed with AVR Studio?

The Netduino does not expose the JTAG pins of the CPU, but since you can upload new firmware via serial, it would surprise me if you couldn't do this via AVR Studio, especially since AVR Studio is an Atmel development tool.

I found this howto on programming using AVR Studio; serial programmers are mentioned, but it could be that they are merely serial interfaces to JTAG headers:
http://www.ladyada.n...rogrammers.html

If I'm not mistaken, Secret Labs used/uses AVR Studio when porting TinyCLR.

See also this thread discussing toolchains for native programming of the Netduino:
http://forums.netdui...ools-do-i-need/

#12 skyjumper

skyjumper

    Member

  • Members
  • 15 posts

Posted 09 July 2012 - 05:22 PM

The Netduino does not expose the JTAG pins of the CPU, but since you can upload new firmware via serial, it would surprise me if you couldn't do this via AVR Studio, especially since AVR Studio is an Atmel development tool.

I found this howto on programming using AVR Studio; serial programmers are mentioned, but it could be that they are merely serial interfaces to JTAG headers:
http://www.ladyada.n...rogrammers.html

If I'm not mistaken, Secret Labs used/uses AVR Studio when porting TinyCLR.

See also this thread discussing toolchains for native programming of the Netduino:
http://forums.netdui...ools-do-i-need/


Thanks very much! That thread is very interesting. I'm glad I'm not the only one who is overwhelmed by all this ARM stuff.

AVR Studio 6 seems to support Atmel's Mega, XMega and Cortex-M offerings. That last part I'm not so sure about, "Cortex-M". I think that means the SAM7X is not supported, since it's not listed. Back to Wikipedia I guess... I've got to learn more about the ARM variations.

#13 BitFlipper

BitFlipper

    Advanced Member

  • Members
  • 61 posts

Posted 17 July 2012 - 01:02 AM

In my own tests a while ago, I determined that the C# code runs roughly 100 to 1000 TIMES slower than the equivalent C++ code. The bottom line is that it is very slow, but you get an excellent development environment and a managed language that makes coding up complex tasks much easier and faster, as long as you don't need super-fast execution. Not sure whether that helps or not.

#14 skyjumper

skyjumper

    Member

  • Members
  • 15 posts

Posted 17 July 2012 - 01:44 AM

In my own tests a while ago, I determined that the C# code runs roughly 100 to 1000 TIMES slower than the equivalent C++ code. The bottom line is that it is very slow, but you get an excellent development environment and a managed language that makes coding up complex tasks much easier and faster, as long as you don't need super-fast execution.

Not sure whether that helps or not.


I guess I could always throw more hardware at it! What's nice about .NET is the threading. It would be very easy to code up my custom web server. If I wanted to do this natively, I'd probably need a threaded RTOS.

#15 hanzibal

hanzibal

    Advanced Member

  • Members
  • 1287 posts
  • Location: Sweden

Posted 18 July 2012 - 09:56 PM

In my own tests a while ago, I determined that the C# code runs roughly 100 to 1000 TIMES slower than the equivalent C++ code.

Were these tests made on equivalent hardware and, regardless of that, on which hardware(s)?

#16 BitFlipper

BitFlipper

    Advanced Member

  • Members
  • 61 posts

Posted 18 July 2012 - 10:16 PM

Were these tests made on equivalent hardware and, regardless of that, on which hardware(s)?


I don't remember the details but yes, it was the same hardware. I think it was on a Netduino. IIRC the test was to see how fast a pin could be toggled in C vs C#.

#17 hanzibal

hanzibal

    Advanced Member

  • Members
  • 1287 posts
  • Location: Sweden

Posted 18 July 2012 - 10:26 PM

Your results sound reasonable and pretty much what I expected; I just wanted to make sure we were talking about the same thing here. Btw, what toolchain etc. did you use for native C programming on the Netduino? I've been meaning to do a little native stuff for some time now but never actually got around to doing so.

#18 BitFlipper

BitFlipper

    Advanced Member

  • Members
  • 61 posts

Posted 19 July 2012 - 07:00 AM

Your results sound reasonable and pretty much what I expected; I just wanted to make sure we were talking about the same thing here.

Btw, what toolchain etc. did you use for native C programming on the Netduino? I've been meaning to do a little native stuff for some time now but never actually got around to doing so.


Well, that whole project ended up going in a completely different direction. Now remember that most of this was done as a pet project, just to see whether it could be done or not, so it might not make sense why I did it this way...

The first idea was to have a .NET decompiler that recompiles the .NET IL into C++. First it would do a static code path analysis so that it only decompiles the code actually used by your application, then create the equivalent C++ headers and source files based on that IL. It includes a garbage collector and everything else you need to make it "managed". The only thing it didn't do was reflection. This actually worked quite well and I was getting very encouraging performance results. But this was always on my desktop computer, not on a microcontroller. The idea was that once reliable C++ code could be created, a standard ARM C++ compiler could then be used to compile that code to native ARM code.

But after looking into C++ ARM compilers, I realized that was a bad idea. Most were really expensive, and the cheap/free ones were not worth the setup/maintenance headaches. I then started looking into compiling the .NET IL code into native ARM code directly. This project also progressed well, and I ended up making a .NET decompiler, an ARM compiler and an ARM emulator. So basically I could take any .NET assembly (it didn't even need to be a MF assembly; it could be a full .NET assembly, as long as it didn't use things like WCF, WPF, etc.), and this would then be decompiled to IL, re-compiled into ARM assembly and then executed by the ARM emulator. This also worked quite well, but I got stuck on a particularly difficult-to-track-down bug, and like most pet projects, before I could complete it another shiny object attracted my attention. I still plan to get back to it at some point and turn it into something really useful.

What made this complex as well is that while it could cross-compile a lot of the full .NET classes into native ARM code, as soon as any of those methods makes a native call, all bets are off. For instance, combining two strings in the .NET Framework is not done in C# code by the String class; instead it does a P/Invoke to native code. So I had to add special checks for any IL code that makes native calls, and throw an error. Since many of these classes are critical to most applications, I had a system with specially created substitution classes that do implement the full functionality in either C# or in "inline" ARM code. So if the decompiler came across a String.Concat(string, string) method, it would know to substitute it with the explicitly implemented one.
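The substitution idea above can be sketched in a few lines. This is a purely hypothetical toy (in Python, and none of these names come from the actual project): methods known to P/Invoke into native code are looked up in a table and replaced by explicitly implemented managed versions, and anything without a substitute fails loudly rather than silently emitting a native call:

```python
# Hypothetical illustration of the substitution-class lookup: map method
# signatures that would otherwise P/Invoke into native code to explicitly
# implemented managed replacements.
SUBSTITUTIONS = {
    "String.Concat(string, string)": "Substitutes.StringConcat",
}

def resolve(signature: str, makes_native_call: bool) -> str:
    """Return the method the cross-compiler should actually compile."""
    if not makes_native_call:
        return signature  # ordinary IL: compile as-is
    if signature in SUBSTITUTIONS:
        return SUBSTITUTIONS[signature]
    # No substitute available: throw an error instead of emitting the call.
    raise NotImplementedError("no substitute for " + signature)

print(resolve("String.Concat(string, string)", True))  # Substitutes.StringConcat
```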

That was probably more info than you wanted to know, so to answer your actual question... I never ended up using any ARM C++ compilers.

EDIT: I just realized you were talking about the native vs C# performance results. Well, I never did that testing myself, but someone else posted performance results of toggling a pin with C/C++ code and then with C# code. So I base the "100 to 1000 times" figure on results I have seen from other people's tests.

#19 hanzibal

hanzibal

    Advanced Member

  • Members
  • 1287 posts
  • Location: Sweden

Posted 20 July 2012 - 09:44 PM

Wow, really cool project, that IL compiler of yours. I guess the equivalent job is done by TinyCLR at runtime, or maybe rather "just in time". Compiler technology is rather fun to work with; I once created a C-like language with both a compiler and a linker for SUN SPARC, and it actually produced code that was a little bit faster than that of GCC. Since the language was so close to C syntax, I could even use the GCC preprocessor for macros and includes, which added a great deal of power.





Copyright © 2016 Wilderness Labs Inc.  |  Legal  |  CC BY-SA
This webpage is licensed under a Creative Commons Attribution-ShareAlike License.