Quick and dirty GPIO speed test - findings
#1
Posted 22 January 2011 - 11:25 PM
Now, how about baking the BitBanger driver (or a similar solution) into the official firmware? ;-)
Cheers,
-Fabien.
#2
Posted 23 January 2011 - 12:23 AM
There was a bit of discussion on this in an earlier thread:
http://forums.netdui...ch__1#entry6380
Also interesting: in the quadrocopter discussion, I had checked the timings with my Saleae Logic, and you can see what is apparently the scheduler getting involved:
http://forums.netdui...ndpost__p__7645
I did some new measurements with my Saleae so I could zoom in and provide a different perspective on what you're talking about:
Using this code (this was so I could get the timings of the on/offs as equal as possible):
while(true) { b = !b; d0.Write(b); }
I got this:
And using this code (which is faster):
while(true) { d0.Write(true); d0.Write(false); }
I got this:
There are lots of interesting things to think about here.
1. It's interesting to me how much slower the first example is.
2. We might want to keep track of these types of timings so we have a handle on whether firmware changes affect them.
3. I wish I could mail Corey my Saleae to get the timings under his Fluent Interop... which is awesome. (Or, alternatively, I wish I were brave enough to try it myself!)
http://www.youtube.com/watch?v=9EcEUbtgO2I
4. I wonder what the impact of things like GC and the scheduler is.
#2 sounds like the start of development of a test suite for the netduino, where we can independently test the netduino during firmware upgrades. I might consider taking on this project myself. (Once I learn what a test suite is)
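Editor's sketch of the timing-regression idea from point 2: after a firmware upgrade, compare a freshly measured toggle frequency against a recorded baseline and flag anything outside a tolerance band. The function name `within_baseline` and the tolerance parameter are hypothetical, and this is only one possible shape for such a check, not an existing Netduino test suite.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical regression check: returns true when the measured frequency is
// within +/- tolerance_fraction of the baseline recorded on earlier firmware.
bool within_baseline(double measured_hz, double baseline_hz, double tolerance_fraction) {
    assert(baseline_hz > 0.0 && tolerance_fraction >= 0.0);
    return std::fabs(measured_hz - baseline_hz) <= baseline_hz * tolerance_fraction;
}
```

A suite built this way would store one baseline per test case and per firmware version, and would probably need a fairly generous tolerance, since GC and the scheduler can perturb individual runs.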
#3
Posted 23 January 2011 - 12:34 AM
#4
Posted 23 January 2011 - 02:52 AM
Bill / Chris,
Don't get me wrong, I love the .Net Micro Framework. It's really nice to be able to use a good environment to write and debug code, compared to toolchains such as AVR Studio 4, AVR GCC or even the Arduino environment. I have done my fair share of embedded development in C/C++ and assembler to appreciate the value proposition here.
I just had high expectations of a system running on an ARM 7 @ 48 MHz.
The Fluent Interop project is an awesome achievement in itself and Corey can be proud of it! I think that it may be the right direction for people like us in need of bare-metal access to the platform for time-critical routines. My wish here would be to be able to program such routines with inline Thumb assembler instructions, using CPU registers (and a stack) as the gateway between C# variables and the inline native Thumb code (Corey's CodeGenerator class is a bit too arcane for my taste). I realize that there's a challenge here, but Visual Studio plug-ins can do amazing things.
The real-time issue in itself is an entirely different ball of wax though.
Cheers,
-Fabien.
PS: Bill, how do you like your Saleae analyzer so far?
#5
Posted 23 January 2011 - 04:29 AM
I wish I could mail Corey my Saleae to get the timings under his Fluent Interop
OK ok ok ok fine. Tell you what. I'll order one of these Saleae thingies if you'll promise to explain to me how to use it. Also, I'm pretty sure you still owe me a mutz!
#6
Posted 23 January 2011 - 05:52 AM
Using the test program below I have not been able to replicate your guys' results. I'm getting about 1/2 the speed you report (this is a Netduino, not a Plus, though I don't think that matters). Maybe you can try this program on your systems and tell me what I'm doing wrong.
Test 0 is a plain-old C# routine. Test 1 aims to reduce the overhead of the for loop by doing 10 times as much work per iteration (and therefore for proper comparison, the stats for that test should be adjusted by a factor of 10). Test 2 is like Test 0 but implemented in the Fluent framework.
My results are:
*** RUNNING TEST 0 ***
Total time for 1000= 00:00:00.2136960. us per iteration=213.69599999999999792. hertz=4679.54477388439590872
Total time for 10000= 00:00:02.1316906. us per iteration=213.16906000000000176. hertz=4691.11230307062396606
Total time for 100000= 00:00:21.3122346. us per iteration=213.12234599999999318. hertz=4692.14054165863944946
*** RUNNING TEST 1 ***
Total time for 1000= 00:00:01.7383467. us per iteration=1738.34670000000005528. hertz=575.25923913796941632
Total time for 10000= 00:00:17.3784960. us per iteration=1737.84959999999978208. hertz=575.42378811146841144
Total time for 100000= 00:02:53.7789654. us per iteration=1737.78965399999992768. hertz=575.44363766824449160
*** RUNNING TEST 2 ***
Total time for 1000= 00:00:00.0167466. us per iteration=16.74660000000000080. hertz=59713.61350960792333352
Total time for 10000= 00:00:00.1566507. us per iteration=15.66507000000000184. hertz=63836.29310306304978440
Total time for 100000= 00:00:01.5498667. us per iteration=15.49866700000000112. hertz=64521.67789655716478592
The source code is here:
#define HAVE_FLUENT

using System;
#if HAVE_FLUENT
using FluentInterop.CodeGeneration;
#endif
using Microsoft.SPOT;
using Microsoft.SPOT.Hardware;
using SecretLabs.NETMF.Hardware.Netduino;

namespace Tests {
  public class Program {
    private delegate void Test(OutputPort port, int numIterations);

    public static void Main() {
#if HAVE_FLUENT
      var code=CodeGenerator.Compile((g, pin, count, ង, ច, ឆ, ជ, ឋ, ឌ, ព, ផ, ត, ថ, ទ, ធ, ម, វ, firmware) => {
        g.While(count>0)
          .Do(() => {
            firmware.SetPinState(pin, 1);
            firmware.SetPinState(pin, 0);
            count.Value=count-1;
          });
      });
#endif

      // The output labels tests by array index: Test1 prints as "TEST 0",
      // Test2 as "TEST 1", and the Fluent lambda as "TEST 2".
      var tests=new Test[] {
        Test1,
        Test2,
#if HAVE_FLUENT
        (p, n) => code.Invoke((int)p.Id, n)
#endif
      };

      for(var i=0; i<tests.Length; ++i) {
        Debug.Print("*** RUNNING TEST "+i+" ***");
        for(var numIterations=1000; numIterations<1000000; numIterations*=10) {
          using(var port=new OutputPort(Pins.ONBOARD_LED, false)) {
            var startTime=Utility.GetMachineTime();
            tests[i](port, numIterations);
            var endTime=Utility.GetMachineTime();
            var elapsed=endTime-startTime;
            var secondsPerIteration=(double)elapsed.Ticks/TimeSpan.TicksPerSecond/numIterations;
            var microsecondsPerIteration=secondsPerIteration*1000000;
            var hertz=1/secondsPerIteration;
            Debug.Print("Total time for "+numIterations+"= "+elapsed+". us per iteration="+microsecondsPerIteration+
              ". hertz="+hertz);
          }
        }
      }
    }

    // One on/off pair per loop iteration.
    private static void Test1(OutputPort port, int numIterations) {
      for(var i=0; i<numIterations; ++i) {
        port.Write(true);
        port.Write(false);
      }
    }

    // Ten on/off pairs per loop iteration, to reduce the loop overhead.
    private static void Test2(OutputPort port, int numIterations) {
      for(var i=0; i<numIterations; ++i) {
        port.Write(true); port.Write(false);
        port.Write(true); port.Write(false);
        port.Write(true); port.Write(false);
        port.Write(true); port.Write(false);
        port.Write(true); port.Write(false);
        port.Write(true); port.Write(false);
        port.Write(true); port.Write(false);
        port.Write(true); port.Write(false);
        port.Write(true); port.Write(false);
        port.Write(true); port.Write(false);
      }
    }
  }
}
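Editor's note on comparing the three results: Test 1 performs ten on/offs per iteration, so its per-iteration figures have to be divided by ten before they can be set against Tests 0 and 2. The arithmetic can be sketched as follows; the `ToggleStats` struct and `normalize` helper are hypothetical names used only for illustration, and the sketch is in C++ rather than the C# of the program above.

```cpp
#include <cassert>

// Per-toggle figures derived from a per-iteration timing.
struct ToggleStats {
    double microseconds_per_on_off;
    double hertz;
};

// Hypothetical helper (not part of the test program above): divides the
// measured time per loop iteration by the number of on/off pairs performed
// in that iteration, then converts the result to a frequency.
ToggleStats normalize(double microseconds_per_iteration, int on_offs_per_iteration) {
    assert(microseconds_per_iteration > 0.0 && on_offs_per_iteration > 0);
    const double us = microseconds_per_iteration / on_offs_per_iteration;
    return ToggleStats{us, 1e6 / us};
}
```

Fed the 100000-iteration figure for Test 1, `normalize(1737.789654, 10)` gives roughly 173.8 us per on/off, or about 5.75 kHz, against about 4.69 kHz for Test 0 and about 64.5 kHz for the Fluent version.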
#7
Posted 23 January 2011 - 06:00 AM
#8
Posted 23 January 2011 - 06:28 AM
My wish here would be to be able to program such routines with inline Thumb assembler instructions, using CPU registers (and a stack) as the gateway between C# variables and the inline native Thumb code
Deep in the heart of my system is a class called Emitter. It supports almost all the Thumb opcodes (except the ones I didn't care about). If you want a quick-and-dirty start to emitting opcodes, you could rip it out and use just that part of it.
I found this chapter of the ARM documentation invaluable.
I'm about to post version 1.4 of my framework, which has a syntax that's slightly (slightly!) friendlier.
By the way, I completely realize that the fluent stuff is pretty foreign-looking, and the utter lack of documentation makes it even more alienating. But if anyone out there is interested in diving into it, I would certainly be happy to guide you.
#9
Posted 23 January 2011 - 06:58 AM
...and certainly nowhere near 10s of MHz, but then again, I'm doing a function call to a firmware routine; no doubt I could go a lot faster if I directly read and wrote whatever chip register I needed to.
I just ran some native code as a test, called from a .NET MF function. Here's the speed as shown on the Saleae Logic analyzer.
// native code follows (C++, not C#)
// code is a derivative of .NET MF firmware, licensed under the Apache 2.0 license
UINT32 port = 61 / 32; // pin 61 = Analog 2
UINT32 bitmask = 1 << (61 % 32); // pin 61 = Analog 2
AT91_PIO &pioX = AT91::PIO (port);
pioX.PIO_PER = bitmask; // Enable PIO function
pioX.PIO_SODR = bitmask; // Output should start HIGH
pioX.PIO_OER = bitmask; // Enable Output
pioX.PIO_CODR = bitmask; // turn off GPIO
pioX.PIO_SODR = bitmask; // turn on GPIO
pioX.PIO_CODR = bitmask; // turn off GPIO
pioX.PIO_SODR = bitmask; // turn on GPIO
pioX.PIO_CODR = bitmask; // turn off GPIO
pioX.PIO_SODR = bitmask; // turn on GPIO
Speed: >6MHz (cycle @ <= 125ns)
A bitbang method accessing an in-memory array is going to be slower due to the pin value lookups, loops, etc., but we should be able to get into the MHz range...
Chris
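Editor's sketch of the array-driven bitbang idea Chris describes: split a pin number into a port index and a bitmask the same way his snippet does, then walk an in-memory buffer and write either the "set" or the "clear" register for each bit. Real hardware registers are not available in a standalone example, so `MockPio` is a hypothetical stand-in that merely records the resulting line state; `locate_pin` and `bitbang` are likewise illustrative names, and the MSB-first bit order is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a PIO controller with separate set/clear
// registers (like PIO_SODR / PIO_CODR). Writes update `level` and log it.
struct MockPio {
    std::uint32_t level = 0;             // current state of all 32 lines
    std::vector<std::uint32_t> history;  // line state after every write
    void set_output(std::uint32_t mask)   { level |= mask;  history.push_back(level); }
    void clear_output(std::uint32_t mask) { level &= ~mask; history.push_back(level); }
};

struct PinLocation {
    std::uint32_t port;     // which 32-line controller the pin lives on
    std::uint32_t bitmask;  // single-bit mask for the pin within that port
};

// Same arithmetic as the native snippet: port = pin / 32, bit = pin % 32.
PinLocation locate_pin(std::uint32_t pin) {
    return PinLocation{pin / 32u, 1u << (pin % 32u)};
}

// Shifts every byte out MSB first (an assumption), one register write per bit.
void bitbang(MockPio& pio, std::uint32_t bitmask,
             const std::uint8_t* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    for (std::size_t i = 0; i < length; ++i) {
        for (int bit = 7; bit >= 0; --bit) {
            if ((data[i] >> bit) & 1u) {
                pio.set_output(bitmask);
            } else {
                pio.clear_output(bitmask);
            }
        }
    }
}
```

The per-bit test and the two loops are exactly the overhead Chris mentions: each output bit now costs a shift, a branch and a loop step in addition to the register write itself.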
#10
Posted 23 January 2011 - 07:09 AM
I just ran some native code as a test,
Oh, that is so awesome. (Hmmm, I now have a hankering to jam support for that into my framework)
#11
Posted 23 January 2011 - 03:37 PM
Don't get me wrong, I love the .Net Micro Framework
...but not the Netduino, Secret Labs, or Chris Walker???
Your posts ruffle my feathers! I'm still reeling from the basic comment on loading assemblies.
But, you're obviously brilliant and successful, so, I am still a Fabien Fan. I wish I was at least better looking, but we kind of look like we could be brothers.
PS: Bill, how do you like your Saleae analyzer so far?
I like it, it's very "sexy" in a mac sort of way, and works in 7 and OSX.
...but it feels very basic, feature-wise. I've never used a logic analyzer before, so I'm not sure what I'm expecting. I was trying to "reverse engineer" an IR signal with it, and while it was useful, there was no way to compare two "bit trains" side-by-side from different samples, so I was cutting and pasting screenshots into paint.net. I have no idea if any logic analyzer has that sort of functionality. The software essentially has no menus, so I found myself hungry to do simple things like split the screen in two vertically.
#12
Posted 23 January 2011 - 08:00 PM
Thanks Corey! I'm definitely going to investigate this class more closely
Have you considered writing an in-depth blog post on your Fluent Interop implementation?
Cheers,
-Fabien.
#13
Posted 23 January 2011 - 08:04 PM
Thanks for running this test Chris!
It's good to see that there's a lot of room for performance improvements here
Cheers,
-Fabien.
#14
Posted 23 January 2011 - 08:20 PM
...but not the Netduino, Secret Labs, or Chris Walker???
I certainly do! I would not be hanging around here if I didn't ;-)
Your posts ruffle my feathers!
Really? Why?
I'm still reeling from the basic comment on loading assemblies.
That comment was not meant to offend anyone.
we kind of look like we could be brothers
We kind of do
I like it, it's very "sexy" in a mac sort of way, and works in 7 and OSX... but it feels very basic, feature-wise
Thanks for the review. I have been considering getting one lately as my single channel o-scope feels very limited when it comes to diagnosing anything semi-complex. The cool thing about the Saleae is that it has an SDK which lets you access the raw data. It sounds like some more advanced data export/analysis features could be implemented that way, even outside of the Saleae GUI.
Cheers,
-Fabien.
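Editor's sketch of the "compare two bit trains" analysis discussed above, assuming the samples have already been exported from the analyzer into plain arrays of 0/1 values. No Saleae SDK calls are used or implied; `Pulse`, `run_lengths` and `first_mismatch` are hypothetical names. Each capture is reduced to run lengths, and the two pulse lists are then compared with a tolerance on the pulse width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pulse {
    std::uint8_t level;   // 0 or 1
    std::size_t samples;  // consecutive samples spent at that level
};

// Collapses a sampled trace into (level, duration) pulses.
std::vector<Pulse> run_lengths(const std::vector<std::uint8_t>& samples) {
    std::vector<Pulse> pulses;
    for (std::uint8_t s : samples) {
        const std::uint8_t level = s ? 1 : 0;
        if (!pulses.empty() && pulses.back().level == level) {
            ++pulses.back().samples;
        } else {
            pulses.push_back(Pulse{level, 1});
        }
    }
    return pulses;
}

// Returns the index of the first pulse whose level differs or whose width
// differs by more than `tolerance` samples; returns -1 if the trains match.
long first_mismatch(const std::vector<Pulse>& a, const std::vector<Pulse>& b,
                    std::size_t tolerance) {
    const std::size_t common = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < common; ++i) {
        const std::size_t longer = a[i].samples > b[i].samples ? a[i].samples : b[i].samples;
        const std::size_t shorter = a[i].samples + b[i].samples - longer;
        if (a[i].level != b[i].level || longer - shorter > tolerance) {
            return static_cast<long>(i);
        }
    }
    // One train having extra pulses also counts as a mismatch.
    return a.size() == b.size() ? -1 : static_cast<long>(common);
}
```

Comparing pulse widths rather than raw samples means the two captures do not have to start at exactly the same sample offset within a pulse, which matters for something like an IR remote where each capture is triggered by hand.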
#15
Posted 23 January 2011 - 09:07 PM
Really? Why?
I imagine in real life you have a very strong personality. It comes across in your posts. Instead of hopes and wishes, you have expectations and demands. This has served you very well in life! I'd like to know and understand where in your life this comes from, as I try and find my own path in the world, as I'm sure besides helping you advance through life, it's even helped others around you achieve, if not in admiration of you, then out of fear of disappointing you. I bet you are a good leader. Some people might "get" you, and others might think you come across as a bit of a jerk sometimes, but both will work hard for you.
So, I get ruffled when I think you might be acting like a jerk to Chris and the state of the union here. But this is just me being oversensitive and trying to learn about life and technology. And Chris is his own successful person, with an awesome product, quoted in magazines, etc. and can fend for himself!
I'd like to check out this: http://www.sump.org/...nalyzer/client/
or this: http://www.lxtreme.nl/ols/
using this: http://dangerousprot...-logic-sniffer/ -- which I just noticed is doing exactly what I was doing with decoding IR in this pic:
#16
Posted 23 January 2011 - 11:34 PM
...but we should be able to get into the MHz range...
This is the Best. Thing. Ever.
I've taken Chris' approach directly and put it into "version 1.5" of my framework. Here is the output of my revised timing program:
*** RUNNING TEST C# v1 (1 on/offs per loop iteration)
Total time for 65536 on/offs is 00:00:13.9269120. usec per on/off=212.50781250000000000, which is 4705.70934892099558056 Hertz
*** RUNNING TEST C# v2 (10 on/offs per loop iteration)
Total time for 81920 on/offs is 00:00:14.1916373. usec per on/off=173.23776000976562272, which is 5772.41358895213579672 Hertz
*** RUNNING TEST Fluent with firmware call (1 on/offs per loop iteration)
Total time for 1048576 on/offs is 00:00:16.2333440. usec per on/off=15.48132324218749824, which is 64593.96166310527769376 Hertz
*** RUNNING TEST Fluent with Chris Walker-style direct access (1 on/offs per loop iteration)
Total time for 67108864 on/offs is 00:00:15.3813546. usec per on/off=2.29200044274330140e-1, which is 4363000.90240426547825344 Hertz
*** RUNNING TEST Chris Walker v2 (10 on/offs per loop iteration)
Total time for 83886080 on/offs is 00:00:11.3617280. usec per on/off=1.35442352294921876e-1, which is 7383214.94758543744683264 Hertz
And here is the code:
#define HAVE_FLUENT

using System;
#if HAVE_FLUENT
using FluentInterop.CodeGeneration;
#endif
using FluentInterop.Fluent;
using Kosak.SimpleInterop;
using Microsoft.SPOT;
using Microsoft.SPOT.Hardware;
using SecretLabs.NETMF.Hardware.Netduino;

namespace BitBangingTimings {
  public class Program {
    public static void Main() {
      var tests=new Test[] {
        new Test1(),
        new Test2(),
#if HAVE_FLUENT
        new Test3(),
        new Test4(),
        new Test5(),
#endif
      };
      foreach(var test in tests) {
        test.Init();
      }
      foreach(var test in tests) {
        test.Run();
      }
    }

    private abstract class Test {
      private const int maxSeconds=10;
      private readonly string name;
      private readonly int onOffsPerLoopIteration;

      protected Test(string name, int onOffsPerLoopIteration) {
        this.name=name;
        this.onOffsPerLoopIteration=onOffsPerLoopIteration;
        Debug.Print("*** I will do repeatedly larger runs until a given test takes "+maxSeconds+" seconds or more");
      }

      public virtual void Init() {}

      // Doubles the iteration count until a single run lasts at least
      // maxSeconds, then reports the timing of that final run.
      public void Run() {
        Debug.Print("*** RUNNING TEST "+name+" ("+onOffsPerLoopIteration+" on/offs per loop iteration)");
        var numIterations=1024;
        while(true) {
          using(var port=new OutputPort(Pins.ONBOARD_LED, false)) {
            var startTime=Utility.GetMachineTime();
            DoTheTest(port, numIterations);
            var endTime=Utility.GetMachineTime();
            var elapsed=endTime-startTime;
            var totalSeconds=(double)elapsed.Ticks/TimeSpan.TicksPerSecond;
            var totalOnOffs=numIterations*onOffsPerLoopIteration;
            var secondsPerOnOff=totalSeconds/totalOnOffs;
            var microsecondsPerOnOff=secondsPerOnOff*1000000;
            var hertz=1/secondsPerOnOff;
            if(totalSeconds>=maxSeconds) {
              Debug.Print("Total time for "+totalOnOffs+" on/offs is "+elapsed+". usec per on/off="+microsecondsPerOnOff+
                ", which is "+hertz+" Hertz");
              break;
            }
            numIterations*=2;
          }
        }
      }

      protected abstract void DoTheTest(OutputPort port, int numIterations);
    }

    private class Test1 : Test {
      public Test1() : base("C# v1", 1) { }

      protected override void DoTheTest(OutputPort port, int numIterations) {
        for(var i=0; i<numIterations; ++i) {
          port.Write(true);
          port.Write(false);
        }
      }
    }

    private class Test2 : Test {
      public Test2() : base("C# v2", 10) {}

      protected override void DoTheTest(OutputPort port, int numIterations) {
        for(var i=0; i<numIterations; ++i) {
          port.Write(true); port.Write(false);
          port.Write(true); port.Write(false);
          port.Write(true); port.Write(false);
          port.Write(true); port.Write(false);
          port.Write(true); port.Write(false);
          port.Write(true); port.Write(false);
          port.Write(true); port.Write(false);
          port.Write(true); port.Write(false);
          port.Write(true); port.Write(false);
          port.Write(true); port.Write(false);
        }
      }
    }

#if HAVE_FLUENT
    private class Test3 : Test {
      private CompiledCode code;

      public Test3() : base("Fluent with firmware call", 1) {}

      public override void Init() {
        code=CodeGenerator.Compile((g, pin, count, ង, ច, ឆ, ជ, ឋ, ឌ, ព, ផ, ត, ថ, ទ, ធ, ម, វ, firmware) => {
          g.While(count>0)
            .Do(() => {
              firmware.SetPinState(pin, 1);
              firmware.SetPinState(pin, 0);
              count.Value=count-1;
            });
        });
      }

      protected override void DoTheTest(OutputPort port, int numIterations) {
        code.Invoke((int)port.Id, numIterations);
      }
    }

    private class Test4 : Test {
      private CompiledCode code;

      public Test4() : base("Fluent with Chris Walker-style direct access", 1) {}

      public override void Init() {
        code=CodeGenerator.Compile((g, pin, count, ង, ច, ឆ, ជ, ឋ, ឌ, ព, ផ, ត, ថ, ទ, ធ, ម, វ, firmware) => {
          var pio=g.Declare.PIOReference("pio");
          var bitmask=g.Declare.Int("bitmask");
          firmware.GetPIOAndBitmask(pin, ref pio, ref bitmask);
          pio.PER=bitmask; //enable PIO function
          pio.SODR=bitmask; //output should start high
          pio.OER=bitmask; //enable output
          g.While(count>0)
            .Do(() => {
              pio.SODR=bitmask;
              pio.CODR=bitmask;
              count.Value=count-1;
            });
        });
      }

      protected override void DoTheTest(OutputPort port, int numIterations) {
        code.Invoke((int)port.Id, numIterations);
      }
    }

    private class Test5 : Test {
      private CompiledCode code;

      public Test5() : base("Chris Walker v2", 10) { }

      public override void Init() {
        code=CodeGenerator.Compile((g, pin, count, ង, ច, ឆ, ជ, ឋ, ឌ, ព, ផ, ត, ថ, ទ, ធ, ម, វ, firmware) => {
          var pio=g.Declare.PIOReference("pio");
          var bitmask=g.Declare.Int("bitmask");
          firmware.GetPIOAndBitmask(pin, ref pio, ref bitmask);
          pio.PER=bitmask; //enable PIO function
          pio.SODR=bitmask; //output should start high
          pio.OER=bitmask; //enable output
          g.While(count>0)
            .Do(() => {
              pio.SODR=bitmask; pio.CODR=bitmask;
              pio.SODR=bitmask; pio.CODR=bitmask;
              pio.SODR=bitmask; pio.CODR=bitmask;
              pio.SODR=bitmask; pio.CODR=bitmask;
              pio.SODR=bitmask; pio.CODR=bitmask;
              pio.SODR=bitmask; pio.CODR=bitmask;
              pio.SODR=bitmask; pio.CODR=bitmask;
              pio.SODR=bitmask; pio.CODR=bitmask;
              pio.SODR=bitmask; pio.CODR=bitmask;
              pio.SODR=bitmask; pio.CODR=bitmask;
              count.Value=count-1;
            });
        });
      }

      protected override void DoTheTest(OutputPort port, int numIterations) {
        code.Invoke((int)port.Id, numIterations);
      }
    }
#endif
  }
}
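Editor's note on reading the 1x versus 10x results: if one assumes a simple linear model in which the time per on/off is `work + overhead / k` for `k` on/offs per loop iteration, the two measurements of each pair are enough to separate the cost of the on/off itself from the cost of the loop. That model is an assumption made for illustration, not something the thread establishes, and `LoopModel` and `fit_loop_model` are hypothetical names.

```cpp
#include <cassert>

struct LoopModel {
    double work_us;      // estimated time for one on/off pair
    double overhead_us;  // estimated loop overhead per iteration
};

// Assumed model: t(k) = work + overhead / k, with k on/offs per iteration.
// From t(1) and t(k): overhead = (t(1) - t(k)) * k / (k - 1), and
// work = t(1) - overhead.
LoopModel fit_loop_model(double t_single_us, double t_unrolled_us, int unroll) {
    assert(unroll > 1);
    const double overhead = (t_single_us - t_unrolled_us) * unroll / (unroll - 1);
    return LoopModel{t_single_us - overhead, overhead};
}
```

With the direct-access figures above (about 0.2292 us and 0.1354 us per on/off), this model attributes roughly 0.125 us to each on/off pair and roughly 0.104 us to the loop, which is in the same ballpark as the 125 ns cycle Chris measured on the logic analyzer. For the plain C# pair (about 212.5 us and 173.2 us) it attributes roughly 169 us to the two `Write` calls and roughly 44 us to the loop.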
#17
Posted 23 January 2011 - 11:49 PM
Total time for 67108864 on/offs is 00:00:15.3813546. usec per on/off=2.29200044274330140e-1, which is 4363000.90240426547825344 Hertz
*** RUNNING TEST Chris Walker v2 (10 on/offs per loop iteration)
Total time for 83886080 on/offs is 00:00:11.3617280. usec per on/off=1.35442352294921876e-1, which is 7383214.94758543744683264 Hertz
Now that's what I call speed. Great job Corey!
I bet you could simplify the language more too. Nice.
Chris
#18
Posted 24 January 2011 - 12:58 AM
Bill,
I speak my mind with the intent of providing honest and constructive feedback about the platform. I'm passionate about it and I want it to succeed.
I may not always express myself in a politically correct style, but you'll always know what I really think. Believe me when I tell you that this has not always served me well.
As far as having expectations: I know what the hardware powering the netduino is capable of doing and I want to be able to tap into that power: I believe that there's an opportunity here, both for the community and Secret Labs, to turn something great into something fantastic and highly competitive with other platforms.
Cheers,
-Fabien.
#19
Posted 24 January 2011 - 01:07 AM
#20
Posted 24 January 2011 - 02:52 AM
Have you considered writing an in-depth blog post on your Fluent Interop implementation?
I've thought about it but can't seem to find the 'blog post' button on this forum software. But seriously, I think I will do that eventually. One thing I wanted to do first is get some experience with a real person who has tried to use it. I've been courting bill.french for this very thing.