Many people want some support for native (ARM) programming of the Netduino in a user-friendly way, usually to support some time-critical requirement. I claim that the best way to do so is via a "fluent" library. As a proof of concept, I have written such a library, and tested it by using it to reimplement the popular "BitBanger".
Lengthy manifesto:
The topic of generating ARM code keeps coming up (e.g. here or here or here or here).
There have been a couple of approaches discussed, but so far they seem to have significant drawbacks. The approaches (and their drawbacks) are:
- Come up with a smorgasbord of standard routines, such as BitBanger, which are general-purpose enough to solve common problems (the drawback is you might be slowed down or stuck if you have a custom requirement unmet by these routines)
- Compile your custom code into the firmware (requires major knowledge of ARM programming, NETMF internals, and toolchain; plus significantly slows down the compile-test-fix cycle; easy to introduce subtle bugs; impairs your ability to share your project with others)
- SecretLabs makes a standard entry point into the firmware which allows storing ARM code there; the user's program compiles a C function externally and ships its bytes over (this is an improvement over the above, but still requires knowledge of C programming, toolchain, and perhaps also NETMF internals)
I claim that most people don't need to program the ARM in a 100% general-purpose way. What they need is the ability to create tight loops that act on the I/O pins with custom logic, and perhaps also to make responsive interrupt handlers. They also need a quick compile-test-fix cycle. Most Netduino users are learning as they go, and so they don't actually mind if their program crashes, but when it does crash they want to quickly fix it and try again. I think it's safe to say that any solution that involves reflashing the firmware is out.
Given all the above, I claim the best way to provide ARM functionality is via a Fluent Interface. This allows people to write their logic inside their familiar IDE environment. Although we cannot change the C# language, we can use various techniques to make the interface feel as much like regular programming as possible.
As a proof of concept, I've written such a fluent interface for the Netduino, and tested it by using it to implement part of BitBanger. Because it is a proof of concept, it does not actually execute code yet. However, it produces assembly output that is sufficiently legitimate to validate the approach.
The system assumes the existence of three parts:
- An API on the firmware side that can allocate RAM, copy opcodes into that RAM, and execute (move the instruction pointer to) that RAM. (This part does not exist yet)
- The ability to output real opcodes rather than assembly syntax (this also does not exist yet)
- The ability to translate fluent programming constructs into assembly (This is what the proof of concept does)
To give you a better feel for what I am going for and what this program does, I think it may help to talk about some examples. I will do several in order of increasing complexity:
- Calculating a+b
- Turning a pin on and off X times
- Finding a value in an array
- BitBanger
I won't talk about interrupt handling, or the continuation/coroutine issues that Chris Walker brought up here. This is just a prototype, and there is a lot more work required to turn it into a valid approach, so I would first like to find out if people like this idea. Maybe there are opportunities to improve it or collaborate.
Example 1: Calculating a+b
Here is the code on the C# side:
    using FluentInterop.CodeGeneration;
    using FluentInterop.Deployment;
    using Microsoft.SPOT;

    namespace Driver {
      public static class Example1 {
        public static void Test() {
          //This is the function we want to build:
          //
          //int add(int x, int y) {
          //  return x+y;
          //}
          //
          //In order to simplify the system, the function we build
          //always fits into a standard signature with 8 arguments.
          //So really, the function we are building is
          //
          //int add(int x, int y, ignore, ignore, ignore, ignore, ignore, ignore) {
          //  return x+y;
          //}
          //
          //Hopefully that will make sense in the below:
          //(1) we provide 8 argument names (this is used for human readability)
          //(2) we get 8 arguments (plus the code generator) in our build callback
          //(3) we call the resulting delegate with 8 arguments

          //Now, the magic happens:
          var code=InteropBuilder.Create(
            "x", "y", null, null, null, null, null, null, //your argument names
            (g, x, y, _, __, ___, ____, _____, ______) => { //the code generator, plus your arguments
              g.Return(x+y); //similar to saying "return x+y;"
            });

          //we also have to call our function with 8 arguments
          var result=code.Invoke(3, 4, 0, 0, null, null, null, null); //calculates 7
          Debug.Print("result: "+result);
          code.Dispose();
        }
      }
    }
There is clearly some boilerplate here that may be a little distracting, but the heart of the program is here:
g.Return(x+y); //similar to saying "return x+y;"
By the way, this is what is both beautiful and eerie about the fluent interface. This code does not add x and y, nor does it return. This code "captures" the concept of return x+y so that a machine language program can be generated from it and executed later. We are writing a program that builds other programs. This is the first step towards the robot apocalypse.
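To make the "capture" idea concrete, here is a minimal sketch of how operator overloading can record x+y instead of computing it. The Expr, Arg and Add types below are hypothetical stand-ins written for this illustration; they are not the actual FluentInterop classes, whose internals may well differ.

```csharp
using System;

// Hypothetical stand-in for the library's expression type (illustration only).
public abstract class Expr {
  // Overloading + means "x+y" builds a tree node rather than adding numbers.
  public static Expr operator +(Expr a, Expr b) { return new Add(a, b); }
  public abstract string Describe();
}

// A named function argument such as "x" or "y".
public sealed class Arg : Expr {
  private readonly string name;
  public Arg(string name) { this.name = name; }
  public override string Describe() { return name; }
}

// The captured addition: it remembers its operands for later code generation.
public sealed class Add : Expr {
  private readonly Expr left;
  private readonly Expr right;
  public Add(Expr left, Expr right) { this.left = left; this.right = right; }
  public override string Describe() {
    return "(" + left.Describe() + " + " + right.Describe() + ")";
  }
}

public static class CaptureDemo {
  public static string Capture() {
    Expr x = new Arg("x");
    Expr y = new Arg("y");
    Expr sum = x + y; // no arithmetic happens here; an Add node is created
    return sum.Describe();
  }
}
```

Calling CaptureDemo.Capture() returns the string "(x + y)". A real code generator would presumably walk the same kind of tree and emit load and add instructions instead of text.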
By the way, the reason there are so many arguments is because I wanted to simplify the interface to the native side. From the native side point of view, every method you are generating has exactly the same signature. That signature is:
int YourMethod(int i0, int i1, int i2, int i3, byte[] ba0, byte[] ba1, int[] wa0, int[] wa1);
The purpose of this is to keep things simple for both sides. Hopefully there are enough arguments in there for whatever people want to do. If you don't need an argument, just ignore it!
When you run this program, it produces the following output:
    MOV R0,#FUNCTION_ARGUMENT_BASE_ADDRESS //something like this
    MOV R1,#SCRATCH_MEMORY_BASE_ADDRESS //something like this
    LDR R2,[R0,0] //arg0 (x)
    LDR R3,[R0,4] //arg1 (y)
    ADD R0,R2,R3
    RETURN //I assume the ARM wants the return value in R0
    MOV R0,#0
    RETURN //I assume the ARM wants the return value in R0
Some points to observe:
- As I said above, I am producing textual assembly output only. The work of producing numerical opcodes (as well as actually copying them over to the ARM side) still remains to be done.
- Because I am new to ARM assembly, I've figured out what I could from reading ARM manuals, then I've made some assumptions. Some of what is produced is probably wrong but hopefully can be fixed. This is why the above assembly output says "something like this". I'm not sure what the convention is for passing arguments, but I'm going to pretend for now that they're all lined up starting at some base address.
- I've tried to make certain simple optimizations, but inefficient stuff does creep in (like the double-return)
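The double-return could probably be cleaned up with a small post-processing pass over the textual output. The following is only a sketch, written as ordinary desktop C# for illustration: the Peephole class and its method name are made up, and it assumes that labels sit on their own line ending in ":", that RETURN is the pseudo-instruction shown in the listing, and that code following a RETURN can only be reached through a label.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the attached solution).
public static class Peephole {
  // Drops every instruction that follows a RETURN until the next label,
  // because nothing can fall through or branch into that region.
  public static List<string> RemoveCodeAfterReturn(IEnumerable<string> lines) {
    var result = new List<string>();
    var unreachable = false;
    foreach (var raw in lines) {
      var line = raw.Trim();
      // A label may be a branch target, so code after it is reachable again.
      if (line.EndsWith(":", StringComparison.Ordinal)) {
        unreachable = false;
      }
      if (unreachable) {
        continue;
      }
      result.Add(line);
      if (line.StartsWith("RETURN", StringComparison.Ordinal)) {
        unreachable = true;
      }
    }
    return result;
  }
}
```

Applied to the output above, this would keep everything up to the first RETURN and drop the trailing MOV R0,#0 and second RETURN.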
Example 2: Turning a pin on and off X times
Here is the C# code:
    using FluentInterop.CodeGeneration;
    using FluentInterop.Deployment;
    using Microsoft.SPOT.Hardware;
    using SecretLabs.NETMF.Hardware.Netduino;

    public class Example2 {
      public static void Test() {
        //This is the function we want to build:
        //int OnOffALot(int count) {
        //  while(count>0) {
        //    pin.Write(true);
        //    pin.Write(false);
        //    count--;
        //  }
        //}
        const Cpu.Pin pin=Pins.GPIO_PIN_D0;

        var code=InteropBuilder.Create(
          "count", null, null, null, null, null, null, null, //your argument names
          (g, count, _, __, ___, ____, _____, ______, _______) => { //the code generator, plus your arguments
            g.While(count>0, () => {
              g.SetPinState(pin, true);
              g.SetPinState(pin, false);
              g.Assign(count, count-1);
            });
          });

        code.Invoke(5000, 0, 0, 0, null, null, null, null); //turns the pin on and off 5000 times
        code.Dispose();
      }
    }
Here we see our first use of a control flow statement. The format is
g.While(condition, whileBodyLambda);
The "condition" looks sort of familiar, but the "whileBodyLambda" may seem sort of strange. Early on, I made the decision to do all my control flow by passing lambdas around. This is a very cool way to do things actually, and it makes the implementation extremely pleasant. I realize it might seem strange to people unfamiliar with functional programming. I hope people don't dislike this too much. As a preview, you can probably guess what "If" looks like. Yes, that's right:
g.If(condition, truePartLambda, falsePartLambda);
Anyway, here is the assembly output for the above program.
    MOV R0,#FUNCTION_ARGUMENT_BASE_ADDRESS //something like this
    MOV R1,#SCRATCH_MEMORY_BASE_ADDRESS //something like this
    B while0_condition
    while0_body:
    CALL_SOMETHING (to set cpu pin 27 to True)
    CALL_SOMETHING (to set cpu pin 27 to False)
    LDR R3,[R0,0] //arg0 (count)
    SUB R2,R3,#1
    STR R2,[R0,0] //arg0 (count)
    while0_condition:
    LDR R2,[R0,0] //arg0 (count)
    CMP R2,#0
    BGT while0_body
    MOV R0,#0
    RETURN //I assume the ARM wants the return value in R0
Here again we seem to be producing semi-reasonable code. Two things to notice:
- I don't actually know what code I need to emit to turn on a pin, so I have stubbed this out as "CALL_SOMETHING"
- There are a lot of optimization opportunities in this code, but my feeling is that we don't want to reimplement an optimizing compiler. It's possible that this code is "good enough" for many people's purposes
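For anyone curious how a lambda-based While can produce the label layout seen in the output above, here is a minimal sketch, written as ordinary desktop C# for illustration. MiniEmitter is a hypothetical toy, not the FluentInterop CodeGenerator: it takes the comparison and branch mnemonic as plain strings, whereas the real library derives them from the condition expression.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature emitter; names are illustrative only.
public sealed class MiniEmitter {
  private readonly List<string> lines = new List<string>();
  private int nextLabel;

  public void Emit(string line) { lines.Add(line); }

  // Emits: jump to the condition, the body, then the condition and a
  // conditional branch back to the body.
  public void While(string compare, string branchIfTrue, Action body) {
    var id = nextLabel++;
    Emit("B while" + id + "_condition");
    Emit("while" + id + "_body:");
    body(); // running the lambda appends the body's instructions right here
    Emit("while" + id + "_condition:");
    Emit(compare);
    Emit(branchIfTrue + " while" + id + "_body");
  }

  public string[] ToArray() { return lines.ToArray(); }
}

public static class WhileDemo {
  public static string[] Build() {
    var g = new MiniEmitter();
    g.While("CMP R2,#0", "BGT", () => {
      g.Emit("SUB R2,R2,#1");
    });
    return g.ToArray();
  }
}
```

WhileDemo.Build() yields six lines, starting with "B while0_condition" and ending with "BGT while0_body". The point of the design is that the body lambda is just a callback invoked at the right moment, so nested loops and conditionals fall out of ordinary nested lambdas.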
Example 3: Finding a value in an array
This is not likely to be useful in practice (because the overhead of copying data to the ARM side makes it pointless). However, it still hopefully has educational value.
    using FluentInterop.CodeGeneration;
    using FluentInterop.Deployment;

    namespace Driver {
      public static class Example3 {
        private delegate int FriendlySignature(int offset, int length, int target, int[] wordData);

        public static void Test() {
          //This is what we are building
          //
          //int find(int offset, int length, int target, int[] wordData) {
          //  for(var index=offset; index<length; ++index) {
          //    if(wordData[index]==target) {
          //      return index;
          //    }
          //  }
          //  return -1;
          //}
          var handle=InteropBuilder.Create(
            "offset", "length", "target", null, null, null, "wordData", null,
            (g, offset, length, target, _, __, ___, wordData, _____) => {
              g.For(offset, length, 1,
                index => g.If(wordData[index]==target,
                  () => g.Return(index)));
              g.Return(-1);
            });

          //make a little "adapter" to make calling this thing more friendly
          FriendlySignature friendly=(offset, length, target, wordData) =>
            handle.Invoke(offset, length, target, 0, null, null, wordData, null);

          var data=new int[] { 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
          var index0=friendly(0, data.Length, 3, data); //find 3 (should return 7)
          var index1=friendly(0, data.Length, 77, data); //find 77 (should return -1)
          handle.Dispose();
        }
      }
    }
If you look closely at the above, you can see some crazy stuff going on. I've got a "for", an "if", some array indexing(!!!), all in a relatively convenient package. By the way, the version of "for" I implemented looks like this:
For(inclusiveStart, exclusiveEnd, increment, loopIndexer => {body});
The cool (or strange, depending on your point of view) thing is that the "For" library allocates your loop indexer variable for you, and passes it to your lambda. This is a departure from C# style but it should feel sort of "functional". By the way, there are other variants of "For" that could be implemented easily enough. You could implement your own! If you look at the implementation of "For", you'll see it's not using any black magic; rather, it is implemented via the same API you've already seen. Here is a peek at the implementation of For:
    public static void For(this CodeGenerator g, Expression inclusiveStart, Expression exclusiveEnd,
      Expression increment, ActionWithExpression action) {
      g.AllocateTemporary("loopIndex", loopIndex => {
        g.Assign(loopIndex, inclusiveStart);
        g.While(loopIndex<exclusiveEnd, () => {
          action(loopIndex);
          g.Assign(loopIndex, loopIndex+increment);
        });
      });
    }
Anyway, before things get too confusing, here is the output of the original program:
    MOV R0,#FUNCTION_ARGUMENT_BASE_ADDRESS //something like this
    MOV R1,#SCRATCH_MEMORY_BASE_ADDRESS //something like this
    LDR R2,[R0,0] //arg0 (offset)
    STR R2,[R1,0] //scratch0 (loopIndex)
    B while0_condition
    while0_body:
    LDR R3,[R0,24] //arg6 (wordData)
    LDR R4,[R1,0] //scratch0 (loopIndex)
    LDR R2,[R3,+R4,LSL 2]
    LDR R3,[R0,8] //arg2 (target)
    CMP R2,R3
    BNE conditional1_endif
    LDR R0,[R1,0] //scratch0 (loopIndex)
    RETURN //I assume the ARM wants the return value in R0
    conditional1_endif:
    LDR R3,[R1,0] //scratch0 (loopIndex)
    ADD R2,R3,#1
    STR R2,[R1,0] //scratch0 (loopIndex)
    while0_condition:
    LDR R2,[R1,0] //scratch0 (loopIndex)
    LDR R3,[R0,4] //arg1 (length)
    CMP R2,R3
    BLT while0_body
    MOV R0,#-1
    RETURN //I assume the ARM wants the return value in R0
    MOV R0,#0
    RETURN //I assume the ARM wants the return value in R0
If you've gotten good at reading assembly code which is perhaps-or-perhaps-not actually conformant to the ARM specification, you will agree that this code seems to be doing the right thing. Not too shabby.
Example 4: BitBanger
As my final example I want to show my implementation of BitBanger in this fluent style. This is my most complicated example. To best understand it, we should think of there being three different players involved:
- The Fluent library
- The Fluent BitBanger author
- The customer
First, let us see the code as the customer would use it:
    public static void FluentBangerTest() {
      const Cpu.Pin clkPin=Pins.GPIO_PIN_D0;
      const Cpu.Pin dataPin=Pins.GPIO_PIN_D1;
      var data=new byte[]{1,2,3,4,5};
      using(var bb=new FluentBanger(clkPin, dataPin, true, false)) {
        bb.Write(data, 0, data.Length);
      }
    }
This is similar to the examples provided for the original BitBanger, by sweetlimre.
Now we can look at how the FluentBanger class is implemented:
    using System;
    using FluentInterop.CodeGeneration;
    using FluentInterop.Deployment;
    using Microsoft.SPOT.Hardware;

    namespace FluentBitBanger {
      public sealed class FluentBanger : IDisposable {
        private readonly InteropHandle handle;

        public FluentBanger(Cpu.Pin clockPin, Cpu.Pin dataPin, bool risingClock, bool bigEndian) {
          //use dynamic code generation techniques with a "fluent" syntax,
          //in order to build a routine with the following pseudocode:
          //
          //int doit(byte[] data, int offset, int length) {
          //  while(length>0) {
          //    var nextByte=data[offset]
          //    foreach(nextBit in nextByte, going in the direction of "bigEndian") {
          //      SetPinState(clockPin, !risingClock);
          //      if(nextBit) {
          //        SetPinState(dataPin, true);
          //      } else {
          //        SetPinState(dataPin, false);
          //      }
          //      SetPinState(clockPin, risingClock);
          //    }
          //    ++offset;
          //    --length;
          //  }
          //  return 0; (the default return value is zero unless you do something)
          //}
          this.handle=InteropBuilder.Create(
            "offset", "length", null, null, "data", null, null, null,
            (g, offset, length, _, __, data, ___, ____, _____) => {
              g.While(length>0, () => {
                g.AllocateTemporary("nextByte", nextByte => {
                  g.Assign(nextByte, data[offset]);
                  g.ForEachBit(nextByte, 0, 8, bigEndian, nextBit => {
                    g.SetPinState(clockPin, !risingClock);
                    g.If(nextBit!=0,
                      () => g.SetPinState(dataPin, true),
                      () => g.SetPinState(dataPin, false));
                    g.SetPinState(clockPin, risingClock);
                  });
                });
                g.Assign(offset, offset+1);
                g.Assign(length, length-1);
              });
            });
        }

        public void Dispose() {
          handle.Dispose();
        }

        /// <summary>
        /// A user-friendly interface to our compiled code, which
        /// adapts the signature we want to the standard signature
        /// </summary>
        public void Write(byte[] data, int offset, int length) {
          handle.Invoke(offset, length, 0, 0, data, null, null, null);
        }
      }
    }
This looks sort of like stuff we've seen so far. There is one juicy extension method "ForEachBit", which was provided to make the code easier to write. As with everything else, it is built using the library API:
    public static void ForEachBit(this CodeGenerator g, Expression expr, int offset, int count,
      bool bigEndian, ActionWithExpression action) {
      int inclusiveStart;
      int exclusiveEnd;
      int increment;
      if(!bigEndian) {
        inclusiveStart=offset;
        exclusiveEnd=offset+count;
        increment=1;
      } else {
        //NOTE: For() tests loopIndex<exclusiveEnd, so as For is currently written
        //this descending range produces no iterations
        inclusiveStart=offset+count-1;
        exclusiveEnd=offset-1;
        increment=-1;
      }
      g.For(inclusiveStart, exclusiveEnd, increment,
        loopCounter => g.AllocateTemporary("mask", mask => {
          //NOTE: "expr" is not yet combined with the mask here, so the action
          //receives the bare mask (1<<loopCounter) rather than the selected bit
          g.Assign(mask, Expression.ShiftLeft(1, loopCounter));
          action(mask);
        }));
    }
The fact that anyone can provide composable building blocks such as ForEachBit is perhaps the biggest win of this library.
And now, the assembly output of BitBanger:
    MOV R0,#FUNCTION_ARGUMENT_BASE_ADDRESS //something like this
    MOV R1,#SCRATCH_MEMORY_BASE_ADDRESS //something like this
    B while0_condition
    while0_body:
    LDR R3,[R0,16] //arg4 (data)
    LDR R4,[R0,0] //arg0 (offset)
    LDRB R2,[R3,+R4]
    STR R2,[R1,0] //scratch0 (nextByte)
    MOV R2,#0
    STR R2,[R1,4] //scratch1 (loopIndex)
    B while1_condition
    while1_body:
    MOV R3,#1
    LDR R4,[R1,4] //scratch1 (loopIndex)
    MOV R2,R3 LSL R4
    STR R2,[R1,8] //scratch2 (mask)
    CALL_SOMETHING (to set cpu pin 27 to False)
    LDR R2,[R1,8] //scratch2 (mask)
    CMP R2,#0
    BNE conditional2_then
    CALL_SOMETHING (to set cpu pin 28 to False)
    B conditional2_endif
    conditional2_then:
    CALL_SOMETHING (to set cpu pin 28 to True)
    conditional2_endif:
    CALL_SOMETHING (to set cpu pin 27 to True)
    LDR R3,[R1,4] //scratch1 (loopIndex)
    ADD R2,R3,#1
    STR R2,[R1,4] //scratch1 (loopIndex)
    while1_condition:
    LDR R2,[R1,4] //scratch1 (loopIndex)
    CMP R2,#8
    BLT while1_body
    LDR R3,[R0,0] //arg0 (offset)
    ADD R2,R3,#1
    STR R2,[R0,0] //arg0 (offset)
    LDR R3,[R0,4] //arg1 (length)
    SUB R2,R3,#1
    STR R2,[R0,4] //arg1 (length)
    while0_condition:
    LDR R2,[R0,4] //arg1 (length)
    CMP R2,#0
    BGT while0_body
    MOV R0,#0
    RETURN //I assume the ARM wants the return value in R0
Whew! That's the end of my manifesto. The solution file that created all of these examples is attached. (You'll want to open Driver\Driver.sln)
Please remember that there are a lot of things missing. I just built enough structure to make the above examples work. I would love to hear from the community regarding their reactions, whether people think this is a worthwhile approach, whether people want to work on making it real, etc. etc.