The Netduino forums have been replaced by new forums at community.wildernesslabs.co. This site has been preserved for archival purposes only and the ability to make new accounts or posts has been turned off.

Corey Kosak

Member Since 03 Oct 2010

#7316 Fluent Interop 1.0

Posted by Corey Kosak on 09 January 2011 - 06:01 AM

I am quite pleased to announce that I have finally built a working version of the Fluent Interop library proposed here. The goal of this library is to provide a composable and friendly way to write speed-critical routines such as twiddling output bits in cases where the .NET side cannot be made to run fast enough.

While it is certainly possible to compile any custom code that one might want into the firmware, there are many disadvantages to this:
  • lengthy compile-run-test cycle
  • requires knowledge of C++ and toolchain
  • not shareable with others unless they too are willing to reflash their firmware
The fluent interop system requires a firmware change as well, but the difference is that the change need only be deployed once. What is required is an entry point in the firmware that makes the processor jump to the base address of an array of code in RAM. Once that change is in place (one can even imagine it becoming a standard part of the firmware in some future release), then any custom code can be run, with no reflashing, simply by cooking up the right opcodes in an array on the C# side.

To show just how zippy native code can be, here is a video that compares a simple algorithm running on the .NET side to the same algorithm running on the native side:

http://www.youtube.com/watch?v=9EcEUbtgO2I

And here are some statistics, cut from the output of the program:
  • CSharp code ran in 0 minutes, 33 seconds, and 886 millseconds
  • Compiling fluent code ran in 0 minutes, 22 seconds, and 414 millseconds
  • Run Native code 1X ran in 0 minutes, 0 seconds, and 410 millseconds
From these stats we can conclude that the native code for this algorithm runs more than 82 times faster than the C# code (33,886 ms versus 410 ms), which is pretty sweet, and is a promising sign for potential real-world uses like the BitBanger. (I would do more direct measurements if I could, but Santa didn't bring me my Saleae logic analyzer like he was supposed to.)
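The arithmetic behind that ratio can be sketched as follows. This is only an illustration using the three timings quoted above; the class and method names are made up for this example and are not part of the library.

```csharp
public static class SpeedupSketch {
  // Converts a "minutes, seconds, milliseconds" reading into total milliseconds.
  public static long ToMilliseconds(int minutes, int seconds, int milliseconds) {
    return (minutes*60L+seconds)*1000L+milliseconds;
  }

  // Ratio of the slower time to the faster time.
  public static double Speedup(long slowMs, long fastMs) {
    return (double)slowMs/fastMs;
  }

  // Timings copied from the program output quoted above.
  public static readonly long CSharpMs=ToMilliseconds(0, 33, 886);
  public static readonly long CompileMs=ToMilliseconds(0, 22, 414);
  public static readonly long NativeMs=ToMilliseconds(0, 0, 410);
}
```

Speedup(CSharpMs, NativeMs) comes out at roughly 82.6. If the compile step is counted as part of the native run, Speedup(CSharpMs, CompileMs+NativeMs) drops to roughly 1.5, which is why precompiling the opcodes offline matters.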

It's kind of a bummer that the compiler took 22 seconds to run. It's slower than I had hoped. Maybe some improvements are possible there; I'm not sure. In any case, one usage model is to perfect your code offline (perhaps running the compiler under the NETMF emulator) and then when you're done, paste the resulting opcodes into your target program.

Now to provide a little more background. The fluent interop system comes in three parts:
  • A new entry point in the firmware which allows the processor to jump to an array of opcodes stored in RAM. These opcodes could come from anywhere (so long as they are valid Thumb opcodes conforming to the proper calling convention).
  • A C# interface to that entry point
  • The Fluent Interop "compiler", a library which transforms programs that are written in a funky meta-language into Thumb opcodes.
The first two parts are very small. The change I made to the firmware is trivial, and so is the C# interface to it. This means that you can embed Thumb opcodes (whether generated by my freaky system or obtained elsewhere) into your program in a very low-overhead manner. The key benefit here is that you can run those opcodes out of RAM, which means that you can change them at will without needing to reflash your firmware all the time.

The third part is the bulk of the system. It is my fluent compiler, provided for people who don't find it convenient to hand-assemble Thumb codes. I wrote a proof-of-concept in the other post, but now I have a real, non-vaporware version.

To make these concepts more concrete, here are the moving parts involved in the YouTube video.

The demo algorithm is straightforward; the thing I did which may not be obvious is to use XOR in order to identify which bits changed between iterations. This was intended to eliminate needless calls to OutputPort.Write() and hopefully make the C# code run as fast as possible.

    private static void TestCSharp(int limit) {
      var outputPorts=AllocateOutputPorts();

      for(var numberToDisplay=0; numberToDisplay<limit; ++numberToDisplay) {
        var priorNumber=numberToDisplay-1;
        var bitsThatAreDifferent=numberToDisplay^priorNumber;

        var loopMask=numberToDisplay;
        for(var bitIndex=0; bitIndex<outputPorts.Length; ++bitIndex) {
          if((bitsThatAreDifferent&1)!=0) {
            var valueToDisplay=(loopMask&1)!=0;
            outputPorts[bitIndex].Write(valueToDisplay);
          }
          bitsThatAreDifferent>>=1;
          loopMask>>=1;
        }
      }

      DeallocateOutputPorts(outputPorts);
    }
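To see what the XOR trick buys, here is a small desktop-side sketch that counts how many Write() calls the loop above would make. It touches no hardware, and the helper names are hypothetical, chosen only for this illustration.

```csharp
public static class ChangedBitsSketch {
  // Returns a mask with a 1 in every bit position where current differs from previous.
  public static int ChangedBits(int previous, int current) {
    return previous^current;
  }

  // Counts the set bits among the lowest `width` bits of a mask,
  // i.e. how many pins would need a Write() call.
  public static int CountSetBits(int mask, int width) {
    var count=0;
    for(var bitIndex=0; bitIndex<width; ++bitIndex) {
      if((mask&1)!=0) ++count;
      mask>>=1;
    }
    return count;
  }

  // Total Write() calls needed to count from 0 to limit-1 on `width` pins
  // when only the changed bits are written, mirroring the loop in TestCSharp.
  public static int WritesNeeded(int limit, int width) {
    var total=0;
    for(var n=0; n<limit; ++n) {
      total+=CountSetBits(ChangedBits(n-1, n), width);
    }
    return total;
  }
}
```

For example, going from 3 (binary 011) to 4 (binary 100) gives ChangedBits(3, 4)==7, so three pins change. On the first iteration the prior number is -1, so every bit counts as changed and every pin is written once. Counting from 0 to 15 on four pins takes WritesNeeded(16, 4)==30 writes, compared with 64 if every pin were written on every iteration.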

The native version looks like this:

    private static readonly short[] compiledCode=unchecked(new[] {
        (short)0xB5F0, (short)0xB086, (short)0x9005, (short)0x9104, (short)0x9E13, (short)0x9603, (short)0x9E17, (short)0x9602,
        (short)0x2600, (short)0x9601, (short)0xE027, (short)0x9801, (short)0x1E40, (short)0x9E01, (short)0x4046, (short)0x9600,
        (short)0x9901, (short)0x2200, (short)0xE019, (short)0x9E00, (short)0x2701, (short)0x1C33, (short)0x403B, (short)0x2B00,
        (short)0xD00E, (short)0x2701, (short)0x1C0B, (short)0x403B, (short)0x9E02, (short)0x6874, (short)0xB44F, (short)0x9802,
        (short)0x0080, (short)0x9E08, (short)0x5830, (short)0x9903, (short)0xF000, (short)0xF815, (short)0x1C07, (short)0xBC4F,
        (short)0x9E00, (short)0x1076, (short)0x9600, (short)0x1049, (short)0x1C52, (short)0x9F05, (short)0x42BA, (short)0xDBE2,
        (short)0x9E01, (short)0x1C76, (short)0x9601, (short)0x9E01, (short)0x9F04, (short)0x42BE, (short)0xDBD3, (short)0xB006,
        (short)0xBCF0, (short)0xBC02, (short)0x4708, (short)0x4720
      });

    private static void TestFluent(short[] code, int limit) {
      var outputPorts=AllocateOutputPorts();

      //unfortunately the fluent code cannot take an array of enums.
      //So I copy it to an array of ints
      var pinIds=new int[allPins.Length];
      for(var i=0; i<allPins.Length; ++i) {
        pinIds[i]=(int)allPins[i];
      }

      code.Invoke(i0: pinIds.Length, i1: limit, ia0: pinIds);

      DeallocateOutputPorts(outputPorts);
    }

And that's basically the meat of the program shown in the YouTube video. (The entire program is available in the attachment, in the solution Demo_Precompiled\CountToN\CountToN.sln.)
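For the curious, a few of the opcodes in the array can be read by hand. The sketch below decodes only three 16-bit Thumb instruction forms (PUSH/POP, ADD/SUB on sp, and BX), based on my understanding of the standard Thumb encodings; it is not part of the Fluent Interop library, and everything else is simply reported as raw hex.

```csharp
public static class ThumbPeekSketch {
  // Describes a few 16-bit Thumb encodings; anything else is returned as hex.
  public static string Describe(ushort opcode) {
    // PUSH/POP: 1011 L10R rrrrrrrr (L=1 means POP; R adds lr for PUSH, pc for POP)
    if((opcode&0xF600)==0xB400) {
      var isPop=(opcode&0x0800)!=0;
      var text=(isPop ? "POP {" : "PUSH {")+RegisterList(opcode&0xFF);
      if((opcode&0x0100)!=0) {
        text+=((opcode&0xFF)!=0 ? ", " : "")+(isPop ? "pc" : "lr");
      }
      return text+"}";
    }
    // ADD/SUB sp, #imm: 1011 0000 Siii iiii (S=1 means SUB; immediate counts words)
    if((opcode&0xFF00)==0xB000) {
      var bytes=(opcode&0x7F)*4;
      return ((opcode&0x80)!=0 ? "SUB" : "ADD")+" sp, #"+bytes;
    }
    // BX Rm: 0100 0111 0mmm m000
    if((opcode&0xFF87)==0x4700) {
      return "BX r"+((opcode>>3)&0xF);
    }
    return "0x"+opcode.ToString("X4");
  }

  // The array in the post stores opcodes as short, so reinterpret the bits first.
  public static string Describe(short opcode) {
    return Describe(unchecked((ushort)opcode));
  }

  // Formats the low-register bitmask of PUSH/POP as "r4, r5, ...".
  private static string RegisterList(int mask) {
    var text="";
    for(var reg=0; reg<8; ++reg) {
      if((mask&(1<<reg))!=0) {
        text+=(text.Length>0 ? ", " : "")+"r"+reg;
      }
    }
    return text;
  }
}
```

Under these assumptions, the first two opcodes (0xB5F0, 0xB086) read as PUSH {r4, r5, r6, r7, lr} and SUB sp, #24, and near the end 0xB006, 0xBCF0, 0xBC02, 0x4708 read as ADD sp, #24, POP {r4, r5, r6, r7}, POP {r1}, BX r1. That looks like an ordinary function prologue and epilogue, though that interpretation is mine rather than something stated in the post.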

If you happen to be conversant in the Thumb instruction set, this would take you pretty far. For everyone else, I've cooked up this freaky meta-language which I feel is the best way to embed native constructs in C# code. What follows is the source code which generated the above opcodes.

    private static short[] CompileFluent() {
      var code=CodeGenerator.Compile((g, numPins, limit, ង, ច, ឆ, ជ, ឋ, ឌ, ព, ផ, ត, ថ, pins, ធ, ម, វ) => {
        g.For(numberToDisplay => numberToDisplay.AssignFrom(0),
          numberToDisplay => numberToDisplay<limit,
          numberToDisplay => numberToDisplay.AssignFrom(numberToDisplay+1),
          numberToDisplay => {
            var priorNumber=new IntVariable("priorNumber", numberToDisplay-1);
            var bitsThatAreDifferent=new IntVariable("bitsThatAreDifferent", numberToDisplay^priorNumber);

            var loopMask=new IntVariable("loopMask", numberToDisplay);

            g.For(i => i.AssignFrom(0), i => i<numPins, i => i.AssignFrom(i+1), i => {
              g.If((bitsThatAreDifferent&1)!=0,
                () => {
                  var valueToDisplay=new IntVariable("valueToDisplay", loopMask&1);
                  g.SetPinState(pins[i], valueToDisplay);
                });
              bitsThatAreDifferent.AssignFrom(bitsThatAreDifferent.ShiftRight(1));
              loopMask.AssignFrom(loopMask.ShiftRight(1));
            });
          });
      });
      return code;
    }

If you compare it to the C# version, you can see strong similarities in structure, though the syntax is totally different. (By the way, those funky little characters (ង, ច, ឆ, ជ, etc) are my cute way of using tiny Unicode characters to denote unused arguments to the function. If you recall from the last post, every function generated in my system needs to have a fixed number of parameters; so this is a cute way of keeping the unused ones from being visually distracting).

The routine above returns a short[] array, ready to invoke immediately. (You could also print out the values in that array and embed the results in another program. That is what I did for Demo_Precompiled).
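Printing the array in a paste-ready form could look something like the sketch below. This is not the code used to produce Demo_Precompiled; the helper name and layout are assumptions, and it is written for desktop .NET (for example when running the compiler offline), so it may need adjusting on the device.

```csharp
using System.Text;

public static class OpcodeDumpSketch {
  // Formats opcodes as a C# array initializer in the same style as the
  // compiledCode array shown earlier, with `perLine` values per line.
  public static string ToInitializer(short[] code, int perLine) {
    var sb=new StringBuilder();
    sb.Append("unchecked(new[] {\n");
    for(var i=0; i<code.Length; ++i) {
      if(i%perLine==0) sb.Append("    ");
      sb.Append("(short)0x");
      // Reinterpret as unsigned so negative shorts print as four hex digits.
      sb.Append(unchecked((ushort)code[i]).ToString("X4"));
      if(i!=code.Length-1) sb.Append(i%perLine==perLine-1 ? ",\n" : ", ");
    }
    sb.Append("\n  })");
    return sb.ToString();
  }
}
```

Calling ToInitializer(CompileFluent(), 8) and writing the result to the debug output would give text that can be pasted into the target program as the initializer of a short[] field.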

The source code for the above is in the attachment, in the solution called Demo_Fluent\CountToN\CountToN.sln

To run any of these, you will need to reflash your tinybooter (because I built the firmware with gcc 4.4.1) and also your firmware. The files you need to do so are in the directories TinyBooter and Firmware respectively. This is a relatively straightforward process, but you should only do it if you are comfortable reflashing your Netduino. Do this at your own risk! I disclaim all responsibility!

The attached zip file has four subdirectories:
  • TinyBooter - for reflashing your bootloader with SAM-BA
  • Firmware - for reflashing your firmware with MFDeploy
  • Demo_Precompiled - a lightweight project, which is the code behind the YouTube video
  • Demo_Fluent - a copy of the above, but this time also including the big-ass compiler library
In a separate post, I'll provide the source code for the firmware changes. Learning how to rebuild the firmware from source was such a nightmare for me that I would not wish it on anyone else.

I will also try to follow up with more information, assuming anyone is interested in this crazy little project. There are lots of moving parts, so it's not possible to explain the whole thing in one post. This system is quirky and Corey-specific enough that I'm sure it looks rather baffling at first glance. If it turns out that anyone does care, one thing I can post is my test suite, which has lots of little fluent programs in order of increasing complexity. There's some good stuff in there towards the end, like a fluent version of factorial and of quicksort (recursive!!!).

ALERT: Scroll down to a later message for a newer version of the library.

#5365 Atomic Class

Posted by Corey Kosak on 27 November 2010 - 12:57 AM

Use either of the approaches below. I prefer the second approach, for my fancy purist reasons.


  public static class AtomicClassExample1 {
    [MethodImpl(MethodImplOptions.Synchronized)]
    public static void Method1(int a, int b) {
      //your code here
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    public static void Method2(int x, int y) {
      //your code here
    }
  }


  public static class AtomicClassExample2 {
    private static readonly object sync=new object();

    public static void Method1(int a, int b) {
      lock(sync) {
        //your code here
      }
    }

    public static void Method2(int x, int y) {
      lock(sync) {
        //your code here
      }
    }
  }
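As a usage illustration of the second approach, here is a minimal sketch in which several threads bump a shared counter through a method guarded by a private lock object. The class and its members are hypothetical. One commonly cited reason to prefer a private lock object (which may or may not be the reason meant above) is that MethodImplOptions.Synchronized on a static method locks on the type itself, which other code can also lock on.

```csharp
using System.Threading;

public static class LockedCounterSketch {
  private static readonly object sync=new object();
  private static int counter;

  // Every access to counter goes through the same private lock.
  public static void Increment() {
    lock(sync) {
      counter=counter+1;
    }
  }

  // Starts threadCount threads that each call Increment incrementsPerThread
  // times, waits for them all, and returns the final counter value.
  public static int RunDemo(int threadCount, int incrementsPerThread) {
    lock(sync) { counter=0; }
    var threads=new Thread[threadCount];
    for(var t=0; t<threadCount; ++t) {
      threads[t]=new Thread(() => {
        for(var i=0; i<incrementsPerThread; ++i) Increment();
      });
      threads[t].Start();
    }
    foreach(var thread in threads) thread.Join();
    lock(sync) { return counter; }
  }
}
```

Because every increment happens under the lock, RunDemo(4, 1000) returns 4000; without the lock, some increments could be lost when threads interleave.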





Copyright © 2016 Wilderness Labs Inc.  |  CC BY-SA
This webpage is licensed under a Creative Commons Attribution-ShareAlike License.