The Netduino forums have been replaced by new forums at community.wildernesslabs.co. This site has been preserved for archival purposes only and the ability to make new accounts or posts has been turned off.

Generics, Data structures, and Libraries


38 replies to this topic

#21 sterned

sterned

    New Member

  • Members
  • Pip
  • 5 posts

Posted 17 February 2011 - 10:44 PM

Thanks for the Array vs ArrayList insight, guys. I did an initial test with ArrayList, and the number of integers I could store was so miserably low that I assumed there was no way I could use the onboard RAM as a data cache before connecting to my PC and dumping the info. With a plain array, the storage increases enough that it may work for my purposes.

#22 Corey Kosak

Corey Kosak

    Advanced Member

  • Members
  • PipPipPip
  • 276 posts
  • LocationHoboken, NJ

Posted 18 February 2011 - 03:13 AM

Thanks for the Array vs ArrayList insight, guys.

Good to hear. Also, if you want the growable behavior of ArrayList without the large overhead, you could roll your own class customized for integers, like the one below. It uses the usual array-doubling trick (as ArrayList itself does) so that an Add() operation runs in amortized constant time. If you want to be a little more space-efficient in the average case (at the cost of slower, but still amortized-constant-time, Adds), you can change the *2 growth factor to 4/3 or 5/4.



  public class IntArrayList {
    private int[] data=new int[4];
    private int length;

    public int Count { get { return length; } }

    public void Add(int value) {
      if(length==data.Length) {
        // Grow by doubling so Add() stays amortized constant time
        var newData=new int[data.Length*2];
        Array.Copy(data, newData, data.Length);
        data=newData;
      }
      data[length++]=value;
    }

    public int this[int index] {
      get {
        CheckIndex(index);
        return data[index];
      }
      set {
        CheckIndex(index);
        data[index]=value;
      }
    }

    private void CheckIndex(int index) {
      // Reject negative indices too; they would otherwise read inside
      // the backing array's unused capacity without complaint
      if(index<0 || index>=length) {
        throw new Exception("index out of range: "+index);
      }
    }

    public int[] ToArray() {
      var result=new int[length];
      Array.Copy(data, result, length);
      return result;
    }
  }


#23 ErikN

ErikN

    Advanced Member

  • Members
  • PipPipPip
  • 119 posts
  • LocationNew York, NY

Posted 21 February 2011 - 01:49 AM

I'm not sure whether you'd find it helpful, and I apologize that I'm only now seeing this thread, but I started a CodePlex project to implement some features I've come to rely on that are missing from NETMF.

This includes a small implementation of LINQ extensions. For my own amusement I added a couple of other projects to that solution. One is a collections extension which provides a NearlyGenericArrayList with an added Sort method.

The project is at http://microlinq.codeplex.com/ if you want to take a look. The source is included and it's free to use in any project, personal or commercial. (I used the most permissive license CodePlex has, as far as I can tell.)

-Erik

Corey:
Ah ok. I see what you mean. I've been reading Jon Skeet's blog on his re-implementation of LINQ and while it is an interesting exercise that I've been enjoying following, I can't imagine getting excited about re-implementing it myself. So I guess you could say that I'll probably stick with casting if I use LINQ in NETMF.

Chris:
Thanks for cluing me in on the on-board LED. I figured it would require PWM in software, but thought I might ask. Good to know about that class library for the 4 digital pins that support PWM. It's nice that there's a hardware way to do it.

Cheers!



#24 Corey Kosak

Corey Kosak

    Advanced Member

  • Members
  • PipPipPip
  • 276 posts
  • LocationHoboken, NJ

Posted 21 February 2011 - 04:58 AM

I started a CodePlex project...

Fascinating! I enjoyed reading over your source code. Do you mind if I provide some feedback? I know I have a lot of complaints below, but they are all meant to be constructive :wacko:

ParallelForEach
  • It looks like it does not wait for all the threads to be done, as it ought to. (You probably meant to have a .Wait() at the end.)
  • It is going to create an awful lot of threads. I wonder if you felt like implementing a ThreadPool to deal with this.
  • It traverses the enumerable twice (once for the call to .Count() and once to fire off all the threads). It might be more LINQy to redo it so it traverses the enumerable just once. One way to do this is to change your synchronization mechanism so it counts up rather than down. This matters because some IEnumerables are expensive to traverse.
  • Does Countdown.Signal() really need to take a lock?
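To make the count-up idea concrete, here's a minimal sketch (the class and method names are hypothetical, not from the MicroLinq source): the dispatcher counts up as it starts threads during its single pass over the enumerable, and waits on an event that fires once the pass is complete and every started thread has signalled.

  using System.Threading;

  public class CountUp {
    private readonly object sync=new object();
    private readonly ManualResetEvent done=new ManualResetEvent(false);
    private int pending;      // threads started but not yet finished
    private bool allAdded;    // the single enumeration has completed

    public void Adding() {    // call before starting each thread
      lock(sync) { pending++; }
    }

    public void AllAdded() {  // call once, after the foreach loop
      lock(sync) {
        allAdded=true;
        if(pending==0) done.Set();
      }
    }

    public void Signal() {    // call from each thread when it finishes
      lock(sync) {
        pending--;
        if(allAdded && pending==0) done.Set();
      }
    }

    public void Wait() {      // blocks until every started thread signals
      done.WaitOne();
    }
  }

The dispatcher then enumerates exactly once: Adding() before each Start(), AllAdded() after the loop, Wait() at the end. No up-front Count() needed.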

NearlyGenericArrayList
  • .Sort() should probably throw an exception (rather than silently returning) if the array cannot be sorted
  • Since your compares are kind of expensive, maybe you want to use a faster sort algorithm like quicksort here
  • I think it probably should delegate to (rather than inherit from) ArrayList. (In other words, NearlyGenericArrayList should contain a member of type ArrayList and delegate to it.) Otherwise it's a little too easy to subvert your type checking, for example by calling .Insert (since you didn't provide a new Insert). Even in cases where you did provide an overload, as with Add, it's pretty easy for a caller to accidentally call ArrayList.Add rather than your Add, violating your invariant.

MicroLinq
  • You should provide another overload of your OrderBy methods which takes a comparer, to support other types (like int) which are not IComparable in this framework and don't even have a CompareTo method.
  • The optimization that you do for OrderBy (namely call IListOrderBy on it directly if the enumerable is an IList) is not correct behavior for OrderBy. None of the standard LINQ methods ever modify their input enumerables.
  • Similar to what I said earlier, it would be nice if IEnumerableOrderBy didn't traverse its enumerable twice (once by calling .Count(), and once by building the array). You could avoid this by using ArrayList rather than an array.
  • To be totally compliant, ConditionalEnumerator and CombinatorialEnumerator should implement IDisposable, and have a Dispose method that looks like the below. To save code, they might even be made to inherit from the same base class.
      void IDisposable.Dispose() {
        var d=e as IDisposable;
        if(d!=null) {
          d.Dispose();
        }
      }

Speed

MicroLinqExtensions.IListOrderBy is going to be really slow when the list items do not implement IComparable, as it is going to have to call GetMethod() N-squared times. GetMethod() is pretty slow and if you use the technique below you'll only need to call it once. You could make things a whole lot zippier if you first implemented an (N log N) sort algorithm like quicksort, and then secondly, did a prescan of the IList once to find out what the highest type is, and then figure out which compare method to call based on that type and store the result in a delegate. Then your sort routine can just call that delegate at full speed. If you don't know what I mean, I can provide code or examples.
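One way to look up the method once and reuse it (a sketch only; the class and method names here are hypothetical, and it assumes the elements share a type that exposes CompareTo):

  using System;
  using System.Collections;
  using System.Reflection;

  public delegate int Comparer(object lhs, object rhs);

  public static class ComparerCache {
    // Reflect CompareTo once and wrap the MethodInfo in a delegate, so
    // the sort never calls GetMethod() inside a comparison again.
    public static Comparer For(IList list) {
      if(list.Count==0) return null;
      var method=list[0].GetType().GetMethod("CompareTo");
      if(method==null) {
        throw new Exception("elements do not provide CompareTo");
      }
      return delegate(object lhs, object rhs) {
        return (int)method.Invoke(lhs, new object[] { rhs });
      };
    }
  }

The sort routine then calls the returned delegate directly in its inner loop, paying the reflection cost once instead of N-squared times.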

The following code takes 17 seconds to sort 128 strings on my netduino, which is pretty bad (and 70 seconds to sort just 256 strings).
using System;
using Microsoft.SPOT;
using VikingErik.NetMF.MicroLinq;

namespace NetduinoApplication1 {
  public class Program {
    public static void Main() {
      var items=new string[128];
      for(var i=0; i<items.Length; ++i) {
        items[i]=(128-i).ToString();
      }
      var start=DateTime.Now;
      var i2=items.OrderBy(s => s);
      var elapsed=DateTime.Now-start;
      Debug.Print("elapsed is "+elapsed);
    }
  }
}

Nitpicks
  • In MicroLinqExtensions.IEnumerableOrderBy, you could have said new object[e.Count()] rather than System.Array.CreateInstance(typeof(object), e.Count());
  • A bunch of comments in Extensions.cs misspell object as 'objet'
  • The class you meant to call CombinatorialEnumerator is misspelled 'CombinatoriaEnumerator' (note no 'l')


#25 ErikN

ErikN

    Advanced Member

  • Members
  • PipPipPip
  • 119 posts
  • LocationNew York, NY

Posted 21 February 2011 - 07:44 PM

Thanks for the feedback. I only get a few hours a week to work on this project, so I appreciate the thorough, in-depth feedback. My driving motivation was convenience, but I'd like to squeeze out speed too. You touched on a number of points I had already considered, but given my time constraints I thought it would be better to get something out that people can see and be inspired by than to wait until it's as good as it can be.

I would like to move these to individual task items on the CodePlex project. Maybe some generous person would like to help me make things better? :)

I'm curious what you mean about scanning the IEnumerable to see what the highest type is to cache the CompareTo method. I wasn't happy with having to reflect the types to get the CompareTo method. I initially wanted to use IComparable, but System.String wasn't marked. I thought a good compromise would be to do what I've done and wait for NETMF 4.2 to fix the slowness. (I submitted an issue to the NETMF team to mark System.String as IComparable; it's done, but only in the next version.) If there is a faster way I would love to implement it now, not only for the speed but in case there are other types which implement CompareTo but are not marked as IComparable.

The NearlyGenericArrayList and ParallelForEach were just tinkering of my own. I only included them as a proof of concept; the meat is definitely the MicroLinq project. I appreciate the feedback on the other items though. You're absolutely right about the oversight in the ForEach; there was supposed to be a WaitOne. It looks like I struggled a little with the threads, and once it was working I started looking at implementing a ThreadPool and never finished the object. That's one of the problems with getting small windows of opportunity to work on the project. Oh, and yes, I think a ThreadPool would be nice, but I'm not sure how to implement one without requiring a very large allocation of program space. That's why I stuck with bare threads. Also, it's not meant to be used over very large lists but rather for a few tasks which might wait on IO. It's just something I was thinking about but haven't practically used.

Again, I appreciate all the feedback you've given. I certainly don't see them as "complaints", rather as a critique of the current state and how it could be made better. I'm new to NETMF but not to programming. My other life revolves around SharePoint and some Silverlight, so I haven't had to think at this low a level in a while.

Many thanks.
-Erik






#26 Corey Kosak

Corey Kosak

    Advanced Member

  • Members
  • PipPipPip
  • 276 posts
  • LocationHoboken, NJ

Posted 21 February 2011 - 10:13 PM

I'm curious what you mean about scanning the IEnumerable to see what the highest type is to cache the CompareTo method.

What I said didn't make much sense as stated. What I was trying to get at was that your code is calling GetMethod() N^2 times, which doesn't make sense given that there are (at most!) N different answers, and generally far fewer than that. So why keep calling GetMethod() when you're going to keep getting the same answer back? The idea I was proposing was to (a) find all the distinct types in the array, and (b) look up a comparer for each distinct type.

However, by far the biggest problem is the choice of sort algorithm. By implementing Quicksort (see below) I was able to reduce the runtime of my test program on 256 elements from around 70 seconds to 2.5 seconds.

Next, I think (given the limitations of the NETMF) you should probably not even try to guess at a comparer, and just force the caller to pass one in.

Using this approach I got rid of the comparer-guessing algorithm altogether. In the final version of the code, the 256 elements were sortable in 1.66 seconds on the Netduino--a big improvement over the 70 seconds we started with.


Test framework:
using System;
using System.Collections;
using Microsoft.SPOT;
using VikingErik.NetMF.MicroLinq;

namespace NetduinoApplication1 {
  public class Program {
    public static void Main() {
      const int count=256;
      var items=new string[count];
      for(var i=0; i<items.Length; ++i) {
        items[i]=(count-i).ToString();
      }
      ShowItems("start", items);
      var start=DateTime.Now;
      var i2=items.OrderBy(s => s, (a, b) => ((string)a).CompareTo((string)b));
      var elapsed=DateTime.Now-start;
      Debug.Print("elapsed is "+elapsed);
      ShowItems("finish", i2);
    }

    private static void ShowItems(string message, IEnumerable items) {
      Debug.Print("--- "+message+" ---");
      foreach(var item in items) {
        Debug.Print(item.ToString());
      }
    }
  }
}


My changes to Extensions.cs
  public static IEnumerable OrderBy(this IEnumerable e, Selector s, Comparer c) {
    IList coll = e as IList;
    if(null != coll)
      return IListOrderBy(coll, s, c);

    return IEnumerableOrderBy(e, s, c);
  }

  private static IEnumerable IEnumerableOrderBy(IEnumerable e, Selector s, Comparer c) {
    IList list = System.Array.CreateInstance(typeof(object), e.Count());
    int i = 0;
    foreach(var o in e) {
      list[i++] = o;
    }

    return IListOrderBy(list, s, c);
  }

  public delegate int Comparer(object lhs, object rhs);

  private static IEnumerable IListOrderBy(IList list, Selector s, Comparer c) {
    Quicksort(list, s, c, 0, list.Count-1);
    return list;
  }

  private static void Quicksort(IList array, Selector s, Comparer c, int left, int right) {
    if(right>left) {
      var pivotIndex=left+(right-left)/2;
      var pivotNewIndex=Partition(array, s, c, left, right, pivotIndex);
      Quicksort(array, s, c, left, pivotNewIndex-1);
      Quicksort(array, s, c, pivotNewIndex+1, right);
    }
  }

  private static int Partition(IList array, Selector s, Comparer c, int left, int right, int pivotIndex) {
    var selectedPivotValue=s(array[pivotIndex]);
    Swap(array, pivotIndex, right); // Move pivot to end
    var storeIndex=left;
    for(var i=left; i<right; ++i) {
      if(c(selectedPivotValue, s(array[i]))>=0) {
        Swap(array, i, storeIndex);
        ++storeIndex;
      }
    }
    Swap(array, storeIndex, right); // Move pivot to its final place
    return storeIndex;
  }

  private static void Swap(IList array, int left, int right) {
    var temp=array[left];
    array[left]=array[right];
    array[right]=temp;
  }


#27 ErikN

ErikN

    Advanced Member

  • Members
  • PipPipPip
  • 119 posts
  • LocationNew York, NY

Posted 22 February 2011 - 12:46 AM

I like it!

Can I use it?




#28 Corey Kosak

Corey Kosak

    Advanced Member

  • Members
  • PipPipPip
  • 276 posts
  • LocationHoboken, NJ

Posted 22 February 2011 - 12:57 AM

I like it! Can I use it?

Sure. BTW, I took the Quicksort implementation pretty much straight out of Wikipedia.

#29 ErikN

ErikN

    Advanced Member

  • Members
  • PipPipPip
  • 119 posts
  • LocationNew York, NY

Posted 23 February 2011 - 12:09 AM

I implemented a number of your recommendations. A couple things I'd like to ask about remain.

NearlyGenericArrayList:
I think this is no longer necessary. It was just a toy to explore putting restrictions in place to ensure an ArrayList contained a common base type, which let me cache the compare method for the local sort function. With the performance gains from the OrderBy changes I wonder if any of this is even necessary anymore. I'm considering cutting this project out entirely. Any reason to see it stay?

MicroLinq:
"Similar to what I said earlier, it would be nice if IEnumerableOrderBy didn't traverse its enumerable twice (once by calling .Count(), and once by building the array). You could avoid this by using ArrayList rather than an array."

A grow/copy operation of the ArrayList is potentially faster than iterating the IEnumerable to get the count and then copying the elements, but after a few of these grows is it really faster? And if the IEnumerable is castable to ICollection, the extension method gets the native Count anyway without iterating. Am I missing something here?
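Concretely, the single-pass version I'm weighing would look something like this (a sketch only; it reuses the Selector and Comparer delegates and the IListOrderBy helper from the code above):

  private static IEnumerable IEnumerableOrderBy(IEnumerable e, Selector s, Comparer c) {
    // One pass over the source: ArrayList handles growth internally
    // (array doubling), so Add() is amortized constant time and the
    // enumerable is never traversed a second time for Count().
    var list=new ArrayList();
    foreach(var o in e) {
      list.Add(o);
    }
    return IListOrderBy(list, s, c);
  }

The occasional grow/copy touches only the buffered references, never the source enumerable.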


I tried my hand at a count-up rather than a count-down for the ParallelForEach, if you'd like to take a look and point out any flaws. The old Countdown is there for comparison, and for reversion in case the Countup is unworkable. I think the above argument about Count() applies here as well, but I was more enthusiastic about trying this change.


Thanks again for your thorough walk-through. Your observations pushed the project a bit farther by casting some shame and embarrassment my way.
-Erik





#30 Corey Kosak

Corey Kosak

    Advanced Member

  • Members
  • PipPipPip
  • 276 posts
  • LocationHoboken, NJ

Posted 23 February 2011 - 04:00 AM

Thanks again for your thorough walk-through. Your observations pushed the project a bit farther by casting some shame and embarrassment my way.

If I have shamed or embarrassed you then I have failed. It was my intention only to give helpful feedback, which might help make it a better product (and thus hopefully more people would use it).

A grow/copy operation of the ArrayList is potentially faster than iterating the IEnumerable to get the count and then copying the elements, but after a few of these grows is it really faster? And if the IEnumerable is castable to ICollection, the extension method gets the native Count anyway without iterating. Am I missing something here?

Let me first respond to the case where the IEnumerable is not castable to ICollection. You really can't make any assumptions about the IEnumerable; in particular you know nothing about how fast the IEnumerable is; in fact you don't even know if it will yield the same values when it is run twice.

As a concrete example, let's say I hand your routine an IEnumerable which recursively enumerates all the filenames in every subdirectory of my SD card. First, this is an expensive operation to perform, and second, it might not return the same values every time it is run (suppose that activity on another thread changes the population of files between your two traversals of the IEnumerable). So because of the first point, running this enumerable twice is wasteful; and because of the second point, your code is actually incorrect (if the second iteration were to return more items than the first, your code would throw an exception as it runs off the end of its array).

I think one can come up with many real-world examples of such "volatile" IEnumerables (reading data from a sensor for example).
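A toy example of such a volatile enumerable (hypothetical, just to make the failure mode concrete):

  using System.Collections;

  // Each traversal yields one more item than the last, simulating files
  // appearing on the SD card between passes. Code that calls Count()
  // first and then copies into a fixed-size array on a second pass will
  // run off the end of that array.
  public class GrowingSequence : IEnumerable {
    private int length=3;

    public IEnumerator GetEnumerator() {
      var snapshot=length++;  // the next traversal sees one more item
      var items=new object[snapshot];
      for(var i=0; i<snapshot; i++) {
        items[i]=i;
      }
      return items.GetEnumerator();
    }
  }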

As for the case where you can cast it to an ICollection, I think this is more defensible. As long as you don't iterate over the enumerable more than once, you're OK in my book. Full-framework LINQ is scrupulous about this and if you want to call yourself MicroLINQ you ought to be just as scrupulous <_<

NearlyGenericArrayList

Yeah, probably less interesting. However, since (once you count up all the costs) boxing is so enormously expensive in this framework, there may be value in providing a few standard "specific implementations", like IntArrayList, DoubleArrayList, and perhaps CharArrayList aka StringBuilder.

By the way, now that your framework takes a comparer argument, you can provide standard comparers for the convenience of your users. You could do something like:

public static class DefaultComparers {
  public static int IntComparer(object lhs, object rhs) {
    var l=(int)lhs;
    var r=(int)rhs;

    if(l==r) return 0;
    return l<r ? -1 : 1;
  }

  public static int DoubleComparer(object lhs, object rhs) {
    ...
  }
}

Not sure what the fastest implementation of IntComparer is, but I do know that the C-like trick of using (int)lhs-(int)rhs is wrong, because it fails in certain extreme situations (e.g. lhs=0, rhs=int.MinValue).
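To see the failure concretely (this relies on C#'s default unchecked arithmetic):

  using Microsoft.SPOT;

  public class OverflowDemo {
    public static void Main() {
      // The true difference 0 - int.MinValue is 2147483648, which does
      // not fit in an int, so it wraps around to int.MinValue -- a
      // negative number. A subtraction-based comparer would therefore
      // report that 0 is less than int.MinValue.
      var diff=0-int.MinValue;
      Debug.Print("0 - int.MinValue = "+diff);  // prints -2147483648
    }
  }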

I tried my hand at a count-up rather than a count-down for the ParallelForEach

Sounds cool! I'll take a look at it either tomorrow or the next day if that's OK with you. I have a date with my cat to watch Glee and V tonight (not kidding!)

#31 ErikN

ErikN

    Advanced Member

  • Members
  • PipPipPip
  • 119 posts
  • LocationNew York, NY

Posted 23 February 2011 - 05:11 AM

Your example of what the IEnumerable might contain was great. Thanks. It really helped clarify your point.

On a whim I decided to take a look at how LINQ does Count to see how they find out how many items are in an IEnumerable and this is what Reflector shows:

public static int Count<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    ICollection<TSource> is2 = source as ICollection<TSource>;
    if (is2 != null)
    {
        return is2.Count;
    }
    ICollection is3 = source as ICollection;
    if (is3 != null)
    {
        return is3.Count;
    }
    int num = 0;
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            num++;
        }
    }
    return num;
}

Isn't GetEnumerator precisely what ForEach does under the hood? So at least the way in which I'm getting Count is correct but I need to find a different way of processing the records without knowing the size.

I took a quick look at the OrderBy, but it wasn't immediately clear how it was done; I did not see any use of Count at first look. So, again, you're right! I think I've got an idea how to accomplish this in one iteration, but I won't be able to try it out until I get some more free time (maybe this weekend).

I'm happy to have my pride hurt a bit! Really. The more you find now, the less I can be blamed for later. :)

-Erik

#32 Corey Kosak

Corey Kosak

    Advanced Member

  • Members
  • PipPipPip
  • 276 posts
  • LocationHoboken, NJ

Posted 23 February 2011 - 01:22 PM

I took a quick look at the OrderBy, but it wasn't immediately clear how it was done; I did not see any use of Count at first look.

Yeah, they end up copying all of the elements into an array-like type (they call it a Buffer) and then they sort the buffer. They use the usual array-doubling trick to populate Buffer's array. If it helps, these are the steps that lead to it here.

By the way, I'm guessing their sort implementation is probably better than the one I posted, in that it is stable (the version of Quicksort I posted is not). That means it should probably be changed again (maybe to mergesort) if you want to be a purist (of course you do! :ph34r: )
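If you go that route, a stable top-down mergesort is straightforward. A sketch (assuming the same object-based Comparer delegate as the code earlier in the thread; names are mine, not from the project):

  public delegate int Comparer(object lhs, object rhs);

  public static class StableSort {
    public static void MergeSort(object[] a, Comparer c) {
      if(a.Length<2) return;
      Sort(a, new object[a.Length], 0, a.Length, c);
    }

    // Sorts a[lo..hi) using buf as scratch space.
    private static void Sort(object[] a, object[] buf, int lo, int hi, Comparer c) {
      if(hi-lo<2) return;
      var mid=lo+(hi-lo)/2;
      Sort(a, buf, lo, mid, c);
      Sort(a, buf, mid, hi, c);
      // Merge the two sorted halves into buf, then copy back.
      int i=lo, j=mid, k=lo;
      while(i<mid && j<hi) {
        // <= keeps equal elements in left-half order: that's the stability.
        buf[k++]=c(a[i], a[j])<=0 ? a[i++] : a[j++];
      }
      while(i<mid) buf[k++]=a[i++];
      while(j<hi) buf[k++]=a[j++];
      for(k=lo; k<hi; k++) a[k]=buf[k];
    }
  }

It's guaranteed N log N (no bad-pivot worst case) at the cost of an extra buffer the size of the input, which is worth keeping in mind on a Netduino.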

#33 Chris Walker

Chris Walker

    Secret Labs Staff

  • Moderators
  • 7767 posts
  • LocationNew York, NY

Posted 25 February 2011 - 12:41 AM

Sounds cool! I'll take a look at it either tomorrow or the next day if that's OK with you. I have a date with my cat to watch Glee and V tonight (not kidding!)


Wait, V? As in the alien show from a few decades ago? It's on TV? That show scared me as a youngster.

[sorry, back to the topic at hand...]

Chris

#34 Corey Kosak

Corey Kosak

    Advanced Member

  • Members
  • PipPipPip
  • 276 posts
  • LocationHoboken, NJ

Posted 25 February 2011 - 01:48 AM

Wait, V? As in the alien show from a few decades ago? It's on TV? That show scared me as a youngster.


Yup. It's nearing the end of its second season. I'm more into Glee, but the cat likes V.

#35 Corey Kosak

Corey Kosak

    Advanced Member

  • Members
  • PipPipPip
  • 276 posts
  • LocationHoboken, NJ

Posted 25 February 2011 - 03:37 AM

VikingErik,

I finally took a look at your ParallelForEach. There's a serious bug that I didn't catch the first time. :blink:

It's a race so it doesn't always happen. You can force it to happen by going to ParallelForEach.cs and putting a Thread.Sleep(1000) right before the action(current). That is:

   try {
     Thread.Sleep(1000);
     action(current);
   }

If you do that, then this program will print "5 5 5 5 5" rather than the expected 1 2 3 4 5.

using Microsoft.SPOT;
using VikingErik.NetMF.ParallelExtensions;

namespace NetduinoApplication1 {
  public class Program {
    public static void Main() {
      var items=new[] {1, 2, 3, 4, 5};
      items.ParallelForEach(item=>Debug.Print(item.ToString()));
    }
  }
}

The fix, in case you don't see it, is to copy current into a temporary variable so you're not sharing a closure amongst all your threads. That is:
                foreach (object current in enumerable)
                {
                    var temp=current;  //NEW LINE
                    cu.Adding();
                    new Thread(
                        () =>
                        {
                            try {
                              Thread.Sleep(1000); //you can remove this of course!
                              action(temp);       //CHANGED LINE
                            }
                            finally
                            {
                                cu.Signal();
                            }
                        }
                    ).Start();
                }
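For what it's worth, this shared-closure capture bug isn't specific to C# or NETMF. Here's a minimal Python analogue of my own (Python closures likewise capture the variable itself, not its value at creation time), along with the same fix of copying into a per-iteration binding:

```python
def make_getters_buggy(items):
    # Every lambda closes over the SAME loop variable 'item',
    # so by the time any of them runs, all see its final value.
    getters = []
    for item in items:
        getters.append(lambda: item)
    return getters

def make_getters_fixed(items):
    # Binding the current value as a default argument copies it into
    # each closure (the same idea as the 'temp' variable above).
    getters = []
    for item in items:
        getters.append(lambda current=item: current)
    return getters
```

Calling every function from the buggy version with input [1, 2, 3, 4, 5] yields [5, 5, 5, 5, 5]; the fixed version yields [1, 2, 3, 4, 5].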

As an exercise, and because I've become a fan of the practice of putting all cooperating code in one place (rather than a separate CountUp class), I came up with a different approach. The big idea here is that:
  • everybody counts up
  • there's a "target" variable which stays 0 until we know what the final count is
  • we start N-1 threads as normal, but we delay the start of the very last thread until we've established the value of the "target"

        public static void ParallelForEach2(this IEnumerable enumerable, Action action) {
          var count=0;
          var threadsFinished=0;
          var threadsFinishedTarget=0;
          var mre=new ManualResetEvent(false);
          Thread lastThread=null;

          foreach(object current in enumerable) {
            ++count;
            var tempCurrent=current;
            if(lastThread!=null) {
              lastThread.Start();
            }
            lastThread=new Thread(() => {
              try {
                action(tempCurrent);
              } finally {
                if(Interlocked.Increment(ref threadsFinished)==threadsFinishedTarget) {
                  mre.Set();
                }
              }
            });
          }
          if(lastThread!=null) {
            threadsFinishedTarget=count;
            lastThread.Start();
            mre.WaitOne();
          }
        }
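To make the counting scheme easier to follow, here is a Python sketch of the same idea (names are mine, not MicroLinq's): start every thread except the most recent one immediately, and only start the last thread once the final count is known, so there is always at least one increment that happens after the target has been set:

```python
import threading

def parallel_for_each(iterable, action):
    """Run action(item) on its own thread per item; block until all finish."""
    count = 0
    finished = 0
    target = 0          # stays 0 until the final count is known
    lock = threading.Lock()
    all_done = threading.Event()
    last_thread = None

    def worker(item):
        nonlocal finished
        try:
            action(item)
        finally:
            with lock:
                finished += 1
                if finished == target:
                    all_done.set()

    for item in iterable:
        count += 1
        if last_thread is not None:
            last_thread.start()   # start all but the most recent thread
        last_thread = threading.Thread(target=worker, args=(item,))

    if last_thread is not None:
        target = count            # safe: at least one thread hasn't run yet
        last_thread.start()
        all_done.wait()
```

Threads that finish before `target` is assigned compare `finished` against 0 and never match, which is harmless: the deliberately delayed last thread guarantees the comparison against the real count eventually succeeds.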

The point of the last thread is not that it will definitely be the one that sets the ManualResetEvent; rather, it guarantees that at least one thread eligible to set the ManualResetEvent has not yet finished.

Let me know what you think.

P.S. I like the changes you've made to Extensions.cs. I was sort of wondering, to save code, if you might do even more sharing of code in the various overloads. For example:

     private static bool True(object o) {
        return true;
     }

     public static int Count(this IEnumerable e) {
       return e.Count(True);
     }

     public static object FirstOrDefault(this IEnumerable e) {
       return e.FirstOrDefault(True);
     }
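The same overload-sharing trick translates directly to other languages. Here's a hedged Python sketch (hypothetical helpers of my own, not the MicroLinq API), where the parameterless forms delegate to the predicate forms with an always-true predicate:

```python
def _always_true(_):
    return True

def count(iterable, predicate=_always_true):
    """Count the elements matching predicate; counts everything by default."""
    return sum(1 for x in iterable if predicate(x))

def first_or_default(iterable, predicate=_always_true, default=None):
    """Return the first element matching predicate, or default if none does."""
    for x in iterable:
        if predicate(x):
            return x
    return default
```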


#36 ErikN

ErikN

    Advanced Member

  • Members
  • PipPipPip
  • 119 posts
  • LocationNew York, NY

Posted 25 February 2011 - 04:01 AM

The change to Count would eliminate the ability to check ICollection and get a native count rather than enumerating, right?

Looking at my full framework version of ParallelForEach I did originally have a capture object. I just did an all around bad job of reimplementing it.

The code changes you suggest for ParallelForEach are interesting.
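Erik's point about losing the native count can be sketched like this in Python (a loose analogue of my own: `len()` plays the role of the `ICollection.Count` check, with enumeration only as the fallback):

```python
def count_fast(iterable):
    # Fast path: containers that already know their own size
    # (the analogue of testing for ICollection and reading Count).
    try:
        return len(iterable)
    except TypeError:
        pass
    # Slow path: enumerate and count.
    return sum(1 for _ in iterable)
```

Routing the parameterless Count through the predicate overload would always take the slow path, which is the trade-off Erik is pointing out.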




#37 Corey Kosak

Corey Kosak

    Advanced Member

  • Members
  • PipPipPip
  • 276 posts
  • LocationHoboken, NJ

Posted 25 February 2011 - 04:28 AM

The change to Count would eliminate the ability to check ICollection and get a native count rather than enumerating, right?


Yeah, you're right

#38 ErikN

ErikN

    Advanced Member

  • Members
  • PipPipPip
  • 119 posts
  • LocationNew York, NY

Posted 26 February 2011 - 04:35 AM

The high from seeing that wore off when I realized it didn't have to be all or nothing.

Implemented some more of your bright ideas, including the new ParallelForEach logic, and got rid of the CountUp and CountDown classes - the library is now a bit smaller.

Check the release page for all the changes or see how many times you can spot your name in the Source Code tab in the commit comments.

Thanks for the extra eyes and the inspiration. I've been sitting at my office for 13 hours now. The updated build is published and I'm headed home to enjoy my weekend...and work on my other CodePlex project.

MicroLinq Build 5763

-Erik




#39 Corey Kosak

Corey Kosak

    Advanced Member

  • Members
  • PipPipPip
  • 276 posts
  • LocationHoboken, NJ

Posted 28 February 2011 - 06:24 AM

Sweet! I just started checking it out. I'll let you know if I have any comments. Love the switch to Mergesort! :wub:





home    hardware    projects    downloads    community    where to buy    contact Copyright © 2016 Wilderness Labs Inc.  |  Legal   |   CC BY-SA
This webpage is licensed under a Creative Commons Attribution-ShareAlike License.