Now your goal is much clearer.
The only information still missing is the magnitude of the times involved, and the precision you need.
As long as you don't need timing accuracy better than 5-10 ms, (almost) everything is feasible.
However, the biggest problem is the sound recognizer.
What you are asking for is much more like a sound recognizer than a simple frequency/amplitude detector. Otherwise the circuit would be very easy to build.
A generic sound is made up of many frequencies combined together, each with an amplitude that changes over time. For instance, you may play the A note both on a guitar and on a flute. The two instruments produce two different sounds having the same fundamental frequency (e.g. 880 Hz), and maybe the same amplitude as well. However, you clearly hear two totally different instruments.
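To make the guitar/flute point concrete, here is a minimal sketch in plain Python. It builds two synthetic tones with the same 880 Hz fundamental and measures single frequencies with the Goertzel algorithm. The sample rate, the function names and the harmonic amplitudes are my own assumptions for illustration; they are not measured instrument spectra.

```python
import math

SAMPLE_RATE = 8000   # Hz (assumed for this sketch)
NUM_SAMPLES = 800    # 0.1 s of audio
FUNDAMENTAL = 880.0  # Hz, the "A" from the example above

def synth(harmonics):
    """Build a tone from (harmonic number, amplitude) pairs."""
    return [
        sum(amp * math.sin(2 * math.pi * FUNDAMENTAL * k * n / SAMPLE_RATE)
            for k, amp in harmonics)
        for n in range(NUM_SAMPLES)
    ]

def goertzel(samples, freq):
    """Estimate the amplitude of a single frequency (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    return 2 * math.sqrt(max(power, 0.0)) / len(samples)

# Made-up harmonic mixes: same fundamental, different overtones.
flute_like = synth([(1, 1.0), (2, 0.1)])
guitar_like = synth([(1, 1.0), (2, 0.6), (3, 0.4)])

for name, tone in (("flute-like", flute_like), ("guitar-like", guitar_like)):
    levels = [round(goertzel(tone, FUNDAMENTAL * k), 3) for k in (1, 2, 3)]
    print(name, "amplitude at 880 / 1760 / 2640 Hz:", levels)
```

A detector that looks only at 880 Hz sees the same level for both tones; the difference shows up only in the overtones, which is why a plain freq/amp detector cannot tell the two apart.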
I don't know the best way to build a sound recognizer. Surely you cannot do it using a Netduino, and you probably need a DSP.
One possible approach is correlation:
record your reference sound as a bunch of samples (e.g. wav);
to detect, the sampler samples the microphone as a continuous stream;
the "most recent" part of the stream is taken and "correlated" with the reference.
The result of the computation tells you how closely the pattern matches the reference.
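The correlation steps above can be sketched roughly as follows, in plain Python. This is only an illustration under my own assumptions: the function names are made up, the "reference" is a synthetic chirp instead of a recorded wav, and the "stream" is a fixed list rather than live microphone data.

```python
import math

def normalized_correlation(window, reference):
    """Score in [-1, 1]; values near 1 mean the window has the same shape."""
    n = len(reference)
    mean_w = sum(window) / n
    mean_r = sum(reference) / n
    num = sum((w - mean_w) * (r - mean_r) for w, r in zip(window, reference))
    energy_w = sum((w - mean_w) ** 2 for w in window)
    energy_r = sum((r - mean_r) ** 2 for r in reference)
    den = math.sqrt(energy_w * energy_r)
    return num / den if den > 0 else 0.0

def find_best_match(stream, reference):
    """Slide the reference over the stream; return (position, best score)."""
    n = len(reference)
    best_pos, best_score = -1, -1.0
    for start in range(len(stream) - n + 1):
        score = normalized_correlation(stream[start:start + n], reference)
        if score > best_score:
            best_pos, best_score = start, score
    return best_pos, best_score

# Synthetic reference: a short chirp (rising pitch), 64 samples long.
reference = [math.sin(2 * math.pi * (0.02 * n + 0.002 * n * n)) for n in range(64)]

# Synthetic stream: faint background hum, with a quieter copy of the
# reference buried at sample 120.
stream = [0.02 * math.sin(0.9 * n) for n in range(300)]
for i, r in enumerate(reference):
    stream[120 + i] += 0.5 * r

position, score = find_best_match(stream, reference)
print("best match at sample", position, "with score", round(score, 3))
```

Because the score is normalized, the copy is found even though it is only half as loud as the reference. Note the cost: every new sample needs a pass over the whole reference, so a one-second reference at audio sample rates already means a great deal of multiplications per second.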
Another approach would be to use neural networks, much as they are used for handwriting recognition.
Both ideas require a lot of computation: you need a very capable machine, such as a fast PC or a good DSP.
Anyway, I would surely choose to buy one!
Cheers
Thank you for the reply. Time is not that critical because there are other losses that cannot be compensated for: for example, a camera flash is only so fast, and so is the time it takes a camera to actuate the trigger. Something in the 30 ms range would work if my math is correct.
I am just playing around. Do you have a recommended mic and speaker setup I can start with, just to get going with hooking things up? Baby steps!
Thanks