Silly Voice Synth on NP2....

Started by ziggurat29, May 26 2013 02:59 AM

sound voice synthesizer sp0256 metadataprocessor

4 replies to this topic

#1 ziggurat29

Advanced Member

Members
244 posts

Posted 26 May 2013 - 02:59 AM

Recently, while looking for some unneeded junk on ebay, I stumbled across a listing for an antique voice synthesiser chip, the SP0256. It made me remember that I might have some on-hand, and I looked in my junkbox and Lo! and Behold! amongst decaying antistatic foam were a couple samples, complete with the odd-valued crystal it prefers. So I thought I'd amuse myself for a couple hours and hook it up and reminisce over the scarcely intelligible sounds from yesteryear.

After the fun of a buzzy 'Hello' had worn off, I found myself inspired to do a particular Doors song, but one thing about this chip is that you have to manually specify a sequence of phoneme variants called 'allophones'. This is a bit time consuming, and so drains the fun, but by now I was doomed because improvement obsession had set in, and so I wasted several days implementing a text-to-speech algorithm for it (don't be too impressed, this is derived from some research done in the 70s, and some concrete code done in the years since).

Anyway, I thought I'd share for your amusement if you're bored. (well, I can't really promise this will truly cure that, but I did try!):

While a trivial pursuit, there were some useful things I did learn that I want to share:

1) C#/NetMF is not great at constant data, like for tables of... 'stuff'

By 'not great' I mean 'horrible'. The text2speech uses about 2000 rules, consisting of 4 parts: three patterns and a phoneme sequence to emit for the match. This is the sort of stuff that in C, etc, you declare something like:

    struct thingy { const char* first; const char* second; const char* third; const char* fourth; };

    static const struct thingy _rules[] = { { "l1", "m1", "r1", "f1" }, { "l2", "m2", "r2", "f2" }, };

and when you did that, the linker would have the sense to put it all in ROM, thereby consuming no RAM. Because it is read-only; what do you need it in RAM for? (on some processors you need to add some compiler-specific qualifiers, or some linker magikry, but that is not pertinent here).

On netmf, apparently 'const' and 'readonly' are neither, and netmf will take such a naively constructed array and blithely instantiate it in RAM. Goodbye RAM, we'll miss you. And miss it, I did, because I consumed it all (about 105k available to user programs on the NP2) and had none with which to run my program. This was when I had a little over 300 rules that were derived from the US Navy research.

So in eagerness I cut a few, and it sounded like crap, but it ran, a little, with about 20K RAM.

So 'caveat programmer' on tables of data. I'll describe how I coped, and with what results, after explaining a few more things....
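For comparison with the C declaration above, a naive C# counterpart might look like the sketch below. The `Rule` and `NaiveRules` names are hypothetical and not taken from the attached source. In .NET generally, an array initializer for reference types is executed at startup and builds every element on the managed heap, which is consistent with the RAM cost described in this post.

```csharp
// Hypothetical naive rule table; names are illustrative only.
public sealed class Rule
{
    public readonly string Left;
    public readonly string Match;
    public readonly string Right;
    public readonly string Phonemes;

    public Rule(string left, string match, string right, string phonemes)
    {
        Left = left;
        Match = match;
        Right = right;
        Phonemes = phonemes;
    }
}

public static class NaiveRules
{
    // 'readonly' only stops the Table reference from being reassigned.
    // The array and every Rule object are still allocated at run time.
    public static readonly Rule[] Table =
    {
        new Rule("l1", "m1", "r1", "f1"),
        new Rule("l2", "m2", "r2", "f2"),
    };
}
```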

2) Netmf apps need more than about 15K RAM

If you go below this, the system will become unstable. You will need to furiously litter your code with

    Debug.GC(true);

just to keep it alive. You'll see memory allocations fail on the debugger output, and your program will run for a little while, but things will come to an end soon enough.

Nothing to say here other than realize that you will need to help netmf out by not getting too rammy with your impls, and that a GC(true) now and then is most efficacious.

3) An array (i.e. []) of something will cost you RAM

All the const and readonly in the world apparently will not help you; there will be a RAM-based array of references. Hmm!

4) Too many initializers makes your metadataprocessor mad

A companion to the RAM consumption problem is a build tool called the 'MetaDataProcessor'. It evidently processes the data that is meta, but it doesn't like it if you have too much. It will give you abstruse error messages (I do not have a sample at the ready just now, alas), so, like, you just cut a few. Like when you have too many notes in your music score. Well, you coded them, and it's not like you're just typing them in to keep the blood flowing to your fingers on a cold winter's eve, they're important!

I found that the count of items is important, and the type drastically even more so. Which leads to the next point...

5) NetMF is pretty good at strings

Strings in dotnet and other recent languages of the javainian persuasion are immutable, and the runtime seems to have some sense in that regard. So, if you declare a string constant, its value actually will be stored in the Flash, and not come into RAM until you fiddle with it......

So, OK, this last bit is called

What I Did To Cram 2000 Structs Of Read Only Data Into My App And Only Waste 25k Of RAM,

or alternatively

Thank Goodness My Processor Is Fast And Doesn't Have Better Things To Do.

I'll spare you the details of the 5 or so experiments I did before this final one -- you've been kind (or bored!) enough to read this far. Ultimately, I:

* flattened my jagged array into a single array

and a secondary array of offsets to the starts of sections, and then manually treated it as an array-of-arrays in code.

This helped a little with the RAM usage; a small gain, but a gain nonetheless.
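A minimal sketch of the flat-array-plus-offsets idea follows. The names, the sample data, and the extra end-marker entry in the offsets array are assumptions for illustration, not the layout used in the attached source.

```csharp
using System;

// Hypothetical illustration: one flat array plus section offsets,
// accessed as if it were an array of arrays.
public static class FlatRules
{
    // All sections laid end to end in one array.
    private static readonly string[] Items = { "a1", "a2", "a3", "b1", "c1", "c2" };

    // Offsets[i] is the index of the first item of section i.
    // The final entry marks the end of the last section.
    private static readonly int[] Offsets = { 0, 3, 4, 6 };

    public static int SectionCount
    {
        get { return Offsets.Length - 1; }
    }

    public static int SectionLength(int section)
    {
        return Offsets[section + 1] - Offsets[section];
    }

    // Equivalent of jagged[section][index].
    public static string Get(int section, int index)
    {
        if (index < 0 || index >= SectionLength(section))
            throw new ArgumentOutOfRangeException("index");
        return Items[Offsets[section] + index];
    }
}
```

This trades one array object per section for a single extra `int[]`, which is where the small saving would come from.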

* transcoded my binary fields into text fields

This was surprisingly helpful, because evidently metadataprocessor gets really miffed at initializers for binary fields, way more so than for text fields. Hmmm. I'm going to have to look at this metadataprocessor someday. Anyway, this made it possible to even compile with a reasonable-sized rule set. I find it rather puts a damper on your fun if you can't compile, but maybe that's just me....

* flattened my structs into a string

This was much more tedious, because I was effectively encoding the struct into a delimited format, and scootching the character set to avoid the field delimiter.

This was the big win by far. Ultimately it was what made it possible to include the total rule set, and still have a healthy bit of RAM left over for the app. Mind you, in a sane world the RAM cost would be zero. That's a '0' with infinitely many '0' after it. So I'm still a little disappointed (and some of my crypto code is as well, because I need tables for my S-boxes etc. and compiles fail sporadically because of it, but that's a separate rant), but this project was for fun so whatevs.
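The delimited encoding could be sketched roughly as below, assuming one string literal per rule and '|' as the field delimiter. Both are guesses for illustration, and the character-set shifting used to keep the delimiter out of the field data is not shown.

```csharp
using System;

// Hypothetical illustration: each four-field rule is stored as a single
// string literal and split back into fields only when it is needed.
public static class PackedRules
{
    // Assumed delimiter; the real data must never contain it.
    private const char FieldSeparator = '|';

    // Order of fields: left context, match, right context, phonemes.
    private static readonly string[] Packed =
    {
        "l1|m1|r1|f1",
        "l2|m2|r2|f2",
    };

    public static int Count
    {
        get { return Packed.Length; }
    }

    // Decodes one rule on the fly; the result is short-lived garbage.
    public static string[] Decode(int ruleIndex)
    {
        string[] fields = Packed[ruleIndex].Split(FieldSeparator);
        if (fields.Length != 4)
            throw new Exception("malformed rule at index " + ruleIndex);
        return fields;
    }
}
```

For example, `PackedRules.Decode(1)[3]` yields the phoneme field of the second rule. The cost is a fresh allocation on every lookup, paid for in CPU time and garbage collection rather than in permanently resident RAM.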

Then, I punished my STM32F4 by making it depersist the encoded rule on-the-fly every time it was needed. Which really is a lot. But speech is slow to emit, and so there's time abounding to translate text into speech, and so you can keep it quite fluid if you have a worker thread that emits speech sequences from a queue, separate from your text-to-speech translation process.
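A worker thread fed from a queue might be sketched as follows. The `SpeechWorker` class, its `EmitHandler` delegate, and the drain-on-stop behaviour are all hypothetical; the real code would write each allophone to the SP0256 inside the emit callback.

```csharp
using System.Collections;
using System.Threading;

// Hypothetical illustration of a queue-fed output worker.
public sealed class SpeechWorker
{
    public delegate void EmitHandler(byte[] allophones);

    private readonly Queue _pending = new Queue();
    private readonly AutoResetEvent _wake = new AutoResetEvent(false);
    private readonly EmitHandler _emit;
    private readonly Thread _thread;
    private bool _stopping;

    public SpeechWorker(EmitHandler emit)
    {
        _emit = emit;
        _thread = new Thread(Run);
        _thread.Start();
    }

    // Called by the translator: hand over a finished allophone sequence.
    public void Enqueue(byte[] allophones)
    {
        lock (_pending) { _pending.Enqueue(allophones); }
        _wake.Set();
    }

    // Lets the worker finish whatever is queued, then waits for it to exit.
    public void StopAndDrain()
    {
        lock (_pending) { _stopping = true; }
        _wake.Set();
        _thread.Join();
    }

    private void Run()
    {
        while (true)
        {
            byte[] next = null;
            bool stopping;
            lock (_pending)
            {
                if (_pending.Count > 0) next = (byte[])_pending.Dequeue();
                stopping = _stopping;
            }
            if (next != null) { _emit(next); continue; }
            if (stopping) return;
            _wake.WaitOne(); // sleep until more work or a stop request arrives
        }
    }
}
```

The slow part (emitting to the synth) happens outside the lock, so the translator is never blocked for longer than a queue operation.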

Oh! That reminds me that there is one other thing I learned in all this:

* Event Notifications Come In On a High(est!) Priority Thread

To make the demo, I added an InterruptPort to execute one of my test sequences. Click the button, and text to speech something fun, and out comes sound. EasyPeasy. But, my speech came out sllooowwww, for a couple seconds, and then picked up the normal pace. Well, the reason why is that the button event handler gets invoked on a thread executing at priority 4 (highest), and my speech rendering worker thread was running on priority 2 (normal). So, my text-to-speech was running in a higher-priority thread than my output-to-synthesiser worker thread, and so was taking time away from feeding the synth. Actually, I am impressed that the synthesizer thread was not starved outright, so kudos to the scheduler implementer, but nonetheless the text to speech needed to be at or below the priority of the output. The interrupt handler had nowhere to go but down, and anyway I felt dirty fiddling with its priority since it was not my thread, and so I made yet another queue for text to speech work.
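One way to keep the priority relationship explicit is to set it when each thread is created. The helper below is a hypothetical sketch, not code from the attached project. It relies only on the standard `Thread.Priority` property and the `ThreadPriority` enum, whose Normal and Highest members have the values 2 and 4 mentioned above.

```csharp
using System.Threading;

// Hypothetical helper: start a thread at a chosen priority.
public static class PriorityThreads
{
    public static Thread StartAt(ThreadStart body, ThreadPriority priority)
    {
        Thread thread = new Thread(body);
        thread.Priority = priority; // set before Start so it applies from the first instruction
        thread.Start();
        return thread;
    }
}
```

Under this scheme the output worker might be started at `ThreadPriority.Normal` and the translator at `ThreadPriority.BelowNormal`, with the button handler doing nothing more than placing the text on the translator's queue.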

Oh, there were two last things I learned that are worth mentioning:

* arrays cost

For fun I tried setting all my rules to null. Guess what? I still used 25k. So that RAM usage seems to be for the array of object references, and it WILL be in RAM. No ROMming for you!

* netmf uses utf8

hahaha I was worried that encoding as a string would incur a UTF-16 penalty, but it does seem that netmf uses UTF-8 for at least its const string literals. Hollow victory, though, because once it's rom-able, the space is less of an issue. Still, interesting fact to note.

Anyway, there it is. I have spent a little more time than I should have on this, and now that I have posted, perhaps I can dismantle my breadboard and move on with more useful things!....

Attached Files

NP2-SP0256-Test-20130525a.zip 805.54KB 8 downloads

#2 hanzibal

Advanced Member

Members
1287 posts

LocationSweden

Posted 26 May 2013 - 07:16 AM

Sounds like my old A500 once did when fooling around with speech from within A-Basic. We used phonetics to try and make it speak Swedish with some quite amusing results.

Very interesting, I sure didn't know that RAM/ROM thing but just took for granted those were stored on flash and not read into RAM until actually used. That knowledge would probably explain a thing or two that I've experienced when I've seen those "failed to allocate x blocks" debug outputs.

Without having looked in the zip file (I'm on a tablet that can't do unzip) nor checked the d/s of that vintaged speech synthesis chip of yours - what exactly are you using for playback? You mentioned a synth and by that, do you mean Netduino PWMs or some actual external h/w not seen in the video? Is there perhaps some kind of schematics or wiring diagram inside the zip or could you just roughly describe how you wired this thing?

#3 ziggurat29

Advanced Member

Members
244 posts

Posted 26 May 2013 - 09:01 AM

...nor checked the d/s of that vintaged speech synthesis chip of yours - what exactly are you using for playback? You mentioned a synth and by that, do you mean Netduino PWMs or some actual external h/w not seen in the video?...

Mmm! that would be useful info.

General Instrument SP0256-AL2, and an LM386. It's on the white breadboard, but I should have angled it forward to the camera. I'm not a photographer at all.

Interfacing was straightforward, I used d0-d7 as an 8-bit parallel port, with d8, d9, d10 to connect to the latch and busy lines.

The output of the chip itself is PWM, which is sent through a passive lowpass filter before hitting the amp. This is a stock datasheet circuit, so nothing really noteworthy there.

But your mention of the Netduino PWMs is interesting. If I'm not yet exorcised of this diversion, I might try my hand at implementing a software version of the chip. The chip's predecessor's datasheet seems pretty explicit about the digital filter's design, but we'll see. I'll probably have to go native to do that, though.

#4 hanzibal

Advanced Member

Members
1287 posts

LocationSweden

Posted 26 May 2013 - 09:48 PM

Yes, I believe it would be very hard to modulate that in managed code.

If I'm not mistaken, you need about 10 times the freq of the highest note. With maybe ~8 kHz for the "ssss" sounds (?), you would need ~80 kHz of PWM.

That is of course no problem in itself but you also need to be around to change the duty cycle every 125 us and even if you manage that, there would be very little time left for other processing, and I suppose there is quite a lot of that.

EDIT: Oh, and this is the song from your video. I find your robotic vocals slightly more rhythmic though ^_^

#5 ziggurat29

Advanced Member

Members
244 posts

Posted 27 May 2013 - 05:46 PM

...EDIT: Oh, and this is the song from your video. I find your robotic vocals slightly more rhythmic though ...

haha, yes I'm saving a PWM synth to provide the melody as a follow-on project.

There are more classics in the source for different test cases, but this one initially came to mind since my first test case was saying 'hello' in a loop.
