You could also do this with the faces model and the Papagayo features in xLights. Most people use those for coro-type faces or faces on P10 panels, but they work for a single channel as well. Just define a different color intensity for each phoneme. For a green LED, you could set G-255 for harder-sounding letters where you want it to shine brightly, G-128 for softer letters where it's on but not as intense, and G-0 for resting. You could add several more levels in between if you wanted; it just depends on how much detail you want for each phoneme. Since most of these sounds don't last long and the difference in brightness between levels isn't that great, I would guess fewer levels is easier and looks just as good.
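
To make the three-level idea concrete, here is a rough sketch in plain Python. It is not xLights code and does not touch any xLights API; in xLights you would enter these values in the face definition by hand. The phoneme names are the usual Papagayo set, but which ones count as "hard" or "soft" is just my guess for illustration, and the names `PHONEME_TO_GREEN`, `green_level`, and `to_rgb` are made up for this example.

```python
# Three brightness levels for a single green channel (values from the post).
BRIGHT, DIM, OFF = 255, 128, 0

# Assumed grouping of Papagayo phonemes; adjust by ear for your own song.
PHONEME_TO_GREEN = {
    "AI": BRIGHT, "E": BRIGHT, "O": BRIGHT,                    # louder, open sounds
    "U": DIM, "WQ": DIM, "L": DIM, "FV": DIM, "etc": DIM,      # softer sounds
    "MBP": OFF, "rest": OFF,                                   # closed mouth / resting
}

def green_level(phoneme):
    """Return the green intensity (0-255) for a phoneme; unknown ones stay off."""
    return PHONEME_TO_GREEN.get(phoneme, OFF)

def to_rgb(phoneme):
    """Return an (R, G, B) tuple with only the green channel lit."""
    return (0, green_level(phoneme), 0)

# Example: a short run of phonemes as they might appear on a lyric track.
sequence = ["rest", "MBP", "AI", "etc", "rest"]
levels = [green_level(p) for p in sequence]
print(levels)
```

Adding a fourth or fifth level would just mean another constant and moving a few phonemes over to it, but as noted above the extra steps are probably hard to see on sounds this short.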