UTAU Search and Rescue: Wiki Wednesday #312 - VoiceFace's KIGAI / 危害

Photosensitivity warning! KIGAI's download page has a flickering background. The contrast is low enough that it doesn't look too dangerous, but it still gave me motion sickness when it popped up unexpectedly.

KIGAI (危害, "harm" or "danger") is a dangerous, broken android made of scrap parts.

Who is KIGAI?

Art from KIGAI's official site

Here's a question that I learned the answer to!

Can Diffsinger do screamo? Apparently, yeah, it actually can.

Someone reached out with their screaming data set, and I said, "Yeah, sure, this is really interesting!" I really wanted to know the answer as soon as possible, so I had one within about 24 hours. They sent me a large data set, but I only used one file: random words sung at different pitches. I put that file in one folder and my own English data set in another, then trained them together.

I'm not a big fan of that genre, so I have no idea whether the result was what the voice provider was expecting. But the fact that it worked at all was amazing!

Now, this is just my theory. It could be debunked by labeling the entire screamo data set and training it on its own. Instead of doing that, I'm going to ask the person who gave me the data set whether they'll record my list again in their normal voice. So, unless someone else tries it, this is my current understanding.

It worked! I already said that, but it bears repeating. I'll start by saying that the screamo voice color was able to sing, and it sounded better at pitches the normal data set couldn't really hit, so this may debunk my theories.

However! I think what happened is that Diffsinger took the normal voice and just kind of... applied RVC (Retrieval-based Voice Conversion) to it, in a sense? It knows what "ba" at C4 sounds like in the normal voice color. So how does it figure out what "ba" at C4 sounds like for a data set that lacks tonality in general? Well, this vocal mode's spectrograms are borked, so make "ba" borked too by making its spectrogram look like the ones in the training data!

This was shocking to me because of how it usually works: if a vocal mode has no data at a certain pitch, it just turns into the other vocal mode. That's it. So either the screamo data set really did have data for all of those pitches, or borking the data set THAT HARD actually lets you make a screamo Diffsinger without all of the pitch data it usually needs.

This would explain why the "make an UTAU sing and import it" test went so horribly. Borked spectrograms ruined the entire vocal mode. Only... in this case, you WANT a borked vocal mode.

So yeah! If you do a good enough job making the spectrograms unreadable, you'll have a vocal mode that sounds like it has a broken spectrogram. In the case of death metal growls, that's a good thing!
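One common way to put a rough number on "lacks tonality" or a "borked" spectrogram is spectral flatness. This is the geometric mean of a frame's power spectrum divided by its arithmetic mean. Note that this is a swapped-in illustration. It is not something Diffsinger computes, and it was not part of the screamo experiment. The function names and the synthetic tone and noise frames below are hypothetical. Under those assumptions, here is a minimal pure-Python sketch (naive DFT, no windowing). A clean, sung-like tone should score near 0, and a noisy, scream-like frame should score noticeably higher.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power spectrum of one frame, bins 0..N/2.

    O(N^2), which is fine for a short illustrative frame (hypothetical helper).
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum.

    Near 0 when the energy sits in a few bins (tonal), higher when the
    energy is spread across all bins (noisy). eps avoids log(0).
    """
    power = [p + eps for p in power_spectrum(frame)]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


if __name__ == "__main__":
    n = 256
    # Stand-in for a sung frame: a pure tone landing exactly on DFT bin 8.
    tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
    # Stand-in for a screamed frame: white noise with a fixed seed.
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print(f"tone flatness:  {spectral_flatness(tone):.4f}")
    print(f"noise flatness: {spectral_flatness(noise):.4f}")
```

Real recordings would need framing and a window function, and Diffsinger itself works on mel spectrograms. Treat this only as a way to eyeball the difference between two data sets.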

The normal vocal mode seemed unaffected, so this could definitely be used to make a super cool Diffsinger model! I'm surprised I haven't seen it done before now!
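You can check before training whether a screamo set "really had data for all of the pitches." The minimal sketch below assumes that each voice color's labels list notes as space-separated names like `C4 D#4 rest`. That is roughly how the `note_seq` column of a Diffsinger-style `transcriptions.csv` looks, but verify it against your own labels. The helper names and sample sequences are hypothetical. The idea that notes with no data fall back to the other vocal mode comes from the observation above. It is not a documented guarantee.

```python
import re

# Semitone offsets of the natural notes within an octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
NOTE_PATTERN = re.compile(r"^([A-G])([#b]?)(-?\d+)$")


def note_to_midi(name):
    """Convert a note name like 'C4', 'D#4' or 'Eb3' to a MIDI number (C4 = 60)."""
    match = NOTE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a note name: {name!r}")
    letter, accidental, octave = match.groups()
    semitone = NOTE_OFFSETS[letter] + {"#": 1, "b": -1, "": 0}[accidental]
    return 12 * (int(octave) + 1) + semitone


def pitch_coverage(note_seqs):
    """Set of MIDI notes used across space-separated note sequences; 'rest' is skipped."""
    covered = set()
    for seq in note_seqs:
        for token in seq.split():
            if token != "rest":
                covered.add(note_to_midi(token))
    return covered


def uncovered(covered, low, high):
    """Sorted MIDI notes in [low, high] with no data for this voice color."""
    return sorted(set(range(note_to_midi(low), note_to_midi(high) + 1)) - covered)


if __name__ == "__main__":
    # Hypothetical label data for two voice colors (not from the real data sets).
    normal = pitch_coverage(["C4 D4 E4 rest", "F4 G4 A4"])
    screamo = pitch_coverage(["A4 B4 C5", "D5 rest E5"])
    print("normal gaps: ", uncovered(normal, "C4", "E5"))
    print("screamo gaps:", uncovered(screamo, "C4", "E5"))
```

If the screamo gaps list comes back empty over the range you care about, the "it really had every pitch" explanation is still on the table. If it has gaps and those notes still sound like screamo, that points toward the borking explanation.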

How is KIGAI's bank?

KIGAI has a one-pitch CV+VV bank. An English CV+VV bank is also included, but it was made by trying to edit the original recordings. Given my methodology, getting it working would require editing the WAV files, which is outside the scope of this article.

The voice is interesting! The files were processed to reflect the fact that KIGAI is a poorly functioning android.

Where Can I Download KIGAI?

You can find him on his official site. He is unique!

Wednesday, February 26, 2025