It's best to think of how much I have left to do in terms of months!
Wiki Wednesday #309 - VoiceFace's Fuyuki / フユキ
Fuyuki is a horror-themed UTAU. She is based on the shattered dreams of children. Some of her artwork may be triggering to those who have an aversion to blood.
Who is Fuyuki?
Art from site
Okay but why do people have one multilingual Diffsinger model with several vocal modes instead of several different models?!
With storage being so cheap and with me having experience with many UTAUs that were over 400 megabytes, storage space alone doesn't seem like the correct answer here. I have seen people figure out how to get the size down to smaller than a normal VCV bank. Even if all of the models were that small, people would probably still only end up with one model each.
I knew Diffsinger was going to be my special interest, so I got a gaming PC. It takes about twelve hours to train an acoustic and variance model using the Colab locally. Which, seriously, I beg you. If you don't know Python, be a silly billy like me and just install Docker. You'll lose out on the latest features, but you won't have to use any brain power! Besides, if it's good enough for everyone who can't train locally, why wouldn't it be good enough for me?
For me, it's not a big deal to train a model at all. I'm not the average Diffsinger user.
I had trained several banks using Google's GPU and it was pain. Legitimate pain! You can only train for about an hour before you get kicked off. Once you're kicked off, you need to wait a day. Then sometimes, even without using anything, the GPUs can be strained and you can't connect at all at certain times! If you just want to train one model, this is annoying but okay. You can just run the Colab for an hour every day for two weeks and get your result for free.
An alternative to make it go faster is to pay Google to give you priority. The price I've heard quoted for one model is between fifty and a hundred bucks. Ouch! Imagine if you had accidentally messed up the file structure and made a big mess after spending all that money! (Thankfully, I was already on my gaming PC when I made that mistake. No money lost, only a few hours lost. Not weeks!)
When doing cross language synthesis, it would be a bit silly to make a model for every separate language. You would be training the same data set of yours with different data sets made by different people. Smashing them all together makes more sense!
For vocal modes, the differences between training separately and training together are more or less subtle. Not worth a month of your time or hundreds of dollars to preserve if you're at the mercy of Google's GPUs!
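To put rough numbers on that, here's a quick back-of-the-envelope sketch in Python. It assumes a model needs about twelve hours of training (that's my local number, so Google's GPUs may be faster or slower), that the free tier gives you about one hour a day, and that paying costs somewhere between fifty and a hundred bucks per model. The function names are just made up for this example.

```python
import math

# Assumption: about twelve hours per model, taken from local training times.
TRAIN_HOURS_PER_MODEL = 12


def free_colab_days(train_hours, hours_per_day=1.0):
    """Days needed if you only get `hours_per_day` of free GPU time each day."""
    return math.ceil(train_hours / hours_per_day)


def paid_cost_range(n_models, low=50, high=100):
    """Rough (low, high) dollar range, using the per-model price I've heard quoted."""
    return (n_models * low, n_models * high)


# Compare one combined model against three separate ones (say, one per vocal mode).
for n_models in (1, 3):
    days = free_colab_days(TRAIN_HOURS_PER_MODEL * n_models)
    low, high = paid_cost_range(n_models)
    print(f"{n_models} model(s): about {days} days free, or ${low}-${high} paid")
```

With those guesses, one model lands at around two weeks, and three separate models lands at over a month or a few hundred dollars.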
So, for most people, it just doesn't make sense to have more than one model. It takes a lot of resources to make even a single model. I'm really lucky I was able to afford my gaming PC because it allows me to work differently from everyone else. It was very much worth it for me!
How Are Fuyuki's Banks?
The people behind Fuyuki were very kind. They meant for her to have a strange and distorted voice, which they achieved using audio effects. Thankfully, they also put the original recordings up. In addition to those three-pitch VCV banks, there is also a three-pitch CV bank.
First we will look at the edited VCV. The effect is very cool! It is like Fuyuki is trapped in an old radio.
We will look at her clean VCV next. Her voice is beautiful and adorable!
Finally, we have her CV bank. Her voice is still lovely!
Where Can I Download Fuyuki?
You can find her on her official site. She is cool!