UTAU Search and Rescue: Resources

Monday, May 19, 2025

On Labeling Palatalized Consonants for Japanese

When it comes to vocal synths, we should strive to remember to just have fun! This article will go into detail about how to handle syllables like "hya" and "kya" in Japanese. If you are happy with how your UTAU or Diffsinger model sings in Japanese, then please ignore this article.

Trying To Make a Diffsinger G2P Phonemizer Plugin

So, hey guys! I'll just be real with you... I haven't actually gotten this to work all the way to the end. But it was really painful to try and get this all to work so... I'll share my current knowledge with you.

Diffsinger does not need a traditional phonemizer. The timing is all taken care of by the duration model. Technically with a good enough dsdict, you'll be happy with just using the default DIFFS phonemizer. Heck, you can just type phoneme hints and be happy.

But by creating a phonemizer, you are able to make it so that people can type in any word (or nonsense) and get results... Whereas using a dsdict method means that if a word isn't in the dsdict, it just doesn't know what the heck is going on.
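To make that difference concrete, here's a toy Python sketch. This is not OpenUtau code (real phonemizers for OpenUtau are written in C#), and the dictionary entries, the phoneme symbols, and the `fallback_g2p` function are all made up for illustration. A real G2P would be a trained model, not a letter-by-letter guess.

```python
# Toy illustration only: a tiny dictionary plus a fallback guesser.
# The entries and phoneme symbols below are hypothetical examples.
TOY_DICT = {
    "hello": ["hh", "ah", "l", "ow"],
    "world": ["w", "er", "l", "d"],
}


def dict_only(word):
    """Dictionary-style lookup: unknown words give nothing back."""
    return TOY_DICT.get(word.lower())


def fallback_g2p(word):
    """Stand-in for a real G2P model: one 'phoneme' per letter."""
    return [ch for ch in word.lower() if ch.isalpha()]


def phonemize(word):
    """Use the dictionary when possible, otherwise guess."""
    known = dict_only(word)
    return known if known is not None else fallback_g2p(word)
```

With the dictionary alone, a nonsense word like "blorp" returns nothing. With the fallback, it still returns something singable, even if the guess is rough.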

There are three ways to make your phonemizer usable as far as I know. The first is by handing it over to the people in charge of OpenUtau for them to put into the newest release. You really wanna make sure it's good before you do that! The second is by recompiling OpenUtau after creating the phonemizer. This would be great for testing, but very annoying to share. The final method is releasing it as a plugin. I feel a little bad I'm writing this before I actually complete the project, but I just keep running into problems that I'm not equipped to handle given I just woke up.

So, if you're interested in learning how to do this, keep reading!

As a precaution - Anaconda and Visual Studio are pretty hefty. If you have a computer with little hard drive space, you may want to sit this one out because it takes up a lot of space. That's why I've been doing this all on my laptop and not my tablet.

I am a Colab girl. I love Google Colab. However, I was unable to get the Colab that had been provided to work at all... Though shout out to the person behind it because I wouldn't have been able to figure out the correct requirements without it.

Credits:

OpenUtau (and g2p code): Stakira

Google Colab (that I used to figure out the correct requirements): LotteV

Repo to create phonemizer plugin: Tyler (spicytigermeat)

Update 2025/03/05 -

I made it work!!!

Art Tutorial - Easy Painterly Anime Skin

I got into art again.

Something that really annoyed me was that I just couldn't figure out how artists got the skin to look the way that they do.

I don't have all of the answers! But I do have a workflow that can get you some pretty nice results!

Credit to Fungzau on YouTube for making the videos that taught me this.

Using English C+V As a Voice Copier

I never thought I'd be C+V anything!

Something I wish I had learned a long, long time ago was to just let people have fun. How people use UTAU doesn't affect me. If the done thing is suddenly to put the green line after the red line, all that means is that I'll figure out it's a thing everyone does for some reason and just silently fix it for my renders.

As intended, English C+V banks are unusable for this blog. I can't run OpenUtau nearly as well as OG UTAU. I'm deeply uncomfortable with the results of non-AI phonemizers. The only intended way to use these banks is with a specific OpenUtau phonemizer.

But these banks are getting popular. I need to know how to use them!

[Art Tutorial] Heads Up! The Biggest Part of Drawing Bodies

This resource doesn't belong on this blog, but why not put it here?!

What are heads?

This sounds like a very silly question. Heads are heads! However, in art anatomy, heads are one of the most useful yardsticks.

The head can be used to divide a body up into parts. If you focus most on realism, then you should expect a body to be roughly seven and a half heads tall. However, there are many styles out there!

Some characters are much taller than eight heads and some are smaller than four!

Let's look at three different characters and divide them up!

Realism

While this is a stock model and not a character, it's a good idea to start here. For me, real people are the easiest to learn to draw. There's people everywhere! And even if there weren't, I can quickly guesstimate proportions seeing as my hand is almost vaguely the size of my head.

For this example, I copied and pasted the model's head with some lines. If you are drawing traditionally, you can use a ruler!

Image from Pexels

As you can see, the body actually divides quite nicely! While the model has her hands in her pocket and is wearing high heels that make her feet look to be one head length as opposed to two, this image demonstrates the proportions quite well.

These proportions are not the same proportions as in cartoons and anime. If you try to draw a cartoon character with these proportions, it may feel strange!

At the first head line, we have the bust line. This is where the breasts go. The second head line is roughly where the hips are, though some sources say that the second line is where the bellybutton goes. In general, this is where the elbows hit. Both parts of your arm tend to be roughly the same length.

From there down, using heads to mark things off gets a little wonky. However, you can figure it out with the baseline you've already set!

When learning art, it's very helpful to simplify things.

Here we can see that the crotch is roughly the halfway point of the entire body. This means that when drawing a person, the legs make up half of the body!

You can test this next bit yourself by pulling your knees up to your chest. Your shin and your thigh are roughly the same length!

There is much more detail to go into, but let's try drawing a character with just the knowledge we went over here.

I added a line for the shoulders, but tried to keep the information I used limited to what I had already shown! I added little balls for feet and the knees. Let's add shapes to connect each line! For the arms, let's make it simple. Draw the circle for the shoulder, then draw a little circle for the elbow at the halfway point of the bust and the hips. Make a line between the shoulder and elbow, then make a line the same length downwards towards where the hand will go. The hands can just be squares for now!

Now, we can fix the feet a little and connect everything with tubes.

The proportions look a little awkward and blocky! This is okay - people are awkward and blocky. Let me fix it up a bit and give her clothes!

Now, for an anime fan, this feels a bit wrong! But when comparing to the original image, you'll see how close the two bodies are. Humans are just not the same shape as cartoons!

Anime Style

There's not one anime style, of course. But let's try with Miku!

I think this is the official Miku box art pose

As you can see, Miku is only six heads tall, not counting the tips of her toes! Her proportions are much simpler. Because she's six heads tall, her halfway point is a whole number of head lengths. Her legs take up three heads and the knees are at the halfway point of those heads. So her calves and thighs are both roughly 1.5 heads tall.

As someone who learned traditional proportions, the bust line was what kept me away from making good looking anime art! Instead of the bust being at the first head length, the bust is instead at the center of that head length. The second head length hits her waist, which is roughly where her elbows go. It's a bit difficult to figure out if her elbows perfectly hit her waist because of her weird angle. However, you can assume that the two parts of her arm are roughly the same length!

Princess Peach

I knew I wanted to include a Mario Princess here, but I was worried! The skirts obscure most of what is happening. However, Mario Odyssey includes a swim suit outfit for Peach!

With the exception of her feet, Princess Peach is only four heads tall! The rules for arms and legs are the same as before. The pose makes it hard to see her elbows, but they are hitting her waist just like with the real person and with Miku.

Any Style You Want!

Using art software or a ruler, you can find the head lengths and where those heads hit for almost any art! There's rules that seem to always stay the same, but there's also stuff that always seems to change.
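If you like numbers more than rulers, the two rules that stayed the same in every example (the crotch is at the halfway point, and the knee is halfway down the leg) can be sketched in a few lines of Python. The function name and the pixel units are just my choices for this sketch, and it ignores the feet and tiptoes that made the real examples a little messy.

```python
def landmarks(head_px, heads_tall):
    """Return rough landmark positions in pixels from the top of the head.

    Uses only the rules from this article: the crotch sits at the halfway
    point of the body, and the knee sits halfway down the leg.
    """
    total = head_px * heads_tall
    crotch = total / 2
    knee = crotch + (total - crotch) / 2
    return {"total": total, "crotch": crotch, "knee": knee}
```

For a six-heads-tall character with a 100 pixel head, this puts the crotch at 300 pixels and the knee at 450 pixels, which is the same 1.5 heads for the thigh and 1.5 heads for the calf that we measured on Miku.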

Once you start to learn the rules, it gets easier and easier to follow them.

I hope this was helpful :) !

Saturday, March 16, 2024

The Easiest Way to Make Realistic UST Files Using Praat!

I should have explained this sooner!

I am incapable of understanding pitch. I can hear it and replicate it with my voice, but I can't understand it. This may sound like I'm just not trying hard enough until I explain that it's part of the same learning disorder that leaves me unable to mentally do basic multiplication or division. I'm like a computer with no math library!

So... How do I make usts? I started by using the Melodia plugin for Sonic Visualiser and REAPER. It was wonderful to be able to do that, but it was painfully slow and finicky.

Eventually, I stumbled upon Hataori-P's work with Praat. And from then on, I could not be stopped! Here's an example!

Before the read more break, I will explain that this requires source audio. I've been creating more and more usts for this blog using my voice, but the vast majority of what I do involves vocals isolated from songs using AI. The script I use is specifically for realism and uses CVVC style aliases. The original script was for CV and VCV.

This method requires a CVVC style bank with diphonic aliases. This will not work with systems like VCCV as this script can only create combinations of two different phonemes, not anything like CCV.
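Here's a rough Python sketch of what "combinations of two different phonemes" means. This is not the Praat script itself. The alias spelling (two phonemes with a space between them, and "-" standing in for silence) is an assumption on my part, so check it against your own bank's aliases.

```python
def diphone_aliases(phonemes):
    """Pair each phoneme with its neighbour, using '-' for silence."""
    padded = ["-"] + list(phonemes) + ["-"]
    return [f"{a} {b}" for a, b in zip(padded, padded[1:])]
```

Feeding it `["k", "a", "t"]` gives `["- k", "k a", "a t", "t -"]`. Every alias is a pair, which is why a three-phoneme unit like CCV can never come out of this approach.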

Mandarin Chinese Phonemes for English Speakers

Here is the pinyin list! The files must be saved with the pinyin names to work with the base oto. Some of the phonemes might actually break filenames, but if you figure out a way to encode the filenames in phonemes and fix the base oto to reflect that, I will happily link to that as an option!

Here is the phoneme list. You might be able to set up OREMO to have it save the names from the pinyin list with the phoneme list as comments, but I was never able to get that to work with my English lists before I found the magic of using words. (I still couldn't get the comments to work, they just weren't needed anymore!)

(Note: I have no idea if I got the syllables like bo, po, mo, fo correct. They're in their own section under 'o', but have the same phonemes as those under -uo.)

I need to make a few edits to the base OTO. Not many, just a few. But I'm waiting until some more people record this list! Why? Because I'm doing this on my phone, but I need to use my computer to access the base oto :P

Use My Voice to Make Your UTAU Sing Cantarella Realistically!

Can you use Praat instead of VocalShifter?! How can you make a CV UTAU voicebank realistic? I'm actually using SEO tricks I learned at LearnMMD!

Hey there! Before we get started, two shout-outs! First, to Hataori-P. He was able to make the scripts that make this all possible! Secondly, thank you to Kwirk Music for actually suggesting a blog post tutorial instead of a video! (thank you so much, videos are much trickier to do than blog posts!)

So, what does this tutorial aim to do? This:

I was a big dummy - Multi-pitch CVVC banks are broken by the automatic button.

Hi there! I'm on mobile at the moment, but I felt it was important to make this announcement.

Almost every single time I complained about a Multi-pitch CVVC bank being broken, it wasn't the bank's fault. I had left the automatic button in shareware UTAU on at all times, as that button automatically converts CV to VCV and shows exactly which sample is playing. Really useful!

However, it breaks banks that include VC samples. In a single-pitch bank, you don't notice, because it just gives up and refuses to apply a suffix. With Multi-pitch, especially when every sample has a suffix, that means stuff just doesn't play.

With the number of articles on the blog, I have no idea how many places I've made this mistake. If I find an article where I messed up, I will edit it to point out the mistake. However, like I said, I have no idea what articles have the issue. Articles posted after this may have the issue, as I have articles written and scheduled until 2022.

I will try to do better in the future!

Wednesday, December 2, 2020

Hopefully this isn't Goodbye, Clyp.

Clyp isn't shutting down as far as I know, but there is writing on the wall.

Recently, Clyp announced that they were moving to a fully premium model. SoundCloud and Photobucket have already pulled that. SoundCloud backed down, but Photobucket just seemingly stopped existing aside from emails telling me my bucket was full.

There's virtually no reason to choose Clyp over SoundCloud, so I don't see how Clyp will make enough money to keep the lights on indefinitely. (Especially seeing as people mention using Discord to share all files under eight megabytes.)

I downloaded everything from my middle school Photobucket and deleted enough to get them to stop yelling at me for having too much stuff, so my first instinct was to take all of my files off of Clyp and move them to Google Drive.

Well, there's no option to download everything at once on Clyp. Even if there was, I would need to go through every single one of my 145 Clyps and enable downloads for each one.

Even if I downloaded everything, there's another hitch. I just rendered off WAV files and dropped them onto Clyp. So, on top of downloading all 145 Clyps, I would need to convert 145 Clyps to MP3 files. And then, on top of all that, I would probably need to clip the sixteen second samples down to my normal six seconds for storage reasons.
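The clipping part, at least, doesn't have to be done by hand. Below is a minimal Python sketch using only the standard library's `wave` module. It assumes the downloads are plain PCM WAV files, and the function name and the six second default are just my choices here. It does not do the MP3 conversion, which needs a separate encoder.

```python
import wave


def trim_wav(src_path, dst_path, seconds=6.0):
    """Copy the first `seconds` of a WAV file to a new WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_wanted = int(src.getframerate() * seconds)
        # Never ask for more frames than the file actually has.
        frames = src.readframes(min(frames_wanted, src.getnframes()))
    with wave.open(dst_path, "wb") as dst:
        # Channels, sample width and rate are copied from the source;
        # the frame count in the header is corrected on close.
        dst.setparams(params)
        dst.writeframes(frames)
```

Looping this over a folder of 145 files is still a chore, but it's one short script instead of 145 trips through an audio editor.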

If enough people buy Clyp premium, then Clyp will keep their lights on and I won't have to do anything. I do hope that enough people buy it - not just for my laziness, but also because I don't want anyone to lose their job because their company folded.

I will slowly download all of my content from Clyp so that if their site goes down suddenly, I will have all of the audio files. But, unless that happens or Clyp disables embeds, all of the articles with audio from Clyp will continue to have audio from Clyp.

Here's to hoping that Clyp makes enough money to keep the lights on!

Monday, September 21, 2020

Downloading from Axfc In The Year of Our Lord 2020

Well, I'm glad I can make this article, but I'm sad that I have to.

This advice may stop working tomorrow, and it might stop working next year. It may hold well for a few years.

Of all the websites that we in America say things like "the company is killing it" or "they hate their users" about... None of them are as infuriating as Axfc.

I don't know if it's a Japanese thing, or if even Japanese people can't stand this. But, with the help of nmasao and a few blog posts, I figured out how to make it work right at this minute.

All the steps and so forth are under the "read more".

Do I need an expensive microphone?

Let me get it out of the way - I will always suggest the AT2020 USB. It is more expensive than a comparable microphone from Blue, but I never really liked how the Blue Yeti in particular sounded. Both microphones seem really expensive at over $100 (Yeti seems to have dropped to sub-100 used), but that's because they include an audio interface inside of the body of the microphone.

I didn't know what audio interfaces were, at all, for probably my entire career making an UTAU. An audio interface will allow you to get a higher quality microphone for less money (AT2020 USB goes for $149, but XLR goes for $99 according to Amazon), but the audio interface (especially with phantom power) will likely cost more than just getting a really nice USB microphone.

If you really, really want to spend money on a microphone, ask yourself what you will use it for. If your only answer is UTAU, I would cap spending at under $200 USD and go with an AT2020 USB. People do buy thousand dollar microphones for UTAU. (Of course, used and from eBay so it's closer to $600.) If you have the money, and you can see yourself using it for singing, streaming, or podcasting, then go for it! But please never feel like you need to spend anything for this hobby.

inb4 mae wants to drag us all down to her level - remember that there are different economic levels. You may be able to afford a lot of audio equipment, but not everyone can. This article is to help people who can't afford that audio equipment be happy with what they can get.

When people in the fandom want to insult you, one of the first jabs is microphone quality. It doesn't even have to be true - they just go there. That can make you feel like you will never get anywhere in the fandom without paying big money. That isn't true!

Below, I'm plopping down a comparison between three banks recorded with three different microphones. I made choices I'm not exactly proud of that keep the comparisons from being one to one exactly, but they're close enough.

Which sample do you like best? Which feels the most clear?

To me, it's the second sample. The first sample is a little tinny and in some places abrasive to the ears. The third is muffled (and is recorded at C#3, but that's irrelevant here).

The second sample is clear. It has some issues with sibilants, but that can be edited out using equalizers.

Knowing that, how much do you think each microphone cost?

For the first two, it's a big, whopping zero (additional) cents. The first microphone was my laptop's embedded microphone - the same one I use for all of my "let's play" videos on YouTube. The second microphone is my phone's microphone. The third... I tried out someone's setup and I have no idea how much it cost. But, I believe their audio interface alone might have cost more than my phone, which also plays Stardew Valley and calls my mom.

If you look at this like an audio engineer, the stupidly expensive microphone is better and sounds the closest to real life, of course. But, there's two important notes here. The first is that the fandom tends to celebrate bright vocals compared to darker vocals. The phone microphone UTAU would likely be more popular than the expensive microphone UTAU due to that. The next point is just that the difference in quality is not worth the money to me. It's not night and day - it's just slightly better. This isn't even comparing it to the AT2020 USB, which likely costs only as much as the audio interface itself. (My stand broke, so the best I can do is give you a sample from my UTAU's last bank. The AT2020 bank definitely would be more popular than the expensive microphone's bank due to tone.)

How can I make my current microphone better?

There are two main things that will affect your recording quality: frequency response and background noise. Other than using equalization after the fact, the only thing that will change frequency response is changing the microphone itself. Background noise is a lot easier to deal with. Here's a quick list of ways to help with it for free:

Turn the volume/sensitivity of your microphone down in volume settings if possible. You'll need to be a bit louder, but it will stop it from picking up a lot of background noise.
Turn off all fans and heaters.
Don't put your microphone on the same surface as your computer without some kind of buffer, like a pillow. (Your mileage may vary - people told me this, but all I really noticed was my microphone falling on my face.)
Try using a blanket as a makeshift sound booth. (Never worked for me, but I only tried using OREMO, meaning my computer was under the blanket with me... Getting hotter and making the fan run harder... I was a dumb kid.)
If possible, hide in a closet. If there aren't enough clothes to buffer sound, put up quilts or blankets.
Record as far away from your computer as possible.

I did all that and it's still noisy!

If you tried to get rid of noise as best you could, but your samples are still noisy, try making a whisper bank or a soft bank. That layer of noise actually improves a whisper bank by allowing it to stay nice and whispery as opposed to getting normalized to being quite loud by the resampler / wavtool.

However, don't feel like you have to do that. You can make a strong kire bank with your muffled samples. You can experiment with noise removal (ReaFIR is bae).

UTAU isn't a contest. You can make amazing things without paying a single cent (assuming you have a phone or an embedded microphone on your computer). Don't let people bring you down just because of microphone quality!

Monday, December 2, 2019

RSL English Recording List Review

Want your UTAU recording list reviewed? Leave a comment or send me a message on dA or Twitter! (annamaeblythe)

Here's a fun new thing I might do monthly if enough people write in with their lists! Recording list reviews. Except for the fact that I will be using UTAUs created by the people who made the recording list to demonstrate the recording lists, these articles will be completely divorced from the UTAUs themselves. No character images or bios, just technical details.

What is the RSL English Recording List?

So, I get manic and download things. That's just part of being me. I have no idea where this bank came from, but I vaguely remember seeing the name on an old Mediafire page.

There is absolutely no information that I can find on this UTAU or on this list. I can read the name of the person who made it in the Readme, but I have no idea who they are because I can't find information. Should I make this article? Probably not without permission... But I don't think I'll find the person to get permission, so this is what we got. I believe the UTAU's name is Lia Skye, but as I said, I don't have any proof for anything at all.

This list hasn't been used much. I believe that there is a sentiment that if you aren't following CZ or Hua, you're wrong. I stan my list hard, and it is tempting to just say "use my list plz". However, doing that makes me as bad as the people who talk down to people for trying their own path. I'd be talking down to myself with that kind of attitude!

Voicebank structure

When I see a single pitch separated into several folders, I hiss. CZ's method involves several folders, so I can't fault anyone for following her lead. That being said, this bank is divided into C, CV_E, CVC, and V.

An important note is that the easiest way to describe this UTAU is 単独音, or "solo sound". Kidding, it's CV. But tandokuon describes it better, because there is both CV and VC within CVC. Each recording is a single, unstringed syllable.

C is a bit strange. The vowels are presented in CV format, which means that you can only realistically get "- C" from them, and not "C -". This is made redundant by the CVC folder. Is this because of the "separate sections, separate folders" philosophy? Probably, but I never fully understood the appeal. It allows you to be in separate mindsets for "- C" and "CV", but you are able to replicate that in SetParam as long as you don't edit the OTO in UTAU. Once the base is done, I see no real use for this kind of structure. (I blame this on CZ, not on the person who made this list.)

CV_E is for consonants that cannot realistically have a "V C -" sample. This is the first thing that can pose an issue for someone obsessed with realism and clarity like myself. But I calmed my behind down the moment I realized that I'm totally fine with how Japanese CV banks work in UTAU. This section is lacking "V C" that a stringed list would give you. However, overlap will blend the preceding vowel and the consonant really well.

CVC is, as the name would suggest, CVC samples (or CV and VC samples, depending on if you go by the recording name or the alias name.) There's a rather esoteric discussion to be had about the merits of stringed and unstringed, but the important thing is that it looks like, at first glance, all required samples exist.

V is simply standalone recordings of each vowel. This is really useful and a nice addition to have.

What is missing?

I don't believe in triphones (or quadphones) in English banks. Why? Because I'm a control freak who wants to make sure that I have control of every single phoneme. So, my criteria for a full bank is simple:

Does the bank have "- V", "V", "V -"? (RSL does.)
Does the bank have "CV", "VC"? (RSL does.)
Does the bank have "- C" and "C -"? (RSL does.)
Does the bank have all common consonant clusters? (Sadly, not here.)

Having C - and - C means that I am able to smush together standalone consonants to make Lia sing something like "stoves".
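If you want to run that checklist on your own bank, here's a small Python sketch. The alias spellings (phonemes separated by a space, "-" for silence) follow my own habits rather than anything RSL defines, the function name is made up, and it skips the consonant cluster question entirely because clusters depend on the language and accent.

```python
def missing_aliases(aliases, vowels, consonants):
    """List the diphone-style aliases a bank lacks, per the checklist above."""
    have = set(aliases)
    wanted = []
    for v in vowels:
        # "- V", "V", "V -"
        wanted += [f"- {v}", v, f"{v} -"]
        for c in consonants:
            # "CV" and "VC"
            wanted += [f"{c} {v}", f"{v} {c}"]
    for c in consonants:
        # "- C" and "C -"
        wanted += [f"- {c}", f"{c} -"]
    return [a for a in wanted if a not in have]
```

An empty list back means the bank passes the first three questions. Anything in the list is a sample you'd have to fake by smushing other samples together.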

audio

How is the phonetic system?

Let me start by saying that this isn't recorded with an American accent, so anyone who speaks with an American accent will be confused. I can't find any documentation with what is supposed to sound like what, so I'm just guessing. However, I know "er" universally means "@r as in bird". This bank has it sounding closer to "V".

This isn't a fault of the list itself, and anyone who records it can record it in whatever accent they choose to. However, if you notice that it sounds off, it isn't the list's fault. It's my fault for not understanding British English.

The system is intuitive enough that I don't think I need to remake the OTO with my phonetic system. I'll be annoyed that the symbol for an aspirated "t" is used for "T" and that an umlaut vowel is used for "dZ", but I expected to be way more in the weeds with this than I am. "N" is missing from CVC, and I believe "ir" is pulling double time for "I@" and "e@".

vowel list

How is using the bank?

First, let me complain about a dumb nit-pick. I really dislike when there aren't spaces between the phonemes. For example, I want to input "l e", but to get the bank to recognize it, I have to write "le". That's the silliest little nitpick, and I could fix it in the OTO in no time.
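For the curious, the fix could look something like this Python sketch. It assumes the usual oto.ini line layout (file name, an equals sign, then the alias followed by comma-separated timing values), and the consonant list is something you would have to supply yourself. It only handles consonant-first aliases like "le", so VC aliases would need their own rule.

```python
def space_alias(alias, consonants):
    """Insert a space after a leading consonant, e.g. 'le' -> 'l e'."""
    if " " in alias:
        return alias  # already spaced, leave it alone
    # Try longer consonant symbols first so 'sh' wins over 's'.
    for c in sorted(consonants, key=len, reverse=True):
        if alias.startswith(c) and len(alias) > len(c):
            return f"{c} {alias[len(c):]}"
    return alias


def respace_oto_line(line, consonants):
    """Rewrite the alias field of one oto.ini line, leaving the rest alone."""
    head, sep, rest = line.partition("=")
    if not sep:
        return line
    alias, comma, params = rest.partition(",")
    return f"{head}={space_alias(alias, consonants)}{comma}{params}"
```

Only the alias changes. The file name and the timing values pass through untouched, so the samples themselves never need to be renamed.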

The current OTOs are set so that all VC are actually "V C -". Depending on the exact usage, this may need to be fixed to add "V C". (Note, V C - includes the consonant inside of the pink with silence after it in white, whereas with V C, the consonant (or the space before it with plosives) itself is stretched out in the white.)

An issue with diphthongs is that you need to make sure that CV does not contain any of the second element, and I needed to fix that a bit. That's why my list lacked them. But, for some accents they're a requirement.

I feel like one way to improve this bank may be to have separate OTO entries for diphthong pairings that will be used in conjunction with VC and those that will be by themselves. See? I don't have to worry about this with my list!

But, it's actually fun to use a bank that uses this list. There are fewer worries about having VC transitions for everything, and I do like puzzles.

Who is this for?

... I don't know. CV English sounds pretty tempting. It's an interesting puzzle, but you'll get more bang for less work with a list like mine.

I don't feel comfortable distributing someone else's list without permission, and I don't think I'll ever get permission from someone who seemingly doesn't exist. That means that for now, no one else can really try this list out. But, if I could point you to a link, I'd say that it's a fun experiment to see how CV English fits into the larger picture of UTAU.

Monday, June 17, 2019

Japanese to English Equivalency Chart

Which Hiragana make which sound? How can I make my UTAU English without recording an English bank? How can I make Defoko sing English?

This is an appendix to my Japanese CV English in UTAU videos in which I show exactly what you use to get the sound you want.

Note: Shortly after writing this, I realized that it was impossible for me to create the video without any monetary input from my viewers. If you want the video series I was intending to make happen, please donate to my Patreon.
