Vocaloid 5 is the latest version of Yamaha’s famous singing synthesizer. I reviewed the previous iteration of the instrument two years ago (check out our Vocaloid 4 review), and a lot has changed in the meantime.

To sum it up in a few words, Vocaloid became more user-friendly while managing to give its singing robots a touch of soul.

Soulful Robots

Vocaloid is associated mainly with Hatsune Miku, who has now been out for eleven years and is something of a mainstream cultural icon who even advertises pizzas. That said, Yamaha seems to be deliberately trying to get away from cute, robotic and character-driven vocals and appeal more to mainstream producers and media composers. In a way, they’re almost going back to the pre-Miku roots. Two of the first Vocaloid voices, Lola and Leon, were described as soul vocalists. Whether it’ll work out that way long-term will be interesting to see, but in the meantime, we get some compelling new goodies.

So, why do I say that the Vocaloid singing robots now seem to have soul implants? It’s because a lot of work has gone into more accurately emulating the things human singers do when they sing expressively, mainly focusing on two points. One is vibrato, which is now less mechanically even and more convenient to control. The other is the improved envelope feature, which can add various ornamentations and other typical human gestures to the starts and ends of notes, such as the pitch dipping downwards at the end of the last note of a phrase. Likewise, the dynamics controls sound much more convincing now, blending between quieter and louder voice data instead of trying to emulate the dynamics based on recordings which were all made at the same level.

One funny result of the new features is that Vocaloid is certainly not stuck at sounding too perfect and robotic. Thanks to the Singer Skill parameter, it can now easily sound sloppy, like a mediocre singer who’s having trouble finding the pitch of notes. In other words, it is possible to sound bad not only in a robotic way but also in a very human way. Vocaloid 5 can undoubtedly sound overly dramatic, too, though an accurate emulation of Ceephus and Reesie is thankfully still out of reach. The one thing that can quickly get over the top is the breaths. It’s almost too easy to make them unrealistically loud, and because the volume can only be set to integers between 0 and 10, in-between values like 0.7 or 1.2, which would sometimes be nice, are impossible.

Friendly Robots

Other than the changes in the way expression works which were explained above, the underlying synthesis engine does not seem to have evolved significantly. However, the user interface has been rebuilt from scratch. The presets are now sorted into two different types: styles and phrases.

Styles cover many types of lead vocals and backing vocals, as well as more exotic things such as chopped vocals and robot voices. A style is a collection of voice settings, from singing skill to FX configuration. Vocaloid 5 includes a full set of eleven standard vocal-mixing effects but allows for them all to be turned off during export with a single checkbox. That’s a neat little feature which makes it very convenient to use the built-in effects as a placeholder while working on the vocal, before exporting the dry vocals for final mixing in the DAW.

Phrases are a lot more interesting than the name might imply. Sure, many virtual singers come with phrases, which are inflexible audio files of, well, vocal phrases. A phrase in Vocaloid 5 is not an audio recording, but rather a short sequence of notes, phonetics, and all the expression settings within each note. There are both phrases with lyrics and wordless vocals. This means that the key, lyrics, and every detail of the expression can be changed. Unlike phrases in the form of an audio recording, they are flexible. I don’t see myself using them much, because I almost always produce songs written by singers, but they might be something that’s new in the world of music as a source of material for jump-starting ideas.

FX settings in Vocaloid 5.

As another user-friendliness improvement, Vocaloid can now also be used as a VST plugin, although it works more like one DAW ReWired inside another than a typical plugin. It’s not possible to play a part live using keys and the mod wheel to record dynamics changes, for example. Still, this is a big step forward in terms of convenience, and something that makes it much easier to make changes to an instrumental based on what’s happening with the vocal, and vice versa. I should also mention that there’s a lovely series of small tutorial videos explaining specific features and tasks on Yamaha’s website.

So what do all these workflow improvements add up to? Well, back when I reviewed Vocaloid 4, I made a short video where I made Cyber Diva sing a few bars of “Art Is Calling For Me.” It took me almost six minutes to complete the task. I have now done the same thing with Amy, and even while adjusting a couple of additional things which did not exist in Vocaloid 4, I needed one minute less. That’s not a giant difference in a single take, but one that you will notice with continuous use. Sure, Vocaloid 5 is still not as user-friendly as most “standard” synthesizers. However, as I’ve said just about every time I’ve reviewed anything vocal-related, the MIDI protocol wasn’t designed around emulating the way the human voice works, so nothing involving vocals is going to be as efficient workflow-wise as, say, a subtractive synthesizer.

Some features have disappeared since the previous version of the software. One is the backward compatibility for Vocaloid 2 era voices – Vocaloid 5 loads V3, V4, and V5 voices. The other major feature which has disappeared is the cross-synthesis (XSY) parameter, which was used to create a hybrid of two different voices. It was a workaround dynamics control, mainly intended for use with voices with multiple banks. With the dynamics control now apparently using voice data recorded at different dynamics, XSY was no longer needed. However, Vocaloid fans used it to create new voice colors by cross-synthesizing completely different voices, so they’ve complained about it being gone, as well as about the inability to use V2 voices. Longtime Vocaloid fans will miss both features, but to anyone who’s not interested in the characters associated with the voices, they’re no big deal.

Loud Robots

Vocaloid 5 Standard comes with four default voices. Amy and Chris are the English voices, while Kaori and Ken are the Japanese vocals. Amy and Chris are described as soul vocalists, and they both sound quite warm, with their higher dynamics being quite loud, like a good R&B singer should be. Amy even has a bit of a Southern accent, pronouncing the word “I” almost like “ah,” but not quite. The Japanese voices are also warm, but a little more subdued and middle-of-the-road. Ken can give off an indie rock vibe. The voices take up a more significant amount of disk space than the Vocaloid 4 voices did, perhaps because more dynamics and pitches were recorded. There’s a corresponding increase in realism, too. The English voices are much larger than the Japanese ones, but that’s not because they’re better. The reality is that the English phonetics require a lot more data.

Does bundling four vocals with the editor make sense for users who only intend to make music in one language? I mean, I speak no Japanese at all, so would I have any use for Kaori or Ken? Sure. Not for singing lead vocals with lyrics, but for things such as “aah” or “mmm” backing vocals, having four singers with four distinct timbres can come in very handy and allows four-part arrangements.

Vocaloid 5 Premium adds four more voices – new versions of the Cyber Diva and Cyber Songman which I reviewed alongside version 4, and updates of Japanese voices VY1 and VY2, which were the first two Vocaloids created by Yamaha. As of this writing, there’s one other Vocaloid 5 voice available – the female Japanese vocal Haruno Sora, who has “cool” and “natural” voicebanks available.

As Good As The Humans?

So, how does the new version of Vocaloid stack up against its ultimate competition, human singers? Well, if you need someone to sing a song, human vocalists are going to be faster and require less work, plus they’re still going to sound more natural and expressive. Sure, Vocaloid can sound quite soulful now, but getting there is just a lot more work than naturally adding the soulful expressions on the fly. However, short bits of backing vocals (especially if they need to sound like several different people), wordless vocals, chopped vocals and lead vocals for short commercial jingles are increasingly tasks that the robots can handle.

I used a synthesized voice in a song I produced last year. I thought a violin part would be better as a vocal, so I tried that with a synth, and it worked. Bringing in the lead singer from another city just to record that as a backing vocal part would have been more trouble than it was worth. Also, having a voice with a different timbre made the sound more varied and easier to mix. I’ve also used vocal synths as a source of abstract, glitchy noises, which could have been made by recording and processing a human voice, but it would have taken longer to plug in a microphone and set levels than it did to synthesize.

Summary

The robots aren’t quite ready to replace us all yet, but they’re able to replace the singers among us in an increasing number of contexts. Vocaloid 5 leads the pack in the vocal synthesis market, bringing virtual vocalists that sound more human than ever before. The Standard edition costs 25,000 Japanese yen (approximately $225 USD) plus tax, and the Premium edition with four additional voices is 40,000 Japanese yen.

More info: Vocaloid 5

Vocaloid 5 Review

90%
Brilliant

Vocaloid 5 leads the pack in the vocal synthesis market, introducing us to virtual vocalists that sound more human than ever before.

  • Features
    9
  • Workflow
    9
  • Performance
    10
  • Design
    9
  • Sound
    9
  • Pricing
    8

About Author

This article was written by two or more BPB staff members.

5 Comments

  1. Thanks for the review. The video was the first time I heard a Vocaloid voice without backing and not smothered in effects.

  2. I write 4-part and 5-part a cappella arrangements for my mixed voice choir. Many of the singers are not music readers and would appreciate “teacher tracks” so that they can play them at home and learn their particular part. Do you think this software would be suitable for creating these tracks?
