UB DSP Vowel Blender [WiN]

Daniel Holden

4 weeks ago

UB DSP Vowel Blender vocal filter plugin interface featuring interactive vowel morphing controls, formant filter spectrum display, XY vowel blending pad, and modulation shaping for creative vocal sound design.

Product: Vowel Blender
Developer: UB DSP
Version: 1.0.0
Format: VST3, AAX, CLAP
Requirements: Windows 10 or later
Source: ubdsp.com/vowel-blender

Download (7 MB)

Vowel Blender is a phonetic formant filter built around a 2D IPA vowel pad — ten vowels sourced from Peterson & Barney and Hillenbrand phonetics datasets, each running through a four-formant ZDF SVF cascade at 24 dB/oct. It sits between sound design and performance: the XY pad is playable live, automatable, and fully modulatable, while the motion recorder captures and loops gestures in tempo. Seven voice models shift the underlying formant frequency tables without touching pad position. The single differentiator is the combination of research-accurate formant maps and a gesture-loop system that no other formant filter ships together.

Key Takeaway

Vowel Blender needs a sustained source with harmonic content above 1 kHz to do its work: pads, leads, bass with grit, clavinet, processed vocals. The motion recorder and nine-source modulation engine replace the workflow of manually automating vowel position each session. Static vowel placement is available but undersells the tool; the plugin is built around moving formants, not holding them. A pure sine, a clean sub, or anything that needs its consonant structure intact bypasses the core value entirely. Engineers who work dry and rarely touch filter automation can skip it.

ZDF Cascade, Log-Interpolated Between Vowels

The filter engine runs four parallel bandpass stages per channel, each built from two cascaded ZDF State Variable Filter sections — 24 dB/oct slope per formant, Q^0.7 gain compensation keeping perceived loudness stable as resonance climbs. Formant frequencies interpolate in log space between the four nearest IPA reference points as the pad cursor moves, which keeps the morph perceptually smooth rather than arithmetically linear. The consequence: vowel transitions don’t lurch through mid-tones that sound wrong relative to any natural phoneme.
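A minimal sketch of log-space interpolation between four reference vowels, for illustration only. The bilinear weighting, the function names, and the idea of a unit cell with local coordinates (u, v) are assumptions; the plugin's actual weighting scheme is not documented here.

```python
import math

def log_blend(freqs, weights):
    """Weighted geometric mean: a weighted average taken in log-frequency."""
    total = sum(weights)
    log_f = sum(w * math.log(f) for f, w in zip(freqs, weights)) / total
    return math.exp(log_f)

def bilinear_weights(u, v):
    """Weights for the four corners of a unit cell, with u and v in [0, 1]."""
    return [(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v]

def morph_formants(corners, u, v):
    """corners: four lists of formant frequencies (Hz), one per reference vowel.

    Returns the interpolated formant list for pad position (u, v) inside
    the cell spanned by those four vowels (hypothetical layout).
    """
    w = bilinear_weights(u, v)
    count = len(corners[0])
    return [log_blend([c[i] for c in corners], w) for i in range(count)]
```

Halfway between 400 Hz and 1600 Hz, a log-space blend gives 800 Hz (the geometric mean) rather than the arithmetic 1000 Hz, which is why the morph tracks perceived pitch distance.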

Q above 1.5 sharpens formant peaks into filter whistles: usable for science-fiction processing, costly on bright transient sources where the peaks become audible artifacts. Below 0.8, the peaks broaden enough that individual formants lose definition; the sound stays warm but phonemically vague. The sweet spot for speech-convincing results sits between 0.8 and 1.3, which covers most mixing applications without crossing into obvious synthesis territory.

Drive feeds a soft pre-filter saturator that pushes harmonic content into the formant chain before the bandpasses touch it. On sources thin above 2 kHz, this is functional necessity rather than character choice: pure fundamentals without upper partials produce little vowel output from the filter. The soft clipper auto-compensates output gain, so heavy Drive doesn’t tip levels downstream — but it does commit harmonic character that can’t be undone inside the plugin.
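As a rough illustration of a soft pre-filter saturator with output compensation, the sketch below uses a tanh curve. The curve, the drive-to-gain mapping, and the normalisation are all assumptions; the plugin's actual clipper is not specified.

```python
import math

def soft_drive(x, drive):
    """tanh soft clipper; drive >= 0 raises the input gain before saturation.

    Dividing by tanh(gain) normalises the curve so a full-scale input
    (|x| = 1) still peaks at 1.0. This stands in for auto gain compensation.
    """
    gain = 1.0 + drive
    return math.tanh(gain * x) / math.tanh(gain)
```

Mid-level samples are lifted toward full scale while peaks stay put, which is where the added harmonics come from.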

Voice Models and the Formant Frequency Tables

Switching voice models changes the 10×4 frequency matrix the filter draws from (ten vowels, four formants, scaled to the anatomy of the selected model) without moving the pad cursor or altering any modulation state. Man 1–3 pull from Peterson & Barney 1952 (76 speakers across men, women, and children, recorded at Bell Labs); Woman 1–3 and Kids pull from Hillenbrand et al. 1995 (139 speakers). The frequency spread between models is wide enough to be functional: Man 2 baritone sits around 500 Hz F1 on an open AH, Kids pushes that same vowel above 800 Hz F1. On identical pad positions, the two models sound like different-sized acoustic bodies producing the same phoneme.

Stacking multiple Vowel Blender instances on duplicate tracks with different voice models — Man 1 on one, Woman 2 on another — produces formant-divergent unison. The slight inter-model variation creates a chorus effect denser than detuning alone, because formant relationships shift rather than pitch ratios. It costs CPU per instance, and the technique requires gain management at the bus, but it’s a different category of widening than what a standard chorus plugin generates.

The Kids model is the sharpest-cutting of the seven — highest formants, most pronounced upper register shift, least neutral. On a synth pad it adds a quality closer to toy piano than human voice. Useful deliberately, disorienting if chosen without intent.

Motion Recorder and the Gesture Loop System

The motion recorder captures XY pad positions at 60 Hz and stores them as (x, y, t) tuples. Replay locks the playback head to host PPQ position, which means the vowel sweep arrives on the bar grid rather than drifting relative to transport position. Thirteen procedural shapes — Circular, Diphthong, Figure-8, Heart, Spiral, Square Spiral, Star, and others — generate ready-made paths one click away, each producing a different vowel traversal pattern without requiring a recorded gesture.
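One way to picture PPQ-locked replay of (x, y, t) tuples is sketched below. The function names, the linear interpolation between samples, and the `loop_beats` parameter are assumptions made for illustration, not the plugin's implementation.

```python
from bisect import bisect_right

def position_at(path, t):
    """Linearly interpolate (x, y) at time t from (x, y, t) tuples sorted by t."""
    times = [p[2] for p in path]
    i = bisect_right(times, t)
    if i == 0:
        return path[0][:2]
    if i == len(path):
        return path[-1][:2]
    x0, y0, t0 = path[i - 1]
    x1, y1, t1 = path[i]
    a = (t - t0) / (t1 - t0)
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))

def replay_at_ppq(path, ppq, loop_beats):
    """Map the host PPQ position onto the recorded path, looping every loop_beats."""
    duration = path[-1][2] - path[0][2]
    phase = (ppq % loop_beats) / loop_beats   # 0..1 within the current loop
    return position_at(path, path[0][2] + phase * duration)
```

Because the playback position is derived from the host's beat position rather than from elapsed time, the same point on the path always lands on the same point in the bar.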

Diphthong moves between two specific vowel positions and back. All Vowels sequences through all ten IPA points in order. Square Spiral steps through right-angled segments instead of smooth curves, landing on hard vowel jumps that read as rhythmic articulation rather than smooth morph. These aren’t interchangeable: the geometric paths hit the corners of the vowel chart where the most distinct phonemes live, while the smooth paths spend time in the perceptually blended middle.

PingPong loop mode is the most musically useful for sustained sources: the path reverses at the end rather than snapping back to the start, which removes the seam that forward looping produces on audio. Free Hz mode lets the path run independent of tempo, which suits ambient work and sound design but breaks rhythmic alignment. The Snap parameter, shared between live drag and recorder playback, magnetises the cursor toward IPA reference points. At values above 0.6, recorded paths settle on the reference vowels rather than passing through interpolated mid-phonemes, which tightens the articulation on rhythmic vowel patterns.
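The two ideas in this paragraph can be sketched in a few lines. Both functions are hypothetical: the linear pull used for Snap in particular is an assumption, since the plugin's snap curve is not described.

```python
def pingpong_phase(phase):
    """Fold an ever-increasing loop phase into 0..1..0, with no jump at the seam."""
    p = phase % 2.0
    return p if p <= 1.0 else 2.0 - p

def snap_toward(pos, anchors, amount):
    """Pull pos toward its nearest anchor; amount 0 leaves it, 1 lands on it."""
    nearest = min(
        anchors,
        key=lambda a: (a[0] - pos[0]) ** 2 + (a[1] - pos[1]) ** 2,
    )
    return (
        pos[0] + amount * (nearest[0] - pos[0]),
        pos[1] + amount * (nearest[1] - pos[1]),
    )
```

A forward loop would jump from phase 1.0 back to 0.0; the folded phase instead retraces the path, so the cursor position is continuous across every loop boundary.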

Nine Modulator Types, Intermodulation Supported

The modulation engine processes all active modulators in dependency order each block, which makes intermodulation functional rather than theoretical. An LFO set to modulate another LFO’s rate produces a tremolo whose speed itself moves. An Envelope Follower modulating a Step Sequencer’s rate ties the rhythm of vowel steps to input loudness. These chains are limited to 16 simultaneous modulator instances across the whole plugin.
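Processing in dependency order amounts to a topological sort of the modulation routings. The sketch below uses Python's standard `graphlib`; the modulator names and the `routes` structure are hypothetical, and only the 16-instance cap comes from the text above.

```python
from graphlib import CycleError, TopologicalSorter

MAX_MODULATORS = 16  # instance limit stated in the text

def processing_order(routes):
    """routes maps each modulator name to the set of modulators it depends on.

    Returns an order in which every modulator is evaluated after its sources.
    Raises CycleError if the routings form a feedback loop.
    """
    if len(routes) > MAX_MODULATORS:
        raise ValueError("too many modulator instances")
    return list(TopologicalSorter(routes).static_order())
```

With an LFO feeding another LFO's rate, the source LFO is evaluated first in each block, so the dependent one sees the current value rather than a stale one.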

The MSEG editor draws up to 128 points with five segment curve types — Linear, Smooth, SCurve, Step, Hold — plus loop markers and a randomise function that scales complexity to the current snap grid. This handles shapes that LFOs cannot: a two-bar build that holds at peak before decaying, an asymmetric ramp that hits an EE on beat 1 and returns slowly to AH before the next bar. The step sequencer runs 1 to 64 steps with per-step value, shape, and curve settings, each step independently controlled — so a 16-step pattern hitting different vowel positions at 1/16 produces a different rhythmic quality from a smoothed eight-step version of the same pattern.
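For the step sequencer, a bare-bones sketch of mapping host position to a step value follows. It ignores per-step shape and curve settings, and the `steps_per_beat` parameter and function name are assumptions.

```python
import math

def step_value(steps, ppq, steps_per_beat=4):
    """Return the value of the step active at host position ppq (in beats).

    steps_per_beat=4 corresponds to 1/16 notes when one beat is a quarter note.
    """
    index = math.floor(ppq * steps_per_beat) % len(steps)
    return steps[index]
```

A 16-step pattern at 1/16 therefore repeats once per bar of 4/4, while an eight-step pattern at the same rate repeats twice.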

The Audio Oscillator runs at audible frequencies rather than sub-audio LFO rates. Routed to formant position at any audible depth, it produces FM-style sidebands that cannot be dialled out separately from the intended effect. It’s the one modulator type that changes the output signal’s character beyond vowel placement, so treat it as a waveshaping tool, not an automation source.

LOWS Crossover on the Mix Bus

The LOWS crossover is a fourth-order Linkwitz-Riley splitter — two cascaded 2nd-order Butterworth sections per channel, LP+HP summing to a flat all-pass response. The low band routes directly to output. The high band runs through the full processing chain — vowel filter, Drive, Breath, Size, Alien, pitch shifter, stereo modes, modulators — then recombines with the low band at the output stage. Crossover cutoff adjusts from 20 to 500 Hz via the blue handle on the response curve.
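The flat-sum property can be checked numerically on the analog prototype. This is a sketch of the textbook fourth-order Linkwitz-Riley response, not the plugin's discrete-time filter; the function name is hypothetical.

```python
import cmath
import math

def lr4_bands(freq_hz, crossover_hz):
    """Return (low, high) complex responses of an analog-prototype LR4 split."""
    s = 1j * (freq_hz / crossover_hz)           # normalised Laplace variable
    denom = s * s + math.sqrt(2.0) * s + 1.0    # 2nd-order Butterworth denominator
    low = (1.0 / denom) ** 2                    # two cascaded low-pass sections
    high = (s * s / denom) ** 2                 # two cascaded high-pass sections
    return low, high

def summed_magnitude(freq_hz, crossover_hz):
    """Magnitude of low + high; stays at 1.0 for a Linkwitz-Riley pair."""
    low, high = lr4_bands(freq_hz, crossover_hz)
    return abs(low + high)
```

Each band sits at half amplitude (about -6 dB) at the crossover frequency, and the two add in phase there, so the recombined magnitude is flat while the phase follows an all-pass curve.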

On a bass track, 80–120 Hz preserves the fundamental while formant processing shapes the upper harmonics independently. On a full mix or drum bus, 200–300 Hz keeps the kick, sub, and lower-body of the snare clean while the midrange takes vowel coloration. Toggling LOWS uses a 30 ms one-pole crossfade between the split path and the full-band path, keeping the filter state warm across toggles — no clicks on programme material regardless of how hard Q and Drive are pushed.
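A one-pole crossfade of this kind can be sketched as below. Treating the 30 ms figure as the smoother's time constant is an assumption, as are the function names and argument order.

```python
import math

def one_pole_coeff(time_ms, sample_rate):
    """Per-sample coefficient for a one-pole smoother with the given time constant."""
    return math.exp(-1.0 / (time_ms * 0.001 * sample_rate))

def crossfade(split, full, target, state, coeff):
    """Advance the smoothed mix by one sample.

    split, full: current samples from the split path and the full-band path.
    target: 1.0 when LOWS is on, 0.0 when it is off.
    Returns (output_sample, new_state).
    """
    state = target + coeff * (state - target)
    return state * split + (1.0 - state) * full, state
```

Both paths keep running while the mix value glides, which is what lets the filter state stay warm across toggles.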

Without LOWS active, high Q values at low formant positions narrow peaks in the 60–300 Hz range, thinning the body of the source audibly. This isn’t a bug — it’s the formant filter doing its job in a frequency range where the ear is sensitive to level changes. LOWS is the correct response to that condition, not a reduction in Drive or Q.

Pre-Formant Pitch Shifter Is a Wet-Path-Only Insert

The pitch shifter is a phase vocoder implemented in Rust: FFT block 512, hop 128, 4× oversample, Hann window, peak picking with phase locking on strong spectral peaks. It sits on the wet signal path only. The dry path includes a ring buffer delay matched to the wet path, so dry and wet stay sample-aligned at any Mix value, and Mix at 0% behaves as a sample-accurate bypass. Latency is ~12 ms at 44.1 kHz when active; the plugin reports zero latency to the host when the toggle is off, and most DAWs compensate the 12 ms automatically when it’s on.
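The latency figure and the matched dry delay can be illustrated together. The sketch assumes the reported latency equals one FFT block, which matches the ~12 ms figure (512 samples at 44.1 kHz is about 11.6 ms); the class and function names are hypothetical.

```python
from collections import deque

def block_latency_ms(fft_size, sample_rate):
    """Latency of one FFT block in milliseconds."""
    return 1000.0 * fft_size / sample_rate

class DryDelay:
    """Fixed delay line that keeps the dry path aligned with the wet path."""

    def __init__(self, delay_samples):
        self.delay = delay_samples
        self.buf = deque([0.0] * delay_samples, maxlen=max(delay_samples, 1))

    def process(self, x):
        if self.delay == 0:
            return x
        y = self.buf[0]        # oldest sample leaves the buffer
        self.buf.append(x)     # newest sample enters; maxlen drops the oldest
        return y
```

With the dry path delayed by the same number of samples as the wet path, blending the two at any Mix value avoids comb filtering from misalignment.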

Transposing the source before the formant filter changes what the filter processes without changing the vowel model’s reference frequencies. Source pitched up 12 semitones through a Man 1 model produces the classic vocoder texture: harmonics in the high register feeding a large-tract formant map. Source pitched down 12 semitones through the same model adds a sub-vowel register the original source doesn’t contain. Neither effect is achievable by adjusting Shift alone — Shift moves all four formants together; the pitch shifter moves the source content before the formants touch it.
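The Shift control's semitone scaling is ordinary equal-tempered arithmetic, sketched here. The formant values in the usage note are placeholders, not entries from the plugin's tables.

```python
def semitone_factor(semitones):
    """Frequency ratio for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)

def shift_formants(formants_hz, semitones):
    """Scale every formant by the same ratio; the source audio is untouched."""
    k = semitone_factor(semitones)
    return [f * k for f in formants_hz]
```

For example, shifting placeholder formants of 500, 1500, 2500 and 3500 Hz by +12 semitones doubles each one, while the pitch shifter would instead move the harmonics that those formants act on.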

The phase vocoder introduces audible artifacts on transient-heavy sources at high transpose values — percussive hits, hard attacks, fast-moving harmonics. On sustained tones it stays clean. On sources with both transient and sustained content, the balance between artifact and vowel character depends on transpose depth.

FAQs

Does Vowel Blender require an audio carrier signal the way a vocoder does?

A vocoder modulates a carrier with the spectrum of a voice signal: two audio inputs, with the result heavily tied to the carrier’s harmonic character. Vowel Blender applies a fixed formant filter derived from phonetics research to whatever signal you insert it on: one audio input, no second source required. The formant frequencies come from the voice model and pad position, not from an analysis of external audio. An envelope follower can make the pad respond to input dynamics, but that is modulation routing, not vocoder carrier processing.

What happens to a mix source with strong low-frequency content when Q is pushed high?

High Q narrows each of the four formant bandpass stages, which concentrates gain at specific peak frequencies and reduces energy between them. On a full-range source, those between-peak nulls fall in the 60–300 Hz range where the ear is sensitive to level changes, thinning the bass body audibly. The LOWS crossover resolves this by routing everything below the crossover cutoff directly to output, bypassing the formant chain entirely. Set the cutoff to 200–300 Hz on drum buses or full mixes; the Linkwitz-Riley topology sums to a flat magnitude response at any crossover setting.

Can the motion recorder path be part of a saved preset?

The recorded path is serialised as XML with the preset state, including the (x, y, t) tuples, loop mode, sync setting, and the active shape if a procedural shape generated it. Loading the preset restores the path and automatically sets the mode to Replay. A preset saved with a hand-drawn motion path reproduces that exact gesture on any source, in any session, regardless of host tempo, though tempo-synced loop duration scales to whatever BPM the project is running at when the preset loads.

How does Shift differ from the pre-formant Pitch control?

Shift multiplies all four formant frequencies by a semitone factor without touching the audio signal itself: the formants move, the source pitch does not. The pre-formant Pitch toggle transposes the source audio via phase vocoder before it reaches the filter, leaving the formant frequency map at its set position. Combining both at opposed values (source transposed down 12 semitones, Shift raised 12 semitones) produces a dissociated register effect where a low-pitched source feeds high-pitched formants, a relationship no natural vocal tract produces.

What source types produce the weakest vowel character through this filter?

Sources without substantial harmonic energy above 1 kHz return minimal vowel output from the formant bandpasses — the peaks at 2 kHz and above that define front vowels like EE and IH have nothing to shape. A 60 Hz sine bass, a pure sub, a heavily low-passed pad: these will pass through the formant filter largely unchanged. Drive adds harmonics before the filter and can compensate for sparse spectral content, but at the cost of committed saturation character. The Breath knob adds a noise component shaped by the same vowel filter independently of input level, which provides vowel texture even on spectrally sparse sources — but that texture is additive noise, not filtered signal.

UB DSP Vowel Blender
Price: 39 USD
Operating System: Windows 10
Application Category: Multimedia
Editor's Rating: 4