Mixing Vocals in Electronic Music: From Raw Recording to Polished Mix
Complete vocal chain order, pitch correction, vocal chops, parallel processing, de-essing, and genre-specific vocal techniques for house, future bass, trap, and techno.
The Professional Vocal Processing Chain
The standard vocal chain, in order:
1) Gain staging: set the input level to peak between -12 and -6 dBFS.
2) Subtractive EQ: high-pass at 80–100 Hz, cut mud at 200–300 Hz, cut nasal tone at 800 Hz–1 kHz.
3) Compression: FET or VCA, 3:1–5:1 ratio, medium attack, auto-release, 3–6 dB of gain reduction.
4) De-essing: target sibilance at 5–8 kHz.
5) Additive EQ: presence boost at 2–5 kHz, air boost at 10–12 kHz.
6) Saturation: subtle tube or tape for warmth.
7) Delay: slapback or rhythmic.
8) Reverb: plate or room, on a send.
This order matters. Apply subtractive EQ before compression so that problem frequencies do not trigger the compressor unnecessarily. De-ess after compression, because compression raises the level of sibilance relative to the rest of the vocal, which makes it more audible and easier to target. Saturate after compression so the harmonics are added to an already-controlled signal. Delay and reverb come last, on sends.
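To make the ratio and gain-reduction figures in the compression step concrete, here is a minimal sketch of a hard-knee compressor's static gain computation. The -18 dBFS threshold is an assumption chosen for illustration, and real compressors add attack, release, and knee behaviour that this ignores.

```python
def compressor_gain_reduction_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Gain reduction in dB (non-negative) for an idealised hard-knee compressor."""
    if level_db <= threshold_db:
        return 0.0  # below threshold: signal passes unchanged
    overshoot_db = level_db - threshold_db
    # Above threshold, every `ratio` dB of input yields 1 dB of output.
    return overshoot_db - overshoot_db / ratio


# Assumed settings: 4:1 ratio, threshold at -18 dBFS.
print(compressor_gain_reduction_db(-14.0, -18.0, 4.0))  # 3.0
print(compressor_gain_reduction_db(-10.0, -18.0, 4.0))  # 6.0
```

With these assumed settings, vocal peaks between -14 and -10 dBFS land in the 3–6 dB gain-reduction window.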
Before any processing, use clip gain or volume automation to level out the vocal performance: reduce loud phrases by 2–4 dB and boost quiet phrases by a similar amount. This 'pre-compression' gives the compressor a more consistent signal, so it needs to apply less gain reduction and sounds more natural. This step alone can noticeably improve an amateur vocal mix.
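The leveling pass can be sketched as a simple calculation. This assumes you have already measured a level for each phrase (the dBFS values below are made up), uses the median phrase level as the target, and caps each move at 4 dB. None of that is prescribed by any DAW; it is one plausible way to arrive at clip-gain offsets.

```python
from statistics import median


def clip_gain_offsets(phrase_levels_db, max_change_db=4.0):
    """Per-phrase clip-gain moves (dB) that pull each phrase toward the median level."""
    target_db = median(phrase_levels_db)
    offsets = []
    for level_db in phrase_levels_db:
        wanted = target_db - level_db
        # Clamp so no phrase is moved by more than max_change_db either way.
        offsets.append(max(-max_change_db, min(max_change_db, wanted)))
    return offsets


# Hypothetical phrase levels in dBFS.
levels = [-20.0, -14.0, -17.0, -24.0, -16.0]
print(clip_gain_offsets(levels))  # [3.0, -3.0, 0.0, 4.0, -1.0]
```

The quietest phrase would need 7 dB to reach the target, but the cap holds it to 4 dB and leaves the rest to the compressor.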
Pitch Correction: Transparent vs. Effect
Transparent pitch correction (Melodyne, Auto-Tune in Graph mode): corrects pitch errors while preserving the natural character of the voice. Set retune speed slow (50–100 ms for Auto-Tune, gentle correction in Melodyne), correct only notes that are noticeably off-pitch, and leave vibrato intact. The listener should not be able to tell pitch correction was used.
Effect pitch correction (Auto-Tune with a fast retune speed, 0–15 ms): creates the iconic 'Auto-Tune effect' popularized by T-Pain and Travis Scott and now common in modern trap and pop. Set the retune speed to 0–5 ms for a hard-tuned, robotic sound. Set the key and scale correctly to avoid wrong-note artifacts. Used this way, pitch correction is a creative choice, not a corrective tool.
For electronic music: the Auto-Tune effect is standard in future bass, trap, and hyperpop vocals. For deep house and tech house, use transparent correction or leave vocals untuned for a raw, organic feel. The choice depends entirely on genre and artistic intent.
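Conceptually, hard tuning with a retune speed near 0 ms snaps every detected pitch straight to the nearest note of the chosen scale. The sketch below illustrates that idea only. It is not how Auto-Tune or Melodyne is implemented, and the function names, the A4 = 440 Hz reference, and the C major default are assumptions.

```python
import math

A4_HZ = 440.0                      # assumed tuning reference
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)


def hz_to_midi(freq_hz: float) -> float:
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)


def midi_to_hz(midi_note: float) -> float:
    return A4_HZ * 2.0 ** ((midi_note - 69.0) / 12.0)


def snap_to_scale(freq_hz: float, root_pc: int = 0, scale=MAJOR_SCALE) -> float:
    """Return the frequency of the scale note nearest to freq_hz (root_pc 0 = C)."""
    midi = hz_to_midi(freq_hz)
    allowed = {(root_pc + step) % 12 for step in scale}
    base = round(midi)
    # Search one octave around the detected pitch for notes that belong to the scale.
    candidates = [n for n in range(base - 6, base + 7) if n % 12 in allowed]
    target = min(candidates, key=lambda n: abs(n - midi))
    return midi_to_hz(target)


print(round(snap_to_scale(450.0), 2))  # 440.0 (a sharp A4 pulled down to A4)
print(round(snap_to_scale(480.0), 2))  # 493.88 (between A#4 and B4; A#4 is not in C major)
```

The second call shows why the key and scale setting matters: a pitch near A#4 is forced to B4 because A#4 is not in the selected scale, which is exactly the wrong-note artifact heard when the key is set incorrectly.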
Vocal Chops and Processing
Vocal chops are short, sliced fragments of a vocal recording repitched and rearranged as melodic or rhythmic elements. They are a signature element of future bass, tropical house, and pop EDM. Technique: record or import a vocal phrase, slice it into individual syllables in your sampler (Ableton Simpler, Kontakt, Serato Sample), map them across a keyboard, and play/program new melodies from the fragments.
Processing chain for vocal chops: pitch-shift to desired key, EQ (high-pass at 200 Hz, boost 2–5 kHz for presence), heavy compression (6:1, fast attack, fast release — chops should be loud and consistent), reverb (medium plate, 1–2 seconds, 25–35% mix), delay (1/8 or dotted 1/8, 15–20% mix). Layer multiple chop instances at different pitches for harmony — thirds and fifths above the root work well.
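If your delay plug-in is not tempo-synced, the 1/8 and dotted 1/8 times can be worked out from the project tempo. A small sketch follows; the 150 BPM tempo is an assumed example value.

```python
EIGHTH = 0.5          # note lengths expressed in quarter-note beats
DOTTED_EIGHTH = 0.75


def note_delay_ms(bpm: float, beats: float) -> float:
    """Delay time in milliseconds for a note length given in quarter-note beats."""
    return 60000.0 / bpm * beats


bpm = 150.0  # assumed project tempo
print(note_delay_ms(bpm, EIGHTH))         # 200.0
print(note_delay_ms(bpm, DOTTED_EIGHTH))  # 300.0
```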
Formant shifting: when pitching vocal chops significantly (more than 4–5 semitones), the formants shift and the voice sounds unnatural — chipmunk when pitched up, monster when pitched down. Use a formant-preserving pitch shifter (Soundtoys Little AlterBoy, Melodyne) to shift pitch while maintaining the original formant character of the voice.
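The arithmetic behind the chipmunk effect is simple: a plain (resampling) pitch shift multiplies every frequency in the signal, formants included, by the same ratio. The sketch below shows that ratio. The 1000 Hz formant is a hypothetical round number, not a measurement of any particular voice.

```python
def pitch_ratio(semitones: float) -> float:
    """Frequency ratio for an equal-tempered shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def shifted_formant_hz(formant_hz: float, semitones: float) -> float:
    """Where a formant lands after a plain pitch shift with no formant preservation."""
    return formant_hz * pitch_ratio(semitones)


# Hypothetical vocal formant at 1000 Hz, chop pitched up 7 semitones.
print(round(pitch_ratio(7), 3))                 # 1.498
print(round(shifted_formant_hz(1000.0, 7), 1))  # 1498.3
```

A formant-preserving shifter aims to move the fundamental by this ratio while leaving the formant near its original frequency.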
De-Essing: Controlling Sibilance
Sibilance ('s', 'sh', 'ch' sounds) lives in the 4–10 kHz range and can be painfully harsh in electronic mixes where the vocal sits on top of bright synths and hi-hats. De-essing is frequency-specific compression that only reduces the level when sibilant frequencies spike above a threshold.
Use a dedicated de-esser (FabFilter Pro-DS, Waves DeEsser, Waves Sibilance) or a dynamic EQ band at 5–8 kHz. Set the threshold so the de-esser engages only on actual sibilant sounds, not on bright consonants or breaths. Aim for 3–6 dB of reduction on sibilant peaks. Over-de-essing creates a lispy, dull vocal, so always A/B against bypass to check.
Place the de-esser after compression in the chain. Compression raises the level of sibilance relative to the rest of the vocal, making sibilance more prominent. De-essing after compression catches these amplified sibilants. Some engineers use two de-essers: one light one before compression and one after, each doing 2–3 dB of reduction rather than one doing 6 dB.
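The idea of level reduction that is triggered only by sibilant energy can be sketched in a few lines. This is a deliberately crude wideband de-esser and does not reflect how any of the plug-ins named above work internally: the sidechain "filter" is just a first difference (which emphasises highs), and the threshold, ratio, release, and 6 dB cap are assumed values.

```python
import math


def deess(samples, sample_rate=44100, threshold_db=-20.0, ratio=4.0,
          max_reduction_db=6.0, release_ms=5.0):
    """Turn the whole signal down while the high-frequency sidechain exceeds the threshold."""
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    envelope = 0.0
    previous = 0.0
    out = []
    for x in samples:
        # Crude high-pass sidechain: the first difference is small for lows, large for highs.
        sidechain = abs(x - previous)
        previous = x
        # Peak envelope follower: instant attack, exponential release.
        envelope = max(sidechain, envelope * release)
        level_db = 20.0 * math.log10(max(envelope, 1e-9))
        over_db = max(0.0, level_db - threshold_db)
        reduction_db = min(max_reduction_db, over_db * (1.0 - 1.0 / ratio))
        out.append(x * 10.0 ** (-reduction_db / 20.0))
    return out


def sine(freq_hz, amplitude=0.5, n=2000, sample_rate=44100):
    """Test tone helper (hypothetical stand-in for real vocal audio)."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


body = sine(200.0)    # stands in for the low, voiced part of the vocal
ess = sine(7000.0)    # stands in for a sibilant burst
print(deess(body) == body)                               # low tone passes untouched
print(max(abs(s) for s in deess(ess)[1000:]) < 0.3)      # 7 kHz tone is pulled down by the 6 dB cap
```

A split-band design would instead attenuate only the sibilant band and leave the rest of the vocal alone, at the cost of a crossover filter.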
Genre-Specific Vocal Approaches
House / Deep House: natural, organic vocal with minimal processing. Light compression (2:1), subtle EQ, short room reverb. The vocal should sound like a real person singing in a room, not a studio-processed production. Vinyl-style saturation and slight high-frequency roll-off (above 12 kHz) for a warm, analog feel. Sample vocal phrases from classic records for authenticity.
Future Bass / Pop EDM: heavily processed, layered vocal. Hard Auto-Tune (retune speed 0–5 ms), stacked harmonies (3rds and 5ths above and below), wide stereo spread on harmonies with the lead centered. Heavy compression, bright EQ (boost 5 kHz and 10 kHz), long reverb (2–3 seconds), and pitch-shifted vocal layers (octave up at -12 dB for shimmer). The vocal is the centerpiece of the production.
Trap: Auto-Tuned with fast retune, ad-libs panned hard left and right, aggressive compression. Double-track the lead vocal for thickness. Use slapback delay (40–60 ms) for presence. Reverb is optional and usually short if used. The vocal should sound forward, in-your-face, and loud relative to the beat. Keep plenty of low-mid energy (200–400 Hz) for chest-voice weight.
Techno / Minimal: the vocal is processed as an instrument rather than a lyrical element. Vocoder, granular synthesis, heavy time-stretching, stutter edits, bit-crushing. Strip the humanity from the voice and treat it as raw audio material to manipulate. Delay and reverb are themselves heavily modulated and processed. The words may be unintelligible; the voice is texture, not message.
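For the stacked harmonies and the octave-up layer described under Future Bass / Pop EDM, the note choices and layer gain can be worked out as follows. This sketch assumes the harmonies are diatonic (they stay inside the song's scale) and uses C major and MIDI note numbers purely as an example. The helper names are hypothetical.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)


def diatonic_shift(midi_note: int, degrees: int, root_pc: int = 0, scale=MAJOR_SCALE) -> int:
    """Move a scale note up (positive) or down (negative) by a number of scale degrees."""
    rel = (midi_note - root_pc) % 12
    if rel not in scale:
        raise ValueError("note is not in the scale")
    octave_base = midi_note - rel
    octaves, degree = divmod(scale.index(rel) + degrees, len(scale))
    return octave_base + 12 * octaves + scale[degree]


def db_to_gain(db: float) -> float:
    """Convert a level in dB to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


lead = 64  # E4, lead melody note in C major
print(diatonic_shift(lead, 2), diatonic_shift(lead, 4))    # 67 71 (third and fifth above: G4, B4)
print(diatonic_shift(lead, -2), diatonic_shift(lead, -4))  # 60 57 (third and fifth below: C4, A3)
print(round(db_to_gain(-12.0), 3))                         # 0.251 (octave-up layer at -12 dB)
```

A third is two scale degrees and a fifth is four, so the same function covers harmonies above and below the lead.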