FFT Phase Vocoder crackling

Hey guys,

Trying to morph two sounds by taking the FFT amplitudes of one and combining them with the phases of the other. I'm using the FFT macros by Robin Davies, with the phase vocoder in EzFFT as a reference. The sound is there, but there are very distinct crackles everywhere.

https://soundcloud.com/freedownloadsofsongsyeah/problem

Second half is what it sounds like in ezfft's phase vocoder

The bits in red are where it crackles

I've tried changing the FFT size, duplicating the modules and adding 180 degrees of phase to the duplicate, and swapping the Hamming windows for Hann windows, but to no avail.

When I change PartB a to PartA a (red circle), I only hear sound A, but there are no crackles; the same goes the other way round when changing PartA ph to PartB ph. So the problem seems to come from combining the phases and amplitudes of two different sounds. You'd think that would relate to windowing, but nothing I tried worked.
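For reference, the per-frame operation described above (magnitudes from sound A, phases from sound B) can be sketched outside Reaktor roughly like this. It is only an illustration in plain Python with a naive DFT; the names `dft`, `idft` and `cross_synthesize` are made up for the example and have nothing to do with the Robin Davies or EzFFT macros.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, good enough for one small illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft() above."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def cross_synthesize(frame_a, frame_b):
    """Build a frame with the bin magnitudes of frame_a and the bin phases of frame_b."""
    spec_a = dft(frame_a)
    spec_b = dft(frame_b)
    hybrid = [cmath.rect(abs(a), cmath.phase(b)) for a, b in zip(spec_a, spec_b)]
    # For real input frames the hybrid spectrum stays (numerically) conjugate
    # symmetric, so the imaginary part of the result is just rounding noise.
    return [v.real for v in idft(hybrid)]


N = 64
frame_a = [0.5 * math.sin(2 * math.pi * 4 * t / N) for t in range(N)]  # quiet sine
frame_b = [math.cos(2 * math.pi * 4 * t / N) for t in range(N)]        # loud cosine

same = cross_synthesize(frame_a, frame_a)   # A's magnitudes + A's phases -> A again
morph = cross_synthesize(frame_a, frame_b)  # A's level, B's phase
```

Here `morph` comes out as a cosine at half amplitude: the level of A with the phase of B. Feeding the same frame into both inputs gives the input back, which matches the observation that the crackles vanish when both parts come from the same sound.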

Any ideas?

ANDREW221231

are you married to using the robin davies fft macros? the ezfft version sounds more to me like what a phase vocoder is supposed to sound like

IIRC, the robin davies macros don't automatically sync up their indexes at initialization, maybe it's got something to do with that... really i'd recommend ezfft for most purposes, and especially if you're still learning your way around. ezfft also discards the top 'mirrored' part of the fft signal, which you don't need to worry about, and i'm not sure the robin davies macros do
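On the 'mirrored' part: for a real-valued input frame, the upper half of the FFT is the complex conjugate of the lower half in reverse order, so it carries no extra information. A small sketch of that general property in plain Python (it says nothing about how either macro set is built internally; `dft` is a throwaway helper):

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


N = 16
frame = [math.sin(2 * math.pi * 3 * t / N) + 0.3 * math.cos(2 * math.pi * 5 * t / N)
         for t in range(N)]
spec = dft(frame)

# Keep only bins 0 .. N/2 and rebuild the upper bins as mirrored conjugates.
half = spec[: N // 2 + 1]
rebuilt = half + [half[N - k].conjugate() for k in range(N // 2 + 1, N)]

mirror_error = max(abs(a - b) for a, b in zip(spec, rebuilt))
```

`mirror_error` is at rounding-noise level, so only N/2 + 1 bins need to be processed. If the upper half is kept and edited separately, it has to stay the mirrored conjugate of the lower half or the inverse FFT output stops being purely real.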

BLAIR

Damn :( I wanted to use EzFFT, but it's really unclear whether you're allowed to redistribute the macros inside your own ensembles. I wanted to make a phase vocoder with a better UI to give away for free, and from what I've seen Robin's macros are free for personal and even commercial use, which I thought would be useful if I ever made something bigger.

Other than that EzFFT seemed much nicer haha. Would adding a synchronized overlap-add do anything?

ANDREW221231

https://community.native-instruments.com/discussion/comment/2937#Comment_2937

i mean... the original ezfft libraries were uploaded to the UL by a native instruments employee as a christmas present to the community, so people building their own stuff with them was pretty much the whole idea. a commercial product is more of a grey area, and not something i really know much about but if you're talking about making an updated phase vocoder and slapping it up on the UL no one's going to begrudge the use of FFT macros

i recommend these https://www.native-instruments.com/en/reaktor-community/reaktor-user-library/entry/show/6639/ and if you're nervous about what it's okay to use them for, the guy who coded them (tr97) is here on the forum, so you could always just ask him

ANDREW221231

and as far as the robin davies macros go, i would suspect there is a way to get it all working correctly, it would just require some understanding of how the nuts and bolts of an fft operation work

i'm not sure what you mean about overlap add... what i mean about synchronization is that i thought maybe the bins are being read out of order... so like one fft is reading out bin 1 while the other is on, say, 33. this is something ezfft just handles without muss or fuss

colB

https://community.native-instruments.com/discussion/comment/3138#Comment_3138

I suggested looking up overlap-add. It's a common (fundamental?) part of the process for building a phase vocoder using fft/ifft. From one of the examples, I guess that what Blair is trying to do is a phase vocoder... maybe not, but it seems related, so makes sense to try using similar methods.

Anyhoo, overlap add solves the problems you get when you tamper with fft frames in a way that means the phases of the partials at the end of a frame don't line up with the phases of those partials at the start of the next frame causing clicks/glitches (at least that was my interpretation, someone with better fft chops could explain more clearly for sure @Tr97 ?).

You run multiple parallel channels. Each has a windowed frame of audio followed by a frame of silence, then audio, then silence... each channel is offset compared to its neighbours. When you mix them together at the end, you get continuous audio, but there is no issue with clicks caused by phase discontinuities, because of the blocks of 'zero padded' silence. Something like that anyway.
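A minimal sketch of one common form of overlap-add, in plain Python rather than Reaktor: frames are taken every half frame, faded in and out with a Hann window, processed, and summed back at their original positions. The function names and the `process` hook are invented for the example; in a vocoder the per-frame FFT work would go where `process` is called.

```python
import math


def hann(size):
    """Periodic Hann window: zero at the frame start, one in the middle."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / size) for n in range(size)]


def overlap_add(signal, frame_size, hop, process=lambda frame: frame):
    """Window each frame, run it through `process`, and sum frames back in place."""
    window = hann(frame_size)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_size)]
        frame = process(frame)  # per-frame spectral processing would go here
        for i, value in enumerate(frame):
            out[start + i] += value
    return out


N = 8
signal = [math.sin(0.3 * t) for t in range(64)]
rebuilt = overlap_add(signal, N, N // 2)  # two overlapping "channels"
```

With a hop of half a frame, two Hann windows always add up to exactly 1, so with no processing the interior of `rebuilt` matches `signal`. Each frame is at zero gain at its own edges, which is where a modified frame would otherwise click against the next one.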

Tr97

I think the explanation by Colin of why windows overlap is very good. the ezfft macros are safe in this respect, they run with 2x overlap (and the cool part is how this is realized with 2 continuous audio streams (amp and phase))

something i have to mention: the ezfft phase vocoder is more of a vocoder (one that does interesting stuff with the phases and has no explicit modulator or carrier, just 2 equal inputs) than a 'phase vocoder' in the strict sense, which uses only 1 input signal, isn't a vocoder at all, and is used for timestretching a single signal

KoaN

https://community.native-instruments.com/discussion/comment/3204#Comment_3204

I used a few vst plugins in the past that were defined as Phase vocoder where you morph from one sound to another using the phase information of one onto another. You could transform a dog barking slowly into a human voice or whatever the sources you used.

But it seems that term isn't used anymore for that maybe? I see they sometimes simply say spectral morphing.

BLAIR

https://community.native-instruments.com/discussion/comment/3223#Comment_3223

Yeah exactly, I guess it just sounds cooler to market it that way haha. Zynaptiq and Melda have morphing plugins which I've realised are just phase vocoders

KoaN

https://community.native-instruments.com/discussion/comment/3235#Comment_3235

I used to love a plugin called Shapee, a phase vocoder... I liked it more than the Zynaptiq one, transients were preserved better to my ears... it's very old though and was never updated to 64 bit.

Never tried the Melda one.

Tr97

another example of this vocoding/morphing stuff is when kontakt 4 came out with aet (authentic expression technology), with all the examples like choir vowels and realistic brass swells. at the time i thought this was next level stuff, but it more or less disappeared, a bit too difficult to handle

BLAIR

https://community.native-instruments.com/discussion/comment/3262#Comment_3262

I checked it out, the example I heard sounded so good! Sucks that it's not in 64 bit, and I couldn't get the 32 bit one working either. I also noticed they released some Matlab code for the algorithm, and it's for a stereo version, which is cool.

https://community.native-instruments.com/discussion/comment/3289#Comment_3289

Interesting! Also hi haha, was analysing your fft to sinebank stuff last week, cool stuff :)

KoaN

https://community.native-instruments.com/discussion/comment/3291#Comment_3291

You have to copy those Intel files into the Windows/SysWOW64 folder... I was also using JBridge, so there's a way to make it work.

ANDREW221231

https://community.native-instruments.com/discussion/comment/3168#Comment_3168

oh! overlap add as in adding an increasing amount of overlaps together. i was thinking of something else.. maybe overlap save? was it overlap save you used in convolution macros to be able to convolve something that lasts longer than a single frame of an fft?

i believe increasing the overlap factor further than 2x requires some kind of trick. ezfft seems to do it by cutting off the top half mirrored part of the fft and then alternating the output of two ffts, so the useful halves of two ffts are output in the time that a single one with its redundant upper half normally would take. not sure if zero padding is needed for that or not. from what i've read, zero padding an fft is equivalent to interpolating it with a sinc filter

@tr97 it seems the stuff of yours i've seen that used greater than 2x overlap used either that iterator event blast trick or distributed the overlaps across voices. guess you can't increase the overlap more without sacrificing the '2 continuous audio streams' attribute?
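The zero padding remark can be checked with a tiny experiment, sketched here in plain Python with a throwaway naive `dft` helper: padding a frame with zeros to twice its length leaves the original bins untouched (they show up as every second bin) and fills in interpolated values between them.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


N = 8
# 1.3 cycles per frame, so the energy sits between bins
frame = [math.sin(2 * math.pi * 1.3 * t / N) for t in range(N)]

spec = dft(frame)                      # N bins
padded_spec = dft(frame + [0.0] * N)   # 2N bins after zero padding

# Every even bin of the padded spectrum equals the matching original bin.
even_bin_error = max(abs(padded_spec[2 * k] - spec[k]) for k in range(N))
```

The odd bins of `padded_spec` are the in-between values. For a finite frame the interpolating kernel is strictly the periodic version of a sinc, but the idea is the same: zero padding adds no new information, it only samples the same spectrum more densely.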

BLAIR

Hey guys, colB helped me figure out what was wrong. He noticed a little bug inside the Hamming and Hann window macros: a merge module needed to be removed, and the / also had to be replaced with the a div x macro.

It also helped to have Hann windows placed at the end of the chain. If anyone ever has this problem and finds this thread, here is the simple setup that worked well.

colB

https://community.native-instruments.com/discussion/comment/3393#Comment_3393

Can't remember if it was overlap save or overlap add in the convolution.

Overlap add in the case of this vocoder thing is just having two (or more) channels that are windowed and offset so that when one is at a frame transition, its volume is zero and the other is at max, and vice versa. That way, you remove the artefacts at the frame transitions from the audible output.

I found so far that for this vocoder, the best sound quality came from using four channels, noticeably better than two for some things, but going higher didn't make a difference to my ears. Maybe the best compromise is just two though, since it's easier on CPU.
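One possible contributor to four channels sounding better than two, offered as a guess rather than a diagnosis of the Reaktor structure: if a Hann window is applied both before the FFT and again at the end of the chain, each frame is effectively weighted by a squared Hann window. Squared Hann windows do not sum to a constant gain with two channels, but they do with four. A quick check in plain Python (helper names are made up):

```python
import math


def hann(size):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / size) for n in range(size)]


def overlap_gain(window, channels):
    """Total gain at each position when `channels` equally offset copies are summed."""
    size = len(window)
    hop = size // channels
    return [sum(window[(i + c * hop) % size] for c in range(channels))
            for i in range(size)]


N = 64
squared = [w * w for w in hann(N)]  # analysis window times synthesis window

gain_2x = overlap_gain(squared, 2)
gain_4x = overlap_gain(squared, 4)

ripple_2x = max(gain_2x) - min(gain_2x)  # gain wobbles between 0.5 and 1.0
ripple_4x = max(gain_4x) - min(gain_4x)  # flat gain of 1.5
```

With two channels the summed gain swings between 0.5 and 1.0 once per half frame, which is an amplitude modulation at the frame rate. With four channels it is flat at 1.5, so only a fixed level correction is needed.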

The 512 and 1024 FFT sizes seemed the best, and different enough to both be useful. I guess 2048 and 256 would also be useful for some source audio. Going down to 128 didn't work so well for me (I also found a bug in the Davies 128 iFFT module: the phase output isn't connected internally).

Next is to try and understand how to use this for pitch and speed manipulation :)
