by Paris Smaragdis
- Unlike the single-source case, where we search for a nearest point, we now search for two points, one from each source dictionary, that form a subspace passing closest to our input.
- To address this problem efficiently, we recast this search as a sparse coding problem.
- In order to consider mixtures we will make the assumption that when sounds mix, their magnitude spectra superimpose linearly. Although this does not hold exactly, it is an assumption that has been used frequently by the source separation community and is generally accepted as being approximately true.
- The use of Euclidean distance inside the spectral composition simplex implies that we are making a Gaussian distribution assumption for the spectral composition frames.
- A more appropriate distribution in this space is the Dirichlet distribution [2], which is explicitly defined on a simplex and is used to describe compositional data like the ones we have.
- If we examine the log-likelihood of this model, we see that it resolves to the form of the cross-entropy, which information theory tells us is an appropriate measure for comparing two probability vectors.
- It is in principle invariant to the number of sources since any mixture problem can be seen as a binary segmentation between a target and an interference (the only complication of having many sources being the increased probability of the target and the interference overlapping in the frequency composition simplex). Other factors such as reverberation and propagation effects are also not an issue as long as they don’t color the sources enough to significantly change their spectral composition (not an observed problem in general).
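
To make the two-dictionary search concrete, here is a minimal sketch of the brute-force version that the sparse coding formulation is meant to avoid. It is not the method from the paper: the function names (`normalize`, `cross_entropy`, `best_pair`), the toy dictionaries, and the grid of mixing weights are all assumptions made for illustration. It normalizes spectra onto the simplex, then tries every pair of elements (one per dictionary) and every mixing weight on a coarse grid, keeping the combination with the lowest cross-entropy to the input.

```python
import numpy as np

def normalize(v):
    """Scale a non-negative spectrum so it sums to 1 (a point on the simplex)."""
    v = np.asarray(v, dtype=float)
    return v / v.sum()

def cross_entropy(x, y, eps=1e-12):
    """Cross-entropy -sum(x * log(y)) between two probability vectors."""
    return -np.sum(x * np.log(y + eps))

def best_pair(x, dict_a, dict_b, num_weights=21):
    """Exhaustively search for one element from each dictionary whose
    convex combination is closest to x under cross-entropy.

    Illustrative only: the cost grows with len(dict_a) * len(dict_b),
    which is why an efficient reformulation is needed in practice.
    """
    best = (np.inf, None, None, None)
    for i, a in enumerate(dict_a):
        for j, b in enumerate(dict_b):
            for w in np.linspace(0.0, 1.0, num_weights):
                y = w * a + (1.0 - w) * b  # point on the segment between a and b
                d = cross_entropy(x, y)
                if d < best[0]:
                    best = (d, i, j, w)
    return best

# Toy dictionaries (hypothetical three-bin "spectra").
dict_a = [normalize([8, 1, 1]), normalize([1, 8, 1])]
dict_b = [normalize([1, 1, 8]), normalize([4, 4, 2])]

# An input built as an equal mixture of dict_a[1] and dict_b[0].
x = 0.5 * dict_a[1] + 0.5 * dict_b[0]
d, i, j, w = best_pair(x, dict_a, dict_b)
```

Because cross-entropy against a fixed probability vector is minimized when the second argument equals the first, the search in this toy case recovers the pair and the weight used to build the input.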