Contributions:
- Fuse facial expression, shoulder gesture, and speech cues for the analysis of human affect.
- Propose an output-associative fusion framework that incorporates correlations and covariances between the emotion dimensions.
- Demonstrate that capturing temporal correlations and retaining temporally distant events in memory is of utmost importance for continuous affect prediction.
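
The output-associative idea can be illustrated with a minimal two-stage sketch: first predict each emotion dimension independently, then re-predict each dimension from the joint stage-one outputs so that valence–arousal correlations inform the final estimate. Plain least-squares regressors stand in for the paper's actual learners here, and all data and variable names are synthetic assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: per-frame features X and correlated
# valence/arousal targets Y (columns: valence, arousal).
n, d = 200, 8
X = rng.normal(size=(n, d))
true_W = rng.normal(size=(d, 2))
Y = X @ true_W + 0.1 * rng.normal(size=(n, 2))

def fit_linear(X, y):
    # Least-squares weights with a bias column appended.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_linear(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ w

# Stage 1: independent predictor per dimension.
w_val = fit_linear(X, Y[:, 0])
w_aro = fit_linear(X, Y[:, 1])
stage1 = np.column_stack([predict_linear(w_val, X),
                          predict_linear(w_aro, X)])

# Stage 2 (output-associative): each dimension is re-predicted from
# BOTH stage-1 outputs, so cross-dimensional correlations are used.
w_val2 = fit_linear(stage1, Y[:, 0])
w_aro2 = fit_linear(stage1, Y[:, 1])
final = np.column_stack([predict_linear(w_val2, stage1),
                         predict_linear(w_aro2, stage1)])
```

In the paper's setting the stage-one learners would be sequence models over the fused audio-visual cues rather than linear maps, but the wiring, where outputs of one dimension feed the predictor of the other, is the same.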
Interesting observation:
- Arousal can be predicted much better than valence from audio cues.
- For the valence dimension, by contrast, visual cues (facial expressions and shoulder movements) appear to perform better.