| Caption | PicoAudio2 (Ours) | AudioComposer | AudioLDM2 | Tango2 | Make-An-Audio 2 |
|---|---|---|---|---|---|
| Caption 1: An engine rumbles loudly at 0.2-3.9s and an air horn honks at 4.7-5.5s, 6.5-6.8s. | | | | | |
| Caption 2: A bell is ringing loudly and quickly at 0.0-3.2s. | | | | | |
| Caption 3: Loud wind noise at 0.2-2.0s and a car accelerating fast at 2.0-9.9s. | | | | | |
| Caption 4: A quick loud explosion at 0.1-0.8s and music plays with pulsating sounds at 1.1-5.8s and a man talking at 6.8-9.6s. | | | | | |
| Caption 5: Tap water is running at 1.0-7.6s and a tapping noise at 0.5-0.6s, 9.0-9.4s. | | | | | |

| Type | Caption | LLM | Audio |
|---|---|---|---|
| Frequency | A pet cat meows for two times. | A pet cat meows at 1.3-1.9s and 3.5-4.1s. | |
| Frequency | A train horn for three times. | A train horn at 0.0s-0.2s, 2.9-3.6s and 5.3-7.8s | |
| Order | Clinking followed by a toilet flushing. | Clinking at 1.0-1.2s and a toilet flushing at 7.2-10.0s. | |
| Order | A man speaks then digital beeps. | A man speaks at 0.8-9.4s and digital beeps at 9.4-10.0s. | |
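
The timestamped captions in the LLM column follow a regular pattern, so the frequency and order constraints can be checked mechanically. The sketch below is only an illustration and is not part of PicoAudio2. The interval format is inferred from the examples on this page, and `extract_intervals` and `is_ordered` are hypothetical helper names.

```python
import re

# Assumed interval format: "start-end" in seconds with a trailing "s",
# optionally with an extra "s" after the start time (e.g. "0.0s-0.2s").
INTERVAL = re.compile(r"(\d+(?:\.\d+)?)s?-(\d+(?:\.\d+)?)s")


def extract_intervals(caption):
    """Return every (start, end) pair in seconds, in order of appearance."""
    return [(float(start), float(end)) for start, end in INTERVAL.findall(caption)]


def is_ordered(intervals):
    """True when each interval starts no earlier than the previous one ends."""
    return all(prev[1] <= cur[0] for prev, cur in zip(intervals, intervals[1:]))


# Frequency check: "two times" should yield two intervals.
meows = extract_intervals("A pet cat meows at 1.3-1.9s and 3.5-4.1s.")
print(len(meows) == 2)

# Order check: the first event should end before the second begins.
speech_then_beeps = extract_intervals(
    "A man speaks at 0.8-9.4s and digital beeps at 9.4-10.0s."
)
print(is_ordered(speech_then_beeps))
```

The order check assumes that events are listed in the caption in the order they are meant to occur. Captions that list overlapping or interleaved events, such as Caption 5 above, would need a per-event grouping step that this sketch does not attempt.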