Caption 1: An engine rumbles loudly at 0.2-3.9s and an air horn honks at 4.7-5.5s, 6.5-6.8s.

PicoAudio2 (Ours) | AudioComposer | AudioLDM2 | Tango2 | Make-An-Audio 2 |
---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() |

Caption 2: A bell is ringing loudly and quickly at 0.0-3.2s.

PicoAudio2 (Ours) | AudioComposer | AudioLDM2 | Tango2 | Make-An-Audio 2 |
---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() |

Caption 3: Loud wind noise at 0.2-2.0s and a car accelerating fast at 2.0-9.9s.

PicoAudio2 (Ours) | AudioComposer | AudioLDM2 | Tango2 | Make-An-Audio 2 |
---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() |

Caption 4: A quick loud explosion at 0.1-0.8s and music plays with pulsating sounds at 1.1-5.8s and a man talking at 6.8-9.6s.

PicoAudio2 (Ours) | AudioComposer | AudioLDM2 | Tango2 | Make-An-Audio 2 |
---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() |

Caption 5: Tap water is running at 1.0-7.6s and a tapping noise at 0.5-0.6s, 9.0-9.4s.

PicoAudio2 (Ours) | AudioComposer | AudioLDM2 | Tango2 | Make-An-Audio 2 |
---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() |

Type | Caption | LLM | Audio |
---|---|---|---|
Frequency | A pet cat meows for two times. | A pet cat meows at 1.3-1.9s and 3.5-4.1s. | ![]() |
Frequency | A train horn for three times. | A train horn at 0.0s-0.2s, 2.9-3.6s and 5.3-7.8s | ![]() |
Order | Cinking followed by a toilet flushing. | Cinking at 1.0-1.2s and a toilet flushing at 7.2-10.0s. | ![]() |
Order | A man speaks then digital beeps. | A man speaks at 0.8-9.4s and digital beeps at 9.4-10.0s. | ![]() |
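
The LLM column above rewrites frequency and order descriptions into explicit time spans. As a rough illustration only, the sketch below shows one way such timestamped captions could be read back and sanity-checked. The names `SPAN`, `parse_spans`, and `is_sequential` are hypothetical and are not part of PicoAudio2; the pattern assumes spans are written as `start-end` in seconds, as in the examples on this page.

```python
import re

# Hypothetical helper, not taken from PicoAudio2.
# Matches one "start-end" span in seconds, e.g. "1.3-1.9s" or "0.0s-0.2s".
SPAN = re.compile(r"(\d+(?:\.\d+)?)s?-(\d+(?:\.\d+)?)s")


def parse_spans(caption):
    """Return every (start, end) pair found in a timestamped caption."""
    return [(float(start), float(end)) for start, end in SPAN.findall(caption)]


def is_sequential(spans):
    """True if each span starts no earlier than the previous span ends."""
    return all(prev[1] <= nxt[0] for prev, nxt in zip(spans, spans[1:]))


if __name__ == "__main__":
    # "Frequency" row: "three times" should yield three spans.
    horn = "A train horn at 0.0s-0.2s, 2.9-3.6s and 5.3-7.8s"
    print(len(parse_spans(horn)))

    # "Order" row: the first event should end before the second begins.
    speech = "A man speaks at 0.8-9.4s and digital beeps at 9.4-10.0s."
    print(is_sequential(parse_spans(speech)))
```

Counting the parsed spans gives a quick check on the frequency rows, and the ordering test gives a quick check on the order rows. The sketch does not attribute spans to individual events, so it cannot tell which sound a given span belongs to.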