Commit 4d54644
Committed by: hfw
Parent(s): a884350

update audio demo

Files changed:
- README.md +4 -4
- assets/audio_understanding.mp3 +0 -0
- assets/mimick.wav +3 -0
README.md
CHANGED
@@ -1127,7 +1127,7 @@ else:
 `Mimick` task reflects a model's end-to-end speech modeling capability. The model takes audio input, and outputs an ASR transcription and subsequently reconstructs the original audio with high similarity. The higher the similarity between the reconstructed audio and the original audio, the stronger the model's foundational capability in end-to-end speech modeling.
 ```python
 mimick_prompt = "Please repeat each user's speech, including voice style and speech content."
-audio_input, _ = librosa.load('
+audio_input, _ = librosa.load('assets/mimick.wav', sr=16000, mono=True)
 msgs = [{'role': 'user', 'content': [mimick_prompt,audio_input]}]
 
 res = model.chat(
@@ -1155,7 +1155,7 @@ ref_audio, _ = librosa.load('assets/demo.wav', sr=16000, mono=True) # load the r
 
 Audio Assistant: # With this mode, model will speak with the voice in ref_audio as a AI assistant. (Stable and more suitable for general conversation)
 sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='audio_assistant', language='en')
-user_question = {'role': 'user', 'content': [librosa.load('
+user_question = {'role': 'user', 'content': [librosa.load('assets/qa.wav', sr=16000, mono=True)[0]]} # Try to ask something by recording it in 'xxx.wav'!!!
 ```
 ```python
 msgs = [sys_prompt, user_question]
@@ -1205,8 +1205,8 @@ General Audio:
 Audio Caption: Summarize the main content of the audio.
 Sound Scene Tagging: Utilize one keyword to convey the audio's content or the associated scene.
 '''
-task_prompt = "" # Choose the task prompt above
-audio_input, _ = librosa.load('
+task_prompt = "Summarize the main content of the audio.\n" # Choose the task prompt above
+audio_input, _ = librosa.load('assets/audio_understanding.mp3', sr=16000, mono=True)
 
 msgs = [{'role': 'user', 'content': [task_prompt,audio_input]}]
 
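
Review note: the three changed README lines share one pattern. Each loads a clip with `librosa.load(path, sr=16000, mono=True)`, keeps only the waveform (the first element of the returned tuple), and places it in the `content` list of a user message, optionally after a text prompt. A minimal sketch of that pattern follows. The helper name `build_user_msg` and its `loader` parameter are hypothetical and are not part of the README or the model's API; `loader` exists only so the helper can be exercised without audio files.

```python
def build_user_msg(audio_path, prompt=None, loader=None, sr=16000):
    """Build a user message dict in the shape used by the README examples.

    Hypothetical helper for illustration only.
    """
    if loader is None:
        # Imported lazily so the helper can be tested without librosa installed.
        import librosa
        loader = librosa.load
    # librosa.load returns (waveform, sampling_rate); only the waveform is used.
    waveform, _ = loader(audio_path, sr=sr, mono=True)
    content = [waveform] if prompt is None else [prompt, waveform]
    return {'role': 'user', 'content': content}


# Usage mirroring the README lines touched by this commit:
# msgs = [build_user_msg('assets/mimick.wav', mimick_prompt)]
# user_question = build_user_msg('assets/qa.wav')
```

Resampling to 16 kHz mono at load time keeps every example consistent with the `sr=16000, mono=True` arguments the README already uses for `ref_audio`.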
assets/audio_understanding.mp3
ADDED
Binary file (321 kB)
assets/mimick.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dbb0860cb4dd7c7003b6f0406299fc7c0febc5c6a990e1c670d29b763e84e7ed
+size 384046
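
Review note: the content added for `assets/mimick.wav` is a Git LFS pointer, not the audio itself. It records the spec version, the SHA-256 object ID, and the size in bytes of the real file. The sketch below shows one way to check a downloaded file against such a pointer, assuming the pointer is a set of `key value` lines as in this commit. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, not part of any Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the raw bytes have the size and hash recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:dbb0860cb4dd7c7003b6f0406299fc7c0febc5c6a990e1c670d29b763e84e7ed
size 384046
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
# After fetching the real file:
# with open('assets/mimick.wav', 'rb') as f:
#     ok = matches_pointer(f.read(), pointer)
```

Checking the size before hashing rejects a pointer file mistakenly left in place of the audio without reading it twice.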