happyme531 committed on
Commit
6cec077
·
verified ·
1 Parent(s): 06b7a27

Upload 13 files

.gitattributes CHANGED
@@ -34,3 +34,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  bert/chinese-roberta-wwm-ext-large/model.rknn filter=lfs diff=lfs merge=lfs -text
+ onnx/lx/lx_dec.rknn filter=lfs diff=lfs merge=lfs -text
+ onnx/lx/lx_flow.rknn filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,59 @@
- ---
- license: agpl-3.0
- ---
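+ # Bert-VITS2-RKNN2
+
+ Deploy the Bert-VITS2 text-to-speech model with RKNN2!
+
+ - Inference speed: generating 512,000 samples takes about 2.6 seconds, about 3x speed
+ - Memory usage: about 2.3 GB
+
+ ## Usage
+
+ 1. Clone the project locally
+
+ 2. Install the dependencies
+
+ ```bash
+ # There is no requirements.txt yet; check which dependencies rknn_run.py imports and install them with pip
+ ```
+
+ 3. Change the text you want to turn into audio
+ Open `rknn_run.py`, scroll to the bottom, and edit the `text` variable
+ ```python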
+ # text = "不必说碧绿的菜畦,光滑的石井栏,高大的皂荚树,紫红的桑葚;也不必说鸣蝉在树叶里长吟,肥胖的黄蜂伏在菜花上,轻捷的叫天子(云雀)忽然从草间直窜向云霄里去了。单是周围的短短的泥墙根一带,就有无限趣味。油蛉在这里低唱, 蟋蟀们在这里弹琴。翻开断砖来,有时会遇见蜈蚣;还有斑蝥,倘若用手指按住它的脊梁,便会“啪”的一声,从后窍喷出一阵烟雾。何首乌藤和木莲藤缠络着,木莲有莲房一般的果实,何首乌有臃肿的根。有人说,何首乌根是有像人形的,吃了便可以成仙,我于是常常拔它起来,牵连不断地拔起来,也曾因此弄坏了泥墙,却从来没有见过有一块根像人样。如果不怕刺,还可以摘到覆盆子,像小珊瑚珠攒成的小球,又酸又甜,色味都比桑葚要好得远。"
+ text = "我个人认为,这个意大利面就应该拌42号混凝土,因为这个螺丝钉的长度,它很容易会直接影响到挖掘机的扭矩你知道吧。你往里砸的时候,一瞬间它就会产生大量的高能蛋白,俗称ufo,会严重影响经济的发展,甚至对整个太平洋以及充电器都会造成一定的核污染。你知道啊?再者说,根据这个勾股定理,你可以很容易地推断出人工饲养的东条英机,它是可以捕获野生的三角函数的。所以说这个秦始皇的切面是否具有放射性啊,特朗普的N次方是否含有沉淀物,都不影响这个沃尔玛跟维尔康在南极会合。"
+ ```
+
+ 4. Run
+
+ ```bash
+ python rknn_run.py
+ ```
+
+ 5. The audio is written to `output.wav`
+
+ ## Model conversion
+
+ - Converting the BERT model:
+   + PyTorch to ONNX: run `optimum-cli export onnx --task feature-extraction --model bert/chinese-roberta-wwm-ext-large/ --output bert/chinese-roberta-wwm-ext-large/model.onnx`
+   + ONNX to RKNN: see `bert/chinese-roberta-wwm-ext-large/export_rknn.py`
+   + Check that the model's `seq_len` matches the tokenizer's `max_length` in `rknn_run.py`
+   ```python
+   inputs = tokenizer(text, return_tensors="np",padding="max_length",truncation=True,max_length=256)
+   ```
+ - Converting the VITS model:
+   + PyTorch to ONNX: see `export_onnx.py` in the original project
+   + ONNX to RKNN: see `onnx/lx/rknn_convert.py`
+   + Check that `input_len` matches `flow_dec_input_len` in `rknn_run.py`
+   + The flow and dec models take the longest to run; the other models are very fast and do not need to be converted
+   + After conversion the flow model is slower than the original ONNX model, and the model file also appears to grow noticeably, so converting it is not recommended
+
+ ## Known issues
+ - Only Chinese is supported
+ - The flow model cannot make effective use of NPU acceleration
+ - Because the NPU can only process fixed-length input, the text has to be split, but it is not yet clear how best to do this; sometimes a sentence is cut off before it has been read out in full
+ - Emotion control and similar features are not implemented
+ - There is really no need to install the full Hugging Face Transformers library just for the tokenizer, and it also pulls in a completely unused PyTorch that takes up 2 GB
+
+ ## References
+ - [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
+ - [chinese-roberta-wwm-ext-large](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large)
+ - [optimum](https://github.com/huggingface/optimum)
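The README notes that text has to be split because the NPU takes fixed-length input, and that the right way to split is still open. One possible approach, sketched here under stated assumptions, is to split on sentence-ending punctuation and greedily pack whole sentences into chunks under a character budget. `split_text` and `max_chars` are hypothetical names that do not appear in `rknn_run.py`, and a character budget is only a rough proxy for the real limit, which is the number of frames after duration prediction.

```python
import re
from typing import List

# Zero-width split points right after sentence-ending punctuation
# (full-width and ASCII), so the punctuation stays attached to its sentence.
_SENTENCE_END = re.compile(r"(?<=[。!?;…!?;])")


def split_text(text: str, max_chars: int = 50) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-sentence; the caller still has to decide how to handle that case.
    """
    sentences = [s for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text("你好。今天天气不错!我们走吧?", max_chars=8):
        print(chunk)
```

Each chunk could then be synthesized separately and the resulting waveforms concatenated. Whether that sounds natural at chunk boundaries is untested here.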
onnx/lx.json ADDED
@@ -0,0 +1,147 @@
+ {
+   "Folder": "lx",
+   "Name": "lx",
+   "Type": "BertVits",
+   "Symbol": [
+     "_",
+     "AA",
+     "E",
+     "EE",
+     "En",
+     "N",
+     "OO",
+     "V",
+     "a",
+     "a:",
+     "aa",
+     "ae",
+     "ah",
+     "ai",
+     "an",
+     "ang",
+     "ao",
+     "aw",
+     "ay",
+     "b",
+     "by",
+     "c",
+     "ch",
+     "d",
+     "dh",
+     "dy",
+     "e",
+     "e:",
+     "eh",
+     "ei",
+     "en",
+     "eng",
+     "er",
+     "ey",
+     "f",
+     "g",
+     "gy",
+     "h",
+     "hh",
+     "hy",
+     "i",
+     "i0",
+     "i:",
+     "ia",
+     "ian",
+     "iang",
+     "iao",
+     "ie",
+     "ih",
+     "in",
+     "ing",
+     "iong",
+     "ir",
+     "iu",
+     "iy",
+     "j",
+     "jh",
+     "k",
+     "ky",
+     "l",
+     "m",
+     "my",
+     "n",
+     "ng",
+     "ny",
+     "o",
+     "o:",
+     "ong",
+     "ou",
+     "ow",
+     "oy",
+     "p",
+     "py",
+     "q",
+     "r",
+     "ry",
+     "s",
+     "sh",
+     "t",
+     "th",
+     "ts",
+     "ty",
+     "u",
+     "u:",
+     "ua",
+     "uai",
+     "uan",
+     "uang",
+     "uh",
+     "ui",
+     "un",
+     "uo",
+     "uw",
+     "v",
+     "van",
+     "ve",
+     "vn",
+     "w",
+     "x",
+     "y",
+     "z",
+     "zh",
+     "zy",
+     "!",
+     "?",
+     "\u2026",
+     ",",
+     ".",
+     "'",
+     "-",
+     "SP",
+     "UNK"
+   ],
+   "Cleaner": "",
+   "Rate": 44100,
+   "CharaMix": true,
+   "Characters": [
+     "\u695a\u7559\u9999"
+   ],
+   "LanguageMap": {
+     "ZH": [
+       0,
+       0
+     ],
+     "JP": [
+       1,
+       6
+     ],
+     "EN": [
+       2,
+       8
+     ]
+   },
+   "Dict": "BasicDict",
+   "BertPath": [
+     "chinese-roberta-wwm-ext-large",
+     "deberta-v2-large-japanese",
+     "bert-base-japanese-v3"
+   ],
+   "Clap": false,
+   "BertSize": 1024
+ }
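The `Symbol` list in `onnx/lx.json` reads like a phoneme inventory in which the list position serves as the phoneme id, which is a common convention in VITS-style front ends. The sketch below works under that assumption. `build_symbol_table` and the shortened `example_config` are hypothetical and are not taken from `rknn_run.py`.

```python
import json
from typing import Dict, List


def build_symbol_table(config: dict) -> Dict[str, int]:
    """Map each entry of the "Symbol" list to its position in the list."""
    symbols: List[str] = config["Symbol"]
    return {symbol: index for index, symbol in enumerate(symbols)}


# A shortened stand-in for onnx/lx.json that only uses keys present in the file.
example_config = json.loads(
    '{"Symbol": ["_", "AA", "E", "a", "b", "SP", "UNK"], "Rate": 44100, "BertSize": 1024}'
)

table = build_symbol_table(example_config)
# Unknown phonemes fall back to the "UNK" entry (an assumption about intended use).
phoneme_ids = [table.get(p, table["UNK"]) for p in ["b", "a", "zz"]]
print(phoneme_ids)  # [4, 3, 6]
```

With the real file, `json.load(open("onnx/lx.json", encoding="utf-8"))` would replace the inline string.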
onnx/lx/lx_dec.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2fb35e9eb1578cd1868abb75c7af19399e57e18ed531b9518a2776f733d4f98f
+ size 58678416
onnx/lx/lx_dec.rknn ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5a72907bb2baa20aa928fd703a7d166f59513c20294a4070282f1448c249f335
+ size 73718997
onnx/lx/lx_dp.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:51a264afd5abfb164a945a0a64918336bd47e2ceaa2978df729add93b79acb2a
+ size 1615226
onnx/lx/lx_emb.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ba71c3a9cfc7ba571cc29a2a116ebd38454119b85e314f3df11d54f3bd2d218
+ size 919367
onnx/lx/lx_enc_p.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:51068e79be8e0dd278e1c726b6cb85a1330879859adf6176b58ff1c559a27a39
+ size 70750621
onnx/lx/lx_flow.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ca6afa9ecc597f6b8cee85ccbd7e620f0492239cf52cf2e0975d6aba3a227395
+ size 120541444
onnx/lx/lx_flow.rknn ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:010f006570c64239177563bcadd80a925c5ddc1251a78023766fea09604a8f90
+ size 232704504
onnx/lx/lx_sdp.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:de52263f189b6ee6c7c92b570cf8951702eab3b3356b183307fc01c7777c92e8
+ size 8125440
onnx/lx/rknn_convert.py ADDED
@@ -0,0 +1,111 @@
+ #!/usr/bin/env python
+ # coding: utf-8
+
+ # In[1]:
+
+
+ import os
+ import urllib
+ import traceback
+ import time
+ import sys
+ import numpy as np
+ import cv2
+ from rknn.api import RKNN
+ from math import exp
+ from sys import exit
+
+ os.chdir(os.path.dirname(os.path.abspath(__file__)))
+
+ model_name_base = "lx"
+
+ # set input length
+ input_len = 1024
+
+
+ sample_rate = 44100
+ print(f"Current model output length: {input_len * 512 / sample_rate * 1000} ms")
+
+
+ def convert_flow():
+     rknn = RKNN(verbose=True)
+
+     ONNX_MODEL=f"{model_name_base}_flow.onnx"
+     RKNN_MODEL=ONNX_MODEL.replace(".onnx",".rknn")
+     DATASET="dataset.txt"
+     QUANTIZE=False
+     detailed_performance_log = True
+
+     # pre-process config
+     print('--> Config model')
+     rknn.config(quantized_algorithm='normal', quantized_method='channel', target_platform='rk3588', optimization_level=3)
+     print('done')
+
+     # Load ONNX model
+     print('--> Loading model')
+     ret = rknn.load_onnx(model=ONNX_MODEL,
+                          inputs=["z_p", "y_mask", "g"],
+                          input_size_list=[[1, 192, input_len], [1, 1, input_len], [1, 256, 1]])
+     if ret != 0:
+         print('Load model failed!')
+         exit(ret)
+     print('done')
+
+     # Build model
+     print('--> Building model')
+     ret = rknn.build(do_quantization=QUANTIZE, dataset=DATASET, rknn_batch_size=None)
+     if ret != 0:
+         print('Build model failed!')
+         exit(ret)
+     print('done')
+
+     # export
+     print('--> Export RKNN model')
+     ret = rknn.export_rknn(RKNN_MODEL)
+     if ret != 0:
+         print('Export RKNN model failed!')
+         exit(ret)
+     print('done')
+
+ def convert_dec():
+     rknn = RKNN(verbose=True)
+
+     ONNX_MODEL=f"{model_name_base}_dec.onnx"
+     RKNN_MODEL=ONNX_MODEL.replace(".onnx",".rknn")
+     DATASET="dataset.txt"
+     QUANTIZE=False
+     detailed_performance_log = True
+
+     # pre-process config
+     print('--> Config model')
+     rknn.config(quantized_algorithm='normal', quantized_method='channel', target_platform='rk3588', optimization_level=3)
+     print('done')
+
+     # Load ONNX model
+     print('--> Loading model')
+     ret = rknn.load_onnx(model=ONNX_MODEL,
+                          inputs=["z_in", "g"],
+                          input_size_list=[[1, 192, input_len], [1, 256, 1]])
+     if ret != 0:
+         print('Load model failed!')
+         exit(ret)
+     print('done')
+
+     # Build model
+     print('--> Building model')
+     ret = rknn.build(do_quantization=QUANTIZE, dataset=DATASET, rknn_batch_size=None)
+     if ret != 0:
+         print('Build model failed!')
+         exit(ret)
+     print('done')
+
+     # export
+     print('--> Export RKNN model')
+     ret = rknn.export_rknn(RKNN_MODEL)
+     if ret != 0:
+         print('Export RKNN model failed!')
+         exit(ret)
+     print('done')
+
+ convert_flow()
+ convert_dec()
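The conversion script prints the audio length that a fixed `input_len` corresponds to, using `input_len * 512 / sample_rate * 1000`. The arithmetic can be sketched in both directions. `HOP_LENGTH` is a label chosen here for the factor 512 that the script uses, and `frames_for_duration` is a hypothetical helper that is not part of the commit.

```python
import math

HOP_LENGTH = 512     # output samples per input frame, from `input_len * 512` in the script
SAMPLE_RATE = 44100  # matches `sample_rate` in the script and "Rate" in onnx/lx.json


def output_duration_ms(input_len: int) -> float:
    """Audio length in milliseconds produced by a model built for input_len frames."""
    return input_len * HOP_LENGTH / SAMPLE_RATE * 1000


def frames_for_duration(seconds: float) -> int:
    """Smallest input_len whose output covers at least the given number of seconds."""
    return math.ceil(seconds * SAMPLE_RATE / HOP_LENGTH)


print(round(output_duration_ms(1024), 1))  # 11888.6
print(frames_for_duration(5.0))            # 431
```

So the default `input_len = 1024` caps a single inference at just under 12 seconds of audio; longer text has to be split or it gets truncated.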
opencpop-strict.txt ADDED
@@ -0,0 +1,429 @@
+ a AA a
+ ai AA ai
+ an AA an
+ ang AA ang
+ ao AA ao
+ ba b a
+ bai b ai
+ ban b an
+ bang b ang
+ bao b ao
+ bei b ei
+ ben b en
+ beng b eng
+ bi b i
+ bian b ian
+ biao b iao
+ bie b ie
+ bin b in
+ bing b ing
+ bo b o
+ bu b u
+ ca c a
+ cai c ai
+ can c an
+ cang c ang
+ cao c ao
+ ce c e
+ cei c ei
+ cen c en
+ ceng c eng
+ cha ch a
+ chai ch ai
+ chan ch an
+ chang ch ang
+ chao ch ao
+ che ch e
+ chen ch en
+ cheng ch eng
+ chi ch ir
+ chong ch ong
+ chou ch ou
+ chu ch u
+ chua ch ua
+ chuai ch uai
+ chuan ch uan
+ chuang ch uang
+ chui ch ui
+ chun ch un
+ chuo ch uo
+ ci c i0
+ cong c ong
+ cou c ou
+ cu c u
+ cuan c uan
+ cui c ui
+ cun c un
+ cuo c uo
+ da d a
+ dai d ai
+ dan d an
+ dang d ang
+ dao d ao
+ de d e
+ dei d ei
+ den d en
+ deng d eng
+ di d i
+ dia d ia
+ dian d ian
+ diao d iao
+ die d ie
+ ding d ing
+ diu d iu
+ dong d ong
+ dou d ou
+ du d u
+ duan d uan
+ dui d ui
+ dun d un
+ duo d uo
+ e EE e
+ ei EE ei
+ en EE en
+ eng EE eng
+ er EE er
+ fa f a
+ fan f an
+ fang f ang
+ fei f ei
+ fen f en
+ feng f eng
+ fo f o
+ fou f ou
+ fu f u
+ ga g a
+ gai g ai
+ gan g an
+ gang g ang
+ gao g ao
+ ge g e
+ gei g ei
+ gen g en
+ geng g eng
+ gong g ong
+ gou g ou
+ gu g u
+ gua g ua
+ guai g uai
+ guan g uan
+ guang g uang
+ gui g ui
+ gun g un
+ guo g uo
+ ha h a
+ hai h ai
+ han h an
+ hang h ang
+ hao h ao
+ he h e
+ hei h ei
+ hen h en
+ heng h eng
+ hong h ong
+ hou h ou
+ hu h u
+ hua h ua
+ huai h uai
+ huan h uan
+ huang h uang
+ hui h ui
+ hun h un
+ huo h uo
+ ji j i
+ jia j ia
+ jian j ian
+ jiang j iang
+ jiao j iao
+ jie j ie
+ jin j in
+ jing j ing
+ jiong j iong
+ jiu j iu
+ ju j v
+ jv j v
+ juan j van
+ jvan j van
+ jue j ve
+ jve j ve
+ jun j vn
+ jvn j vn
+ ka k a
+ kai k ai
+ kan k an
+ kang k ang
+ kao k ao
+ ke k e
+ kei k ei
+ ken k en
+ keng k eng
+ kong k ong
+ kou k ou
+ ku k u
+ kua k ua
+ kuai k uai
+ kuan k uan
+ kuang k uang
+ kui k ui
+ kun k un
+ kuo k uo
+ la l a
+ lai l ai
+ lan l an
+ lang l ang
+ lao l ao
+ le l e
+ lei l ei
+ leng l eng
+ li l i
+ lia l ia
+ lian l ian
+ liang l iang
+ liao l iao
+ lie l ie
+ lin l in
+ ling l ing
+ liu l iu
+ lo l o
+ long l ong
+ lou l ou
+ lu l u
+ luan l uan
+ lun l un
+ luo l uo
+ lv l v
+ lve l ve
+ ma m a
+ mai m ai
+ man m an
+ mang m ang
+ mao m ao
+ me m e
+ mei m ei
+ men m en
+ meng m eng
+ mi m i
+ mian m ian
+ miao m iao
+ mie m ie
+ min m in
+ ming m ing
+ miu m iu
+ mo m o
+ mou m ou
+ mu m u
+ na n a
+ nai n ai
+ nan n an
+ nang n ang
+ nao n ao
+ ne n e
+ nei n ei
+ nen n en
+ neng n eng
+ ni n i
+ nian n ian
+ niang n iang
+ niao n iao
+ nie n ie
+ nin n in
+ ning n ing
+ niu n iu
+ nong n ong
+ nou n ou
+ nu n u
+ nuan n uan
+ nun n un
+ nuo n uo
+ nv n v
+ nve n ve
+ o OO o
+ ou OO ou
+ pa p a
+ pai p ai
+ pan p an
+ pang p ang
+ pao p ao
+ pei p ei
+ pen p en
+ peng p eng
+ pi p i
+ pian p ian
+ piao p iao
+ pie p ie
+ pin p in
+ ping p ing
+ po p o
+ pou p ou
+ pu p u
+ qi q i
+ qia q ia
+ qian q ian
+ qiang q iang
+ qiao q iao
+ qie q ie
+ qin q in
+ qing q ing
+ qiong q iong
+ qiu q iu
+ qu q v
+ qv q v
+ quan q van
+ qvan q van
+ que q ve
+ qve q ve
+ qun q vn
+ qvn q vn
+ ran r an
+ rang r ang
+ rao r ao
+ re r e
+ ren r en
+ reng r eng
+ ri r ir
+ rong r ong
+ rou r ou
+ ru r u
+ rua r ua
+ ruan r uan
+ rui r ui
+ run r un
+ ruo r uo
+ sa s a
+ sai s ai
+ san s an
+ sang s ang
+ sao s ao
+ se s e
+ sen s en
+ seng s eng
+ sha sh a
+ shai sh ai
+ shan sh an
+ shang sh ang
+ shao sh ao
+ she sh e
+ shei sh ei
+ shen sh en
+ sheng sh eng
+ shi sh ir
+ shou sh ou
+ shu sh u
+ shua sh ua
+ shuai sh uai
+ shuan sh uan
+ shuang sh uang
+ shui sh ui
+ shun sh un
+ shuo sh uo
+ si s i0
+ song s ong
+ sou s ou
+ su s u
+ suan s uan
+ sui s ui
+ sun s un
+ suo s uo
+ ta t a
+ tai t ai
+ tan t an
+ tang t ang
+ tao t ao
+ te t e
+ tei t ei
+ teng t eng
+ ti t i
+ tian t ian
+ tiao t iao
+ tie t ie
+ ting t ing
+ tong t ong
+ tou t ou
+ tu t u
+ tuan t uan
+ tui t ui
+ tun t un
+ tuo t uo
+ wa w a
+ wai w ai
+ wan w an
+ wang w ang
+ wei w ei
+ wen w en
+ weng w eng
+ wo w o
+ wu w u
+ xi x i
+ xia x ia
+ xian x ian
+ xiang x iang
+ xiao x iao
+ xie x ie
+ xin x in
+ xing x ing
+ xiong x iong
+ xiu x iu
+ xu x v
+ xv x v
+ xuan x van
+ xvan x van
+ xue x ve
+ xve x ve
+ xun x vn
+ xvn x vn
+ ya y a
+ yan y En
+ yang y ang
+ yao y ao
+ ye y E
+ yi y i
+ yin y in
+ ying y ing
+ yo y o
+ yong y ong
+ you y ou
+ yu y v
+ yv y v
+ yuan y van
+ yvan y van
+ yue y ve
+ yve y ve
+ yun y vn
+ yvn y vn
+ za z a
+ zai z ai
+ zan z an
+ zang z ang
+ zao z ao
+ ze z e
+ zei z ei
+ zen z en
+ zeng z eng
+ zha zh a
+ zhai zh ai
+ zhan zh an
+ zhang zh ang
+ zhao zh ao
+ zhe zh e
+ zhei zh ei
+ zhen zh en
+ zheng zh eng
+ zhi zh ir
+ zhong zh ong
+ zhou zh ou
+ zhu zh u
+ zhua zh ua
+ zhuai zh uai
+ zhuan zh uan
+ zhuang zh uang
+ zhui zh ui
+ zhun zh un
+ zhuo zh uo
+ zi z i0
+ zong z ong
+ zou z ou
+ zu z u
+ zuan z uan
+ zui z ui
+ zun z un
+ zuo z uo
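Each row of `opencpop-strict.txt` pairs a pinyin syllable with an initial and a final, for example `zhi zh ir`. A minimal sketch of loading such a table into a dictionary follows. `parse_pinyin_table` is a hypothetical helper, and splitting on arbitrary whitespace is an assumption, since the exact separator in the committed file is not visible in this diff.

```python
from typing import Dict, Tuple


def parse_pinyin_table(text: str) -> Dict[str, Tuple[str, str]]:
    """Parse lines of the form '<pinyin> <initial> <final>' into a lookup table."""
    table: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        pinyin, initial, final = parts
        table[pinyin] = (initial, final)
    return table


# A few rows copied from the file above.
sample = "a AA a\nzhi zh ir\nju j v\njv j v\nyan y En\n"
table = parse_pinyin_table(sample)
print(table["zhi"])                # ('zh', 'ir')
print(table["ju"] == table["jv"])  # True: both spellings map to the same phonemes
```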
rknn_run.py ADDED
@@ -0,0 +1,1469 @@
+ import numpy as np
+ import onnxruntime as ort
+ from rknnlite.api.rknn_lite import RKNNLite
+ import soundfile as sf
+ from transformers import AutoTokenizer
+ import time
+ import os
+ import re
+ import cn2an
+ from pypinyin import lazy_pinyin, Style
+ from typing import List
+ from typing import Tuple
+ import jieba
+ import jieba.posseg as psg
+
+ def convert_pad_shape(pad_shape):
+     layer = pad_shape[::-1]
+     pad_shape = [item for sublist in layer for item in sublist]
+     return pad_shape
+
+
+ def sequence_mask(length, max_length=None):
+     if max_length is None:
+         max_length = length.max()
+     x = np.arange(max_length, dtype=length.dtype)
+     return np.expand_dims(x, 0) < np.expand_dims(length, 1)
+
+
+ def generate_path(duration, mask):
+     """
+     duration: [b, 1, t_x]
+     mask: [b, 1, t_y, t_x]
+     """
+
+     b, _, t_y, t_x = mask.shape
+     cum_duration = np.cumsum(duration, -1)
+
+     cum_duration_flat = cum_duration.reshape(b * t_x)
+     path = sequence_mask(cum_duration_flat, t_y)
+     path = path.reshape(b, t_x, t_y)
+     path = path ^ np.pad(path, ((0, 0), (1, 0), (0, 0)))[:, :-1]
+     path = np.expand_dims(path, 1).transpose(0, 1, 3, 2)
+     return path
+
+
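Review note: `generate_path` expands per-phoneme durations into a hard alignment matrix, where each output frame is assigned to exactly one input phoneme. A small worked example may make this easier to follow. The two helper functions are copied from the diff above so the snippet runs on its own, and the durations are made-up illustrative values.

```python
import numpy as np


def sequence_mask(length, max_length=None):
    if max_length is None:
        max_length = length.max()
    x = np.arange(max_length, dtype=length.dtype)
    return np.expand_dims(x, 0) < np.expand_dims(length, 1)


def generate_path(duration, mask):
    b, _, t_y, t_x = mask.shape
    cum_duration = np.cumsum(duration, -1)
    cum_duration_flat = cum_duration.reshape(b * t_x)
    # Row i marks all frames before the END of phoneme i ...
    path = sequence_mask(cum_duration_flat, t_y)
    path = path.reshape(b, t_x, t_y)
    # ... and XOR with the previous row leaves only the frames OF phoneme i.
    path = path ^ np.pad(path, ((0, 0), (1, 0), (0, 0)))[:, :-1]
    return np.expand_dims(path, 1).transpose(0, 1, 3, 2)


# Three phonemes lasting 2, 1 and 3 frames: 6 output frames in total.
duration = np.array([[[2.0, 1.0, 3.0]]])  # [b=1, 1, t_x=3]
mask = np.ones((1, 1, 6, 3))              # [b, 1, t_y=6, t_x=3]
path = generate_path(duration, mask)

print(path.shape)  # (1, 1, 6, 3)
print(path[0, 0].astype(int))
# [[1 0 0]
#  [1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [0 0 1]
#  [0 0 1]]
```

Each column sums to that phoneme's duration, which is why multiplying by this matrix repeats `m_p` and `logs_p` along the time axis.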
+ class InferenceSession:
+     def __init__(self, path, Providers=["CPUExecutionProvider"]):
+         ort_config = ort.SessionOptions()
+         ort_config.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
+         ort_config.intra_op_num_threads = 4
+         ort_config.inter_op_num_threads = 4
+         self.enc = ort.InferenceSession(path["enc"], providers=Providers, sess_options=ort_config)
+         self.emb_g = ort.InferenceSession(path["emb_g"], providers=Providers, sess_options=ort_config)
+         self.dp = ort.InferenceSession(path["dp"], providers=Providers, sess_options=ort_config)
+         self.sdp = ort.InferenceSession(path["sdp"], providers=Providers, sess_options=ort_config)
+         # The flow model runs faster with ONNX than with RKNN
+         # self.flow = RKNNLite(verbose=False)
+         # self.flow.load_rknn(path["flow"])
+         # self.flow.init_runtime(core_mask=RKNNLite.NPU_CORE_1)
+         self.flow = ort.InferenceSession(path["flow"], providers=Providers, sess_options=ort_config)
+         self.dec = RKNNLite(verbose=False)
+         self.dec.load_rknn(path["dec"])
+         self.dec.init_runtime()
+         # self.dec = ort.InferenceSession(path["dec"], providers=Providers, sess_options=ort_config)
+
+     def __call__(
+         self,
+         seq,
+         tone,
+         language,
+         bert_zh,
+         bert_jp,
+         bert_en,
+         vqidx,
+         sid,
+         seed=114514,
+         seq_noise_scale=0.8,
+         sdp_noise_scale=0.6,
+         length_scale=1.0,
+         sdp_ratio=0.0,
+         rknn_pad_to = 1024
+     ):
+         if seq.ndim == 1:
+             seq = np.expand_dims(seq, 0)
+         if tone.ndim == 1:
+             tone = np.expand_dims(tone, 0)
+         if language.ndim == 1:
+             language = np.expand_dims(language, 0)
+         # Asserting a non-empty tuple is always true, so check the conditions with `and`
+         assert seq.ndim == 2 and tone.ndim == 2 and language.ndim == 2
+
+         start_time = time.time()
+         g = self.emb_g.run(
+             None,
+             {
+                 "sid": sid.astype(np.int64),
+             },
+         )[0]
+         emb_g_time = time.time() - start_time
+         print(f"emb_g run time: {emb_g_time:.4f} s")
+
+         g = np.expand_dims(g, -1)
+         start_time = time.time()
+         enc_rtn = self.enc.run(
+             None,
+             {
+                 "x": seq.astype(np.int64),
+                 "t": tone.astype(np.int64),
+                 "language": language.astype(np.int64),
+                 "bert_0": bert_zh.astype(np.float32),
+                 "bert_1": bert_jp.astype(np.float32),
+                 "bert_2": bert_en.astype(np.float32),
+                 "g": g.astype(np.float32),
+                 # For version 2.3 models, comment out the two lines below
+                 "vqidx": vqidx.astype(np.int64),
+                 "sid": sid.astype(np.int64),
+             },
+         )
+         enc_time = time.time() - start_time
+         print(f"enc run time: {enc_time:.4f} s")
+
+         x, m_p, logs_p, x_mask = enc_rtn[0], enc_rtn[1], enc_rtn[2], enc_rtn[3]
+         np.random.seed(seed)
+         zinput = np.random.randn(x.shape[0], 2, x.shape[2]) * sdp_noise_scale
+
+         start_time = time.time()
+         sdp_output = self.sdp.run(
+             None, {"x": x, "x_mask": x_mask, "zin": zinput.astype(np.float32), "g": g}
+         )[0]
+         sdp_time = time.time() - start_time
+         print(f"sdp run time: {sdp_time:.4f} s")
+
+         start_time = time.time()
+         dp_output = self.dp.run(None, {"x": x, "x_mask": x_mask, "g": g})[0]
+         dp_time = time.time() - start_time
+         print(f"dp run time: {dp_time:.4f} s")
+
+         logw = sdp_output * (sdp_ratio) + dp_output * (1 - sdp_ratio)
+         w = np.exp(logw) * x_mask * length_scale
+         w_ceil = np.ceil(w)
+         y_lengths = np.clip(np.sum(w_ceil, (1, 2)), a_min=1.0, a_max=100000).astype(
+             np.int64
+         )
+         y_mask = np.expand_dims(sequence_mask(y_lengths, None), 1)
+         attn_mask = np.expand_dims(x_mask, 2) * np.expand_dims(y_mask, -1)
+         attn = generate_path(w_ceil, attn_mask)
+         m_p = np.matmul(attn.squeeze(1), m_p.transpose(0, 2, 1)).transpose(
+             0, 2, 1
+         )  # [b, t', t], [b, t, d] -> [b, d, t']
+         logs_p = np.matmul(attn.squeeze(1), logs_p.transpose(0, 2, 1)).transpose(
+             0, 2, 1
+         )  # [b, t', t], [b, t, d] -> [b, d, t']
+
+         z_p = (
+             m_p
+             + np.random.randn(m_p.shape[0], m_p.shape[1], m_p.shape[2])
+             * np.exp(logs_p)
+             * seq_noise_scale
+         )
+         # truncate to rknn_pad_to
+         actual_len = z_p.shape[2]
+         if actual_len > rknn_pad_to:
+             print("Warning: input length exceeds rknn_pad_to and will be truncated")
+             z_p = z_p[:,:,:rknn_pad_to]
+             y_mask = y_mask[:,:,:rknn_pad_to]
+         else:
+             z_p = np.pad(z_p, ((0, 0), (0, 0), (0, rknn_pad_to - z_p.shape[2])))
+             y_mask = np.pad(y_mask, ((0, 0), (0, 0), (0, rknn_pad_to - y_mask.shape[2])))
+
+         start_time = time.time()
+         z = self.flow.run(
+             None,
+             {
+                 "z_p": z_p.astype(np.float32),
+                 "y_mask": y_mask.astype(np.float32),
+                 "g": g,
+             },
+         )[0]
+         flow_time = time.time() - start_time
+         print(f"flow run time: {flow_time:.4f} s")
+
+         start_time = time.time()
+         dec_output = self.dec.inference([z.astype(np.float32), g])[0]
+         dec_time = time.time() - start_time
+         print(f"dec run time: {dec_time:.4f} s")
+
+         # truncate to actual_len*512
+         return dec_output[:,:,:actual_len*512]
+
+
+
+
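Review note: the step in `__call__` that fits `z_p` and `y_mask` to `rknn_pad_to` frames is what lets a fixed-shape RKNN model accept variable-length text. The same idea can be isolated in a small helper, sketched here as an illustration. `fit_to_length` is a hypothetical name, and returning the count of valid frames is a design choice of this sketch rather than something the commit does.

```python
from typing import Tuple

import numpy as np


def fit_to_length(x: np.ndarray, target_len: int) -> Tuple[np.ndarray, int]:
    """Zero-pad or truncate the last axis of a [b, c, t] array to target_len.

    Returns the fitted array and the number of frames that carry real data,
    so the caller can trim the decoder output afterwards.
    """
    actual_len = x.shape[2]
    if actual_len > target_len:
        # Too long: frames beyond target_len are dropped (audio gets cut off).
        return x[:, :, :target_len], target_len
    padded = np.pad(x, ((0, 0), (0, 0), (0, target_len - actual_len)))
    return padded, actual_len


short, valid_short = fit_to_length(np.ones((1, 192, 10), dtype=np.float32), 16)
long_, valid_long = fit_to_length(np.ones((1, 192, 20), dtype=np.float32), 16)
print(short.shape, valid_short)  # (1, 192, 16) 10
print(long_.shape, valid_long)   # (1, 192, 16) 16
```

The decoder output would then be trimmed to `valid * 512` samples, mirroring the `actual_len*512` slice at the end of `__call__`.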
+ class ToneSandhi:
+     def __init__(self):
+         self.must_neural_tone_words = {
+             "麻烦",
+             "麻利",
+             "鸳鸯",
+             "高粱",
+             "骨头",
+             "骆驼",
+             "马虎",
+             "首饰",
+             "馒头",
+             "馄饨",
+             "风筝",
+             "难为",
+             "队伍",
+             "阔气",
+             "闺女",
+             "门道",
+             "锄头",
+             "铺盖",
+             "铃铛",
+             "铁匠",
+             "钥匙",
+             "里脊",
+             "里头",
+             "部分",
+             "那么",
+             "道士",
+             "造化",
+             "迷糊",
+             "连累",
+             "这么",
+             "这个",
+             "运气",
+             "过去",
+             "软和",
+             "转悠",
+             "踏实",
+             "跳蚤",
+             "跟头",
+             "趔趄",
+             "财主",
+             "豆腐",
+             "讲究",
+             "记性",
+             "记号",
+             "认识",
+             "规矩",
+             "见识",
+             "裁缝",
+             "补丁",
+             "衣裳",
+             "衣服",
+             "衙门",
+             "街坊",
+             "行李",
+             "行当",
+             "蛤蟆",
+             "蘑菇",
+             "薄荷",
+             "葫芦",
+             "葡萄",
+             "萝卜",
+             "荸荠",
+             "苗条",
+             "苗头",
+             "苍蝇",
+             "芝麻",
+             "舒服",
+             "舒坦",
+             "舌头",
+             "自在",
+             "膏药",
+             "脾气",
+             "脑袋",
+             "脊梁",
+             "能耐",
+             "胳膊",
+             "胭脂",
+             "胡萝",
+             "胡琴",
+             "胡同",
+             "聪明",
+             "耽误",
+             "耽搁",
+             "耷拉",
+             "耳朵",
+             "老爷",
+             "老实",
+             "老婆",
+             "老头",
+             "老太",
+             "翻腾",
+             "罗嗦",
+             "罐头",
+             "编辑",
+             "结实",
+             "红火",
+             "累赘",
+             "糨糊",
+             "糊涂",
+             "精神",
+             "粮食",
+             "簸箕",
+             "篱笆",
+             "算计",
+             "算盘",
+             "答应",
+             "笤帚",
+             "笑语",
+             "笑话",
+             "窟窿",
+             "窝囊",
+             "窗户",
+             "稳当",
+             "稀罕",
+             "称呼",
+             "秧歌",
+             "秀气",
+             "秀才",
+             "福气",
+             "祖宗",
+             "砚台",
+             "码头",
+             "石榴",
+             "石头",
+             "石匠",
+             "知识",
+             "眼睛",
+             "眯缝",
+             "眨巴",
+             "眉毛",
+             "相声",
+             "盘算",
+             "白净",
+             "痢疾",
+             "痛快",
+             "疟疾",
+             "疙瘩",
+             "疏忽",
+             "畜生",
+             "生意",
+             "甘蔗",
+             "琵琶",
+             "琢磨",
+             "琉璃",
+             "玻璃",
+             "玫瑰",
+             "玄乎",
+             "狐狸",
+             "状元",
+             "特务",
+             "牲口",
+             "牙碜",
+             "牌楼",
+             "爽快",
+             "爱人",
+             "热闹",
+             "烧饼",
+             "烟筒",
+             "烂糊",
+             "点心",
+             "炊帚",
+             "灯笼",
+             "火候",
+             "漂亮",
+             "滑溜",
+             "溜达",
+             "温和",
+             "清楚",
+             "消息",
+             "浪头",
+             "活泼",
+             "比方",
+             "正经",
+             "欺负",
+             "模糊",
+             "槟榔",
+             "棺材",
+             "棒槌",
+             "棉花",
+             "核桃",
+             "栅栏",
+             "柴火",
+             "架势",
+             "枕头",
+             "枇杷",
+             "机灵",
+             "本事",
+             "木头",
+             "木匠",
+             "朋友",
+             "月饼",
+             "月亮",
+             "暖和",
+             "明白",
+             "时候",
+             "新鲜",
+             "故事",
+             "收拾",
+             "收成",
+             "提防",
+             "挖苦",
+             "挑剔",
+             "指甲",
+             "指头",
+             "拾掇",
+             "拳头",
+             "拨弄",
+             "招牌",
+             "招呼",
+             "抬举",
+             "护士",
+             "折腾",
+             "扫帚",
+             "打量",
+             "打算",
+             "打点",
+             "打扮",
+             "打听",
+             "打发",
+             "扎实",
+             "扁担",
+             "戒指",
+             "懒得",
+             "意识",
+             "意思",
+             "情形",
+             "悟性",
+             "怪物",
+             "思量",
+             "怎么",
+             "念头",
+             "念叨",
+             "快活",
+             "忙活",
+             "志气",
+             "心思",
+             "得罪",
+             "张罗",
+             "弟兄",
+             "开通",
+             "应酬",
+             "庄稼",
+             "干事",
+             "帮手",
+             "帐篷",
+             "希罕",
+             "师父",
+             "师傅",
+             "巴结",
+             "巴掌",
+             "差事",
+             "工夫",
+             "岁数",
+             "屁股",
+             "尾巴",
+             "少爷",
+             "小气",
+             "小伙",
+             "将就",
+             "对头",
+             "对付",
+             "寡妇",
+             "家伙",
+             "客气",
+             "实在",
+             "官司",
+             "学问",
+             "学生",
+             "字号",
+             "嫁妆",
+             "媳妇",
+             "媒人",
+             "婆家",
+             "娘家",
+             "委屈",
+             "姑娘",
+             "姐夫",
+             "妯娌",
+             "妥当",
+             "妖精",
+             "奴才",
+             "女婿",
+             "头发",
+             "太阳",
+             "大爷",
+             "大方",
+             "大意",
+             "大夫",
+             "多少",
+             "多么",
+             "外甥",
+             "壮实",
+             "地道",
+             "地方",
+             "在乎",
+             "困难",
+             "嘴巴",
+             "嘱咐",
+             "嘟囔",
+             "嘀咕",
+             "喜欢",
+             "喇嘛",
+             "喇叭",
+             "商量",
+             "唾沫",
+             "哑巴",
+             "哈欠",
+             "哆嗦",
+             "咳嗽",
+             "和尚",
+             "告诉",
+             "告示",
+             "含糊",
+             "吓唬",
+             "后头",
+             "名字",
+             "名堂",
+             "合同",
+             "吆喝",
+             "叫唤",
+             "口袋",
+             "厚道",
+             "厉害",
+             "千斤",
+             "包袱",
+             "包涵",
+             "匀称",
+             "勤快",
+             "动静",
+             "动弹",
+             "功夫",
+             "力气",
+             "前头",
+             "刺猬",
+             "刺激",
+             "别扭",
+             "利落",
+             "利索",
+             "利害",
+             "分析",
+             "出息",
+             "凑合",
+             "凉快",
+             "冷战",
+             "冤枉",
+             "冒失",
+             "养活",
+             "关系",
+             "先生",
+             "兄弟",
+             "便宜",
+             "使唤",
+             "佩服",
+             "作坊",
+             "体面",
+             "位置",
+             "似的",
+             "伙计",
+             "休息",
+             "什么",
+             "人家",
+             "亲戚",
+             "亲家",
+             "交情",
+             "云彩",
+             "事情",
+             "买卖",
+             "主意",
+             "丫头",
+             "丧气",
+             "两口",
+             "东西",
+             "东家",
+             "世故",
+             "不由",
+             "不在",
+             "下水",
+             "下巴",
+             "上头",
+             "上司",
+             "丈夫",
+             "丈人",
+             "一辈",
+             "那个",
+             "菩萨",
+             "父亲",
+             "母亲",
+             "咕噜",
+             "邋遢",
+             "费用",
+             "冤家",
+             "甜头",
+             "介绍",
+             "荒唐",
+             "大人",
+             "泥鳅",
+             "幸福",
+             "熟悉",
+             "计划",
+             "扑腾",
+             "蜡烛",
+             "姥爷",
+             "照顾",
+             "喉咙",
+             "吉他",
+             "弄堂",
+             "蚂蚱",
+             "凤凰",
+             "拖沓",
+             "寒碜",
+             "糟蹋",
+             "倒腾",
+             "报复",
+             "逻辑",
+             "盘缠",
+             "喽啰",
+             "牢骚",
+             "咖喱",
+             "扫把",
+             "惦记",
+         }
+         self.must_not_neural_tone_words = {
+             "男子",
+             "女子",
+             "分子",
+             "原子",
+             "量子",
+             "莲子",
+             "石子",
+             "瓜子",
+             "电子",
+             "人人",
+             "虎虎",
+         }
+         self.punc = ":,;。?!“”‘’':,;.?!"
631
+
632
+ # the meaning of jieba pos tag: https://blog.csdn.net/weixin_44174352/article/details/113731041
633
+ # e.g.
634
+ # word: "家里"
635
+ # pos: "s"
636
+ # finals: ['ia1', 'i3']
637
+ def _neural_sandhi(self, word: str, pos: str, finals: List[str]) -> List[str]:
638
+ # reduplicated words tagged as n., v. or a., e.g. 奶奶, 试试, 旺旺
639
+ for j, item in enumerate(word):
640
+ if (
641
+ j - 1 >= 0
642
+ and item == word[j - 1]
643
+ and pos[0] in {"n", "v", "a"}
644
+ and word not in self.must_not_neural_tone_words
645
+ ):
646
+ finals[j] = finals[j][:-1] + "5"
647
+ ge_idx = word.find("个")
648
+ if len(word) >= 1 and word[-1] in "吧呢啊呐噻嘛吖嗨呐哦哒额滴哩哟喽啰耶喔诶":
649
+ finals[-1] = finals[-1][:-1] + "5"
650
+ elif len(word) >= 1 and word[-1] in "的地得":
651
+ finals[-1] = finals[-1][:-1] + "5"
652
+ # e.g. 走了, 看着, 去过
653
+ # elif len(word) == 1 and word in "了着过" and pos in {"ul", "uz", "ug"}:
654
+ # finals[-1] = finals[-1][:-1] + "5"
655
+ elif (
656
+ len(word) > 1
657
+ and word[-1] in "们子"
658
+ and pos in {"r", "n"}
659
+ and word not in self.must_not_neural_tone_words
660
+ ):
661
+ finals[-1] = finals[-1][:-1] + "5"
662
+ # e.g. 桌上, 地下, 家里
663
+ elif len(word) > 1 and word[-1] in "上下里" and pos in {"s", "l", "f"}:
664
+ finals[-1] = finals[-1][:-1] + "5"
665
+ # e.g. 上来, 下去
666
+ elif len(word) > 1 and word[-1] in "来去" and word[-2] in "上下进出回过起开":
667
+ finals[-1] = finals[-1][:-1] + "5"
668
+ # "个" used as a measure word
669
+ elif (
670
+ ge_idx >= 1
671
+ and (
672
+ word[ge_idx - 1].isnumeric()
673
+ or word[ge_idx - 1] in "几有两半多各整每做是"
674
+ )
675
+ ) or word == "个":
676
+ finals[ge_idx] = finals[ge_idx][:-1] + "5"
677
+ else:
678
+ if (
679
+ word in self.must_neural_tone_words
680
+ or word[-2:] in self.must_neural_tone_words
681
+ ):
682
+ finals[-1] = finals[-1][:-1] + "5"
683
+
684
+ word_list = self._split_word(word)
685
+ finals_list = [finals[: len(word_list[0])], finals[len(word_list[0]) :]]
686
+ for i, word in enumerate(word_list):
687
+ # conventional neutral tone in Chinese
688
+ if (
689
+ word in self.must_neural_tone_words
690
+ or word[-2:] in self.must_neural_tone_words
691
+ ):
692
+ finals_list[i][-1] = finals_list[i][-1][:-1] + "5"
693
+ finals = sum(finals_list, [])
694
+ return finals
695
+
696
+ def _bu_sandhi(self, word: str, finals: List[str]) -> List[str]:
697
+ # e.g. 看不懂
698
+ if len(word) == 3 and word[1] == "不":
699
+ finals[1] = finals[1][:-1] + "5"
700
+ else:
701
+ for i, char in enumerate(word):
702
+ # "不" before tone4 should be bu2, e.g. 不怕
703
+ if char == "不" and i + 1 < len(word) and finals[i + 1][-1] == "4":
704
+ finals[i] = finals[i][:-1] + "2"
705
+ return finals
706
+
707
+ def _yi_sandhi(self, word: str, finals: List[str]) -> List[str]:
708
+ # "一" in number sequences, e.g. 一零零, 二一零
709
+ if word.find("一") != -1 and all(
710
+ [item.isnumeric() for item in word if item != "一"]
711
+ ):
712
+ return finals
713
+ # "一" between reduplication words should be yi5, e.g. 看一看
714
+ elif len(word) == 3 and word[1] == "一" and word[0] == word[-1]:
715
+ finals[1] = finals[1][:-1] + "5"
716
+ # when "一" is an ordinal word, it should be yi1
717
+ elif word.startswith("第一"):
718
+ finals[1] = finals[1][:-1] + "1"
719
+ else:
720
+ for i, char in enumerate(word):
721
+ if char == "一" and i + 1 < len(word):
722
+ # "一" before tone4 should be yi2, e.g. 一段
723
+ if finals[i + 1][-1] == "4":
724
+ finals[i] = finals[i][:-1] + "2"
725
+ # "一" before non-tone4 should be yi4, e.g. 一天
726
+ else:
727
+ # if "一" is followed by punctuation, it keeps tone 1
728
+ if word[i + 1] not in self.punc:
729
+ finals[i] = finals[i][:-1] + "4"
730
+ return finals
731
+
732
+ def _split_word(self, word: str) -> List[str]:
733
+ word_list = jieba.cut_for_search(word)
734
+ word_list = sorted(word_list, key=lambda i: len(i), reverse=False)
735
+ first_subword = word_list[0]
736
+ first_begin_idx = word.find(first_subword)
737
+ if first_begin_idx == 0:
738
+ second_subword = word[len(first_subword) :]
739
+ new_word_list = [first_subword, second_subword]
740
+ else:
741
+ second_subword = word[: -len(first_subword)]
742
+ new_word_list = [second_subword, first_subword]
743
+ return new_word_list
744
+
745
+ def _three_sandhi(self, word: str, finals: List[str]) -> List[str]:
746
+ if len(word) == 2 and self._all_tone_three(finals):
747
+ finals[0] = finals[0][:-1] + "2"
748
+ elif len(word) == 3:
749
+ word_list = self._split_word(word)
750
+ if self._all_tone_three(finals):
751
+ # disyllabic + monosyllabic, e.g. 蒙古/包
752
+ if len(word_list[0]) == 2:
753
+ finals[0] = finals[0][:-1] + "2"
754
+ finals[1] = finals[1][:-1] + "2"
755
+ # monosyllabic + disyllabic, e.g. 纸/老虎
756
+ elif len(word_list[0]) == 1:
757
+ finals[1] = finals[1][:-1] + "2"
758
+ else:
759
+ finals_list = [finals[: len(word_list[0])], finals[len(word_list[0]) :]]
760
+ if len(finals_list) == 2:
761
+ for i, sub in enumerate(finals_list):
762
+ # e.g. 所有/人
763
+ if self._all_tone_three(sub) and len(sub) == 2:
764
+ finals_list[i][0] = finals_list[i][0][:-1] + "2"
765
+ # e.g. 好/喜欢
766
+ elif (
767
+ i == 1
768
+ and not self._all_tone_three(sub)
769
+ and finals_list[i][0][-1] == "3"
770
+ and finals_list[0][-1][-1] == "3"
771
+ ):
772
+ finals_list[0][-1] = finals_list[0][-1][:-1] + "2"
773
+ finals = sum(finals_list, [])
774
+ # split a four-character idiom into two words whose length is 2
775
+ elif len(word) == 4:
776
+ finals_list = [finals[:2], finals[2:]]
777
+ finals = []
778
+ for sub in finals_list:
779
+ if self._all_tone_three(sub):
780
+ sub[0] = sub[0][:-1] + "2"
781
+ finals += sub
782
+
783
+ return finals
784
+
785
+ def _all_tone_three(self, finals: List[str]) -> bool:
786
+ return all(x[-1] == "3" for x in finals)
787
+
788
+ # merge "不" with the word that follows it
789
+ # if not merged, jieba sometimes emits "不" as a separate token, which can cause sandhi errors
790
+ def _merge_bu(self, seg: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
791
+ new_seg = []
792
+ last_word = ""
793
+ for word, pos in seg:
794
+ if last_word == "不":
795
+ word = last_word + word
796
+ if word != "不":
797
+ new_seg.append((word, pos))
798
+ last_word = word[:]
799
+ if last_word == "不":
800
+ new_seg.append((last_word, "d"))
801
+ last_word = ""
802
+ return new_seg
803
+
804
+ # function 1: merge "一" with the reduplicated words on its left and right, e.g. "听","一","听" ->"听一听"
805
+ # function 2: merge a standalone "一" with the word that follows it
806
+ # if not merged, jieba sometimes emits "一" as a separate token, which can cause sandhi errors
807
+ # e.g.
808
+ # input seg: [('听', 'v'), ('一', 'm'), ('听', 'v')]
809
+ # output seg: [['听一听', 'v']]
810
+ def _merge_yi(self, seg: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
811
+ new_seg = []
812
+ # function 1
813
+ for i, (word, pos) in enumerate(seg):
814
+ if (
815
+ i - 1 >= 0
816
+ and word == "一"
817
+ and i + 1 < len(seg)
818
+ and seg[i - 1][0] == seg[i + 1][0]
819
+ and seg[i - 1][1] == "v"
820
+ ):
821
+ new_seg[i - 1][0] = new_seg[i - 1][0] + "一" + new_seg[i - 1][0]
822
+ else:
823
+ if (
824
+ i - 2 >= 0
825
+ and seg[i - 1][0] == "一"
826
+ and seg[i - 2][0] == word
827
+ and pos == "v"
828
+ ):
829
+ continue
830
+ else:
831
+ new_seg.append([word, pos])
832
+ seg = new_seg
833
+ new_seg = []
834
+ # function 2
835
+ for i, (word, pos) in enumerate(seg):
836
+ if new_seg and new_seg[-1][0] == "一":
837
+ new_seg[-1][0] = new_seg[-1][0] + word
838
+ else:
839
+ new_seg.append([word, pos])
840
+ return new_seg
841
+
842
+ # the first and the second words are all_tone_three
843
+ def _merge_continuous_three_tones(
844
+ self, seg: List[Tuple[str, str]]
845
+ ) -> List[Tuple[str, str]]:
846
+ new_seg = []
847
+ sub_finals_list = [
848
+ lazy_pinyin(word, neutral_tone_with_five=True, style=Style.FINALS_TONE3)
849
+ for (word, pos) in seg
850
+ ]
851
+ assert len(sub_finals_list) == len(seg)
852
+ merge_last = [False] * len(seg)
853
+ for i, (word, pos) in enumerate(seg):
854
+ if (
855
+ i - 1 >= 0
856
+ and self._all_tone_three(sub_finals_list[i - 1])
857
+ and self._all_tone_three(sub_finals_list[i])
858
+ and not merge_last[i - 1]
859
+ ):
860
+ # if the previous word is a reduplication, do not merge, because reduplications must go through _neural_sandhi
861
+ if (
862
+ not self._is_reduplication(seg[i - 1][0])
863
+ and len(seg[i - 1][0]) + len(seg[i][0]) <= 3
864
+ ):
865
+ new_seg[-1][0] = new_seg[-1][0] + seg[i][0]
866
+ merge_last[i] = True
867
+ else:
868
+ new_seg.append([word, pos])
869
+ else:
870
+ new_seg.append([word, pos])
871
+
872
+ return new_seg
873
+
874
+ def _is_reduplication(self, word: str) -> bool:
875
+ return len(word) == 2 and word[0] == word[1]
876
+
877
+ # the last char of first word and the first char of second word is tone_three
878
+ def _merge_continuous_three_tones_2(
879
+ self, seg: List[Tuple[str, str]]
880
+ ) -> List[Tuple[str, str]]:
881
+ new_seg = []
882
+ sub_finals_list = [
883
+ lazy_pinyin(word, neutral_tone_with_five=True, style=Style.FINALS_TONE3)
884
+ for (word, pos) in seg
885
+ ]
886
+ assert len(sub_finals_list) == len(seg)
887
+ merge_last = [False] * len(seg)
888
+ for i, (word, pos) in enumerate(seg):
889
+ if (
890
+ i - 1 >= 0
891
+ and sub_finals_list[i - 1][-1][-1] == "3"
892
+ and sub_finals_list[i][0][-1] == "3"
893
+ and not merge_last[i - 1]
894
+ ):
895
+ # if the previous word is a reduplication, do not merge, because reduplications must go through _neural_sandhi
896
+ if (
897
+ not self._is_reduplication(seg[i - 1][0])
898
+ and len(seg[i - 1][0]) + len(seg[i][0]) <= 3
899
+ ):
900
+ new_seg[-1][0] = new_seg[-1][0] + seg[i][0]
901
+ merge_last[i] = True
902
+ else:
903
+ new_seg.append([word, pos])
904
+ else:
905
+ new_seg.append([word, pos])
906
+ return new_seg
907
+
908
+ def _merge_er(self, seg: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
909
+ new_seg = []
910
+ for i, (word, pos) in enumerate(seg):
911
+ if i - 1 >= 0 and word == "儿" and seg[i - 1][0] != "#":
912
+ new_seg[-1][0] = new_seg[-1][0] + seg[i][0]
913
+ else:
914
+ new_seg.append([word, pos])
915
+ return new_seg
916
+
917
+ def _merge_reduplication(self, seg: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
918
+ new_seg = []
919
+ for i, (word, pos) in enumerate(seg):
920
+ if new_seg and word == new_seg[-1][0]:
921
+ new_seg[-1][0] = new_seg[-1][0] + seg[i][0]
922
+ else:
923
+ new_seg.append([word, pos])
924
+ return new_seg
925
+
926
+ def pre_merge_for_modify(self, seg: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
927
+ seg = self._merge_bu(seg)
928
+ try:
929
+ seg = self._merge_yi(seg)
930
+ except Exception:
931
+ print("_merge_yi failed")
932
+ seg = self._merge_reduplication(seg)
933
+ seg = self._merge_continuous_three_tones(seg)
934
+ seg = self._merge_continuous_three_tones_2(seg)
935
+ seg = self._merge_er(seg)
936
+ return seg
937
+
938
+ def modified_tone(self, word: str, pos: str, finals: List[str]) -> List[str]:
939
+ finals = self._bu_sandhi(word, finals)
940
+ finals = self._yi_sandhi(word, finals)
941
+ finals = self._neural_sandhi(word, pos, finals)
942
+ finals = self._three_sandhi(word, finals)
943
+ return finals
944
+
945
+
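To make the tone rewriting above concrete, here is a minimal standalone sketch of the "不" rule from `_bu_sandhi`. It is an illustration rather than part of the commit: the function name `bu_sandhi` and the hand-written finals are assumptions, and the real pipeline gets its finals from pypinyin.

```python
from typing import List


def bu_sandhi(word: str, finals: List[str]) -> List[str]:
    """Standalone copy of the "不" rule; works on a copy of `finals`."""
    finals = list(finals)
    if len(word) == 3 and word[1] == "不":
        # "不" in the middle of a three-character word becomes neutral (tone 5)
        finals[1] = finals[1][:-1] + "5"
    else:
        for i, char in enumerate(word):
            # "不" directly before a tone-4 syllable rises to tone 2
            if char == "不" and i + 1 < len(word) and finals[i + 1][-1] == "4":
                finals[i] = finals[i][:-1] + "2"
    return finals


# Hand-written finals (assumed values in FINALS_TONE3 style)
print(bu_sandhi("不怕", ["u4", "a4"]))
print(bu_sandhi("看不懂", ["an4", "u4", "ong3"]))
```

Each final is a string whose last character is the tone digit, so every rule in the class reduces to replacing that last character.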
946
+ punctuation = ["!", "?", "…", ",", ".", "'", "-"]
947
+ pu_symbols = punctuation + ["SP", "UNK"]
948
+ pad = "_"
949
+
950
+ # chinese
951
+ zh_symbols = [
952
+ "E",
953
+ "En",
954
+ "a",
955
+ "ai",
956
+ "an",
957
+ "ang",
958
+ "ao",
959
+ "b",
960
+ "c",
961
+ "ch",
962
+ "d",
963
+ "e",
964
+ "ei",
965
+ "en",
966
+ "eng",
967
+ "er",
968
+ "f",
969
+ "g",
970
+ "h",
971
+ "i",
972
+ "i0",
973
+ "ia",
974
+ "ian",
975
+ "iang",
976
+ "iao",
977
+ "ie",
978
+ "in",
979
+ "ing",
980
+ "iong",
981
+ "ir",
982
+ "iu",
983
+ "j",
984
+ "k",
985
+ "l",
986
+ "m",
987
+ "n",
988
+ "o",
989
+ "ong",
990
+ "ou",
991
+ "p",
992
+ "q",
993
+ "r",
994
+ "s",
995
+ "sh",
996
+ "t",
997
+ "u",
998
+ "ua",
999
+ "uai",
1000
+ "uan",
1001
+ "uang",
1002
+ "ui",
1003
+ "un",
1004
+ "uo",
1005
+ "v",
1006
+ "van",
1007
+ "ve",
1008
+ "vn",
1009
+ "w",
1010
+ "x",
1011
+ "y",
1012
+ "z",
1013
+ "zh",
1014
+ "AA",
1015
+ "EE",
1016
+ "OO",
1017
+ ]
1018
+ num_zh_tones = 6
1019
+
1020
+ # japanese
1021
+ ja_symbols = [
1022
+ "N",
1023
+ "a",
1024
+ "a:",
1025
+ "b",
1026
+ "by",
1027
+ "ch",
1028
+ "d",
1029
+ "dy",
1030
+ "e",
1031
+ "e:",
1032
+ "f",
1033
+ "g",
1034
+ "gy",
1035
+ "h",
1036
+ "hy",
1037
+ "i",
1038
+ "i:",
1039
+ "j",
1040
+ "k",
1041
+ "ky",
1042
+ "m",
1043
+ "my",
1044
+ "n",
1045
+ "ny",
1046
+ "o",
1047
+ "o:",
1048
+ "p",
1049
+ "py",
1050
+ "q",
1051
+ "r",
1052
+ "ry",
1053
+ "s",
1054
+ "sh",
1055
+ "t",
1056
+ "ts",
1057
+ "ty",
1058
+ "u",
1059
+ "u:",
1060
+ "w",
1061
+ "y",
1062
+ "z",
1063
+ "zy",
1064
+ ]
1065
+ num_ja_tones = 2
1066
+
1067
+ # English
1068
+ en_symbols = [
1069
+ "aa",
1070
+ "ae",
1071
+ "ah",
1072
+ "ao",
1073
+ "aw",
1074
+ "ay",
1075
+ "b",
1076
+ "ch",
1077
+ "d",
1078
+ "dh",
1079
+ "eh",
1080
+ "er",
1081
+ "ey",
1082
+ "f",
1083
+ "g",
1084
+ "hh",
1085
+ "ih",
1086
+ "iy",
1087
+ "jh",
1088
+ "k",
1089
+ "l",
1090
+ "m",
1091
+ "n",
1092
+ "ng",
1093
+ "ow",
1094
+ "oy",
1095
+ "p",
1096
+ "r",
1097
+ "s",
1098
+ "sh",
1099
+ "t",
1100
+ "th",
1101
+ "uh",
1102
+ "uw",
1103
+ "V",
1104
+ "w",
1105
+ "y",
1106
+ "z",
1107
+ "zh",
1108
+ ]
1109
+ num_en_tones = 4
1110
+
1111
+ # combine all symbols
1112
+ normal_symbols = sorted(set(zh_symbols + ja_symbols + en_symbols))
1113
+ symbols = [pad] + normal_symbols + pu_symbols
1114
+ sil_phonemes_ids = [symbols.index(i) for i in pu_symbols]
1115
+
1116
+ # combine all tones
1117
+ num_tones = num_zh_tones + num_ja_tones + num_en_tones
1118
+
1119
+ # language maps
1120
+ language_id_map = {"ZH": 0, "JP": 1, "EN": 2}
1121
+ num_languages = len(language_id_map.keys())
1122
+
1123
+ language_tone_start_map = {
1124
+ "ZH": 0,
1125
+ "JP": num_zh_tones,
1126
+ "EN": num_zh_tones + num_ja_tones,
1127
+ }
1128
+
1129
+ current_file_path = os.path.dirname(__file__)
1130
+ pinyin_to_symbol_map = {
1131
+ line.split("\t")[0]: line.strip().split("\t")[1]
1132
+ for line in open(os.path.join(current_file_path, "opencpop-strict.txt")).readlines()
1133
+ }
1134
+
1135
+
1136
+
1137
+
1138
+ rep_map = {
1139
+ ":": ",",
1140
+ ";": ",",
1141
+ ",": ",",
1142
+ "。": ".",
1143
+ "!": "!",
1144
+ "?": "?",
1145
+ "\n": ".",
1146
+ "·": ",",
1147
+ "、": ",",
1148
+ "...": "…",
1149
+ "$": ".",
1150
+ "“": "'",
1151
+ "”": "'",
1152
+ '"': "'",
1153
+ "‘": "'",
1154
+ "’": "'",
1155
+ "(": "'",
1156
+ ")": "'",
1157
+ "(": "'",
1158
+ ")": "'",
1159
+ "《": "'",
1160
+ "》": "'",
1161
+ "【": "'",
1162
+ "】": "'",
1163
+ "[": "'",
1164
+ "]": "'",
1165
+ "—": "-",
1166
+ "~": "-",
1167
+ "~": "-",
1168
+ "「": "'",
1169
+ "」": "'",
1170
+ }
1171
+
1172
+ tone_modifier = ToneSandhi()
1173
+
1174
+
1175
+ def replace_punctuation(text):
1176
+ text = text.replace("嗯", "恩").replace("呣", "母")
1177
+ pattern = re.compile("|".join(re.escape(p) for p in rep_map.keys()))
1178
+
1179
+ replaced_text = pattern.sub(lambda x: rep_map[x.group()], text)
1180
+
1181
+ replaced_text = re.sub(
1182
+ r"[^\u4e00-\u9fa5" + "".join(punctuation) + r"]+", "", replaced_text
1183
+ )
1184
+
1185
+ return replaced_text
1186
+
1187
+
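The two-pass cleanup in `replace_punctuation` can be tried in isolation. The sketch below uses a reduced `rep_map` (an assumption made for brevity; the real table is much longer) and skips the 嗯/呣 substitutions. It first maps full-width punctuation to the ASCII set, then drops everything that is neither a CJK character nor an allowed punctuation mark.

```python
import re

punctuation = ["!", "?", "…", ",", ".", "'", "-"]

# Reduced mapping for illustration only
rep_map = {",": ",", "。": ".", "!": "!", "?": "?", "“": "'", "”": "'"}


def replace_punctuation(text: str) -> str:
    # Pass 1: map every known punctuation mark to its normalized form
    pattern = re.compile("|".join(re.escape(p) for p in rep_map))
    replaced = pattern.sub(lambda m: rep_map[m.group()], text)
    # Pass 2: remove anything outside the CJK range and the allowed punctuation
    return re.sub(r"[^\u4e00-\u9fa5" + "".join(punctuation) + r"]+", "", replaced)


print(replace_punctuation("你好,世界!abc 123"))
```

Note that Latin letters and digits are removed in the second pass, which is why `text_normalize` converts numbers to Chinese numerals before calling this function.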
1188
+ def g2p(text):
1189
+ pattern = r"(?<=[{0}])\s*".format("".join(punctuation))
1190
+ sentences = [i for i in re.split(pattern, text) if i.strip() != ""]
1191
+ phones, tones, word2ph = _g2p(sentences)
1192
+ assert sum(word2ph) == len(phones)
1193
+ assert len(word2ph) == len(text)  # This can occasionally fail; wrap the call in try/except if needed.
1194
+ phones = ["_"] + phones + ["_"]
1195
+ tones = [0] + tones + [0]
1196
+ word2ph = [1] + word2ph + [1]
1197
+ return phones, tones, word2ph
1198
+
1199
+
1200
+ def _get_initials_finals(word):
1201
+ initials = []
1202
+ finals = []
1203
+ orig_initials = lazy_pinyin(word, neutral_tone_with_five=True, style=Style.INITIALS)
1204
+ orig_finals = lazy_pinyin(
1205
+ word, neutral_tone_with_five=True, style=Style.FINALS_TONE3
1206
+ )
1207
+ for c, v in zip(orig_initials, orig_finals):
1208
+ initials.append(c)
1209
+ finals.append(v)
1210
+ return initials, finals
1211
+
1212
+
1213
+ def _g2p(segments):
1214
+ phones_list = []
1215
+ tones_list = []
1216
+ word2ph = []
1217
+ for seg in segments:
1218
+ # Remove all English words from the sentence
1219
+ seg = re.sub("[a-zA-Z]+", "", seg)
1220
+ seg_cut = psg.lcut(seg)
1221
+ initials = []
1222
+ finals = []
1223
+ seg_cut = tone_modifier.pre_merge_for_modify(seg_cut)
1224
+ for word, pos in seg_cut:
1225
+ if pos == "eng":
1226
+ continue
1227
+ sub_initials, sub_finals = _get_initials_finals(word)
1228
+ sub_finals = tone_modifier.modified_tone(word, pos, sub_finals)
1229
+ initials.append(sub_initials)
1230
+ finals.append(sub_finals)
1231
+
1232
+ # assert len(sub_initials) == len(sub_finals) == len(word)
1233
+ initials = sum(initials, [])
1234
+ finals = sum(finals, [])
1235
+ #
1236
+ for c, v in zip(initials, finals):
1237
+ raw_pinyin = c + v
1238
+ # NOTE: post process for pypinyin outputs
1239
+ # we discriminate i, ii and iii
1240
+ if c == v:
1241
+ assert c in punctuation
1242
+ phone = [c]
1243
+ tone = "0"
1244
+ word2ph.append(1)
1245
+ else:
1246
+ v_without_tone = v[:-1]
1247
+ tone = v[-1]
1248
+
1249
+ pinyin = c + v_without_tone
1250
+ assert tone in "12345"
1251
+
1252
+ if c:
1253
+ # multi-part syllable (has an initial)
1254
+ v_rep_map = {
1255
+ "uei": "ui",
1256
+ "iou": "iu",
1257
+ "uen": "un",
1258
+ }
1259
+ if v_without_tone in v_rep_map.keys():
1260
+ pinyin = c + v_rep_map[v_without_tone]
1261
+ else:
1262
+ # single-part syllable (no initial)
1263
+ pinyin_rep_map = {
1264
+ "ing": "ying",
1265
+ "i": "yi",
1266
+ "in": "yin",
1267
+ "u": "wu",
1268
+ }
1269
+ if pinyin in pinyin_rep_map.keys():
1270
+ pinyin = pinyin_rep_map[pinyin]
1271
+ else:
1272
+ single_rep_map = {
1273
+ "v": "yu",
1274
+ "e": "e",
1275
+ "i": "y",
1276
+ "u": "w",
1277
+ }
1278
+ if pinyin[0] in single_rep_map.keys():
1279
+ pinyin = single_rep_map[pinyin[0]] + pinyin[1:]
1280
+
1281
+ assert pinyin in pinyin_to_symbol_map.keys(), (pinyin, seg, raw_pinyin)
1282
+ phone = pinyin_to_symbol_map[pinyin].split(" ")
1283
+ word2ph.append(len(phone))
1284
+
1285
+ phones_list += phone
1286
+ tones_list += [int(tone)] * len(phone)
1287
+ return phones_list, tones_list, word2ph
1288
+
1289
+
1290
+ def text_normalize(text):
1291
+ numbers = re.findall(r"\d+(?:\.?\d+)?", text)
1292
+ for number in numbers:
1293
+ text = text.replace(number, cn2an.an2cn(number), 1)
1294
+ text = replace_punctuation(text)
1295
+ return text
1296
+
1297
+ def get_bert_feature(
1298
+ text,
1299
+ word2ph,
1300
+ style_text=None,
1301
+ style_weight=0.7,
1302
+ ):
1303
+ global bert_model
1304
+
1305
+ # Tokenize the input text
1306
+ inputs = tokenizer(text, return_tensors="np",padding="max_length",truncation=True,max_length=256)
1307
+
1308
+ # Run the BERT model (loaded through the RKNN runtime)
1309
+ start_time = time.time()
1310
+ res = bert_model.inference([inputs["input_ids"], inputs["attention_mask"], inputs["token_type_ids"]])
1311
+ flow_time = time.time() - start_time
1312
+ print(f"bert inference time: {flow_time:.4f} s")
1313
+ # Process the output
1314
+ # res = np.concatenate(res[0], -1)[0]
1315
+ res = res[0][0]
1316
+
1317
+ if style_text:
1318
+ raise NotImplementedError("style_text is not supported yet")  # TODO
1319
+ # style_inputs = tokenizer(style_text, return_tensors="np")
1320
+ # style_onnx_inputs = {name: style_inputs[name] for name in bert_model.get_inputs()}
1321
+ # style_res = bert_model.run(None, style_onnx_inputs)
1322
+ # style_hidden_states = style_res[-1]
1323
+ # style_res = np.concatenate(style_hidden_states[-3:-2], -1)[0]
1324
+ # style_res_mean = style_res.mean(0)
1325
+
1326
+ assert len(word2ph) == len(text) + 2
1327
+ word2phone = word2ph
1328
+ phone_level_feature = []
1329
+ for i in range(len(word2phone)):
1330
+ if style_text:
1331
+ repeat_feature = (
1332
+ res[i].repeat(word2phone[i], 1) * (1 - style_weight)
1333
+ # + style_res_mean.repeat(word2phone[i], 1) * style_weight
1334
+ )
1335
+ else:
1336
+ repeat_feature = np.tile(res[i], (word2phone[i], 1))
1337
+ phone_level_feature.append(repeat_feature)
1338
+
1339
+ phone_level_feature = np.concatenate(phone_level_feature, axis=0)
1340
+
1341
+ return phone_level_feature.T
1342
+
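The loop in `get_bert_feature` repeats each character-level feature vector once per phoneme of that character. A pure-Python stand-in for the `np.tile` and `np.concatenate` combination might look like the sketch below; the helper name `expand_to_phone_level` and the toy one-dimensional features are assumptions, and the result is not transposed as it is in the script.

```python
from typing import List


def expand_to_phone_level(
    char_features: List[List[float]], word2ph: List[int]
) -> List[List[float]]:
    """Repeat the i-th feature vector word2ph[i] times, preserving order."""
    assert len(char_features) == len(word2ph)
    phone_level = []
    for feature, count in zip(char_features, word2ph):
        phone_level.extend([feature] * count)
    return phone_level


# Two characters: the first maps to 2 phonemes, the second to 1
print(expand_to_phone_level([[1.0], [2.0]], [2, 1]))
```

The output has `sum(word2ph)` rows, which is what the later `bert_ori.shape[-1] == len(phone)` assertion checks after the transpose.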
1343
+ def clean_text(text, language):
1344
+ norm_text = text_normalize(text)
1345
+ phones, tones, word2ph = g2p(norm_text)
1346
+ return norm_text, phones, tones, word2ph
1347
+
1348
+
1349
+ def clean_text_bert(text, language):
1350
+ norm_text = text_normalize(text)
1351
+ phones, tones, word2ph = g2p(norm_text)
1352
+ bert = get_bert_feature(norm_text, word2ph)
1353
+ return phones, tones, bert
1354
+
1355
+ _symbol_to_id = {s: i for i, s in enumerate(symbols)}
1356
+
1357
+ def cleaned_text_to_sequence(cleaned_text, tones, language):
1358
+ """Converts a list of phoneme symbols and tones to the ID sequences used by the model.
1359
+ Args:
1360
+ cleaned_text: list of phoneme symbols; tones: per-phoneme tone numbers; language: "ZH", "JP" or "EN"
1361
+ Returns:
1362
+ Tuple (phones, tones, lang_ids): symbol IDs, tones shifted by the language offset, and one language ID per phoneme
1363
+ """
1364
+ phones = [_symbol_to_id[symbol] for symbol in cleaned_text]
1365
+ tone_start = language_tone_start_map[language]
1366
+ tones = [i + tone_start for i in tones]
1367
+ lang_id = language_id_map[language]
1368
+ lang_ids = [lang_id for i in phones]
1369
+ return phones, tones, lang_ids
1370
+
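The tone offset applied in `cleaned_text_to_sequence` keeps the tone IDs of the three languages in disjoint ranges. A small sketch with the same constants (the helper name `offset_tones` is hypothetical):

```python
NUM_ZH_TONES, NUM_JA_TONES, NUM_EN_TONES = 6, 2, 4

# Same layout as language_tone_start_map: ZH first, then JP, then EN
TONE_START = {
    "ZH": 0,
    "JP": NUM_ZH_TONES,
    "EN": NUM_ZH_TONES + NUM_JA_TONES,
}


def offset_tones(tones, language):
    """Shift language-local tone numbers into the shared tone ID space."""
    start = TONE_START[language]
    return [t + start for t in tones]


print(offset_tones([0, 1], "JP"))
print(offset_tones([3], "EN"))
```

With these constants the shared space has 12 tone IDs: 0 to 5 for Chinese, 6 and 7 for Japanese, and 8 to 11 for English.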
1371
+ def text_to_sequence(text, language):
1372
+ norm_text, phones, tones, word2ph = clean_text(text, language)
1373
+ return cleaned_text_to_sequence(phones, tones, language)
1374
+
1375
+ def intersperse(lst, item):
1376
+ result = [item] * (len(lst) * 2 + 1)
1377
+ result[1::2] = lst
1378
+ return result
1379
+
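As an illustration of how `intersperse` interacts with `word2ph` when `add_blank` is enabled, the sketch below shows why each count is doubled and the first one incremented. The helper `add_blanks` and the toy IDs are assumptions made for this example.

```python
def intersperse(lst, item):
    # Place `item` before, between, and after every element of `lst`
    result = [item] * (len(lst) * 2 + 1)
    result[1::2] = lst
    return result


def add_blanks(phones, word2ph):
    """Insert blank ID 0 around every phoneme and rescale word2ph to match."""
    blanked = intersperse(phones, 0)
    new_word2ph = [n * 2 for n in word2ph]
    new_word2ph[0] += 1  # accounts for the extra leading blank
    return blanked, new_word2ph


phones, word2ph = add_blanks([7, 8, 9], [1, 2])
print(phones, word2ph, sum(word2ph) == len(phones))
```

A sequence of n phonemes grows to 2n + 1 entries, so doubling every count and adding one keeps `sum(word2ph)` equal to the new phoneme count.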
1380
+ def get_text(text, language_str, style_text=None, style_weight=0.7, add_blank=False):
1381
+ # Implementation of get_text for the current version
1382
+ norm_text, phone, tone, word2ph = clean_text(text, language_str)
1383
+ phone, tone, language = cleaned_text_to_sequence(phone, tone, language_str)
1384
+
1385
+ if add_blank:
1386
+ phone = intersperse(phone, 0)
1387
+ tone = intersperse(tone, 0)
1388
+ language = intersperse(language, 0)
1389
+ for i in range(len(word2ph)):
1390
+ word2ph[i] = word2ph[i] * 2
1391
+ word2ph[0] += 1
1392
+ bert_ori = get_bert_feature(
1393
+ norm_text, word2ph, style_text, style_weight
1394
+ )
1395
+ del word2ph
1396
+ assert bert_ori.shape[-1] == len(phone), phone
1397
+
1398
+ if language_str == "ZH":
1399
+ bert = bert_ori
1400
+ ja_bert = np.zeros((1024, len(phone)))
1401
+ en_bert = np.zeros((1024, len(phone)))
1402
+ elif language_str == "JP":
1403
+ bert = np.zeros((1024, len(phone)))
1404
+ ja_bert = bert_ori
1405
+ en_bert = np.zeros((1024, len(phone)))
1406
+ elif language_str == "EN":
1407
+ bert = np.zeros((1024, len(phone)))
1408
+ ja_bert = np.zeros((1024, len(phone)))
1409
+ en_bert = bert_ori
1410
+ else:
1411
+ raise ValueError("language_str should be ZH, JP or EN")
1412
+
1413
+ assert bert.shape[-1] == len(
1414
+ phone
1415
+ ), f"Bert seq len {bert.shape[-1]} != {len(phone)}"
1416
+ phone = np.array(phone)
1417
+ tone = np.array(tone)
1418
+ language = np.array(language)
1419
+ return bert, ja_bert, en_bert, phone, tone, language
1420
+
1421
+ if __name__ == "__main__":
1422
+ name = "lx"
1423
+ model_prefix = f"onnx/{name}/{name}_"
1424
+ bert_path = "./bert/chinese-roberta-wwm-ext-large"
1425
+ flow_dec_input_len = 1024
1426
+ model_sample_rate = 44100
1427
+ # text = "不必说碧绿的菜畦,光滑的石井栏,高大的皂荚树,紫红的桑葚;也不必说鸣蝉在树叶里长吟,肥胖的黄蜂伏在菜花上,轻捷的叫天子(云雀)忽然从草间直窜向云霄里去了。单是周围的短短的泥墙根一带,就有无限趣味。油蛉在这里低唱, 蟋蟀们在这里弹琴。翻开断砖来,有时会遇见蜈蚣;还有斑蝥,倘若用手指按住它的脊梁,便会“啪”的一声,从后窍喷出一阵烟雾。何首乌藤和木莲藤缠络着,木莲有莲房一般的果实,何首乌有臃肿的根。有人说,何首乌根是有像人形的,吃了便可以成仙,我于是常常拔它起来,牵连不断地拔起来,也曾因此弄坏了泥墙,却从来没有见过有一块根像人样。如果不怕刺,还可以摘到覆盆子,像小珊瑚珠攒成的小球,又酸又甜,色味都比桑葚要好得远。"
1428
+ text = "我个人认为,这个意大利面就应该拌42号混凝土,因为这个螺丝钉的长度,它很容易会直接影响到挖掘机的扭矩你知道吧。你往里砸的时候,一瞬间它就会产生大量的高能蛋白,俗称ufo,会严重影响经济的发展,甚至对整个太平洋以及充电器都会造成一定的核污染。你知道啊?再者说,根据这个勾股定理,你可以很容易地推断出人工饲养的东条英机,它是可以捕获野生的三角函数的。所以说这个秦始皇的切面是否具有放射性啊,特朗普的N次方是否含有沉淀物,都不影响这个沃尔玛跟维尔康在南极会合。"
1429
+
1430
+ global bert_model,tokenizer
1431
+ tokenizer = AutoTokenizer.from_pretrained(bert_path)
1432
+ bert_model = RKNNLite(verbose=False)
1433
+ bert_model.load_rknn(bert_path + "/model.rknn")
1434
+ bert_model.init_runtime()
1435
+ model = InferenceSession({
1436
+ "enc": model_prefix + "enc_p.onnx",
1437
+ "emb_g": model_prefix + "emb.onnx",
1438
+ "dp": model_prefix + "dp.onnx",
1439
+ "sdp": model_prefix + "sdp.onnx",
1440
+ "flow": model_prefix + "flow.onnx",
1441
+ "dec": model_prefix + "dec.rknn",
1442
+ })
1443
+
1444
+ # Split the text after sentence-ending punctuation
1445
+ text_seg = re.split(r'(?<=[。!?;])', text)
1446
+ output_acc = np.array([0.0])
1447
+
1448
+ for text in text_seg:
1449
+ bert, ja_bert, en_bert, phone, tone, language = get_text(text, "ZH", add_blank=True)
1450
+ bert = np.transpose(bert)
1451
+ ja_bert = np.transpose(ja_bert)
1452
+ en_bert = np.transpose(en_bert)
1453
+
1454
+ sid = np.array([0])
1455
+ vqidx = np.array([0])
1456
+
1457
+ output = model(phone, tone, language, bert, ja_bert, en_bert, vqidx, sid ,
1458
+ rknn_pad_to=flow_dec_input_len,
1459
+ seed=114514,
1460
+ seq_noise_scale=0.8,
1461
+ sdp_noise_scale=0.6,
1462
+ length_scale=1,
1463
+ sdp_ratio=0,
1464
+ )[0,0]
1465
+ output_acc = np.concatenate([output_acc, output])
1466
+ print(f"Generated audio length so far: {len(output_acc) / model_sample_rate:.2f} s")
1467
+
1468
+ sf.write('output.wav', output_acc, model_sample_rate)
1469
+ print("Wrote output.wav")
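One detail of the sentence splitting used in the main block: a lookbehind split keeps the punctuation attached to each sentence, but it also yields an empty trailing piece when the text ends with one of the delimiters. The sketch below uses the same pattern and filters such pieces out. The helper `split_sentences` is hypothetical and not part of the script.

```python
import re


def split_sentences(text: str):
    """Split after sentence-ending punctuation and drop empty pieces."""
    parts = re.split(r"(?<=[。!?;])", text)
    return [p for p in parts if p.strip()]


print(split_sentences("你好。再见!"))
```

Filtering before the synthesis loop would avoid running the model on an empty segment.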