Effective methods for WMT
Back-Translation
zh2en
back-translation with beam search
de2en
back-translation with sampling
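The two decoding modes named above differ in how the reverse model produces synthetic source sentences. Beam search keeps only the highest-scoring hypotheses, while sampling draws each token from the model distribution and so gives more varied output. Below is a minimal sketch on a toy next-token table. `TOY_MODEL`, `beam_search`, and `sample` are hypothetical names for illustration, not part of any WMT system.

```python
import math
import random

# Hypothetical toy "reverse model": next-token probabilities given the previous token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.7, "dog": 0.3},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def beam_search(model, beam_size=2, max_len=5):
    """Return the highest log-probability finished sequence found by the beam."""
    beams = [(0.0, ["<s>"])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for tok, p in model[seq[-1]].items():
                candidates.append((score + math.log(p), seq + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        # Keep the top beam_size candidates; move completed ones to `finished`.
        for score, seq in candidates[:beam_size]:
            (finished if seq[-1] == "</s>" else beams).append((score, seq))
        if not beams:
            break
    best = max(finished or beams, key=lambda c: c[0])
    return [t for t in best[1][1:] if t != "</s>"]


def sample(model, rng, max_len=5):
    """Ancestral sampling: draw each token from the model distribution."""
    seq = ["<s>"]
    for _ in range(max_len):
        toks, probs = zip(*model[seq[-1]].items())
        seq.append(rng.choices(toks, weights=probs, k=1)[0])
        if seq[-1] == "</s>":
            break
    return [t for t in seq[1:] if t != "</s>"]


rng = random.Random(0)
beam_output = beam_search(TOY_MODEL)
sampled_outputs = {tuple(sample(TOY_MODEL, rng)) for _ in range(200)}
```

Here `beam_output` is always the single most probable sentence, whereas `sampled_outputs` contains several distinct sentences.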
DLV
30-layer deep model
Distillation by ensemble teacher
Use the ensemble output of 8 DLV models as training data, then fine-tune each of the 8 models on it
Afterwards, select models again for the ensemble
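A minimal sketch of the data-generation step, assuming the ensemble averages the teachers' next-token distributions and decodes greedily (a real system would more likely use beam search). The teacher interface `teacher(source, prefix) -> dict` and the toy teachers are hypothetical stand-ins for the 8 DLV models. The fine-tuning step itself is not shown.

```python
from collections import defaultdict


def ensemble_next_token(teachers, source, prefix):
    """Arithmetic mean of the teachers' next-token distributions."""
    avg = defaultdict(float)
    for teacher in teachers:
        for tok, p in teacher(source, prefix).items():
            avg[tok] += p / len(teachers)
    return dict(avg)


def ensemble_translate(teachers, source, max_len=20):
    """Greedy decoding with the averaged distribution (assumption: greedy, not beam)."""
    prefix = []
    for _ in range(max_len):
        dist = ensemble_next_token(teachers, source, prefix)
        tok = max(dist, key=dist.get)
        if tok == "</s>":
            break
        prefix.append(tok)
    return prefix


def build_distillation_data(teachers, sources):
    """Pair each source with the ensemble translation; students are fine-tuned on these pairs."""
    return [(src, ensemble_translate(teachers, src)) for src in sources]


def make_toy_teacher(table):
    """Hypothetical teacher that ignores the source and reads from a fixed table."""
    def teacher(source, prefix):
        return table[len(prefix)] if len(prefix) < len(table) else {"</s>": 1.0}
    return teacher


teacher_a = make_toy_teacher([{"good": 0.6, "fine": 0.4}, {"day": 0.9, "</s>": 0.1}])
teacher_b = make_toy_teacher([{"good": 0.3, "fine": 0.7}, {"day": 0.8, "</s>": 0.2}])
distill_data = build_distillation_data([teacher_a, teacher_b], ["src sentence"])
```

In this toy case `teacher_a` alone would start with "good", but the averaged distribution prefers "fine", so the distilled target reflects the ensemble rather than any single model.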
Hypothesis combination
Take the n-best lists from multiple ensemble models and rerank them
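One way to sketch this: pool the n-best lists, drop duplicate hypotheses, and sort by a weighted sum of feature scores. The feature names (`model`, `lm`) and the weights are assumptions made for illustration. The notes do not say which features or which reranker were used.

```python
def rerank(nbest_lists, weights):
    """Pool hypotheses from several n-best lists and sort by weighted feature score."""
    pooled = {}
    for nbest in nbest_lists:
        for hyp, feats in nbest:
            # For a duplicate hypothesis, keep the first feature vector seen.
            pooled.setdefault(hyp, feats)

    def score(feats):
        return sum(weights[name] * value for name, value in feats.items())

    return sorted(pooled, key=lambda h: score(pooled[h]), reverse=True)


# Hypothetical n-best lists from two ensembles, with log-probability-style features.
nbest_a = [
    ("the cat sat", {"model": -1.0, "lm": -2.0}),
    ("a cat sat", {"model": -1.2, "lm": -1.0}),
]
nbest_b = [
    ("a cat sat", {"model": -1.2, "lm": -1.0}),
    ("the cat sits", {"model": -0.9, "lm": -3.0}),
]
ranked = rerank([nbest_a, nbest_b], weights={"model": 1.0, "lm": 0.5})
```

With these weights the top hypothesis is not the one with the best model score alone, which is the point of combining hypotheses across ensembles before reranking.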