Boxiao's blog

Boxiao Zhang

Thinking will not overcome fear but action will.

Train in Deep Learning

2019-04-18

Personal notes. Epochs and steps: total = total * epoch; train_steps = total*5/(batchsize*4)/workergpu. Fine-tuning: fine-tuning trains with the same vocabulary; the pretrained model is placed in the model directory (output dir)..
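The step-count formula in this excerpt can be tried out with a small sketch. The function name, the parameter names `epochs` and `divisor`, and the sample numbers are assumptions for illustration only: the excerpt gives the literal constants 5 and 4 without saying what they stand for.

```python
def train_steps(total, batch_size, worker_gpu, epochs=5, divisor=4):
    """Literal transcription of train_steps = total*5/(batchsize*4)/workergpu.

    `epochs` and `divisor` are hypothetical names for the constants 5 and 4.
    """
    return total * epochs / (batch_size * divisor) / worker_gpu


# Hypothetical values, chosen only to show the arithmetic.
steps = train_steps(total=1_000_000, batch_size=4096, worker_gpu=4)
print(round(steps))
```

Keeping the two constants as named parameters makes it easy to see how the step count scales when the batch size or the number of GPUs changes.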

num translation

2019-04-17

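Under construction. Alignment: obtaining the alignment is the hard part here. Our translation model is Google's Transformer model (tensor2tensor==1.0.14, tensorflow==1.4.0, CUDA 8.0, Python3). The most distinctive feature of the Transformer model concerns the attention mechanism $Attention..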

FastText

2019-04-15

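https://blog.csdn.net/john_bh/article/details/79268850 FastText: a fast text classifier. 1. Introduction: fasttext is a word-vector and text-classification tool that Facebook open-sourced in 2016. Its typical use case is "supervised text classification". It provides a simple and efficient..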

Word2Vec

2019-04-15

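CBOW, Skip-gram, hierarchical softmax

Building on the work of others (references): "word2vec, negative sampling, hierarchical softmax"; "An intuitive explanation of word2vec"; "You don't understand word2vec and still claim to do NLP?"; "The mathematics of word2vec in detail (5): models based on Negative Sampling". The last one is very thorough and highly recommended. Motivation: Word2Vec and wor..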

CNN multi-channel convolution kernels: channel and conv

2019-04-11

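Understanding multi-channel convolution

Personal notes, poorly written, not worth reading. References: many blogs cover much the same content, and I hope I have not gotten anything wrong. https://blog.csdn.net/jacke121/article/details/80188821 https://blog.csdn.net/haoji007/article/details/81..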

self-attention: relative position

2019-04-08

Self-Attention with Relative Position Representations

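Personal notes, poorly written, not worth reading. Reference: https://blog.csdn.net/luoxiaolin_love/article/details/82258069 Relative position: a NAACL 2018 paper. I have read far too few papers. The author argues that sine/cosine position vectors work better than learnable ones, because they can see..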

Deep learning concepts: vanishing gradients, exploding gradients, overfitting, batchsize

2019-03-18

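Vanishing gradients, exploding gradients, overfitting, batchsize, Adam, Attention

Personal notes. References: "Methods for solving vanishing gradients"; "Applications of the Attention Mechanism in natural language processing"; "A truly complete illustrated guide to the Seq2Seq Attention model"; "Summary of optimization methods: SGD, Momentum, AdaGrad, RMSProp, Adam"; "Summary of gradient descent optimization algorithms" ..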

Transformer norm: before vs. after

2019-03-14

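Transformer layer normalization: before vs. after

A personal document, not really meant for reading. Normalization: applied before vs. applied after. Normalizing after is the traditional approach: take the input x, carry x forward as the residual, apply a function to x (multi-head attention), apply dropout to f(x), then df(x) + x (residu..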
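The "after" ordering described in this excerpt, and the "before" ordering the title contrasts it with, can be sketched in plain Python. Several things here are assumptions, since the excerpt is cut off: the final layer normalization in the post-norm block follows the original Transformer design, the pre-norm form `x + dropout(f(norm(x)))` is the commonly used formulation, `toy_sublayer` is a hypothetical stand-in for multi-head attention, and the layer norm has no learned scale or bias.

```python
import math
import random


def layer_norm(x, eps=1e-6):
    # Normalize one vector to zero mean and unit variance (no learned parameters).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dropout(x, rate, rng):
    # Inverted dropout: zero each element with probability `rate`, rescale the rest.
    if rate == 0.0:
        return list(x)
    return [0.0 if rng.random() < rate else v / (1.0 - rate) for v in x]


def post_norm_block(x, f, rate, rng):
    # Normalize after: norm(x + dropout(f(x))), the traditional ordering.
    y = dropout(f(x), rate, rng)
    return layer_norm([a + b for a, b in zip(x, y)])


def pre_norm_block(x, f, rate, rng):
    # Normalize before: x + dropout(f(norm(x))); the residual path stays unnormalized.
    y = dropout(f(layer_norm(x)), rate, rng)
    return [a + b for a, b in zip(x, y)]


def toy_sublayer(x):
    # Hypothetical stand-in for multi-head attention.
    return [2.0 * v for v in x]


rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
post = post_norm_block(x, toy_sublayer, 0.0, rng)
pre = pre_norm_block(x, toy_sublayer, 0.0, rng)
print(post)
print(pre)
```

With dropout disabled, the post-norm output always has zero mean, while the pre-norm output keeps the mean of the raw input, because the residual x is added without being normalized.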
