Automatic Question Answering: A Survey of Community Question Answering


weixin_30553065 · Published 2018-02-07 00:47:00

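Author: szx_spark

I translated parts of the paper "A Survey of Community Question Answering". The paper defines community question answering (cQA), contrasts it with traditional QA, and introduces some existing technical approaches.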

A Survey of Community Question Answering

1 Introduction

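This survey discusses some of the challenges in community question answering and the methods that address them.
Section 2 defines the properties of community QA and contrasts it with the traditional QA task.
Section 3 defines the tasks in the community QA domain and introduces several methods for solving them.
Section 4 describes the experimental setup and the datasets used for the tasks defined in Section 3.
Section 5 summarizes the performance of the different methods.
Section 6 gives a general discussion of the results.
Section 7 concludes.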

2 Community QA vs QA

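A community forum broadly works as follows:

An asker posts a question; once it passes moderation and meets the guidelines, it is published and becomes visible to other users.
Other users interact in two ways:
- by replying with answers, which may or may not be relevant;
- by upvoting or downvoting other people's answers.
Some community QA forums also let users upvote or downvote the question itself, or comment on it to ask for more details.
Finally, if the asker is satisfied, they mark the best answer.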

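People who visit cQA forums are usually looking for answers to complex questions rather than factoid ones. Most QA tasks handle simple single-sentence queries whose answers are simple facts; such questions are direct and contain little noise. cQA questions, by contrast, are often longer than one sentence and frequently noisy (e.g., taken from Yahoo Answers: "I have an exam tomorrow. But I don’t know anything. Please recommend a tutorial for calculus ??"). Moreover, answers in traditional QA come from a knowledge base, whereas answers in cQA are user replies, so cQA questions are very open-ended. Answer quality on community forums is generally not curated, but forums do provide signals such as upvote and downvote counts and scores.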

3 Tasks

3.1 Question Semantic Matching

Questions on a forum can be very similar to each other, so they need to be matched. For example:

What is the most populous state in India ?
Which state in India has the highest population ?

Semantic matching of questions reduces redundancy.

The following methods address this problem.

3.1.1 Okapi BM25 (Robertson et al., 1996)

BM25 is a well-known ranking function, so its details are omitted here.
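For readers who want a reminder, the following is a minimal sketch of BM25 applied to question matching. It is not taken from the survey: the function name `bm25_scores`, the default values of `k1` and `b`, the smoothed IDF variant, and the toy archive are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical archive of previously asked questions.
archive = [
    "which state in india has the highest population".split(),
    "what is the capital of france".split(),
    "how do i learn calculus quickly".split(),
]
query = "what is the most populous state in india".split()
scores = bm25_scores(query, archive)
best = max(range(len(archive)), key=scores.__getitem__)
```

Note that "populous" and "population" do not count as a match here: BM25 only rewards exact token overlap, which is the lexical gap the later methods try to close.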

3.1.2 TransLM (Xue et al., 2008)

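Given two questions \(q^1\) and \(q^2\), a translation model is combined with a language model under the bag-of-words assumption to compute the conditional probability \(P(q^1|q^2)\). The final score is the average of the two directional probabilities, \(P(q^1|q^2)\) and \(P(q^2|q^1)\).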

\[ P(q^1|q^2) =\prod_{w \in q^1}P(w|q^2) \]

\[P(w|q^2)=\frac{|q^2|}{|q^2|+\lambda} \cdot P_{mx}(w|q^2)+\frac{\lambda}{|q^2|+\lambda} \cdot P_{ml}(w|C)\]

\[P_{mx}(w|q^2)=(1- \beta)P_{ml}(w|q^2)+\beta\sum_{t\in q^2}P_{trans}(w|t)\cdot P_{ml}(t|q^2)\]
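Here \(P_{ml}(w|C)\) is the maximum likelihood estimate \(\frac{\#(w,C)}{|C|}\), where \(\#\) denotes frequency and \(C\) is the collection. \(\lambda\) is the smoothing factor, and \(\beta\) controls the relative contributions of \(P_{ml}\) and \(P_{trans}\).
\(P_{trans}(w^1|w^2)\) comes from a translation model: it is the probability of generating word \(w^1\) in one language given word \(w^2\) in another. E.g., given a set of sentence pairs \(S=\{(e^i, f^i)\}_{i=1}^{N}\), where \(e\) denotes English words and \(f\) denotes French words, the probability is computed as follows: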

\[P(f|e)=\frac{1}{Z(e)}\sum_{i=1}^Nc(f|e;e^i,f^i)\]

\[c(f|e;e^i,f^i)=\frac{P(f|e)}{\sum_{w \in e^i}P(f|w)}\cdot\#(f,f^i)\#(e,e^i)\]
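\(Z(e)\) is a normalization constant. \(P(f|e)\) is estimated from the formulas above with the EM algorithm.
The question matching problem is thus reduced to estimating \(P_{trans}\).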
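The TransLM formulas can be turned into a short scoring routine. The sketch below assumes that the translation table has already been estimated and is passed in as a dictionary `p_trans` keyed by `(w, t)`; the function names, the default `lam` and `beta` values, and the toy data are hypothetical.

```python
from collections import Counter

def trans_lm_prob(q1, q2, collection, p_trans, lam=1.0, beta=0.5):
    """P(q1 | q2): product over the words of q1 of the smoothed P(w | q2)."""
    tf_q2 = Counter(q2)
    tf_c = Counter(collection)
    weight = len(q2) / (len(q2) + lam)  # |q2| / (|q2| + lambda)
    prob = 1.0
    for w in q1:
        # Translation part: sum over t in q2 of P_trans(w|t) * P_ml(t|q2).
        trans = sum(
            p_trans.get((w, t), 0.0) * tf_q2[t] / len(q2) for t in tf_q2
        )
        # P_mx mixes the plain ML estimate with the translation part.
        p_mx = (1 - beta) * tf_q2[w] / len(q2) + beta * trans
        # Smooth with the collection-level ML estimate P_ml(w|C).
        prob *= weight * p_mx + (1 - weight) * tf_c[w] / len(collection)
    return prob

def trans_lm_score(q1, q2, collection, p_trans, lam=1.0, beta=0.5):
    """Average of the two directional probabilities."""
    forward = trans_lm_prob(q1, q2, collection, p_trans, lam, beta)
    backward = trans_lm_prob(q2, q1, collection, p_trans, lam, beta)
    return (forward + backward) / 2

# Toy data: the translation table links "populous" and "population".
q1 = ["populous", "state"]
q2 = ["state", "population"]
collection = ["populous", "state", "state", "population", "city"]
table = {("populous", "population"): 0.6, ("population", "populous"): 0.6}

with_translation = trans_lm_score(q1, q2, collection, table)
without_translation = trans_lm_score(q1, q2, collection, {})
```

With the translation table, the pair scores higher than with an empty table, because "populous" can now be generated from "population" even though the two tokens differ.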

3.1.3 Word Embedding Based Methods

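The original paper does not go into detail. The main idea is to obtain representations of the two questions from word vectors and then compute their similarity with a network trained in a supervised manner.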
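Since the paper gives no specifics, the sketch below only illustrates the first half of the idea: building a question representation from word vectors. It swaps the supervised network for plain cosine similarity over averaged embeddings, which is a common unsupervised baseline and not the method of any particular paper. The two-dimensional embeddings are made up for the example.

```python
import math

def sentence_vector(tokens, embeddings):
    """Average the embeddings of the known tokens (zero vector if none)."""
    dim = len(next(iter(embeddings.values())))
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Made-up toy embeddings; real ones would come from a pretrained model.
embeddings = {
    "populous": [0.9, 0.1],
    "population": [0.8, 0.2],
    "state": [0.1, 0.9],
    "calculus": [-0.7, 0.3],
}

v1 = sentence_vector(["populous", "state"], embeddings)
v2 = sentence_vector(["population", "state"], embeddings)
v3 = sentence_vector(["calculus"], embeddings)
similar = cosine(v1, v2)
dissimilar = cosine(v1, v3)
```

In a supervised setup, the two sentence vectors would instead be fed to a trained network that outputs the similarity.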

3.1.4 Neural Network Attention with token alignment (Parikh et al., 2016)

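Given the questions \(q^1=[q_1^1,q_2^1,…,q_n^1]\) and \(q^2=[q_1^2,q_2^2,…,q_m^2]\), each word is embedded, giving the new vectors \(\hat q^1=\hat q_1^1,\hat q_2^1,…,\hat q_n^1\) and \(\hat q^2=\hat q_1^2,\hat q_2^2,…,\hat q_m^2\). Attention coefficients are obtained by applying softmax to the affinity matrix (the matrix of pairwise scores between the words of the two questions), normalizing it row-wise and column-wise.

The j-th word of \(q^1\) is now represented as \(G[\hat q_j^1;\hat v_j]\), where \(\hat v_j\) is the attention-weighted representation of \(q^2\). Every word of both questions gets a new representation in the same way. Summing the word representations gives the representation of each question; the two question representations are then concatenated and fed to a dense layer to produce the prediction.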
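The alignment step can be sketched without any learned parameters. In the code below, the affinity between two words is assumed to be a plain dot product, and the network \(G\) is replaced by the identity, so only the attend, concatenate, and sum steps are shown. All function names are hypothetical.

```python
import math

def softmax(xs):
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(source, target):
    """For each source vector, the attention-weighted sum of target vectors."""
    dim = len(target[0])
    attended = []
    for s in source:
        weights = softmax([dot(s, t) for t in target])
        attended.append(
            [sum(w * t[i] for w, t in zip(weights, target)) for i in range(dim)]
        )
    return attended

def question_representation(q, other):
    """Sum over words of [q_j ; v_j]; G is taken to be the identity here."""
    aligned = attend(q, other)
    combined = [qj + vj for qj, vj in zip(q, aligned)]  # list concatenation
    return [sum(column) for column in zip(*combined)]

def pair_features(q1, q2):
    """Concatenated question representations, ready for a dense layer."""
    return question_representation(q1, q2) + question_representation(q2, q1)

q1 = [[1.0, 0.0], [0.0, 1.0]]              # n = 2 words, d = 2
q2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # m = 3 words, d = 2
features = pair_features(q1, q2)
```

Calling `attend(q1, q2)` normalizes each row of the affinity matrix, while `attend(q2, q1)` normalizes each column, which matches the row-wise and column-wise softmax described above. The feature vector has length 4d regardless of the question lengths.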

3.2 Question Answer Ranking and Retrieval

Given a question, find the best answer among many candidates.

3.2.1 Okapi BM25 (Robertson et al., 1996)

Answers with significant token overlap with the question are scored higher. Because the tokens of an answer rarely match those of the question, this method does not work well.

3.2.2 TransLM (Xue et al., 2008)

This method also applies to this task. Given a question \(q\) and a pool of answers \(A\), finding the best candidate answer can be modeled as \(a^*=\arg\max_{a\in A} P_{TransLM}(q|a)\).

3.2.3 An embedding based CNN based method (Feng et al., 2015)

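This method uses a CNN to generate the question embedding and the answer embedding. Given a question \(q=q_1,…,q_n\) and an answer \(a=a_1,…,a_m\), build the matrices \(\hat q = [\hat q_1,…, \hat q_n] \in R^{d\times n}\) and \(\hat a =[\hat a_1,…,\hat a_m] \in R^{d\times m}\), where d is the dimension of the word vectors. A convolution with kernel size m is applied (generating a matrix in \(R^{d\times(n-m+1)}\) for the question; note that m here denotes the kernel size, not the answer length), followed by 1-max pooling. The CNN module is shared between the question and the answer.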

For the training procedure and the loss function, see the paper.
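The shape bookkeeping of the convolution and 1-max pooling can be checked with a small sketch. It assumes that each filter spans the full embedding dimension and a window of `m` consecutive words, and it leaves out the nonlinearity and all training; the function names and toy weights are made up.

```python
def conv1d(words, filters):
    """words: n vectors of size d. filters: each is m vectors of size d.
    Returns one row per filter with n - m + 1 activations."""
    m = len(filters[0])
    n = len(words)
    feature_maps = []
    for f in filters:
        row = []
        for start in range(n - m + 1):
            window = words[start:start + m]
            # Dot product between the filter and the window of m words.
            row.append(
                sum(w * x for fv, wv in zip(f, window) for w, x in zip(fv, wv))
            )
        feature_maps.append(row)
    return feature_maps

def one_max_pool(feature_maps):
    """Keep only the largest activation of each filter."""
    return [max(row) for row in feature_maps]

def encode(words, filters):
    """Shared encoder used for both the question and the answer."""
    return one_max_pool(conv1d(words, filters))

# Two filters of width m = 2 over d = 2 dimensional word vectors.
filters = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
]
question = [[1, 0], [0, 1], [1, 1], [0, 0]]  # n = 4 words
answer = [[0, 1], [1, 1], [1, 0]]            # 3 words
question_vec = encode(question, filters)
answer_vec = encode(answer, filters)
```

Because of the pooling step, the question and the answer end up with vectors of the same size (one entry per filter) even though their lengths differ, so they can be compared directly.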

3.2.4 An Attention based CNN/LSTM Method (Tan et al., 2015)

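Building on the previous work, the authors try to improve the model along two orthogonal directions. Instead of using just word embeddings, they pass the question and the answer through a BiLSTM layer, which encodes context. A convolution layer followed by max pooling is then applied, which captures long-range dependencies better (the final state of the LSTM is somewhat limited by the dimension size for capturing the entire context). The model architecture is shown in the figure.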

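The other idea is attention-based: following a max pooling of the question, the resultant vector is used to attend over the answer vectors. A max pool layer is then applied over the attention-weighted answer vectors. The resulting vector serves as the answer representation, which lets the model weight the different words of the answer according to context before max pooling.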
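The attention step can be sketched as follows. The scoring function here is assumed to be a plain dot product between the pooled question vector and each answer state, followed by a softmax; the paper's learned attention parameters are not reproduced, and the function names are hypothetical.

```python
import math

def max_pool(vectors):
    """Element-wise maximum over a sequence of vectors."""
    return [max(column) for column in zip(*vectors)]

def attentive_answer_vector(question_states, answer_states):
    """Attend over answer states with the pooled question, then max-pool."""
    q_vec = max_pool(question_states)
    # Assumed scoring: dot product between q_vec and each answer state.
    scores = [sum(a * b for a, b in zip(q_vec, h)) for h in answer_states]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Reweight every answer state before pooling.
    weighted = [[w * x for x in h] for w, h in zip(weights, answer_states)]
    return max_pool(weighted), weights

question_states = [[1.0, 0.0], [0.5, 0.0]]
answer_states = [[2.0, 0.0], [0.0, 2.0]]
answer_vec, weights = attentive_answer_vector(question_states, answer_states)
```

In this toy case the first answer state points in the same direction as the question, so it receives most of the attention weight and dominates the pooled answer vector.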

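The final model combines both ideas: a CNN generates the question embedding, the question embedding generates attention weights for the answer, and the attention-weighted answer is fed to the CNN to produce the final answer embedding. The model is trained with a max margin loss.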
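A max margin loss of this kind is commonly written as a hinge over similarity scores: the correct answer should score higher than a wrong one by at least a margin. The sketch below assumes cosine similarity and a margin of 0.2; both are illustrative choices, not values confirmed by the text.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def max_margin_loss(question, positive, negative, margin=0.2):
    """Zero once the positive answer beats the negative one by `margin`."""
    return max(0.0, margin - cosine(question, positive) + cosine(question, negative))

q = [1.0, 0.0]
good = [1.0, 0.0]
bad = [0.0, 1.0]
loss_correct_order = max_margin_loss(q, good, bad)
loss_wrong_order = max_margin_loss(q, bad, good)
```

The loss is zero when the embeddings already rank the correct answer well above the wrong one, so training effort goes to the pairs that are still confused.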

3.2.5 A Deep CNN Method (Qiu and Huang, 2015)

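The authors use a deep convolutional network to generate the question and answer embeddings. The matrix produced by a convolution has dimension \(R^{d\times (l_q-m+1)}\), where d is the dimension of the word embeddings, \({l_q}\) is the length of the question, and m is the kernel size.

To make the convolutional network deeper, the model uses k-max pooling, which keeps the k largest values and thus yields a subsequence.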

The embedding dimensions hence are independent of the length of the question after the k-max pool (the dimension of the matrix is \(R^{d\times k}\) after the first k-max pool).
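A small sketch of k-max pooling makes the fixed output size concrete. It assumes pooling is applied independently to each row (each embedding dimension) and that the selected values keep their original order, which is what makes the result a subsequence; the function names are hypothetical.

```python
def k_max_pool(row, k):
    """Keep the k largest values of `row`, preserving their original order."""
    top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
    return [row[i] for i in sorted(top)]

def k_max_pool_matrix(matrix, k):
    """Apply k-max pooling to every row of a d x length matrix."""
    return [k_max_pool(row, k) for row in matrix]

pooled_row = k_max_pool([3, 1, 4, 1, 5, 9, 2, 6], 3)
short_input = k_max_pool_matrix([[1, 2, 3], [3, 2, 1]], 2)
long_input = k_max_pool_matrix([[9, 8, 7, 6, 5], [1, 5, 2, 4, 3]], 2)
```

Both inputs, of length 3 and length 5, are reduced to rows of length k = 2, so the next convolution layer sees a matrix whose size does not depend on the question length. A row shorter than k is returned unchanged by this sketch.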

Reposted from: https://www.cnblogs.com/szxspark/p/8424884.html