Position Independent Word Error Rate (PER) Measure
A Position Independent Word Error Rate (PER) Measure is a Performance Metric that compares the words of a hypothesis sentence with the words of a reference sentence without taking word order into account.
- Context:
- It can be used to benchmark NLP systems, such as machine translation systems.
- Example(s):
- $\mathrm{PER}=\dfrac{1}{N_{\mathrm{ref}}^{*}} \sum_{k=1}^{K} \min_{r} d_{\mathrm{PER}}\left(ref_{k,r}, hyp_{k}\right)$ as described in Popovic & Ney (2007),
- …
- Counter-Example(s):
- See: Recall, Information Retrieval, Speech Recognition, Machine Translation, Levenshtein Distance, Phoneme.
References
2007
- (Popovic & Ney, 2007) ⇒ Maja Popovic, and Hermann Ney. (2007). “Word Error Rates: Decomposition over POS Classes and Applications for Error Analysis.” In: Proceedings of the Second Workshop on Statistical Machine Translation (WMT@ACL 2007).
- QUOTE: The word error rate (WER) is based on the Levenshtein distance (Levenshtein, 1966) - the minimum number of substitutions, deletions and insertions that have to be performed to convert the generated text $hyp$ into the reference text $ref$. A shortcoming of the WER is the fact that it does not allow reorderings of words, whereas the word order of the hypothesis can be different from word order of the reference even though it is correct translation. In order to overcome this problem, the position independent word error rate (PER) compares the words in the two sentences without taking the word order into account. The PER is always lower than or equal to the WER. On the other hand, shortcoming of the PER is the fact that the word order can be important in some cases. Therefore the best solution is to calculate both word error rates. (...) Calculation of PER: The PER can be calculated using the counts $n\left(e, hyp_k\right)$ and $n\left(e, ref_{k,r}\right)$ of a word $e$ in the hypothesis sentence $hyp_k$ and the reference sentence $ref_{k,r}$ respectively:
$\mathrm{PER}=\dfrac{1}{N_{\mathrm{ref}}^{*}} \sum_{k=1}^{K} \min_{r} d_{\mathrm{PER}}\left(ref_{k,r}, hyp_{k}\right)$ (1)
$d_{\mathrm{PER}}\left(ref_{k,r}, hyp_{k}\right) = \dfrac{1}{2}\left(\left|N_{ref_{k,r}}-N_{hyp_{k}}\right|+\sum_{e}\left|n\left(e, ref_{k,r}\right)-n\left(e, hyp_{k}\right)\right|\right)$
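The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names `d_per` and `per` are hypothetical, sentences are assumed to be pre-tokenized lists of words, and $N_{\mathrm{ref}}^{*}$ is assumed to be the total length of the references that attain the per-sentence minimum, which the quoted excerpt does not spell out.

```python
from collections import Counter


def d_per(ref, hyp):
    """Position-independent distance between one reference and one hypothesis.

    Both arguments are lists of word tokens. Implements
    0.5 * (|N_ref - N_hyp| + sum_e |n(e, ref) - n(e, hyp)|).
    """
    ref_counts = Counter(ref)
    hyp_counts = Counter(hyp)
    length_diff = abs(len(ref) - len(hyp))
    # Sum the count differences over every word seen in either sentence.
    vocabulary = set(ref_counts) | set(hyp_counts)
    count_diff = sum(abs(ref_counts[e] - hyp_counts[e]) for e in vocabulary)
    return 0.5 * (length_diff + count_diff)


def per(references, hypotheses):
    """Corpus-level PER.

    references: one list of reference sentences per hypothesis.
    hypotheses: list of hypothesis sentences.
    ASSUMPTION: the normalizer is the summed length of the references
    that achieve the minimum distance for each sentence.
    """
    total_distance = 0.0
    total_ref_length = 0
    for refs, hyp in zip(references, hypotheses):
        best_ref = min(refs, key=lambda r: d_per(r, hyp))
        total_distance += d_per(best_ref, hyp)
        total_ref_length += len(best_ref)
    if total_ref_length == 0:
        raise ValueError("references contain no words")
    return total_distance / total_ref_length


ref = "the cat sat on the mat".split()
reordered = "on the mat the cat sat".split()
shortened = "the cat sat on mat".split()

print(d_per(ref, reordered))     # reordering alone costs nothing
print(d_per(ref, shortened))     # one missing word
print(per([[ref]], [shortened]))
```

In this sketch a pure reordering of the reference yields a distance of zero, which is the behavior that distinguishes PER from WER, while a single missing word contributes one error.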