Similar Sentences Corpus
A Similar Sentences Corpus is a sentence corpus whose sentences are paired or grouped by similarity.
- Context:
- It can be used to train Sentence Similarity Measures.
- It can provide a ground truth for measuring the effectiveness of sentence similarity algorithms, enabling the comparison of algorithm performance.
- It can include sentences from various domains, including legal texts, to support tasks like contract analysis or legal document review.
- It can (often) contain annotations made by human experts, which supports the quality and reliability of its similarity labels.
- It can support the development of NLP applications in specific domains, such as legal, medical, or technical fields, by providing domain-specific examples of sentence similarity.
- ...
- Example(s):
- A Legal Similar Sentences Corpus, such as: a Similar Contract Sentences Corpus.
- A Medical Similar Sentences Corpus.
- ...
- Counter-Example(s):
- A General Text Corpus without any annotations for sentence similarity or semantic relatedness.
- A Document Classification Corpus, primarily used for categorizing whole documents rather than assessing sentence-level similarity.
- See: Sentence Embedding, Semantic Similarity Measure, Natural Language Processing, Contract Sentence Meaning Similarity Measure, Corpus Annotation.
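
The ground-truth role described in the Context items can be sketched as follows. This is a minimal illustration, not a reference implementation: the `SentencePair` type, the 0 to 5 rating scale, the four toy sentence pairs, and the token-overlap (Jaccard) measure are all assumptions made for the example. It scores each pair with the measure and reports the Spearman rank correlation against the human ratings, which is one common way to compare sentence similarity algorithms.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    first: str
    second: str
    gold_score: float  # hypothetical human rating: 0 (unrelated) to 5 (equivalent)


# Invented toy entries; a real corpus would hold expert-annotated pairs.
CORPUS = [
    SentencePair("The tenant shall pay rent monthly.",
                 "The tenant shall pay rent every month.", 4.8),
    SentencePair("Either party may terminate this agreement.",
                 "This agreement may be terminated by either party.", 4.5),
    SentencePair("The tenant shall pay rent monthly.",
                 "The landlord shall maintain the premises.", 1.0),
    SentencePair("Either party may terminate this agreement.",
                 "The patient reported mild chest pain.", 0.0),
]


def jaccard_similarity(a: str, b: str) -> float:
    """Stand-in similarity measure: overlap of lowercased word sets."""
    tokens_a = set(a.lower().replace(".", "").split())
    tokens_b = set(b.lower().replace(".", "").split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def evaluate(measure, corpus):
    """Correlate a measure's scores with the corpus's gold ratings."""
    predicted = [measure(p.first, p.second) for p in corpus]
    gold = [p.gold_score for p in corpus]
    return spearman(predicted, gold)


if __name__ == "__main__":
    print(f"Spearman correlation: {evaluate(jaccard_similarity, CORPUS):.3f}")
```

Any function that takes two sentences and returns a score can be passed to `evaluate` in place of `jaccard_similarity`, so several measures can be compared against the same annotated pairs. With only four invented pairs the resulting number carries no real meaning; it only shows the mechanics.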