pke: an open-source, Python-based keyphrase extraction toolkit


weixin_39530839 · published 2020-11-23 18:31:16

pke - python keyphrase extraction

pke is an open source python-based keyphrase extraction toolkit. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new models. pke also allows for easy benchmarking of state-of-the-art keyphrase extraction models, and ships with supervised models trained on the SemEval-2010 dataset.
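The pipeline can be pictured as three stages: candidate selection, candidate weighting, and N-best selection. The toy sketch below shows that shape in plain Python. It is not pke's implementation: every name in it (`select_candidates`, `weight_candidates`, `n_best`, `STOPWORDS`) is hypothetical, and the stopword-based selection and frequency-based scoring are simplifications chosen only for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple

# Tiny illustrative stopword list (an assumption, not taken from pke).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "to", "on"}


def select_candidates(text: str) -> List[str]:
    """Stage 1: collect maximal runs of consecutive non-stopword words.

    Stopwords and any non-letter character (e.g. punctuation) end a run.
    """
    candidates: List[str] = []
    current: List[str] = []
    for token in re.findall(r"[a-z]+|[^a-z\s]", text.lower()):
        if token in STOPWORDS or not token.isalpha():
            if current:
                candidates.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        candidates.append(" ".join(current))
    return candidates


def weight_candidates(candidates: List[str]) -> Counter:
    """Stage 2: score each candidate by its number of occurrences."""
    return Counter(candidates)


def n_best(weights: Counter, n: int) -> List[Tuple[str, int]]:
    """Stage 3: return the n highest-scored candidates with their scores."""
    return weights.most_common(n)


if __name__ == "__main__":
    text = ("Keyphrase extraction is the task of finding key terms. "
            "Keyphrase extraction is useful for indexing.")
    print(n_best(weight_candidates(select_candidates(text)), 3))
```

Because each stage is a separate function, any one of them can be swapped out without touching the others, which is the kind of modularity the description above refers to.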



Installation

To install pke from GitHub using pip:

pip install git+https://github.com/boudinfl/pke.git
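After installing, it can be useful to confirm that the package is visible to the Python interpreter you intend to use. The snippet below is a minimal sketch that relies only on the standard library; the helper name `is_installed` is hypothetical and not part of pke.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # `is_installed` is an illustrative helper, not a pke function.
    print("pke importable:", is_installed("pke"))
```

If this prints `False`, the install most likely went into a different environment than the one running the script.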

Minimal example

pke provides a standardized API for extracting keyphrases from a document. Start by typing the 5 lines below. To use another model, simply replace pke.unsupervised.TopicRank with another of the implemented models.

import pke

# initialize keyphrase extraction model, here TopicRank
extractor = pke.unsupervised.TopicRank()

# load the content of the document, here document is expected to be in raw
# format (i.e. a simple text file) and preprocessing is carried out using spacy
extractor.load_document(input='/path/to/input.txt', language='en')

# keyphrase candidate selection, in the case of TopicRank: sequences of nouns
# and adjectives (i.e. `(Noun|Adj)*`)
extractor.candidate_selection()

# candidate weighting, in the case of TopicRank: using a random walk algorithm
extractor.candidate_weighting()

# N-best selection, keyphrases contains the 10 highest scored candidates as