pke - python keyphrase extraction

pke is an open source python-based keyphrase extraction toolkit. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new models. pke also allows for easy benchmarking of state-of-the-art keyphrase extraction models, and ships with supervised models trained on the SemEval-2010 dataset.


Installation

To install pke from GitHub with pip:

pip install git+https://github.com/boudinfl/pke.git

Minimal example

pke provides a standardized API for extracting keyphrases from a document. The few lines below form a complete example. To use another model, replace pke.unsupervised.TopicRank with one of the models listed under "Implemented models".

import pke

# initialize keyphrase extraction model, here TopicRank
extractor = pke.unsupervised.TopicRank()

# load the content of the document, here document is expected to be in raw
# format (i.e. a simple text file) and preprocessing is carried out using spacy
extractor.load_document(input='/path/to/input.txt', language='en')

# keyphrase candidate selection, in the case of TopicRank: sequences of nouns
# and adjectives (i.e. `(Noun|Adj)*`)
extractor.candidate_selection()

# candidate weighting, in the case of TopicRank: using a random walk algorithm
extractor.candidate_weighting()

# N-best selection, keyphrases contains the 10 highest scored candidates as
# (keyphrase, score) tuples
keyphrases = extractor.get_n_best(n=10)

A detailed example is provided in the examples/ directory.
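Because every model exposes the same calls, the steps of the minimal example can be wrapped in a small helper that accepts any extractor instance. The sketch below is only an illustration: `extract_keyphrases`, `format_keyphrases` and `demo` are hypothetical names that are not part of pke, and the helper assumes the chosen model works with the default arguments of `candidate_selection()` and `candidate_weighting()`, as TopicRank does above.

```python
def extract_keyphrases(extractor, path, language="en", n=10):
    """Run the load / select / weight / rank steps on one document.

    `extractor` is any object exposing the four methods used in the
    minimal example, e.g. pke.unsupervised.TopicRank().
    (Hypothetical helper, not part of pke.)
    """
    extractor.load_document(input=path, language=language)
    extractor.candidate_selection()
    extractor.candidate_weighting()
    return extractor.get_n_best(n=n)


def format_keyphrases(keyphrases):
    """Render (keyphrase, score) tuples as 'score  keyphrase' lines."""
    return [f"{score:.4f}  {phrase}" for phrase, score in keyphrases]


def demo(path="/path/to/input.txt"):
    """Example call; pke is imported here so the helpers above stay importable without it."""
    import pke

    extractor = pke.unsupervised.TopicRank()
    for line in format_keyphrases(extract_keyphrases(extractor, path)):
        print(line)
```

Swapping models then only means passing a different extractor instance to `extract_keyphrases`; models that need extra arguments for selection or weighting would have to be called step by step as in the minimal example.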

Getting started

Tutorials and code documentation are available at https://boudinfl.github.io/pke/.

Implemented models

pke currently implements the following keyphrase extraction models:

Citing pke

If you use pke, please cite the following paper:

@InProceedings{boudin:2016:COLINGDEMO,
  author    = {Boudin, Florian},
  title     = {pke: an open source python-based keyphrase extraction toolkit},
  booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations},
  month     = {December},
  year      = {2016},
  address   = {Osaka, Japan},
  pages     = {69--73},
  url       = {http://aclweb.org/anthology/C16-2015}
}
