Python-如何下载NLTK数据?
更新答案:NLTK适用于2.7。我有3.2。我卸载了3.2,然后安装了2.7。现在可以了!!
我已经安装了NLTK并尝试下载NLTK数据。我所做的就是遵循此站点上的说明:http ://www.nltk.org/data.html
我下载了NLTK,进行了安装,然后尝试运行以下代码:
>>> import nltk>>> nltk.download()
它给了我如下错误信息:
Traceback (most recent call last): File "<pyshell#6>", line 1, in <module>
nltk.download()
AttributeError: 'module' object has no attribute 'download'
Directory of C:\Python32\Lib\site-packages
尝试了nltk.download()
和nltk.downloader()
,都给了我错误消息。
然后我习惯于help(nltk)
拉出包装,它显示以下信息:
NAME nltk
PACKAGE CONTENTS
align
app (package)
book
ccg (package)
chat (package)
chunk (package)
classify (package)
cluster (package)
collocations
corpus (package)
data
decorators
downloader
draw (package)
examples (package)
featstruct
grammar
help
inference (package)
internals
lazyimport
metrics (package)
misc (package)
model (package)
parse (package)
probability
sem (package)
sourcedstring
stem (package)
tag (package)
test (package)
text
tokenize (package)
toolbox
tree
treetransforms
util
yamltags
FILE
c:\python32\lib\site-packages\nltk
我确实在那儿看到了Downloader,不确定为什么它不起作用。Python 3.2.2,系统Windows Vista。
回答:
要下载特定的数据集/模型,请使用nltk.download()
函数,例如,如果你要下载punkt
句子标记器,请使用:
$ python3>>> import nltk
>>> nltk.download('punkt')
如果不确定所需的数据/模型,则可以使用以下数据和模型的基本列表开始:
>>> import nltk>>> nltk.download('popular')
它将下载“流行”资源的列表,其中包括:
<collection id="popular" name="Popular packages"> <item ref="cmudict" />
<item ref="gazetteers" />
<item ref="genesis" />
<item ref="gutenberg" />
<item ref="inaugural" />
<item ref="movie_reviews" />
<item ref="names" />
<item ref="shakespeare" />
<item ref="stopwords" />
<item ref="treebank" />
<item ref="twitter_samples" />
<item ref="omw" />
<item ref="wordnet" />
<item ref="wordnet_ic" />
<item ref="words" />
<item ref="maxent_ne_chunker" />
<item ref="punkt" />
<item ref="snowball_data" />
<item ref="averaged_perceptron_tagger" />
</collection>
已编辑
如果有人避免nltk从https://stackoverflow.com/a/38135306/610569上从下载较大的数据集而避免错误
$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index to treat panlex_lite as it's already installed.
>>> dler.download('popular')
更新
从v3.2.5起,当nltk_data
找不到资源时,NLTK会提供更多信息,例如:
>>> from nltk import word_tokenize>>> word_tokenize('x')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/l/alvas/git/nltk/nltk/tokenize/__init__.py", line 128, in word_tokenize
sentences = [text] if preserve_line else sent_tokenize(text, language)
File "/Users//alvas/git/nltk/nltk/tokenize/__init__.py", line 94, in sent_tokenize
tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
File "/Users/alvas/git/nltk/nltk/data.py", line 820, in load
opened_resource = _open(resource_url)
File "/Users/alvas/git/nltk/nltk/data.py", line 938, in _open
return find(path_, path + ['']).open()
File "/Users/alvas/git/nltk/nltk/data.py", line 659, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- '/Users/alvas/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
以上是 Python-如何下载NLTK数据? 的全部内容, 来源链接: utcz.com/qa/434683.html