The NLTK corpus and module downloader. This module defines several
interfaces that can be used to download corpora, models, and other
data packages for use with NLTK.
Downloading Packages
====================
If called with no arguments, the L{download() <Downloader.download>}
function will display an interactive interface that can be used to
download and install new packages. If Tkinter is available, a
graphical interface is shown; otherwise, a simple text interface
is provided.
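The fallback between the two interfaces can be sketched as an import probe. This is a minimal illustration, assuming availability is decided simply by whether the module imports (it is named `tkinter` in Python 3 and `Tkinter` in Python 2); the helper names `tkinter_available` and `pick_interface` are hypothetical and not part of NLTK.

```python
def tkinter_available() -> bool:
    """Return True if the tkinter module can be imported on this interpreter."""
    try:
        import tkinter  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


def pick_interface(has_tk: bool) -> str:
    """Hypothetical helper: choose which downloader front end to show."""
    # Prefer the graphical interface when Tkinter is present,
    # otherwise fall back to a plain text interface.
    return "gui" if has_tk else "text"


if __name__ == "__main__":
    print(pick_interface(tkinter_available()))
```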
Individual packages can be downloaded by calling the C{download()}
function with a single argument, giving the package identifier for the
package that should be downloaded:
>>> download('treebank') # doctest: +SKIP
[nltk_data] Downloading package 'treebank'...
[nltk_data] Unzipping corpora/treebank.zip.
NLTK also provides a number of "package collections", each consisting
of a group of related packages. To download all packages in a
collection, simply call C{download()} with the collection's
identifier:
>>> download('all-corpora') # doctest: +SKIP
[nltk_data] Downloading package 'abc'...
[nltk_data] Unzipping corpora/abc.zip.
[nltk_data] Downloading package 'alpino'...
[nltk_data] Unzipping corpora/alpino.zip.
...
[nltk_data] Downloading package 'words'...
[nltk_data] Unzipping corpora/words.zip.
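Because a collection is a named group of identifiers, downloading one amounts to expanding it into the packages it contains. The sketch below assumes a hypothetical in-memory mapping from collection identifiers to member identifiers and allows collections to nest; it illustrates the idea and is not NLTK's code.

```python
from typing import Dict, List, Set


def expand_collection(ident: str, collections: Dict[str, List[str]]) -> List[str]:
    """Return the package identifiers reachable from ident, in first-seen order.

    `collections` maps each collection id to its member ids; any id that is
    not a key is treated as a plain package.
    """
    result: List[str] = []
    seen: Set[str] = set()

    def visit(item: str) -> None:
        if item in seen:  # skip duplicates and guard against cycles
            return
        seen.add(item)
        if item in collections:
            for child in collections[item]:
                visit(child)
        else:
            result.append(item)

    visit(ident)
    return result


# Hypothetical index: 'all' contains a nested collection plus one package.
example = {
    "all-corpora": ["abc", "alpino", "words"],
    "all": ["all-corpora", "punkt"],
}
print(expand_collection("all", example))  # ['abc', 'alpino', 'words', 'punkt']
```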
Download Directory
==================
By default, packages are installed either in a system-wide directory
(if Python has sufficient access to write to it) or in the current
user's home directory. However, the C{download_dir} argument may be
used to specify a different installation target, if desired.
See L{Downloader.default_download_dir()} for a more detailed
description of how the default download directory is chosen.
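The selection rule described above can be sketched as a small function. This is a simplified illustration only: the system-wide candidate is passed in by the caller, the `nltk_data` folder name under the home directory is an assumption, and the real L{Downloader.default_download_dir()} may consult other locations.

```python
import os


def choose_download_dir(system_dir: str) -> str:
    """Pick a download directory: system-wide if writable, else per-user.

    `system_dir` is a hypothetical system-wide location supplied by the caller.
    """
    # Use the system-wide directory only if it exists and we can write to it.
    if os.path.isdir(system_dir) and os.access(system_dir, os.W_OK):
        return system_dir
    # Otherwise fall back to a folder in the current user's home directory
    # (the folder name "nltk_data" is assumed here).
    return os.path.join(os.path.expanduser("~"), "nltk_data")
```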
NLTK Download Server
====================
Before downloading any packages, the corpus and module downloader
contacts the NLTK download server to retrieve an index file
describing the available packages. By default, this index file is
loaded from C{<http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml>}.
If necessary, it is possible to create a new L{Downloader} object
that specifies a different URL for the package index file.
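A minimal sketch of reading such an index with the standard library's `xml.etree.ElementTree`. The element and attribute names used here (`nltk_data`, `package`, `collection`, `item`, `id`, `ref`, `subdir`, `url`) are an assumed layout for illustration and the sample URLs are placeholders; check the real index file before relying on them.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# A tiny index in an assumed layout (names are illustrative only).
SAMPLE_INDEX = """<nltk_data>
  <packages>
    <package id="abc" subdir="corpora" url="http://example.com/abc.zip"/>
    <package id="words" subdir="corpora" url="http://example.com/words.zip"/>
  </packages>
  <collections>
    <collection id="all-corpora">
      <item ref="abc"/>
      <item ref="words"/>
    </collection>
  </collections>
</nltk_data>"""


def parse_index(xml_text: str) -> Tuple[Dict[str, Dict[str, str]], Dict[str, List[str]]]:
    """Return (packages, collections) parsed from an index document."""
    root = ET.fromstring(xml_text)
    # Map each package id to its attributes (url, subdir, ...).
    packages = {p.get("id"): dict(p.attrib) for p in root.iter("package")}
    # Map each collection id to the ids of its member items.
    collections = {
        c.get("id"): [item.get("ref") for item in c.iter("item")]
        for c in root.iter("collection")
    }
    return packages, collections
```

To read an index from another server, the text could be fetched from a different URL (for example with `urllib.request.urlopen`) and passed to the same parser.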
Usage::

    python nltk/downloader.py [-d DATADIR] [-q] [-f] [-k] PACKAGE_IDS

or, with Python 2.5+::

    python -m nltk.downloader [-d DATADIR] [-q] [-f] [-k] PACKAGE_IDS
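The shape of that command line can be mirrored with `argparse`. This is a hypothetical parser for illustration, not the module's own option handling: the destination names are chosen here, reading `-q` and `-f` as "quiet" and "force" is an assumption, and `-k` is parsed as an uninterpreted on/off flag because its meaning is not described on this page.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented usage line."""
    parser = argparse.ArgumentParser(prog="nltk.downloader")
    # -d DATADIR: where to install packages.
    parser.add_argument("-d", dest="datadir", metavar="DATADIR")
    # -q and -f are assumed here to mean "quiet" and "force".
    parser.add_argument("-q", dest="quiet", action="store_true")
    parser.add_argument("-f", dest="force", action="store_true")
    # -k is parsed as a plain flag; its meaning is not documented above.
    parser.add_argument("-k", dest="k", action="store_true")
    # One or more package or collection identifiers.
    parser.add_argument("package_ids", nargs="+", metavar="PACKAGE_IDS")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```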
Classes
=======
Package                    A directory entry for a downloadable package.
Collection                 A directory entry for a collection of downloadable packages.
DownloaderMessage          A status message object, used by incr_download to communicate its progress.
StartCollectionMessage     Data server has started working on a collection of packages.
FinishCollectionMessage    Data server has finished working on a collection of packages.
StartPackageMessage        Data server has started working on a package.
FinishPackageMessage       Data server has finished working on a package.
StartDownloadMessage       Data server has started downloading a package.
FinishDownloadMessage      Data server has finished downloading a package.
StartUnzipMessage          Data server has started unzipping a package.
FinishUnzipMessage         Data server has finished unzipping a package.
UpToDateMessage            The package download file is already up-to-date.
StaleMessage               The package download file is out-of-date or corrupt.
ErrorMessage               Data server encountered an error.
ProgressMessage            Indicates how much progress the data server has made.
SelectDownloadDirMessage   Indicates what download directory the data server is using.
Downloader                 A class used to access the NLTK data server, which can be used to download corpora and other data packages.
DownloaderShell
DownloaderGUI              Graphical interface for downloading packages from the NLTK data server.
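The message classes above suggest a generator-based protocol: an incremental download function yields status objects, and the caller dispatches on their type. The sketch below uses simplified stand-ins for four of those classes (their attributes, such as `package_id` and `progress`, are assumptions) plus a hypothetical `fake_incr_download` generator that emits messages without touching the network; it is not NLTK's incr_download.

```python
from dataclasses import dataclass
from typing import Iterator, List


# Simplified stand-ins for a few of the message classes listed above;
# the real classes may carry different attributes.
class DownloaderMessage:
    """Base class for status messages."""


@dataclass
class StartPackageMessage(DownloaderMessage):
    package_id: str


@dataclass
class FinishPackageMessage(DownloaderMessage):
    package_id: str


@dataclass
class ErrorMessage(DownloaderMessage):
    package_id: str
    message: str


@dataclass
class ProgressMessage(DownloaderMessage):
    progress: int  # percent complete, 0-100


def fake_incr_download(package_ids: List[str]) -> Iterator[DownloaderMessage]:
    """Hypothetical generator that reports progress instead of doing real I/O."""
    for i, pid in enumerate(package_ids, start=1):
        yield StartPackageMessage(pid)
        if pid.startswith("bad"):
            # Report the failure, then carry on with the next package.
            yield ErrorMessage(pid, "package not found")
        else:
            yield FinishPackageMessage(pid)
        yield ProgressMessage(100 * i // len(package_ids))


def run(package_ids: List[str]) -> List[str]:
    """Consume the message stream and return a log of what happened."""
    log: List[str] = []
    for msg in fake_incr_download(package_ids):
        if isinstance(msg, StartPackageMessage):
            log.append(f"start {msg.package_id}")
        elif isinstance(msg, FinishPackageMessage):
            log.append(f"done {msg.package_id}")
        elif isinstance(msg, ErrorMessage):
            log.append(f"error {msg.package_id}: {msg.message}")
        elif isinstance(msg, ProgressMessage):
            log.append(f"{msg.progress}%")
    return log
```

Because the caller pulls messages one at a time, a text shell and a graphical front end can share the same download loop and differ only in how they render each message.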
Variables
=========
TKINTER = True
_downloader = Downloader()
Functions
=========
Calculate and return the MD5 checksum for a given file.
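A minimal sketch of such a checksum helper using the standard `hashlib` module. The name `md5_of_file` and the default chunk size are choices made here, not taken from NLTK; reading in chunks keeps memory use flat even for large zip archives.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```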
Create a new data.xml index file by combining the xml description
files for various packages and collections, using the following
directory layout::

    root/
      packages/ .................. subdirectory for packages
        corpora/ ................. zip & xml files for corpora
        grammars/ ................ zip & xml files for grammars
        taggers/ ................. zip & xml files for taggers
        tokenizers/ .............. zip & xml files for tokenizers
        etc.
      collections/ ............... xml files for collections

For each package, there should be two files: a zip file containing
the package and an xml file describing it. For each collection,
there should be a single xml file describing the collection.
All identifiers (for both packages and collections) must be unique.
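A rough sketch of the combining step under the layout above. It assumes each xml file's root element carries an `id` attribute and that the combined index wraps entries in `nltk_data`, `packages` and `collections` elements (an assumed layout); the function name `build_index_sketch` is hypothetical. It only checks that each package xml has a matching zip file and that identifiers are unique.

```python
import os
import xml.etree.ElementTree as ET
from typing import List


def _xml_files(directory: str) -> List[str]:
    """Return the paths of all .xml files under `directory`, sorted."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            if name.endswith(".xml"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def build_index_sketch(root: str) -> ET.Element:
    """Combine per-package and per-collection xml files into one index element."""
    index = ET.Element("nltk_data")
    packages = ET.SubElement(index, "packages")
    collections = ET.SubElement(index, "collections")
    seen = set()
    for parent, subdir in ((packages, "packages"), (collections, "collections")):
        for path in _xml_files(os.path.join(root, subdir)):
            if subdir == "packages":
                # Each package xml should sit next to a zip with the same base name.
                zip_path = path[: -len(".xml")] + ".zip"
                if not os.path.exists(zip_path):
                    raise ValueError(f"missing zip file for {path}")
            elem = ET.parse(path).getroot()
            ident = elem.get("id")
            # Identifiers must be unique across packages and collections.
            if ident in seen:
                raise ValueError(f"duplicate identifier: {ident!r}")
            seen.add(ident)
            parent.append(elem)
    return index
```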
Helper for build_index(): Yield a list of tuples
Generated by Epydoc 3.0.1 on Mon Apr 11 14:39:41 2011 (http://epydoc.sourceforge.net)