Package nltk :: Module downloader :: Class Downloader

type Downloader

object --+
         |
        Downloader

A class used to access the NLTK data server, which can be used to download corpora and other data packages.

Instance Methods

__init__(self, server_index_url=None, download_dir=None)

list(self, download_dir=None, show_packages=True, show_collections=True, header=True, more_prompt=False, skip_installed=False)

packages(self)

corpora(self)

models(self)

collections(self)

_info_or_id(self, info_or_id)

incr_download(self, info_or_id, download_dir=None, force=False)

_num_packages(self, item)

_download_list(self, items, download_dir, force)

_download_package(self, info, download_dir, force)

download(self, info_or_id=None, download_dir=None, quiet=False, force=False, prefix='[nltk_data] ', halt_on_error=True, raise_on_error=False)

is_stale(self, info_or_id, download_dir=None)

is_installed(self, info_or_id, download_dir=None)

clear_status_cache(self, id=None)
 
status(self, info_or_id, download_dir=None)
Return a constant describing the status of the given package or collection.

_pkg_status(self, info, filepath)

update(self, quiet=False, prefix='[nltk_data] ')
Re-download any packages whose status is STALE.

_update_index(self, url=None)
A helper function that ensures that self._index is up-to-date.

index(self)
Return the XML index describing the packages available from the data server.

info(self, id)
Return the Package or Collection record for the given item.

xmlinfo(self, id)
Return the XML info record for the given item.

_set_url(self, url)

default_download_dir(self)
Return the directory to which packages will be downloaded by default.

_set_download_dir(self, download_dir)

_interactive_download(self)
Class Variables
  INDEX_TIMEOUT = 3600
The amount of time after which the cached copy of the data server index will be considered 'stale,' and will be re-downloaded.
  DEFAULT_URL = 'http://nltk.googlecode.com/svn/trunk/nltk_data/...
The default URL for the NLTK data server's index.
  INSTALLED = 'installed'
A status string indicating that a package or collection is installed and up-to-date.
  NOT_INSTALLED = 'not installed'
A status string indicating that a package or collection is not installed.
  STALE = 'out of date'
A status string indicating that a package or collection is corrupt or out-of-date.
  PARTIAL = 'partial'
A status string indicating that a collection is partially installed (i.e., only some of its packages are installed).
Instance Variables
  _url
The URL for the data server's index file.
  _collections
Dictionary from collection identifier to Collection.
  _packages
Dictionary from package identifier to Package.
  _download_dir
The default directory to which packages will be downloaded.
  _index
The XML index file downloaded from the data server.
  _index_timestamp
Time at which self._index was downloaded.
  _status_cache
Dictionary from package/collection identifier to status string (INSTALLED, NOT_INSTALLED, STALE, or PARTIAL).
  _errors
Flag indicating whether all packages were downloaded successfully.
Properties
  url
The URL for the data server's index file.
  download_dir
The default directory to which packages will be downloaded.
Method Details

__init__(self, server_index_url=None, download_dir=None)
(Constructor)

Overrides: object.__init__
(inherited documentation)

status(self, info_or_id, download_dir=None)

Return a constant describing the status of the given package or collection. Status can be one of INSTALLED, NOT_INSTALLED, STALE, or PARTIAL.
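As a rough illustration of how a collection's status could be derived from the statuses of its member packages, consider the sketch below. Only the four status strings come from the class variables listed on this page; the helper name collection_status and the precedence order (STALE first, then PARTIAL) are assumptions made for this example and are not confirmed by the documentation.

```python
# Status strings as listed under Class Variables.
INSTALLED = 'installed'
NOT_INSTALLED = 'not installed'
STALE = 'out of date'
PARTIAL = 'partial'


def collection_status(package_statuses):
    """Combine member statuses into one status (hypothetical helper).

    Assumed precedence: any stale member makes the collection stale;
    a mix of installed and not-installed members makes it partial.
    """
    statuses = set(package_statuses)
    if STALE in statuses:
        return STALE
    if PARTIAL in statuses:
        return PARTIAL
    if INSTALLED in statuses and NOT_INSTALLED in statuses:
        return PARTIAL
    if NOT_INSTALLED in statuses:
        return NOT_INSTALLED
    # Every member is installed (an empty collection also lands here).
    return INSTALLED
```

Under these assumptions, a collection with one installed and one missing package reports PARTIAL, which matches the description of PARTIAL given under Class Variables.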

_update_index(self, url=None)

A helper function that ensures that self._index is up-to-date. If the index is older than self.INDEX_TIMEOUT, then download it again.
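The timeout check described here can be sketched as a small time-based cache. This is a minimal stand-in, not the Downloader's actual code: the class IndexCache, its fetch callable, and the injectable clock are hypothetical names introduced for illustration. Only INDEX_TIMEOUT and the two attribute names come from this page.

```python
import time

INDEX_TIMEOUT = 3600  # seconds, as listed under Class Variables


class IndexCache:
    """Hypothetical stand-in for the index-caching behaviour."""

    def __init__(self, fetch, clock=time.time):
        self._fetch = fetch            # callable that downloads the index
        self._clock = clock            # injectable clock, handy for testing
        self._index = None
        self._index_timestamp = None   # time at which _index was fetched

    def index(self):
        """Return the cached index, fetching it again once it has expired."""
        now = self._clock()
        expired = (
            self._index_timestamp is None
            or now - self._index_timestamp > INDEX_TIMEOUT
        )
        if expired:
            self._index = self._fetch()
            self._index_timestamp = now
        return self._index
```

Passing the clock in as a parameter is a design choice of the sketch: it lets the expiry logic be exercised without waiting an hour.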

index(self)

Return the XML index describing the packages available from the data server. If necessary, this index will be downloaded from the data server.

default_download_dir(self)

Return the directory to which packages will be downloaded by default. This value can be overridden using the constructor, or on a case-by-case basis using the download_dir argument when calling download().

On Windows, the default download directory is PYTHONHOME/lib/nltk, where PYTHONHOME is the directory containing Python (e.g. C:\Python25).

On all other platforms, the default directory is determined as follows:

  • If /usr/share exists and is writable, then return /usr/share/nltk
  • If /usr/local/share exists and is writable, then return /usr/local/share/nltk
  • If /usr/lib exists and is writable, then return /usr/lib/nltk
  • If /usr/local/lib exists and is writable, then return /usr/local/lib/nltk
  • Otherwise, return ~/nltk_data, where ~ is the current user's home directory.
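The selection rules above can be sketched as a pure function. The function name choose_download_dir, its parameters, and the exists_and_writable helper are assumptions made for this illustration; the real method takes no such arguments. The platform check against 'win32' is likewise an assumption about how Windows would be detected.

```python
import ntpath
import os
import posixpath

# Candidate parent directories, in the order listed above.
SHARED_PARENTS = ['/usr/share', '/usr/local/share', '/usr/lib', '/usr/local/lib']


def exists_and_writable(path):
    """Return True if path exists and the current user may write to it."""
    return os.path.exists(path) and os.access(path, os.W_OK)


def choose_download_dir(platform, python_home, home,
                        is_writable=exists_and_writable):
    """Pick a default download directory (hypothetical helper)."""
    if platform == 'win32':
        # Windows: PYTHONHOME/lib/nltk
        return ntpath.join(python_home, 'lib', 'nltk')
    for parent in SHARED_PARENTS:
        if is_writable(parent):
            return posixpath.join(parent, 'nltk')
    # No shared location is writable: fall back to ~/nltk_data.
    return posixpath.join(home, 'nltk_data')
```

In practice one would call it with sys.platform, sys.prefix, and os.path.expanduser('~'); the is_writable parameter exists only so the rule order can be checked without touching the file system.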

Class Variable Details

DEFAULT_URL

The default URL for the NLTK data server's index. An alternative URL can be specified when creating a new Downloader object.

Value:
'http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml'

Instance Variable Details

_index_timestamp

Time at which self._index was downloaded. If it is more than INDEX_TIMEOUT seconds old, it will be re-downloaded.

_status_cache

Dictionary from package/collection identifier to status string (INSTALLED, NOT_INSTALLED, STALE, or PARTIAL). Cache is used for packages only, not collections.
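A package-only status cache of this kind might look like the sketch below. The class StatusCache and its lookup method are hypothetical; the clear method mirrors the clear_status_cache(self, id=None) signature listed under Instance Methods, but its behaviour (clearing everything when id is None, otherwise one entry) is an assumption, since this page does not document it.

```python
class StatusCache:
    """Hypothetical sketch of a per-package status cache."""

    def __init__(self):
        self._status_cache = {}   # package identifier -> status string

    def lookup(self, package_id, compute):
        """Return the cached status, computing and storing it on a miss."""
        if package_id not in self._status_cache:
            self._status_cache[package_id] = compute(package_id)
        return self._status_cache[package_id]

    def clear(self, id=None):
        """Forget one entry, or every entry when id is None (assumed)."""
        if id is None:
            self._status_cache.clear()
        else:
            self._status_cache.pop(id, None)
```

Collections are left out of the cache in this sketch, in line with the note above, so their status would be recomputed from the member packages on each request.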


Property Details

url

The URL for the data server's index file.

Set Method:
_set_url(self, url)

download_dir

The default directory to which packages will be downloaded. This defaults to the value returned by default_download_dir(). To override this default on a case-by-case basis, use the download_dir argument when calling download().

Set Method:
_set_download_dir(self, download_dir)
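The precedence between the computed default, the constructor argument, and the per-call argument can be sketched with a small stand-in class. DirSettings, its resolve method, and the example paths are all hypothetical and exist only to show the order in which the values are consulted.

```python
class DirSettings:
    """Hypothetical stand-in showing download_dir precedence."""

    def __init__(self, download_dir=None, default_dir='~/nltk_data'):
        # A constructor argument replaces the computed default.
        self.download_dir = default_dir if download_dir is None else download_dir

    def resolve(self, download_dir=None):
        """Return the directory one call would use."""
        # A per-call argument wins over the instance-wide setting.
        return self.download_dir if download_dir is None else download_dir


settings = DirSettings(download_dir='/data/nltk')
print(settings.resolve())              # instance-wide setting
print(settings.resolve('/tmp/nltk'))   # per-call override
```

The per-call value does not change the instance-wide setting, so later calls without the argument fall back to the directory given at construction.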