filecache Module

filecache.file_cache.register_filecachesource(cls)

Register one or more FileCacheSource subclasses as URL schemes.

Parameters:

cls (type[FileCacheSource])

Return type:

None

filecache.file_cache.set_global_logger(logger)

Set the global logger for all FileCache instances that don’t specify one.

Parameters:

logger (Logger | None)

Return type:

None
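
As an illustration of the kind of object that could be handed to set_global_logger(), the sketch below builds a standard-library logger that writes every message to stdout. The helper name make_stdout_logger and the logger name are hypothetical; only the logging calls themselves are standard Python.

```python
import logging
import sys


def make_stdout_logger(name: str = "filecache_demo") -> logging.Logger:
    """Build a logger that writes all messages to stdout (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # Avoid stacking duplicate handlers if the helper is called more than once.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


demo_logger = make_stdout_logger()
# Assumption: with the library installed, this object would be passed as
# filecache.file_cache.set_global_logger(demo_logger).
demo_logger.info("logger ready")
```

Passing None instead would, per the parameter type above, clear the global logger.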

filecache.file_cache.set_easy_logger()

Set a default logger that outputs all messages to stdout.

Return type:

None

filecache.file_cache.get_global_logger()

Return the current global logger.

Return type:

Logger | None

class filecache.file_cache.FileCache(cache_name='global', *, cache_root=None, delete_on_exit=None, time_sensitive=False, cache_metadata=False, mp_safe=None, anonymous=False, lock_timeout=60, nthreads=8, url_to_url=None, url_to_path=None, logger=None)

Bases: object

Class which manages the lifecycle of files from various sources.

Parameters:
  • cache_name (Optional[str])

  • cache_root (Optional[Path | str])

  • delete_on_exit (Optional[bool])

  • time_sensitive (bool)

  • cache_metadata (bool)

  • mp_safe (Optional[bool])

  • anonymous (bool)

  • lock_timeout (int)

  • nthreads (int)

  • url_to_url (Optional[UrlToUrlFuncOrSeqType])

  • url_to_path (Optional[UrlToPathFuncOrSeqType])

  • logger (Optional[Logger | bool])

__init__(cache_name='global', *, cache_root=None, delete_on_exit=None, time_sensitive=False, cache_metadata=False, mp_safe=None, anonymous=False, lock_timeout=60, nthreads=8, url_to_url=None, url_to_path=None, logger=None)

Initialization for the FileCache class.

Parameters:
  • cache_name (str | None) – By default, the file cache will be stored in the subdirectory _filecache_global under the cache_root directory. If a name is specified explicitly, the file cache will be stored in the subdirectory _filecache_<cache_name>. Explicitly naming a cache is useful if other programs will want to access the same cache, or if you want the directory name to be obvious to users browsing the file system. Using a cache name (including the default global) implies that this cache should be persistent on exit. If you pass in None, the cache will instead be stored in a uniquely-named subdirectory with the prefix _filecache_ and by default will be deleted on exit.

  • cache_root (Path | str | None) – The directory in which to place caches. By default, FileCache uses the contents of the environment variable FILECACHE_CACHE_ROOT; if not set, then the system temporary directory is used, which involves checking the environment variables TMPDIR, TEMP, and TMP, and if none of those are set then using C:\TEMP, C:\TMP, \TEMP, or \TMP on Windows and /tmp, /var/tmp, or /usr/tmp on other platforms. The cache will be stored in a sub-directory within this directory (see cache_name). If cache_root is specified but the directory does not exist, it is created.

  • delete_on_exit (bool | None) – If True, the cache directory and its contents are always deleted on program exit or exit from a FileCache context manager. If False, the cache is never deleted. By default, an unnamed cache (cache_name is None) will be deleted on exit and a named cache will not be deleted on program exit.

  • time_sensitive (bool) – If True, the modification time of files in the cache is considered to be important. When a file is retrieved, the modification time from the source location is set on the local copy. If a local copy already exists, the times on both copies are compared and the local copy is updated if the source is newer. When a file is uploaded, the modification time on the local copy is set to the time retrieved from the source after the upload is complete.

  • cache_metadata (bool) – If True, iterdir(), iterdir_metadata(), and other internal methods will cache the metadata (such as modification time, size, and is_dir) of remote files. If time_sensitive is True and retrieve() needs the modification time of a file to compare to the local file, it will be retrieved from the cache if possible to save a server query. This option should only be used if the remote source is guaranteed not to change during the lifetime of this FileCache instance.

  • mp_safe (bool | None) – If False, never use multiprocessor-safe locking. If True, always use multiprocessor-safe locking. By default, locking is used if cache_name is specified, as it is assumed that multiple processes will be using the named cache simultaneously. If multiple processes will not be using the cache simultaneously, a small performance boost can be realized by setting mp_safe explicitly to False.

  • anonymous (bool) – The default value for anonymous access to cloud resources. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.

  • lock_timeout (int) – The default value for lock timeouts. This is how long to wait, in seconds, if another process is marked as retrieving a file before raising an exception. 0 means to not wait at all. A negative value means to never time out.

  • nthreads (int) – The default value for the maximum number of threads to use when doing multiple-file retrieval, upload, or other file operations.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The default function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The default function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path; if none does, the default translation is used. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

  • logger (Logger | bool | None) – If False, do not do any logging. If None, use the global logger set with set_global_logger(). Otherwise use the specified logger.
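
The url_to_url translator contract described in the parameter list can be sketched in plain Python. The bucket names, the mirror host, and the translate() helper are hypothetical; in particular, the fallback at the end of translate() only stands in for the library's default translation and assumes that path carries no leading slash.

```python
from typing import Callable, Optional, Sequence

# Signature documented above: (scheme, remote, path) -> new URL or None.
UrlToUrl = Callable[[str, str, str], Optional[str]]


def mirror_translator(scheme: str, remote: str, path: str) -> Optional[str]:
    """Redirect one hypothetical bucket to a hypothetical HTTPS mirror."""
    if scheme == "gs" and remote == "example-bucket":
        return f"https://mirror.example.com/{path}"
    return None  # decline, so the next translator (or the default) is tried


def legacy_translator(scheme: str, remote: str, path: str) -> Optional[str]:
    """Map a hypothetical legacy bucket name to its replacement."""
    if scheme == "gs" and remote == "old-bucket":
        return f"gs://new-bucket/{path}"
    return None


def translate(translators: Sequence[UrlToUrl],
              scheme: str, remote: str, path: str) -> str:
    """Call translators in order; the first non-None result wins."""
    for func in translators:
        result = func(scheme, remote, path)
        if result is not None:
            return result
    # Assumption: the fallback leaves the URL unchanged.
    return f"{scheme}://{remote}/{path}"


chain = [mirror_translator, legacy_translator]
mirrored = translate(chain, "gs", "example-bucket", "data/a.txt")
renamed = translate(chain, "gs", "old-bucket", "data/a.txt")
untouched = translate(chain, "s3", "other", "data/a.txt")
```

A list such as chain is the shape of value the url_to_url parameter accepts; a single function is also allowed.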

Notes

FileCache can be used as a context manager, such as:

with FileCache(cache_name=None) as fc:
    ...

In this case, the cache directory is created on entry to the context and deleted on exit. However, if the cache is named, the directory will not be deleted on exit unless the delete_on_exit=True option is used.
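
The defaults for cache_root, cache_name, and delete_on_exit described above can be summarized in a small sketch. The three helper functions are hypothetical and only restate the documented rules; they are not the library's code.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def resolve_cache_root(cache_root: Optional[str] = None) -> Path:
    """Explicit argument first, then FILECACHE_CACHE_ROOT, then the temp dir."""
    if cache_root is not None:
        return Path(cache_root)
    env_root = os.environ.get("FILECACHE_CACHE_ROOT")
    if env_root:
        return Path(env_root)
    # gettempdir() checks TMPDIR, TEMP, and TMP before platform defaults.
    return Path(tempfile.gettempdir())


def named_cache_dir(cache_root: Path, cache_name: str) -> Path:
    """Named caches live in the subdirectory _filecache_<cache_name>."""
    return cache_root / f"_filecache_{cache_name}"


def resolve_delete_on_exit(cache_name: Optional[str],
                           delete_on_exit: Optional[bool] = None) -> bool:
    """An explicit setting wins; otherwise only unnamed caches are deleted."""
    if delete_on_exit is not None:
        return delete_on_exit
    return cache_name is None


default_dir = named_cache_dir(resolve_cache_root(), "global")
```

Under these rules, FileCache() keeps its directory after exit, while FileCache(cache_name=None) removes it unless delete_on_exit=False is passed.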

classmethod registered_scheme_prefixes()
Return type:

tuple[str, …]

property cache_dir: Path

The top-level directory of the cache as a Path object.

property download_counter: int

The number of actual file downloads that have taken place.

property upload_counter: int

The number of actual file uploads that have taken place.

property is_delete_on_exit: bool

A bool indicating whether this FileCache will be deleted on exit.

property is_time_sensitive: bool

A bool indicating whether this FileCache cares about modification times.

property is_cache_metadata: bool

A bool indicating whether this FileCache caches metadata.

property is_mp_safe: bool

A bool indicating whether this FileCache is multi-processor safe.

property is_anonymous: bool

The default bool indicating whether to make all cloud accesses anonymous.

property lock_timeout: int

The default timeout in seconds while waiting for a file lock.

property nthreads: int

The default number of threads to use for multiple-file operations.

property url_to_url: Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...]

The default function(s) used to translate URLs into URLs.

property url_to_path: Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...]

The default function(s) used to translate URLs into paths.

property logger: Logger | None

The logger to use for this FileCache.

__repr__()

Return repr(self).

Return type:

str

__str__()

Return str(self).

Return type:

str

get_local_path(url, *, anonymous=None, create_parents=True, url_to_url=None, url_to_path=None)

Return the local path for the given url.

Parameters:
  • url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the file, including any source prefix. If url is a list or tuple, all URLs are processed.

  • anonymous (bool | None) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. If None, use the default setting for this FileCache instance.

  • create_parents (bool) – If True, create all parent directories. This is useful when getting the local path of a file that will be uploaded.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path; if none does, the default translation is used. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

Returns:

The Path (or list of Paths) of the filename in the temporary directory, or as specified by the url_to_path translators. The files do not have to exist because a Path could be used for writing a file to upload. To facilitate this, a side effect of this call (if create_parents is True) is that the complete parent directory structure will be created for each returned Path.

Return type:

Path | list[Path]
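
A url_to_path translator, as described for the parameter above, might look like the following sketch. The bucket name, the flat/ layout, and the resolve_local_path() helper are hypothetical; the helper only restates the documented rule that a relative result is appended to cache_dir and an absolute result is used directly.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def flatten_translator(scheme: str, remote: str, path: str,
                       cache_dir: Path,
                       cache_subdir: str) -> Optional[Union[str, Path]]:
    """Store files from one hypothetical bucket in a flat directory."""
    if scheme == "gs" and remote == "example-bucket":
        # Relative result: it is interpreted relative to cache_dir.
        return Path("flat") / path.replace("/", "_")
    return None  # decline, so the default layout is used


def resolve_local_path(result: Union[str, Path], cache_dir: Path) -> Path:
    """Relative paths go under cache_dir; absolute paths are used as-is."""
    candidate = Path(result)
    return candidate if candidate.is_absolute() else cache_dir / candidate


demo_cache_dir = Path(tempfile.gettempdir()) / "_filecache_demo"
relative = flatten_translator("gs", "example-bucket", "a/b/c.txt",
                              demo_cache_dir, "gs_example-bucket")
local_path = resolve_local_path(relative, demo_cache_dir)
```

Returning relative paths keeps every file inside the cache directory, which avoids the risk noted above for absolute paths.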

exists(url, *, bypass_cache=False, anonymous=None, nthreads=None, url_to_url=None, url_to_path=None)

Check if a file exists without downloading it.

Parameters:
  • url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the file, including any source prefix. If url is a list or tuple, all URLs are checked. This may be more efficient because files can be checked in parallel. It is OK to check files from multiple sources using one call.

  • bypass_cache (bool) – If False, check for the file first in the local cache, and if not found there then on the remote server. If True, only check on the remote server.

  • anonymous (bool | None) – If specified, override the default setting for anonymous access. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.

  • nthreads (int | None) – The maximum number of threads to use. If None, use the default value for this FileCache instance.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path; if none does, the default translation is used. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

Returns:

True if the file exists (note that it is possible that a file could exist and still not be downloadable due to permissions). False if the file does not exist. This includes bad bucket or webserver names, lack of permission to examine a bucket’s contents, etc. If url was a list or tuple, then instead return a list of bools giving the existence of each url in order.

Return type:

bool | list[bool]
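
The list form of the call returns one result per input, in input order, and may check entries in parallel. The sketch below illustrates that shape with local files and a standard-library thread pool. It is a stand-in for illustration only and makes no claim about how the library schedules its checks; check_all is a hypothetical helper.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Sequence


def check_all(paths: Sequence[Path], nthreads: int = 8) -> List[bool]:
    """Check existence in parallel; results keep the order of the inputs."""
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(lambda p: p.exists(), paths))


with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "present.txt"
    present.write_text("data")
    absent = Path(tmp) / "absent.txt"
    existence = check_all([present, absent, present])
```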

modification_time(url, *, bypass_cache=False, anonymous=None, nthreads=None, exception_on_fail=True, url_to_url=None)

Get the modification time of a remote file as a Unix timestamp.

Parameters:
  • url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the file, including any source prefix. If url is a list or tuple, all URLs are checked. This may be more efficient because files can be checked in parallel. It is OK to check files from multiple sources using one call.

  • bypass_cache (bool) – If False, retrieve the modification time for the file first from the metadata cache, if enabled, and if not found there then from the remote server. If True, only retrieve the modification time directly from the remote server.

  • anonymous (bool | None) – If specified, override the default setting for anonymous access. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.

  • nthreads (int | None) – The maximum number of threads to use. If None, use the default value for this FileCache instance.

  • exception_on_fail (bool) – If True, a FileNotFoundError is raised if any file does not exist. If False, the function returns normally and any failed check is marked with the Exception that caused the failure in place of the returned modification time.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

Returns:

The modification time as a Unix timestamp if the file exists and the time can be retrieved, None otherwise. If url was a list or tuple, then instead return a list of modification times in order. This always returns the modification time of the file on the remote source, even if there is a local copy. If you want the modification time of the local copy, you can call the normal stat function. If cache_metadata is True, the modification time is retrieved from the cache if possible to save a server query. If exception_on_fail is False, any modification time may be an Exception if that file does not exist or the modification time cannot be retrieved.

Raises:

FileNotFoundError – If a file does not exist.

Return type:

float | None | Exception | list[float | None | Exception]
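
Because the return type mixes floats, None, and Exception objects, callers typically need to branch on each entry. A minimal sketch follows, using a hypothetical describe_mtime() helper and made-up sample values rather than real query results.

```python
from datetime import datetime, timezone
from typing import Union


def describe_mtime(result: Union[float, None, Exception]) -> str:
    """Render one modification_time() style result as readable text."""
    if isinstance(result, Exception):
        return f"failed: {result!r}"
    if result is None:
        return "unknown"
    # Unix timestamps count seconds since 1970-01-01 00:00:00 UTC.
    return datetime.fromtimestamp(result, tz=timezone.utc).isoformat()


# Hypothetical results, shaped like a call with exception_on_fail=False.
sample_results = [0.0, None, FileNotFoundError("gs://example-bucket/missing")]
descriptions = [describe_mtime(r) for r in sample_results]
```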

is_dir(url, *, anonymous=None, nthreads=None, exception_on_fail=True, url_to_url=None)

Check if a URL represents a directory.

Parameters:
  • url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the directory, including any source prefix. If url is a list or tuple, all URLs are checked. This may be more efficient because URLs can be checked in parallel. It is OK to check URLs from multiple sources using one call.

  • anonymous (bool | None) – If specified, override the default setting for anonymous access. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.

  • nthreads (int | None) – The maximum number of threads to use. If None, use the default value for this FileCache instance.

  • exception_on_fail (bool) – If True, a FileNotFoundError is raised if any URL cannot be checked. If False, the function returns normally and any failed check is marked with the Exception that caused the failure in place of the returned boolean.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

Returns:

True if the URL represents a directory, False otherwise. If url was a list or tuple, then instead return a list of booleans or exceptions in order. If exception_on_fail is False, any result may be an Exception if that URL cannot be checked.

Raises:

FileNotFoundError – If a URL cannot be checked.

Return type:

bool | Exception | list[bool | Exception]

Notes

Unlike os.path.isdir or pathlib.Path.is_dir, this method raises an exception if the URL does not exist instead of returning False. This is so that remote connection errors are not masked by the return value. Contrast this with the return value of FileCache.exists(), which will return False if the file does not exist or cannot be accessed.
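
For comparison, the standard-library behavior that this method deliberately departs from can be seen in a few lines. Only the standard library is used here; the file name is arbitrary.

```python
import os.path
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "does_not_exist"
    # Both standard-library checks report False for a missing path
    # instead of raising, so "absent" and "not a directory" look alike.
    stdlib_results = (
        os.path.isdir(missing),
        missing.is_dir(),
        Path(tmp).is_dir(),
    )
```

Here stdlib_results is (False, False, True): a missing path cannot be told apart from an existing non-directory, which is the ambiguity the exception avoids.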

retrieve(url, *, anonymous=None, lock_timeout=None, nthreads=None, exception_on_fail=True, url_to_url=None, url_to_path=None)

Retrieve file(s) from the given location(s) and store in the file cache.

Parameters:
  • url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the file, including any source prefix. If url is a list or tuple, all URLs are retrieved. This may be more efficient because files can be downloaded in parallel. It is OK to retrieve files from multiple sources using one call.

  • anonymous (bool | None) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. If None, use the default setting for this FileCache instance.

  • lock_timeout (int | None) – How long to wait, in seconds, if another process is marked as retrieving the file before raising an exception. 0 means to not wait at all. A negative value means to never time out. None means to use the default value for this FileCache instance.

  • nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value for this FileCache instance.

  • exception_on_fail (bool) – If True, a FileNotFoundError is raised if any file does not exist or a download fails, and a TimeoutError is raised if any attempt to acquire a lock or wait for another process to download a file fails. If False, the function returns normally and any failed download is marked with the Exception that caused the failure in place of the returned Path.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path; if none does, the default translation is used. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

Returns:

The Path of the filename in the temporary directory (or the original absolute path if local). If url was a list or tuple, then instead return a list of Paths of the filenames in the temporary directory (or the original absolute path if local). If exception_on_fail is False, any Path may be an Exception if that file does not exist or the download failed or a timeout occurred.

Raises:
  • FileNotFoundError – If a file does not exist or could not be downloaded, and exception_on_fail is True. Also if time_sensitive is True and the modification time of the remote file can not be determined because a locally cached file has been deleted on the remote source.

  • TimeoutError – If we could not acquire the lock to allow downloading of a file within the given timeout or, for a multi-file download, if we timed out waiting for other processes to download locked files, and exception_on_fail is True.

Return type:

Path | Exception | list[Path | Exception]

Notes

File download is normally an atomic operation; a program will never see a partially-downloaded file, and if a download is interrupted there will be no file present. However, when downloading multiple files at the same time, as many files as possible are downloaded before an exception is raised.
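
When exception_on_fail is False, a multi-file call returns a list in which each entry is either a Path or the Exception that caused the failure. One way to separate the two is sketched below. The URLs, the cached path, and the split_results() helper are hypothetical, and the results list is hand-built rather than produced by a real retrieval.

```python
from pathlib import Path
from typing import Dict, List, Sequence, Tuple, Union

Result = Union[Path, Exception]


def split_results(
    urls: Sequence[str], results: Sequence[Result]
) -> Tuple[Dict[str, Path], Dict[str, Exception]]:
    """Pair each URL with its result and separate successes from failures."""
    succeeded: Dict[str, Path] = {}
    failed: Dict[str, Exception] = {}
    for url, result in zip(urls, results):
        if isinstance(result, Exception):
            failed[url] = result
        else:
            succeeded[url] = result
    return succeeded, failed


# Hand-built stand-ins for a retrieve() call with exception_on_fail=False.
demo_urls: List[str] = [
    "gs://example-bucket/a.txt",
    "gs://example-bucket/missing.txt",
    "gs://example-bucket/locked.txt",
]
demo_results: List[Result] = [
    Path("cache") / "gs_example-bucket" / "a.txt",
    FileNotFoundError("missing.txt"),
    TimeoutError("locked.txt"),
]
succeeded, failed = split_results(demo_urls, demo_results)
```

This keeps the successfully downloaded files usable while the failures are reported or retried separately.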

upload(url, *, anonymous=None, nthreads=None, exception_on_fail=True, url_to_url=None, url_to_path=None)

Upload file(s) from the file cache to the storage location(s).

Parameters:
  • url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the file, including any source prefix. If url is a list or tuple, the complete list of files is uploaded. This may be more efficient because files can be uploaded in parallel.

  • anonymous (bool | None) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. If None, use the default setting for this FileCache instance.

  • nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value for this FileCache instance.

  • exception_on_fail (bool) – If True, an exception is raised if any file does not exist or an upload fails. If False, the function returns normally and any failed upload is marked with the Exception that caused the failure in place of the returned path.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path | None


    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

Returns:

The Path of the filename in the cache directory (or the original absolute path if local). If url was a list or tuple of paths, then instead return a list of Paths of the filenames in the temporary directory (or the original full path if local). If exception_on_fail is False, any Path may be an Exception if that file does not exist or the upload failed.

Raises:

FileNotFoundError – If a file to upload does not exist or the upload failed, and exception_on_fail is True.

Return type:

Path | Exception | list[Path | Exception]

Notes

If time_sensitive is True for this FileCache instance, then the modification time of the local file is set to the modification time of the remote file after the upload is complete. If time_sensitive is False, then the modification time of the local file is not changed.
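
When exception_on_fail is False, the returned list can mix Path objects with Exception objects, so callers usually need to separate the two. The following is a minimal sketch of that pattern; the helper name split_results and the sample values are hypothetical and are not part of the filecache API.

```python
from pathlib import Path


def split_results(results):
    """Separate successful Paths from Exceptions in a mixed result list."""
    succeeded = [r for r in results if not isinstance(r, Exception)]
    failed = [r for r in results if isinstance(r, Exception)]
    return succeeded, failed


# Hypothetical return value of a multi-file call made with
# exception_on_fail=False; the paths are illustrative only.
results = [
    Path("/tmp/cache/gs_bucket/a.txt"),
    FileNotFoundError("gs://bucket/missing.txt"),
    Path("/tmp/cache/gs_bucket/b.txt"),
]
ok, errors = split_results(results)
```

Checking with isinstance(r, Exception) keeps the order of the successful entries intact, which matters if the caller relies on results lining up with the input list.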

open(url, mode='r', *args, anonymous=None, lock_timeout=None, url_to_url=None, url_to_path=None, **kwargs)[source]

Retrieve+open or open+upload a file as a context manager.

If mode is a read mode (like 'r' or 'rb') then the file will be first retrieved by calling retrieve() and then opened. If the mode is a write mode (like 'w' or 'wb') then the file will be first opened for write, and when this context manager is exited the file will be uploaded.

Parameters:
  • url (str | Path) – The filename to open.

  • mode (str) – The mode string as you would specify to Python’s open() function.

  • *args (Any) – Any additional positional arguments are passed to the Python open() function.

  • anonymous (bool | None) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. If None, use the default setting for this FileCache instance.

  • lock_timeout (int | None) – How long to wait, in seconds, if another process is marked as retrieving the file before raising an exception. 0 means to not wait at all. A negative value means to never time out. If None, use the default value for this FileCache instance.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path | None


    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

  • **kwargs (Any) – Any additional arguments are passed to the Python open() function.

Returns:

The same object as would be returned by the normal open() function.

Return type:

Iterator[IO[Any]]
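
The write-mode behavior described for open() (open the local file first, upload when the context manager exits) can be sketched with contextlib. This is only an illustration of the pattern, not the library's implementation; open_then_upload and the upload callable are hypothetical stand-ins.

```python
import contextlib
import tempfile
from pathlib import Path


@contextlib.contextmanager
def open_then_upload(local_path, mode, upload):
    """Open local_path for writing; call upload(local_path) once it is closed.

    `upload` is a hypothetical callable standing in for the real upload step.
    """
    with open(local_path, mode) as fp:
        yield fp
    # Reached only if the body of the caller's with-block did not raise,
    # and only after the file has been closed.
    upload(local_path)


uploaded = []  # records which paths were "uploaded"
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "out.txt"
    with open_then_upload(target, "w", uploaded.append) as fp:
        fp.write("hello")
    contents = target.read_text()
```

In this sketch the upload step is skipped when the body raises, so a partially written file is never pushed; whether FileCache.open() makes the same choice is not stated in this section.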

iterdir(url, *, anonymous=None, url_to_url=None)[source]

Enumerate the files and sub-directories in a directory.

This function always accesses a remote location (ignoring the local cache), if appropriate, because there is no way to know if the local cache contains all of the files and sub-directories present in the remote.

Parameters:
  • url (str | Path) – The URL of the directory, including any source prefix.

  • anonymous (bool | None) – If specified, override the default setting for anonymous access. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

Yields:

All files and sub-directories in the directory given by the url, in no particular order. Special directories . and .. are ignored.

Return type:

Iterator[str]
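
The url_to_url translator chain described above can be sketched as follows. The translator mirror_translator, the bucket and host names, and the helper apply_url_to_url are all hypothetical; the sketch also assumes that path carries no leading slash and that the default translation leaves the URL unchanged.

```python
def mirror_translator(scheme, remote, path):
    """Hypothetical translator: redirect one bucket to an HTTPS mirror."""
    if scheme == "gs" and remote == "example-bucket":
        return f"https://mirror.example.com/{path}"
    return None  # decline, so the next translator (or the default) decides


def apply_url_to_url(translators, scheme, remote, path):
    """Call translators in order until one returns a URL.

    Falling through returns the original URL (an assumption about the default).
    """
    for func in translators:
        new_url = func(scheme, remote, path)
        if new_url is not None:
            return new_url
    return f"{scheme}://{remote}/{path}"


translated = apply_url_to_url(
    [mirror_translator], "gs", "example-bucket", "data/a.txt")
untouched = apply_url_to_url(
    [mirror_translator], "s3", "other-bucket", "data/a.txt")
```

A translator written this way can be passed on its own or as part of a list through the url_to_url parameter.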

iterdir_metadata(url, *, anonymous=None, url_to_url=None)[source]

Enumerate the files and sub-dirs in a directory indicating which is a dir.

This function always accesses a remote location (ignoring the local cache), if appropriate, because there is no way to know if the local cache contains all of the files and sub-directories present in the remote.

Parameters:
  • url (str | Path) – The URL of the directory, including any source prefix.

  • anonymous (bool | None) – If specified, override the default setting for anonymous access. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

Yields:

All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:

  • is_dir: True if the returned name is a directory, False if it is a file.

  • date: The last modification date of the file as a UNIX timestamp.

  • size: The approximate size of the file in bytes.

If the metadata can not be retrieved, None is returned for the metadata.

Return type:

Iterator[tuple[str, dict[str, Any] | None]]
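
A consumer of the (path, metadata) tuples has to handle three cases: a directory, a file, and an entry whose metadata is None. The sketch below shows one way to do that; summarize and the sample entries are hypothetical, and the values shown for directories are placeholders rather than documented behavior.

```python
def summarize(entries):
    """Return (directory paths, total file size in bytes, paths lacking metadata)."""
    dirs, total_size, unknown = [], 0, []
    for path, metadata in entries:
        if metadata is None:
            unknown.append(path)      # metadata could not be retrieved
        elif metadata["is_dir"]:
            dirs.append(path)
        else:
            total_size += metadata["size"]
    return dirs, total_size, unknown


# Hypothetical entries in the shape yielded by iterdir_metadata()
entries = [
    ("data/raw", {"is_dir": True, "date": 1700000000.0, "size": 0}),
    ("data/a.txt", {"is_dir": False, "date": 1700000100.0, "size": 120}),
    ("data/b.txt", {"is_dir": False, "date": 1700000200.0, "size": 80}),
    ("data/c.txt", None),
]
dirs, total_size, unknown = summarize(entries)
```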

Remove a file, including any locally cached copy.

Parameters:
  • url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the file, including any source prefix. If url is a list or tuple, all URLs are unlinked.

  • missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.

  • anonymous (bool | None) – If specified, override the default setting for anonymous access. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.

  • nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value for this FileCache instance.

  • exception_on_fail (bool) – If True, an exception is raised if any file does not exist or any unlink fails. If False, the function returns normally and any failed unlink is marked with the Exception that caused the failure in place of the returned path.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path | None


    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.

Returns:

The Path of the filename in the cache directory (or the original absolute path if local). If url was a list or tuple of paths, then instead return a list of Paths of the filenames in the temporary directory (or the original full path if local). If exception_on_fail is False, any Path may be an Exception if that file does not exist and missing_ok is True.

Return type:

str | Exception | list[str | Exception]

Notes

If a URL points to a remote location, the locally cached version (if any) is only removed if the unlink of the remote location succeeded.

Raises:

FileNotFoundError – If a file to unlink does not exist or the unlink failed, and exception_on_fail is True.

new_path(path, *, anonymous=None, lock_timeout=None, nthreads=None, url_to_url=None, url_to_path=None)[source]

Create a new FCPath with the given prefix.

Parameters:
  • path (str | Path | FCPath) – The path.

  • anonymous (bool | None) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. If None, use the default setting for this FileCache instance.

  • lock_timeout (int | None) – How long to wait, in seconds, if another process is marked as retrieving the file before raising an exception. 0 means to not wait at all. A negative value means to never time out. None means to use the default value for this FileCache instance.

  • nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value for this FileCache instance.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If None, use the default translators for this FileCache instance.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path | None


    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If None, use the default translators for this FileCache instance.

Return type:

FCPath
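
The url_to_path rules described above (a relative result is appended to cache_dir, an absolute result is used directly, None falls through) can be sketched as follows. flat_translator, apply_url_to_path, and the bucket name are hypothetical, and the fall-through layout cache_dir/cache_subdir/path is an assumption based on the default hierarchy described in the parameter text.

```python
from pathlib import Path


def flat_translator(scheme, remote, path, cache_dir, cache_subdir):
    """Hypothetical translator: keep files from one bucket in a flat directory."""
    if scheme == "gs" and remote == "example-bucket":
        return Path("flat") / Path(path).name   # relative result
    return None                                  # decline


def apply_url_to_path(translators, scheme, remote, path, cache_dir, cache_subdir):
    """Call translators in order until one returns a path."""
    for func in translators:
        result = func(scheme, remote, path, cache_dir, cache_subdir)
        if result is not None:
            result = Path(result)
            # Relative results are joined to cache_dir; absolute ones are used as-is
            return result if result.is_absolute() else cache_dir / result
    # Assumed default layout when no translator claims the URL
    return cache_dir / cache_subdir / path


cache_dir = Path("/cache/_filecache_global")
custom = apply_url_to_path([flat_translator], "gs", "example-bucket",
                           "a/b/c.txt", cache_dir, "gs_example-bucket")
default = apply_url_to_path([flat_translator], "gs", "other",
                            "a/b/c.txt", cache_dir, "gs_other")
```

Returning a relative path keeps the result inside the cache directory, which avoids the risk the parameter text warns about for absolute paths.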

delete_cache()[source]

Delete all files stored in the cache including the cache directory.

Notes

It is permissible to call delete_cache() more than once. It is also permissible to call delete_cache(), then perform more operations that place files in the cache, then call delete_cache() again.

Return type:

None

class filecache.file_cache_path.FCPath(*paths, filecache=None, anonymous=None, lock_timeout=None, nthreads=None, url_to_url=None, url_to_path=None, copy_from=None)[source]

Bases: object

Rewrite of the Python pathlib.Path class that supports URLs and FileCache.

This class provides a simpler way to abstract away remote access in a FileCache by emulating the Python pathlib.Path class. At the same time, it can collect common parameters (anonymous, lock_timeout, nthreads) into a single location so that they do not have to be specified on every method call.

Parameters:
  • paths (str | Path | FCPath | None)

  • filecache (Optional['FileCache'])

  • anonymous (Optional[bool])

  • lock_timeout (Optional[int])

  • nthreads (Optional[int])

  • url_to_url (Optional[UrlToUrlFuncOrSeqType])

  • url_to_path (Optional[UrlToPathFuncOrSeqType])

  • copy_from (Optional[FCPath])

__init__(*paths, filecache=None, anonymous=None, lock_timeout=None, nthreads=None, url_to_url=None, url_to_path=None, copy_from=None)[source]

Initialization for the FCPath class.

Parameters:
  • paths (str | Path | FCPath | None) – The path(s). These may be absolute or relative paths. They are joined together to form a final path. File operations can only be performed on absolute paths.

  • filecache (Optional['FileCache']) – The FileCache in which to store files retrieved from this path. If not specified, the default global FileCache will be used.

  • anonymous (Optional[bool]) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. If None, use the default setting for the associated FileCache instance.

  • lock_timeout (Optional[int]) – How long to wait, in seconds, if another process is marked as retrieving the file before raising an exception. 0 means to not wait at all. A negative value means to never time out. None means to use the default value for the associated FileCache instance.

  • nthreads (Optional[int]) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value for the associated FileCache instance.

  • url_to_url (Optional[UrlToUrlFuncOrSeqType]) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If None, use the default translators for the associated FileCache instance.

  • url_to_path (Optional[UrlToPathFuncOrSeqType]) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path | None


    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If None, use the default translators for the associated FileCache instance.

  • copy_from (Optional[FCPath]) – An FCPath instance from which to copy internal parameters (filecache, anonymous, lock_timeout, nthreads, url_to_url, and url_to_path). If specified, any values for these parameters in this constructor are ignored. Used internally and should not be used by external programmers.

__str__()[source]

Return str(self).

Return type:

str

property path: str

Return this path as a string.

as_pathlib()[source]

Return this path as a pathlib Path object.

Return type:

Path

as_posix()[source]

Return this FCPath as a POSIX path. This is a str using only forward slashes.

Notes

Because URLs are not really supported in POSIX format, we just return the URL as-is, including any scheme and remote.

Returns:

This path as a POSIX path.

Return type:

str

property drive: str

The drive associated with this FCPath.

Notes

Examples:

For a Windows path: ‘’ or ‘C:’

For a UNC share: ‘//host/share’

For a cloud resource: ‘gs://bucket’

property root: str

The root of this FCPath; ‘/’ if absolute, ‘’ otherwise.

property anchor: str

The anchor of this FCPath, which is drive + root.
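
For the cloud-resource case, the relationship between drive, root, and anchor can be illustrated with the standard library's urlsplit. This is a sketch of the decomposition only; split_anchor is a hypothetical helper and is not how FCPath parses URLs.

```python
from urllib.parse import urlsplit


def split_anchor(url):
    """Return (drive, root, anchor) for a URL such as 'gs://bucket/dir/file'."""
    parts = urlsplit(url)
    drive = f"{parts.scheme}://{parts.netloc}"        # scheme plus remote
    root = "/" if parts.path.startswith("/") else ""  # '/' if absolute
    return drive, root, drive + root                  # anchor = drive + root


drive, root, anchor = split_anchor("gs://bucket/dir/file.txt")
```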

property suffix: str

The final component’s last suffix, if any, including the leading period.

property suffixes: list[str]

A list of the final component’s suffixes, including the leading periods.

property stem: str

The final path component, minus its last suffix.

with_name(name)[source]

Return a new FCPath with the filename changed.

Parameters:

name (str) – The new filename to replace the final path component with.

Returns:

A new FCPath with the final component replaced. The new FCPath will have the same parameters (filecache, etc.) as the source FCPath.

Return type:

FCPath

with_stem(stem)[source]

Return a new FCPath with the stem (the filename minus the suffix) changed.

Parameters:

stem (str) – The new stem.

Returns:

A new FCPath with the final component’s stem replaced. The new FCPath will have the same parameters (filecache, etc.) as the source FCPath.

Return type:

FCPath

with_suffix(suffix)[source]

Return a new FCPath with the file suffix changed.

If the path has no suffix, add the given suffix. If the given suffix is an empty string, remove the suffix from the path.

Parameters:

suffix (str) – The new suffix to use.

Returns:

A new FCPath with the final component’s suffix replaced. The new FCPath will have the same parameters (filecache, etc.) as the source FCPath.

Return type:

FCPath
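
Because FCPath is described as emulating pathlib.Path, the standard library class gives a reasonable picture of how suffix, suffixes, stem, with_name, and with_suffix relate. The sketch below uses pathlib.PurePosixPath on a local path; that FCPath returns the equivalent results for URLs is an assumption.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/data/archive.tar.gz")

last_suffix = p.suffix              # only the last suffix
all_suffixes = p.suffixes           # every suffix, in order
stem = p.stem                       # final component minus the last suffix
renamed = p.with_name("notes.txt")  # final component replaced
zipped = p.with_suffix(".zip")      # last suffix replaced
stripped = p.with_suffix("")        # last suffix removed
added = PurePosixPath("/data/README").with_suffix(".md")  # suffix added
```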

property parts: tuple[str, ...]

An object providing sequence-like access to the components in the path.

joinpath(*pathsegments)[source]

Combine this path with additional paths.

Parameters:

pathsegments (str | Path | FCPath | None) – One or more additional paths to join with this path.

Returns:

A new FCPath that is a combination of this path and the additional paths. The new FCPath will have the same parameters (filecache, etc.) as the source FCPath.

Return type:

FCPath

__truediv__(other)[source]

Combine this path with an additional path.

Parameters:

other (str | Path | FCPath | None) – The path to join with this path.

Returns:

A new FCPath that is a combination of this path and the other path. The new FCPath will have the same parameters (filecache, etc.) as the current FCPath.

Return type:

FCPath

__rtruediv__(other)[source]

Combine an additional path with this path.

Parameters:

other (str | Path | FCPath) – The path to join with this path.

Returns:

A new FCPath that is a combination of the other path and this path. The new FCPath will have the same parameters (filecache, etc.) as the other path if the other path is an FCPath; otherwise it will have the same parameters as the current FCPath.

Return type:

FCPath

splitpath(search_dir)[source]

Split the path into a list of FCPaths at each occurrence of search_dir.

Parameters:

search_dir (str) – The directory to search for.

Returns:

A tuple of FCPaths, each of which is a segment of the path between instances of search_dir, not including the search_dir itself.

Return type:

tuple[FCPath, ...]
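
The splitting rule can be pictured with a small stand-alone function over a local POSIX path. This is a sketch under stated assumptions: split_at_dir is hypothetical, it works on pathlib.PurePosixPath rather than FCPath, and how the real method treats URLs or empty segments is not specified here.

```python
from pathlib import PurePosixPath


def split_at_dir(path, search_dir):
    """Split path into segments at each component equal to search_dir."""
    segments, current = [], []
    for part in PurePosixPath(path).parts:
        if part == search_dir:
            segments.append(current)   # close the segment before search_dir
            current = []
        else:
            current.append(part)
    segments.append(current)           # trailing segment after the last match
    # An empty segment becomes PurePosixPath(), i.e. '.'
    return tuple(PurePosixPath(*seg) for seg in segments)


pieces = split_at_dir("/a/x/b/c/x/d", "x")
```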

__repr__()[source]

Return repr(self).

Return type:

str

__eq__(other)[source]

Return self==value.

Parameters:

other (object)

Return type:

bool

__lt__(other)[source]

Return self<value.

Parameters:

other (object)

Return type:

bool

__le__(other)[source]

Return self<=value.

Parameters:

other (object)

Return type:

bool

__gt__(other)[source]

Return self>value.

Parameters:

other (object)

Return type:

bool

__ge__(other)[source]

Return self>=value.

Parameters:

other (object)

Return type:

bool

property name: str

The final component of the path.

property parent: FCPath

The logical parent of the path.

The new FCPath will have the same parameters (filecache, etc.) as the original path.

property parents: tuple[FCPath, ...]

A sequence of this path’s logical parents.

is_absolute()[source]

True if the path is absolute.

Return type:

bool

as_absolute()[source]

Return the absolute version of this possibly-relative path.

Return type:

FCPath

match(path_pattern)[source]

Return True if this path matches the given pattern.

If the pattern is relative, matching is done from the right; otherwise, the entire path is matched. The recursive wildcard ** is not supported by this method (it just acts like *).

See pathlib.Path.match for full documentation.

Parameters:

path_pattern (str | Path | FCPath)

Return type:

bool
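
Since this method defers to pathlib.Path.match for its semantics, the right-anchored behavior of relative patterns can be illustrated with pathlib.PurePosixPath. The example uses a local path; that FCPath gives the same answers for URLs is an assumption.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/a/b/c.py")

# Relative patterns are matched from the right
right_anchored = p.match("b/*.py")
wrong_parent = p.match("a/*.py")

# Absolute patterns must match the entire path
absolute_full = p.match("/a/b/*.py")
absolute_short = p.match("/a/*.py")
```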

full_match(pattern)[source]

Return True if this path matches the given glob-style pattern.

The pattern is matched against the entire path.

See pathlib.Path.full_match for full documentation.

Parameters:

pattern (str | Path | FCPath)

Return type:

bool

property filecache: FileCache

The FileCache associated with this path.

get_local_path(sub_path=None, *, create_parents=True, url_to_url=None, url_to_path=None)[source]

Return the local path for the given sub_path relative to this path.

Parameters:
  • sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the file relative to this path. If not specified, this path is used. If sub_path is a list or tuple, all paths are processed. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.

  • create_parents (bool) – If True, create all parent directories. This is useful when getting the local path of a file that will be uploaded.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str | None
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If None, use the default translators for the associated FileCache instance.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path | None


    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If None, use the default value given when this FCPath was created.

Returns:

The Path (or list of Paths) of the URL (possibly as mapped by the url_to_url translators) in the cache directory, or as specified by the url_to_path translators. The files do not have to exist because a Path could be used for writing a file to upload. To facilitate this, a side effect of this call (if create_parents is True) is that the complete parent directory structure will be created for each returned Path.

Return type:

Path | list[Path]
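
The url_to_path rules described above can be illustrated with a small standalone sketch. It does not import filecache; flatten_gs_paths is a hypothetical translator, and first_translation is only an assumed stand-in for the documented call order (first non-None result wins, relative results are appended to cache_dir), not the library's actual implementation.

```python
from pathlib import Path
from typing import Optional, Union


def flatten_gs_paths(scheme: str, remote: str, path: str,
                     cache_dir: Path, cache_subdir: str) -> Optional[Union[str, Path]]:
    """Hypothetical translator: store every gs:// file in one flat directory."""
    if scheme != "gs":
        # Returning None defers to the next translator or the default layout.
        return None
    # A relative result is interpreted relative to cache_dir.
    return Path("flat") / remote / path.replace("/", "_")


def first_translation(translators, scheme, remote, path, cache_dir, cache_subdir):
    """Illustrative stand-in for the documented call order (not library code)."""
    for func in translators:
        result = func(scheme, remote, path, cache_dir, cache_subdir)
        if result is not None:
            result = Path(result)
            # Relative paths are appended to cache_dir; absolute ones are used as-is.
            return result if result.is_absolute() else cache_dir / result
    return None  # The caller would fall through to the default translation.


cache_dir = Path("/tmp/_filecache_global")  # assumed cache location for the demo
local = first_translation([flatten_gs_paths], "gs", "bucket", "a/b/c.txt",
                          cache_dir, "gs_bucket")
```

A translator written this way would be passed as url_to_path=flatten_gs_paths (or in a list with others) to the methods documented here.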

exists(sub_path=None, *, bypass_cache=False, nthreads=None, url_to_url=None, url_to_path=None)[source]

Check if a file exists without downloading it.

Parameters:
  • sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the file relative to this path. If not specified, this path is used. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.

  • bypass_cache (bool) – If False, check for the file first in the local cache, and if not found there then on the remote server. If True, only check on the remote server.

  • nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value given when this FCPath was created.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If None, use the default translators for the associated FileCache instance.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If None, use the default value given when this FCPath was created.

Returns:

True if the file exists. Note that it is possible that a file could exist and still not be downloadable due to permissions. False if the file does not exist. This includes bad bucket or webserver names, lack of permission to examine a bucket’s contents, etc.

Return type:

bool | list[bool]
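
As a rough illustration of the url_to_url contract described above, the following standalone sketch defines two hypothetical translators and a helper that mimics the documented ordering (each translator is tried in turn until one returns a URL). The bucket name, the mirror directory, and the assumption that path carries no leading slash are all invented for the example; translate is not part of filecache.

```python
from typing import Optional


def mirror_to_local(scheme: str, remote: str, path: str) -> Optional[str]:
    """Hypothetical translator: redirect one bucket to a local mirror."""
    if scheme == "gs" and remote == "example-bucket":
        return f"file:///data/mirror/{path}"
    return None  # Not our bucket: let the next translator decide.


def http_to_https(scheme: str, remote: str, path: str) -> Optional[str]:
    """Hypothetical translator: upgrade plain HTTP URLs to HTTPS."""
    if scheme == "http":
        return f"https://{remote}/{path}"
    return None


def translate(translators, scheme, remote, path):
    """Illustrative stand-in for the documented call order (not library code)."""
    for func in translators:
        new_url = func(scheme, remote, path)
        if new_url is not None:
            return new_url
    # No translator claimed the URL, so it is left unchanged.
    return f"{scheme}://{remote}/{path}"


chain = [mirror_to_local, http_to_https]
mirrored = translate(chain, "gs", "example-bucket", "dir/f.txt")
upgraded = translate(chain, "http", "host.example", "x.dat")
untouched = translate(chain, "s3", "other", "y")
```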

modification_time(sub_path=None, *, bypass_cache=False, nthreads=None, exception_on_fail=True, url_to_url=None)[source]

Get the modification time of a remote file as a Unix timestamp.

Parameters:
  • sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the file relative to this path. If not specified, this path is used. If sub_path is a list or tuple, all URLs are checked. This may be more efficient because files can be checked in parallel. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.

  • bypass_cache (bool) – If False, retrieve the modification time for the file first from the metadata cache, if enabled, and if not found there then from the remote server. If True, only retrieve the modification time directly from the remote server.

  • nthreads (int | None) – The maximum number of threads to use. If None, use the default value given when this FCPath was created.

  • exception_on_fail (bool) – If True, a FileNotFoundError is raised if any file does not exist. If False, the function returns normally and any failed check is marked with the Exception that caused the failure in place of the returned modification time.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If None, use the default translators for the associated FileCache instance.

Returns:

The modification time as a Unix timestamp if the file exists and the time can be retrieved, None otherwise. If sub_path was a list or tuple, then instead return a list of modification times in order. This always returns the modification time of the file on the remote source, even if there is a local copy. If you want the modification time of the local copy, you can call the normal stat function. If exception_on_fail is False, any modification time may be an Exception if that file does not exist or the modification time cannot be retrieved.

Raises:

FileNotFoundError – If a file does not exist.

Return type:

float | None | Exception | list[float | None | Exception]
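
When exception_on_fail is False and sub_path is a list, the returned list can mix timestamps, None, and Exception objects. The sketch below shows one way a caller might sort such a list into groups. split_mtimes is a hypothetical helper, and the URLs and results are made-up sample data rather than output from a real call.

```python
def split_mtimes(urls, results):
    """Group per-URL results into timestamps, unknown times, and failures."""
    ok, missing, failed = {}, [], {}
    for url, result in zip(urls, results):
        if isinstance(result, Exception):
            failed[url] = result      # the check failed for this URL
        elif result is None:
            missing.append(url)       # no modification time could be retrieved
        else:
            ok[url] = result          # Unix timestamp as a float
    return ok, missing, failed


# Sample data shaped like the documented return value.
urls = ["gs://b/a", "gs://b/b", "gs://b/c"]
results = [1700000000.0, FileNotFoundError("gs://b/b"), None]
ok, missing, failed = split_mtimes(urls, results)
```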

is_dir(sub_path=None, *, nthreads=None, exception_on_fail=True, url_to_url=None)[source]

Check if a path represents a directory.

Parameters:
  • sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the directory relative to this path. If not specified, this path is used. If sub_path is a list or tuple, all paths are checked. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.

  • nthreads (int | None) – The maximum number of threads to use for multiple paths.

  • exception_on_fail (bool) – If True, a FileNotFoundError is raised if any path cannot be checked. If False, the function returns normally and any failed check is marked with the Exception that caused the failure.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If None, use the default translators for the associated FileCache instance.

Returns:

True if the path represents a directory, False otherwise. If sub_path was a list or tuple, then instead return a list of booleans or exceptions in order. If exception_on_fail is False, any result may be an Exception if that path cannot be checked.

Raises:

FileNotFoundError – If a path cannot be checked.

Return type:

bool | Exception | list[bool | Exception]

Notes

Unlike os.path.isdir or pathlib.Path.is_dir, this method raises an exception if the URL does not exist instead of returning False. This is so that remote connection errors are not masked by the return value.
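
For comparison, the standard-library behavior that this note contrasts with can be seen in a short sketch using only pathlib. The directory name no_such_dir is arbitrary.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # An existing directory reports True.
    existing_dir = Path(tmp).is_dir()
    # A path that does not exist silently reports False instead of raising,
    # so "missing" and "not a directory" are indistinguishable.
    missing_dir = (Path(tmp) / "no_such_dir").is_dir()
```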

retrieve(sub_path=None, *, lock_timeout=None, nthreads=None, exception_on_fail=True, url_to_url=None, url_to_path=None)[source]

Retrieve file(s) from the given sub_path and store them in the file cache.

Parameters:
  • sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the file relative to this path. If not specified, this path is used. If sub_path is a list or tuple, the complete list of files is retrieved. Depending on the storage location, this may be more efficient because files can be downloaded in parallel. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.

  • nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value given when this FCPath was created.

  • lock_timeout (int | None) – How long to wait, in seconds, if another process is marked as retrieving the file before raising an exception. 0 means to not wait at all. A negative value means to never time out. None means to use the default value given when this FCPath was created.

  • exception_on_fail (bool) – If True, a FileNotFoundError is raised if any file does not exist or a download fails, and a TimeoutError is raised if any attempt to acquire a lock or wait for another process to download a file fails. If False, the function returns normally and any failed download is marked with the Exception that caused the failure in place of the returned Path.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If None, use the default translators for the associated FileCache instance.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If None, use the default value given when this FCPath was created.

Returns:

The Path of the filename in the temporary directory (or the original absolute path if local). If sub_path was a list or tuple of paths, then instead return a list of Paths of the filenames in the temporary directory (or the original absolute path if local). If exception_on_fail is False, any Path may be an Exception if that file does not exist or the download failed or a timeout occurred.

Raises:
  • FileNotFoundError – If a file does not exist or could not be downloaded, and exception_on_fail is True.

  • TimeoutError – If we could not acquire the lock to allow downloading of a file within the given timeout or, for a multi-file download, if we timed out waiting for other processes to download locked files, and exception_on_fail is True.

Return type:

Path | Exception | list[Path | Exception]

Notes

File download is normally an atomic operation; a program will never see a partially-downloaded file, and if a download is interrupted there will be no file present. However, when downloading multiple files at the same time, as many files as possible are downloaded before an exception is raised.
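
The atomicity property described above is commonly obtained by writing to a temporary file in the destination directory and then renaming it into place. The sketch below shows that general pattern with the standard library only. It is an assumption about how such a guarantee can be achieved, not a description of FileCache internals, and atomic_write is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(dest: Path, data: bytes) -> None:
    """Write data so that dest is either absent or complete, never partial."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Create the temporary file next to dest so the final rename stays
    # on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name + ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_name, dest)  # atomic rename into place
    except BaseException:
        # On any interruption, remove the partial temporary file.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sub" / "file.bin"
    atomic_write(target, b"payload")
    contents = target.read_bytes()
    leftovers = [p.name for p in target.parent.iterdir() if p != target]
```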

upload(sub_path=None, *, nthreads=None, exception_on_fail=True, url_to_url=None, url_to_path=None)[source]

Upload file(s) from the file cache to the storage location(s).

Parameters:
  • sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the file relative to this path. If not specified, this path is used. If sub_path is a list or tuple, the complete list of files is uploaded. This may be more efficient because files can be uploaded in parallel. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.

  • nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value given when this FileCache was created.

  • exception_on_fail (bool) – If True, if any file does not exist or upload fails an exception is raised. If False, the function returns normally and any failed upload is marked with the Exception that caused the failure in place of the returned path.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If None, use the default translators for the associated FileCache instance.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If None, use the default value given when this FCPath was created.

Returns:

The Path of the filename in the temporary directory (or the original absolute path if local). If sub_path was a list or tuple of paths, then instead return a list of Paths of the filenames in the temporary directory (or the original absolute path if local). If exception_on_fail is False, any Path may be an Exception if that file does not exist or the upload failed.

Raises:

FileNotFoundError – If a file to upload does not exist or the upload failed, and exception_on_fail is True.

Return type:

Path | Exception | list[Path | Exception]

open(mode='r', *args, url_to_url=None, url_to_path=None, **kwargs)[source]

Retrieve+open or open+upload a file as a context manager.

If mode is a read mode (like 'r' or 'rb') then the file will be first retrieved by calling retrieve() and then opened. If the mode is a write mode (like 'w' or 'wb') then the file will be first opened for write, and when this context manager is exited the file will be uploaded.

Parameters:
  • mode (str) – The mode string as you would specify to Python’s open() function.

  • *args (Any) – Any additional positional arguments are passed to the Python open() function.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If None, use the default translators for the associated FileCache instance.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If None, use the default value given when this FCPath was created.

  • **kwargs (Any) – Any additional arguments are passed to the Python open() function.

Returns:

The same object as would be returned by the normal open() function.

Return type:

IO object
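
The write-mode behavior (open first, upload when the context manager exits) can be sketched with contextlib. This is only a model of the documented sequence: open_then_upload and the upload callback are hypothetical, and the "upload" here just records the path in a list.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def open_then_upload(local_path, upload, mode="w", **kwargs):
    """Open a local file for writing and call upload() after a clean exit."""
    with open(local_path, mode, **kwargs) as f:
        yield f
    # Reached only if the body finished without raising; the file is
    # already closed at this point.
    upload(local_path)


uploaded = []  # stands in for a remote store
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "out.txt"
    with open_then_upload(path, uploaded.append, "w", encoding="utf-8") as f:
        f.write("hello")
    written = path.read_text(encoding="utf-8")
```

In this sketch an exception raised inside the with body skips the upload step; whether FileCache behaves the same way on errors is not stated in this section.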

Remove this file or link.

Parameters:
  • sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the file relative to this path. If not specified, this path is used. If sub_path is a list or tuple, the complete list of files is removed. Depending on the storage location, this may be more efficient because files can be removed in parallel. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.

  • missing_ok (bool) – True to ignore attempting to unlink a file that doesn’t exist.

  • nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value given when this FileCache was created.

  • exception_on_fail (bool) – If True, an exception is raised if any removal fails. If False, the function returns normally and any failed removal is marked with the Exception that caused the failure in place of the returned path.

  • url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:

    func(scheme: str, remote: str, path: str) -> str
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.

    If None, use the default translators for the associated FileCache instance.

  • url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –

    The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:

    func(scheme: str, remote: str, path: str, cache_dir: Path,
         cache_subdir: str) -> str | Path
    

    where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.

    If None, use the default value given when this FCPath was created.

Return type:

str | Exception | list[str | Exception]

See pathlib.Path.unlink for full documentation.

property download_counter: int

The number of actual file downloads that have taken place.

property upload_counter: int

The number of actual file uploads that have taken place.

is_local()[source]

True if the path refers to the local filesystem.

Return type:

bool

is_file()[source]

True if this path exists and is a regular file.

Return type:

bool

read_bytes(**kwargs)[source]

Download and open the file in bytes mode, read it, and close the file.

Any additional arguments are passed to the Python open() function.

Parameters:

kwargs (Any)

Return type:

bytearray

read_text(**kwargs)[source]

Download and open the file in text mode, read it, and close the file.

Any additional arguments are passed to the Python open() function.

Parameters:

kwargs (Any)

Return type:

str

write_bytes(data, **kwargs)[source]

Open the file in bytes mode, write to it, and close and upload the file.

Any additional arguments are passed to the Python open() function.

Parameters:
  • data (Any)

  • kwargs (Any)

Return type:

int

write_text(data, **kwargs)[source]

Open the file in text mode, write to it, and close and upload the file.

Any additional arguments are passed to the Python open() function.

Parameters:
  • data (Any)

  • kwargs (Any)

Return type:

int

iterdir()[source]

Yield FCPath objects of the current path’s directory contents.

The children are yielded in arbitrary order, and the special entries ‘.’ and ‘..’ are not included.

Return type:

Iterator[FCPath]

iterdir_metadata()[source]

Yield FCPath objects of the current directory’s contents, with metadata.

Yields:

All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:

  • is_dir: True if the returned name is a directory, False if it is a file.

  • mtime: The last modification time of the file as a float.

  • size: The approximate size of the file in bytes.

If the metadata cannot be retrieved, None is returned for the metadata.

Return type:

Iterator[tuple[FCPath, dict[str, Any] | None]]
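
A consumer of these (path, metadata) tuples needs to handle the None case explicitly. The sketch below totals file sizes over sample data shaped like the documented output. total_file_size is a hypothetical helper, and plain strings stand in for FCPath objects so that the example runs without filecache.

```python
def total_file_size(entries):
    """Sum the sizes of files, collecting entries whose metadata is missing."""
    total = 0
    unknown = []
    for path, metadata in entries:
        if metadata is None:
            unknown.append(path)        # metadata could not be retrieved
        elif not metadata["is_dir"]:
            total += metadata["size"]   # count files only, skip directories
    return total, unknown


# Made-up entries in the documented (path, metadata) form.
sample = [
    ("data/a.txt", {"is_dir": False, "mtime": 1700000000.0, "size": 120}),
    ("data/sub", {"is_dir": True, "mtime": 1700000100.0, "size": 0}),
    ("data/b.txt", None),
    ("data/c.txt", {"is_dir": False, "mtime": 1700000200.0, "size": 30}),
]
total, unknown = total_file_size(sample)
```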

glob(pattern)[source]

Yield all existing files and directories matching the given relative pattern.

Notes

If the FCPath is local, then the normal pathlib.Path.glob() method is called. If the pattern is only **, this function had different behavior before Python 3.13 (only directories returned) and in Python 3.13 and later (both files and directories are returned). In contrast, when the FCPath is remote, we always return all files and directories. To be safe, do not use ** but instead always use **/*.

Parameters:

pattern (str | Path | FCPath)

Return type:

Generator[FCPath]
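
The recommended **/* pattern can be tried against a local directory with plain pathlib, which is what a local FCPath delegates to according to the note above. The file and directory names are arbitrary.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "top.txt").write_text("x")
    (root / "sub" / "inner.txt").write_text("y")
    # "**/*" matches both files and directories at every depth,
    # regardless of Python version.
    found = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*"))
```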

rglob(pattern)[source]

Yield all existing files and directories matching the given relative pattern.

This is like calling FCPath.glob() with **/ added in front of the pattern.

Notes

If the FCPath is local, then the normal pathlib.Path.glob() method is called. If the pattern is only **, this function had different behavior before Python 3.13 (only directories returned) and in Python 3.13 and later (both files and directories are returned). In contrast, when the FCPath is remote, we always return all files and directories. To be safe, do not use ** but instead always use **/*.

Parameters:

pattern (str | Path | FCPath)

Return type:

Generator[FCPath]

walk(top_down=True)[source]

Walk the directory tree from this directory.

See pathlib.Path.walk for full documentation.

Parameters:

top_down (bool)

Return type:

Iterator[tuple[FCPath, list[str], list[str]]]

rename(target)[source]

Rename this path to the target path.

Both the source and target paths must be absolute, and must be in the same location (e.g. both local files or both in the same GS bucket). Because cloud platforms do not support renaming of files, this is accomplished by downloading the source file, uploading it with the new name, and deleting the original version. If the target already exists, it will be overwritten. If the downloading or uploading fails, the copy in the local cache is removed to eliminate ambiguity. If there is only a copy in the local cache and the source path does not exist on the remote, the rename will still succeed by uploading a copy to the target path.

Parameters:

target (str | Path | FCPath) – The path to rename to.

Returns:

The new FCPath instance pointing to the target path.

Return type:

FCPath
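
The download, upload, delete sequence described above can be modeled with an in-memory dictionary standing in for a bucket. This is a simplified sketch of the documented behavior, not library code: rename_by_copy is hypothetical, and it ignores the local cache entirely.

```python
def rename_by_copy(store: dict, source: str, target: str) -> str:
    """Rename within a store that has no native rename operation."""
    if source not in store:
        raise FileNotFoundError(source)
    data = store[source]      # "download" the source object
    store[target] = data      # "upload" under the new name, overwriting
    if source != target:
        del store[source]     # delete the original
    return target


bucket = {"gs://b/old.txt": b"v1", "gs://b/new.txt": b"stale"}
new_name = rename_by_copy(bucket, "gs://b/old.txt", "gs://b/new.txt")
```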

replace(target)[source]

Rename this path to the target path, overwriting if that path exists.

Both the source and target paths must be absolute, and must be in the same location (e.g. both local files or both in the same GS bucket). Because cloud platforms do not support renaming of files, this is accomplished by downloading the source file, uploading it with the new name, and deleting the original version. If the target already exists, it will be overwritten.

Parameters:

target (str | FCPath) – The path to rename to.

Returns:

The new FCPath instance pointing to the target path.

Return type:

FCPath

relative_to(other, *, walk_up=False)[source]

Return the relative path to another path.

See pathlib.Path.relative_to for full documentation.

Parameters:
  • other (str | Path | FCPath)

  • walk_up (bool)

Return type:

FCPath

is_relative_to(other)[source]

Return True if the path is relative to another path.

See pathlib.Path.is_relative_to for full documentation.

Parameters:

other (str | Path | FCPath)

Return type:

bool

is_reserved()[source]

True if the path contains a special reserved name.

See pathlib.Path.is_reserved for full documentation.

Return type:

bool

stat(*, follow_symlinks=True)[source]

Return the result of the stat() system call on this path.

Only valid for local files. See pathlib.Path.stat for full documentation.

Parameters:

follow_symlinks (bool)

Return type:

Any

lstat()[source]

Like stat(), except if the path points to a symlink, the symlink’s status information is returned, rather than its target’s.

Only valid for local files. See pathlib.Path.lstat for full documentation.

Return type:

Any

is_mount()[source]

Check if this path is a mount point.

Only valid for local directories. See pathlib.Path.is_mount for full documentation.

Return type:

bool

is_symlink()[source]

Whether this path is a symbolic link.

Only valid for local files. See pathlib.Path.is_symlink for full documentation.

Return type:

bool

is_junction()[source]

Whether this path is a junction.

Only valid for local files. See pathlib.Path.is_junction for full documentation.

Return type:

bool

is_block_device()[source]

Whether this path is a block device.

Only valid for local files. See pathlib.Path.is_block_device for full documentation.

Return type:

bool

is_char_device()[source]

Whether this path is a character device.

Only valid for local files. See pathlib.Path.is_char_device for full documentation.

Return type:

bool

is_fifo()[source]

Whether this path is a FIFO.

Only valid for local files. See pathlib.Path.is_fifo for full documentation.

Return type:

bool

is_socket()[source]

Whether this path is a socket.

Only valid for local files. See pathlib.Path.is_socket for full documentation.

Return type:

bool

samefile(other_path)[source]

True if this path and the given path refer to the same file.

Unlike the pathlib.Path.samefile version, this function only checks whether the URLs are identical. Thus symlinks, hardlinks, etc. are ignored.

Parameters:

other_path (str | Path | FCPath)

Return type:

bool
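
To illustrate the difference, the sketch below compares a hard-linked pair of local files in two ways. The helper same_url is a hypothetical stand-in that compares only the textual form of two paths; the real method may normalize paths before comparing, and the example assumes a filesystem that supports hard links.

```python
import os
import tempfile


def same_url(a, b):
    """Hypothetical stand-in: compare only the textual form of two paths."""
    return str(a) == str(b)


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "data.txt")
    alias = os.path.join(tmp, "alias.txt")
    with open(original, "w") as f:
        f.write("x")
    os.link(original, alias)                      # hard link to the same file
    by_inode = os.path.samefile(original, alias)  # compares device and inode
    by_text = same_url(original, alias)           # compares strings only
```

The inode-based check reports the two names as the same file, while the text-based check does not.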

absolute()[source]

Return an absolute version of this path.

For non-local paths, this just returns the URL. For local paths, it does the same operations as pathlib.Path.absolute. See pathlib.Path.absolute for full documentation.

Return type:

FCPath

classmethod cwd()[source]

Return a new FCPath pointing to the current working directory.

See pathlib.Path.cwd for full documentation.

Return type:

FCPath

expanduser()[source]

Return a new FCPath with expanded ~ and ~user constructs.

See pathlib.Path.expanduser for full documentation.

Return type:

FCPath

expandvars()[source]

Return a new FCPath with expanded environment variables.

See os.path.expandvars for full documentation.

Return type:

FCPath

classmethod home()[source]

Return a new FCPath pointing to expanduser(‘~’).

See pathlib.Path.home for full documentation.

Return type:

FCPath

readlink()[source]

Return the FCPath to which the symbolic link points.

Only valid for local files. See pathlib.Path.readlink for full documentation.

Return type:

FCPath

resolve(strict=False)[source]

Return the absolute path with resolved symlinks.

See pathlib.Path.resolve for full documentation.

Parameters:

strict (bool)

Return type:

FCPath

symlink_to(target, target_is_directory=False)[source]

Make this path a symlink pointing to the target path.

Only valid for local files. See pathlib.Path.symlink_to for full documentation.

Parameters:
  • target (str)

  • target_is_directory (bool)

Return type:

None

hardlink_to(target)[source]

Make this path a hard link pointing to the same file as target.

Only valid for local files. See pathlib.Path.hardlink_to for full documentation.

Parameters:

target (str)

Return type:

None

touch(mode=438, exist_ok=True)[source]

Create this file, if it doesn’t exist.

See pathlib.Path.touch for full documentation.

Parameters:
  • mode (int)

  • exist_ok (bool)

Return type:

None

mkdir(mode=511, parents=False, exist_ok=False)[source]

Create a new directory at this given path.

Only valid for local directories. See pathlib.Path.mkdir for full documentation.

Parameters:
  • mode (int)

  • parents (bool)

  • exist_ok (bool)

Return type:

None
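
The decimal defaults shown in the touch and mkdir signatures (438 and 511) are the usual octal permission values rendered in base 10. A quick check with the standard library makes this visible; the variable names are only for illustration. As with pathlib, the effective permissions are normally further restricted by the process umask.

```python
import stat

touch_default = 438
mkdir_default = 511

# The same values written the way permissions are usually read.
touch_octal = oct(touch_default)
mkdir_octal = oct(mkdir_default)

# stat.filemode expects a full st_mode, so add the file-type bits first.
touch_string = stat.filemode(stat.S_IFREG | touch_default)
mkdir_string = stat.filemode(stat.S_IFDIR | mkdir_default)
```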

chmod(mode, *, follow_symlinks=True)[source]

Change the permissions of the path, like os.chmod().

Only valid for local files. See pathlib.Path.chmod for full documentation.

Parameters:
  • mode (int)

  • follow_symlinks (bool)

Return type:

None

lchmod(mode)[source]

Like chmod(), except if the path points to a symlink, the symlink’s permissions are changed, rather than its target’s.

Only valid for local files. See pathlib.Path.lchmod for full documentation.

Parameters:

mode (int)

Return type:

None

rmdir()[source]

Remove this directory. The directory must be empty.

Only valid for local directories. See pathlib.Path.rmdir for full documentation.

Return type:

None

owner()[source]

Return the login name of the file owner.

Only valid for local files. See pathlib.Path.owner for full documentation.

Return type:

str

group()[source]

Return the group name of the file gid.

Only valid for local files. See pathlib.Path.group for full documentation.

Return type:

str

classmethod from_uri(uri)[source]

Return a new FCPath from the given URI.

Parameters:

uri (str)

Return type:

FCPath

as_uri()[source]

Return the path as a URI.

Return type:

str

class filecache.file_cache_source.FileCacheSource(scheme, remote, *, anonymous=False)[source]

Bases: ABC

Superclass for all remote file source classes. Do not use directly.

The FileCacheSource subclasses (FileCacheSourceFile, FileCacheSourceHTTP, FileCacheSourceGS, and FileCacheSourceS3) provide direct access to local and remote sources, bypassing the caching mechanism of FileCache.

Parameters:
  • scheme (str)

  • remote (str)

  • anonymous (bool)

__init__(scheme, remote, *, anonymous=False)[source]

Initialization for the FileCacheSource superclass.

Note

Do not instantiate FileCacheSource directly. Instead use one of the subclasses (FileCacheSourceFile, FileCacheSourceHTTP, FileCacheSourceGS, and FileCacheSourceS3).

Parameters:
  • scheme (str) – The scheme of the source, such as "gs" or "file".

  • remote (str) – The bucket or remote server name. Must be an empty string for file.

  • anonymous (bool) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.
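
The scheme and remote arguments correspond to the first two components of a URL. The sketch below uses urllib.parse to show the split; split_source_url is a hypothetical helper, and how filecache itself parses URLs is not described in this reference.

```python
from urllib.parse import urlsplit


def split_source_url(url):
    """Return (scheme, remote, path) for a URL such as gs://bucket/dir/f.txt."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


gs_parts = split_source_url("gs://my-bucket/dir/file.txt")
file_parts = split_source_url("file:///tmp/file.txt")
```

For the file URL the remote component is the empty string, which matches the requirement stated for the remote parameter.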

__repr__()[source]

Return repr(self).

Return type:

str

__str__()[source]

Return str(self).

Return type:

str

abstractmethod classmethod schemes()[source]

The URL schemes supported by this class.

Return type:

tuple[str, …]

classmethod primary_scheme()[source]

The primary URL scheme supported by this class.

Return type:

str

abstractmethod classmethod uses_anonymous()[source]

Whether this class has the concept of anonymous accesses.

Return type:

bool

abstractmethod exists(sub_path)[source]
Parameters:

sub_path (str)

Return type:

bool

exists_multi(sub_paths, *, nthreads=8)[source]

Check if multiple files exist using threads without downloading them.

Parameters:
  • sub_paths (Sequence[str]) – The path of the files relative to the source prefix to check for existence.

  • nthreads (int) – The maximum number of threads to use.

Returns:

For each entry, True if the file exists and False if it does not. Note that a file could exist and still not be accessible due to permissions. False is also returned for bad bucket or webserver names, lack of permission to examine a bucket’s contents, etc.

Return type:

list[bool]

abstractmethod modification_time(sub_path)[source]
Parameters:

sub_path (str)

Return type:

float | None

modification_time_multi(sub_paths, *, nthreads=8)[source]

Get the modification time of multiple files as a Unix timestamp.

Parameters:
  • sub_paths (Sequence[str]) – The path of the files relative to the source prefix to get the modification time of.

  • nthreads (int) – The maximum number of threads to use.

Returns:

For each entry, the modification time as a Unix timestamp if the file exists and the time can be retrieved, or None if the file exists but no modification time is available. An entry may instead be an Exception if that file does not exist or the modification time cannot be retrieved.

Return type:

list[float | None | Exception]
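
Because the returned list can mix floats, None, and Exception instances, callers generally need to branch on the type of each entry. The sketch below sorts a hand-written result list into three groups; summarize_mtimes is a hypothetical helper and the sample data is invented for illustration.

```python
def summarize_mtimes(sub_paths, results):
    """Sort results shaped like the list described above into three groups."""
    known, unknown, failed = {}, [], {}
    for sub_path, result in zip(sub_paths, results):
        if isinstance(result, Exception):
            failed[sub_path] = result      # lookup failed for this file
        elif result is None:
            unknown.append(sub_path)       # no modification time available
        else:
            known[sub_path] = result       # Unix timestamp
    return known, unknown, failed


paths = ["a.txt", "b.txt", "c.txt"]
results = [1700000000.0, None, FileNotFoundError("c.txt")]
known, unknown, failed = summarize_mtimes(paths, results)
```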

abstractmethod is_dir(sub_path)[source]
Parameters:

sub_path (str)

Return type:

bool

is_dir_multi(sub_paths, *, nthreads=8)[source]

Check if multiple paths represent directories using threads.

Parameters:
  • sub_paths (Sequence[str]) – The paths relative to the source prefix to check for directory status.

  • nthreads (int) – The maximum number of threads to use.

Returns:

For each entry, True if the path represents a directory, False otherwise, or an Exception if the check failed.

Return type:

list[bool | Exception]

abstractmethod retrieve(sub_path, local_path, *, preserve_mtime=False)[source]
Parameters:
  • sub_path (str)

  • local_path (str | Path)

  • preserve_mtime (bool)

Return type:

Path

retrieve_multi(sub_paths, local_paths, *, preserve_mtime=False, nthreads=8)[source]

Retrieve multiple files from the storage location using threads.

Parameters:
  • sub_paths (Sequence[str]) – The path of the files to retrieve relative to the source prefix.

  • local_paths (Sequence[str | Path]) – The paths to the destinations where the downloaded files will be stored.

  • preserve_mtime (bool) – If True, the modification time of the remote files will be copied to the local files.

  • nthreads (int) – The maximum number of threads to use.

Returns:

A list containing the local paths of the retrieved files. If a file failed to download, the entry is the Exception that caused the failure. This list is in the same order and has the same length as local_paths.

Return type:

list[Path | BaseException]

Notes

All parent directories in all local_paths are created even if a file download fails.

The download of each file is an atomic operation. However, even if some files have download failures, all other files will be downloaded.
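
The behavior described here (results in input order, exceptions returned in place rather than raised, and other downloads unaffected by one failure) matches a common thread-pool pattern. The sketch below shows that pattern with the standard library; fetch_one is a hypothetical stand-in for a single-file download, and this is not presented as the library's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fetch_one(sub_path, local_path):
    """Hypothetical single-file download; fails for names containing 'missing'."""
    if "missing" in sub_path:
        raise FileNotFoundError(sub_path)
    return Path(local_path)


def fetch_many(sub_paths, local_paths, nthreads=8):
    """Run fetch_one in threads; return a Path or the raised exception per entry."""
    def safe_fetch(pair):
        try:
            return fetch_one(*pair)
        except Exception as exc:       # capture instead of aborting the batch
            return exc

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # map() yields results in input order regardless of completion order.
        return list(pool.map(safe_fetch, zip(sub_paths, local_paths)))


outcome = fetch_many(
    ["a.bin", "missing.bin", "c.bin"],
    ["/tmp/a.bin", "/tmp/missing.bin", "/tmp/c.bin"],
)
```

A caller can then use isinstance(entry, BaseException) on each element to separate successes from failures.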

abstractmethod upload(sub_path, local_path, *, preserve_mtime=False)[source]
Parameters:
  • sub_path (str)

  • local_path (str | Path)

  • preserve_mtime (bool)

Return type:

Path

upload_multi(sub_paths, local_paths, *, preserve_mtime=False, nthreads=8)[source]

Upload multiple files to a storage location.

Parameters:
  • sub_paths (Sequence[str]) – The path of the destination files relative to the source prefix.

  • local_paths (Sequence[str | Path]) – The paths of the files to upload.

  • preserve_mtime (bool) – If True, the modification time of the local files will be copied to the remote files.

  • nthreads (int) – The maximum number of threads to use.

Returns:

A list containing the local paths of the uploaded files. If a file failed to upload, the entry is the Exception that caused the failure. This list is in the same order and has the same length as local_paths.

Return type:

list[Path | BaseException]

abstractmethod iterdir_metadata(sub_path)[source]

Iterate over the contents of a directory.

Parameters:

sub_path (str) – The path of the directory relative to the source prefix.

Yields:

All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:

  • is_dir: True if the returned name is a directory, False if it is a file.

  • mtime: The last modification time of the file as a float.

  • size: The approximate size of the file in bytes.

If the metadata cannot be retrieved, None is returned for the metadata.

Return type:

Iterator[tuple[str, dict[str, Any] | None]]
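
For a local directory, tuples of the documented shape could be produced with os.scandir. The sketch below is an illustration under that assumption; local_iterdir_metadata is a hypothetical function that yields full local paths, and it is not the code used by any FileCacheSource subclass.

```python
import os
import tempfile


def local_iterdir_metadata(directory):
    """Yield (path, metadata) pairs shaped like the tuples described above."""
    with os.scandir(directory) as entries:
        for entry in entries:              # scandir never yields . or ..
            try:
                info = entry.stat()
                metadata = {
                    "is_dir": entry.is_dir(),
                    "mtime": info.st_mtime,
                    "size": info.st_size,
                }
            except OSError:
                metadata = None            # metadata could not be retrieved
            yield entry.path, metadata


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "file.txt"), "w") as f:
        f.write("hello")
    listing = {os.path.basename(p): m for p, m in local_iterdir_metadata(tmp)}
```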

Remove the given object.

Parameters:
  • sub_path (str) – The path of the file relative to the source prefix to delete.

  • missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.

Returns:

The sub_path.

Raises:

FileNotFoundError – If the file doesn’t exist and missing_ok is False.

Return type:

str

Unlink multiple files in a storage location.

Parameters:
  • sub_paths (Sequence[str]) – The paths of the files to delete relative to the source prefix.

  • missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.

  • nthreads (int) – The maximum number of threads to use.

Returns:

A list containing the paths of the unlinked files. If a file failed to unlink, the entry is the Exception that caused the failure. This list is in the same order and has the same length as sub_paths.

Return type:

list[str | BaseException]

class filecache.file_cache_source.FileCacheSourceFile(scheme, remote, *, anonymous=False)[source]

Bases: FileCacheSource

Class that provides direct access to local files.

This class is unlikely to be directly useful to an external program, as it provides essentially no functionality on top of the standard Python filesystem functions.

Parameters:
  • scheme (str)

  • remote (str)

  • anonymous (bool)

__init__(scheme, remote, *, anonymous=False)[source]

Initialization for the FileCacheSourceFile class.

Parameters:
  • scheme (str) – The scheme of the source. Must be "file" or "".

  • remote (str) – The remote server name. Must be "" since UNC shares are not supported.

  • anonymous (bool) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. Not used for this class.

classmethod schemes()[source]

The URL schemes supported by this class.

Return type:

tuple[str, …]

classmethod uses_anonymous()[source]

Whether this class has the concept of anonymous accesses.

Return type:

bool

exists(sub_path)[source]

Check if a file exists without downloading it.

Parameters:

sub_path (str | Path) – The absolute path of the local file.

Returns:

True if the file exists. Note that it is possible that a file could exist and still not be accessible due to permissions.

Return type:

bool

modification_time(sub_path)[source]

Get the modification time of a file as a Unix timestamp.

Parameters:

sub_path (str)

Return type:

float | None

is_dir(sub_path)[source]

Check if a file is a directory.

Parameters:

sub_path (str)

Return type:

bool

retrieve(sub_path, local_path, *, preserve_mtime=False)[source]

Retrieve a file from the storage location.

Parameters:
  • sub_path (str | Path) – The absolute path of the local file to retrieve.

  • local_path (str | Path) – The path to the destination where the file will be stored. Must be the same as sub_path.

  • preserve_mtime (bool) – If True, the modification time of the remote file will be copied to the local file. Not used for local files.

Returns:

The Path of the filename, which is the same as the sub_path parameter.

Raises:
  • ValueError – If sub_path and local_path are not identical.

  • FileNotFoundError – If the file does not exist.

Return type:

Path

Notes

This method essentially does nothing except check for the existence of the file.
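
The checks described for this method can be sketched in a few lines. This is an illustration of the documented contract only; retrieve_local is a hypothetical function, and the exact comparison and existence test used by the library are assumptions.

```python
import tempfile
from pathlib import Path


def retrieve_local(sub_path, local_path):
    """Hypothetical sketch of the documented checks for a local 'retrieve'."""
    sub_path, local_path = Path(sub_path), Path(local_path)
    if sub_path != local_path:
        raise ValueError(f"paths differ: {sub_path} and {local_path}")
    if not sub_path.exists():
        raise FileNotFoundError(str(sub_path))
    return sub_path        # nothing is copied; the file is already local


# An existing file is returned unchanged.
with tempfile.NamedTemporaryFile() as tmp:
    found = retrieve_local(tmp.name, tmp.name) == Path(tmp.name)

# Mismatched paths are rejected before any filesystem access.
mismatch = None
try:
    retrieve_local("/data/a.txt", "/data/b.txt")
except ValueError as exc:
    mismatch = exc
```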

upload(sub_path, local_path, *, preserve_mtime=False)[source]

Upload a file from the local filesystem to the storage location.

Parameters:
  • sub_path (str | Path) – The absolute path of the destination.

  • local_path (str | Path) – The absolute path of the local file to upload. Must be the same as sub_path.

  • preserve_mtime (bool) – If True, the modification time of the local file will be copied to the remote file. Not used for local files.

Returns:

The Path of the filename, which is the same as the local_path parameter.

Raises:
  • ValueError – If sub_path and local_path are not identical.

  • FileNotFoundError – If the file does not exist.

Return type:

Path

iterdir_metadata(sub_path)[source]

Iterate over the contents of a directory.

Parameters:

sub_path (str) – The absolute path of the directory.

Yields:

All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:

  • is_dir: True if the returned name is a directory, False if it is a file.

  • mtime: The last modification time of the file as a float.

  • size: The approximate size of the file in bytes.

If the metadata cannot be retrieved, None is returned for the metadata.

Return type:

Iterator[tuple[str, dict[str, Any] | None]]

Remove the given object.

Parameters:
  • sub_path (str) – The path of the file.

  • missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.

Returns:

The sub_path.

Raises:

FileNotFoundError – If the file doesn’t exist and missing_ok is False.

Return type:

str

class filecache.file_cache_source.FileCacheSourceHTTP(scheme, remote, *, anonymous=False)[source]

Bases: FileCacheSource

Class that provides access to files stored on a webserver.

Parameters:
  • scheme (str)

  • remote (str)

  • anonymous (bool)

__init__(scheme, remote, *, anonymous=False)[source]

Initialization for the FileCacheSourceHTTP class.

Parameters:
  • scheme (str) – The scheme of the source. Must be "http" or "https".

  • remote (str) – The remote server name.

  • anonymous (bool) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. Not used for this class.

classmethod schemes()[source]

The URL schemes supported by this class.

Return type:

tuple[str, …]

classmethod uses_anonymous()[source]

Whether this class has the concept of anonymous accesses.

Return type:

bool

exists(sub_path)[source]

Check if a file exists without downloading it.

Parameters:

sub_path (str) – The path of the file on the webserver relative to the source prefix.

Returns:

True if the file (including the webserver) exists. Note that it is possible that a file could exist and still not be downloadable due to permissions.

Return type:

bool

modification_time(sub_path)[source]

Get the modification time of a file as a Unix timestamp.

Parameters:

sub_path (str) – The path of the file on the webserver relative to the source prefix.

Returns:

The modification time as a float (Unix timestamp) if the file exists and the time can be retrieved from HTTP headers. If the file exists but no modification time is available, None is returned. If the file does not exist or other errors occur, an Exception is raised.

Return type:

float | None

Notes

This method uses an HTTP HEAD request to get file metadata without downloading the content. It checks for the Last-Modified header and converts it to a Unix timestamp.
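
The header-to-timestamp conversion can be done with the standard library alone. The sketch below shows one way to do it, assuming the header value is already in hand; last_modified_to_timestamp is a hypothetical helper, and the library may perform the conversion differently.

```python
from email.utils import parsedate_to_datetime


def last_modified_to_timestamp(header_value):
    """Convert an HTTP Last-Modified header to a Unix timestamp, or None if absent."""
    if header_value is None:
        return None
    # HTTP dates are expressed in GMT, so the parsed datetime is timezone-aware.
    return parsedate_to_datetime(header_value).timestamp()


ts = last_modified_to_timestamp("Wed, 21 Oct 2015 07:28:00 GMT")
missing = last_modified_to_timestamp(None)
```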

is_dir(sub_path)[source]

Check if a file is a directory.

Parameters:

sub_path (str) – The path of the directory on the webserver relative to the source prefix.

Returns:

True if the path represents a directory, False otherwise.

Return type:

bool

Notes

This method assumes the provided URL is either a file or a directory that will show up as a “fancy index” page. If you provide a URL that is a directory but does not support fancy indexing, it will incorrectly return False. It is also possible to fool this method by giving it a file URL that appears to be a fancy index.

retrieve(sub_path, local_path, *, preserve_mtime=False)[source]

Retrieve a file from a webserver.

Parameters:
  • sub_path (str) – The path of the file to retrieve relative to the source prefix.

  • local_path (str | Path) – The path to the destination where the downloaded file will be stored.

  • preserve_mtime (bool) – If True, the modification time of the remote file will be copied to the local file.

Returns:

The Path where the file was stored (same as local_path).

Raises:

FileNotFoundError – If the remote file does not exist or the download fails for another reason.

Return type:

Path

Notes

All parent directories in local_path are created even if the file download fails.

The download is an atomic operation.
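
One common way to make a download atomic is to write to a temporary file in the destination directory and then move it into place with os.replace. Whether filecache uses exactly this technique is not stated in this reference; the sketch below simply shows the pattern, with atomic_write as a hypothetical helper that takes the data as bytes instead of fetching it.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(local_path, data):
    """Write data so that local_path never holds a partially written file."""
    local_path = Path(local_path)
    # Parent directories are created up front, so they remain after a failure.
    local_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=local_path.parent)
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
        os.replace(tmp_name, local_path)   # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)            # never leave a partial file behind
        raise
    return local_path


with tempfile.TemporaryDirectory() as root:
    target = atomic_write(Path(root) / "a" / "b" / "file.bin", b"abc")
    written = target.read_bytes()
    leftovers = sorted(p.name for p in target.parent.iterdir())
```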

upload(sub_path, local_path, *, preserve_mtime=False)[source]

Upload a local file to a webserver. Not implemented.

Parameters:
  • sub_path (str)

  • local_path (str | Path)

  • preserve_mtime (bool)

Return type:

Path

iterdir_metadata(sub_path)[source]

Iterate over the contents of a directory.

Parameters:

sub_path (str) – The path of the directory on the webserver relative to the source prefix.

Yields:

All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:

  • is_dir: True if the returned name is a directory, False if it is a file.

  • mtime: The last modification time of the file as a float.

  • size: The approximate size of the file in bytes.

Raises:
  • FileNotFoundError – If the URL does not point to a valid file or directory page.

  • ConnectionError – If the URL could not be accessed.

Return type:

Iterator[tuple[str, dict[str, Any]]]

Notes

This method relies on the webserver returning a “fancy index” page in some recognizable format. If server-side indexing is disabled, this method will not work.

We have no way of knowing what the timezone of the webserver is, and the fancy index page does not include the timezone. As a result we assume all times are in UTC.
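
The UTC assumption amounts to attaching a UTC timezone to a timezone-less timestamp before converting it. The sketch below shows that step; index_time_to_timestamp is a hypothetical helper, and the date format is an assumption chosen for illustration, since index pages vary between servers.

```python
from datetime import datetime, timezone


def index_time_to_timestamp(text, fmt="%Y-%m-%d %H:%M"):
    """Parse a timezone-less index timestamp and interpret it as UTC."""
    naive = datetime.strptime(text, fmt)
    return naive.replace(tzinfo=timezone.utc).timestamp()


ts = index_time_to_timestamp("2024-01-15 10:30")
round_trip = datetime.fromtimestamp(ts, tz=timezone.utc)
```

If the server actually reports local time, the resulting timestamps will be off by that server's UTC offset.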

Remove the given object.

Parameters:
  • sub_path (str) – The path of the file on the webserver relative to the source prefix to delete.

  • missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.

Returns:

The sub_path.

Raises:

FileNotFoundError – If the file doesn’t exist and missing_ok is False.

Return type:

str

class filecache.file_cache_source.FileCacheSourceGS(scheme, remote, *, anonymous=False)[source]

Bases: FileCacheSource

Class that provides access to files stored in Google Storage.

Parameters:
  • scheme (str)

  • remote (str)

  • anonymous (bool)

__init__(scheme, remote, *, anonymous=False)[source]

Initialization for the FileCacheSourceGS class.

Parameters:
  • scheme (str) – The scheme of the source. Must be "gs".

  • remote (str) – The bucket name.

  • anonymous (bool) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. Not used for this class.

classmethod schemes()[source]

The URL schemes supported by this class.

Return type:

tuple[str, …]

classmethod uses_anonymous()[source]

Whether this class has the concept of anonymous accesses.

Return type:

bool

exists(sub_path)[source]

Check if a file exists without downloading it.

Parameters:

sub_path (str) – The path of the file in the Google Storage bucket given by the source prefix.

Returns:

True if the file (including the bucket) exists. Note that it is possible that a file could exist and still not be downloadable due to permissions. False will also be returned if the bucket itself does not exist or is not accessible.

Return type:

bool

modification_time(sub_path)[source]

Get the modification time of a file as a Unix timestamp.

Parameters:

sub_path (str) – The path of the file in the Google Storage bucket given by the source prefix.

Returns:

The modification time as a float (Unix timestamp) if the file exists. If the file does not exist or other errors occur, an Exception is raised.

Return type:

float | None

is_dir(sub_path)[source]

Check if a file is a directory.

Parameters:

sub_path (str) – The path of the directory in the Google Storage bucket given by the source prefix.

Returns:

True if the path represents a directory (i.e., there are objects with this prefix), False otherwise.

Return type:

bool

Notes

In Google Cloud Storage, directories are conceptual: they exist only as prefixes in object names. This method checks whether any objects have the given prefix to determine if it represents a directory.
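
The prefix test can be illustrated on a plain list of object names. This is a sketch of the idea only; is_dir_by_prefix is a hypothetical function, and a real implementation would query the storage service for objects with the prefix instead of scanning a local list.

```python
def is_dir_by_prefix(object_names, sub_path):
    """Return True if any object name lies under sub_path treated as a prefix."""
    # A trailing slash keeps "data/2024" from matching "data/2024.txt".
    prefix = sub_path.rstrip("/") + "/"
    return any(name.startswith(prefix) for name in object_names)


names = ["data/2024/a.csv", "data/2024/b.csv", "data/2024.txt"]
```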

retrieve(sub_path, local_path, *, preserve_mtime=False)[source]

Retrieve a file from a Google Storage bucket.

Parameters:
  • sub_path (str) – The path of the file in the Google Storage bucket given by the source prefix.

  • local_path (str | Path) – The path to the destination where the downloaded file will be stored.

  • preserve_mtime (bool) – If True, the modification time of the remote file will be copied to the local file.

Returns:

The Path where the file was stored (same as local_path).

Raises:

FileNotFoundError – If the remote file does not exist or the download fails for another reason.

Return type:

Path

Notes

All parent directories in local_path are created even if the file download fails.

The download is an atomic operation.

upload(sub_path, local_path, *, preserve_mtime=False)[source]

Upload a local file to a Google Storage bucket.

Parameters:
  • sub_path (str) – The path of the destination file in the Google Storage bucket given by the source prefix.

  • local_path (str | Path) – The absolute path of the local file to upload.

  • preserve_mtime (bool) – If True, the modification time of the local file will be copied to the remote file.

Returns:

The Path of the filename, which is the same as the local_path parameter.

Raises:

FileNotFoundError – If the local file does not exist.

Return type:

Path

iterdir_metadata(sub_path)[source]

Iterate over the contents of a directory.

Parameters:

sub_path (str) – The path of the directory in the Google Storage bucket given by the source prefix.

Yields:

All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:

  • is_dir: True if the returned name is a directory, False if it is a file.

  • mtime: The last modification time of the file as a float.

  • size: The approximate size of the file in bytes.

If the metadata cannot be retrieved, None is returned for the metadata.

Return type:

Iterator[tuple[str, dict[str, Any] | None]]

Remove the given object.

Parameters:
  • sub_path (str) – The path of the file in the Google Storage bucket given by the source prefix to delete.

  • missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.

Returns:

The sub_path.

Raises:

FileNotFoundError – If the file doesn’t exist and missing_ok is False.

Return type:

str

class filecache.file_cache_source.FileCacheSourceS3(scheme, remote, *, anonymous=False)[source]

Bases: FileCacheSource

Class that provides access to files stored in AWS S3.

Parameters:
  • scheme (str)

  • remote (str)

  • anonymous (bool)

__init__(scheme, remote, *, anonymous=False)[source]

Initialization for the FileCacheSourceS3 class.

Parameters:
  • scheme (str) – The scheme of the source. Must be "s3".

  • remote (str) – The bucket name.

  • anonymous (bool) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. Not used for this class.

classmethod schemes()[source]

The URL schemes supported by this class.

Return type:

tuple[str, …]

classmethod uses_anonymous()[source]

Whether this class has the concept of anonymous accesses.

Return type:

bool

exists(sub_path)[source]

Check if a file exists without downloading it.

Parameters:

sub_path (str) – The path of the file in the AWS S3 bucket given by the source prefix.

Returns:

True if the file (including the bucket) exists. Note that it is possible that a file could exist and still not be downloadable due to permissions. False will also be returned if the bucket itself does not exist or is not accessible.

Return type:

bool

modification_time(sub_path)[source]

Get the modification time of a file as a Unix timestamp.

Parameters:

sub_path (str) – The path of the file in the AWS S3 bucket given by the source prefix.

Returns:

The modification time as a float (Unix timestamp) if the file exists and the time can be retrieved. If the file does not exist or other errors occur, an Exception is raised.

Return type:

float | None

is_dir(sub_path)[source]

Check if a file is a directory.

Parameters:

sub_path (str) – The path of the directory in the AWS S3 bucket given by the source prefix.

Returns:

True if the path represents a directory (i.e., there are objects with this prefix), False otherwise.

Return type:

bool

Notes

In AWS S3, directories are conceptual: they exist only as prefixes in object names. This method checks whether any objects have the given prefix to determine if it represents a directory.

retrieve(sub_path, local_path, *, preserve_mtime=False)[source]

Retrieve a file from an AWS S3 bucket.

Parameters:
  • sub_path (str) – The path of the file in the AWS S3 bucket given by the source prefix.

  • local_path (str | Path) – The path to the destination where the downloaded file will be stored.

  • preserve_mtime (bool) – If True, the modification time of the remote file will be copied to the local file.

Returns:

The Path where the file was stored (same as local_path).

Raises:

FileNotFoundError – If the remote file does not exist or the download fails for another reason.

Return type:

Path

Notes

All parent directories in local_path are created even if the file download fails.

The download is an atomic operation.

upload(sub_path, local_path, *, preserve_mtime=False)[source]

Upload a local file to an AWS S3 bucket.

Parameters:
  • sub_path (str) – The path of the destination file in the AWS S3 bucket given by the source prefix.

  • local_path (str | Path) – The full path of the local file to upload.

  • preserve_mtime (bool) – If True, the modification time of the local file will be copied to the remote file.

Returns:

The Path of the filename, which is the same as the local_path parameter.

Raises:

FileNotFoundError – If the local file does not exist.

Return type:

Path

iterdir_metadata(sub_path)[source]

Iterate over the contents of a directory.

Parameters:

sub_path (str) – The path of the directory in the AWS S3 bucket given by the source prefix.

Yields:

All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:

  • is_dir: True if the returned name is a directory, False if it is a file.

  • mtime: The last modification time of the file as a float.

  • size: The approximate size of the file in bytes.

If the metadata cannot be retrieved, None is returned for the metadata.

Return type:

Iterator[tuple[str, dict[str, Any] | None]]

Remove the given object.

Parameters:
  • sub_path (str) – The path of the file in the AWS S3 bucket given by the source prefix to delete.

  • missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.

Returns:

The sub_path.

Raises:

FileNotFoundError – If the file doesn’t exist and missing_ok is False.

Return type:

str

class filecache.file_cache_source.FileCacheSourceFake(scheme, remote, *, anonymous=False, storage_dir=None)[source]

Bases: FileCacheSource

Class that simulates a remote file source using a local directory structure.

This class is useful for testing file operations without requiring actual remote connections. Files are stored in a local directory that simulates the remote storage, including the need for uploads and downloads. By default, the storage directory is <TEMPDIR>/.filecache_fake_remote and persists across program runs.

Parameters:
  • scheme (str)

  • remote (str)

  • anonymous (bool)

  • storage_dir (str | Path | None)

classmethod get_default_storage_dir()[source]

Get the current default storage directory for fake remote files.

Returns:

The current default storage directory Path.

Return type:

Path
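
As a rough illustration of the default location mentioned in the class description, the sketch below builds the path with the standard library. It assumes that <TEMPDIR> corresponds to tempfile.gettempdir(); the layout of remotes and files beneath the storage directory is also an assumption, and fake_object_path is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Assumption: <TEMPDIR> is the directory reported by tempfile.gettempdir().
default_storage_dir = Path(tempfile.gettempdir()) / ".filecache_fake_remote"


def fake_object_path(storage_dir, remote, sub_path):
    """Hypothetical layout: <storage_dir>/<remote>/<sub_path>."""
    return Path(storage_dir) / remote / sub_path


example = fake_object_path(default_storage_dir, "my-bucket", "dir/file.txt")
```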

classmethod set_default_storage_dir(directory)[source]

Set the default storage directory for fake remote files.

Parameters:

directory (str | Path) – The directory to use as the default storage location. The directory is expanded and resolved to an absolute path.

Return type:

None
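The phrase "expanded and resolved" most plausibly corresponds to the standard pathlib idiom shown below. This is a sketch under that assumption; normalize_storage_dir is a hypothetical helper, not part of the filecache API.

```python
from pathlib import Path


def normalize_storage_dir(directory):
    """Expand a leading ~ and resolve the result to an absolute path."""
    # expanduser() replaces a leading "~"; resolve() makes the path absolute
    # and collapses ".." components and symlinks.
    return Path(directory).expanduser().resolve()


normalized = normalize_storage_dir("some/relative/../dir")
```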

classmethod delete_default_storage_dir()[source]

Delete the current default storage directory and all its contents.

This is useful for cleanup after testing.

Return type:

None

__init__(scheme, remote, *, anonymous=False, storage_dir=None)[source]

Initialize the FileCacheSourceFake class.

Parameters:
  • scheme (str) – The scheme of the source. Must be “fake”.

  • remote (str) – The simulated remote/bucket name.

  • anonymous (bool) – Not used for this class.

  • storage_dir (str | Path | None) – Base directory in which to store the fake remote files. If None, uses the class default storage directory.

classmethod schemes()[source]

The URL schemes supported by this class.

Return type:

tuple[str, …]

classmethod uses_anonymous()[source]

Whether this class has the concept of anonymous access.

Return type:

bool

exists(sub_path)[source]

Check if a file exists in the fake remote storage.

Parameters:

sub_path (str) – The path of the file relative to the storage directory.

Returns:

True if the file exists, False otherwise.

Return type:

bool

modification_time(sub_path)[source]

Get the modification time of a file as a Unix timestamp.

Parameters:

sub_path (str)

Return type:

float | None

is_dir(sub_path)[source]

Check if the given path is a directory.

Parameters:

sub_path (str)

Return type:

bool

retrieve(sub_path, local_path, *, preserve_mtime=False)[source]

Retrieve a file from the fake remote storage.

Parameters:
  • sub_path (str) – The path of the file relative to the storage directory.

  • local_path (str | Path) – The path where the file should be copied to.

  • preserve_mtime (bool) – If True, the modification time of the fake remote file will be copied to the local file.

Returns:

The Path where the file was stored (same as local_path).

Raises:

FileNotFoundError – If the remote file does not exist.

Return type:

Path
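The effect of preserve_mtime can be sketched with the standard library alone. The copy_file helper below is hypothetical and is not the class's actual implementation; it only shows one way a copy can carry the source's modification time over to the destination and raise FileNotFoundError when the source is missing.

```python
import os
import shutil
import tempfile
from pathlib import Path


def copy_file(src, dest, *, preserve_mtime=False):
    """Copy src to dest, optionally carrying over the modification time."""
    src, dest = Path(src), Path(dest)
    if not src.is_file():
        raise FileNotFoundError(f"File does not exist: {src}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)  # copies contents only, not timestamps
    if preserve_mtime:
        mtime = src.stat().st_mtime
        os.utime(dest, (mtime, mtime))  # set both atime and mtime
    return dest


with tempfile.TemporaryDirectory() as tmp:
    remote_file = Path(tmp) / "remote" / "a.txt"
    remote_file.parent.mkdir()
    remote_file.write_text("hello")
    # Give the simulated remote file an old, easily recognized mtime.
    os.utime(remote_file, (1_000_000_000, 1_000_000_000))

    local_file = copy_file(remote_file, Path(tmp) / "cache" / "a.txt",
                           preserve_mtime=True)
    copied_text = local_file.read_text()
    copied_mtime = local_file.stat().st_mtime
```

Without preserve_mtime the destination would simply carry the time at which the copy was made.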

upload(sub_path, local_path, *, preserve_mtime=False)[source]

Upload a file to the fake remote storage.

Parameters:
  • sub_path (str) – The destination path relative to the storage directory.

  • local_path (str | Path) – The path of the local file to upload.

  • preserve_mtime (bool) – If True, the modification time of the local file will be copied to the remote file.

Returns:

The Path of the local file that was uploaded.

Raises:

FileNotFoundError – If the local file does not exist.

Return type:

Path

iterdir_metadata(sub_path)[source]

Iterate over the contents of a directory in the fake remote storage.

Parameters:

sub_path (str) – The path of the directory relative to the storage directory.

Yields:

All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:

  • is_dir: True if the returned name is a directory, False if it is a file.

  • mtime: The last modification time of the file as a float.

  • size: The approximate size of the file in bytes.

If the metadata cannot be retrieved, None is returned for the metadata.

Return type:

Iterator[tuple[str, dict[str, Any] | None]]
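For a directory-backed source, a generator with this contract could be built on os.scandir. The sketch below is an illustration only; iter_local_metadata and its root argument are hypothetical names, and the real method may be implemented differently.

```python
import os
import tempfile
from pathlib import Path


def iter_local_metadata(root, sub_path):
    """Yield (relative_path, metadata) for each entry in root/sub_path."""
    directory = Path(root) / sub_path
    with os.scandir(directory) as it:  # scandir never yields "." or ".."
        for entry in it:
            rel = f"{sub_path.rstrip('/')}/{entry.name}" if sub_path else entry.name
            try:
                st = entry.stat()
                metadata = {"is_dir": entry.is_dir(),
                            "mtime": st.st_mtime,
                            "size": st.st_size}
            except OSError:
                metadata = None  # metadata could not be retrieved
            yield rel, metadata


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data" / "sub").mkdir(parents=True)
    (Path(tmp) / "data" / "a.txt").write_text("12345")
    # Consume the generator while the temporary directory still exists.
    listing = dict(iter_local_metadata(tmp, "data"))
```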

unlink(sub_path, *, missing_ok=False)[source]

Remove a file from the fake remote storage.

Parameters:
  • sub_path (str) – The path of the file relative to the storage directory.

  • missing_ok (bool) – If True, don’t raise an error if the file doesn’t exist.

Returns:

The sub_path that was removed.

Raises:

FileNotFoundError – If the file doesn’t exist and missing_ok is False.

Return type:

str
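The missing_ok semantics mirror those of pathlib's Path.unlink. A minimal sketch, assuming a plain local directory as the storage root (remove_file is a hypothetical helper, not the class's own method):

```python
import tempfile
from pathlib import Path


def remove_file(root, sub_path, *, missing_ok=False):
    """Delete root/sub_path and return sub_path."""
    # Path.unlink raises FileNotFoundError unless missing_ok is True.
    (Path(root) / sub_path).unlink(missing_ok=missing_ok)
    return sub_path


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("x")
    first = remove_file(tmp, "a.txt")                    # file existed
    second = remove_file(tmp, "a.txt", missing_ok=True)  # already gone, no error
    try:
        remove_file(tmp, "a.txt")                        # gone, missing_ok=False
        raised = False
    except FileNotFoundError:
        raised = True
```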