filecache Module
- filecache.file_cache.register_filecachesource(cls)[source]
Register one or more URL FileCacheSource subclasses as URL schemes.
- Parameters:
cls (type[FileCacheSource])
- Return type:
None
- filecache.file_cache.set_global_logger(logger)[source]
Set the global logger for all FileCache instances that don’t specify one.
- Parameters:
logger (Logger | None)
- Return type:
None
- filecache.file_cache.set_easy_logger()[source]
Set a default logger that outputs all messages to stdout.
- Return type:
None
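How set_easy_logger() builds its logger is not shown in this reference. As a rough, standard-library-only sketch of a logger that "outputs all messages to stdout", something like the following could be constructed and handed to set_global_logger(). The function name make_stdout_logger, the logger name, and the message format are assumptions for illustration.

```python
import logging
import sys


def make_stdout_logger(name="filecache_demo"):
    """Build (or reuse) a logger that writes every level to stdout.

    Hypothetical helper; not part of the filecache package.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let all messages through
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


demo_logger = make_stdout_logger()
demo_logger.info("logger ready")
```

A logger built this way is an ordinary logging.Logger, so it matches the Logger | None type that set_global_logger() accepts.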
- filecache.file_cache.get_global_logger()[source]
Return the current global logger.
- Return type:
Logger | None
- class filecache.file_cache.FileCache(cache_name='global', *, cache_root=None, delete_on_exit=None, time_sensitive=False, cache_metadata=False, mp_safe=None, anonymous=False, lock_timeout=60, nthreads=8, url_to_url=None, url_to_path=None, logger=None)[source]
Bases: object
Class which manages the lifecycle of files from various sources.
- Parameters:
cache_name (Optional[str])
cache_root (Optional[Path | str])
delete_on_exit (Optional[bool])
time_sensitive (bool)
cache_metadata (bool)
mp_safe (Optional[bool])
anonymous (bool)
lock_timeout (int)
nthreads (int)
url_to_url (Optional[UrlToUrlFuncOrSeqType])
url_to_path (Optional[UrlToPathFuncOrSeqType])
logger (Optional[Logger | bool])
- __init__(cache_name='global', *, cache_root=None, delete_on_exit=None, time_sensitive=False, cache_metadata=False, mp_safe=None, anonymous=False, lock_timeout=60, nthreads=8, url_to_url=None, url_to_path=None, logger=None)[source]
Initialization for the FileCache class.
- Parameters:
cache_name (str | None) – By default, the file cache will be stored in the subdirectory _filecache_global under the cache_root directory. If a name is specified explicitly, the file cache will be stored in the subdirectory _filecache_<cache_name>. Explicitly naming a cache is useful if other programs will want to access the same cache, or if you want the directory name to be obvious to users browsing the file system. Using a cache name (including the default global) implies that this cache should be persistent on exit. If you pass in None, the cache will instead be stored in a uniquely-named subdirectory with the prefix _filecache_ and by default will be deleted on exit.
cache_root (Path | str | None) – The directory in which to place caches. By default, FileCache uses the contents of the environment variable FILECACHE_CACHE_ROOT; if not set, then the system temporary directory is used, which involves checking the environment variables TMPDIR, TEMP, and TMP, and if none of those are set then using C:\TEMP, C:\TMP, \TEMP, or \TMP on Windows and /tmp, /var/tmp, or /usr/tmp on other platforms. The cache will be stored in a sub-directory within this directory (see cache_name). If cache_root is specified but the directory does not exist, it is created.
delete_on_exit (bool | None) – If True, the cache directory and its contents are always deleted on program exit or exit from a FileCache context manager. If False, the cache is never deleted. By default, an unnamed cache (cache_name is None) will be deleted on exit and a named cache will not be deleted on program exit.
time_sensitive (bool) – If True, the modification time of files in the cache is considered to be important. When a file is retrieved, the modification time from the source location is set on the local copy. If a local copy already exists, the times on both copies are compared and the local copy is updated if the source is newer. When a file is uploaded, the modification time on the local copy is set to the time retrieved from the source after the upload is complete.
cache_metadata (bool) – If True, iterdir(), iterdir_metadata(), and other internal methods will cache the metadata (such as modification time, size, and is_dir) of remote files. If time_sensitive is True and retrieve() needs the modification time of a file to compare to the local file, it will be retrieved from the cache if possible to save a server query. This option should only be used if the remote source is guaranteed not to change during the lifetime of this FileCache instance.
mp_safe (bool | None) – If False, never use multiprocessor-safe locking. If True, always use multiprocessor-safe locking. By default, locking is used if cache_name is specified, as it is assumed that multiple processes will be using the named cache simultaneously. If multiple processes will not be using the cache simultaneously, a small performance boost can be realized by setting mp_safe explicitly to False.
anonymous (bool) – The default value for anonymous access to cloud resources. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.
lock_timeout (int) – The default value for lock timeouts. This is how long to wait, in seconds, if another process is marked as retrieving a file before raising an exception. 0 means to not wait at all. A negative value means to never time out.
nthreads (int) – The default value for the maximum number of threads to use when doing multiple-file retrieval, upload, or other file operations.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The default function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str | None
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.
If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The default function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path | None
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.
logger (Logger | bool | None) – If False, do not do any logging. If None, use the global logger set with set_global_logger(). Otherwise use the specified logger.
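The two translator signatures described above can be illustrated with a pair of plain functions. This is only a sketch: the bucket name example-bucket, the host mirror.example.com, and the flat directory layout are hypothetical, and stripping a leading slash from path is a defensive assumption because this reference does not say whether path starts with one.

```python
from pathlib import Path
from typing import Optional, Union


def mirror_translator(scheme: str, remote: str, path: str) -> Optional[str]:
    """url_to_url-style translator: send one bucket to an HTTPS mirror."""
    if scheme == "gs" and remote == "example-bucket":
        return f"https://mirror.example.com/{path.lstrip('/')}"
    return None  # decline, so the next translator or the default is used


def flat_path_translator(scheme: str, remote: str, path: str,
                         cache_dir: Path,
                         cache_subdir: str) -> Optional[Union[str, Path]]:
    """url_to_path-style translator: flatten one bucket into 'flat/'."""
    if scheme == "gs" and remote == "example-bucket":
        # A relative Path; per the text above it is appended to cache_dir.
        return Path("flat") / path.lstrip("/").replace("/", "_")
    return None  # decline, keeping the default directory hierarchy


print(mirror_translator("gs", "example-bucket", "a/b.txt"))
print(flat_path_translator("gs", "example-bucket", "a/b.txt",
                           Path("/tmp/_filecache_global"),
                           "gs_example-bucket"))
```

Functions like these would be supplied through the url_to_url and url_to_path parameters, either singly or as a list or tuple that is tried in order.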
Notes
FileCache can be used as a context manager, such as:
with FileCache(cache_name=None) as fc:
    ...
In this case, the cache directory is created on entry to the context and deleted on exit. However, if the cache is named, the directory will not be deleted on exit unless the delete_on_exit=True option is used.
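To make the create-on-entry and delete-on-exit rules concrete, here is a small stand-in class that mimics only the directory lifecycle described in these notes. It is not FileCache and shares no code with it. ScratchCache, its root argument, and the use of tempfile.mkdtemp to pick a unique name are all assumptions made for the sketch.

```python
import shutil
import tempfile
from pathlib import Path


class ScratchCache:
    """Hypothetical stand-in that mimics the documented directory lifecycle."""

    def __init__(self, root, cache_name=None, delete_on_exit=None):
        self.root = Path(root)
        self.cache_name = cache_name
        # Documented default: unnamed caches are deleted, named caches persist.
        if delete_on_exit is None:
            delete_on_exit = cache_name is None
        self.delete_on_exit = delete_on_exit
        self.cache_dir = None

    def __enter__(self):
        if self.cache_name is None:
            # Assumption: mkdtemp stands in for the real unique-name scheme.
            self.cache_dir = Path(
                tempfile.mkdtemp(prefix="_filecache_", dir=self.root))
        else:
            self.cache_dir = self.root / f"_filecache_{self.cache_name}"
            self.cache_dir.mkdir(parents=True, exist_ok=True)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if self.delete_on_exit:
            shutil.rmtree(self.cache_dir, ignore_errors=True)
        return False  # never swallow exceptions raised inside the block


with tempfile.TemporaryDirectory() as root:
    with ScratchCache(root) as unnamed:
        print(unnamed.cache_dir.is_dir())  # True: created on entry
    print(unnamed.cache_dir.exists())      # False: removed on exit
```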
- property cache_dir: Path
The top-level directory of the cache as a Path object.
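The cache_root lookup order described for __init__ (explicit argument, then FILECACHE_CACHE_ROOT, then the system temporary directory) can be approximated with the standard library. This is a sketch of the documented order, not the library's code: resolve_cache_root is a hypothetical helper, and unlike the documented behavior it does not create the directory.

```python
import os
import tempfile
from pathlib import Path


def resolve_cache_root(cache_root=None):
    """Return the directory that would hold the _filecache_* subdirectories."""
    if cache_root is not None:
        return Path(cache_root)
    # Assumption: an empty FILECACHE_CACHE_ROOT is treated the same as unset.
    env_root = os.environ.get("FILECACHE_CACHE_ROOT")
    if env_root:
        return Path(env_root)
    # gettempdir() checks TMPDIR, TEMP and TMP, then platform defaults.
    return Path(tempfile.gettempdir())


# With the default cache name, the cache directory would sit here:
print(resolve_cache_root("/data/cache") / "_filecache_global")
```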
- property download_counter: int
The number of actual file downloads that have taken place.
- property upload_counter: int
The number of actual file uploads that have taken place.
- property is_delete_on_exit: bool
A bool indicating whether this FileCache will be deleted on exit.
- property is_time_sensitive: bool
A bool indicating whether this FileCache cares about modification times.
- property is_cache_metadata: bool
A bool indicating whether this FileCache caches metadata.
- property is_mp_safe: bool
A bool indicating whether this FileCache is multi-processor safe.
- property is_anonymous: bool
The default bool indicating whether to make all cloud accesses anonymous.
- property lock_timeout: int
The default timeout in seconds while waiting for a file lock.
- property nthreads: int
The default number of threads to use for multiple-file operations.
- property url_to_url: Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...]
The default function(s) used to translate URLs into URLs.
- property url_to_path: Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...]
The default function(s) used to translate URLs into paths.
- property logger: Logger | None
The logger to use for this FileCache.
- get_local_path(url, *, anonymous=None, create_parents=True, url_to_url=None, url_to_path=None)[source]
Return the local path for the given url.
- Parameters:
url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the file, including any source prefix. If url is a list or tuple, all URLs are processed.
anonymous (bool | None) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. If None, use the default setting for this FileCache instance.
create_parents (bool) – If True, create all parent directories. This is useful when getting the local path of a file that will be uploaded.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str | None
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.
If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path | None
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.
If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
- Returns:
The Path (or list of Paths) of the filename in the temporary directory, or as specified by the url_to_path translators. The files do not have to exist because a Path could be used for writing a file to upload. To facilitate this, a side effect of this call (if create_parents is True) is that the complete parent directory structure will be created for each returned Path.
- Return type:
Path | list[Path]
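As a rough picture of the default mapping from a URL to a local path, the sketch below joins the cache directory, a filesystem-friendly source name, and the remaining path. The only conversion this reference confirms is gs://bucket becoming gs_bucket; joining scheme and remote with an underscore for every URL is an assumption, and sketch_local_path is a hypothetical name.

```python
from pathlib import Path
from urllib.parse import urlsplit


def sketch_local_path(cache_dir: Path, url: str) -> Path:
    """Approximate the <cache_dir>/<source>/<path> layout for a remote URL."""
    parts = urlsplit(url)
    # Assumption: "<scheme>_<remote>", matching the gs://bucket example.
    source = f"{parts.scheme}_{parts.netloc}"
    return cache_dir / source / parts.path.lstrip("/")


print(sketch_local_path(Path("/tmp/_filecache_global"),
                        "gs://bucket/data/image.dat"))
```

Here cache_dir plays the role of the top-level cache directory, that is, the cache root plus the _filecache_<cache_name> subdirectory.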
- exists(url, *, bypass_cache=False, anonymous=None, nthreads=None, url_to_url=None, url_to_path=None)[source]
Check if a file exists without downloading it.
- Parameters:
url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the file, including any source prefix. If url is a list or tuple, all URLs are checked. This may be more efficient because files can be checked in parallel. It is OK to check files from multiple sources using one call.
bypass_cache (bool) – If False, check for the file first in the local cache, and if not found there then on the remote server. If True, only check on the remote server.
anonymous (bool | None) – If specified, override the default setting for anonymous access. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.
nthreads (int | None) – The maximum number of threads to use. If None, use the default value for this FileCache instance.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str | None
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.
If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path | None
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.
If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
- Returns:
True if the file exists (note that it is possible that a file could exist and still not be downloadable due to permissions). False if the file does not exist. This includes bad bucket or webserver names, lack of permission to examine a bucket’s contents, etc. If url was a list or tuple, then instead return a list of bools giving the existence of each url in order.
- Return type:
bool | list[bool]
- modification_time(url, *, bypass_cache=False, anonymous=None, nthreads=None, exception_on_fail=True, url_to_url=None)[source]
Get the modification time of a remote file as a Unix timestamp.
- Parameters:
url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the file, including any source prefix. If url is a list or tuple, all URLs are checked. This may be more efficient because files can be checked in parallel. It is OK to check files from multiple sources using one call.
bypass_cache (bool) – If False, retrieve the modification time for the file first from the metadata cache, if enabled, and if not found there then from the remote server. If True, only retrieve the modification time directly from the remote server.
anonymous (bool | None) – If specified, override the default setting for anonymous access. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.
nthreads (int | None) – The maximum number of threads to use. If None, use the default value for this FileCache instance.
exception_on_fail (bool) – If True, a FileNotFoundError is raised if any file does not exist. If False, the function returns normally and any failed check is marked with the Exception that caused the failure in place of the returned modification time.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str | None
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.
If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
- Returns:
The modification time as a Unix timestamp if the file exists and the time can be retrieved, None otherwise. If url was a list or tuple, then instead return a list of modification times in order. This always returns the modification time of the file on the remote source, even if there is a local copy. If you want the modification time of the local copy, you can call the normal stat function. If cache_metadata is True, the modification time is retrieved from the cache if possible to save a server query. If exception_on_fail is False, any modification time may be an Exception if that file does not exist or the modification time cannot be retrieved.
- Raises:
FileNotFoundError – If a file does not exist.
- Return type:
float | None | Exception | list[float | None | Exception]
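Because each result can be a float, None, or an Exception, calling code usually needs to branch on the type before using the value. A minimal sketch, with describe_mtime as a hypothetical helper name and UTC chosen arbitrarily for display:

```python
from datetime import datetime, timezone


def describe_mtime(result):
    """Turn one modification_time()-style result into a readable string."""
    if isinstance(result, Exception):
        return f"error: {result}"  # only possible with exception_on_fail=False
    if result is None:
        return "unknown"
    # A Unix timestamp counts seconds since 1970-01-01 00:00:00 UTC.
    return datetime.fromtimestamp(result, tz=timezone.utc).isoformat()


print(describe_mtime(0.0))  # 1970-01-01T00:00:00+00:00
print(describe_mtime(None))
print(describe_mtime(FileNotFoundError("missing")))
```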
- is_dir(url, *, anonymous=None, nthreads=None, exception_on_fail=True, url_to_url=None)[source]
Check if a URL represents a directory.
- Parameters:
url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the directory, including any source prefix. If url is a list or tuple, all URLs are checked. This may be more efficient because URLs can be checked in parallel. It is OK to check URLs from multiple sources using one call.
anonymous (bool | None) – If specified, override the default setting for anonymous access. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.
nthreads (int | None) – The maximum number of threads to use. If None, use the default value for this FileCache instance.
exception_on_fail (bool) – If True, a FileNotFoundError is raised if any URL cannot be checked. If False, the function returns normally and any failed check is marked with the Exception that caused the failure in place of the returned boolean.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str | None
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.
If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
- Returns:
True if the URL represents a directory, False otherwise. If url was a list or tuple, then instead return a list of booleans or exceptions in order. If exception_on_fail is False, any result may be an Exception if that URL cannot be checked.
- Raises:
FileNotFoundError – If a URL cannot be checked.
- Return type:
bool | Exception | list[bool | Exception]
Notes
Unlike os.path.isdir or pathlib.Path.is_dir, this method raises an exception if the URL does not exist instead of returning False. This is so that remote connection errors are not masked by the return value. Contrast this with the return value of FileCache.exists(), which will return False if the file does not exist or cannot be accessed.
- retrieve(url, *, anonymous=None, lock_timeout=None, nthreads=None, exception_on_fail=True, url_to_url=None, url_to_path=None)[source]
Retrieve file(s) from the given location(s) and store in the file cache.
- Parameters:
url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the file, including any source prefix. If url is a list or tuple, all URLs are retrieved. This may be more efficient because files can be downloaded in parallel. It is OK to retrieve files from multiple sources using one call.
anonymous (bool | None) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. If None, use the default setting for this FileCache instance.
lock_timeout (int | None) – How long to wait, in seconds, if another process is marked as retrieving the file before raising an exception. 0 means to not wait at all. A negative value means to never time out. None means to use the default value for this FileCache instance.
nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value for this FileCache instance.
exception_on_fail (bool) – If True, a FileNotFoundError is raised if any file does not exist or its download fails, and a TimeoutError is raised if any attempt to acquire a lock or wait for another process to download a file fails. If False, the function returns normally and any failed download is marked with the Exception that caused the failure in place of the returned Path.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str | None
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.
If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path | None
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.
If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
- Returns:
The Path of the filename in the temporary directory (or the original absolute path if local). If url was a list or tuple, then instead return a list of Paths of the filenames in the temporary directory (or the original absolute path if local). If exception_on_fail is False, any Path may be an Exception if that file does not exist or the download failed or a timeout occurred.
- Raises:
FileNotFoundError – If a file does not exist or could not be downloaded, and exception_on_fail is True. Also if time_sensitive is True and the modification time of the remote file cannot be determined because a locally cached file has been deleted on the remote source.
TimeoutError – If we could not acquire the lock to allow downloading of a file within the given timeout or, for a multi-file download, if we timed out waiting for other processes to download locked files, and exception_on_fail is True.
- Return type:
Path | Exception | list[Path | Exception]
Notes
File download is normally an atomic operation; a program will never see a partially-downloaded file, and if a download is interrupted there will be no file present. However, when downloading multiple files at the same time, as many files as possible are downloaded before an exception is raised.
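When exception_on_fail is False and a list of URLs is passed, the returned list mixes Path objects and Exception objects in the same order as the input. One way to separate them is sketched below. split_results is a hypothetical helper, and the URLs and results are made-up sample data rather than output from a real call.

```python
from pathlib import Path


def split_results(urls, results):
    """Pair each URL with its result and separate successes from failures."""
    succeeded, failed = {}, {}
    for url, result in zip(urls, results):
        if isinstance(result, Exception):
            failed[url] = result
        else:
            succeeded[url] = result
    return succeeded, failed


# Made-up sample data shaped like a multi-file return value.
sample_urls = ["gs://bucket/a.txt", "gs://bucket/missing.txt"]
sample_results = [Path("/tmp/_filecache_global/gs_bucket/a.txt"),
                  FileNotFoundError("gs://bucket/missing.txt")]

ok, bad = split_results(sample_urls, sample_results)
print(sorted(ok))
print(sorted(bad))
```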
- upload(url, *, anonymous=None, nthreads=None, exception_on_fail=True, url_to_url=None, url_to_path=None)[source]
Upload file(s) from the file cache to the storage location(s).
- Parameters:
url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the file, including any source prefix. If url is a list or tuple, the complete list of files is uploaded. This may be more efficient because files can be uploaded in parallel.
anonymous (bool | None) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. If None, use the default setting for this FileCache instance.
nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value for this FileCache instance.
exception_on_fail (bool) – If True, an exception is raised if any file does not exist or its upload fails. If False, the function returns normally and any failed upload is marked with the Exception that caused the failure in place of the returned path.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str | None
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.
If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path | None
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.
If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
- Returns:
The Path of the filename in the cache directory (or the original absolute path if local). If url was a list or tuple of paths, then instead return a list of Paths of the filenames in the temporary directory (or the original full path if local). If exception_on_fail is False, any Path may be an Exception if that file does not exist or the upload failed.
- Raises:
FileNotFoundError – If a file to upload does not exist or the upload failed, and exception_on_fail is True.
- Return type:
Path | Exception | list[Path | Exception]
Notes
If time_sensitive is True for this FileCache instance, then the modification time of the local file is set to the modification time of the remote file after the upload is complete. If time_sensitive is False, then the modification time of the local file is not changed.
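The time_sensitive note above can be pictured with the standard library alone. The following is a minimal sketch, not the library's implementation: the helper name sync_mtime and the timestamp value are hypothetical, and the code only shows how a local file's modification time can be set to a timestamp reported by a remote.

```python
import os
import tempfile
from pathlib import Path


def sync_mtime(local_path: Path, remote_mtime: float) -> None:
    """Set the local file's access and modification times to remote_mtime."""
    os.utime(local_path, (remote_mtime, remote_mtime))


# Demonstration with a temporary file and a made-up remote timestamp.
with tempfile.TemporaryDirectory() as tmp:
    local_file = Path(tmp) / "data.txt"
    local_file.write_text("example")
    remote_mtime = 1_700_000_000.0  # hypothetical value reported by the remote
    sync_mtime(local_file, remote_mtime)
    synced_mtime = local_file.stat().st_mtime
```

Keeping the two timestamps equal is what lets a later comparison of local and remote modification times decide whether the cached copy is stale.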
- open(url, mode='r', *args, anonymous=None, lock_timeout=None, url_to_url=None, url_to_path=None, **kwargs)[source]
Retrieve+open or open+upload a file as a context manager.
If mode is a read mode (like 'r' or 'rb') then the file will be first retrieved by calling retrieve() and then opened. If the mode is a write mode (like 'w' or 'wb') then the file will be first opened for write, and when this context manager is exited the file will be uploaded.
- Parameters:
url (str | Path) – The filename to open.
mode (str) – The mode string as you would specify to Python’s open() function.
*args (Any) – Any additional positional arguments are passed to the Python open() function.
anonymous (bool | None) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. If None, use the default setting for this FileCache instance.
lock_timeout (int | None) – How long to wait, in seconds, if another process is marked as retrieving the file before raising an exception. 0 means to not wait at all. A negative value means to never time out. If None, use the default value for this FileCache instance.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default. If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator. If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
**kwargs (Any) – Any additional keyword arguments are passed to the Python open() function.
- Returns:
The same object as would be returned by the normal open() function.
- Return type:
Iterator[IO[Any]]
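The url_to_path contract described for this method (five arguments, return a path or None, translators tried in order, relative results appended to cache_dir) can be sketched without the library. In the sketch below, flatten_gs is a hypothetical translator and first_translation is a hypothetical stand-in for the documented call order; neither is part of filecache, and the cache directory is made up.

```python
from pathlib import Path
from typing import Callable, Optional, Sequence, Union

UrlToPathFunc = Callable[[str, str, str, Path, str], Optional[Union[str, Path]]]


def flatten_gs(scheme: str, remote: str, path: str,
               cache_dir: Path, cache_subdir: str) -> Optional[Path]:
    """Store every gs:// file directly under cache_subdir, ignoring folders."""
    if scheme != "gs":
        return None  # decline, so the next translator (or the default) decides
    return Path(cache_subdir) / Path(path).name  # relative: stays inside the cache


def first_translation(translators: Sequence[UrlToPathFunc], scheme: str,
                      remote: str, path: str, cache_dir: Path,
                      cache_subdir: str) -> Optional[Path]:
    """Hypothetical stand-in: call translators in order until one answers."""
    for func in translators:
        result = func(scheme, remote, path, cache_dir, cache_subdir)
        if result is None:
            continue
        result = Path(result)
        # Relative results are appended to cache_dir; absolute ones are used as-is.
        return result if result.is_absolute() else cache_dir / result
    return None  # a real caller would fall back to the default layout here


cache_dir = Path("/tmp/_filecache_global")  # made-up cache location
gs_result = first_translation([flatten_gs], "gs", "bucket", "a/b/file.txt",
                              cache_dir, "gs_bucket")
s3_result = first_translation([flatten_gs], "s3", "bucket", "a/b/file.txt",
                              cache_dir, "s3_bucket")
```

Returning a relative path, as flatten_gs does, keeps the file inside the cache directory, which avoids the risk the parameter description warns about for absolute paths.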
- iterdir(url, *, anonymous=None, url_to_url=None)[source]
Enumerate the files and sub-directories in a directory.
This function always accesses a remote location (ignoring the local cache), if appropriate, because there is no way to know if the local cache contains all of the files and sub-directories present in the remote.
- Parameters:
url (str | Path) – The URL of the directory, including any source prefix.
anonymous (bool | None) – If specified, override the default setting for anonymous access. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default. If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
- Yields:
All files and sub-directories in the directory given by the url, in no particular order. Special directories . and .. are ignored.
- Return type:
Iterator[str]
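A url_to_url translator takes the three documented arguments and returns either a complete replacement URL or None. The sketch below illustrates that contract and the documented ordering with two hypothetical translators; the bucket name, the mirror directory, and the helper translate_url are assumptions made for illustration and are not part of filecache.

```python
from typing import Callable, Optional, Sequence

UrlToUrlFunc = Callable[[str, str, str], Optional[str]]


def mirror_to_local(scheme: str, remote: str, path: str) -> Optional[str]:
    """Redirect one hypothetical bucket to a local mirror directory."""
    if scheme == "gs" and remote == "example-bucket":
        return f"file:///mnt/mirror/{path.lstrip('/')}"
    return None  # not our bucket: let the next translator decide


def force_https(scheme: str, remote: str, path: str) -> Optional[str]:
    """Rewrite plain http URLs to https."""
    if scheme == "http":
        return f"https://{remote}/{path.lstrip('/')}"
    return None


def translate_url(translators: Sequence[UrlToUrlFunc], scheme: str,
                  remote: str, path: str) -> Optional[str]:
    """Hypothetical stand-in: the first translator that returns a URL wins."""
    for func in translators:
        new_url = func(scheme, remote, path)
        if new_url is not None:
            return new_url
    return None  # no override: the caller would keep the original URL


chain = [mirror_to_local, force_https]
mirrored = translate_url(chain, "gs", "example-bucket", "data/a.txt")
upgraded = translate_url(chain, "http", "example.com", "x/y")
untouched = translate_url(chain, "s3", "bucket", "key")
```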
- iterdir_metadata(url, *, anonymous=None, url_to_url=None)[source]
Enumerate the files and sub-dirs in a directory indicating which is a dir.
This function always accesses a remote location (ignoring the local cache), if appropriate, because there is no way to know if the local cache contains all of the files and sub-directories present in the remote.
- Parameters:
url (str | Path) – The URL of the directory, including any source prefix.
anonymous (bool | None) – If specified, override the default setting for anonymous access. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default. If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
- Yields:
All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:
is_dir: True if the returned name is a directory, False if it is a file.
date: The last modification date of the file as a UNIX timestamp.
size: The approximate size of the file in bytes.
If the metadata cannot be retrieved, None is returned for the metadata.
- Return type:
Iterator[tuple[str, dict[str, Any] | None]]
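Because each yielded item is a (path, metadata) tuple whose metadata may be None, callers usually need to handle three cases. The sketch below processes a hand-written list with the documented keys (is_dir, date, size); the sample entries and the helper split_entries are hypothetical, and no remote is contacted.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

Entry = Tuple[str, Optional[Dict[str, Any]]]


def split_entries(entries: Iterable[Entry]):
    """Separate (path, metadata) tuples into directories, files, and unknowns."""
    dirs: List[str] = []
    files: List[Tuple[str, int]] = []
    unknown: List[str] = []
    for path, metadata in entries:
        if metadata is None:
            unknown.append(path)  # metadata could not be retrieved
        elif metadata["is_dir"]:
            dirs.append(path)
        else:
            files.append((path, metadata["size"]))
    return dirs, files, unknown


# Made-up entries shaped like the documented yield values.
sample = [
    ("data/sub", {"is_dir": True, "date": 1_700_000_000.0, "size": 0}),
    ("data/a.txt", {"is_dir": False, "date": 1_700_000_100.0, "size": 42}),
    ("data/b.bin", None),
]
dirs, files, unknown = split_entries(sample)
```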
- unlink(url, *, missing_ok=False, anonymous=None, nthreads=None, exception_on_fail=True, url_to_url=None, url_to_path=None)[source]
Remove a file, including any locally cached copy.
- Parameters:
url (str | Path | list[str | Path] | tuple[str | Path, ...]) – The URL of the file, including any source prefix. If url is a list or tuple, all URLs are unlinked.
missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.
anonymous (bool | None) – If specified, override the default setting for anonymous access. If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.
nthreads (int | None) – The maximum number of threads to use when unlinking multiple files. If None, use the default value for this FileCache instance.
exception_on_fail (bool) – If True, an exception is raised if any file does not exist or its unlink fails. If False, the function returns normally and any failed unlink is marked with the Exception that caused the failure in place of the returned path.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default. If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator. If this parameter is specified, it replaces the default translators for this FileCache instance. If this parameter is omitted, the default translators are used.
- Returns:
The Path of the filename in the cache directory (or the original absolute path if local). If url was a list or tuple of paths, then instead return a list of Paths of the filenames in the temporary directory (or the original full path if local). If exception_on_fail is False, any Path may be an Exception if that file does not exist and missing_ok is True.
- Return type:
str | Exception | list[str | Exception]
Notes
If a URL points to a remote location, the locally cached version (if any) is only removed if the unlink of the remote location succeeded.
- Raises:
FileNotFoundError – If a file to unlink does not exist or the unlink failed, and exception_on_fail is True.
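When exception_on_fail is False and a list of URLs is passed, the returned list can mix ordinary results with Exception objects. The sketch below shows one way to separate them; the URLs and results are hand-written stand-ins (no files are unlinked), and partition_results is a hypothetical helper, not a filecache function.

```python
from typing import Dict, List, Sequence, Tuple, Union


def partition_results(
    urls: Sequence[str],
    results: Sequence[Union[str, Exception]],
) -> Tuple[List[str], Dict[str, Exception]]:
    """Pair each URL with its result and split successes from failures."""
    succeeded: List[str] = []
    failed: Dict[str, Exception] = {}
    for url, result in zip(urls, results):
        if isinstance(result, Exception):
            failed[url] = result  # keep the exception for reporting
        else:
            succeeded.append(url)
    return succeeded, failed


# Made-up inputs shaped like a multi-URL call with exception_on_fail=False.
urls = ["gs://bucket/a.txt", "gs://bucket/missing.txt"]
results = ["/cache/gs_bucket/a.txt", FileNotFoundError("gs://bucket/missing.txt")]
succeeded, failed = partition_results(urls, results)
```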
- new_path(path, *, anonymous=None, lock_timeout=None, nthreads=None, url_to_url=None, url_to_path=None)[source]
Create a new FCPath with the given prefix.
- Parameters:
path (str | Path | FCPath) – The path.
anonymous (bool | None) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. If None, use the default setting for this FileCache instance.
lock_timeout (int | None) – How long to wait, in seconds, if another process is marked as retrieving the file before raising an exception. 0 means to not wait at all. A negative value means to never time out. None means to use the default value for this FileCache instance.
nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value for this FileCache instance.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default. If None, use the default translators for this FileCache instance.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator. If None, use the default translators for this FileCache instance.
- Return type:
FCPath
- delete_cache()[source]
Delete all files stored in the cache including the cache directory.
Notes
It is permissible to call delete_cache() more than once. It is also permissible to call delete_cache(), then perform more operations that place files in the cache, then call delete_cache() again.
- Return type:
None
- class filecache.file_cache_path.FCPath(*paths, filecache=None, anonymous=None, lock_timeout=None, nthreads=None, url_to_url=None, url_to_path=None, copy_from=None)[source]
Bases: object
Rewrite of the Python pathlib.Path class that supports URLs and FileCache.
This class provides a simpler way to abstract away remote access in a FileCache by emulating the Python pathlib.Path class. At the same time, it can collect common parameters (anonymous, lock_timeout, nthreads) into a single location so that they do not have to be specified on every method call.
- Parameters:
- __init__(*paths, filecache=None, anonymous=None, lock_timeout=None, nthreads=None, url_to_url=None, url_to_path=None, copy_from=None)[source]
Initialization for the FCPath class.
- Parameters:
paths (str | Path | FCPath | None) – The path(s). These may be absolute or relative paths. They are joined together to form a final path. File operations can only be performed on absolute paths.
filecache (Optional[FileCache]) – The FileCache in which to store files retrieved from this path. If not specified, the default global FileCache will be used.
anonymous (Optional[bool]) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. If None, use the default setting for the associated FileCache instance.
lock_timeout (Optional[int]) – How long to wait, in seconds, if another process is marked as retrieving the file before raising an exception. 0 means to not wait at all. A negative value means to never time out. None means to use the default value for the associated FileCache instance.
nthreads (Optional[int]) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value for the associated FileCache instance.
url_to_url (Optional[UrlToUrlFuncOrSeqType]) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default. If None, use the default translators for the associated FileCache instance.
url_to_path (Optional[UrlToPathFuncOrSeqType]) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator. If None, use the default translators for the associated FileCache instance.
copy_from (Optional[FCPath]) – An FCPath instance to copy internal parameters (file_cache, anonymous, lock_timeout, nthreads, url_to_url, and url_to_path) from. If specified, any values for these parameters in this constructor are ignored. Used internally and should not be used by external programmers.
filecache (Optional['FileCache'])
- property path: str
Return this path as a string.
- as_posix()[source]
Return this FCPath as a POSIX path. This is a str using only forward slashes.
Notes
Because URLs are not really supported in POSIX format, we just return the URL as-is, including any scheme and remote.
- Returns:
This path as a POSIX path.
- Return type:
str
- property drive: str
The drive associated with this FCPath.
Notes
Examples:
For a Windows path: ‘’ or ‘C:’
For a UNC share: ‘//host/share’
For a cloud resource: ‘gs://bucket’
- property root: str
The root of this FCPath; ‘/’ if absolute, ‘’ otherwise.
- property anchor: str
The anchor of this FCPath, which is drive + root.
- property suffix: str
The final component’s last suffix, if any, including the leading period.
- property suffixes: list[str]
A list of the final component’s suffixes, including the leading periods.
- property stem: str
The final path component, minus its last suffix.
- with_name(name)[source]
Return a new FCPath with the filename changed.
- Parameters:
name (str) – The new filename to replace the final path component with.
- Returns:
A new FCPath with the final component replaced. The new FCPath will have the same parameters (filecache, etc.) as the source FCPath.
- Return type:
FCPath
- with_stem(stem)[source]
Return a new FCPath with the stem (the filename minus the suffix) changed.
- Parameters:
stem (str) – The new stem.
- Returns:
A new FCPath with the final component’s stem replaced. The new FCPath will have the same parameters (filecache, etc.) as the source FCPath.
- Return type:
FCPath
- with_suffix(suffix)[source]
Return a new FCPath with the file suffix changed.
If the path has no suffix, add the given suffix. If the given suffix is an empty string, remove the suffix from the path.
- Parameters:
suffix (str) – The new suffix to use.
- Returns:
A new FCPath with the final component’s suffix replaced. The new FCPath will have the same parameters (filecache, etc.) as the source FCPath.
- Return type:
FCPath
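Since FCPath is described as emulating pathlib.Path, the suffix, suffixes, stem, with_name, with_stem, and with_suffix semantics can be previewed with the standard library. The sketch below uses pathlib.PurePosixPath rather than FCPath itself, on the assumption that FCPath behaves analogously for the final path component; the file name is made up.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/data/archive.tar.gz")

suffix = p.suffix        # last suffix, including the leading period
suffixes = p.suffixes    # every suffix of the final component
stem = p.stem            # final component minus its last suffix

renamed = p.with_name("notes.txt")    # replace the whole final component
restemmed = p.with_stem("backup")     # replace the stem, keep the last suffix
resuffixed = p.with_suffix(".zip")    # replace only the last suffix
stripped = p.with_suffix("")          # an empty suffix removes the last suffix
```

Note that only the last suffix is affected by with_stem and with_suffix, so a multi-part extension such as .tar.gz is not treated as a single unit.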
- property parts: tuple[str, ...]
An object providing sequence-like access to the components in the path.
- __rtruediv__(other)[source]
Combine an additional path with this path.
- Parameters:
other (str | Path | FCPath) – The path to join with this path.
- Returns:
A new FCPath that is a combination of the other path and this path. The new FCPath will have the same parameters (filecache, etc.) as the other path if the other path is an FCPath; otherwise it will have the same parameters as the current FCPath.
- Return type:
FCPath
- splitpath(search_dir)[source]
Split the path into a list of FCPaths at each occurrence of search_dir.
- Parameters:
search_dir (str) – The directory to search for.
- Returns:
A tuple of FCPaths, each of which is a segment of the path between instances of search_dir, not including the search_dir itself.
- Return type:
tuple[FCPath, …]
- property name: str
The final component of the path.
- property parent: FCPath
The logical parent of the path.
The new FCPath will have the same parameters (filecache, etc.) as the original path.
- match(path_pattern)[source]
Return True if this path matches the given pattern.
If the pattern is relative, matching is done from the right; otherwise, the entire path is matched. The recursive wildcard ** is not supported by this method (it just acts like *).
See pathlib.Path.match for full documentation.
- Parameters:
path_pattern (str | Path | FCPath)
- Return type:
bool
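The difference between relative and absolute patterns is easiest to see with concrete paths. Because this method defers to pathlib.Path.match for its documentation, the sketch below uses pathlib.PurePosixPath as a stand-in, on the assumption that FCPath.match follows the same rules; the paths are made up.

```python
from pathlib import PurePosixPath

# Relative pattern: matched from the right, so leading components are ignored.
relative_hit = PurePosixPath("/a/b/c.py").match("b/*.py")
relative_miss = PurePosixPath("/a/b/c.py").match("a/*.py")

# Absolute pattern: the entire path must match.
absolute_hit = PurePosixPath("/a.py").match("/*.py")
absolute_miss = PurePosixPath("a/b.py").match("/*.py")
```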
- full_match(pattern)[source]
Return True if this path matches the given glob-style pattern.
The pattern is matched against the entire path.
See pathlib.Path.full_match for full documentation.
- Parameters:
pattern (str | Path | FCPath)
- Return type:
bool
- get_local_path(sub_path=None, *, create_parents=True, url_to_url=None, url_to_path=None)[source]
Return the local path for the given sub_path relative to this path.
- Parameters:
sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the file relative to this path. If not specified, this path is used. If sub_path is a list or tuple, all paths are processed. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.
create_parents (bool) – If True, create all parent directories. This is useful when getting the local path of a file that will be uploaded.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default. If None, use the default translators for the associated FileCache instance.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator. If None, use the default value given when this FCPath was created.
- Returns:
The Path (or list of Paths) of the URL (possibly as mapped by the url_to_url translators) in the cache directory, or as specified by the url_to_path translators. The files do not have to exist because a Path could be used for writing a file to upload. To facilitate this, a side effect of this call (if create_parents is True) is that the complete parent directory structure will be created for each returned Path.
- Return type:
Path | list[Path]
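The default layout and the create_parents side effect can be pictured with a small stand-alone sketch. The function sketch_default_path below is hypothetical: it reproduces only the one documented example (gs://bucket becomes gs_bucket) and assumes the general rule is scheme, underscore, remote, which the source does not confirm. Its cache_dir argument corresponds to <cache_dir>/<cache_name> in the documented hierarchy.

```python
import tempfile
from pathlib import Path


def sketch_default_path(cache_dir: Path, scheme: str, remote: str, path: str) -> Path:
    """Assumed mapping: <cache_dir>/<scheme>_<remote>/<path>."""
    source = f"{scheme}_{remote}"  # assumption based on the gs_bucket example
    return cache_dir / source / path.lstrip("/")


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "_filecache_global"
    local = sketch_default_path(cache_dir, "gs", "bucket", "dir/file.txt")

    # What create_parents=True implies: the directories exist, the file need not.
    local.parent.mkdir(parents=True, exist_ok=True)
    parent_exists = local.parent.is_dir()
    file_exists = local.exists()
    relative = local.relative_to(cache_dir).as_posix()
```

Creating the parent directories up front is what makes the returned Path immediately usable for writing a file that will later be uploaded.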
- exists(sub_path=None, *, bypass_cache=False, nthreads=None, url_to_url=None, url_to_path=None)[source]
Check if a file exists without downloading it.
- Parameters:
sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the file relative to this path. If not specified, this path is used. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.
bypass_cache (bool) – If False, check for the file first in the local cache, and if not found there then on the remote server. If True, only check on the remote server.
nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value given when this FCPath was created.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default. If None, use the default translators for the associated FileCache instance.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it has the ability to access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator. If None, use the default value given when this FCPath was created.
- Returns:
True if the file exists. Note that it is possible that a file could exist and still not be downloadable due to permissions. False if the file does not exist. This includes bad bucket or webserver names, lack of permission to examine a bucket’s contents, etc.
- Return type:
bool | list[bool]
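The five-argument url_to_path signature described above can be illustrated with a small sketch. Everything named here is hypothetical: the translator flatten_my_bucket, the bucket my-bucket, and the flat directory. The helper resolve_local_path only mimics the documented rules (None falls through to the default, a relative result is appended to cache_dir, an absolute result is used directly). It is not the library's implementation, and the default layout it uses is an assumption based on the description.

```python
from pathlib import Path
from typing import Optional


def flatten_my_bucket(scheme: str, remote: str, path: str,
                      cache_dir: Path, cache_subdir: str) -> Optional[Path]:
    """Hypothetical translator: keep files from gs://my-bucket in one flat directory."""
    if scheme == "gs" and remote == "my-bucket":
        # A relative result is documented to be appended to cache_dir.
        return Path("flat") / Path(path).name
    # Returning None defers to the next translator or to the default layout.
    return None


def resolve_local_path(result: Optional[Path], cache_dir: Path,
                       cache_subdir: str, path: str) -> Path:
    """Mimic the documented rule for combining a translator result with cache_dir."""
    if result is None:
        # Assumed default layout: <cache_dir>/<cache_subdir>/<path>.
        return cache_dir / cache_subdir / path
    if result.is_absolute():
        return result
    return cache_dir / result


cache_dir = Path("/tmp/cache/_filecache_global")
translated = flatten_my_bucket("gs", "my-bucket", "a/b/c.txt", cache_dir, "gs_my-bucket")
local_path = resolve_local_path(translated, cache_dir, "gs_my-bucket", "a/b/c.txt")
```

A translator like this would be passed as url_to_path=flatten_my_bucket, or as one element of a list of translators.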
- modification_time(sub_path=None, *, bypass_cache=False, nthreads=None, exception_on_fail=True, url_to_url=None)[source]
Get the modification time of a remote file as a Unix timestamp.
- Parameters:
sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the file relative to this path. If not specified, this path is used. If sub_path is a list or tuple, all URLs are checked. This may be more efficient because files can be checked in parallel. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.
bypass_cache (bool) – If False, retrieve the modification time for the file first from the metadata cache, if enabled, and if not found there then from the remote server. If True, only retrieve the modification time directly from the remote server.
nthreads (int | None) – The maximum number of threads to use. If None, use the default value given when this FCPath was created.
exception_on_fail (bool) – If True, a FileNotFoundError is raised if any file does not exist. If False, the function returns normally and any failed check is marked with the Exception that caused the failure in place of the returned modification time.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.
If None, use the default translators for the associated FileCache instance.
- Returns:
The modification time as a Unix timestamp if the file exists and the time can be retrieved, None otherwise. If sub_path was a list or tuple, then instead return a list of modification times in order. This always returns the modification time of the file on the remote source, even if there is a local copy. If you want the modification time of the local copy, you can call the normal stat function. If exception_on_fail is False, any modification time may be an Exception if that file does not exist or the modification time cannot be retrieved.
- Raises:
FileNotFoundError – If a file does not exist.
- Return type:
float | None | Exception | list[float | None | Exception]
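The url_to_url translator chain described above (call each translator in order, stop at the first one that returns a URL) can be sketched as follows. The translator http_to_gs, the host data.example.com, and the bucket example-mirror are hypothetical. apply_translators is an illustrative stand-in rather than the library's code; it assumes that path has no leading slash and that the fall-through result is the untranslated URL.

```python
from typing import Callable, Optional, Sequence

UrlToUrl = Callable[[str, str, str], Optional[str]]


def http_to_gs(scheme: str, remote: str, path: str) -> Optional[str]:
    """Hypothetical translator: redirect one web server to a bucket mirror."""
    if scheme == "https" and remote == "data.example.com":
        return f"gs://example-mirror/{path}"
    # None means "no opinion"; the next translator (or the default) decides.
    return None


def apply_translators(translators: Sequence[UrlToUrl],
                      scheme: str, remote: str, path: str) -> str:
    """Call translators in order; the first non-None result wins."""
    for func in translators:
        new_url = func(scheme, remote, path)
        if new_url is not None:
            return new_url
    # Assumed default: the URL is left unchanged.
    return f"{scheme}://{remote}/{path}"


mirrored = apply_translators([http_to_gs], "https", "data.example.com", "a/b.txt")
untouched = apply_translators([http_to_gs], "https", "other.example.org", "a/b.txt")
```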
- is_dir(sub_path=None, *, nthreads=None, exception_on_fail=True, url_to_url=None)[source]
Check if a path represents a directory.
- Parameters:
sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the directory relative to this path. If not specified, this path is used. If sub_path is a list or tuple, all paths are checked. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.
nthreads (int | None) – The maximum number of threads to use for multiple paths.
exception_on_fail (bool) – If True, if any path cannot be checked a FileNotFound exception is raised. If False, the function returns normally and any failed check is marked with the Exception that caused the failure.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.
If None, use the default translators for the associated FileCache instance.
- Returns:
True if the path represents a directory, False otherwise. If sub_path was a list or tuple, then instead return a list of booleans or exceptions in order. If exception_on_fail is False, any result may be an Exception if that path cannot be checked.
- Raises:
FileNotFoundError – If a path cannot be checked.
- Return type:
bool | Exception | list[bool | Exception]
Notes
Unlike os.path.isdir or pathlib.Path.is_dir, this method raises an exception if the URL does not exist instead of returning False. This is so that remote connection errors are not masked by the return value.
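To make the contrast in this note concrete, the sketch below compares the standard pathlib behavior (a missing path yields False) with a stricter check that raises instead. The function strict_is_dir is a hypothetical local-only stand-in written for this illustration; it is not part of the library.

```python
import tempfile
from pathlib import Path


def strict_is_dir(path: Path) -> bool:
    """Raise for a missing path instead of silently returning False."""
    if not path.exists():
        raise FileNotFoundError(str(path))
    return path.is_dir()


with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "missing"
    lenient_result = missing.is_dir()   # pathlib reports False for a missing path
    try:
        strict_is_dir(missing)
        strict_raised = False
    except FileNotFoundError:
        strict_raised = True            # the strict variant surfaces the problem
    dir_result = strict_is_dir(Path(tmp))
```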
- retrieve(sub_path=None, *, lock_timeout=None, nthreads=None, exception_on_fail=True, url_to_url=None, url_to_path=None)[source]
Retrieve file(s) from the given sub_path and store them in the file cache.
- Parameters:
sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the file relative to this path. If not specified, this path is used. If sub_path is a list or tuple, the complete list of files is retrieved. Depending on the storage location, this may be more efficient because files can be downloaded in parallel. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.
nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value given when this FCPath was created.
lock_timeout (int | None) – How long to wait, in seconds, if another process is marked as retrieving the file before raising an exception. 0 means to not wait at all. A negative value means to never time out. None means to use the default value given when this FCPath was created.
exception_on_fail (bool) – If True, a FileNotFoundError is raised if any file does not exist or a download fails, and a TimeoutError is raised if any attempt to acquire a lock or wait for another process to download a file fails. If False, the function returns normally and any failed download is marked with the Exception that caused the failure in place of the returned Path.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.
If None, use the default translators for the associated FileCache instance.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it can access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.
If None, use the default value given when this FCPath was created.
- Returns:
The Path of the filename in the temporary directory (or the original absolute path if local). If sub_path was a list or tuple of paths, then instead return a list of Paths of the filenames in the temporary directory (or the original absolute path if local). If exception_on_fail is False, any Path may be an Exception if that file does not exist or the download failed or a timeout occurred.
- Raises:
FileNotFoundError – If a file does not exist or could not be downloaded, and exception_on_fail is True.
TimeoutError – If we could not acquire the lock to allow downloading of a file within the given timeout or, for a multi-file download, if we timed out waiting for other processes to download locked files, and exception_on_fail is True.
- Return type:
Path | Exception | list[Path | Exception]
Notes
File download is normally an atomic operation; a program will never see a partially-downloaded file, and if a download is interrupted there will be no file present. However, when downloading multiple files at the same time, as many files as possible are downloaded before an exception is raised.
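When exception_on_fail is False, a multi-file call returns a list in which each entry is either a Path or the Exception that caused the failure. One way to consume such a list is sketched below. The helper split_results and the sample URLs and results are hypothetical stand-ins used only to show the pattern; no download takes place.

```python
from pathlib import Path
from typing import Dict, List, Sequence, Tuple, Union

Result = Union[Path, Exception]


def split_results(urls: Sequence[str],
                  results: Sequence[Result]) -> Tuple[Dict[str, Path], Dict[str, Exception]]:
    """Separate successful local paths from per-file failures, keyed by URL."""
    succeeded: Dict[str, Path] = {}
    failed: Dict[str, Exception] = {}
    for url, result in zip(urls, results):
        if isinstance(result, Exception):
            failed[url] = result
        else:
            succeeded[url] = result
    return succeeded, failed


# Hypothetical data shaped like the documented return value.
urls: List[str] = ["gs://bucket/a.txt", "gs://bucket/missing.txt"]
results: List[Result] = [Path("/tmp/cache/gs_bucket/a.txt"),
                         FileNotFoundError("gs://bucket/missing.txt")]
succeeded, failed = split_results(urls, results)
```

The same pattern applies to the other methods in this section that accept exception_on_fail.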
- upload(sub_path=None, *, nthreads=None, exception_on_fail=True, url_to_url=None, url_to_path=None)[source]
Upload file(s) from the file cache to the storage location(s).
- Parameters:
sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the file relative to this path. If not specified, this path is used. If sub_path is a list or tuple, the complete list of files is uploaded. This may be more efficient because files can be uploaded in parallel. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.
nthreads (int | None) – The maximum number of threads to use when doing multiple-file retrieval or upload. If None, use the default value given when this FileCache was created.
exception_on_fail (bool) – If True, an exception is raised if any file does not exist or an upload fails. If False, the function returns normally and any failed upload is marked with the Exception that caused the failure in place of the returned path.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.
If None, use the default translators for the associated FileCache instance.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it can access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.
If None, use the default value given when this FCPath was created.
- Returns:
The Path of the filename in the temporary directory (or the original absolute path if local). If sub_path was a list or tuple of paths, then instead return a list of Paths of the filenames in the temporary directory (or the original absolute path if local). If exception_on_fail is False, any Path may be an Exception if that file does not exist or the upload failed.
- Raises:
FileNotFoundError – If a file to upload does not exist or the upload failed, and exception_on_fail is True.
- Return type:
Path | Exception | list[Path | Exception]
- open(mode='r', *args, url_to_url=None, url_to_path=None, **kwargs)[source]
Retrieve+open or open+upload a file as a context manager.
If mode is a read mode (like 'r' or 'rb') then the file will be first retrieved by calling retrieve() and then opened. If the mode is a write mode (like 'w' or 'wb') then the file will be first opened for write, and when this context manager is exited the file will be uploaded.
- Parameters:
mode (str) – The mode string as you would specify to Python’s open() function.
*args (Any) – Any additional arguments are passed to the Python open() function.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.
If None, use the default translators for the associated FileCache instance.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it can access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.
If None, use the default value given when this FCPath was created.
**kwargs (Any) – Any additional arguments are passed to the Python open() function.
- Returns:
The same object as would be returned by the normal open() function.
- Return type:
IO object
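The write-mode behavior described above (open for write, then upload when the context manager exits) follows a common pattern that can be sketched with contextlib. In this illustration the function open_then_upload is hypothetical, and a plain file copy stands in for the upload; real remote storage is not involved.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def open_then_upload(local_path: Path, remote_path: Path, mode: str = "w"):
    """Open local_path for writing; copy it to remote_path after a clean exit."""
    local_path.parent.mkdir(parents=True, exist_ok=True)
    with open(local_path, mode) as fp:
        yield fp
    # The file is closed at this point, so its contents are complete.
    # If the caller's block raised, this line is never reached.
    remote_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(local_path, remote_path)  # stand-in for the real upload


with tempfile.TemporaryDirectory() as tmp:
    local = Path(tmp) / "cache" / "out.txt"
    remote = Path(tmp) / "remote" / "out.txt"
    with open_then_upload(local, remote) as fp:
        fp.write("hello")
        uploaded_early = remote.exists()   # nothing is uploaded while the file is open
    uploaded_text = remote.read_text()
```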
- unlink(sub_path=None, *, missing_ok=False, nthreads=None, exception_on_fail=True, url_to_url=None, url_to_path=None)[source]
Remove this file or link.
- Parameters:
sub_path (str | Path | list[str | Path] | tuple[str | Path, ...] | None) – The path of the file relative to this path. If not specified, this path is used. If sub_path is a list or tuple, the complete list of files is removed. Depending on the storage location, this may be more efficient because files can be removed in parallel. If the resulting derived path is not absolute, it is assumed to be a relative local path and is converted to an absolute path by expanding usernames and resolving links.
missing_ok (bool) – True to ignore attempting to unlink a file that doesn’t exist.
nthreads (int | None) – The maximum number of threads to use when removing multiple files. If None, use the default value given when this FileCache was created.
exception_on_fail (bool) – If True, an exception is raised if any file does not exist or a removal fails. If False, the function returns normally and any failed removal is marked with the Exception that caused the failure in place of the returned path.
url_to_url (Callable[[str, str, str], str | None] | list[Callable[[str, str, str], str | None]] | tuple[Callable[[str, str, str], str | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into URLs. A user-specified translator function takes three arguments:
func(scheme: str, remote: str, path: str) -> str
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, and path is the rest of the URL. If the translator wants to override the default translation, it can return a new complete URL as a string. Otherwise, it returns None. If more than one translator is specified, they are called in order until one returns a URL, or it falls through to the default.
If None, use the default translators for the associated FileCache instance.
url_to_path (Callable[[str, str, str, Path, str], str | Path | None] | list[Callable[[str, str, str, Path, str], str | Path | None]] | tuple[Callable[[str, str, str, Path, str], str | Path | None], ...] | None) –
The function (or list of functions) that is used to translate URLs into local paths. By default, FileCache uses a directory hierarchy consisting of <cache_dir>/<cache_name>/<source>/<path>, where source is the URL prefix converted to a filesystem-friendly format (e.g. gs://bucket is converted to gs_bucket). A user-specified translator function takes five arguments:
func(scheme: str, remote: str, path: str, cache_dir: Path, cache_subdir: str) -> str | Path
where scheme is the URL scheme (like "gs" or "file"), remote is the name of the bucket or webserver or the empty string for a local file, path is the rest of the URL, cache_dir is the top-level directory of the cache (<cache_dir>/<cache_name>), and cache_subdir is the subdirectory specific to this scheme and remote. If the translator wants to override the default translation, it can return a Path. Otherwise, it returns None. If the returned Path is relative, it will be appended to cache_dir; if it is absolute, it will be used directly (be very careful with this, as it can access files outside of the cache directory). If more than one translator is specified, they are called in order until one returns a Path, or it falls through to the default. Note that url_to_path operates on the original URL, not the URL generated by a url_to_url translator.
If None, use the default value given when this FCPath was created.
- Return type:
str | Exception | list[str | Exception]
See pathlib.Path.unlink for full documentation.
- property download_counter: int
The number of actual file downloads that have taken place.
- property upload_counter: int
The number of actual file uploads that have taken place.
- read_bytes(**kwargs)[source]
Download and open the file in bytes mode, read it, and close the file.
Any additional arguments are passed to the Python open() function.
- Parameters:
kwargs (Any)
- Return type:
bytearray
- read_text(**kwargs)[source]
Download and open the file in text mode, read it, and close the file.
Any additional arguments are passed to the Python open() function.
- Parameters:
kwargs (Any)
- Return type:
str
- write_bytes(data, **kwargs)[source]
Open the file in bytes mode, write to it, and close and upload the file.
Any additional arguments are passed to the Python open() function.
- Parameters:
data (Any)
kwargs (Any)
- Return type:
int
- write_text(data, **kwargs)[source]
Open the file in text mode, write to it, and close and upload the file.
Any additional arguments are passed to the Python open() function.
- Parameters:
data (Any)
kwargs (Any)
- Return type:
int
- iterdir()[source]
Yield FCPath objects of the current path’s directory contents.
The children are yielded in arbitrary order, and the special entries ‘.’ and ‘..’ are not included.
- Return type:
Iterator[FCPath]
- iterdir_metadata()[source]
Yield FCPath objects of the current directory’s contents, with metadata.
- Yields:
All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:
is_dir: True if the returned name is a directory, False if it is a file.
mtime: The last modification time of the file as a float.
size: The approximate size of the file in bytes.
If the metadata can not be retrieved, None is returned for the metadata.
- Return type:
Iterator[tuple[FCPath, dict[str, Any] | None]]
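Because each yielded item is a (path, metadata) tuple and metadata may be None, consumers need to guard against the missing case. The sketch below totals file sizes over such tuples. The helper total_file_size and the sample entries are hypothetical, and plain strings stand in for the path objects.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

Entry = Tuple[str, Optional[Dict[str, Any]]]


def total_file_size(entries: Iterable[Entry]) -> int:
    """Sum the sizes of regular files, skipping directories and unknown metadata."""
    total = 0
    for _path, metadata in entries:
        if metadata is None:
            # Metadata could not be retrieved for this entry.
            continue
        if metadata["is_dir"]:
            continue
        total += metadata["size"]
    return total


# Hypothetical entries shaped like the documented (path, metadata) tuples.
sample_entries = [
    ("data/a.txt", {"is_dir": False, "mtime": 1700000000.0, "size": 120}),
    ("data/sub", {"is_dir": True, "mtime": 1700000001.0, "size": 0}),
    ("data/b.bin", None),
    ("data/c.txt", {"is_dir": False, "mtime": 1700000002.0, "size": 30}),
]
total = total_file_size(sample_entries)
```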
- glob(pattern)[source]
Yield all existing files and directories matching the given relative pattern.
Notes
If the FCPath is local, then the normal pathlib.Path.glob() method is called. If the pattern is only **, this function had different behavior before Python 3.13 (only directories returned) and in Python 3.13 and later (both files and directories are returned). In contrast, when the FCPath is remote, we always return all files and directories. To be safe, do not use ** but instead always use **/*.
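The recommendation to use **/* rather than a bare ** can be checked against the standard library. This sketch uses pathlib.Path directly on a temporary directory; the helper list_tree is a hypothetical name introduced only for the illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def list_tree(root: Path) -> List[str]:
    """Return every file and directory under root, relative to root, sorted."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob("**/*"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "sub" / "a.txt").write_text("a")
    (root / "b.txt").write_text("b")
    tree = list_tree(root)   # contains both the directory and the files
```

With the **/* pattern the listing includes files and directories, so it does not depend on the interpreter version in the way a bare ** does.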
- rglob(pattern)[source]
Yield all existing files and directories matching the given relative pattern.
This is like calling FCPath.glob() with **/ added in front of the pattern.
Notes
If the FCPath is local, then the normal pathlib.Path.glob() method is called. If the pattern is only **, this function had different behavior before Python 3.13 (only directories returned) and in Python 3.13 and later (both files and directories are returned). In contrast, when the FCPath is remote, we always return all files and directories. To be safe, do not use ** but instead always use **/*.
- walk(top_down=True)[source]
Walk the directory tree from this directory.
See pathlib.Path.walk for full documentation.
- Parameters:
top_down (bool)
- Return type:
Iterator[tuple[FCPath, list[str], list[str]]]
- rename(target)[source]
Rename this path to the target path.
Both the source and target paths must be absolute, and must be in the same location (e.g. both local files or both in the same GS bucket). Because cloud platforms do not support renaming of files, this is accomplished by downloading the source file, uploading it with the new name, and deleting the original version. If the target already exists, it will be overwritten. If the downloading or uploading fails, the copy in the local cache is removed to eliminate ambiguity. If there is only a copy in the local cache and the source path does not exist on the remote, the rename will still succeed by uploading a copy to the target path.
- replace(target)[source]
Rename this path to the target path, overwriting if that path exists.
Both the source and target paths must be absolute, and must be in the same location (e.g. both local files or both in the same GS bucket). Because cloud platforms do not support renaming of files, this is accomplished by downloading the source file, uploading it with the new name, and deleting the original version. If the target already exists, it will be overwritten.
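The download, upload, and delete sequence described for rename() and replace() can be modeled with a dictionary standing in for a bucket. The function emulate_rename and the sample keys are hypothetical, and the sketch leaves out the local-cache cleanup that the rename() description mentions.

```python
from typing import Dict


def emulate_rename(store: Dict[str, bytes], source: str, target: str) -> None:
    """Rename by copy-then-delete, overwriting any existing target."""
    if source == target:
        return                      # nothing to do; avoids deleting the only copy
    data = store[source]            # "download" (KeyError if the source is missing)
    store[target] = data            # "upload" under the new name, overwriting
    del store[source]               # remove the original


bucket: Dict[str, bytes] = {"old/name.txt": b"payload", "new/name.txt": b"stale"}
emulate_rename(bucket, "old/name.txt", "new/name.txt")
```

Since the operation is three separate steps rather than one atomic rename, an interruption part-way through can leave both names present.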
- relative_to(other, *, walk_up=False)[source]
Return the relative path to another path.
See pathlib.Path.relative_to for full documentation.
- is_relative_to(other)[source]
Return True if the path is relative to another path.
See pathlib.Path.is_relative_to for full documentation.
- Parameters:
other (str | Path | FCPath)
- Return type:
bool
- is_reserved()[source]
True if the path contains a special reserved name.
See pathlib.Path.is_reserved for full documentation.
- Return type:
bool
- stat(*, follow_symlinks=True)[source]
Return the result of the stat() system call on this path.
Only valid for local files. See pathlib.Path.stat for full documentation.
- Parameters:
follow_symlinks (bool)
- Return type:
Any
- lstat()[source]
Like stat(), except if the path points to a symlink, the symlink’s status information is returned, rather than its target’s.
Only valid for local files. See pathlib.Path.lstat for full documentation.
- Return type:
Any
- is_mount()[source]
Check if this path is a mount point.
Only valid for local directories. See pathlib.Path.is_mount for full documentation.
- Return type:
bool
- is_symlink()[source]
Whether this path is a symbolic link.
Only valid for local files. See pathlib.Path.is_symlink for full documentation.
- Return type:
bool
- is_junction()[source]
Whether this path is a junction.
Only valid for local files. See pathlib.Path.is_junction for full documentation.
- Return type:
bool
- is_block_device()[source]
Whether this path is a block device.
Only valid for local files. See pathlib.Path.is_block_device for full documentation.
- Return type:
bool
- is_char_device()[source]
Whether this path is a character device.
Only valid for local files. See pathlib.Path.is_char_device for full documentation.
- Return type:
bool
- is_fifo()[source]
Whether this path is a FIFO.
Only valid for local files. See pathlib.Path.is_fifo for full documentation.
- Return type:
bool
- is_socket()[source]
Whether this path is a socket.
Only valid for local files. See pathlib.Path.is_socket for full documentation.
- Return type:
bool
- samefile(other_path)[source]
True if this path and the given path refer to the same file.
Unlike the pathlib.Path.samefile version, this function only looks to see if the URLs are identical. Thus symlinks, hardlinks, etc. are ignored.
- Parameters:
other_path (str | Path | FCPath)
- Return type:
bool
- absolute()[source]
Return an absolute version of this path.
For non-local paths, this just returns the URL. For local paths, it does the same operations as pathlib.Path.absolute. See pathlib.Path.absolute for full documentation.
- Return type:
FCPath
- classmethod cwd()[source]
Return a new FCPath pointing to the current working directory.
See pathlib.Path.cwd for full documentation.
- Return type:
FCPath
- expanduser()[source]
Return a new FCPath with expanded ~ and ~user constructs.
See pathlib.Path.expanduser for full documentation.
- Return type:
FCPath
- expandvars()[source]
Return a new FCPath with expanded environment variables.
See os.path.expandvars for full documentation.
- Return type:
FCPath
- classmethod home()[source]
Return a new FCPath pointing to expanduser(‘~’).
See pathlib.Path.home for full documentation.
- Return type:
FCPath
- readlink()[source]
Return the FCPath to which the symbolic link points.
Only valid for local files. See pathlib.Path.readlink for full documentation.
- Return type:
FCPath
- resolve(strict=False)[source]
Return the absolute path with resolved symlinks.
See pathlib.Path.resolve for full documentation.
- Parameters:
strict (bool)
- Return type:
FCPath
- symlink_to(target, target_is_directory=False)[source]
Make this path a symlink pointing to the target path.
Only valid for local files. See pathlib.Path.symlink_to for full documentation.
- Parameters:
target (str)
target_is_directory (bool)
- Return type:
None
- hardlink_to(target)[source]
Make this path a hard link pointing to the same file as target.
Only valid for local files. See pathlib.Path.hardlink_to for full documentation.
- Parameters:
target (str)
- Return type:
None
- touch(mode=438, exist_ok=True)[source]
Create this file, if it doesn’t exist.
See pathlib.Path.touch for full documentation.
- Parameters:
mode (int)
exist_ok (bool)
- Return type:
None
- mkdir(mode=511, parents=False, exist_ok=False)[source]
Create a new directory at this given path.
Only valid for local directories. See pathlib.Path.mkdir for full documentation.
- Parameters:
mode (int)
parents (bool)
exist_ok (bool)
- Return type:
None
- chmod(mode, *, follow_symlinks=True)[source]
Change the permissions of the path, like os.chmod().
Only valid for local files. See pathlib.Path.chmod for full documentation.
- Parameters:
mode (int)
follow_symlinks (bool)
- Return type:
None
- lchmod(mode)[source]
Like chmod(), except if the path points to a symlink, the symlink’s permissions are changed, rather than its target’s.
Only valid for local files. See pathlib.Path.lchmod for full documentation.
- Parameters:
mode (int)
- Return type:
None
- rmdir()[source]
Remove this directory. The directory must be empty.
Only valid for local directories. See pathlib.Path.rmdir for full documentation.
- Return type:
None
- owner()[source]
Return the login name of the file owner.
Only valid for local files. See pathlib.Path.owner for full documentation.
- Return type:
str
- group()[source]
Return the group name of the file gid.
Only valid for local files. See pathlib.Path.group for full documentation.
- Return type:
str
- class filecache.file_cache_source.FileCacheSource(scheme, remote, *, anonymous=False)[source]
Bases: ABC
Superclass for all remote file source classes. Do not use directly.
The FileCacheSource subclasses (FileCacheSourceFile, FileCacheSourceHTTP, FileCacheSourceGS, and FileCacheSourceS3) provide direct access to local and remote sources, bypassing the caching mechanism of FileCache.
- Parameters:
scheme (str)
remote (str)
anonymous (bool)
- __init__(scheme, remote, *, anonymous=False)[source]
Initialization for the FileCacheSource superclass.
Note
Do not instantiate FileCacheSource directly. Instead use one of the subclasses (FileCacheSourceFile, FileCacheSourceHTTP, FileCacheSourceGS, or FileCacheSourceS3).
- Parameters:
scheme (str) – The scheme of the source, such as "gs" or "file".
remote (str) – The bucket or remote server name. Must be an empty string for file.
anonymous (bool) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment.
- abstractmethod classmethod schemes()[source]
The URL schemes supported by this class.
- Return type:
tuple[str, …]
- classmethod primary_scheme()[source]
The primary URL scheme supported by this class.
- Return type:
str
- abstractmethod classmethod uses_anonymous()[source]
Whether this class has the concept of anonymous accesses.
- Return type:
bool
- exists_multi(sub_paths, *, nthreads=8)[source]
Check if multiple files exist using threads without downloading them.
- Parameters:
sub_paths (Sequence[str]) – The path of the files relative to the source prefix to check for existence.
nthreads (int) – The maximum number of threads to use.
- Returns:
For each entry, True if the file exists. Note that it is possible that a file could exist and still not be accessible due to permissions. False if the file does not exist. This includes bad bucket or webserver names, lack of permission to examine a bucket’s contents, etc.
- Return type:
list[bool]
- abstractmethod modification_time(sub_path)[source]
- Parameters:
sub_path (str)
- Return type:
float | None
- modification_time_multi(sub_paths, *, nthreads=8)[source]
Get the modification time of multiple files as a Unix timestamp.
- Parameters:
sub_paths (Sequence[str]) – The path of the files relative to the source prefix to get the modification time of.
nthreads (int) – The maximum number of threads to use.
- Returns:
For each entry, the modification time as a Unix timestamp if the file exists and the time can be retrieved, or None if no modification time is available. An entry may instead be an Exception if that file does not exist or its modification time cannot be retrieved.
- Return type:
list[float | None | Exception]
- is_dir_multi(sub_paths, *, nthreads=8)[source]
Check if multiple paths represent directories using threads.
- Parameters:
sub_paths (Sequence[str]) – The paths relative to the source prefix to check for directory status.
nthreads (int) – The maximum number of threads to use.
- Returns:
For each entry, True if the path represents a directory, False otherwise, or an Exception if the check failed.
- Return type:
list[bool | Exception]
- abstractmethod retrieve(sub_path, local_path, *, preserve_mtime=False)[source]
- Parameters:
sub_path (str)
local_path (str | Path)
preserve_mtime (bool)
- Return type:
Path
- retrieve_multi(sub_paths, local_paths, *, preserve_mtime=False, nthreads=8)[source]
Retrieve multiple files from the storage location using threads.
- Parameters:
sub_paths (Sequence[str]) – The path of the files to retrieve relative to the source prefix.
local_paths (Sequence[str | Path]) – The paths to the destinations where the downloaded files will be stored.
preserve_mtime (bool) – If True, the modification time of the remote files will be copied to the local files.
nthreads (int) – The maximum number of threads to use.
- Returns:
A list containing the local paths of the retrieved files. If a file failed to download, the entry is the Exception that caused the failure. This list is in the same order and has the same length as local_paths.
- Return type:
list[Path | BaseException]
Notes
All parent directories in all local_paths are created even if a file download fails.
The download of each file is an atomic operation. However, even if some files have download failures, all other files will be downloaded.
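The per-entry result convention shared by the *_multi methods (one result or one Exception per input, in input order) can be sketched with the standard library alone. This is an illustration of the pattern under stated assumptions, not the filecache implementation; run_multi and its arguments are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar, Union

T = TypeVar("T")
R = TypeVar("R")


def run_multi(func: Callable[[T], R],
              items: Sequence[T],
              nthreads: int = 8) -> List[Union[R, Exception]]:
    """Apply func to every item using threads.

    A failure for one item is captured as that item's entry instead of
    aborting the batch, so every other item is still processed.
    """
    def _safe(item: T) -> Union[R, Exception]:
        try:
            return func(item)
        except Exception as exc:  # stored in the result list, not raised
            return exc

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(_safe, items))
```

A caller would then check each entry with isinstance(entry, Exception) before using it as a path.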
- abstractmethod upload(sub_path, local_path, *, preserve_mtime=False)[source]
- Parameters:
sub_path (str)
local_path (str | Path)
preserve_mtime (bool)
- Return type:
Path
- upload_multi(sub_paths, local_paths, *, preserve_mtime=False, nthreads=8)[source]
Upload multiple files to a storage location.
- Parameters:
sub_paths (Sequence[str]) – The path of the destination files relative to the source prefix.
local_paths (Sequence[str | Path]) – The paths of the files to upload.
preserve_mtime (bool) – If True, the modification time of the local files will be copied to the remote files.
nthreads (int) – The maximum number of threads to use.
- Returns:
A list containing the local paths of the uploaded files. If a file failed to upload, the entry is the Exception that caused the failure. This list is in the same order and has the same length as local_paths.
- Return type:
list[Path | BaseException]
- abstractmethod iterdir_metadata(sub_path)[source]
Iterate over the contents of a directory.
- Parameters:
sub_path (str) – The path of the directory relative to the source prefix.
- Yields:
All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:
is_dir: True if the returned name is a directory, False if it is a file.
mtime: The last modification time of the file as a float.
size: The approximate size of the file in bytes.
If the metadata cannot be retrieved, None is returned for the metadata.
- Return type:
Iterator[tuple[str, dict[str, Any] | None]]
- abstractmethod unlink(sub_path, *, missing_ok=False)[source]
Remove the given object.
- Parameters:
sub_path (str) – The path of the file relative to the source prefix to delete.
missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.
- Returns:
The sub_path.
- Raises:
FileNotFoundError – If the file doesn’t exist and missing_ok is False.
- Return type:
str
- unlink_multi(sub_paths, *, missing_ok=False, nthreads=8)[source]
Unlink multiple files in a storage location.
- Parameters:
sub_paths (Sequence[str]) – The path of the destination files relative to the source prefix.
missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.
nthreads (int) – The maximum number of threads to use.
- Returns:
A list containing the paths of the unlinked files. If a file failed to unlink, the entry is the Exception that caused the failure. This list is in the same order and has the same length as sub_paths.
- Return type:
list[str | BaseException]
- class filecache.file_cache_source.FileCacheSourceFile(scheme, remote, *, anonymous=False)[source]
Bases:
FileCacheSource
Class that provides direct access to local files.
This class is unlikely to be directly useful to an external program, as it provides essentially no functionality on top of the standard Python filesystem functions.
- Parameters:
scheme (str)
remote (str)
anonymous (bool)
- __init__(scheme, remote, *, anonymous=False)[source]
Initialization for the FileCacheSourceFile class.
- Parameters:
scheme (str) – The scheme of the source. Must be "file" or "".
remote (str) – The remote server name. Must be "" since UNC shares are not supported.
anonymous (bool) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. Not used for this class.
- classmethod uses_anonymous()[source]
Whether this class has the concept of anonymous accesses.
- Return type:
bool
- exists(sub_path)[source]
Check if a file exists without downloading it.
- Parameters:
sub_path (str | Path) – The absolute path of the local file.
- Returns:
True if the file exists. Note that it is possible that a file could exist and still not be accessible due to permissions.
- Return type:
bool
- modification_time(sub_path)[source]
Get the modification time of a file as a Unix timestamp.
- Parameters:
sub_path (str)
- Return type:
float | None
- is_dir(sub_path)[source]
Check if a file is a directory.
- Parameters:
sub_path (str)
- Return type:
bool
- retrieve(sub_path, local_path, *, preserve_mtime=False)[source]
Retrieve a file from the storage location.
- Parameters:
sub_path (str | Path) – The absolute path of the local file to retrieve.
local_path (str | Path) – The path to the destination where the file will be stored. Must be the same as sub_path.
preserve_mtime (bool) – If True, the modification time of the remote file will be copied to the local file. Not used for local files.
- Returns:
The Path of the filename, which is the same as the sub_path parameter.
- Raises:
ValueError – If sub_path and local_path are not identical.
FileNotFoundError – If the file does not exist.
- Return type:
Path
Notes
This method essentially does nothing except check for the existence of the file.
- upload(sub_path, local_path, *, preserve_mtime=False)[source]
Upload a file from the local filesystem to the storage location.
- Parameters:
sub_path (str | Path) – The absolute path of the destination.
local_path (str | Path) – The absolute path of the local file to upload. Must be the same as sub_path.
preserve_mtime (bool) – If True, the modification time of the local file will be copied to the remote file. Not used for local files.
- Returns:
The Path of the filename, which is the same as the local_path parameter.
- Raises:
ValueError – If sub_path and local_path are not identical.
FileNotFoundError – If the file does not exist.
- Return type:
Path
- iterdir_metadata(sub_path)[source]
Iterate over the contents of a directory.
- Parameters:
sub_path (str) – The absolute path of the directory.
- Yields:
All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:
is_dir: True if the returned name is a directory, False if it is a file.
mtime: The last modification time of the file as a float.
size: The approximate size of the file in bytes.
If the metadata cannot be retrieved, None is returned for the metadata.
- Return type:
Iterator[tuple[str, dict[str, Any] | None]]
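For a local directory, the (path, metadata) tuples described above can be produced with os.scandir. The following is a minimal sketch of that shape, assuming nothing about how FileCacheSourceFile is actually implemented; the function name iterdir_metadata_local is hypothetical.

```python
import os
from typing import Any, Dict, Iterator, Optional, Tuple


def iterdir_metadata_local(
        directory: str) -> Iterator[Tuple[str, Optional[Dict[str, Any]]]]:
    """Yield (path, metadata) for each entry in a local directory.

    The metadata keys mirror the ones documented above: is_dir, mtime, size.
    os.scandir never returns the "." and ".." entries.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                st = entry.stat()
                metadata: Optional[Dict[str, Any]] = {
                    "is_dir": entry.is_dir(),
                    "mtime": st.st_mtime,
                    "size": st.st_size,
                }
            except OSError:
                # Metadata could not be retrieved for this entry.
                metadata = None
            yield entry.path, metadata
```

Because the entries arrive in no particular order, callers that need stable output should sort the results themselves.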
- unlink(sub_path, *, missing_ok=False)[source]
Remove the given object.
- Parameters:
sub_path (str) – The path of the file.
missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.
- Returns:
The sub_path.
- Raises:
FileNotFoundError – If the file doesn’t exist and missing_ok is False.
- Return type:
str
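The missing_ok behavior matches pathlib.Path.unlink. A minimal local sketch, assuming a plain filesystem path; unlink_local is a hypothetical helper, not the library method.

```python
from pathlib import Path


def unlink_local(sub_path: str, *, missing_ok: bool = False) -> str:
    """Remove a local file and return the path that was passed in.

    Raises FileNotFoundError if the file is absent and missing_ok is False.
    """
    Path(sub_path).unlink(missing_ok=missing_ok)
    return sub_path
```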
- class filecache.file_cache_source.FileCacheSourceHTTP(scheme, remote, *, anonymous=False)[source]
Bases:
FileCacheSource
Class that provides access to files stored on a webserver.
- Parameters:
scheme (str)
remote (str)
anonymous (bool)
- __init__(scheme, remote, *, anonymous=False)[source]
Initialization for the FileCacheSourceHTTP class.
- Parameters:
scheme (str) – The scheme of the source. Must be "http" or "https".
remote (str) – The remote server name.
anonymous (bool) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. Not used for this class.
- classmethod uses_anonymous()[source]
Whether this class has the concept of anonymous accesses.
- Return type:
bool
- exists(sub_path)[source]
Check if a file exists without downloading it.
- Parameters:
sub_path (str) – The path of the file on the webserver relative to the source prefix.
- Returns:
True if the file (including the webserver) exists. Note that it is possible that a file could exist and still not be downloadable due to permissions.
- Return type:
bool
- modification_time(sub_path)[source]
Get the modification time of a file as a Unix timestamp.
- Parameters:
sub_path (str) – The path of the file on the webserver relative to the source prefix.
- Returns:
The modification time as a float (Unix timestamp) if the file exists and the time can be retrieved from HTTP headers. If the file exists but no modification time is available, None is returned. If the file does not exist or other errors occur, an Exception is raised.
- Return type:
float | None
Notes
This method uses an HTTP HEAD request to get file metadata without downloading the content. It checks for the Last-Modified header and converts it to a Unix timestamp.
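The header conversion can be sketched with the standard library. This is an illustration only: the helper name mtime_from_headers and the plain mapping used for the headers are assumptions, and the library may parse the header differently.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def mtime_from_headers(headers: Mapping[str, str]) -> Optional[float]:
    """Convert a Last-Modified header value to a Unix timestamp.

    Returns None when the header is absent, mirroring the documented
    "file exists but no modification time is available" case.
    """
    value = headers.get("Last-Modified")
    if value is None:
        return None
    # HTTP dates are expressed in GMT, so the parsed datetime is
    # timezone-aware and timestamp() needs no local-time correction.
    return parsedate_to_datetime(value).timestamp()
```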
- is_dir(sub_path)[source]
Check if a file is a directory.
- Parameters:
sub_path (str) – The path of the directory on the webserver relative to the source prefix.
- Returns:
True if the path represents a directory, False otherwise.
- Return type:
bool
Notes
This method assumes the provided URL is either a file or a directory that will show up as a “fancy index” page. If you provide a URL that is a directory but does not support fancy indexing, it will incorrectly return False. It is also possible to fool this method by giving it a file URL that appears to be a fancy index.
- retrieve(sub_path, local_path, *, preserve_mtime=False)[source]
Retrieve a file from a webserver.
- Parameters:
sub_path (str) – The path of the file to retrieve relative to the source prefix.
local_path (str | Path) – The path to the destination where the downloaded file will be stored.
preserve_mtime (bool) – If True, the modification time of the remote file will be copied to the local file.
- Returns:
The Path where the file was stored (same as local_path).
- Raises:
FileNotFoundError – If the remote file does not exist or the download fails for another reason.
- Return type:
Path
Notes
All parent directories in local_path are created even if the file download fails.
The download is an atomic operation.
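One common way to obtain both properties in the notes above (parent directories created first, file appearing atomically) is to write to a temporary file in the destination directory and then rename it. The sketch below assumes this technique for illustration; the source does not say how filecache achieves atomicity, and atomic_write is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path
from typing import Union


def atomic_write(local_path: Union[str, Path], data: bytes) -> Path:
    """Write data to local_path so readers never see a partial file."""
    local_path = Path(local_path)
    # Parent directories are created before any data is written, so they
    # remain even if the write below fails.
    local_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=local_path.parent,
                                    prefix=local_path.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
        # os.replace is atomic when source and destination are on the
        # same filesystem, which holds because both share a directory.
        os.replace(tmp_name, local_path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return local_path
```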
- upload(sub_path, local_path, *, preserve_mtime=False)[source]
Upload a local file to a webserver. Not implemented.
- Parameters:
sub_path (str)
local_path (str | Path)
preserve_mtime (bool)
- Return type:
Path
- iterdir_metadata(sub_path)[source]
Iterate over the contents of a directory.
- Parameters:
sub_path (str) – The path of the directory on the webserver relative to the source prefix.
- Yields:
All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:
is_dir: True if the returned name is a directory, False if it is a file.
mtime: The last modification time of the file as a float.
size: The approximate size of the file in bytes.
- Raises:
FileNotFoundError – If the URL does not point to a valid file or directory page.
ConnectionError – If the URL could not be accessed.
- Return type:
Iterator[tuple[str, dict[str, Any]]]
Notes
This method relies on the webserver returning a “fancy index” page in some recognizable format. If server-side indexing is disabled, this method will not work.
We have no way of knowing what the timezone of the webserver is, and the fancy index page does not include the timezone. As a result we assume all times are in UTC.
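The UTC assumption can be illustrated as follows. The timestamp layout "%Y-%m-%d %H:%M" is an assumption chosen for the example (index pages vary by server), and index_time_to_timestamp is a hypothetical helper.

```python
from datetime import datetime, timezone


def index_time_to_timestamp(text: str, fmt: str = "%Y-%m-%d %H:%M") -> float:
    """Interpret a timezone-less index-page time as UTC.

    The parsed value is naive; attaching UTC explicitly avoids having
    timestamp() interpret it in the local timezone of the client.
    """
    naive = datetime.strptime(text, fmt)
    return naive.replace(tzinfo=timezone.utc).timestamp()
```

If the webserver actually reports local time, the resulting mtime values will be off by that server's UTC offset.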
- unlink(sub_path, *, missing_ok=False)[source]
Remove the given object.
- Parameters:
sub_path (str) – The path of the file on the webserver relative to the source prefix to delete.
missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.
- Returns:
The sub_path.
- Raises:
FileNotFoundError – If the file doesn’t exist and missing_ok is False.
- Return type:
str
- class filecache.file_cache_source.FileCacheSourceGS(scheme, remote, *, anonymous=False)[source]
Bases:
FileCacheSource
Class that provides access to files stored in Google Storage.
- Parameters:
scheme (str)
remote (str)
anonymous (bool)
- __init__(scheme, remote, *, anonymous=False)[source]
Initialization for the FileCacheSourceGS class.
- Parameters:
scheme (str) – The scheme of the source. Must be "gs".
remote (str) – The bucket name.
anonymous (bool) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. Not used for this class.
- classmethod uses_anonymous()[source]
Whether this class has the concept of anonymous accesses.
- Return type:
bool
- exists(sub_path)[source]
Check if a file exists without downloading it.
- Parameters:
sub_path (str) – The path of the file in the Google Storage bucket given by the source prefix.
- Returns:
True if the file (including the bucket) exists. Note that it is possible that a file could exist and still not be downloadable due to permissions. False will also be returned if the bucket itself does not exist or is not accessible.
- Return type:
bool
- modification_time(sub_path)[source]
Get the modification time of a file as a Unix timestamp.
- Parameters:
sub_path (str) – The path of the file in the Google Storage bucket given by the source prefix.
- Returns:
The modification time as a float (Unix timestamp) if the file exists. If the file does not exist or other errors occur, an Exception is raised.
- Return type:
float | None
- is_dir(sub_path)[source]
Check if a file is a directory.
- Parameters:
sub_path (str) – The path of the directory in the Google Storage bucket given by the source prefix.
- Returns:
True if the path represents a directory (i.e., there are objects with this prefix), False otherwise.
- Return type:
bool
Notes
In Google Cloud Storage, directories are conceptual - they exist as prefixes in object names. This method checks if there are any objects with the given prefix to determine if it represents a directory.
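The prefix test described in this note can be sketched over a plain list of object names. This shows the idea only, without calling any cloud API; is_dir_by_prefix and the sample names are hypothetical.

```python
from typing import Iterable


def is_dir_by_prefix(object_names: Iterable[str], sub_path: str) -> bool:
    """Return True if any object name lies under sub_path as a prefix."""
    prefix = sub_path.rstrip("/")
    # A trailing slash keeps "data" from matching "database.bin".
    # An empty sub_path means the bucket root, which matches any object.
    prefix = prefix + "/" if prefix else ""
    return any(name.startswith(prefix) for name in object_names)


names = ["data/a.txt", "data/sub/b.txt", "database.bin"]
print(is_dir_by_prefix(names, "data"))        # a prefix of two objects
print(is_dir_by_prefix(names, "data/a.txt"))  # an object, not a prefix
```

Under this model an empty "directory" cannot exist, since a prefix with no objects below it is indistinguishable from a missing path.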
- retrieve(sub_path, local_path, *, preserve_mtime=False)[source]
Retrieve a file from a Google Storage bucket.
- Parameters:
sub_path (str) – The path of the file in the Google Storage bucket given by the source prefix.
local_path (str | Path) – The path to the destination where the downloaded file will be stored.
preserve_mtime (bool) – If True, the modification time of the remote file will be copied to the local file.
- Returns:
The Path where the file was stored (same as local_path).
- Raises:
FileNotFoundError – If the remote file does not exist or the download fails for another reason.
- Return type:
Path
Notes
All parent directories in local_path are created even if the file download fails.
The download is an atomic operation.
- upload(sub_path, local_path, *, preserve_mtime=False)[source]
Upload a local file to a Google Storage bucket.
- Parameters:
sub_path (str) – The path of the destination file in the Google Storage bucket given by the source prefix.
local_path (str | Path) – The absolute path of the local file to upload.
preserve_mtime (bool) – If True, the modification time of the local file will be copied to the remote file.
- Returns:
The Path of the filename, which is the same as the local_path parameter.
- Raises:
FileNotFoundError – If the local file does not exist.
- Return type:
Path
- iterdir_metadata(sub_path)[source]
Iterate over the contents of a directory.
- Parameters:
sub_path (str) – The path of the directory in the Google Storage bucket given by the source prefix.
- Yields:
All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:
is_dir: True if the returned name is a directory, False if it is a file.
mtime: The last modification time of the file as a float.
size: The approximate size of the file in bytes.
If the metadata cannot be retrieved, None is returned for the metadata.
- Return type:
Iterator[tuple[str, dict[str, Any] | None]]
- unlink(sub_path, *, missing_ok=False)[source]
Remove the given object.
- Parameters:
sub_path (str) – The path of the file in the Google Storage bucket given by the source prefix to delete.
missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.
- Returns:
The sub_path.
- Raises:
FileNotFoundError – If the file doesn’t exist and missing_ok is False.
- Return type:
str
- class filecache.file_cache_source.FileCacheSourceS3(scheme, remote, *, anonymous=False)[source]
Bases:
FileCacheSource
Class that provides access to files stored in AWS S3.
- Parameters:
scheme (str)
remote (str)
anonymous (bool)
- __init__(scheme, remote, *, anonymous=False)[source]
Initialization for the FileCacheSourceS3 class.
- Parameters:
scheme (str) – The scheme of the source. Must be "s3".
remote (str) – The bucket name.
anonymous (bool) – If True, access cloud resources without specifying credentials. If False, credentials must be initialized in the program’s environment. Not used for this class.
- classmethod uses_anonymous()[source]
Whether this class has the concept of anonymous accesses.
- Return type:
bool
- exists(sub_path)[source]
Check if a file exists without downloading it.
- Parameters:
sub_path (str) – The path of the file in the AWS S3 bucket given by the source prefix.
- Returns:
True if the file (including the bucket) exists. Note that it is possible that a file could exist and still not be downloadable due to permissions. False will also be returned if the bucket itself does not exist or is not accessible.
- Return type:
bool
- modification_time(sub_path)[source]
Get the modification time of a file as a Unix timestamp.
- Parameters:
sub_path (str) – The path of the file in the AWS S3 bucket given by the source prefix.
- Returns:
The modification time as a float (Unix timestamp) if the file exists and the time can be retrieved. If the file does not exist or other errors occur, an Exception is raised.
- Return type:
float | None
- is_dir(sub_path)[source]
Check if a file is a directory.
- Parameters:
sub_path (str) – The path of the directory in the AWS S3 bucket given by the source prefix.
- Returns:
True if the path represents a directory (i.e., there are objects with this prefix), False otherwise.
- Return type:
bool
Notes
In AWS S3, directories are conceptual - they exist as prefixes in object names. This method checks if there are any objects with the given prefix to determine if it represents a directory.
- retrieve(sub_path, local_path, *, preserve_mtime=False)[source]
Retrieve a file from an AWS S3 bucket.
- Parameters:
sub_path (str) – The path of the file in the AWS S3 bucket given by the source prefix.
local_path (str | Path) – The path to the destination where the downloaded file will be stored.
preserve_mtime (bool) – If True, the modification time of the remote file will be copied to the local file.
- Returns:
The Path where the file was stored (same as local_path).
- Raises:
FileNotFoundError – If the remote file does not exist or the download fails for another reason.
- Return type:
Path
Notes
All parent directories in local_path are created even if the file download fails.
The download is an atomic operation.
- upload(sub_path, local_path, *, preserve_mtime=False)[source]
Upload a local file to an AWS S3 bucket.
- Parameters:
sub_path (str) – The path of the destination file in the AWS S3 bucket given by the source prefix.
local_path (str | Path) – The full path of the local file to upload.
preserve_mtime (bool) – If True, the modification time of the local file will be copied to the remote file.
- Returns:
The Path of the filename, which is the same as the local_path parameter.
- Raises:
FileNotFoundError – If the local file does not exist.
- Return type:
Path
- iterdir_metadata(sub_path)[source]
Iterate over the contents of a directory.
- Parameters:
sub_path (str) – The path of the directory in the AWS S3 bucket given by the source prefix.
- Yields:
All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:
is_dir: True if the returned name is a directory, False if it is a file.
mtime: The last modification time of the file as a float.
size: The approximate size of the file in bytes.
If the metadata cannot be retrieved, None is returned for the metadata.
- Return type:
Iterator[tuple[str, dict[str, Any] | None]]
- unlink(sub_path, *, missing_ok=False)[source]
Remove the given object.
- Parameters:
sub_path (str) – The path of the file in the AWS S3 bucket given by the source prefix to delete.
missing_ok (bool) – True if it is OK to unlink a file that doesn’t exist; False to raise a FileNotFoundError in this case.
- Returns:
The sub_path.
- Raises:
FileNotFoundError – If the file doesn’t exist and missing_ok is False.
- Return type:
str
- class filecache.file_cache_source.FileCacheSourceFake(scheme, remote, *, anonymous=False, storage_dir=None)[source]
Bases:
FileCacheSource
Class that simulates a remote file source using a local directory structure.
This class is useful for testing file operations without requiring actual remote connections. Files are stored in a local directory that simulates the remote storage, including the need for uploads and downloads. By default, the storage directory is <TEMPDIR>/.filecache_fake_remote and persists across program runs.
- Parameters:
scheme (str)
remote (str)
anonymous (bool)
storage_dir (str | Path | None)
- classmethod get_default_storage_dir()[source]
Get the current default storage directory for fake remote files.
- Returns:
The current default storage directory Path.
- Return type:
Path
- classmethod set_default_storage_dir(directory)[source]
Set the default storage directory for fake remote files.
- Parameters:
directory (str | Path) – The directory to use as the default storage location. The directory is expanded and resolved to an absolute path.
- Return type:
None
- classmethod delete_default_storage_dir()[source]
Delete the current default storage directory and all its contents.
This is useful for cleanup after testing.
- Return type:
None
- __init__(scheme, remote, *, anonymous=False, storage_dir=None)[source]
Initialize the FileCacheSourceFake class.
- Parameters:
scheme (str) – The scheme of the source. Must be “fake”.
remote (str) – The simulated remote/bucket name.
anonymous (bool) – Not used for this class.
storage_dir (str | Path | None) – Base directory in which to store the fake remote files. If None, uses the class default storage directory.
- classmethod uses_anonymous()[source]
Whether this class has the concept of anonymous accesses.
- Return type:
bool
- exists(sub_path)[source]
Check if a file exists in the fake remote storage.
- Parameters:
sub_path (str) – The path of the file relative to the storage directory.
- Returns:
True if the file exists, False otherwise.
- Return type:
bool
- modification_time(sub_path)[source]
Get the modification time of a file as a Unix timestamp.
- Parameters:
sub_path (str)
- Return type:
float | None
- is_dir(sub_path)[source]
Check if a file is a directory.
- Parameters:
sub_path (str)
- Return type:
bool
- retrieve(sub_path, local_path, *, preserve_mtime=False)[source]
Retrieve a file from the fake remote storage.
- Parameters:
sub_path (str) – The path of the file relative to the storage directory.
local_path (str | Path) – The path where the file should be copied to.
preserve_mtime (bool) – If True, the modification time of the fake remote file will be copied to the local file.
- Returns:
The Path where the file was stored (same as local_path).
- Raises:
FileNotFoundError – If the remote file does not exist.
- Return type:
Path
- upload(sub_path, local_path, *, preserve_mtime=False)[source]
Upload a file to the fake remote storage.
- Parameters:
sub_path (str) – The destination path relative to the storage directory.
local_path (str | Path) – The path of the local file to upload.
preserve_mtime (bool) – If True, the modification time of the local file will be copied to the remote file.
- Returns:
The Path of the local file that was uploaded.
- Raises:
FileNotFoundError – If the local file does not exist.
- Return type:
Path
- iterdir_metadata(sub_path)[source]
Iterate over the contents of a directory in the fake remote storage.
- Parameters:
sub_path (str) – The path of the directory relative to the storage directory.
- Yields:
All files and sub-directories in the given directory (except . and ..), in no particular order. Each file or directory is represented by a tuple of the form (path, metadata), where path is the path of the file or directory relative to the source prefix, and metadata is a dictionary with the following keys:
is_dir: True if the returned name is a directory, False if it is a file.
mtime: The last modification time of the file as a float.
size: The approximate size of the file in bytes.
If the metadata cannot be retrieved, None is returned for the metadata.
- Return type:
Iterator[tuple[str, dict[str, Any] | None]]
- unlink(sub_path, *, missing_ok=False)[source]
Remove a file from the fake remote storage.
- Parameters:
sub_path (str) – The path of the file relative to the storage directory.
missing_ok (bool) – If True, don’t raise an error if the file doesn’t exist.
- Returns:
The sub_path that was removed.
- Raises:
FileNotFoundError – If the file doesn’t exist and missing_ok is False.
- Return type:
str