Examples
FileCache Examples
Here are some examples of how to use FileCache in practice.
Example 1: Ephemeral Cache - Automatically Deleted on Exit
This creates a unique, temporary cache that is automatically deleted when the context manager exits. Useful for one-time operations where you don’t want to leave files behind.
from filecache import FileCache
with FileCache(cache_name=None) as fc:
# Retrieve a file from Google Cloud Storage
local_path = fc.retrieve('gs://my-bucket/data/file.txt')
with open(local_path, 'r') as f:
content = f.read()
print(f'Read {len(content)} bytes')
# Cache is automatically deleted here
Example 4: Time-Sensitive Cache with Metadata Caching
Preserves file modification times and caches metadata for efficiency. Useful when you need to track when files were last modified.
from filecache import FileCache
fc = FileCache(time_sensitive=True, cache_metadata=True)
# First call retrieves modification time from server
mtime1 = fc.modification_time('gs://my-bucket/data.txt')
# Second call uses cached value
mtime2 = fc.modification_time('gs://my-bucket/data.txt')
assert mtime1 == mtime2 # From cache, no network call
# Retrieval downloads and sets modification time
path1 = fc.retrieve('gs://my-bucket/data.txt')
# Local file's modification time should match the server's
assert path1.stat().st_mtime == mtime1
Example 5: Parallel File Operations
Downloads multiple files simultaneously using multiple threads. This is significantly faster when retrieving many files from the same source. Download errors are returned as Exceptions in the return list.
from filecache import FileCache
fc = FileCache(nthreads=4) # Use 4 threads for parallel operations
urls = [
'gs://my-bucket/file1.txt',
'gs://my-bucket/file2.txt',
'gs://my-bucket/file3.txt',
'gs://my-bucket/file4.txt'
]
# All files downloaded in parallel
paths = fc.retrieve(urls, exception_on_fail=False)
for path in paths:
if isinstance(path, Exception):
print(f'Download failed: {path}')
continue
print(f'Downloaded: {path}')
Example 6: Upload Operations
Writes files to remote storage. Files are written locally first, then uploaded.
from filecache import FileCache
with FileCache(cache_name=None) as fc:
# Write a file to cloud storage
with fc.open('gs://my-bucket/output.txt', 'w') as f:
f.write('Hello, World!')
# File is automatically uploaded when the context manager for the file handle exits
# Ephemeral cache is automatically deleted here
# Verify it was uploaded by reading it back using a difference cache
with FileCache(cache_name=None) as fc:
with fc.open('gs://my-bucket/output.txt', 'r') as f:
content = f.read()
print(f'File contents: {content}')
# Ephemeral cache is automatically deleted here
Example 7: URL to URL Translation
Translates URLs from one source to another, allowing code to work with different data layouts without modification.
For example, assume the data is laid out on a webserver as:
/data/file11.txt
/data/file12.txt
/data/file21.txt
/data/file22.txt
and the data is available in a cloud storage bucket as:
gs://my-bucket/data/dir1/file11.txt
gs://my-bucket/data/dir1/file12.txt
gs://my-bucket/data/dir2/file21.txt
gs://my-bucket/data/dir2/file22.txt
The code was written using the layout of the webserver. Later, the Google Cloud Storage bucket was made available. Rather than rewriting the code (and thus making it incompatible with the original webserver layout), a mapping function can be used to translate the URLs:
def url_to_url(scheme, remote, path):
if scheme == "https" and remote == "data.com" and path.startswith("data/"):
dir_num_match = re.match(r"data/file(\d+)\.txt", path)
if dir_num_match:
dir_num = dir_num_match.group(1)
return f"gs://my-bucket/data/dir{dir_num[0]}/file{dir_num}.txt"
return None
This code will now work both with the original webserver layout and the new Google Cloud Storage layout:
fc = FileCache()
# This will download from the webserver
fc.retrieve("https://data.com/data/file11.txt")
fc = FileCache(url_to_url=url_to_url)
# This will download from the Google Cloud Storage bucket
fc.retrieve("https://data.com/data/file11.txt")
Example 8: URL to Path Translation
Translates URLs into local paths, allowing code to store files in the local cache in a different hierarchy than the remote source. Assume the data is available in a cloud storage bucket as:
gs://my-bucket/data/dir1/file11.txt
gs://my-bucket/data/dir1/file12.txt
gs://my-bucket/data/dir2/file21.txt
gs://my-bucket/data/dir2/file22.txt
To store the above Google Cloud Storage data in a flat hierarchy like this:
/data/file11.txt
/data/file12.txt
/data/file21.txt
/data/file22.txt
def url_to_path(scheme, remote, path, cache_dir, cache_subdir):
if scheme == "gs" and remote == "my-bucket" and path.startswith("data/dir"):
path_split = path.split("/", 2)
if len(path_split) > 2:
new_path = path_split[2]
ret = f"{cache_dir}/{cache_subdir}/data/{new_path}"
print(ret)
return ret
return None
fc = FileCache()
# This file will be stored in <cache_root>/_filecache_global/gs_my-bucket/data/dir1/file11.txt
fc.retrieve("gs://my-bucket/data/dir1/file11.txt")
fc = FileCache(url_to_path=url_to_path)
# This file will be stored in <cache_root>/_filecache_global/gs_my-bucket/data/file11.txt
fc.retrieve("gs://my-bucket/data/dir1/file11.txt")
Example 9: Local File Access
FileCache can also work with local files, providing a unified interface
regardless of file location.
from filecache import FileCache
from pathlib import Path
fc = FileCache()
# Access a local file (no download needed, accessed in-place)
local_file = Path.home() / 'myfile.txt'
local_file.write_text('Local content')
# FileCache handles local files transparently
path = fc.retrieve(str(local_file))
with open(path, 'r') as f:
content = f.read()
print(f'Read local file: {content}')
Example 10: Manual Cache Management (Non-Context Manager)
When using a shared cache, you must manually manage cache deletion. Useful when you need more control over cache lifetime.
from filecache import FileCache
# Create a permanent cache (would not auto-delete on program exit)
fc = FileCache(cache_name='mycache')
# Use the cache
path1 = fc.retrieve('gs://my-bucket/file1.txt')
path2 = fc.retrieve('gs://my-bucket/file2.txt')
# Manually delete cache when done
# Be careful that another process is not using the cache at the same time!
fc.delete_cache()
print('Cache manually deleted')
FCPath Examples
Here are examples showcasing the unique features of FCPath, which provides a
Path-like interface for working with remote files.
Example 11: Using FCPath for Simpler Syntax
Here are examples of creating FCPath instances and using them to access files.
from filecache import FileCache, FCPath
with FileCache(cache_name=None) as fc:
# Create an FCPath that encapsulates the cache settings
base_path = fc.new_path('gs://my-bucket/data')
# Use Path-like operations
file1 = base_path / 'subdir' / 'file1.txt'
file2 = base_path / 'subdir' / 'file2.txt'
# Read files using simple Path methods
content1 = file1.read_text()
content2 = file2.read_text()
print(f'Read {len(content1)} and {len(content2)} bytes')
The same code but using the FCPath constructor directly:
from filecache import FileCache, FCPath
fc = FileCache(cache_name=None)
base_path = FCPath('gs://my-bucket/data', filecache=fc)
file1 = base_path / 'subdir' / 'file1.txt'
file2 = base_path / 'subdir' / 'file2.txt'
content1 = file1.read_text()
content2 = file2.read_text()
print(f'Read {len(content1)} and {len(content2)} bytes')
Example 12: Path Joining with the / Operator
FCPath supports the / operator for joining paths, just like pathlib.Path.
The new FCPath inherits all settings (FileCache, time_sensitive, etc.) from the
left-hand side.
from filecache import FileCache, FCPath
with FileCache(cache_name=None, time_sensitive=True) as fc:
# Create a base FCPath with specific settings
base = fc.new_path('gs://my-bucket/data')
# Join paths using the / operator
file1 = base / 'subdir' / 'file1.txt'
file2 = base / 'subdir' / 'file2.txt'
# All resulting FCPath objects inherit the FileCache and time_sensitive setting
content1 = file1.read_text() # Uses same cache and time_sensitive=True
content2 = file2.read_text() # Uses same cache and time_sensitive=True
Example 13: Creating FCPath from Different Types
FCPath can be created from strings, Path objects, or other FCPath objects.
When created from an existing FCPath, it inherits all settings.
from filecache import FileCache, FCPath
from pathlib import Path
with FileCache(cache_name='myproject', time_sensitive=True) as fc:
# Create from string
path1 = FCPath('gs://my-bucket/data/file.txt', filecache=fc)
# Create from Path object
local_path = Path('/local/path/file.txt')
path2 = FCPath(local_path, filecache=fc)
# Create from another FCPath (inherits all settings)
path3 = FCPath(path1) # Inherits filecache, time_sensitive, etc. from path1
path4 = path1 / 'subdir' / 'file2.txt' # Also inherits via / operator
# All can be used the same way
content1 = path1.read_text()
content2 = path2.read_text()
content3 = path3.read_text()
Example 14: Inheriting FileCache Settings
When creating new FCPath objects from existing ones, all FileCache-related
settings are automatically inherited, making it easy to work with paths that share
the same configuration.
from filecache import FileCache, FCPath
# Create a FileCache with specific settings
fc = FileCache(cache_name='project', time_sensitive=True, nthreads=4)
# Create base FCPath with a different lock timeout
base = fc.new_path('gs://my-bucket/data', lock_timeout=30)
# All child paths inherit: filecache, lock_timeout=30
file1 = base / 'dir1' / 'file1.txt'
file2 = base / 'dir2' / 'file2.txt'
file3 = FCPath(base) / 'dir3' / 'file3.txt'
# All operations use the same settings
paths = [file1, file2, file3]
contents = [f.read_text() for f in paths] # All use time_sensitive=True, lock_timeout=30
Example 15: Using glob() for Pattern Matching
FCPath supports glob() for pattern matching on remote directories, a feature
not available directly in FileCache. This works on both local and remote paths.
from filecache import FileCache, FCPath
with FileCache(cache_name=None) as fc:
# Create base path to a directory
base_dir = fc.new_path('gs://my-bucket/data')
# Find all .txt files in the directory
txt_files = list(base_dir.glob('*.txt'))
for txt_file in txt_files:
print(f'Found: {txt_file.path}')
content = txt_file.read_text()
# Find files in subdirectories
all_txt = list(base_dir.glob('**/*.txt')) # Recursive
for file in all_txt:
print(f'Recursive match: {file.path}')
Example 16: Using rglob() for Recursive Pattern Matching
FCPath.rglob() is a convenience method that automatically adds **/ to the pattern,
making recursive searches easier.
from filecache import FileCache, FCPath
with FileCache() as fc:
base = fc.new_path('gs://my-bucket/project')
# Find all Python files recursively
# rglob('*.py') is equivalent to glob('**/*.py')
python_files = list(base.rglob('*.py'))
for py_file in python_files:
print(f'Python file: {py_file.path}')
# Process each file
content = py_file.read_text()
Example 17: Directory Traversal with iterdir()
FCPath provides FCPath.iterdir() for iterating over directory contents, returning
FCPath objects that inherit settings from the parent.
from filecache import FileCache, FCPath
with FileCache(cache_name=None) as fc:
base_dir = fc.new_path('gs://my-bucket/data')
# Iterate over directory contents
for item in base_dir.iterdir():
if item.is_dir():
print(f'Directory: {item.path}')
# Recursively process subdirectories
for subitem in item.iterdir():
print(f' Subitem: {subitem.path}')
else:
print(f'File: {item.path}')
# Read file content
content = item.read_text()
Example 18: Directory Walking with walk()
The walk() method provides a convenient way to traverse directory trees, similar to
os.walk() but returning FCPath objects.
from filecache import FileCache, FCPath
with FileCache() as fc:
root = fc.new_path('gs://my-bucket/project')
# Walk the directory tree
for dirpath, dirnames, filenames in root.walk():
print(f'Directory: {dirpath.path}')
print(f' Subdirectories: {dirnames}')
print(f' Files: {filenames}')
# Process files in this directory
for filename in filenames:
file_path = dirpath / filename
content = file_path.read_text()
Example 19: Combining Path Operations
FCPath supports chaining of path operations, making complex path manipulations
easy and readable.
from filecache import FileCache, FCPath
with FileCache(cache_name='analysis') as fc:
# Start with a base URL
base = fc.new_path('gs://data-bucket')
# Build complex paths using chaining
data_file = base / 'year' / '2024' / 'month' / '01' / 'data.csv'
# Read both files
data = data_file.read_text()
config = (base / 'config' / 'settings.json').read_text()
# Create output path in same structure
output_dir = base / 'output' / '2024' / '01'
# Write to output
(output_dir / 'results.txt').write_text('Analysis results')
Example 20: Working with Metadata via iterdir_metadata()
The iterdir_metadata() method provides directory contents along with metadata (is_dir, mtime, size), useful for filtering or sorting files.
from filecache import FileCache, FCPath
with FileCache() as fc:
base_dir = fc.new_path('gs://my-bucket/data')
# Get directory contents with metadata
files_by_size = []
for item, metadata in base_dir.iterdir_metadata():
if metadata and not metadata['is_dir']:
files_by_size.append((item, metadata['size']))
# Sort by size (largest first)
files_by_size.sort(key=lambda x: x[1] or 0, reverse=True)
# Process largest files first
for item, size in files_by_size:
print(f'Processing {item.path} ({size} bytes)')
content = item.read_text()
Example 21: Creating FCPath Without Explicit FileCache
When an FCPath is created without specifying a filecache, it uses the default
global FileCache when an operation is performed. This allows for simpler
syntax when the default cache is sufficient.
from filecache import FCPath
# Create FCPath without explicit FileCache
# Will use default global cache when operations are performed
base = FCPath('gs://my-bucket/data')
# Operations automatically use the default FileCache
file1 = base / 'file1.txt'
file2 = base / 'file2.txt'
# Read files (uses default global cache)
content1 = file1.read_text()
content2 = file2.read_text()
# All paths share the same default cache
assert file1.filecache == file2.filecache