Examples ======== FileCache Examples ****************** Here are some examples of how to use :class:`FileCache` in practice. Example 1: Ephemeral Cache - Automatically Deleted on Exit ---------------------------------------------------------- This creates a unique, temporary cache that is automatically deleted when the context manager exits. Useful for one-time operations where you don't want to leave files behind. .. code-block:: python from filecache import FileCache with FileCache(cache_name=None) as fc: # Retrieve a file from Google Cloud Storage local_path = fc.retrieve('gs://my-bucket/data/file.txt') with open(local_path, 'r') as f: content = f.read() print(f'Read {len(content)} bytes') # Cache is automatically deleted here Example 2: Shared Persistent Cache (Default 'global') ----------------------------------------------------- This uses the default shared cache named 'global' that persists after program exit. Multiple processes can share the same downloaded files. .. code-block:: python from filecache import FileCache fc = FileCache() # Uses default cache_name='global' # First process downloads the file path1 = fc.retrieve('https://example.com/data.txt') # Second process (or same process later) reuses the cached file path2 = fc.retrieve('https://example.com/data.txt') assert path1 == path2 # Same cached file location # Cache persists after program exits Example 3: Named Shared Cache with Custom Location -------------------------------------------------- Creates a named cache in a specific directory. Useful for organizing different types of cached data separately. .. code-block:: python from filecache import FileCache from pathlib import Path # Create a cache named 'myproject' in a custom location fc = FileCache(cache_name='myproject', cache_root=Path.home() / 'my_caches') local_path = fc.retrieve('s3://my-bucket/project_data/file.dat') print(f'Cached at: {local_path}') # Cache persists and can be shared by other processes using the same name Example 4: Time-Sensitive Cache with Metadata Caching ----------------------------------------------------- Preserves file modification times and caches metadata for efficiency. Useful when you need to track when files were last modified. .. code-block:: python from filecache import FileCache fc = FileCache(time_sensitive=True, cache_metadata=True) # First call retrieves modification time from server mtime1 = fc.modification_time('gs://my-bucket/data.txt') # Second call uses cached value mtime2 = fc.modification_time('gs://my-bucket/data.txt') assert mtime1 == mtime2 # From cache, no network call # Retrieval downloads and sets modification time path1 = fc.retrieve('gs://my-bucket/data.txt') # Local file's modification time should match the server's assert path1.stat().st_mtime == mtime1 Example 5: Parallel File Operations ----------------------------------- Downloads multiple files simultaneously using multiple threads. This is significantly faster when retrieving many files from the same source. Download errors are returned as Exceptions in the return list. .. code-block:: python from filecache import FileCache fc = FileCache(nthreads=4) # Use 4 threads for parallel operations urls = [ 'gs://my-bucket/file1.txt', 'gs://my-bucket/file2.txt', 'gs://my-bucket/file3.txt', 'gs://my-bucket/file4.txt' ] # All files downloaded in parallel paths = fc.retrieve(urls, exception_on_fail=False) for path in paths: if isinstance(path, Exception): print(f'Download failed: {path}') continue print(f'Downloaded: {path}') Example 6: Upload Operations ----------------------------- Writes files to remote storage. Files are written locally first, then uploaded. .. code-block:: python from filecache import FileCache with FileCache(cache_name=None) as fc: # Write a file to cloud storage with fc.open('gs://my-bucket/output.txt', 'w') as f: f.write('Hello, World!') # File is automatically uploaded when the context manager for the file handle exits # Ephemeral cache is automatically deleted here # Verify it was uploaded by reading it back using a difference cache with FileCache(cache_name=None) as fc: with fc.open('gs://my-bucket/output.txt', 'r') as f: content = f.read() print(f'File contents: {content}') # Ephemeral cache is automatically deleted here .. _example_url_to_url: Example 7: URL to URL Translation --------------------------------- Translates URLs from one source to another, allowing code to work with different data layouts without modification. For example, assume the data is laid out on a webserver as: .. code-block:: text /data/file11.txt /data/file12.txt /data/file21.txt /data/file22.txt and the data is available in a cloud storage bucket as: .. code-block:: text gs://my-bucket/data/dir1/file11.txt gs://my-bucket/data/dir1/file12.txt gs://my-bucket/data/dir2/file21.txt gs://my-bucket/data/dir2/file22.txt The code was written using the layout of the webserver. Later, the Google Cloud Storage bucket was made available. Rather than rewriting the code (and thus making it incompatible with the original webserver layout), a mapping function can be used to translate the URLs: .. code-block:: python def url_to_url(scheme, remote, path): if scheme == "https" and remote == "data.com" and path.startswith("data/"): dir_num_match = re.match(r"data/file(\d+)\.txt", path) if dir_num_match: dir_num = dir_num_match.group(1) return f"gs://my-bucket/data/dir{dir_num[0]}/file{dir_num}.txt" return None This code will now work both with the original webserver layout and the new Google Cloud Storage layout: .. code-block:: python fc = FileCache() # This will download from the webserver fc.retrieve("https://data.com/data/file11.txt") fc = FileCache(url_to_url=url_to_url) # This will download from the Google Cloud Storage bucket fc.retrieve("https://data.com/data/file11.txt") .. _example_url_to_path: Example 8: URL to Path Translation ---------------------------------- Translates URLs into local paths, allowing code to store files in the local cache in a different hierarchy than the remote source. Assume the data is available in a cloud storage bucket as: .. code-block:: text gs://my-bucket/data/dir1/file11.txt gs://my-bucket/data/dir1/file12.txt gs://my-bucket/data/dir2/file21.txt gs://my-bucket/data/dir2/file22.txt To store the above Google Cloud Storage data in a flat hierarchy like this: .. code-block:: text /data/file11.txt /data/file12.txt /data/file21.txt /data/file22.txt .. code-block:: python def url_to_path(scheme, remote, path, cache_dir, cache_subdir): if scheme == "gs" and remote == "my-bucket" and path.startswith("data/dir"): path_split = path.split("/", 2) if len(path_split) > 2: new_path = path_split[2] ret = f"{cache_dir}/{cache_subdir}/data/{new_path}" print(ret) return ret return None fc = FileCache() # This file will be stored in /_filecache_global/gs_my-bucket/data/dir1/file11.txt fc.retrieve("gs://my-bucket/data/dir1/file11.txt") fc = FileCache(url_to_path=url_to_path) # This file will be stored in /_filecache_global/gs_my-bucket/data/file11.txt fc.retrieve("gs://my-bucket/data/dir1/file11.txt") Example 9: Local File Access ------------------------------ :class:`FileCache` can also work with local files, providing a unified interface regardless of file location. .. code-block:: python from filecache import FileCache from pathlib import Path fc = FileCache() # Access a local file (no download needed, accessed in-place) local_file = Path.home() / 'myfile.txt' local_file.write_text('Local content') # FileCache handles local files transparently path = fc.retrieve(str(local_file)) with open(path, 'r') as f: content = f.read() print(f'Read local file: {content}') Example 10: Manual Cache Management (Non-Context Manager) --------------------------------------------------------- When using a shared cache, you must manually manage cache deletion. Useful when you need more control over cache lifetime. .. code-block:: python from filecache import FileCache # Create a permanent cache (would not auto-delete on program exit) fc = FileCache(cache_name='mycache') # Use the cache path1 = fc.retrieve('gs://my-bucket/file1.txt') path2 = fc.retrieve('gs://my-bucket/file2.txt') # Manually delete cache when done # Be careful that another process is not using the cache at the same time! fc.delete_cache() print('Cache manually deleted') FCPath Examples *************** Here are examples showcasing the unique features of :class:`FCPath`, which provides a Path-like interface for working with remote files. Example 11: Using FCPath for Simpler Syntax ------------------------------------------- Here are examples of creating :class:`FCPath` instances and using them to access files. .. code-block:: python from filecache import FileCache, FCPath with FileCache(cache_name=None) as fc: # Create an FCPath that encapsulates the cache settings base_path = fc.new_path('gs://my-bucket/data') # Use Path-like operations file1 = base_path / 'subdir' / 'file1.txt' file2 = base_path / 'subdir' / 'file2.txt' # Read files using simple Path methods content1 = file1.read_text() content2 = file2.read_text() print(f'Read {len(content1)} and {len(content2)} bytes') The same code but using the :class:`FCPath` constructor directly: .. code-block:: python from filecache import FileCache, FCPath fc = FileCache(cache_name=None) base_path = FCPath('gs://my-bucket/data', filecache=fc) file1 = base_path / 'subdir' / 'file1.txt' file2 = base_path / 'subdir' / 'file2.txt' content1 = file1.read_text() content2 = file2.read_text() print(f'Read {len(content1)} and {len(content2)} bytes') Example 12: Path Joining with the / Operator -------------------------------------------- :class:`FCPath` supports the ``/`` operator for joining paths, just like `pathlib.Path`. The new :class:`FCPath` inherits all settings (FileCache, time_sensitive, etc.) from the left-hand side. .. code-block:: python from filecache import FileCache, FCPath with FileCache(cache_name=None, time_sensitive=True) as fc: # Create a base FCPath with specific settings base = fc.new_path('gs://my-bucket/data') # Join paths using the / operator file1 = base / 'subdir' / 'file1.txt' file2 = base / 'subdir' / 'file2.txt' # All resulting FCPath objects inherit the FileCache and time_sensitive setting content1 = file1.read_text() # Uses same cache and time_sensitive=True content2 = file2.read_text() # Uses same cache and time_sensitive=True Example 13: Creating FCPath from Different Types ------------------------------------------------- :class:`FCPath` can be created from strings, `Path` objects, or other `FCPath` objects. When created from an existing :class:`FCPath`, it inherits all settings. .. code-block:: python from filecache import FileCache, FCPath from pathlib import Path with FileCache(cache_name='myproject', time_sensitive=True) as fc: # Create from string path1 = FCPath('gs://my-bucket/data/file.txt', filecache=fc) # Create from Path object local_path = Path('/local/path/file.txt') path2 = FCPath(local_path, filecache=fc) # Create from another FCPath (inherits all settings) path3 = FCPath(path1) # Inherits filecache, time_sensitive, etc. from path1 path4 = path1 / 'subdir' / 'file2.txt' # Also inherits via / operator # All can be used the same way content1 = path1.read_text() content2 = path2.read_text() content3 = path3.read_text() Example 14: Inheriting FileCache Settings ------------------------------------------ When creating new :class:`FCPath` objects from existing ones, all FileCache-related settings are automatically inherited, making it easy to work with paths that share the same configuration. .. code-block:: python from filecache import FileCache, FCPath # Create a FileCache with specific settings fc = FileCache(cache_name='project', time_sensitive=True, nthreads=4) # Create base FCPath with a different lock timeout base = fc.new_path('gs://my-bucket/data', lock_timeout=30) # All child paths inherit: filecache, lock_timeout=30 file1 = base / 'dir1' / 'file1.txt' file2 = base / 'dir2' / 'file2.txt' file3 = FCPath(base) / 'dir3' / 'file3.txt' # All operations use the same settings paths = [file1, file2, file3] contents = [f.read_text() for f in paths] # All use time_sensitive=True, lock_timeout=30 Example 15: Using glob() for Pattern Matching --------------------------------------------- :class:`FCPath` supports `glob()` for pattern matching on remote directories, a feature not available directly in :class:`FileCache`. This works on both local and remote paths. .. code-block:: python from filecache import FileCache, FCPath with FileCache(cache_name=None) as fc: # Create base path to a directory base_dir = fc.new_path('gs://my-bucket/data') # Find all .txt files in the directory txt_files = list(base_dir.glob('*.txt')) for txt_file in txt_files: print(f'Found: {txt_file.path}') content = txt_file.read_text() # Find files in subdirectories all_txt = list(base_dir.glob('**/*.txt')) # Recursive for file in all_txt: print(f'Recursive match: {file.path}') Example 16: Using rglob() for Recursive Pattern Matching -------------------------------------------------------- :meth:`FCPath.rglob()` is a convenience method that automatically adds ``**/`` to the pattern, making recursive searches easier. .. code-block:: python from filecache import FileCache, FCPath with FileCache() as fc: base = fc.new_path('gs://my-bucket/project') # Find all Python files recursively # rglob('*.py') is equivalent to glob('**/*.py') python_files = list(base.rglob('*.py')) for py_file in python_files: print(f'Python file: {py_file.path}') # Process each file content = py_file.read_text() Example 17: Directory Traversal with iterdir() ---------------------------------------------- :class:`FCPath` provides :meth:`FCPath.iterdir()` for iterating over directory contents, returning :class:`FCPath` objects that inherit settings from the parent. .. code-block:: python from filecache import FileCache, FCPath with FileCache(cache_name=None) as fc: base_dir = fc.new_path('gs://my-bucket/data') # Iterate over directory contents for item in base_dir.iterdir(): if item.is_dir(): print(f'Directory: {item.path}') # Recursively process subdirectories for subitem in item.iterdir(): print(f' Subitem: {subitem.path}') else: print(f'File: {item.path}') # Read file content content = item.read_text() Example 18: Directory Walking with walk() ----------------------------------------- The `walk()` method provides a convenient way to traverse directory trees, similar to `os.walk()` but returning :class:`FCPath` objects. .. code-block:: python from filecache import FileCache, FCPath with FileCache() as fc: root = fc.new_path('gs://my-bucket/project') # Walk the directory tree for dirpath, dirnames, filenames in root.walk(): print(f'Directory: {dirpath.path}') print(f' Subdirectories: {dirnames}') print(f' Files: {filenames}') # Process files in this directory for filename in filenames: file_path = dirpath / filename content = file_path.read_text() Example 19: Combining Path Operations ------------------------------------- :class:`FCPath` supports chaining of path operations, making complex path manipulations easy and readable. .. code-block:: python from filecache import FileCache, FCPath with FileCache(cache_name='analysis') as fc: # Start with a base URL base = fc.new_path('gs://data-bucket') # Build complex paths using chaining data_file = base / 'year' / '2024' / 'month' / '01' / 'data.csv' # Read both files data = data_file.read_text() config = (base / 'config' / 'settings.json').read_text() # Create output path in same structure output_dir = base / 'output' / '2024' / '01' # Write to output (output_dir / 'results.txt').write_text('Analysis results') Example 20: Working with Metadata via iterdir_metadata() -------------------------------------------------------- The `iterdir_metadata()` method provides directory contents along with metadata (is_dir, mtime, size), useful for filtering or sorting files. .. code-block:: python from filecache import FileCache, FCPath with FileCache() as fc: base_dir = fc.new_path('gs://my-bucket/data') # Get directory contents with metadata files_by_size = [] for item, metadata in base_dir.iterdir_metadata(): if metadata and not metadata['is_dir']: files_by_size.append((item, metadata['size'])) # Sort by size (largest first) files_by_size.sort(key=lambda x: x[1] or 0, reverse=True) # Process largest files first for item, size in files_by_size: print(f'Processing {item.path} ({size} bytes)') content = item.read_text() Example 21: Creating FCPath Without Explicit FileCache ------------------------------------------------------- When an :class:`FCPath` is created without specifying a `filecache`, it uses the default global :class:`FileCache` when an operation is performed. This allows for simpler syntax when the default cache is sufficient. .. code-block:: python from filecache import FCPath # Create FCPath without explicit FileCache # Will use default global cache when operations are performed base = FCPath('gs://my-bucket/data') # Operations automatically use the default FileCache file1 = base / 'file1.txt' file2 = base / 'file2.txt' # Read files (uses default global cache) content1 = file1.read_text() content2 = file2.read_text() # All paths share the same default cache assert file1.filecache == file2.filecache