Filesystem API
The HfFileSystem class provides a pythonic file interface to the Hugging Face Hub based on fsspec.
HfFileSystem
HfFileSystem is based on fsspec, so it is compatible with most of the APIs that it offers. For more details, check out our guide and fsspec’s API Reference.
class huggingface_hub.HfFileSystem
( *args, **kwargs )
Parameters
- token (str or bool, optional) — A valid user access token (string). Defaults to the locally saved token, which is the recommended method for authentication (see https://huggingface.co/docs/huggingface_hub/quick-start#authentication). To disable authentication, pass False.
Access a remote Hugging Face Hub repository as if it were a local file system.
Usage:
>>> from huggingface_hub import HfFileSystem
>>> fs = HfFileSystem()
>>> # List files
>>> fs.glob("my-username/my-model/*.bin")
['my-username/my-model/pytorch_model.bin']
>>> fs.ls("datasets/my-username/my-dataset", detail=False)
['datasets/my-username/my-dataset/.gitattributes', 'datasets/my-username/my-dataset/README.md', 'datasets/my-username/my-dataset/data.json']
>>> # Read/write files
>>> with fs.open("my-username/my-model/pytorch_model.bin") as f:
...     data = f.read()
>>> with fs.open("my-username/my-model/pytorch_model.bin", "wb") as f:
...     f.write(data)
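The usage above loads the whole file into memory with f.read(). For large files, copying in chunks may be preferable. The following is a minimal sketch, assuming that the objects returned by fs.open behave like ordinary binary file objects; the helper name copy_in_chunks and the chunk size are hypothetical, and in-memory buffers stand in for remote files so the snippet runs offline.

```python
import io


def copy_in_chunks(src, dst, chunk_size=8 * 1024 * 1024):
    """Copy bytes from src to dst in fixed-size chunks; return the byte count."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read signals end of file
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# In-memory buffers stand in for a remote source and a local target.
source = io.BytesIO(b"x" * 25)
target = io.BytesIO()
copied = copy_in_chunks(source, target, chunk_size=10)
```

With a Hub file, the source would be the object returned by fs.open(path, "rb") and the target a local file opened in "wb" mode.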
__init__
( *args, endpoint: Optional = None, token: Union = None, **storage_options )
Parameters
- use_listings_cache, listings_expiry_time, max_paths — passed to DirCache, if the implementation supports directory listing caching. Pass use_listings_cache=False to disable such caching.
- skip_instance_cache (bool) — If this is a cachable implementation, pass True here to force creating a new instance even if a matching instance exists, and prevent storing this instance.
- asynchronous (bool)
- loop (asyncio-compatible IOLoop or None)
Docstring taken from fsspec documentation.
Create and configure file-system instance
Instances may be cachable, so if similar enough arguments are seen a new instance is not required. The token attribute exists to allow implementations to cache instances if they wish.
A reasonable default should be provided if there are no arguments.
Subclasses should call this method.
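The instance caching described above can be pictured with a toy model. This is a sketch under stated assumptions, not fsspec's actual implementation: the metaclass CachedInstances and the class ToyFileSystem are hypothetical, and the cache key is simply the constructor arguments. It shows how repeated arguments can return the same object and how a skip_instance_cache flag can bypass both lookup and storage.

```python
class CachedInstances(type):
    """Toy metaclass: reuse an instance when constructor arguments repeat."""

    def __call__(cls, *args, skip_instance_cache=False, **kwargs):
        # Hashable key built from the arguments (hypothetical stand-in for
        # the cache token mentioned in the docstring).
        key = (cls, args, tuple(sorted(kwargs.items())))
        if not skip_instance_cache and key in cls._cache:
            return cls._cache[key]
        instance = super().__call__(*args, **kwargs)
        if not skip_instance_cache:
            cls._cache[key] = instance
        return instance


class ToyFileSystem(metaclass=CachedInstances):
    _cache = {}

    def __init__(self, endpoint=None):
        self.endpoint = endpoint


a = ToyFileSystem(endpoint="https://example.invalid")
b = ToyFileSystem(endpoint="https://example.invalid")  # same arguments
c = ToyFileSystem(endpoint="https://example.invalid", skip_instance_cache=True)
```

In this toy model, a and b are the same object, while c is a fresh instance that is never stored in the cache.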
exists
Docstring taken from fsspec documentation.
Is there a file at the given path
find
( path: str, maxdepth: Optional = None, withdirs: bool = False, detail: bool = False, refresh: bool = False, revision: Optional = None, **kwargs )
Docstring taken from fsspec documentation.
List all files below path.
Like the POSIX find command without conditions.
get_file
( rpath, lpath, callback = <fsspec.callbacks.NoOpCallback object>, outfile = None, **kwargs )
Docstring taken from fsspec documentation.
Copy single remote file to local
glob
Docstring taken from fsspec documentation.
Find files by glob-matching.
If the path ends with ’/’, only folders are returned.
We support "**", "?" and "[..]". We do not support ^ for pattern negation.
The maxdepth option is applied on the first ** found in the path.
kwargs are passed to ls.
info
( path: str, refresh: bool = False, revision: Optional = None, **kwargs ) → dict with keys
Returns
dict with keys name (full path in the FS), size (in bytes), type (file, directory, or something else) and other FS-specific keys.
Docstring taken from fsspec documentation.
Give details of entry at path
Returns a single dictionary, with exactly the same information as ls would with detail=True.
The default implementation should call ls and could be overridden by a shortcut. kwargs are passed on to ls().
Some file systems might not be able to measure the file’s size, in which case the returned dict will include 'size': None.
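Because size may be None, code that consumes info() results should not assume an integer. A minimal sketch, assuming only the name, size and type keys described above; the helper describe_entry and the sample dictionaries are hypothetical and are not output from a real repository.

```python
def describe_entry(info):
    """Format an info()-style dict, tolerating an unknown size."""
    size = info.get("size")
    size_text = "unknown size" if size is None else f"{size} bytes"
    return f"{info['name']} ({info['type']}, {size_text})"


# Hypothetical entries shaped like the dictionaries described above.
known = {"name": "my-username/my-model/config.json", "size": 512, "type": "file"}
unknown = {"name": "my-username/my-model/stream", "size": None, "type": "file"}

known_text = describe_entry(known)
unknown_text = describe_entry(unknown)
```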
invalidate_cache
Docstring taken from fsspec documentation.
Discard any cached directory information
ls
( path: str, detail: bool = True, refresh: bool = False, revision: Optional = None, **kwargs )
Docstring taken from fsspec documentation.
List objects at path.
This should include subdirectories and files at that location. The difference between a file and a directory must be clear when details are requested.
The specific keys, or perhaps a FileInfo class, or similar, is TBD, but must be consistent across implementations. Must include:
- full path to the entry (without protocol)
- size of the entry, in bytes. If the value cannot be determined, will be None.
- type of entry, “file”, “directory” or other
Additional information may be present, appropriate to the file-system, e.g., generation, checksum, etc.
May use refresh=True|False to allow use of self._ls_from_cache to check for a saved listing and avoid calling the backend. This would be common where listing may be expensive.
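The entry format described above lends itself to simple post-processing. The following sketch assumes a list of dictionaries shaped like the result of ls with detail=True, carrying the name, size and type keys; the helper split_listing and the sample listing are hypothetical, written by hand rather than fetched from the Hub.

```python
def split_listing(entries):
    """Separate ls-style dicts into file paths and directory paths,
    and sum the sizes that are known."""
    files, directories, total = [], [], 0
    for entry in entries:
        if entry["type"] == "directory":
            directories.append(entry["name"])
        else:
            files.append(entry["name"])
            total += entry["size"] or 0  # treat an unknown size (None) as 0
    return files, directories, total


# Hand-written sample shaped like a detailed listing.
listing = [
    {"name": "datasets/my-username/my-dataset/README.md", "size": 120, "type": "file"},
    {"name": "datasets/my-username/my-dataset/data", "size": 0, "type": "directory"},
    {"name": "datasets/my-username/my-dataset/data.json", "size": None, "type": "file"},
]
files, directories, total = split_listing(listing)
```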
modified
Docstring taken from fsspec documentation.
Return the modified timestamp of a file as a datetime.datetime
rm
( path: str, recursive: bool = False, maxdepth: Optional = None, revision: Optional = None, **kwargs )
Docstring taken from fsspec documentation.
Delete files.
start_transaction
Docstring taken from fsspec documentation.
Begin write transaction for deferring files, non-context version
url
Get the HTTP URL of the given path.
walk
Docstring taken from fsspec documentation.
Return all files below path.
List all files, recursing into subdirectories; output is iterator-style, like os.walk(). For a simple list of files, find() is available.
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect. (see os.walk)
Note that the “files” outputted will include anything that is not a directory, such as links.
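The pruning behaviour described above is the same one documented for os.walk. The sketch below demonstrates it with the standard library on a temporary local directory, so it runs without network access. Whether a given filesystem's walk() honours in-place edits of dirnames in exactly the same way is an assumption based on the docstring, not something verified here; the helper walk_skipping is hypothetical.

```python
import os
import tempfile


def walk_skipping(root, skip_names):
    """Collect relative file paths under root, pruning directories in skip_names."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Slice assignment edits the list in place, so os.walk only
        # descends into the directories that remain.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_names)
        for filename in sorted(filenames):
            full_path = os.path.join(dirpath, filename)
            found.append(os.path.relpath(full_path, root))
    return found


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "keep"))
    os.makedirs(os.path.join(root, "skip"))
    for rel in ("a.txt", os.path.join("keep", "b.txt"), os.path.join("skip", "c.txt")):
        with open(os.path.join(root, rel), "w") as handle:
            handle.write("x")
    result = walk_skipping(root, {"skip"})
```

Rebinding the name (dirnames = [...]) instead of assigning to the slice would not prune anything, because the walker keeps its reference to the original list.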