API Reference

This is the API for the signac (core) application.

The Project

Attributes

Project.check()

Check the project's workspace for corruption.

Project.clone(job[, copytree])

Clone job into this project.

Project.config

Get project's configuration.

Project.create_linked_view([prefix, ...])

Create or update a persistent linked view of the selected data space.

Project.detect_schema([exclude_const, subset])

Detect the project's state point schema.

Project.data

Get data associated with this project.

Project.doc

Get document associated with this project.

Project.document

Get document associated with this project.

Project.export_to(target[, path, copytree])

Export all jobs to a target location, such as a directory or a (compressed) archive file.

Project.find_jobs([filter])

Find all jobs in the project's workspace.

Project.fn(filename)

Prepend a filename with the project path.

Project.groupby([key, default])

Group jobs according to one or more state point or document parameters.

Project.import_from([origin, schema, sync, ...])

Import the data space located at origin into this project.

Project.isfile(filename)

Check if a filename exists in the project path.

Project.min_len_unique_id()

Determine the minimum length required for a job id to be unique.

Project.open_job([statepoint, id])

Get a job handle associated with a state point.

Project.path

The path to the project directory.

Project.repair([job_ids])

Attempt to repair the workspace after it got corrupted.

Project.stores

Get HDF5 stores associated with this project.

Project.sync(other[, strategy, exclude, ...])

Synchronize this project with the other project.

Project.update_cache()

Update the persistent state point cache.

Project.workspace

The project's workspace directory.

class signac.Project(path=None)

Bases: object

The handle on a signac project.

A Project may only be constructed in a directory that is already a signac project, i.e. a directory in which init_project() has already been run. To search upwards in the folder hierarchy until a project is found, instead invoke get_project() or Project.get_project().

Parameters:

path (str, optional) – The project directory. By default, the current working directory (Default value = None).

FN_CACHE = '.signac/statepoint_cache.json.gz'

The default filename for the state point cache file.

FN_DOCUMENT = 'signac_project_document.json'

The project’s document filename.

KEY_DATA = 'signac_data'

The project’s datastore key.

check()

Check the project’s workspace for corruption.

Raises:

signac.errors.JobsCorruptedError – When one or more jobs are identified as corrupted.

clone(job, copytree=None)

Clone job into this project.

Create an identical copy of job within this project.

See signac clone for the command line equivalent.

Parameters:
  • job (Job) – The job to copy into this project.

  • copytree (callable, optional) – The function used for copying directory tree structures. Uses shutil.copytree() if None (Default value = None). The function requires that the target is a directory.

Returns:

The job instance corresponding to the copied job.

Return type:

Job

Raises:

DestinationExistsError – In case that a job with the same id is already initialized within this project.

property config

Get project’s configuration.

The configuration is immutable once the Project is constructed. To modify a project configuration, use the command line or edit the configuration file directly.

See signac config for related command line tools.

Returns:

Dictionary containing project’s configuration.

Return type:

_ProjectConfig

create_linked_view(prefix=None, job_ids=None, path=None)

Create or update a persistent linked view of the selected data space.

Similar to export_to(), this function expands the data space for the selected jobs, but instead of copying data will create symbolic links to the individual job directories. This is primarily useful for browsing through the data space using a file-browser with human-interpretable directory paths.

By default, the paths of the view will be based on the variable state point keys that make up the implicit schema of the selected jobs. For example, creating a linked view for a data space with schema

>>> print(project.detect_schema())
{
 'foo': 'int([0, 1, 2, ..., 8, 9], 10)',
}

by calling project.create_linked_view('my_view') will look similar to:

my_view/foo/0/job -> workspace/b8fcc6b8f99c56509eb65568922e88b8
my_view/foo/1/job -> workspace/b6cd26b873ae3624653c9268deff4485
...

It is possible to control the paths using the path argument, which behaves in the exact same manner as the equivalent argument for export_to().

Note

The behavior of this function is almost equivalent to project.export_to('my_view', copytree=os.symlink) with the major difference that view hierarchies are actually updated, meaning that invalid links are automatically removed.

See signac view for the command line equivalent.

Parameters:
  • prefix (str, optional) – The path where the linked view will be created or updated (Default value = None).

  • job_ids (iterable, optional) – If None (the default), create the view for the complete data space, otherwise only for this iterable of job ids.

  • path (str or callable, optional) – The path (function) used to structure the linked data space (Default value = None).

Returns:

A dictionary that maps the source directory paths to the linked directory paths.

Return type:

dict

property data

Get data associated with this project.

This property should be used for large array-like data, which can’t be stored efficiently in the project document. For examples and usage, see Centralized Project Data.

Equivalent to:

return project.stores['signac_data']

See also

H5Store

Usage examples.

Returns:

An HDF5-backed datastore.

Return type:

H5Store

detect_schema(exclude_const=False, subset=None)

Detect the project’s state point schema.

See signac schema for the command line equivalent.

Parameters:
  • exclude_const (bool, optional) – Exclude all state point keys that are shared by all jobs within this project (Default value = False).

  • subset (sequence[Job or str], optional) – A sequence of jobs or job ids specifying a subset over which the state point schema should be detected (Default value = None).

Returns:

The detected project schema.

Return type:

ProjectSchema

property doc

Get document associated with this project.

Alias for document().

Returns:

The project document. Supports attribute-based access to dict keys.

Return type:

MutableMapping

property document

Get document associated with this project.

Returns:

The project document. Supports attribute-based access to dict keys.

Return type:

MutableMapping

export_to(target, path=None, copytree=None)

Export all jobs to a target location, such as a directory or a (compressed) archive file.

Use this function in combination with find_jobs() to export only a select number of jobs, for example:

project.find_jobs({'foo': 0}).export_to('foo_0.tar')

The path argument enables users to control how exactly the exported data space is to be expanded. By default, the path-function will be based on the implicit schema of the exported jobs. For example, exporting jobs that all differ by a state point key foo with project.export_to('data/'), the exported directory structure could look like this:

data/foo/0
data/foo/1
...

That would be equivalent to specifying path=lambda job: os.path.join('foo', job.sp.foo).

Instead of a function, we can also provide a string, where fields for state point keys are automatically formatted. For example, the following two path arguments are equivalent: “foo/{foo}” and “foo/{job.sp.foo}”.

Any attribute of job can be used as a field here, so job.doc.bar, job.id, and job.ws can also be used as path fields.

A special {{auto}} field allows us to expand the path automatically with state point keys that have not been specified explicitly. So, for example, one can provide path="foo/{foo}/{{auto}}" to specify that the path shall begin with foo/{foo}/, but is then automatically expanded with all other state point key-value pairs. How key-value pairs are concatenated can be controlled via the format-specifier, so for example, path="{{auto:_}}" will generate a structure such as

data/foo_0
data/foo_1
...

Finally, providing path=False is equivalent to path="{job.id}".
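
A short sketch combining these options (the state point key foo is hypothetical):

# Export into a gzipped tarball, nested first by 'foo' and then by
# all remaining state point keys:
project.export_to('data.tar.gz', path='foo/{foo}/{{auto}}')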

See also

import_from() :

Previously exported or non-signac data spaces can be imported.

signac export :

See signac export for the command line equivalent.

Parameters:
  • target (str) – A path to a directory to export to. The target can not already exist. Besides directories, possible targets are tar files (.tar), gzipped tar files (.tar.gz), zip files (.zip), bzip2-compressed files (.bz2), and xz-compressed files (.xz).

  • path (str or callable, optional) – The path (function) used to structure the exported data space. This argument must either be a callable which returns a path (str) as a function of job, a string where fields are replaced using the job-state point dictionary, or False, which means that we just use the job-id as path. Defaults to the equivalent of {{auto}}.

  • copytree (callable, optional) – The function used for copying directory tree structures. Uses shutil.copytree() if None (Default value = None). The function requires that the target is a directory.

Returns:

A dict that maps the source directory paths to the target directory paths.

Return type:

dict

find_jobs(filter=None)

Find all jobs in the project’s workspace.

The filter argument must be a JSON-serializable Mapping of key-value pairs. The filter argument can search against both job state points and job documents. See https://docs.signac.io/en/latest/query.html#query-namespaces for a description of supported queries.

See signac find for the command line equivalent.

Tip

To find a single job given a state point, use open_job with O(1) cost.

Tip

To find many groups of jobs, use your own code to loop through the project once and build multiple matching lists.

Warning

find_jobs costs O(N) each time it is called. It applies the filter to every job in the workspace.
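
For example, a minimal sketch (the state point key foo and the document key converged are hypothetical):

from collections import defaultdict

# Query the state point and document namespaces:
jobs_foo_0 = project.find_jobs({'foo': 0})
jobs_done = project.find_jobs({'doc.converged': True})

# Build several groups in a single pass instead of calling
# find_jobs once per group:
groups = defaultdict(list)
for job in project:
    groups[job.sp['foo']].append(job)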

Parameters:

filter (Mapping, optional) – A mapping of key-value pairs used for the query (Default value = None).

Returns:

JobsCursor of jobs matching the provided filter.

Return type:

JobsCursor

Raises:
  • TypeError – If the filters are not JSON serializable.

  • ValueError – If the filters are invalid.

fn(filename)

Prepend a filename with the project path.

Parameters:

filename (str) – The name of the file.

Returns:

The absolute path of the file.

Return type:

str

classmethod get_job(path=None)

Find a Job in or above the current working directory (or provided path).

Parameters:

path (str, optional) – The starting point to search for a job. If None, the current working directory is used (Default value = None).

Returns:

The first job found in or above the provided path.

Return type:

Job

Raises:

LookupError – If a job cannot be found.

classmethod get_project(path=None, search=True, **kwargs)

Find a project configuration and return the associated project.

Parameters:
  • path (str, optional) – The starting point to search for a project. If None, the current working directory is used (Default value = None).

  • search (bool, optional) – If True, search for project configurations inside and above the specified path, otherwise only return a project in the specified path (Default value = True).

  • **kwargs – Optional keyword arguments that are forwarded to the Project class constructor.

Returns:

An instance of Project.

Return type:

Project

Raises:

LookupError – If no project configuration can be found.

groupby(key=None, default=None)

Group jobs according to one or more state point or document parameters.

Prepend the key with ‘sp.’ or ‘doc.’ to specify the query namespace. If no prefix is specified, group by state point key.

This method can be called on any JobsCursor such as the one returned by find_jobs() or by iterating over a project.

Examples

# Group jobs by state point parameter 'a'.
for key, group in project.groupby('a'):
    print(key, list(group))

# Group jobs by document value 'a'.
for key, group in project.groupby('doc.a'):
    print(key, list(group))

# Group jobs by jobs.sp['a'] and job.document['b']
for key, group in project.groupby(('a', 'doc.b')):
    print(key, list(group))

# Find jobs where job.sp['a'] is 1 and group them
# by job.sp['b'] and job.sp['c'].
for key, group in project.find_jobs({'a': 1}).groupby(('b', 'c')):
    print(key, list(group))

# Group by job.sp['d'] and job.document['count'] using a lambda.
for key, group in project.groupby(
    lambda job: (job.sp['d'], job.document['count'])
):
    print(key, list(group))

If key is None, jobs are grouped by id, placing one job into each group.

If default is None, only jobs with the key defined will be grouped. Jobs without the key will be filtered out and not included in any group.

Parameters:
  • key (str, iterable, or callable, optional) – The grouping key(s) passed as a string, iterable of strings, or a callable that will be passed one argument, the job (Default value = None).

  • default (object, optional) – A default value to be used when a given key is not present. The value must be sortable and is only used if not None (Default value = None).

Yields:
  • key – Key identifying this group.

  • group (iterable of Jobs) – Iterable of Job instances matching this group.

import_from(origin=None, schema=None, sync=None, copytree=None)

Import the data space located at origin into this project.

This function will walk through the data space located at origin and will try to identify data space paths that can be imported as a job workspace into this project.

The schema argument expects a function that takes a path argument and returns a state point dictionary. A default function is used when no argument is provided. The default schema function will simply look for state point files – usually named signac_statepoint.json – and then import all data located within that path into the job workspace corresponding to the specified state point.

Alternatively, the schema argument may be a string that is converted into a schema function. For example, providing foo/{foo:int} as the schema argument means that all directories under foo/ will be imported and their names will be interpreted as the value for foo within the state point.
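
A minimal sketch of this usage (the origin path 'data/' is hypothetical):

# Interpret each directory name under data/foo/ as an integer value
# for the state point key 'foo':
project.import_from(origin='data/', schema='foo/{foo:int}')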

Tip

Use copytree=os.replace or copytree=shutil.move to move data spaces on import instead of copying them.

Warning

Imports can fail due to conflicts. Moving data instead of copying may therefore lead to inconsistent states, so users are advised to apply caution.

See also

export_to() : Export the project data space.

signac import :

See signac import for the command line equivalent.

Parameters:
  • origin (str, optional) – The path to the data space origin, which is to be imported. This may be a path to a directory, a zip file, or a tarball archive (Default value = None).

  • schema (callable, optional) – An optional schema function, which is either a string or a function that accepts a path as its first and only argument and returns the corresponding state point as dict. (Default value = None).

  • sync (bool or dict, optional) – If True, the project will be synchronized with the imported data space. If a dict of keyword arguments is provided, the arguments will be used for sync() (Default value = None).

  • copytree (callable, optional) – The function used for copying directory tree structures. Uses shutil.copytree() if None (Default value = None). The function requires that the target is a directory.

Returns:

A dict that maps the source directory paths to the target directory paths.

Return type:

dict

classmethod init_project(path=None)

Initialize a project in the provided directory.

It is safe to call this function multiple times with the same arguments. However, a RuntimeError is raised if an existing project configuration would conflict with the provided initialization parameters.

See signac init for the command line equivalent.

Parameters:

path (str, optional) – The directory for the project. Defaults to the current working directory.

Returns:

Initialized project, an instance of Project.

Return type:

Project

isfile(filename)

Check if a filename exists in the project path.

Parameters:

filename (str) – The name of the file.

Returns:

True if filename exists in the project path.

Return type:

bool

min_len_unique_id()

Determine the minimum length required for a job id to be unique.

This method’s runtime scales with the number of jobs in the workspace.

Returns:

Minimum string length of a unique job identifier.

Return type:

int
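
For example, a sketch that computes abbreviated but still unique job ids (such abbreviated ids are also accepted by open_job()):

n = project.min_len_unique_id()
short_ids = [job.id[:n] for job in project]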

open_job(statepoint=None, id=None)

Get a job handle associated with a state point.

This method returns the job instance associated with the given state point or job id. Opening a job by a valid state point never fails. Opening a job by id requires a lookup of the state point from the job id, which may fail if the job was not previously initialized.
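
For example, a minimal sketch (the state point key foo is hypothetical):

# Opening by state point never fails; init() creates the workspace
# directory if needed:
job = project.open_job({'foo': 0}).init()

# Opening by id requires that the job was previously initialized:
same_job = project.open_job(id=job.id)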

Parameters:
  • statepoint (dict, optional) – The job’s unique set of state point parameters (Default value = None).

  • id (str, optional) – The job id (Default value = None).

Returns:

The job instance.

Return type:

Job

Raises:
  • KeyError – If the attempt to open the job by id fails.

  • LookupError – If the attempt to open the job by an abbreviated id returns more than one match.

property path

The path to the project directory.

Type:

str

repair(job_ids=None)

Attempt to repair the workspace after it got corrupted.

This method will attempt to repair lost or corrupted job state point files using a state point cache.
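
For example, a sketch that combines check() and repair():

import signac

try:
    project.check()
except signac.errors.JobsCorruptedError:
    # Attempt to restore the corrupted state point files from the
    # persistent state point cache:
    project.repair()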

Parameters:

job_ids (iterable[str], optional) – An iterable of job ids that should get repaired. Defaults to all jobs.

Raises:

signac.errors.JobsCorruptedError – When one or more corrupted jobs could not be repaired.

property stores

Get HDF5 stores associated with this project.

Use this property to access an HDF5 file within the project directory using the H5Store dict-like interface.

This is an example for accessing an HDF5 file called 'my_data.h5' within the project directory:

project.stores['my_data']['array'] = np.random.rand(32, 4)

This is equivalent to:

H5Store(project.fn('my_data.h5'))['array'] = np.random.rand(32, 4)

Both the project.stores and the H5Store itself support attribute access. The above example could therefore also be expressed as:

project.stores.my_data.array = np.random.rand(32, 4)

Returns:

The HDF5 store manager for this project.

Return type:

H5StoreManager

sync(other, strategy=None, exclude=None, doc_sync=None, selection=None, **kwargs)

Synchronize this project with the other project.

Try to clone all jobs from the other project to this project. If a job is already part of this project, try to synchronize the job using the optionally specified strategies.

See signac sync for the command line equivalent.

Parameters:
  • other (Project) – The other project to synchronize this project with.

  • strategy (callable, optional) – A synchronization strategy for file conflicts. If no strategy is provided, a SyncConflict exception will be raised upon conflict (Default value = None).

  • exclude (str, optional) – A filename exclude pattern. All files matching this pattern will be excluded from synchronization (Default value = None).

  • doc_sync (attribute or callable from DocSync, optional) – A synchronization strategy for document keys. If this argument is None, by default no keys will be synchronized upon conflict (Default value = None).

  • selection (sequence of Job or job ids (str), optional) – Only synchronize the given selection of jobs (Default value = None).

  • **kwargs – This method also accepts the same keyword arguments as the sync_projects() function.

Raises:
  • DocumentSyncConflict – If there are conflicting keys within the project or job documents that cannot be resolved with the given strategy or if there is no strategy provided.

  • FileSyncConflict – If there are differing files that cannot be resolved with the given strategy or if no strategy is provided.

  • SchemaSyncConflict – In case that the check_schema argument is True and the detected state point schema of this and the other project differ.

temporary_project(dir=None)

Context manager for the initialization of a temporary project.

The temporary project is by default created within the parent project’s workspace to ensure that they share the same file system. This is an example for how this method can be used for the import and synchronization of external data spaces.

with project.temporary_project() as tmp_project:
    tmp_project.import_from('/data')
    project.sync(tmp_project)

Parameters:

dir (str, optional) – Optionally specify where the temporary project directory is to be created. Defaults to the project’s workspace directory.

Returns:

An instance of Project.

Return type:

Project

to_dataframe(*args, **kwargs)

Export the project metadata to a pandas DataFrame.

The arguments to this function are forwarded to JobsCursor.to_dataframe().

Return type:

DataFrame

update_cache()

Update the persistent state point cache.

This function updates a persistent state point cache, which is stored in the project directory. Most data space operations, including iteration, filtering, and selection, are expected to be significantly faster after calling this function, especially for large data spaces.

property workspace

The project’s workspace directory.

Type:

str

The JobsCursor class

Attributes

JobsCursor.export_to(target[, path, copytree])

Export all jobs to a target location, such as a directory or a (zipped) archive file.

JobsCursor.groupby([key, default])

Group jobs according to one or more state point or document parameters.

JobsCursor.to_dataframe([sp_prefix, ...])

Convert the selection of jobs to a pandas DataFrame.

class signac.project.JobsCursor(project, filter=None)

Bases: object

An iterator over a search query result.

Application developers should not directly instantiate this class, but use find_jobs() instead.

Enables simple iteration and grouping operations.

Warning

JobsCursor caches the jobs that match the filter. Call Project.find_jobs again to update the search after making changes to jobs or the workspace that would change the result of the search.

Parameters:
  • project (Project) – Project handle.

  • filter (Mapping) – A mapping of key-value pairs used for the query (Default value = None).

export_to(target, path=None, copytree=None)

Export all jobs to a target location, such as a directory or a (zipped) archive file.

See also

export_to()

For full details on how to use this function.

Parameters:
  • target (str) – A path to a directory or archive file to export to.

  • path (str or callable) – The path (function) used to structure the exported data space (Default value = None).

  • copytree (callable, optional) – The function used for copying directory tree structures. Uses shutil.copytree() if None (Default value = None). The function requires that the target is a directory.

Returns:

A dictionary that maps the source directory paths to the target directory paths.

Return type:

dict

groupby(key=None, default=None)

Group jobs according to one or more state point or document parameters.

Prepend the key with ‘sp.’ or ‘doc.’ to specify the query namespace. If no prefix is specified, group by state point key.

This method can be called on any JobsCursor such as the one returned by find_jobs() or by iterating over a project.

Examples

# Group jobs by state point parameter 'a'.
for key, group in project.groupby('a'):
    print(key, list(group))

# Group jobs by document value 'a'.
for key, group in project.groupby('doc.a'):
    print(key, list(group))

# Group jobs by jobs.sp['a'] and job.document['b']
for key, group in project.groupby(('a', 'doc.b')):
    print(key, list(group))

# Find jobs where job.sp['a'] is 1 and group them
# by job.sp['b'] and job.sp['c'].
for key, group in project.find_jobs({'a': 1}).groupby(('b', 'c')):
    print(key, list(group))

# Group by job.sp['d'] and job.document['count'] using a lambda.
for key, group in project.groupby(
    lambda job: (job.sp['d'], job.document['count'])
):
    print(key, list(group))

If key is None, jobs are grouped by id, placing one job into each group.

If default is None, only jobs with the key defined will be grouped. Jobs without the key will be filtered out and not included in any group.

Parameters:
  • key (str, iterable, or callable, optional) – The grouping key(s) passed as a string, iterable of strings, or a callable that will be passed one argument, the job (Default value = None).

  • default (object, optional) – A default value to be used when a given key is not present. The value must be sortable and is only used if not None (Default value = None).

Yields:
  • key – Key identifying this group.

  • group (iterable of Jobs) – Iterable of Job instances matching this group.

to_dataframe(sp_prefix='sp.', doc_prefix='doc.', usecols=None, flatten=False)

Convert the selection of jobs to a pandas DataFrame.

This function exports the job metadata to a pandas.DataFrame. All state point and document keys are prefixed by default to be able to distinguish them.

Parameters:
  • sp_prefix (str, optional) – Prefix state point keys with the given string. Defaults to “sp.”.

  • doc_prefix (str, optional) – Prefix document keys with the given string. Defaults to “doc.”.

  • usecols (list-like or callable, optional) – Used to select a subset of columns. If list-like, must contain strings corresponding to the column names that should be included. For example, ['sp.a', 'doc.notes']. If callable, the column will be included if the function called on the column name returns True. For example, lambda x: 'sp.' in x. Defaults to None, which uses all columns from the state point and document. Note that this filter is applied after the doc and sp prefixes are added to the column names.

  • flatten (bool, optional) – Whether nested state points or document keys should be flattened. If True, {'a': {'b': 'c'}} becomes a column named a.b with value c. If False, it becomes a column named a with value {'b': 'c'}. Defaults to False.

Returns:

A pandas DataFrame with all job metadata.

Return type:

DataFrame
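
For example, a sketch that restricts the columns to state point keys (the column names depend on the actual schema):

# Keep only state point columns and flatten nested keys, so that
# {'a': {'b': 1}} becomes a column named 'sp.a.b':
df = project.find_jobs().to_dataframe(
    usecols=lambda col: col.startswith('sp.'),
    flatten=True,
)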

The Job class

Attributes

Job.cached_statepoint

Get a copy of the job's state point as a read-only mapping.

Job.clear()

Remove all job data, but not the job itself.

Job.close()

Close the job and switch to the previous working directory.

Job.data

Get data associated with this job.

Job.doc

Alias for document.

Job.document

Get document associated with this job.

Job.fn(filename)

Prepend a filename with the job path.

Job.id

Get the unique identifier for the job's state point.

Job.init([force, validate_statepoint])

Initialize the job's workspace directory.

Job.isfile(filename)

Check if a filename exists in the job directory.

Job.move(project)

Move this job to project.

Job.open()

Enter the job's workspace directory.

Job.path

The path to the job directory.

Job.project

Get the project that contains this job.

Job.remove()

Remove the job's workspace including the job document.

Job.reset()

Remove all job data, but not the job itself.

Job.sp

Alias for statepoint.

Job.statepoint

Get or set the job's state point.

Job.stores

Get HDF5 stores associated with this job.

Job.sync(other[, strategy, exclude, doc_sync])

Perform a one-way synchronization of this job with the other job.

Job.update_statepoint(update[, overwrite])

Change the state point of this job while preserving job data.

class signac.job.Job(project, statepoint=None, id_=None, directory_known=False)

Bases: object

The job instance is a handle to the data of a unique state point.

Application developers should not directly instantiate this class, but use open_job() instead.

Jobs can be opened by statepoint or id_. If both values are provided, it is the user's responsibility to ensure that the values correspond. Set directory_known to True when the job directory is known to exist; this skips some expensive isdir checks.

Parameters:
  • project (Project) – Project handle.

  • statepoint (dict, optional) – State point for the job. (Default value = None)

  • id_ (str, optional) – The job identifier. (Default value = None)

  • directory_known (bool, optional) – Set to true when the job directory is known to exist. (Default value = False)

FN_DOCUMENT = 'signac_job_document.json'

The job’s document filename.

FN_STATE_POINT = 'signac_statepoint.json'

The job’s state point filename.

The job state point is a human-readable file containing the job’s state point that is stored in each job’s workspace directory.

KEY_DATA = 'signac_data'

The job’s datastore key.

property cached_statepoint

Get a copy of the job’s state point as a read-only mapping.

cached_statepoint uses the state point cache to provide fast access to the job’s state point for reading.

Note

Create and update the state point cache by calling project.update_cache or running signac update-cache on the command line.

See also

Use statepoint to modify the job’s state point.

Returns:

Returns the job’s state point.

Return type:

Mapping

clear()

Remove all job data, but not the job itself.

This function will do nothing if the job was not previously initialized.

See signac rm -c for the command line equivalent.

close()

Close the job and switch to the previous working directory.

property data

Get data associated with this job.

This property should be used for large array-like data, which can’t be stored efficiently in the job document. For examples and usage, see Job Data Storage.

Equivalent to:

return job.stores['signac_data']

Returns:

An HDF5-backed datastore.

Return type:

H5Store

property doc

Alias for document.

Warning

Even deep copies of doc will modify the same file, so changes will still effectively be persisted between deep copies. If you need a deep copy that will not modify the underlying persistent JSON file, use the call operator to get an equivalent plain dictionary: job.doc().

See signac document for the command line equivalent.

Returns:

The job document handle. Supports attribute-based access to dict keys.

Return type:

MutableMapping

property document

Get document associated with this job.

Warning

Even deep copies of document will modify the same file, so changes will still effectively be persisted between deep copies. If you need a deep copy that will not modify the underlying persistent JSON file, use the call operator to get an equivalent plain dictionary: job.document(). For more information, see JSONDict.

See signac document for the command line equivalent.

Returns:

The job document handle. Supports attribute-based access to dict keys.

Return type:

MutableMapping

fn(filename)

Prepend a filename with the job path.

Parameters:

filename (str) – The name of the file.

Returns:

The absolute path to the file.

Return type:

str

property id

Get the unique identifier for the job’s state point.

Returns:

The job id.

Return type:

str

init(force=False, validate_statepoint=True)

Initialize the job’s workspace directory.

This function will do nothing if the directory and the job state point already exist and the state point is valid.

Returns the calling job.

See signac job -c for the command line equivalent.

Parameters:
  • force (bool, optional) – Overwrite any existing state point files, e.g., to repair them if they got corrupted (Default value = False).

  • validate_statepoint (bool, optional) – When True (the default), load the job state point and ensure that it matches the id. When False, exit early when the job directory exists.

Returns:

The job handle.

Return type:

Job

Raises:
  • OSError – If the workspace directory cannot be created or any other I/O error occurs when attempting to save the state point file.

  • JobsCorruptedError – If the job state point on disk is corrupted.

isfile(filename)

Check if a filename exists in the job directory.

Parameters:

filename (str) – The name of the file.

Returns:

True if filename exists in the job directory.

Return type:

bool

move(project)

Move this job to project.

This function will attempt to move this instance of job from its original project to a different project.

See signac move for the command line equivalent.

Parameters:

project (Project) – The project to move this job to.

open()

Enter the job’s workspace directory.

You can use the Job class as a context manager:

with project.open_job(my_statepoint) as job:
    # Manipulate your job data
    pass

Opening the context will switch into the job’s workspace, leaving it will switch back to the previous working directory.

property path

The path to the job directory.

See signac job -w for the command line equivalent.

Type:

str

property project

Get the project that contains this job.

Returns:

Returns the project containing this job.

Return type:

signac.Project

remove()

Remove the job’s workspace including the job document.

This function will do nothing if the workspace directory does not exist.

See signac rm for the command line equivalent.

reset()

Remove all job data, but not the job itself.

This function will initialize the job if it was not previously initialized.

property sp

Alias for statepoint.

property statepoint

Get or set the job’s state point.

Setting the state point to a different value will change the job id.

For more information, see Modifying the State Point.

Tip

Use cached_statepoint for fast read-only access to the state point.

Warning

The state point object behaves like a dictionary in most cases, but because it persists changes to the filesystem, making a copy requires explicitly converting it to a dict. If you need a modifiable copy that will not modify the underlying JSON file, you can access a dict copy of the state point by calling it, e.g. sp_dict = job.statepoint() instead of sp = job.statepoint. For more information, see JSONAttrDict.

See signac statepoint for the command line equivalent.

Danger

Use this function with caution! Resetting a job’s state point may sometimes be necessary, but can possibly lead to incoherent data spaces.

Returns:

Returns the job’s state point. Supports attribute-based access to dict keys.

Return type:

MutableMapping

property stores

Get HDF5 stores associated with this job.

Use this property to access an HDF5 file within the job’s workspace directory using the H5Store dict-like interface.

This is an example for accessing an HDF5 file called ‘my_data.h5’ within the job’s workspace:

job.stores['my_data']['array'] = np.random.rand(32, 4)

This is equivalent to:

H5Store(job.fn('my_data.h5'))['array'] = np.random.rand(32, 4)

Both the stores and the H5Store itself support attribute access. The above example could therefore also be expressed as:

job.stores.my_data.array = np.random.rand(32, 4)

Returns:

The HDF5-Store manager for this job.

Return type:

H5StoreManager

sync(other, strategy=None, exclude=None, doc_sync=None, **kwargs)

Perform a one-way synchronization of this job with the other job.

By default, this method will synchronize all files and document data from the other job to this job until a synchronization conflict occurs. There are two different kinds of synchronization conflicts:

  1. The two jobs have files with the same name, but different content.

  2. The two jobs have documents that share keys, but those keys are associated with different values.

A file conflict can be resolved by providing a ‘FileSync’ strategy or by excluding files from the synchronization. An unresolvable conflict is indicated by raising a FileSyncConflict exception.

A document synchronization conflict can be resolved by providing a doc_sync function that takes the source and the destination document as first and second argument.
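
For example, a sketch using the default strategies from the signac.sync module (dst_job and src_job are hypothetical job handles):

from signac import sync

# Overwrite files whose source copy was modified more recently, and
# merge the documents key by key:
dst_job.sync(src_job, strategy=sync.FileSync.update,
             doc_sync=sync.DocSync.ByKey())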

Parameters:
  • other (Job) – The other job to synchronize from.

  • strategy (callable, optional) – A synchronization strategy for file conflicts. If no strategy is provided, a SyncConflict exception will be raised upon conflict (Default value = None).

  • exclude (str, optional) – A filename exclude pattern. All files matching this pattern will be excluded from synchronization (Default value = None).

  • doc_sync (attribute or callable from DocSync, optional) – A synchronization strategy for document keys. If this argument is None, by default no keys will be synchronized upon conflict (Default value = None).

  • dry_run (bool, optional) – If True, do not actually perform the synchronization.

  • **kwargs – Extra keyword arguments will be forwarded to the sync_jobs() function which actually executes the synchronization operation.

Raises:

FileSyncConflict – In case that a file synchronization results in a conflict.

update_statepoint(update, overwrite=False)

Change the state point of this job while preserving job data.

By default, this method will not change existing parameters of the state point of the job.

This method will change the job id if the state point has been altered.

For more information, see Modifying the State Point.

Warning

While appending to a job’s state point is generally safe, modifying existing parameters may lead to data inconsistency. Use the overwrite argument with caution!

Parameters:
  • update (dict) – A mapping used for the state point update.

  • overwrite (bool, optional) – If False, an error will be raised if the update modifies the values of existing keys in the state point. If True, any existing keys will be overwritten in the same way as dict.update(). Use with caution! (Default value = False).

Raises:
  • KeyError – If the update contains keys which are already part of the job’s state point and overwrite is False.

  • DestinationExistsError – If a job associated with the new state point is already initialized.

  • OSError – If the move failed due to an unknown system related error.
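
For example, a minimal sketch (the keys a and b are hypothetical):

job = project.open_job({'a': 0}).init()

# Safe: adds a new key without modifying existing parameters.
job.update_statepoint({'b': 1})

# Changing an existing key raises a KeyError unless overwrite=True.
# Note that either update changes the job id:
job.update_statepoint({'a': 1}, overwrite=True)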

The JSONDict

This class implements the interface for the job’s statepoint and document attributes, but can also be used on its own.

signac.JSONDict

alias of BufferedJSONAttrDict

The H5Store

This class implements the interface to the job’s data attribute, but can also be used on its own.

class signac.H5Store(filename, **kwargs)

An HDF5-backed container for storing array-like and dictionary-like data.

The H5Store is a MutableMapping and therefore behaves similarly to a dict, but all data is stored persistently in the associated HDF5 file on disk.

Supported types include:

  • built-in types (int, float, str, bool, NoneType, array)

  • numpy arrays

  • pandas data frames (requires pandas and pytables)

  • mappings with values that are supported types

Values can be accessed as attributes (h5s.foo) or via key index (h5s['foo']).

Examples

>>> from signac import H5Store
>>> with H5Store('file.h5') as h5s:
...     h5s['foo'] = 'bar'
...     assert 'foo' in h5s
...     assert h5s.foo == 'bar'
...     assert h5s['foo'] == 'bar'

The H5Store can be used as a context manager to ensure that the underlying file is opened; however, most built-in types (excluding arrays) can be read and stored without the need to explicitly open the file. To access arrays (reading or writing), the file must always be opened!

To open a file in read-only mode, use the open() method with mode='r':

>>> with H5Store('file.h5').open(mode='r') as h5s:
...     pass

Parameters:
  • filename (str) – The filename of the underlying HDF5 file.

  • **kwargs – Additional keyword arguments to be forwarded to the h5py.File constructor. See the h5py.File documentation for more information.

clear()

Remove all data from this store.

Danger

All data will be removed, this action cannot be reversed!

close()

Close the underlying HDF5 file.

property file

Access the underlying instance of h5py.File.

This property exposes the underlying h5py.File object, enabling use of functions such as h5py.Group.create_dataset() or h5py.Group.require_dataset().

Note

The store must be open to access this property!

Returns:

The h5py.File object that this store is operating on.

Return type:

h5py.File

Raises:

H5StoreClosedError – If the store is closed.

property filename

Return the H5Store filename.

flush()

Flush the underlying HDF5 file.

get(k[, d])

Return D[k] if k in D, else d. d defaults to None.

items()

Return a set-like object providing a view on D's items.

keys()

Return a set-like object providing a view on D's keys.

property mode

Return the default opening mode of this H5Store.

open(mode=None)

Open the underlying HDF5 file.

Parameters:

mode (str) – The file open mode to use. Defaults to ‘a’ (append).

Returns:

This H5Store instance.

Return type:

H5Store

pop(k[, d])

Remove the specified key and return the corresponding value. If the key is not found, d is returned if given, otherwise a KeyError is raised.

popitem()

Remove and return some (key, value) pair as a 2-tuple; raise a KeyError if D is empty.

setdefault(key, value)

Set a value for a key if that key is not already set.

update([E, ]**F)

Update D from mapping/iterable E and F. If E is present and has a .keys() method, this is equivalent to: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, it is equivalent to: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.

values()

Return an object providing a view on D's values.

The H5StoreManager

This class implements the interface to the job’s stores attribute, but can also be used on its own.

class signac.H5StoreManager(prefix)

Bases: _DictManager

Helper class to manage multiple instances of H5Store within a directory.

Parameters:

prefix (str) – The directory prefix shared by all files managed by this class.

Examples

Assuming that the stores/ directory exists:

>>> stores = H5StoreManager('stores/')
>>> stores.data
<H5Store(filename=stores/data.h5)>
>>> stores.data.foo = True
>>> dict(stores.data)
{'foo': True}

cls

alias of H5Store

keys()

Return an iterable of keys.

property prefix

Return the prefix.

Top-level functions

The signac framework aids in the management of large and heterogeneous data spaces.

It provides a simple and robust data model to create a well-defined, indexable storage layout for data and metadata. This makes it easier to operate on large data spaces, streamlines post-processing and analysis, and makes data collectively accessible.

signac.TemporaryProject(cls=None, **kwargs)

Context manager for the generation of a temporary project.

This is a factory function that creates a Project within a temporary directory and must be used as context manager, for example like this:

with TemporaryProject() as tmp_project:
    tmp_project.import_from('/data')

Parameters:
  • cls (object, optional) – The class of the temporary project. Defaults to Project.

  • **kwargs – Optional keyword arguments that are forwarded to the TemporaryDirectory class constructor, which is used to create a temporary project directory.

Yields:

Project – An instance of Project.

signac.buffered(buffer_capacity=None)

Enter context to buffer all operations for this backend.

Parameters:

buffer_capacity (int) – The capacity of the buffer to use within this context (resets after the context is exited).
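
For example, a minimal sketch:

import signac

project = signac.get_project()
with signac.buffered():
    for job in project:
        # Writes to the document are buffered and flushed when the
        # context exits:
        job.doc['processed'] = True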

signac.diff_jobs(*jobs)

Find differences among a list of jobs’ state points.

The resulting diff is a dictionary where the keys are job ids and the values are each job’s state point minus the intersection of all provided jobs’ state points. The comparison is performed over the combined set of keys and values.

See signac diff for the command line equivalent.

Parameters:

*jobs (sequence[Job]) – Sequence of jobs to diff.

Returns:

A dictionary where the keys are job ids and values are the unique parts of that job’s state point.

Return type:

dict

Examples

>>> import signac
>>> project = signac.init_project()
>>> job1 = project.open_job({'constant': 42, 'diff1': 0, 'diff2': 1}).init()
>>> job2 = project.open_job({'constant': 42, 'diff1': 1, 'diff2': 1}).init()
>>> job3 = project.open_job({'constant': 42, 'diff1': 2, 'diff2': 2}).init()
>>> print(job1)
c4af2b26f1fd256d70799ad3ce3bdad0
>>> print(job2)
b96b21fada698f8934d58359c72755c0
>>> print(job3)
e4289419d2b0e57e4852d44a09f167c0
>>> signac.diff_jobs(job1, job2, job3)
{'c4af2b26f1fd256d70799ad3ce3bdad0': {'diff2': 1, 'diff1': 0},
'b96b21fada698f8934d58359c72755c0': {'diff2': 1, 'diff1': 1},
'e4289419d2b0e57e4852d44a09f167c0': {'diff2': 2, 'diff1': 2}}
>>> signac.diff_jobs(*project)
{'c4af2b26f1fd256d70799ad3ce3bdad0': {'diff2': 1, 'diff1': 0},
'b96b21fada698f8934d58359c72755c0': {'diff2': 1, 'diff1': 1},
'e4289419d2b0e57e4852d44a09f167c0': {'diff2': 2, 'diff1': 2}}

signac.get_buffer_capacity()

Get the current buffer capacity.

Returns:

The amount of data that can be stored before a flush is triggered in the appropriate units for a particular buffering implementation.

Return type:

int

signac.get_current_buffer_size()

Get the total amount of data currently stored in the buffer.

Returns:

The size of all data contained in the buffer in the appropriate units for a particular buffering implementation.

Return type:

int

signac.get_job(path=None)

Find a Job in or above the provided path (or the current working directory).

Parameters:

path (str, optional) – The starting point to search for a job. If None, the current working directory is used (Default value = None).

Returns:

The first job found in or above the provided path.

Return type:

Job

Raises:

LookupError – If a job cannot be found.

Examples

When the current directory is a job directory:

>>> signac.get_job()
signac.job.Job(project=..., statepoint={...})

signac.get_project(path=None, search=True, **kwargs)

Find a project configuration and return the associated project.

Parameters:
  • path (str, optional) – The starting point to search for a project. If None, the current working directory is used (Default value = None).

  • search (bool, optional) – If True, search for project configurations inside and above the specified path, otherwise only return a project in the specified path (Default value = True).

  • **kwargs – Optional keyword arguments that are forwarded to Project.get_project().

Returns:

An instance of Project.

Return type:

Project

Raises:

LookupError – If no project configuration can be found.

signac.init_project(path=None)

Initialize a project.

It is safe to call this function multiple times with the same arguments. However, a RuntimeError is raised if an existing project configuration would conflict with the provided initialization parameters.

Parameters:

path (str, optional) – The directory for the project. Defaults to the current working directory.

Returns:

The initialized project instance.

Return type:

Project

Raises:

RuntimeError – If the project path already contains a conflicting project configuration.

signac.is_buffered()

Check if this backend is currently buffered.

signac.set_buffer_capacity(new_capacity)

Update the buffer capacity.

Parameters:

new_capacity (int) – The new capacity of the buffer in the appropriate units for a particular buffering implementation.

Submodules

signac.sync module

Synchronization of jobs and projects.

Jobs may be synchronized by copying all data from the source job to the destination job. This means all files are copied and the documents are synchronized. Conflicts, i.e. cases in which both jobs contain conflicting data, may be resolved with a user-defined strategy.

The synchronization of projects is in essence the synchronization of all jobs in the destination project with those in the source project, together with the synchronization of the project document. If a specific job does not yet exist at the destination, it is simply cloned; otherwise it is synchronized.

A sync strategy is a function (or functor) that takes the source job, the destination job, and the name of the file generating the conflict as arguments, and returns the decision whether to overwrite the file as a Boolean. Some default strategies are defined within this module as part of the FileSync class:

  1. always – Always overwrite on conflict.

  2. never – Never overwrite on conflict.

  3. update – Overwrite when the modification time of the source file is newer.

  4. Ask – Ask the user interactively about each conflicting filename.

For example, to synchronize two projects resolving conflicts by modification time, use:

dest_project.sync(source_project, strategy=sync.FileSync.update)

Unlike files, which are always either overwritten as a whole or not, documents can be synchronized in a more fine-grained manner with a sync function. Such a function (or functor) takes the source and the destination document as arguments and performs the synchronization. The user is encouraged to implement their own sync functions, but there are a few default functions implemented as part of the DocSync class:

  1. NO_SYNC – Do not perform any synchronization.

  2. COPY – Apply the same strategy used to resolve file conflicts.

  3. update – Equivalent to dst.update(src).

  4. ByKey – Synchronize the source document key by key, more information below.

This is how we could synchronize two jobs, where the documents are synchronized with a simple update function:

dst_job.sync(src_job, doc_sync=sync.DocSync.update)

The DocSync.ByKey functor attempts to synchronize the destination document with the source document without overwriting any data. That means this function behaves similarly to update() for a non-intersecting set of keys, but in addition will preserve nested mappings without overwriting values. Furthermore, any key conflict, that is, keys that are present in both documents but with differing data, will lead to a DocumentSyncConflict exception. The user may explicitly decide to overwrite certain keys by providing a “key-strategy”, which is a function that takes the conflicting key as argument and returns the decision whether to overwrite that specific key as a Boolean. For example, to sync two jobs where conflicting keys should only be overwritten if they contain the term ‘foo’, we could execute:

dst_job.sync(src_job, doc_sync=sync.DocSync.ByKey(lambda key: 'foo' in key))

This means that all documents are synchronized ‘key-by-key’ and only conflicting keys that contain the word “foo” will be overwritten; any other conflict would lead to a DocumentSyncConflict exception. A key-strategy may also be a regular expression, so the synchronization above could also be achieved with:

dst_job.sync(src_job, doc_sync=sync.DocSync.ByKey('foo'))

class signac.sync.DocSync

Bases: object

Collection of document synchronization functions.

class ByKey(key_strategy=None)

Bases: object

Synchronize documents key by key.

COPY = 'copy'

Copy (and potentially overwrite) documents like any other file.

NO_SYNC = False

Do not synchronize documents.

static update(src, dst)

Perform a simple update.

class signac.sync.FileSync

Bases: object

Collection of file synchronization strategies.

class Ask

Bases: object

Resolve sync conflicts by asking whether a file should be overwritten interactively.

static always(src, dst, fn)

Resolve sync conflicts by always overwriting.

classmethod keys()

Return keys.

static never(src, dst, fn)

Resolve sync conflicts by never overwriting.

static update(src, dst, fn)

Resolve sync conflicts based on newest modified timestamp.

signac.sync.sync_jobs(src, dst, strategy=None, exclude=None, doc_sync=None, recursive=False, follow_symlinks=True, preserve_permissions=False, preserve_times=False, preserve_owner=False, preserve_group=False, deep=False, dry_run=False)

Synchronize the dst job with the src job.

By default, this method will synchronize all files and document data of the dst job with the src job until a synchronization conflict occurs. There are two different kinds of synchronization conflicts:

  1. The two jobs have files with the same name, but different content.

  2. The two jobs have documents that share keys, but those keys are mapped to different values.

A file conflict can be resolved by providing a ‘FileSync’ strategy or by excluding files from the synchronization. An unresolvable conflict is indicated by raising a FileSyncConflict exception.

A document synchronization conflict can be resolved by providing a doc_sync function that takes the source and the destination document as first and second argument.

Parameters:
  • src (Job) – The src job, data will be copied from this job’s workspace.

  • dst (Job) – The dst job, data will be copied to this job’s workspace.

  • strategy (callable, optional) – A synchronization strategy for file conflicts. The strategy should be a callable with signature strategy(src, dst, filepath) where src and dst are the source and destination instances of Project and filepath is the filepath relative to the project path. If no strategy is provided, a errors.SyncConflict exception will be raised upon conflict. (Default value = None)

  • exclude (str, optional) – A filename exclusion pattern. All files matching this pattern will be excluded from the synchronization process. (Default value = None)

  • doc_sync (attribute or callable from DocSync, optional) – A synchronization strategy for document keys. The default is to use a safe key-by-key strategy that will not overwrite any values on conflict, but instead raises a DocumentSyncConflict exception.

  • recursive (bool, optional) – Recursively synchronize sub-directories encountered within the job workspace directories. (Default value = False)

  • follow_symlinks (bool, optional) – Follow and copy the target of symbolic links. (Default value = True)

  • preserve_permissions (bool, optional) – Preserve file permissions (Default value = False)

  • preserve_times (bool, optional) – Preserve file modification times (Default value = False)

  • preserve_owner (bool, optional) – Preserve file owner (Default value = False)

  • preserve_group (bool, optional) – Preserve file group ownership (Default value = False)

  • dry_run (bool, optional) – If True, do not actually perform any synchronization operations. (Default value = False)

  • deep (bool, optional) – (Default value = False)

signac.sync.sync_projects(source, destination, strategy=None, exclude=None, doc_sync=None, selection=None, check_schema=True, recursive=False, follow_symlinks=True, preserve_permissions=False, preserve_times=False, preserve_owner=False, preserve_group=False, deep=False, dry_run=False, parallel=False, collect_stats=False)

Synchronize the destination project with the source project.

Try to clone all jobs from the source to the destination. If the destination job already exists, try to synchronize the job using the optionally specified strategy.

Parameters:
  • source (Project) – The project presenting the source for synchronization.

  • destination (Project) – The project that is modified for synchronization.

  • strategy (callable, optional) – A synchronization strategy for file conflicts. The strategy should be a callable with signature strategy(src, dst, filepath) where src and dst are the source and destination instances of Project and filepath is the filepath relative to the project path. If no strategy is provided, a errors.SyncConflict exception will be raised upon conflict. (Default value = None)

  • exclude (str, optional) – A filename exclusion pattern. All files matching this pattern will be excluded from the synchronization process. (Default value = None)

  • doc_sync (attribute or callable from DocSync) – A synchronization strategy for document keys. The default is to use a safe key-by-key strategy that will not overwrite any values on conflict, but instead raises a DocumentSyncConflict exception.

  • selection (sequence of Job or job ids (str), optional) – Only synchronize the given selection of jobs. (Default value = None)

  • check_schema (bool, optional) – If True, only synchronize if this and the other project have a matching state point schema. See also: detect_schema(). (Default value = True)

  • recursive (bool, optional) – Recursively synchronize sub-directories encountered within the job workspace directories. (Default value = False)

  • follow_symlinks (bool, optional) – Follow and copy the target of symbolic links. (Default value = True)

  • preserve_permissions (bool, optional) – Preserve file permissions (Default value = False)

  • preserve_times (bool, optional) – Preserve file modification times (Default value = False)

  • preserve_owner (bool, optional) – Preserve file owner (Default value = False)

  • preserve_group (bool, optional) – Preserve file group ownership (Default value = False)

  • dry_run (bool, optional) – If True, do not actually perform the synchronization operation, just log what would happen theoretically. Useful to test synchronization strategies without the risk of data loss. (Default value = False)

  • deep (bool, optional) – (Default value = False)

  • parallel (bool, optional) – (Default value = False)

  • collect_stats (bool, optional) – (Default value = False)

Returns:

Returns stats if collect_stats is True, else None.

Return type:

NoneType or FileTransferStats

Raises:
  • DocumentSyncConflict – If there are conflicting keys within the project or job documents that cannot be resolved with the given strategy or if there is no strategy provided.

  • FileSyncConflict – If there are differing files that cannot be resolved with the given strategy or if no strategy is provided.

  • SchemaSyncConflict – In case that the check_schema argument is True and the detected state point schema of this and the other project differ.

signac.errors module

Errors raised by signac.

exception signac.errors.ConfigError

Bases: Error, RuntimeError

Error with parsing or reading a configuration file.

exception signac.errors.DestinationExistsError(destination)

Bases: Error, RuntimeError

The destination for a move or copy operation already exists.

Parameters:

destination (str) – The destination causing the error.

exception signac.errors.DocumentSyncConflict(keys)

Bases: SyncConflict

Raised when a synchronization operation fails due to a document conflict.

keys

The keys that caused the conflict.

exception signac.errors.Error

Bases: Exception

Base class used for signac Errors.

exception signac.errors.FileSyncConflict(filename)

Bases: SyncConflict

Raised when a synchronization operation fails due to a file conflict.

filename

The filename of the file that caused the conflict.

exception signac.errors.H5StoreAlreadyOpenError

Bases: Error, OSError

Indicates that the underlying HDF5 file is already open.

exception signac.errors.H5StoreClosedError

Bases: Error, RuntimeError

Raised when trying to access a closed HDF5 file.

exception signac.errors.IncompatibleSchemaVersion

Bases: Error

The project’s schema version is incompatible with this version of signac.

exception signac.errors.InvalidKeyError

Bases: ValueError

Raised when a user uses a non-conforming key.

exception signac.errors.JobsCorruptedError(job_ids)

Bases: Error, RuntimeError

The state point file of one or more jobs cannot be opened or is corrupted.

Parameters:

job_ids – The job id(s) of the corrupted job(s).

exception signac.errors.KeyTypeError

Bases: TypeError

Raised when a user uses a key of invalid type.

exception signac.errors.SchemaSyncConflict(schema_src, schema_dst)

Bases: SyncConflict

Raised when a synchronization operation fails due to schema differences.

exception signac.errors.StatepointParsingError

Bases: Error, RuntimeError

Indicates an error that occurred while trying to identify a state point.

exception signac.errors.SyncConflict

Bases: Error, RuntimeError

Raised when a synchronization operation fails.

exception signac.errors.WorkspaceError(error)

Bases: Error, OSError

Raised when there is an issue creating or accessing the workspace.

Parameters:

error – The underlying error causing this issue.