\n",
"\n",
"Please note: This notebook requires that you first run [signac_101_Getting_Started](signac_101_Getting_Started.ipynb).\n",
"\n",
"\n",
"## Finding jobs\n",
"\n",
"In [section one](signac_101_Getting_Started.ipynb) of this tutorial, we evaluated the ideal gas equation and stored the results in the *job document* and in a file called `V.txt`.\n",
"Let's now have a look at how we can explore our data space for basic and advanced analysis.\n",
"\n",
"We already saw how to iterate over the *complete* data space using the \"`for job in project`\" expression.\n",
"This is shorthand for \"`for job in project.find_jobs()`\", meaning: \"find **all** jobs\".\n",
"\n",
"Instead of finding all jobs, we can also find a subset using *filters*.\n",
"\n",
"Let's get started by getting a handle on our project using the `get_project()` function.\n",
"We don't need to initialize the project again, since we already did that in section 1."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import signac\n",
"\n",
"project = signac.get_project(\"projects/tutorial\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Suppose we want to find all jobs where *p = 10.0*. For this, we can use the `find_jobs()` method, which takes a dictionary of parameters as its filter argument."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'p': 10.0, 'kT': 1.0, 'N': 1000}\n"
]
}
],
"source": [
"for job in project.find_jobs({\"p\": 10.0}):\n",
"    print(job.statepoint())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case, the filter matches only a single job.\n",
"\n",
"You can execute the same kind of find operation on the [command line](https://signac.readthedocs.io/projects/core/en/latest/cli.html) with `$ signac find`, as will be shown later."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While the filtering method is optimized for a simple dissection of the data space, it is possible to construct more complex query routines, for example with [list comprehensions](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions).\n",
"\n",
"The following example selects all jobs where the pressure *p* is greater than 0.1:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'p': 10.0, 'kT': 1.0, 'N': 1000} {'V': 100.0}\n",
"{'p': 1.0, 'kT': 1.0, 'N': 1000} {'V': 1000.0}\n"
]
}
],
"source": [
"jobs_p_gt_0_1 = [job for job in project if job.sp.p > 0.1]\n",
"for job in jobs_p_gt_0_1:\n",
"    print(job.statepoint(), job.document)"
]
},
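{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same selection can likely be expressed directly as a filter. The following is a minimal sketch, assuming that the installed signac version supports query operators such as `$gt` inside the filter dictionary. The variable names `jobs_from_filter` and `jobs_from_comprehension` are only illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import signac\n",
"\n",
"project = signac.get_project(\"projects/tutorial\")\n",
"\n",
"# Assumption: find_jobs() accepts operator expressions such as {\"$gt\": value}.\n",
"jobs_from_filter = project.find_jobs({\"p\": {\"$gt\": 0.1}})\n",
"\n",
"# Same selection written as a list comprehension, as in the cell above.\n",
"jobs_from_comprehension = [job for job in project if job.sp.p > 0.1]\n",
"\n",
"# Both approaches should select the same set of job ids.\n",
"print({job.id for job in jobs_from_filter} == {job.id for job in jobs_from_comprehension})"
]
},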
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finding jobs by certain criteria requires an index of the data space, which signac automatically generates and uses internally."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Views\n",
"\n",
"Sometimes we want to examine our data on the file system directly. However, the file paths within the workspace are obfuscated by the *job id*. The solution is to use *views*, which are human-readable, maximally compact hierarchical links to our data space.\n",
"\n",
"To create a linked view, we simply execute the `create_linked_view()` method within Python or the `$ signac view` command on the [command line](https://signac.readthedocs.io/projects/core/en/latest/cli.html)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"p/\n"
]
}
],
"source": [
"project.create_linked_view(prefix=\"projects/tutorial/view\")\n",
"%ls projects/tutorial/view"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The view paths only contain parameters which actually vary across the different jobs.\n",
"In this example, that is only the pressure *p*.\n",
"\n",
"This allows us to examine the data with highly compact, human-readable path names:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"V.txt signac_job_document.json signac_statepoint.json\n",
"1000.0\n"
]
}
],
"source": [
"%ls 'projects/tutorial/view/p/1.0/job/'\n",
"%cat 'projects/tutorial/view/p/1.0/job/V.txt'"
]
},
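{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because a view consists of links into the workspace, each `job` entry at the end of a view path should resolve to a job directory named after its *job id*. The following is a minimal sketch using only the Python standard library. It assumes that the view was created as symbolic links under `projects/tutorial/view`, as in the cells above. The variable name `view_prefix` is only illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Assumption: same prefix as passed to create_linked_view() above.\n",
"view_prefix = \"projects/tutorial/view\"\n",
"\n",
"# os.walk() does not descend into symbolic links by default,\n",
"# so linked directories show up in `dirs` and can be inspected here.\n",
"for root, dirs, files in os.walk(view_prefix):\n",
"    for name in dirs:\n",
"        path = os.path.join(root, name)\n",
"        if os.path.islink(path):\n",
"            # Print the readable view path and the directory it points to.\n",
"            print(path, \"->\", os.path.realpath(path))"
]
},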
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**NOTE: Update your view after adding or removing jobs by running the view command again with the same prefix!**\n",
"\n",
"Tip: Consider creating a linked view for large data sets on an [**in-memory** file system](https://en.wikipedia.org/wiki/Tmpfs) for best performance!"
]
},
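{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of such an update, assuming the same project and prefix as above: calling `create_linked_view()` again with that prefix is meant to bring the links in line with the current set of jobs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import signac\n",
"\n",
"project = signac.get_project(\"projects/tutorial\")\n",
"\n",
"# Run this again whenever jobs have been added to or removed from the project.\n",
"# The prefix must match the one used when the view was first created.\n",
"project.create_linked_view(prefix=\"projects/tutorial/view\")"
]
},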
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [next section](signac_103_A_Basic_Workflow.ipynb) will demonstrate how to implement a basic but complete workflow for more expensive computations."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}