{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 1.2 Exploring Data\n", "\n", "Please note: The following notebook requires that you first run [signac_101_Getting_Started](signac_101_Getting_Started.ipynb).\n", "\n", "## Finding jobs\n", "\n", "In [section one](signac_101_Getting_Started.ipynb) of this tutorial, we evaluated the ideal gas equation and stored the results in the *job document* and in a file called `V.txt`.\n", "Let's now have a look at how we can explore our data space for basic and advanced analysis.\n", "\n", "We already saw how to iterate over the *complete* data space using the \"`for job in project`\" expression.\n", "This is shorthand notation for \"`for job in project.find_jobs()`\", meaning: \"find **all** jobs\".\n", "\n", "Instead of finding all jobs, we can also find a subset using *filters*.\n", "\n", "Let's get started by getting a handle on our project using the `get_project()` function.\n", "We don't need to initialize the project again, since we already did that in section 1." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import signac\n", "\n", "project = signac.get_project(\"projects/tutorial\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, suppose we would like to find all jobs where *p=10.0*. For this, we can use the `find_jobs()` method, which takes a dictionary of parameters as its filter argument." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'p': 10.0, 'kT': 1.0, 'N': 1000}\n" ] } ], "source": [ "for job in project.find_jobs({\"p\": 10.0}):\n", "    print(job.statepoint())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, that is of course only a single job.\n", "\n", "You can execute the same kind of find operation on the [command line](https://signac.readthedocs.io/projects/core/en/latest/cli.html) with `$ signac find`, as will be shown later."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While the filtering method is optimized for a simple dissection of the data space, it is possible to construct more complex queries, for example using [list comprehensions](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions).\n", "\n", "The following example selects all jobs where the pressure *p* is greater than 0.1:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'p': 10.0, 'kT': 1.0, 'N': 1000} {'V': 100.0}\n", "{'p': 1.0, 'kT': 1.0, 'N': 1000} {'V': 1000.0}\n" ] } ], "source": [ "jobs_p_gt_0_1 = [job for job in project if job.sp.p > 0.1]\n", "for job in jobs_p_gt_0_1:\n", "    print(job.statepoint(), job.document)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finding jobs by certain criteria requires an index of the data space, which signac automatically generates and uses internally." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Views\n", "\n", "Sometimes we want to examine our data on the file system directly. However, the file paths within the workspace are obfuscated by the *job id*. The solution is to use *views*, which are human-readable, maximally compact hierarchical links to our data space.\n", "\n", "To create a linked view, we simply call the `create_linked_view()` method within Python or run the `$ signac view` command on the [command line](https://signac.readthedocs.io/projects/core/en/latest/cli.html)."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0m\u001b[01;34mp\u001b[0m/\r\n" ] } ], "source": [ "project.create_linked_view(prefix=\"projects/tutorial/view\")\n", "%ls projects/tutorial/view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The view paths contain only the parameters that actually vary across the different jobs.\n", "In this example, that is only the pressure *p*.\n", "\n", "This allows us to examine the data with highly compact, human-readable path names:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "V.txt signac_job_document.json signac_statepoint.json\r\n", "1000.0\r\n" ] } ], "source": [ "%ls 'projects/tutorial/view/p/1.0/job/'\n", "%cat 'projects/tutorial/view/p/1.0/job/V.txt'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**NOTE: Update your view after adding or removing jobs by executing the view command for the same prefix again!**\n", "\n", "Tip: Consider creating a linked view for large data sets on an [**in-memory** file system](https://en.wikipedia.org/wiki/Tmpfs) for best performance!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [next section](signac_103_A_Basic_Workflow.ipynb) will demonstrate how to implement a basic but complete workflow for more expensive computations." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 4 }