{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1.2 Exploring Data\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "\n",
    "Please note: The following notebook requires you first run [signac_101_Getting_Started](signac_101_Getting_Started.ipynb).\n",
    "\n",
    "</div>\n",
    "\n",
    "## Finding jobs\n",
    "\n",
    "In [section one](signac_101_Getting_Started.ipynb) of this tutorial, we evaluated the ideal gas equation and stored the results in the *job document* and in a file called `V.txt`.\n",
    "Let's now have a look at how we can explore our data space for basic and advanced analysis.\n",
    "\n",
    "We already saw how to iterate over the *complete* data space using the \"`for job in project`\" expression.\n",
    "This is a short-hand notation for \"`for job in project.find_jobs()`\", meaning: \"find **all** jobs\".\n",
    "\n",
    "Instead of finding all jobs, we can also find a subset using *filters*.\n",
    "\n",
    "Let's get started by getting a handle on our project using the `get_project()` function.\n",
    "We don't need to initialize the project again, since we already did that in section 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import signac\n",
    "\n",
    "project = signac.get_project(\"projects/tutorial\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we assume that we would like to find all jobs, where *p=10.0*. For this, we can use the `find_jobs()` method, which takes a dictionary of parameters as filter argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'p': 10.0, 'kT': 1.0, 'N': 1000}\n"
     ]
    }
   ],
   "source": [
    "for job in project.find_jobs({\"p\": 10.0}):\n",
    "    print(job.statepoint())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, that is of course only a single job.\n",
    "\n",
    "You can execute the same kind of find operation on the [command line](https://signac.readthedocs.io/projects/core/en/latest/cli.html) with `$ signac find`, as will be shown later."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "While the filtering method is optimized for a simple dissection of the data space, it is possible to construct more complex query routines for example using [list comprehensions](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions).\n",
    "\n",
    "This is an example for how to select all jobs where the pressure *p* is greater than 0.1:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'p': 10.0, 'kT': 1.0, 'N': 1000} {'V': 100.0}\n",
      "{'p': 1.0, 'kT': 1.0, 'N': 1000} {'V': 1000.0}\n"
     ]
    }
   ],
   "source": [
    "jobs_p_gt_0_1 = [job for job in project if job.sp.p > 0.1]\n",
    "for job in jobs_p_gt_0_1:\n",
    "    print(job.statepoint(), job.document)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finding jobs by certain criteria requires an index of the data space, which signac automatically generates and uses internally."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Views\n",
    "\n",
    "Sometimes we want to examine our data on the file system directly. However the file paths within the workspace are obfuscated by the *job id*. The solution is to use *views*, which are human-readable, maximally compact hierarchical links to our data space.\n",
    "\n",
    "To create a linked view we simply execute the `create_linked_view()` method within python or the `$ signac view` command on the [command line](https://signac.readthedocs.io/projects/core/en/latest/cli.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0m\u001b[01;34mp\u001b[0m/\r\n"
     ]
    }
   ],
   "source": [
    "project.create_linked_view(prefix=\"projects/tutorial/view\")\n",
    "%ls projects/tutorial/view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The view paths only contain parameters which actually vary across the different jobs.\n",
    "In this example, that is only the pressure *p*.\n",
    "\n",
    "This allows us to examine the data with highly-compact human-readable path names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "V.txt  signac_job_document.json  signac_statepoint.json\r\n",
      "1000.0\r\n"
     ]
    }
   ],
   "source": [
    "%ls 'projects/tutorial/view/p/1.0/job/'\n",
    "%cat 'projects/tutorial/view/p/1.0/job/V.txt'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**NOTE: Update your view after adding or removing jobs by executing the view command for the same prefix again!**\n",
    "\n",
    "Tip: Consider creating a linked view for large data sets on an [**in-memory** file system](https://en.wikipedia.org/wiki/Tmpfs) for best performance!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The [next section](signac_103_A_Basic_Workflow.ipynb) will demonstrate how to implement a basic, but complete workflow for more expensive computations."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}