{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "9b0fb3d2-f23f-416d-bc01-7d817ad98cc9"
    }
   },
   "source": [
    "# 1.3 A Basic Workflow\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "\n",
    "Please note: The following notebook requires you first run [signac_101_Getting_Started](signac_101_Getting_Started.ipynb).\n",
    "\n",
    "</div>\n",
    "\n",
    "This part of the tutorial requires [NumPy](https://numpy.org)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "9b0fb3d2-f23f-416d-bc01-7d817ad98cc9"
    }
   },
   "source": [
    "## Operations\n",
    "\n",
    "For this part of the tutorial we will imagine that we are still not convinced of the pressure-volume relations that we just \"discovered\" and that calculating the volume is actually a very expensive procedure, such as a many particle simulation with [HOOMD-blue](https://glotzerlab.engin.umich.edu/hoomd-blue/).\n",
    "\n",
    "We emulate this by adding an optional *cost* argument to our volume calculation function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "nbpresent": {
     "id": "b89db729-97f6-4734-a219-45f7279c60f1"
    }
   },
   "outputs": [],
   "source": [
    "from time import sleep\n",
    "\n",
    "\n",
    "def V_idg(N, p, kT, cost=0):\n",
    "    sleep(cost)\n",
    "    return N * kT / p"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "254fed93-2d9e-485a-868e-868b8ab119cc"
    }
   },
   "source": [
    "It is useful to think of each modification of the workspace, that includes addition, modification, and removal of data, in terms of an *operation*.\n",
    "\n",
    "**An operation should take only one(!) argument: the job handle.**\n",
    "\n",
    "Any additional arguments may represent hidden state point parameters which would lead to a loss of provenance and possibly render our data space inconsistent.\n",
    "\n",
    "The following function is an example for an operation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "nbpresent": {
     "id": "a8042a93-6932-48d3-ab2e-1b51ab91b759"
    }
   },
   "outputs": [],
   "source": [
    "def compute_volume(job):\n",
    "    print(\"Computing volume of\", job)\n",
    "    V = V_idg(cost=1, **job.statepoint())\n",
    "    job.document[\"V\"] = V\n",
    "    with open(job.fn(\"V.txt\"), \"w\") as file:\n",
    "        file.write(str(V) + \"\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This operation computes the volume solely based on the state point parameters and stores the results such that they are clearly associated with the job, i.e., in the [job document](https://signac.readthedocs.io/en/latest/projects.html#the-job-document) and in a [file](http://signac.readthedocs.io/en/latest/signac.contrib.html#signac.contrib.job.Job.fn) within the job's workspace.\n",
    "\n",
    "*Please note, that the only reason for storing the the same result in two different ways is for demonstration purposes.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Execution\n",
    "\n",
    "To execute our first data space operation, we simply loop through our project's data space:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Computing volume of 5a6c687f7655319db24de59a2336eff8\n",
      "Computing volume of 5a456c131b0c5897804a4af8e77df5aa\n",
      "Computing volume of ee617ad585a90809947709a7a45dda9a\n"
     ]
    }
   ],
   "source": [
    "import signac\n",
    "\n",
    "project = signac.get_project(\"projects/tutorial\")\n",
    "\n",
    "for job in project:\n",
    "    compute_volume(job)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "66bce051-6aee-4a78-987b-120d55402824"
    }
   },
   "source": [
    "## Data Space Initialization\n",
    "\n",
    "Since our operation is now more expensive, it is a good idea to split initialization and execution.\n",
    "Let's initialize a few more state points in one go:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "nbpresent": {
     "id": "08342212-d34f-4763-9d58-d5b78a70bb73"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "initialize 5a6c687f7655319db24de59a2336eff8\n",
      "initialize d03270cdbbae73c8bb1d9fa0ab370264\n",
      "initialize 973e29d6a4ed6cf7329c03c77df7f645\n",
      "initialize 4cf2795722061df825ec9a4d5e31e494\n",
      "initialize 5a456c131b0c5897804a4af8e77df5aa\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import signac\n",
    "\n",
    "project = signac.get_project(\"projects/tutorial\")\n",
    "\n",
    "\n",
    "def init_statepoints(n):\n",
    "    for p in np.linspace(0.1, 10.0, n):\n",
    "        sp = {\"p\": float(p), \"kT\": 1.0, \"N\": 1000}\n",
    "        job = project.open_job(sp)\n",
    "        job.init()\n",
    "        print(\"initialize\", job)\n",
    "\n",
    "\n",
    "init_statepoints(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "635ead70-2f3a-4676-ba62-d7d1328d52ad"
    }
   },
   "source": [
    "We see that initializing more jobs and even reinitializing old jobs is no problem.\n",
    "However, since our calculation will be \"expensive\", we would want to skip the computation whenever the result is already available.\n",
    "\n",
    "One possibility is to add a simple check before executing the computation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "nbpresent": {
     "id": "c5b60f8c-5919-48fe-abb9-9140b41b12b8"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Computing volume of 4cf2795722061df825ec9a4d5e31e494\n",
      "Computing volume of 973e29d6a4ed6cf7329c03c77df7f645\n",
      "Computing volume of d03270cdbbae73c8bb1d9fa0ab370264\n"
     ]
    }
   ],
   "source": [
    "for job in project:\n",
    "    if \"V\" not in job.document:\n",
    "        compute_volume(job)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "425cdcfe-046c-4908-8bbe-218f07240fa2"
    }
   },
   "source": [
    "## Classification\n",
    "\n",
    "It would be even better, if we could get an overview of which state points have been calculated and which not.\n",
    "We call this a project's *status*.\n",
    "\n",
    "Before we continue, let's initialize a few more state points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "nbpresent": {
     "id": "6c86712c-93c3-40de-b4e2-8c99fc0ea114"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "initialize 5a6c687f7655319db24de59a2336eff8\n",
      "initialize 22582e83c6b12336526ed304d4378ff8\n",
      "initialize c0ab2e09a6f878019a6057175bf718e6\n",
      "initialize 9110d0837ad93ff6b4013bae30091edd\n",
      "initialize b45a2485a44a46364cc60134360ea5af\n",
      "initialize 05061d2acea19d2d9a25ac3360f70e04\n",
      "initialize 665547b1344fe40de5b2c7ace4204783\n",
      "initialize 8629822576debc2bfbeffa56787ca348\n",
      "initialize e8186b9b68e18a82f331d51a7b8c8c15\n",
      "initialize 5a456c131b0c5897804a4af8e77df5aa\n"
     ]
    }
   ],
   "source": [
    "init_statepoints(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "95ef85d2-6d8c-4f9d-a4f6-ddb692c66053"
    }
   },
   "source": [
    "Next, we implement a `classify()` [generator function](http://stackoverflow.com/questions/1756096/understanding-generators-in-python), which labels a *job* based on certain conditions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "nbpresent": {
     "id": "c4103f15-9991-4aae-8844-cacfee4a62b0"
    }
   },
   "outputs": [],
   "source": [
    "def classify(job):\n",
    "    yield \"init\"\n",
    "    if \"V\" in job.document and job.isfile(\"V.txt\"):\n",
    "        yield \"volume-computed\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "d76a4d9c-96f8-4515-a798-4fa2c98e5e96"
    }
   },
   "source": [
    "Our classifier will always yield the `init` label, but the `volume-computed` label is only yielded if the result has been computed and stored both in the *job document* and as a text file.\n",
    "We can then use this function to get an overview of our project's status."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "nbpresent": {
     "id": "316f0e4c-0f68-433a-a7d8-3b21dfd53b4c"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Status: notebooks/projects/tutorial\n",
      "22582e83c6b12336526ed304d4378ff8 1.2 init\n",
      "4cf2795722061df825ec9a4d5e31e494 7.5 init, volume-computed\n",
      "b45a2485a44a46364cc60134360ea5af 4.5 init\n",
      "e8186b9b68e18a82f331d51a7b8c8c15 8.9 init\n",
      "5a6c687f7655319db24de59a2336eff8 0.1 init, volume-computed\n",
      "973e29d6a4ed6cf7329c03c77df7f645 5.0 init, volume-computed\n",
      "d03270cdbbae73c8bb1d9fa0ab370264 2.6 init, volume-computed\n",
      "05061d2acea19d2d9a25ac3360f70e04 5.6 init\n",
      "665547b1344fe40de5b2c7ace4204783 6.7 init\n",
      "8629822576debc2bfbeffa56787ca348 7.8 init\n",
      "5a456c131b0c5897804a4af8e77df5aa 10.0 init, volume-computed\n",
      "c0ab2e09a6f878019a6057175bf718e6 2.3 init\n",
      "9110d0837ad93ff6b4013bae30091edd 3.4 init\n",
      "ee617ad585a90809947709a7a45dda9a 1.0 init, volume-computed\n"
     ]
    }
   ],
   "source": [
    "print(f\"Status: {project}\")\n",
    "for job in project:\n",
    "    labels = \", \".join(classify(job))\n",
    "    p = round(job.sp.p, 1)\n",
    "    print(job, p, labels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "fd2269a9-b11e-45a5-837b-ee8aa81c47c8"
    }
   },
   "source": [
    "Using only simple classification functions, we already get a very good grasp on our project's overall status.\n",
    "\n",
    "Furthermore, we can use the classification labels for controling the execution of operations:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "nbpresent": {
     "id": "e393ff4c-e0b9-47f4-81f5-edd5732bb63a"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Computing volume of 22582e83c6b12336526ed304d4378ff8\n",
      "Computing volume of b45a2485a44a46364cc60134360ea5af\n",
      "Computing volume of e8186b9b68e18a82f331d51a7b8c8c15\n",
      "Computing volume of 05061d2acea19d2d9a25ac3360f70e04\n",
      "Computing volume of 665547b1344fe40de5b2c7ace4204783\n",
      "Computing volume of 8629822576debc2bfbeffa56787ca348\n",
      "Computing volume of c0ab2e09a6f878019a6057175bf718e6\n",
      "Computing volume of 9110d0837ad93ff6b4013bae30091edd\n"
     ]
    }
   ],
   "source": [
    "for job in project:\n",
    "    labels = classify(job)\n",
    "    if \"volume-computed\" not in labels:\n",
    "        compute_volume(job)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Parallelization\n",
    "\n",
    "So far, we have executed all *operations* in serial using a simple for-loop.\n",
    "We will now learn how to easily **parallelize** the execution!\n",
    "\n",
    "Instead of using a `for-loop`, we can also take advantage of Python's built-in [map-operator](https://docs.python.org/3/library/functions.html#map):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Computing volume of 22582e83c6b12336526ed304d4378ff8\n",
      "Computing volume of 4cf2795722061df825ec9a4d5e31e494\n",
      "Computing volume of b45a2485a44a46364cc60134360ea5af\n",
      "Computing volume of e8186b9b68e18a82f331d51a7b8c8c15\n",
      "Computing volume of 5a6c687f7655319db24de59a2336eff8\n",
      "Computing volume of 973e29d6a4ed6cf7329c03c77df7f645\n",
      "Computing volume of d03270cdbbae73c8bb1d9fa0ab370264\n",
      "Computing volume of 05061d2acea19d2d9a25ac3360f70e04\n",
      "Computing volume of 665547b1344fe40de5b2c7ace4204783\n",
      "Computing volume of 8629822576debc2bfbeffa56787ca348\n",
      "Computing volume of 5a456c131b0c5897804a4af8e77df5aa\n",
      "Computing volume of c0ab2e09a6f878019a6057175bf718e6\n",
      "Computing volume of 9110d0837ad93ff6b4013bae30091edd\n",
      "Computing volume of ee617ad585a90809947709a7a45dda9a\n",
      "Done.\n"
     ]
    }
   ],
   "source": [
    "list(map(compute_volume, project))\n",
    "print(\"Done.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the `map()` expression makes it trivial to implement parallelization patterns, for example, using a process [Pool](https://docs.python.org/3.6/library/multiprocessing.html#multiprocessing.pool.Pool). We provide these code snippets as examples but they are not executable. Instead, we recommend using the **signac-flow** package, which is designed for executing parallel workflows.\n",
    "\n",
    "```python\n",
    "from multiprocessing import Pool\n",
    "\n",
    "with Pool() as pool:\n",
    "    pool.map(compute_volume, list(project))\n",
    "```\n",
    "\n",
    "Or a `ThreadPool`:\n",
    "```python\n",
    "from multiprocessing.pool import ThreadPool\n",
    "\n",
    "with ThreadPool() as pool:\n",
    "    pool.map(compute_volume, list(project))\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbpresent": {
     "id": "cedcb2a4-7e91-4457-bb2b-fc581f039218"
    }
   },
   "source": [
    "Uncomment and execute the following line if you want to remove all data and start over."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "nbpresent": {
     "id": "471ffb2a-a5f8-4e1c-9ebe-1f2007896a0b"
    }
   },
   "outputs": [],
   "source": [
    "# %rm -r projects/tutorial/workspace"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section we learned how to create a simple, yet complete workflow for our computational investigation.\n",
    "\n",
    "In the [next section](signac_104_Modifying_the_Data_Space.ipynb) we will learn how to adjust the data space, e.g., modify existing state point parameters."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  },
  "nbpresent": {
   "slides": {},
   "themes": {}
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}