Python environment with Pipenv, Jupyter, and EIN
Lately I’ve been using more Python, and I think I’ve arrived at a decent workflow. Clojure opened my eyes to the joy and power that interactivity and quick iteration bring to programming, and while Python’s interactive dev experience doesn’t feel quite as seamless as Clojure’s, Jupyter/IPython Notebook and the Python REPL are nice.
Here I’m going to talk about setting up a Python development environment using Pipenv, and then interactively developing within that environment using Jupyter. This workflow is focused on robust dependency management/isolation and fast iteration.
My development needs might not align with those of a Django developer or a sysadmin using Python. I’ve mostly been using Python to write API data extraction scripts for work, and machine learning applications for grad school. This setup also works nicely with tools I’m already using (Ubuntu/macOS and Emacs). I haven’t used PyCharm, but I’ve heard good things about it (and I like JetBrains). Another Python thing worth checking out is the popular Anaconda data science platform.
Pipenv is a Python dependency manager. Functionally, it’s a combination of pip and virtualenv. The Python Packaging User Guide on Python.org recommends it for managing application dependencies. It’s used to install and keep track of a project’s dependencies and to keep them isolated from the rest of the system.
It’s easy to install using pip or Homebrew:
$ pip install pipenv # using pip
$ brew install pipenv # using Homebrew on macOS
And creating an empty Python3 environment is straightforward:
$ mkdir helloworld
$ cd helloworld/
$ pipenv --three
A basically empty Pipfile is created:
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"

[packages]

[dev-packages]

[requires]
python_version = "3.6"
Let’s install some libraries:
$ pipenv install pandas numpy matplotlib
The Pipfile now lists the required libraries:
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"

[packages]
pandas = "*"
numpy = "*"
matplotlib = "*"

[dev-packages]

[requires]
python_version = "3.6"
You’ll also notice that a file called Pipfile.lock has been created. This is a record of the whole dependency graph of the project. It should be checked into source control, as Pipenv can use it to ensure deterministic builds.
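To make the lock file a little less abstract, here is a minimal sketch that pulls the pinned versions out of one. It assumes the lock file is JSON with a "default" section mapping package names to entries that carry a "version" string. The helper name pinned_versions and the trimmed sample contents are made up for illustration.

```python
import json

# Hypothetical, trimmed-down lock file contents, for illustration only.
# A real Pipfile.lock also carries a "_meta" section and per-package hashes.
SAMPLE_LOCK = """
{
    "default": {
        "numpy": {"version": "==1.14.0"},
        "pandas": {"version": "==0.22.0"}
    },
    "develop": {}
}
"""


def pinned_versions(lock_text, section="default"):
    """Return {package: version} for one section of a Pipfile.lock."""
    lock = json.loads(lock_text)
    return {
        # Entries without a "version" key map to an empty string.
        name: entry.get("version", "").lstrip("=")
        for name, entry in lock.get(section, {}).items()
    }


if __name__ == "__main__":
    for name, version in sorted(pinned_versions(SAMPLE_LOCK).items()):
        print(f"{name} {version}")
```

In a real project you would pass in the text of the Pipfile.lock sitting next to the Pipfile instead of the sample string.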
The pipenv graph command lists these inter-library dependencies in a more readable way:
$ pipenv graph
matplotlib==2.1.2
  - cycler [required: >=0.10, installed: 0.10.0]
    - six [required: Any, installed: 1.11.0]
  - numpy [required: >=1.7.1, installed: 1.14.0]
  - pyparsing [required: >=2.0.1,!=2.1.6,!=2.0.4,!=2.1.2, installed: 2.2.0]
  - python-dateutil [required: >=2.1, installed: 2.6.1]
    - six [required: >=1.5, installed: 1.11.0]
  - pytz [required: Any, installed: 2017.3]
  - six [required: >=1.10, installed: 1.11.0]
pandas==0.22.0
  - numpy [required: >=1.9.0, installed: 1.14.0]
  - python-dateutil [required: >=2, installed: 2.6.1]
    - six [required: >=1.5, installed: 1.11.0]
  - pytz [required: >=2011k, installed: 2017.3]
Once our environment is set up, we can begin using it. To spawn a new shell using the Pipenv environment:
$ pipenv shell
Spawning environment shell (/bin/bash). Use 'exit' to leave.
bash-3.2$ source /Users/m/.local/share/virtualenvs/helloworld-6Ag-sbDH/bin/activate
(helloworld-6Ag-sbDH) bash-3.2$ python
Python 3.6.4 (default, Jan 6 2018, 11:51:59)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.22.0'
Cool. But how about if we want to execute a script?
$ printf "import pandas as pd\nprint(pd.__version__)" > myscript.py
$ pipenv run python myscript.py
0.22.0
Note that this won’t work if we attempt to invoke the script outside of the virtual environment, since the pandas dependency is isolated to the environment we just created:
$ python myscript.py
Traceback (most recent call last):
  File "myscript.py", line 1, in <module>
    import pandas as pd
ImportError: No module named pandas
This is good and desirable: it means that if we’re developing another Python program on this system that depends on a different version of the pandas library, we won’t be subject to subtle dependency bugs that can be difficult to find and correct. And if a colleague is working on this same project on another system, we can both rely on our environments being the same.
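When it’s unclear which interpreter a script is actually running under, a quick check like the following can help. This is a rough sketch rather than anything Pipenv provides: the function names are hypothetical, and the PIPENV_ACTIVE check rests on the assumption that Pipenv sets that variable for pipenv shell and pipenv run.

```python
import os
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix. Newer ones (and venv)
    # leave sys.base_prefix pointing at the interpreter the environment
    # was created from, while sys.prefix points at the environment itself.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def launched_by_pipenv():
    """Assumption: Pipenv exports PIPENV_ACTIVE=1 to processes it launches."""
    return os.environ.get("PIPENV_ACTIVE") == "1"


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtualenv: ", in_virtualenv())
    print("pipenv:     ", launched_by_pipenv())
```

Running it once with pipenv run python and once with the system python should make the difference between the two interpreters visible.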
Project Jupyter and the IPython Notebook are tools used for interactive programming (that’s what the “I” in “IPython” stands for). Jupyter supports other language kernels like R and Ruby as well.
We can install Jupyter easily within our Pipenv environment:
$ pipenv install jupyter
It’s also possible to create an IPython kernel from this environment and give it a name:
$ pipenv run python -m ipykernel install --user --name mygreatenv --display-name "My Great Env"
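For a sense of what that command registers: a kernel spec is essentially a small kernel.json file telling Jupyter which interpreter to launch. The sketch below builds an equivalent dictionary. The function name build_kernel_spec is hypothetical, and the exact fields written may differ between ipykernel versions.

```python
import json
import sys


def build_kernel_spec(display_name, python=sys.executable):
    """Sketch of the kernel.json contents for an IPython kernel."""
    return {
        # Jupyter substitutes {connection_file} when it launches the kernel.
        "argv": [python, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


if __name__ == "__main__":
    # Run through `pipenv run python` so that sys.executable is the
    # environment's interpreter rather than the system one.
    print(json.dumps(build_kernel_spec("My Great Env"), indent=2))
```

The important part is the first argv entry: because it points at the environment’s Python, notebooks using this kernel see the libraries installed by Pipenv.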
The notebook can be started by running:
$ pipenv run jupyter notebook
This serves the notebook software locally and opens it in a browser.
I won’t go into actually using Jupyter Notebook for interactive Python development, but it’s fairly intuitive and is well-suited for experimentation.
Emacs IPython Notebook
Today I played with an Emacs plugin called Emacs IPython Notebook (EIN), which connects directly to an IPython notebook kernel and evaluates code within Emacs. At first glance, it has commands for most of the functions offered in the browser-based UI.
It took a bit of trial and error and internet searching to figure out how to connect to the notebook server. When Jupyter Notebook starts, it generates a token used to authenticate a client connecting to the server. This token can be entered at the password prompt when running ein:notebooklist-login. Once authenticated, the command ein:notebooklist-open shows the current Notebook server’s file list, and lets you create or connect to a notebook.
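The token shows up as a query parameter in the URL that the notebook server prints when it starts. If copying it by hand gets tedious, a small helper along these lines can pull it out. The function name extract_token and the example URL are hypothetical, and a real token is a much longer random string.

```python
from urllib.parse import parse_qs, urlparse


def extract_token(url):
    """Return the value of the `token` query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    tokens = query.get("token")
    return tokens[0] if tokens else None


if __name__ == "__main__":
    # Hypothetical URL in the shape the server prints at startup.
    print(extract_token("http://localhost:8888/?token=abc123"))
```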
I had been using the web-based UI with the jupyter-vim-binding extension for a short period, but I may switch over to Emacs + EIN. It’s nice to be able to introduce new tooling into an ecosystem you’re already comfortable in.