• Python environment with Pipenv, Jupyter, and EIN

    Update 4/2019: This post gets a lot of traffic, so I wanted to note that the Python tooling described herein isn’t exactly what I’d recommend anymore. Specifically, I’d probably recommend Poetry over Pipenv if you need pinned dependencies, and maybe just pip and virtualenv if you’re developing a library or something small / local. I also haven’t used EIN much. Here’s a good post about Python tooling.

    Lately I’ve been using more Python, and I think I’ve arrived at a decent workflow. Clojure opened my eyes to the joy and power that interactivity and quick iteration bring to programming, and while Python’s interactive dev experience doesn’t feel quite as seamless as Clojure’s, Jupyter/IPython Notebook and the Python REPL are nice.

    Here I’m going to talk about setting up a Python development environment using Pipenv, and then interactively developing within that environment using Jupyter. This workflow is focused on robust dependency management/isolation and fast iteration.

    My development needs might not align with those of a Django developer or a sysadmin using Python. I’ve mostly been using Python to write API data extraction scripts for work, and machine learning applications for grad school. This setup also works nicely with tools I’m already using (Ubuntu/macOS and Emacs). I haven’t used PyCharm, but I’ve heard good things about it (and I like JetBrains). Another Python thing worth checking out is the popular Anaconda data science platform.

    Pipenv

    Pipenv is a Python dependency manager. Functionally, it’s a combination of pip and virtualenv. It’s officially recommended by Python.org. It’s used to install and keep track of a project’s required dependencies, and to keep them isolated from the rest of the system.

    It’s easy to install using pip or Homebrew:

    pip install --user pipenv  # using pip
    brew install pipenv        # using Homebrew on macOS
    

    And creating an empty Python3 environment is straightforward:

    $ mkdir helloworld
    $ cd helloworld/
    $ pipenv --three
    

    A basically empty Pipfile is created:

    [[source]]
    
    url = "https://pypi.python.org/simple"
    verify_ssl = true
    name = "pypi"
    
    
    [packages]
    
    
    
    [dev-packages]
    
    
    
    [requires]
    
    python_version = "3.6"
    

    Let’s install some libraries:

    $ pipenv install pandas numpy matplotlib
    

    Our Pipfile now lists the required libraries:

    [[source]]
    
    url = "https://pypi.python.org/simple"
    verify_ssl = true
    name = "pypi"
    
    
    [packages]
    
    pandas = "*"
    numpy = "*"
    matplotlib = "*"
    
    
    [dev-packages]
    
    
    
    [requires]
    
    python_version = "3.6"
    

    You’ll also notice a file called Pipfile.lock has been created. It records the exact version (and hashes) of every package in the project’s dependency graph, including transitive dependencies. It should be checked into source control, as Pipenv can use it to ensure deterministic builds.

    The pipenv graph command lists these inter-library dependencies in a more readable way:

    $ pipenv graph
    matplotlib==2.1.2
      - cycler [required: >=0.10, installed: 0.10.0]
        - six [required: Any, installed: 1.11.0]
      - numpy [required: >=1.7.1, installed: 1.14.0]
      - pyparsing [required: >=2.0.1,!=2.1.6,!=2.0.4,!=2.1.2, installed: 2.2.0]
      - python-dateutil [required: >=2.1, installed: 2.6.1]
        - six [required: >=1.5, installed: 1.11.0]
      - pytz [required: Any, installed: 2017.3]
      - six [required: >=1.10, installed: 1.11.0]
    pandas==0.22.0
      - numpy [required: >=1.9.0, installed: 1.14.0]
      - python-dateutil [required: >=2, installed: 2.6.1]
        - six [required: >=1.5, installed: 1.11.0]
      - pytz [required: >=2011k, installed: 2017.3]
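    Pipfile.lock itself is plain JSON, so the pinned versions are also easy to read programmatically. A minimal sketch, assuming the usual layout in which the "default" key holds the locked [packages] and "develop" holds the locked [dev-packages], each entry carrying a "version" string such as "==0.22.0" (pinned_versions and load_lock are hypothetical helper names):

```python
import json


def load_lock(path="Pipfile.lock"):
    """Read and parse a Pipfile.lock (it is ordinary JSON)."""
    with open(path) as f:
        return json.load(f)


def pinned_versions(lock, section="default"):
    """Map each package in one lock section to its pinned version string.

    "default" holds the locked [packages]; "develop" holds [dev-packages].
    Entries without a "version" key map to an empty string.
    """
    return {name: info.get("version", "") for name, info in lock.get(section, {}).items()}
```

    For example, pinned_versions(load_lock()) would return a dict covering every locked package, transitive ones like six included. Entries that may lack a "version" key (for instance packages installed from a Git URL) come back as an empty string rather than raising.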
    

    Once our environment is set up, we can begin using it. To spawn a new shell inside the Pipenv environment, run pipenv shell:

    $ pipenv shell
    Spawning environment shell (/bin/bash). Use 'exit' to leave.
    bash-3.2$ source /Users/m/.local/share/virtualenvs/helloworld-6Ag-sbDH/bin/activate
    (helloworld-6Ag-sbDH) bash-3.2$ python
    Python 3.6.4 (default, Jan  6 2018, 11:51:59)
    [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pandas as pd
    >>> pd.__version__
    '0.22.0'
    

    Cool. But what if we want to execute a script?

    $ printf "import pandas as pd\nprint(pd.__version__)" > myscript.py
    $ pipenv run python myscript.py
    0.22.0
    

    Note that this won’t work if we attempt to invoke the script outside of the virtual environment, since the pandas dependency is isolated to the environment we just created:

    $ python myscript.py
    Traceback (most recent call last):
      File "myscript.py", line 1, in <module>
        import pandas as pd
    ImportError: No module named pandas
    

    This is good and desirable: it means that if we’re developing another Python program on this system that depends on a different version of pandas, we won’t be subject to subtle dependency bugs that can be difficult to find and correct. And if a colleague is working on this same project on another system, Pipfile.lock lets us both rely on our environments being the same.
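    To confirm from inside Python which environment a script is actually running in, a quick check like the following can help. A minimal sketch (in_virtualenv is a hypothetical name), assuming either an older virtualenv, which sets sys.real_prefix, or a venv-style environment, where sys.prefix differs from sys.base_prefix:

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    # virtualenv releases before 20 set sys.real_prefix on the sys module.
    if hasattr(sys, "real_prefix"):
        return True
    # The stdlib venv module (and newer virtualenv) make sys.prefix point at
    # the environment while sys.base_prefix still points at the base install.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv:", in_virtualenv(), "| prefix:", sys.prefix)
```

    Run through pipenv run python it should report True; run with the system python it should report False, unless that interpreter happens to live in some other virtual environment.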

    Jupyter

    Project Jupyter and the IPython Notebook are tools used for interactive programming (that’s what the “I” in “IPython” stands for). Jupyter supports other language kernels like R and Ruby as well.

    We can install Jupyter easily within our Pipenv environment:

    $ pipenv install jupyter
    

    It’s also possible to register this environment as a named IPython kernel (the ipykernel package comes along with jupyter):

    $ pipenv run python -m ipykernel install --user --name mygreatenv --display-name "My Great Env"
    

    The notebook can be started by using pipenv run:

    $ pipenv run jupyter notebook
    

    This serves the notebook locally and opens it in a browser.

    I won’t go into actually using Jupyter Notebook for interactive Python development, but it’s fairly intuitive and is well-suited for experimentation.

    Emacs IPython Notebook

    Today I played with an Emacs plugin called Emacs IPython Notebook (EIN), which can connect directly to an IPython notebook kernel and evaluate code within Emacs. At first glance, it has commands for most of the functions offered in the browser-based UI.

    It took a bit of trial and error and internet-searching to figure out how to connect to the notebook server. When Jupyter Notebook starts, it generates a token used to authenticate a client connecting to the server. This token can be entered at the password prompt when running ein:notebooklist-login. Once authenticated, the command ein:notebooklist-open shows the current Notebook server’s file list, and lets you create or connect to a notebook.
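    Rather than copying the token by eye, it can be pulled out of the login URL that Jupyter prints at startup (and that jupyter notebook list shows for running servers), which has the form http://localhost:8888/?token=... A small sketch using only the standard library; token_from_url is a hypothetical helper name:

```python
from urllib.parse import parse_qs, urlparse


def token_from_url(url):
    """Return the value of the token query parameter, or None if absent."""
    values = parse_qs(urlparse(url).query).get("token")
    return values[0] if values else None
```

    When taking the URL from jupyter notebook list, pass only the URL itself, not the " :: /path" suffix that follows it on the same line.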

    I had been using the web-based UI with the jupyter-vim-binding extension for a short period, but I may switch over to Emacs + EIN. It’s nice to be able to introduce new tooling into an ecosystem you’re already comfortable in.


  • Constructing a list from an iterable in Python

    I discovered this unexpected behavior in Python when a generator function yields a mutable object, mutates that object, and yields it again. Iterating over the generator has a different effect than constructing a list from it:

    >>> def gen_stuff():
    ...   output = {}
    ...   yield output
    ...   output['abc'] = 123
    ...   yield output
    ...
    >>> yielded = gen_stuff()
    >>> for y in yielded: print(y)
    ...
    {}
    {'abc': 123}
    >>> yielded = gen_stuff()
    >>> list(yielded)
    [{'abc': 123}, {'abc': 123}]
    

    Not sure what’s going on here…
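    The explanation comes down to object identity: both yield statements hand back the same dict. The for loop prints each element as soon as it arrives, before the generator resumes and adds 'abc', whereas list() runs the generator to completion first, so both list entries refer to the already-mutated dict. A short sketch that checks this, plus one workaround of yielding copies (gen_snapshots is a hypothetical name):

```python
def gen_stuff():
    output = {}
    yield output
    output['abc'] = 123
    yield output


# Both yields hand back the *same* dict object, not two snapshots of it.
first, second = list(gen_stuff())
print(first is second)  # True: one object, referenced twice in the list


def gen_snapshots():
    output = {}
    yield dict(output)  # yield a shallow copy, so later mutation doesn't leak in
    output['abc'] = 123
    yield dict(output)


print(list(gen_snapshots()))  # [{}, {'abc': 123}]
```

    A shallow copy with dict(output) is enough here; if the yielded dict held nested mutable values, copy.deepcopy would be needed instead.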


  • Using the Google Places API in Google Sheets

    My girlfriend and I were making a list of places to visit while on vacation in a new city. We decided to put this data in a spreadsheet so that we could easily see and keep track of the different types of places we were considering and other data like their cost, rating, etc.

    It seemed annoying to have to copy data by hand from Google Maps/Places into a spreadsheet, so I used this as an excuse to play with the Google Places API.

    I wanted to create a custom function in Sheets that would accept the URL of a Google Maps place as input and populate some cells with data about that place. This way we could discover places in Google Maps, and then quickly get info about those places into our tracking sheet.

    Google Maps URLs look like this:

    https://www.google.com/maps/place/Dirty+Franks/@39.9453658,-75.1628075,15z/data=!4m5!3m4!1s0x0:0x26f65f8548e1f772!8m2!3d39.9453658!4d-75.1628075

    It’s straightforward to parse the place name and the latitude/longitude from this URL. Those pieces of info can be fed to the Places Text Search service to get structured info about the place in question. For example:

    $ curl "https://maps.googleapis.com/maps/api/place/textsearch/json?query=Dirty+Franks&location=39.9453658,-75.1628075&radius=500&key=$API_KEY"
    {
       "html_attributions" : [],
       "results" : [
          {
             "formatted_address" : "347 S 13th St, Philadelphia, PA 19107, United States",
             "geometry" : {
                "location" : {
                   "lat" : 39.9453658,
                   "lng" : -75.1628075
                },
                "viewport" : {
                   "northeast" : {
                      "lat" : 39.9467659302915,
                      "lng" : -75.1615061697085
                   },
                   "southwest" : {
                      "lat" : 39.9440679697085,
                      "lng" : -75.16420413029151
                   }
                }
             },
             "icon" : "https://maps.gstatic.com/mapfiles/place_api/icons/bar-71.png",
             "id" : "30371f87239f7f5259d9b24a62d8ec7c32861097",
             "name" : "Dirty Franks",
             "opening_hours" : {
                "open_now" : true,
                "weekday_text" : []
             },
             "photos" : [
                {
                   "height" : 608,
                   "html_attributions" : [
                      "\u003ca href=\"https://maps.google.com/maps/contrib/114919395575905373294/photos\"\u003eDirty Franks\u003c/a\u003e"
                   ],
                   "photo_reference" : "CmRaAAAAY-2fs6cFG21uVFP33Aguxwy4q_cCx8Z46lOGazGyNNlRhn6ar90Drb8Z4gZnuVdyQZsvwPXfmOl8efqfiJrfMf01QgLN9KKZh5-eRfTcZFkIQ5kO08xTOH5nUjiy0G-NEhCgLdOf6afTjgF7sC9V_JOyGhQBxnxXYmtQe-kXF8dIk-mSEhFgJQ",
                   "width" : 1080
                }
             ],
             "place_id" : "ChIJAxzOXSTGxokRcvfhSIVf9iY",
             "price_level" : 2,
             "rating" : 4.3,
             "reference" : "CmRRAAAAIhlMRQZtM9JbwJYXeGPWWkP70ujjPj6NlK_1ZXQSefVk5oNa22vqseV1ySiti3zXMyZuzSn5DIQEBQoqTOmmFLH7iHp6Lr1XGZ5x0zVaUZFjvD2EYDHxbICvMNRaBWOIEhCAHJaIxUcjP5kw6FJqhhzTGhSJsWZQ09kuYNFpk9-xAM4EyQWRNQ",
             "types" : [ "bar", "point_of_interest", "establishment" ]
          }
       ],
       "status" : "OK"
    }
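    The parsing step is easy to prototype outside of Sheets. A rough Python sketch, assuming place URLs of the /maps/place/NAME/@LAT,LNG,... form shown above (parse_place_url and text_search_url are hypothetical helper names). Building the query with urlencode also escapes names that contain characters like & or #:

```python
import re
from urllib.parse import unquote_plus, urlencode

# Place name, then the "lat,lng" pair that follows the "@".
PLACE_URL_RE = re.compile(r"/maps/place/([^/]+)/@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)")
TEXT_SEARCH_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"


def parse_place_url(url):
    """Return (name, lat, lng) parsed from a Google Maps place URL, or None."""
    m = PLACE_URL_RE.search(url)
    if m is None:
        return None
    name = unquote_plus(m.group(1))  # "Dirty+Franks" -> "Dirty Franks"
    return name, m.group(2), m.group(3)


def text_search_url(name, lat, lng, api_key, radius=500):
    """Build a Places Text Search request URL with encoded parameters."""
    params = {
        "query": name,
        "location": f"{lat},{lng}",
        "radius": radius,
        "key": api_key,
    }
    return TEXT_SEARCH_URL + "?" + urlencode(params)
```

    Note that urlencode turns the comma in location into %2C; that is standard percent-encoding of a query value.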
    

    That JSON data can be ingested by the sheet. Custom functions in Google Sheets, I found out, can return nested arrays of data to fill surrounding cells, like this:

    return [ [ "this cell", "one cell to the right", "two cells to the right" ],
             [ "one cell down", "one down and to the right", "one down and two to the right" ] ];
    

    Here’s the resultant code to populate my sheet with Places data:

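    // Builds a Places Text Search URL from a Google Maps place URL like
    // https://www.google.com/maps/place/NAME/@LAT,LNG,ZOOMz/...
    function locUrlToQueryUrl(locationUrl) {
      var API_KEY = 'AIz********************';
      // Capture the place name and the "lat,lng" pair that follows the "@".
      var matches = locationUrl.match(/maps\/place\/([^\/]+)\/@(-?[\d.]+,-?[\d.]+)/);
      if (!matches) {
        return null;
      }
      var name = matches[1];    // already URL-encoded, e.g. "Dirty+Franks"
      var latLon = matches[2];  // e.g. "39.9453658,-75.1628075"
      var baseUrl = 'https://maps.googleapis.com/maps/api/place/textsearch/json';
      return baseUrl + '?query=' + name + '&location=' + latLon +
             '&radius=500&key=' + API_KEY;
    }
    
    // Custom function: =GET_LOC(A1) fills one row with
    // name | address | types | rating | price level (as $ signs).
    function GET_LOC(locationUrl) {
      if (!locationUrl) {
        return 'Give me a Google Places URL...';
      }
      var queryUrl = locUrlToQueryUrl(locationUrl);
      if (!queryUrl) {
        return 'Could not parse that URL';
      }
      var response = UrlFetchApp.fetch(queryUrl);
      var data = JSON.parse(response.getContentText());
      if (data.status !== 'OK' || !data.results.length) {
        return 'No results (' + data.status + ')';
      }
      var place = data.results[0];
      var place_types = (place.types || []).join(', ');
      var price_level = [];
      for (var i = 0; i < place.price_level; i++) { price_level.push('$'); }
      price_level = price_level.join('');
      
      return [[ place.name,
                place.formatted_address,
                place_types,
                place.rating,
                price_level ]];
    }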
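    Because debugging a custom function inside Sheets can be tedious, it may help to check the response-to-row formatting on its own. A rough Python port of GET_LOC's formatting step (place_to_row is a hypothetical name), which returns a single flat row rather than Sheets' nested array and can be tested against a trimmed copy of the response above:

```python
import json


def place_to_row(response_text):
    """Turn a Text Search JSON response into one spreadsheet-style row.

    Mirrors GET_LOC: name, address, comma-separated types, rating, and the
    price level rendered as dollar signs. Returns None when nothing was found.
    """
    data = json.loads(response_text)
    results = data.get("results") or []
    if data.get("status") != "OK" or not results:
        return None
    place = results[0]
    return [
        place.get("name", ""),
        place.get("formatted_address", ""),
        ", ".join(place.get("types", [])),
        place.get("rating", ""),
        "$" * place.get("price_level", 0),
    ]
```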
    

    GET_LOC can be used like any of the built-in Sheets functions by entering a formula into a cell, like =GET_LOC(A1). And voilà: the place’s name, address, types, rating, and price level fill in across that row.

     

     


  • Painting clouds with Clojure

    Over the 4th of July weekend I made this little program to generate images of clouds.

     

    (ns clouds.core
      (:gen-class)
      (:import [java.awt.image BufferedImage]
               [java.io File]
               [javax.imageio ImageIO]
               [javax.swing JPanel JFrame SwingUtilities]
               [java.awt Graphics Color Dimension RenderingHints]))
    
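    (def width 500)
    (def height 500)
    (def num-particles 1000000)  ; number of random-walk steps to paint
    (def color-cache (atom {}))  ; memoizes java.awt.Color instances by gray/alpha
    (def output-image? true)     ; true: write output/<millis>.png; false: show a window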
    
    (defn- rand-between [min max]
      (+ (rand-int (- max min)) min))
    
    (defn- make-gray-color [color-val alpha]
      (let [color-key (keyword (str color-val "-" alpha))
            cached-color (color-key @color-cache)]
        (if cached-color
          cached-color
          (let [^Color new-color (Color. color-val
                                         color-val
                                         color-val
                                         alpha)]
            (swap! color-cache assoc color-key new-color)
            new-color))))
    
    ;; Paints a cloud by random-walking a "particle" one pixel at a time,
    ;; drawing a faint near-white dot at each step plus a fainter halo on
    ;; some of its neighboring pixels.
    (defn- paint-clouds [^Graphics graphics]
      (loop [n 0
             last-x (rand-int width)
             last-y (rand-int height)]
        (let [rand-op (if (< (rand) 0.5) inc dec)         ; step direction
              rand-axis (if (< (rand) 0.5) :vert :horiz)  ; step axis
              new-x (if (= rand-axis :horiz)
                      (rand-op last-x)
                      last-x)
              new-y (if (= rand-axis :vert)
                      (rand-op last-y)
                      last-y)
              rand-gray (rand-between 250 255)
              rand-alpha (rand-int 75)
              neighbor-alpha-modifier 0.11
              particle-color (make-gray-color rand-gray rand-alpha)
              neighbor-color (make-gray-color rand-gray
                                              (int (* rand-alpha
                                                      neighbor-alpha-modifier)))]
          ;; doseq rather than (doall (for ...)): this loop is purely for side effects.
          (doseq [x-offset (range -1 2)
                  y-offset (range -1 2)
                  :let [x (+ new-x x-offset)
                        y (+ new-y y-offset)]
                  :when (and (<= 0 x width)
                             (<= 0 y height)
                             (or (= x-offset y-offset 0)
                                 (not= x-offset y-offset)))]
            (let [^Color color (if (= x-offset y-offset 0)
                                 particle-color
                                 neighbor-color)]
              (doto graphics
                (.setColor color)
                (.drawLine x y x y))))
          (when (< n num-particles)
            (recur (inc n) new-x new-y)))))
    
    (defn- painter []
      (proxy [JPanel] []
        (paint [^Graphics graphics]
          (let [^int width (proxy-super getWidth)
                ^int height (proxy-super getHeight)]
            (doto graphics
              (.setRenderingHint RenderingHints/KEY_ANTIALIASING
                                 RenderingHints/VALUE_ANTIALIAS_ON)
              (.setRenderingHint RenderingHints/KEY_INTERPOLATION
                                 RenderingHints/VALUE_INTERPOLATION_BICUBIC)
              (.setColor (Color. 135 206 250))
              (.fillRect 0 0 width height))
            (paint-clouds graphics)))))
    
    (defn- gen []
      (let [^JPanel painting-panel (painter)
            ^Dimension dim (Dimension. width height)]
        (doto painting-panel
          (.setSize dim)
          (.setPreferredSize dim))
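        (if output-image?
          ;; Render off-screen into a BufferedImage and save it as a PNG.
          (let [^BufferedImage bi (BufferedImage. width
                                                  height
                                                  BufferedImage/TYPE_INT_ARGB)
                ^Graphics graphics (.createGraphics bi)]
            (.paint painting-panel graphics)
            (.mkdirs (File. "output"))  ; ImageIO/write throws if output/ doesn't exist
            (ImageIO/write bi "png" (File. (str "output/"
                                                (System/currentTimeMillis)
                                                ".png"))))
          ;; Otherwise, show the painting in a window.
          (let [^JFrame frame (JFrame. "clouds")]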
            (.add (.getContentPane frame) painting-panel)
            (doto frame
              (.pack)
              (.setVisible true))))))
    
    (defn -main
      [& args]
      (gen))
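    The heart of paint-clouds is a lattice random walk: at each step the particle picks an axis and a direction at random and moves one pixel. To see the walk on its own, separated from the Swing drawing code, here's a rough Python sketch (random_walk is a hypothetical name; it leaves out the colors and the neighbor-pixel halo):

```python
import random


def random_walk(width, height, steps, rng=random):
    """Yield (x, y) positions of a lattice random walk, like paint-clouds.

    Each step picks an axis and a direction at random and moves one pixel.
    As in the Clojure version, the walk is not clamped to the canvas, so
    callers should skip points that fall outside it.
    """
    x, y = rng.randrange(width), rng.randrange(height)
    for _ in range(steps):
        delta = 1 if rng.random() < 0.5 else -1
        if rng.random() < 0.5:
            y += delta  # vertical step
        else:
            x += delta  # horizontal step
        yield x, y
```

    Because the particle only ever moves to an adjacent pixel, it tends to revisit the same neighborhood many times, and the stacked low-alpha dots are what build up into soft, cloud-like patches.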
    

  • Getting average motorcycle price across all Craigslist cities

    Today I’m going to look at a motorcycle that’s for sale on Craigslist. The asking price for the bike seems fair, but I wanted to get a sense of what other people were asking for the same model and year.

    First I did a local search for the motorcycle I was interested in, using the year, make, and model search filters. The resulting URL was:

    https://philadelphia.craigslist.org/search/mcy?srchType=T&auto_make_model=suzuki+TU250X&min_auto_year=2012&max_auto_year=2012
    

    This returned all the listings in Philadelphia for a 2012 Suzuki TU250X. The srchType=T parameter filters to only include results that have a match in the listing title.
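    The search URL is just a few query parameters on top of a site’s base URL, so it can be rebuilt for any city, model, or year. A small sketch (motorcycle_search_url is a hypothetical helper; the parameter names are the ones from the URL above):

```python
from urllib.parse import urlencode


def motorcycle_search_url(site, make_model, year):
    """Build a Craigslist motorcycle (mcy) search URL for one site.

    site is a base URL such as "https://philadelphia.craigslist.org".
    srchType=T restricts matches to listing titles.
    """
    params = {
        "srchType": "T",
        "auto_make_model": make_model,
        "min_auto_year": year,
        "max_auto_year": year,
    }
    return f"{site}/search/mcy?{urlencode(params)}"
```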

    Using pup, a command-line tool for parsing HTML, I extracted the asking prices from the search results:

    curl -s "https://philadelphia.craigslist.org/search/mcy?srchType=T&auto_make_model=suzuki+TU250X&min_auto_year=2012&max_auto_year=2012" | \
    pup 'ul.rows li.result-row p.result-info span.result-meta span.result-price text{}'
    

    There is a Craigslist page that lists every Craigslist site in the US. I parsed it to get each location’s site URL:

    curl -s "https://geo.craigslist.org/iso/us" | \
    pup 'div.geo-site-list-container a attr{href}'
    

    Combining the two gives a loop over every US site:

    curl -s "https://geo.craigslist.org/iso/us" | \
    pup 'div.geo-site-list-container a attr{href}' | \
    while read -r location; do
      curl -s "$location/search/mcy?srchType=T&auto_make_model=suzuki+TU250X&min_auto_year=2012&max_auto_year=2012" | \
        pup 'ul.rows li.result-row p.result-info span.result-meta span.result-price text{}'
    done
    

    This outputs the asking prices:

    $3800
    $2750
    $2800
    $2950
    $3800
    $3750
    $2800
    $2400
    $2750
    $2950
    $2750
    $3800
    $3750
    $2700
    $2400
    ...
    

    I was then able to see how the price of the motorcycle in which I was interested compared to similar bikes throughout the US.
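    Those prices are easy to reduce to an average. A minimal sketch, assuming the loop’s output was saved one "$NNNN" price per line (summarize_prices is a hypothetical name); it reports the median too, since a few unusually high or low asks can pull the mean around:

```python
import statistics


def summarize_prices(lines):
    """Parse lines like "$3800" and return (count, mean, median), or None if empty."""
    prices = []
    for line in lines:
        text = line.strip().lstrip("$").replace(",", "")
        if text.isdigit():  # skip blank lines, "...", and anything non-numeric
            prices.append(int(text))
    if not prices:
        return None
    return len(prices), statistics.mean(prices), statistics.median(prices)
```

    For example, after redirecting the loop’s output to a file such as prices.txt (a hypothetical name), summarize_prices(open("prices.txt")) returns the listing count along with the mean and median asking price.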