Moving to Berlin with the help of IPython and friends

TL;DR

I help my girlfriend look for a flat using (I)Python and friends. I plot apartments that match her criteria on a map, along with the time it takes to reach her workplace using public transportation. Along the way I showcase some Python libraries such as Pandas and Scrapy, together with some features of the IPython notebook for working with the Google Maps API. Code and notebooks are available on Github.

Intro

Moving to a new city is not an easy task. Among all the things to do, one of the most time-consuming is finding a place to live. There are many variables to take into account and, if you don't use an agency, the search can be boring and repetitive.

Assuming you use the web to find possible apartments, once you have a good candidate you generally have to check the address, the surroundings (e.g. stores, cafes) and which public transportation is available. You generally also have to check how long it will take you to get to your workplace or the city center, either by car or by public transportation. This is important, as Stutzer and Frey found "that a person with a one-hour commute has to earn 515 Euro more (or 40% of an average monthly wage in Germany) to compensate for the dissatisfaction caused by their long commute" [source].

If you use Google or specialized websites like ImmobilienScout24 (in Germany), you probably have to go through the process of searching for apartments and checking whether each one matches your criteria (e.g. number of rooms, size, rent). In addition to that, you have to check how far it is from work or how much time you will need to get there.

There is actually a nice tool called Mapnificent, written by a Berliner, that can help you with the last part. Mapnificent shows you graphically the areas you can reach with public transport in a given time, and it is available for many cities. However, in order to use the tool you have to enter the latitude and longitude coordinates manually for each of the candidate apartments.

That is the problem my girlfriend is facing. She is moving to Berlin next month and she wants an apartment that matches her criteria and from which she can reach her workplace in the shortest possible time. So I decided to help her (us?) a bit with some assistance from Python/IPython and a few web services.

Ever since I read Karim's blog post and attended his presentation at the Munich Datageeks Meetup, I have been interested in how to harness open data to automate or improve otherwise boring and time-consuming tasks.

I also googled a bit before coding and stumbled upon a nice article by Robin Clarke, who lives in Munich, about how he looked for the areas of the city from which he could reach the center of Munich in a given time. He even built a super duper visualization that you can see below:

In [49]:
from IPython.display import HTML
HTML('<iframe src="https://www.google.com/fusiontables/embedviz?viz=MAP&q=select+col1+from+2304677+&h=false \
     &lat=48.19187395469069&lng=11.499547000000007&z=10&t=1&l=col1" width=800 height=400></iframe>')
Out[49]:

The lighter the area, the less time you need to reach Munich's city center from that location. In theory you could calculate something similar from any point to another arbitrary point in a city (and that's what Mapnificent does), but I did not want to do anything that complex, as I prefer Street Fighting Data Science.

However, this gave me an idea: why don't I plot on a Google Map only the apartments that have the characteristics I want (she wants), along with the time it takes to get to my girlfriend's workplace? I know neither the Google Maps API nor JavaScript, but it can't be that hard.

Getting the Data

This is where web scraping comes in handy. Although I had never used it, I knew there was a popular scraping framework for Python called Scrapy, and this was a nice opportunity to learn a bit about it. I wrote a small Python project that scrapes ImmobilienScout24 listings and stores the results in a JSON file. Before storing each listing, it uses the Google Maps services to geocode the address and calculate the travel time to my girlfriend's workplace by public transportation. To do that, I use what Scrapy calls an item pipeline together with the Google Maps services client. I limited the search to Kreuzberg, Schöneberg and Charlottenburg, as they are close to the city center and still in the direction of her workplace.

The class that actually does the magic looks like this (you can find the full code along with this notebook on Github):

In [16]:
import googlemaps
from scrapy.exceptions import DropItem


class AddDistanceToMPIPipeline(object):

    # Destination: coordinates of the workplace (MPI).
    latlong_mpi = str((52.444311, 13.273748))

    def __init__(self):
        self.gm_client = googlemaps.Client("_PUT_API_KEY_HERE")

    def process_item(self, item, spider):
        # Geocode the listing address; drop the item if nothing is found.
        orig = item["addr"]
        geoloc = self.gm_client.geocode(orig)
        if not geoloc:
            raise DropItem("Could not geocode %r" % orig)
        for k in ('lat', 'lng'):
            item[k] = geoloc[0]['geometry']['location'][k]

        # Ask for public transport directions at a fixed departure time
        # (given as a Unix timestamp).
        directions_result = self.gm_client.directions(str((item['lat'], item['lng'])),
                                                      self.latlong_mpi,
                                                      mode="transit",
                                                      departure_time=1421307820)

        # Pick the fastest leg across all returned routes.
        legs = [leg for route in directions_result for leg in route["legs"]]
        if not legs:
            raise DropItem("No transit route found for %r" % orig)
        chosen_leg = min(legs, key=lambda leg: leg["duration"]["value"])

        # The duration is reported in seconds; store it in minutes.
        item["time_to"] = chosen_leg["duration"]["value"] / 60.0
        return item
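
The departure_time passed to the directions request above is a Unix timestamp (seconds since 1970-01-01 UTC). As a minimal sketch, the snippet below decodes the hard-coded value and shows one way such a timestamp could be computed for the next working day instead. The helper next_weekday_departure is hypothetical and not part of the original project, and it works in UTC hours purely to keep the example simple.

```python
from datetime import datetime, timedelta, timezone

# The value hard-coded in the pipeline, decoded to a readable UTC time.
HARDCODED_DEPARTURE = 1421307820
decoded = datetime.fromtimestamp(HARDCODED_DEPARTURE, tz=timezone.utc)
print(decoded.isoformat())  # 2015-01-15T07:43:40+00:00 (a Thursday morning)


def next_weekday_departure(now, hour_utc=7, minute=0):
    """Unix timestamp of the next Monday-Friday after `now`, at the given UTC time.

    Hypothetical helper for illustration only.
    """
    candidate = now.replace(hour=hour_utc, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=1)       # start looking from tomorrow
    while candidate.weekday() >= 5:      # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return int(candidate.timestamp())


# Asking on a Friday yields the following Monday at 07:00 UTC.
friday_noon = datetime(2015, 1, 16, 12, 0, tzinfo=timezone.utc)
print(next_weekday_departure(friday_noon))
```

Using a typical commuting time matters here because transit durations depend on the timetable at the requested departure.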

Taking a look at the data

So after scraping the website for a while we have a file with all apartments available. We can use Pandas to load the data and take a look at it:

In [33]:
import pandas

with open('../ichbineinberliner/items.json') as f:
    data = pandas.read_json(f)

Pandas can also do some SQL-like filtering of the data. So let's assume my girlfriend wants a 3-room apartment (in Germany the living room counts as one Zimmer, i.e. room). She also wants to be able to get to her job in no more than 40 minutes, and the monthly rent should not exceed 800 euros.

In [52]:
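# Combine the three conditions into a single boolean mask, then sort by travel time.
mask = (data.zimmer == 3) & (data.miete <= 800) & (data.time_to <= 40)
apartments = data[mask].sort_values('time_to')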
apartments[['addr', 'link', 'sqm', 'time_to', 'zimmer']].head(10)
Out[52]:
     addr                                               link                                               sqm    time_to  zimmer
541  Schöneberg (Schöneberg), 12157 Berlin              http://www.immobilienscout24.de/expose/78832352  98.74  26.716667  3
524  Dominicusstraße 40, Schöneberg (Schöneberg), 1...  http://www.immobilienscout24.de/expose/78752552  86.37  28.916667  3
581  Sybelstraße 17, Charlottenburg (Charlottenburg...  http://www.immobilienscout24.de/expose/78081827  72.25  29.533333  3
582  Ebersstrasse 15, Schöneberg (Schöneberg), 1082...  http://www.immobilienscout24.de/expose/78826435  76.00  29.900000  3
301  Charlottenburg (Charlottenburg), 10629 Berlin      http://www.immobilienscout24.de/expose/76914555  79.00  31.233333  3
511  Charlottenburg (Charlottenburg), 10625 Berlin      http://www.immobilienscout24.de/expose/77718892  62.00  33.750000  3
489  Dernburgstr. 43, Charlottenburg (Charlottenbur...  http://www.immobilienscout24.de/expose/77665277  70.00  34.900000  3
636  Sachsendamm 78, Schöneberg (Schöneberg), 10829...  http://www.immobilienscout24.de/expose/78941951  73.34  35.516667  3
20   Otto-Suhr-Allee , Charlottenburg (Charlottenbu...  http://www.immobilienscout24.de/expose/56455442  69.00  35.750000  3
15   Olbersstr. 2, Charlottenburg (Charlottenburg),...  http://www.immobilienscout24.de/expose/59454790  60.00  36.233333  3
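
For readers who prefer something that reads even closer to SQL, the same kind of selection can be sketched with DataFrame.query. The listings below are made up purely for illustration (they are not scraped data); only the column names zimmer, miete and time_to are taken from the post.

```python
import pandas as pd

# Made-up listings, only to illustrate the filter; not real scraped data.
data = pd.DataFrame({
    "addr": ["A-Strasse 1", "B-Strasse 2", "C-Strasse 3", "D-Strasse 4"],
    "zimmer": [3, 3, 2, 3],
    "miete": [750.0, 900.0, 600.0, 790.0],
    "time_to": [35.0, 20.0, 15.0, 25.5],
})

# The query string plays the role of a SQL WHERE clause,
# and sort_values the role of ORDER BY.
apartments = (
    data.query("zimmer == 3 and miete <= 800 and time_to <= 40")
        .sort_values("time_to")
)
print(apartments[["addr", "time_to"]])
```

Only the first and last rows satisfy all three conditions, and the result is ordered by travel time, so "D-Strasse 4" comes before "A-Strasse 1".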

Visualizing the Data

So now we have some apartments that match our criteria. At this point I could email myself the table above and run the crawler every day so that I get notified of newly available apartments. But I really wanted to visualize it in a better way.

I discovered that the IPython notebook can embed and execute JavaScript and HTML, so embedding a Google Map in a cell is possible. The notebooks from the UC Berkeley class Working with Open Data helped me get started. Doing this is not that simple (better support should be possible), but it is not hard either.

The first thing is to initialize the Google Maps API:

In [35]:
from IPython.core.display import HTML, Javascript
def gmap_init():
    js = """
window.gmap_initialize = function() {};
$.getScript('https://maps.googleapis.com/maps/api/js?v=3&sensor=false&callback=gmap_initialize');
"""
    return Javascript(data=js)
gmap_init()
Out[35]:

Then we declare the style of the div where we are going to display the map:

In [36]:
%%html
<style type="text/css">
  .map-canvas { height: 400px; }
</style>

Rendering the Map

Now comes the part where we generate the map. What we are going to do is generate the JavaScript code that renders the map. Then we can either display it in a cell using the IPython notebook HTML object or just store it in an HTML file and upload it somewhere.

I created the small function below that generates the map (check the code comments for more info):

In [37]:
from IPython.core.display import HTML, Javascript

def map_pos_apartments(apartments, display=True, lat=52.4798023, lng=13.3563576, zoom=12):

    div_id = "miete"  # id of the div where we are going to display the map.
    html = """<div id="%s" class="map-canvas"></div>""" % (div_id)


    # This is a template for the info window that is shown when the user clicks a
    # marker.
    content_template = """'<ul style="list-style: none;padding:0; margin:0;">' + 
    '<li> <a href="{link}" target="_blank"> {addr} </a></li>' +
    '<li><b>Time to MPI:</b> {time_to:.2f} min</li>' +
    '<li><b>Size:</b> {sqm} m<sup>2</sup></li>' +
    '<li><b>Rent:</b> &#8364; {miete}</li></ul> '
    
    """
    # This is the template for a marker on the map. It also contains the code that generates the
    # info window that appears when the marker is clicked.
    marker_template = """
        var myLatlng = new google.maps.LatLng({lat},{lng});
        var marker_{i} = new google.maps.Marker({{
            position: myLatlng,
            map: map,
            title: "{title}"
        }});

        var contentString = {content};

        var infowindow_{i} = new google.maps.InfoWindow({{
            content: contentString
        }});

        google.maps.event.addListener(marker_{i}, 'click', function() {{
            // Close the previously opened info window before opening this one.
            if (lastWindow) {{
                lastWindow.close();
            }}
            infowindow_{i}.open(map, marker_{i});
            lastWindow = infowindow_{i};
        }});
    
    """
    ## JS initialization code.
    js_init = """
    <script type="text/Javascript">
      (function(){
        var mapOptions = {
            zoom: %s,
            center: new google.maps.LatLng(%s, %s)
          };

        var map = new google.maps.Map(document.getElementById('%s'),
              mapOptions);
              
        var lastWindow = false;
        
        var transitLayer = new google.maps.TransitLayer();
        transitLayer.setMap(map);
              
              """ % (zoom, lat, lng, div_id)

    # closing script
    js_end = """
      })();  
    </script>
    
    """

    # Now the actual part that generates the markers based on
    # the crawled data.

    js_markers = ""
    for i,r in enumerate(apartments.iterrows()):
        d = r[1]
        addr = d.addr  # already a text string on Python 3; no need to encode
        content = content_template.format(link=d.link, addr=addr,
                                           time_to=d.time_to, miete=d.miete,
                                           sqm=d.sqm)
        js_markers +=  marker_template.format(i=i, lat=d.lat, lng=d.lng,
                                              title=addr, content=content)

    html = html+js_init+js_markers+js_end
    if display:
        return HTML(html)
    else:
        return html
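
A note on the templating used above: because the marker template is filled in with str.format, every literal JavaScript brace has to be doubled, while single braces mark placeholders. The sketch below shows this in isolation. It also uses json.dumps to turn a Python string into a quoted, escaped JavaScript string literal, which is one possible way to keep addresses containing quotes from breaking the generated script. The function render_marker and the sample address are hypothetical and not part of the notebook.

```python
import json

# Literal JavaScript braces are doubled so str.format leaves them alone;
# single braces mark the placeholders to fill in.
marker_template = """
var marker_{i} = new google.maps.Marker({{
    position: new google.maps.LatLng({lat}, {lng}),
    map: map,
    title: {title}
}});
"""


def render_marker(i, lat, lng, title):
    """Render the JavaScript for one marker (hypothetical helper)."""
    # json.dumps adds the surrounding quotes and escapes inner quotes and
    # non-ASCII characters, so the placeholder itself carries no quotes.
    return marker_template.format(i=i, lat=lat, lng=lng, title=json.dumps(title))


print(render_marker(0, 52.48, 13.35, 'Musterstrasse 5 "Hinterhaus"'))
```

In the printed result the doubled braces come out as single ones, and the inner quotes of the title are backslash-escaped.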

Now we can call this function and see the map:

In []:
map_pos_apartments(apartments)

The only catch is that the map is rendered by JavaScript when the cell is executed, so it does not show up in a static copy of the notebook such as this post. As a replacement, I am embedding an iframe that shows the result of the code above.

In [47]:
HTML('<iframe src="http://mfcabrera.com/files/ichbineinberliner/" width=800 height=400></iframe>')
Out[47]:

Now we have a nice responsive and interactive map with the apartments matching our criteria. If we click a marker we get more information about each available apartment.

As the HTML constructor only takes HTML/JS source text, we can also store that text in a file and embed it somewhere else.

In [54]:
html_src = map_pos_apartments(apartments, display=False)

init_script = """ <script type="text/javascript"
      src="https://maps.googleapis.com/maps/api/js?key=AIzaSyD1tR9ag8ImBLr4BJdr-ZMTP0bFOXPJFUk">
    </script>"""

with open("index.html", "w") as f:
    f.write("<html><head>")
    f.write(init_script)
    f.write('<style type="text/css"> \
            .map-canvas { height: 800px; } \
            </style>')
    f.write("</head><body>\n\n")
    # html_src already contains the 'miete' div and the script that fills it.
    # That script runs on its own, so no extra 'load' listener is needed.
    f.write(html_src)
    f.write("</body></html>")
In [55]:
!open index.html

Conclusion

We managed to build a nice visualization of candidate apartments that is definitely helpful when moving to a new city. It definitely does not get her an apartment automatically. However, linking the apartment listings with transit information narrows down the search a lot and automates some of the most boring tasks.

This small project even made me take a look at the Open Data and Open Knowledge movements and their standing in Germany.

I am also gladly surprised by the capabilities of IPython notebooks, and this made me realize that I finally need to learn to code properly in JavaScript.
