Attending the Lisbon Machine Learning Summer School

mfcabrera

2014-07-25

TL;DR

This year I had the chance to attend the Lisbon Machine Learning (Summer) School, LxMLS. In this post I want to share my experiences as well as give my opinion on how some things could be improved.

Introduction

Lisbon Machine Learning Summer School (LxMLS) is an intensive school on Machine Learning held in the beautiful city of Lisbon in Portugal. What is special about this school is that it has a specific emphasis on applications of Machine Learning in the field of Natural Language Processing (NLP). I believe this is because most of the organizers are somehow connected to NLP research groups and companies in Lisbon, as well as to the CMU Language Technology Institute and the IST. This was appealing to me as I wrote my M.Sc. thesis on the topic of applications of word vector representations. However, I only used them for document classification, whereas their more interesting applications are in the field of NLP.

The summer school was designed in such a way that in the morning we received a tutorial / lecture on a specific topic of ML & NLP, and in the afternoon a practical session was held that was directly related to the morning lecture. After the labs, a short presentation featuring more research results was scheduled.

In this post I briefly describe the summer school and give an account of my experience. This is my personal opinion and thus it might not be shared by many of the attendees, as the school experience will vary based on their backgrounds and expectations.

Day 0 - Tuesday

The first day started with a quick introduction and presentation of the summer school. Shortly afterwards Prof. Mário Figueiredo kicked off the school with a review of basic probability concepts. I have to admit that it was helpful to refresh some things. The morning talk was an introduction to Python by Luis Pedro Coelho from the EMBL, which was necessary for those with no previous knowledge of the programming language, as the programming exercises of the lab sessions required basic Python knowledge. The afternoon lab session was focused on getting Python installed. There were also a couple of exercises about gradient descent. On the first day we did not have a short evening talk but instead a welcome reception where we could do some networking while eating snacks and drinking wine.

Day 1 - Wednesday

On the second day we had an introduction to machine learning and linear classifiers by Ryan McDonald from Google. I liked this presentation; however, at the beginning I had issues understanding how the feature extraction worked (as it was a bit NLP-focused). At times the slides got filled with mathematical derivations that in my opinion did not help to make the concepts clearer.

In the lab session we had to implement Multinomial Naïve Bayes for document classification. We basically had to fill in the train method of an existing class (it reminded me of the ML class from Coursera). As it happens, I had never implemented this before (it is basically well-motivated counting), so it was nice to finally do it.
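To illustrate why Multinomial Naïve Bayes really comes down to counting, here is a minimal sketch in plain Python. It is not the class from the school's toolkit: the function names, the `alpha` smoothing parameter and the toy documents are all my own assumptions.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log-priors and log-likelihoods by counting words per class."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        # Add-alpha (Laplace) smoothing so unseen words never get probability 0
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_like


def predict(doc, log_prior, log_like):
    """Return the class with the highest score; out-of-vocabulary words are ignored."""
    def score(c):
        return log_prior[c] + sum(log_like[c][w] for w in doc if w in log_like[c])
    return max(log_prior, key=score)


# Toy, made-up training data (already tokenized)
docs = [["good", "great", "film"], ["bad", "boring", "film"], ["great", "fun"], ["bad", "awful"]]
labels = ["pos", "neg", "pos", "neg"]
log_prior, log_like = train_multinomial_nb(docs, labels)
print(predict(["great", "film"], log_prior, log_like))
```

Training only tallies how often each class and each word within a class occurs; everything else is turning those counts into smoothed log-probabilities.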

The evening talk was a tutorial on Scikit-Learn by Andreas Müller, one of the main contributors to the project. The funny thing is that I had used parts of his tutorial for my talk on SVMs at the Datageeks Meetup.

Day 2 - Thursday

On day three we had a talk on sequence models by Noah Smith, leader of CMU's Noah's Ark group. I was particularly interested in this talk because I had no experience with sequence models or with their application to NLP tasks. I really liked the presentation style and the slides; however, by the end of the talk it became a bit esoteric for me and it was hard to follow.

In the practical session we were required to implement the Viterbi algorithm. At the beginning I was pretty lost, but with the help of some of the instructors I was able to complete the task.
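For readers who have not seen it, the Viterbi algorithm finds the most likely sequence of hidden states of a hidden Markov model using dynamic programming. The following is a minimal sketch, not the lab's reference solution: the function signature is my own, and the states and probabilities are toy numbers chosen only for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for obs under a first-order HMM."""
    # best[t][s] = probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path probability
            prev = max(states, key=lambda r: best[t - 1][r] * trans_p[r][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            back[t][s] = prev
    # Follow the back-pointers starting from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy model with made-up probabilities
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
print(viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p))
# → ['Healthy', 'Healthy', 'Fever']
```

A real implementation would work with log-probabilities to avoid numerical underflow on long sequences; plain products are used here only to keep the sketch short.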

After the lab we went to the LxMLS Demo Day, where local companies and research groups showed their products and research. Of all the booths I liked those of two local companies: Unbabel and Priberam. Unbabel is a Y Combinator-backed startup that offers crowdsourced, human-corrected machine translation. I found the service pretty cool. Priberam is a company offering NLP-related services. They have a strong research group connected to the Instituto Superior Técnico.

Day 3 - Friday

Friday morning we had a talk on learning structured predictors by Xavier Carreras of Xerox Research. This was another one I liked, and I had been looking forward to it. I could more or less follow most of the talk, but once again by the end it was a little hard to get everything. The afternoon lab was about implementing the structured perceptron algorithm.

I have to say I could not follow the evening talk at all. It was something related to Spectral Learning by Ariadna Quattoni. I think it was too specialized and the scope was not appropriate for the summer school.

Noah Smith's Lecture - Photo by @DH_FBK

Day 4 - Saturday

On Saturday we had a talk on Syntax and Parsing from Slav Petrov. This was really hard to follow. I guess that would be the case for anyone without a good background in NLP. However, the main concepts were understandable. In the lab session we were required to play with existing code related to parsing.

The evening talk was given by Dipanjan Das from Google on Cross-Lingual Learning in Natural Language Syntax. It was a pretty advanced topic, but I really liked how the presenter moved from basic concepts to more complex techniques. This was one of my favorite presentations.

Day 5 - Monday

After the free day (which I used to visit Sintra along with some cool people I met in the school - Hi guys :D!) we returned to the school for the last two days. Monday was the day for the Big Data topics with CMU Prof. Chris Dyer. I also liked this one. He not only showed the basics of Map Reduce but also strategies for implementing ML and NLP algorithms using this paradigm. The afternoon labs introduced the basic concepts of MR with the almost pathological word counting problem.
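For those who have never seen the word counting problem, here is a minimal sketch of its map, shuffle and reduce steps simulated in plain Python. It does not use a real Map Reduce framework and it is not the lab code; all function names and the input lines are my own assumptions.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum all the counts emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence on a list of text lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


print(word_count(["the cat sat", "the dog sat", "The end"]))
```

In a real cluster the mappers and reducers run in parallel on different machines and the framework performs the shuffle; here everything runs in a single process just to show the data flow.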

The afternoon lecture was about Cross-Lingual Semantics by Prof. Ivan Titov of the University of Amsterdam. I couldn't follow much of this talk either.

Day 6 - Tuesday (Final Day)

I was really looking forward to the talk on Deep Learning by Richard Socher. I had already watched his Tutorial on Deep Learning so I was familiar with many of the topics. His tutorial was really helpful to understand basic concepts of Deep Learning. For the afternoon labs we were required to write/execute a Map Reduce version of the expectation maximization algorithm. I did not do much here because I was writing this post :D

I could not attend the afternoon talk because there were some problems on the red line of the metro, which connects the city with the airport. Thus I preferred to go to the airport a bit early and I could only catch the first 15 minutes of the lecture.

The Good

There were many things that I liked about the summer school. I will list here the ones I believe are the most important:

The People: From the organizers to the attendees, all the people I had the chance to interact with were really nice. I had interesting conversations with all of them.

The Location: Lisbon. What a beautiful city. It is the perfect location for such an event. The IST campus is quite central and easy to reach.

The talk ordering was well planned. Every lecturer built on the knowledge acquired in the previous session.

The Topics: The school covered everything from the very basics to current topics such as deep learning.

The Speakers: All of the speakers are renowned researchers and academics coming from prestigious institutions and companies. We had speakers from Amazon and Google, as well as from universities such as CMU and Stanford.

The Organization Team: The organizers / tutors were always ready to help. Every time I had a question or doubt they tried their best to explain.

Using Python for the Labs. I think that Python is becoming the standard for NLP / ML / Data Science. It is also an easy programming language to learn.

Things to improve

Not everything can be perfect, right? There are quite a few things that in my opinion can be improved for future schools:

Use IPython notebooks for the Labs. I think the toolkit provided by the school is pretty nice, but the fact that people have to install Python makes the first day not really smooth. IPython can be served remotely very easily, so people can access it through the web interface. Also, a virtual machine could be prepared so that attendees do not need to tinker around installing Python and the required packages.

I did not much like the way the labs were executed. I liked the chosen topics and the instructors/tutors were really helpful. However, I believe the labs should be made more interactive. Also, an explanation of the basics of each exercise, as well as of how each part of the algorithm relates to the code, was missing. I had a lot of trouble understanding a simple part that could have been easily explained (but it was not clear in the guide).

More social events / group activities. I would have loved to have more group activities. We had dinner in a fancy restaurant but doing other stuff together would have been nice. A group visit to the castle would have been a nice idea.

The Auditorium. The Auditorium where the main lectures were held was OK but for one thing: the space between the chairs was minimal. I am not that tall, and when I couldn't get a place in the last row or the middle one (both with extra space) I was uncomfortable the whole time.

Add an (optional) poster session along with the demo day. I think that might be helpful for early researchers to discuss their approaches with the speakers (who are generally experts in the field).

The Canteen. The Canteen was relatively small and it was shared with students, so people had to rush to be able to eat within the one-hour time slot. Either giving more time for lunch or looking for alternatives is necessary for future events.

Conclusion

I had a great experience coming here. I had the chance to meet interesting people as well as to learn (or at least get informed) about ML and its applications to NLP. The talks were relevant and the topics well chosen. I really enjoyed most of the lectures.

I don't think people come to the summer school to actually learn something in depth. It is really hard to fully understand advanced topics in machine learning and NLP in just one day. One can, however, get informed about the subjects and learn what needs to be learned. Also, the networking part is really important, even more so for early Ph.D. students, as they have the chance to validate their approaches with other researchers and experts in the field.

Lisbon is magical. I really loved it. It is a beautiful and organized city. The public transportation works pretty well and there are many interesting places to visit both in the city and in the surrounding area.

I would recommend this summer school to anyone interested in NLP and in ML in general. You will not only have the chance to learn but you will also enjoy what the beautiful Lisbon and Portugal have to offer.