I passed a course this week. For the last few months I’ve been taking a distance-learning course on Natural Language Processing taught by Stanford University professors Dan Jurafsky and Chris Manning.
Now that I’ve finished, I thought I’d share my experience of doing the course, partly because these courses are run again, so you may be considering doing one like it in the future.
More importantly, I’ve been working hard on this (seriously: blood, sweat and tears went into this), so I’m damn well going to shout about it. 😉
Lectures were delivered as videos with slides, interspersed with quick multi-choice questions to check you’ve got the right idea. The lectures were complex, and more than once I’d have to watch them a few times to get the idea. But they were interesting, and covered a range of topics including:
- Text Processing beyond Regular Expressions
- Edit Distance
- Language Modeling
- Spelling Correction
- Text Classification and Naive Bayes
- Sentiment Analysis
- Maximum Entropy Classifiers
- Information Extraction and Sequence Modeling
- Maximum Entropy Models
- Smoothing models
- Part of Speech Tagging
- Dynamic Programming for Parsing: the CKY algorithm
- Statistical Parsing
- Information Retrieval: Boolean Retrieval and Ranked Retrieval
- Word Meaning and Word Similarity
- Question Answering
- and lots more
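To give a flavour of the material: edit distance, one of those topics, boils down to a short dynamic-programming routine. This is just a minimal illustrative sketch (not course code), assuming unit cost for every insertion, deletion and substitution:

```python
# Levenshtein edit distance: the smallest number of insertions,
# deletions and substitutions needed to turn one string into another.
def edit_distance(a: str, b: str) -> int:
    # prev[j] holds the distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```

The course material goes further (e.g. weighted edit costs for spelling correction), but this is the core idea.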
Practical, interesting programming assignments
The course included weekly programming assignments. These weren’t trivial. I found some very difficult, and many were seriously time-consuming. It wasn’t unusual for me to spend twelve hours or more on an assignment.
The assignments felt practical and realistic. It wasn’t “implement this theoretical thing described in the lectures”; the assignments described (albeit simplified) real-world NLP problems. I loved this. It made the lectures real: you could see how and why the methods we were taught are used.
We had to implement systems capable of tackling a range of tasks:
- Extract email addresses (obscured or obfuscated in ways ranging from trivial to complex) from documents
- Train language models as part of building a spell-checker
- Write a sentiment analysis classifier able to classify text from a movie review as positive or negative
- Build a named entity classifier that can pick out people’s names from text taken from newswire
- Write rules making up a probabilistic grammar to represent the English language
- Write a parser for English sentences able to identify the parts of speech (e.g. nouns, verbs, pronouns, adjectives, etc.) in the sentence
- Write an information retrieval system to build an index of documents and use weighted queries to find the right document
- Write a question answering system using a corpus of Wikipedia and Google results to answer trivia questions
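To give an idea of the shape of that sentiment-analysis task, here’s a heavily simplified, purely illustrative sketch: a multinomial Naive Bayes classifier with add-one smoothing, trained on a made-up toy corpus. It’s nowhere near the real assignment’s scale, and all the data and names here are my own invention:

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (list_of_words, label) pairs. Returns a model tuple."""
    class_counts = Counter()               # how many docs per label
    word_counts = defaultdict(Counter)     # word frequencies per label
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify(model, words):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label in class_counts:
        # log prior + sum of smoothed log likelihoods
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# toy training data (entirely made up)
train_docs = [
    ("a great wonderful film".split(), "pos"),
    ("loved this great movie".split(), "pos"),
    ("a terrible boring film".split(), "neg"),
    ("hated this boring mess".split(), "neg"),
]
model = train(train_docs)
print(classify(model, "a wonderful movie".split()))  # pos
```

The real assignment involved thousands of genuine movie reviews, feature engineering and proper evaluation, but the underlying classifier is this simple at heart.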
These were weekly assignments, so the scope was limited.
For example, in assignment 5 where I had to write grammar rules to represent English, I only had to implement a defined subset of the English language. Implementing something to represent all of the vast nuances of English wouldn’t be realistic in a week.
Or for assignment 8, writing a question answering system, the scope was limited so I knew all questions would ask for the location of a famous landmark, or the name of the spouse of a famous person. I didn’t have to build another Watson.
The expectations were set for a week’s work. For example, in assignment 2, the task was to train a spell-checker to have an accuracy of at least 18%. That low bar reflects how challenging the task is, but it was enough to demonstrate a working knowledge of the fundamental approaches. I’m not claiming that I was asked to write a competitive advanced spell-checker in a week. Or that I could. 🙂
In all assignments, I didn’t start from a blank page. Each assignment came with a skeleton framework so the mundane non-NLP elements were already implemented. For example, in assignment 3, I didn’t have to find or read in the movie reviews as I had code ready to go with stubbed-out methods for where I needed to train and apply a classifier.
I could choose whether to implement the assignments in Python or Java. I chose Java, but it’s personal preference and I don’t think you’d be at a particular disadvantage with either.
Each week’s block of lectures ended with a set of five review questions. Sometimes these were multiple-choice, other times they were a question followed by a free-text field for the answer.
These were complex: often a question would require the application of one or more of the techniques described in the lectures, and I’d cover my desk with pages of scribbles, workings and diagrams before coming up with a single final answer.
It typically took me about an hour to complete the review questions each week. If you got something wrong, you were allowed to take the test again, multiple times. However, the system was crafty: on each retry it would change numbers and values in the question, and, for multiple-choice questions, it’d give a new set of possible answers.
So I couldn’t just find the answers by repeated trial-and-error. (In fact, if I got 4-out-of-5 for a test, then to retry the one question I’d missed I’d have to go through the time-consuming process of working out the four questions I’d already gotten right all over again, because the specific values and models in those questions had changed.)
Would I recommend the course?
I loved this course and I’m glad I did it. I found it interesting, and also useful for my day job. Over the last year or so I’ve been learning about the approaches that Watson uses, but without much prior NLP experience. This course was a chance to go back and learn some fundamentals, helping me to understand the underlying principles behind the applied stuff I’ve learnt on the job.
If asked to recommend the course, I’d include warnings. It’s not easy and I spent many evenings scratching my head trying to understand things. I enjoyed this challenge. But if you want to be given definitive single “answers” to memorise and implement in return for full marks in coursework and exams, you’ll be disappointed.
We were taught general principles and introduced to ideas and approaches. A superficial understanding would get you some marks on the coursework. More time, head-scratching, and thought about the implications and possible implementations would improve on that mark. And a very large amount of time refining those ideas and implementations might get you nearer a top mark.
But it wasn’t a binary “you’ve got it right” vs “you’ve got it wrong” thing. There was always something else you could do to tweak a model further, or refine a parser, to improve the performance by another fraction of a percent.
This made it time-consuming. It was a huge time commitment, more than I expected, and for something I was doing in my evenings and weekends, that was a big deal. It monopolised a lot of my free time over the last few months.
Partly, this was my fault. I wanted full marks for every assignment, and on a course with a straight pass/fail at 70%, I could have submitted assignments sooner with less time and effort and still passed. You could argue that my aiming for a score of 90% represents a massive amount of wasted time and effort. But I’m glad I did, as I think I benefited from getting a better understanding of the topics.
And I’m an obsessive perfectionist.
Are all (free, distance-learning) Stanford courses like this?
There are other courses available on a variety of disciplines. I enjoyed the NLP course so much that I’ve signed up for another: Machine Learning with Andrew Ng.
The course structure is the same: video lectures, quick multi-choice questions, and review questions & programming assignments every week.
But the experience is different. I’m finding it more straightforward. The lectures teach an algorithm or a technique, and I have to implement it in the coursework, which becomes a binary pass/fail: I’ve either implemented it correctly (and get 100%) or I haven’t (and get 0%). There isn’t so much middle ground.
That said, I’m only a few weeks into ML, so this may change as I get further into it, particularly as we’re only just getting started with neural networks. But where NLP was time-consuming and complex from day one, ML has so far not taken anywhere near as much of my evenings. My point is, don’t assume that my experience with the NLP course will apply to all Coursera courses.
What was your final mark?
Thanks very much for asking. (No? 🙂 )
I got 89% (by the time the scores were weighted – 3.5% for each of the sets of review questions, 9% for each programming assignment).
I’m not pretending this makes me an NLP expert, but I know enough of the fundamentals now to be dangerous. And that’s kinda fun.
Tags: coursera, natural language processing, nlp, stanford
This entry was posted on Friday, May 25th, 2012 at 9:25 pm and is filed under misc. You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.