Craft beers labeled by styles and composition. A separate dataset lists breweries by state.
The IMDb database: a moderately large, real-world database of movies.
MovieLens data set from the UC Irvine machine learning repository.
The Open Source Shakespeare is a collection of Shakespeare's complete works. This is a much more interesting dataset than some boring imaginary online retailer. In this dataset, people die! The task is to predict which character speaks a given line.