For checkouts or to view logs, direct your SVN client to svn://svn.saintamh.org/
Hervé's playground. A dumping ground for many years, it contains some polished code as well as several unfinished projects. See the home page for a presentation of the more presentable ones.
A Hadoop application to extract a parallel corpus from the Common Crawl. Written with Jason Smith and Magdalena Plamada. Described in "Dirt Cheap Web-Scale Parallel Text from the Common Crawl".