Benchmarking Python Content Extraction Algorithms: Dragnet, Readability, Goose, and Eatiht - Moz

Recently, we have been working to improve Dragnet, Moz's suite of content extraction algorithms. These algorithms analyze a web page and separate the main article content (optionally with user-generated comments) from the navigation "chrome" (sidebars, footers, copyright notices, etc.). Along the way, we be…