Data is downloaded from geonames.org, and processed with an AWK and a Perl script. The result is the part of the distribution, so the average user (or a packager) doesn't have to download that much of data.