The concept of the script is straightforward: read the mirrors page from RedHat's web site, make a list of all the mirrors, test how long it takes to download from each, and present a sorted list of the results.
The first task, reading and parsing the RedHat mirrors list, is handled with the urllib and HTMLParser modules, respectively. I chose HTMLParser over the more comprehensive parser in sgmllib because it's a bit less work to override the default parser for simple tasks. After the parser sees the content comment in the HTML source, it starts recording any link tags whose URL carries a scheme, that is, absolute URLs; it stops recording after it sees the end of the content comment. Currently, it happens that there aren't any absolute URLs on the mirror page outside the content block, but I didn't want to rely on that fact.
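The comment-delimited parsing described above could be sketched roughly as follows. This is an illustration written for Python 3, where HTMLParser lives in html.parser. The marker strings "content" and "/content", the class name MirrorLinkParser, and the helper extract_mirrors are assumptions made for the example, not names taken from the script, and the sketch accepts any URL scheme rather than a specific one.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class MirrorLinkParser(HTMLParser):
    """Collect absolute link targets found between two marker comments."""

    def __init__(self, start_marker, end_marker):
        super().__init__()
        self.start_marker = start_marker  # assumed text of the opening comment
        self.end_marker = end_marker      # assumed text of the closing comment
        self.recording = False
        self.urls = []

    def handle_comment(self, data):
        # Toggle recording when one of the marker comments goes by.
        text = data.strip()
        if text == self.start_marker:
            self.recording = True
        elif text == self.end_marker:
            self.recording = False

    def handle_starttag(self, tag, attrs):
        if not self.recording or tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only absolute URLs, i.e. those that carry a scheme.
        if href and urlparse(href).scheme:
            self.urls.append(href)


def extract_mirrors(html, start_marker="content", end_marker="/content"):
    """Return the absolute URLs linked inside the marked block of html."""
    parser = MirrorLinkParser(start_marker, end_marker)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Links before the opening comment or after the closing one are ignored, so the result does not depend on the rest of the page staying free of absolute URLs.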
To test the bandwidth of each mirror site, I simply test how long it takes to download the index page of the mirror. This is not a perfect test, but it gives reasonably good results without depending on knowledge of the site structure.
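As a rough sketch of that measurement, the download can be wrapped in a timer. The function names fetch_index and time_download are hypothetical, and the fetch parameter is an assumption added here so the timing logic can be exercised without a network connection.

```python
import time
from urllib.request import urlopen


def fetch_index(url):
    """Download the page at url and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def time_download(url, fetch=fetch_index):
    """Return (seconds_elapsed, bytes_received) for one download of url."""
    start = time.perf_counter()
    body = fetch(url)
    elapsed = time.perf_counter() - start
    return elapsed, len(body)
```

Because index pages are small, the elapsed time reflects connection latency and server load more than sustained throughput, which is one reason the test is only approximate.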
The bandwidth test demonstrates two important principles for dealing with multithreading, in Python or in other languages:
1. Let the underlying libraries do as much work as possible.
2. Isolate your threads from the rest of the program.
The main thread creates a work queue of URLs to be tested and a result queue for retrieving results, then starts a number of threads to do the work and waits for those threads to exit. Because the Queue class is a threadsafe container, Python guarantees that no two threads will ever get the same work unit, and the storing of results by multiple threads will never leave the queue in a bad state.
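The work-queue arrangement might look something like the sketch below, written for Python 3, where the Queue class lives in the queue module. The function name run_pool, the test_one callback, and the default of four workers are assumptions for illustration.

```python
import queue
import threading


def run_pool(urls, test_one, num_workers=4):
    """Apply test_one to every URL using a fixed pool of worker threads."""
    work = queue.Queue()
    results = queue.Queue()
    for url in urls:
        work.put(url)

    def worker():
        while True:
            try:
                url = work.get_nowait()
            except queue.Empty:
                return  # no work left, so this thread exits
            results.put((url, test_one(url)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # the main thread waits for every worker to exit

    # All workers have exited, so draining the queue here is safe.
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected
```

The workers share nothing except the two queues, so no explicit locks are needed. Results arrive in completion order, not in the order the URLs were queued.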
Initially, each worker thread downloaded the mirror index page directly, but this made the script run for a long time (over three minutes) when some sites were heavily loaded. To avoid this, I defined a maximum download time and made each worker thread spawn a new daemon thread to do the download. The worker thread can use Thread.join() to wait on the subthread with a timeout; timeouts are counted as failures. Note that I pass an empty list to the subthread to collect the results. Threads in Python don't have a convenient way to return a status code to the caller; by passing a mutable object such as a list, the subthread can append a value to the list to indicate a result. When the join() on the subthread returns, the worker thread can tell that it timed out if the list it passed in is still empty.
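The timeout technique can be sketched as a small helper. The name run_with_timeout and the ("ok" / "error" / "timeout") status strings are assumptions made for this example; only the daemon thread, the join() with a timeout, and the shared list come from the description above.

```python
import threading


def run_with_timeout(task, arg, timeout):
    """Run task(arg) in a daemon subthread and wait at most timeout seconds.

    Returns ("ok", value), ("error", message), or ("timeout", None).
    """
    outcome = []  # the subthread appends here; still empty means it timed out

    def runner():
        try:
            outcome.append(("ok", task(arg)))
        except Exception as exc:
            outcome.append(("error", str(exc)))

    sub = threading.Thread(target=runner, daemon=True)
    sub.start()
    sub.join(timeout)  # returns when the subthread ends or the timeout expires
    if not outcome:
        return ("timeout", None)
    return outcome[0]
```

A subthread that times out is not stopped; it is simply abandoned. Marking it as a daemon means it cannot keep the interpreter alive once the main thread finishes.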
The worker threads put the results for each URL into a results queue. For successful tests, they put a tuple of the URL and the time it took to download; for unsuccessful results, they put a tuple of the URL and a string describing the type of failure. When the main thread has detected that all worker threads have exited, it separates successes from failures, sorts the two lists, and prints them in aligned columns.
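One way to do the final separation and formatting is sketched below. The function name format_report is hypothetical, and ordering the successes by download time (fastest first) and the failures by URL is an assumption about what the sort does.

```python
def format_report(results):
    """Turn (url, seconds-or-error-string) tuples into aligned text lines."""
    # A string in the second slot marks a failure; anything else is a time.
    successes = sorted(
        (r for r in results if not isinstance(r[1], str)),
        key=lambda r: r[1],
    )
    failures = sorted(r for r in results if isinstance(r[1], str))

    # Pad every URL to the width of the longest one so the columns line up.
    width = max((len(url) for url, _ in results), default=0)
    lines = []
    for url, seconds in successes:
        lines.append(f"{url:<{width}}  {seconds:8.2f}s")
    for url, reason in failures:
        lines.append(f"{url:<{width}}  {reason}")
    return lines
```

Printing the report is then a matter of calling print("\n".join(format_report(results))).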
Note that the script could be written without the second-level threads. Using them helps isolate the failure-prone download from the more reliable worker thread pool, at the cost of a few more ephemeral threads, and provides a good demonstration of how and when to use daemon threads to keep a script from hanging indefinitely at shutdown.
This script is useful for telling which mirrors are most heavily loaded, but it has shortcomings. Some HTTP-based mirrors are actually redirects to FTP mirrors, and some seem to apply different bandwidth throttles to index pages and ISO downloads. Additionally, the script can't tell which of the mirrors actually have up-to-date files; this can't easily be fixed without knowledge of each mirror site, since mirror sites differ in their directory structure. But this at least gives the would-be upgrader an idea of where to look.