A Python script to test download mirrors

A Python script to test download mirrors 1.1 Download Summary

  • Language: Python
  • Platform: Windows / Linux / Mac OS / BSD / Solaris
  • License: Freeware
  • Databases: N/A
  • Downloads: 1508
  • Released: Jun 6, 2007

A Python script to test download mirrors 1.1 Description

The concept of the script is straightforward: read the mirrors page from RedHat's web site, make a list of all the mirrors, test how long it takes to download from each, and present a sorted list of the results.

The first task, reading and parsing the RedHat mirrors list, is handled with the urllib and HTMLParser modules, respectively. I chose HTMLParser over the more comprehensive parser in sgmllib because it's a bit less work to override the default parser for simple tasks. After the parser sees the content comment in the HTML source, it starts recording any link tags whose URL has a scheme of 'http'; it stops recording after it sees the end of the content comment. Currently, it happens that there aren't any absolute URLs on the mirror page outside the content block, but I didn't want to rely on that fact.
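The parsing step might look roughly like the sketch below. It is not the script's own code: it uses Python 3's html.parser in place of the older HTMLParser module, and the class name, the marker comment text ("content" and "/content"), and the sample page are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class MirrorLinkParser(HTMLParser):
    """Collect absolute http links found between two marker comments."""

    # Hypothetical marker text; the comments on the real page may differ.
    START_MARKER = "content"
    END_MARKER = "/content"

    def __init__(self):
        super().__init__()
        self.recording = False
        self.mirrors = []

    def handle_comment(self, data):
        # Toggle recording when a marker comment is seen.
        marker = data.strip().lower()
        if marker == self.START_MARKER:
            self.recording = True
        elif marker == self.END_MARKER:
            self.recording = False

    def handle_starttag(self, tag, attrs):
        # Only anchors inside the content block are of interest.
        if not self.recording or tag != "a":
            return
        href = dict(attrs).get("href")
        if href and urlparse(href).scheme == "http":
            self.mirrors.append(href)


def parse_mirrors(html_text):
    """Return the list of mirror URLs found in html_text."""
    parser = MirrorLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.mirrors


# Made-up page used only to exercise the parser.
SAMPLE_PAGE = """
<a href="http://outside.example/">before the content block</a>
<!-- content -->
<a href="http://mirror-a.example/pub/">Mirror A</a>
<a href="/relative/path">not an absolute URL</a>
<a href="http://mirror-b.example/">Mirror B</a>
<!-- /content -->
<a href="http://footer.example/">after the content block</a>
"""
```

Calling parse_mirrors(SAMPLE_PAGE) keeps only the two links inside the marked block, which is the guard against stray absolute URLs described above.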

To test the bandwidth of each mirror site, I simply test how long it takes to download the index page of the mirror. This is not a perfect test, but it gives reasonably good results without depending on knowledge of the site structure.
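As a rough illustration of the timing idea, the following sketch measures how long it takes to fetch and fully read one URL. The function name and the injectable opener parameter are assumptions added here so the timing logic can be tried without network access; the original script may measure this differently.

```python
import time
import urllib.request


def time_download(url, opener=urllib.request.urlopen):
    """Return the seconds taken to open url and read the whole response."""
    start = time.perf_counter()
    with opener(url) as response:
        response.read()  # pull the full body so transfer time is included
    return time.perf_counter() - start
```

By default this uses urllib.request.urlopen; passing a different opener (any callable that returns a context manager with a read() method) lets the same code be exercised offline.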

The bandwidth test demonstrates two important principles of multithreading, in Python or any other language:

1. Let the underlying libraries do as much work as possible.
2. Isolate your threads from the rest of the program.

The main thread creates a work queue of URLs to be tested and a result queue for retrieving results, then starts a number of threads to do the work and waits for those threads to exit. Because the Queue class is a threadsafe container, Python guarantees that no two threads will ever get the same work unit, and the storing of results by multiple threads will never leave the queue in a bad state.
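The work-queue arrangement described above could be sketched as follows. This is a minimal illustration in Python 3 (where the module is named queue), not the script's actual code; run_pool, test_one, and num_workers are hypothetical names.

```python
import queue
import threading


def run_pool(urls, test_one, num_workers=4):
    """Apply test_one to every URL using a small pool of worker threads."""
    work = queue.Queue()
    results = queue.Queue()
    for url in urls:
        work.put(url)

    def worker():
        while True:
            try:
                url = work.get_nowait()
            except queue.Empty:
                return  # no work left, so this thread exits
            results.put((url, test_one(url)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # the main thread waits for every worker to exit

    # All workers have finished, so draining the queue here is safe.
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected
```

Because each worker takes items with get_nowait(), a given URL is handed to exactly one thread. The order of the returned list depends on thread scheduling, so callers should sort it before display. An exception raised by test_one would end that worker early in this sketch.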

Initially, each worker thread downloaded the mirror index page directly, but this caused the process to run for a long time (over three minutes) when some sites were heavily loaded. To avoid this, I defined a maximum time to attempt downloading, and made each worker thread spawn a new daemon thread to do the download. The worker thread can use Thread.join() to wait on the subthread with a timeout; timeouts are counted as failures. Note that I pass an empty list to the subthread to collect the results. Threads in Python don't have a convenient way to return a status code back to the caller; by passing a mutable object like a list, the subthread can append a value to the list to indicate a result. When the join() on the subthread completes, the worker thread can tell that it timed out if the list it passed in is empty.
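A minimal sketch of the timeout technique follows, assuming a generic callable stands in for the download. The helper name run_with_timeout and the status strings are hypothetical and are not taken from the script.

```python
import threading
import time


def run_with_timeout(func, arg, timeout):
    """Run func(arg) in a daemon subthread and wait at most timeout seconds.

    Returns ("ok", value), ("error", message), or ("timeout", None).
    """
    outcome = []  # the subthread appends one item when it finishes

    def target():
        try:
            outcome.append(("ok", func(arg)))
        except Exception as exc:
            outcome.append(("error", str(exc)))

    sub = threading.Thread(target=target, daemon=True)
    sub.start()
    sub.join(timeout)  # returns when the subthread ends or time runs out
    if not outcome:
        # Nothing was appended, so the subthread is still running.
        return ("timeout", None)
    return outcome[0]
```

Because the subthread is a daemon, one that is still stuck in a slow download does not keep the interpreter alive once the main thread finishes.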

The worker threads put the results for each URL into a results queue. For successful tests, they put a tuple of the URL and the time it took to download; for unsuccessful results, they put a tuple of the URL and a string describing the type of failure. When the main thread has detected that all worker threads have exited, it separates successes from failures, sorts the two lists, and prints them in aligned columns.
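The reporting step could be sketched as below, assuming each result is a (url, value) tuple in which a float means a download time in seconds and a string describes a failure. The function name and the exact column layout are illustrative choices, not the script's own.

```python
def format_report(results):
    """Return report lines: successes sorted by time, then failures by URL."""
    successes = sorted(
        (r for r in results if isinstance(r[1], float)),
        key=lambda r: r[1],
    )
    failures = sorted(r for r in results if isinstance(r[1], str))

    # Pad every URL to the longest one so the second column lines up.
    width = max((len(url) for url, _ in results), default=0)
    lines = [f"{url:<{width}}  {seconds:8.3f}s" for url, seconds in successes]
    lines += [f"{url:<{width}}  {reason}" for url, reason in failures]
    return lines
```

Printing the result with print("\n".join(format_report(results))) lists the fastest mirrors first and the failures last.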

Note that the script could be written without the second-level threads. Using them helps isolate the failure-prone download from the more reliable worker thread pool, at the cost of a few more ephemeral threads, and provides a good demonstration of how and when to use daemon threads to keep a script from hanging indefinitely at shutdown.

This script is useful to tell which mirrors are most heavily loaded, but it has shortcomings. Some HTTP-based mirrors are actually redirects to FTP mirrors, and some seem to apply different bandwidth throttles to index pages and ISO downloads. Additionally, the script can't tell which of the mirrors actually have up-to-date files; this can't easily be fixed without having knowledge of each mirror site, since mirror sites differ in their directory structure. But this at least gives the would-be upgrader an idea of where to look.
