
This script demonstrates reading and writing an mbox-style mailbox. It is an mbox filter: it scans through an entire mbox and writes the messages to a new file. Each message is passed through a filter function, which may modify the document or return None to drop the message.

Python, 73 lines
#!/usr/bin/env python
"""This is an mbox filter. It scans through an entire mbox style mailbox
and writes the messages to a new file. Each message is passed
through a filter function which may modify the document or ignore it.

The passthrough_filter() example below simply prints the 'from' email
address and returns the document unchanged. After running this script
the input mailbox and output mailbox should be identical.
"""

import mailbox, rfc822
import sys, os, string, re

LF = '\x0a'

def main ():
    mailboxname_in = sys.argv[1]
    mailboxname_out = mailboxname_in + '.out'
    process_mailbox (mailboxname_in, mailboxname_out, passthrough_filter)

def passthrough_filter (msg, document):
    """This prints the 'from' address of the message and
    returns the document unchanged.
    """
    from_addr = msg.getaddr('From')[1]
    print from_addr
    return document

def process_mailbox (mailboxname_in, mailboxname_out, filter_function):
    """This processes each message in the 'in' mailbox and optionally
    writes the message to the 'out' mailbox. Each message is passed to
    the filter_function. The filter function may return None to ignore
    the message or may return the document to be saved in the 'out' mailbox.
    See passthrough_filter().
    """

    # Open the mailbox.
    mb = mailbox.UnixMailbox (file(mailboxname_in,'r'))
    fout = file(mailboxname_out, 'w')

    msg = mb.next()
    while msg is not None:
        # Properties of msg cannot be modified, so we pull out the
        # document to handle it separately. We keep msg around to
        # keep track of headers and stuff.
        document = msg.fp.read()

        document = filter_function (msg, document)
        
        if document is not None:
            write_message (fout, msg, document)

        msg = mb.next()

    fout.close()

def write_message (fout, msg, document):
    """This writes an 'rfc822' message to a given file in mbox format.
    This assumes that the arguments 'msg' and 'document' were generated
    by the 'mailbox' module. The important thing to remember is that the
    document MUST end with two linefeeds ('\n'). It comes this way from
    the mailbox module, so you don't need to do anything if you want to
    write it unchanged. If you modified the document then be sure that
    it still ends with '\n\n'.
    """
    fout.write (msg.unixfrom)
    for l in msg.headers:
        fout.write (l)
    fout.write (LF)
    fout.write (document)

if __name__ == '__main__':
    main ()

I find myself writing lots of little scripts to filter my mbox, and I use this script as the basis for them. The docs for the 'mailbox' and 'rfc822' modules show how to read an mbox, but they don't show how to write one. Writing is pretty easy to do, but it is not obvious, so here it is.

The 'process_mailbox()' function loops over each message in the input mailbox and passes it to a filter function. In this example the filter is 'passthrough_filter()', which does nothing but print the 'from' address of the message and return the document unchanged. After running this script the input mailbox and output mailbox should be identical. The 'write_message()' function shows how to take an 'rfc822' message and write it to a file in mbox format: first the 'From ' separator line (msg.unixfrom), then the headers, then a blank line, then the document.
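
For readers on Python 3, where the 'rfc822' module and 'mailbox.UnixMailbox' no longer exist, the same read-filter-write loop might look roughly like the sketch below. This is not the recipe's code. It assumes the standard-library 'mailbox.mbox' class, and it passes the filter a single message object instead of the (msg, document) pair used above. Unlike the recipe, it appends to the output file if that file already exists.

```python
import email.utils
import mailbox


def passthrough_filter(msg):
    """Print the sender's address and return the message unchanged."""
    print(email.utils.parseaddr(str(msg.get('From', '')))[1])
    return msg


def process_mailbox(path_in, path_out, filter_function):
    """Copy messages from path_in to path_out through filter_function.

    The filter returns a message to keep it, or None to drop it.
    Returns the number of messages written.
    """
    mb_in = mailbox.mbox(path_in, create=False)
    # Created if missing; an existing file is appended to, not truncated.
    mb_out = mailbox.mbox(path_out)
    kept = 0
    try:
        for msg in mb_in:
            result = filter_function(msg)
            if result is not None:
                # add() writes the "From " separator line, the headers
                # and the body, so no hand-written write_message() is needed.
                mb_out.add(result)
                kept += 1
        mb_out.flush()
    finally:
        mb_out.close()
        mb_in.close()
    return kept
```

Calling process_mailbox('inbox', 'inbox.out', passthrough_filter) would mirror what main() does in the recipe. The sketch does no locking, so it is only suitable for a mailbox that no mail program is writing to at the same time.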

I used this script as the basis of a tiny spam filter. I defined a filter function that accepts any message with a 'from' address in a list of addresses known to me. It rejects any message over 30KB from addresses that are not in the list. This does not get rid of all spam messages, but it does get rid of most of the ones that take a lot of space (mostly viruses and huge HTML advertisements).
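
A filter along those lines could be sketched as follows. The original spam filter is not shown here, so this is only a guess at its shape: the address list and the names 'KNOWN_SENDERS', 'MAX_UNKNOWN_BYTES' and 'size_filter' are made up for illustration, and the function is written against the Python 3 'email' message API, so it takes one message argument rather than (msg, document).

```python
from email.utils import parseaddr

# Hypothetical whitelist; replace with your own addresses.
KNOWN_SENDERS = {'friend@example.com', 'boss@example.org'}

# Size limit for unknown senders, taken from the 30KB figure above.
MAX_UNKNOWN_BYTES = 30 * 1024


def size_filter(msg):
    """Keep mail from known senders; drop large mail from anyone else."""
    sender = parseaddr(str(msg.get('From', '')))[1].lower()
    if sender in KNOWN_SENDERS:
        return msg
    # Measure the whole serialized message, headers included.
    if len(msg.as_bytes()) > MAX_UNKNOWN_BYTES:
        return None
    # Small messages from unknown senders are kept.
    return msg
```

Returning None follows the recipe's convention for "ignore this message", so the function can be passed wherever passthrough_filter() is used, provided the caller hands it a message object that supports get() and as_bytes().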

On Python 2.2 and above, use the 'email' module to create new messages by hand. I used the 'rfc822' module here because that is what the 'mailbox' module returns.
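
As a rough illustration of creating a message by hand, the sketch below uses the Python 3 form of the 'email' package ('email.message.EmailMessage') together with 'mailbox.mbox'. Neither is what this recipe uses, and the helper names 'build_message' and 'append_to_mbox' are hypothetical.

```python
import mailbox
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a simple plain-text message from scratch."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = subject
    msg.set_content(body)
    return msg


def append_to_mbox(path, msg):
    """Append msg to the mbox file at path, creating the file if needed."""
    mb = mailbox.mbox(path)
    try:
        # mbox supplies a default "From " separator line for messages
        # that do not carry one of their own.
        mb.add(msg)
        mb.flush()
    finally:
        mb.close()
```

For example, append_to_mbox('notes.mbox', build_message('me@example.com', 'you@example.com', 'hello', 'Hi there.\n')) would add one message to 'notes.mbox'.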

6 comments

Peter Bengtsson 20 years, 10 months ago

Python 2.1. What do I need to do to get this script to work with Python 2.1? I want to run it with Zope without having to connect via XML-RPC or something.

(my knowledge of python is OK, but not in what the difference is between 2.1 and 2.2)

Dave Benjamin 20 years, 6 months ago

Use PortableUnixMailbox. I had to use PortableUnixMailbox instead of UnixMailbox:

before: mb = mailbox.UnixMailbox (file(mailboxname_in,'r'))

after: mb = mailbox.PortableUnixMailbox (file(mailboxname_in,'r'))

Otherwise, this script works great! You can return None from your filter-function to remove a message. I was able to delete a bunch of duplicate messages by adding this:

found_ids = {}

# ...

def passthrough_filter (msg, document):
    """This drops any message whose Message-ID has already been seen
    and returns the document unchanged otherwise.
    """
    id = msg.getheader('Message-ID')
    if found_ids.has_key(id):
        return None
    found_ids[id] = 1
    return document

Thanks,

Dave
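
The duplicate check above relies on has_key(), which exists only in Python 2. A possible Python 3 variant is sketched below. It is an adaptation, not Dave's code: it keeps the seen IDs in a set inside a closure instead of a module-level dict, takes a single message argument, and keeps messages that have no Message-ID header at all (the version above would drop every such message after the first). The name 'make_dedup_filter' is made up.

```python
def make_dedup_filter():
    """Return a filter that drops messages with a repeated Message-ID."""
    seen_ids = set()

    def dedup_filter(msg):
        msg_id = msg.get('Message-ID')
        if msg_id is None:
            # No ID to compare, so the message is kept.
            return msg
        if msg_id in seen_ids:
            return None
        seen_ids.add(msg_id)
        return msg

    return dedup_filter
```

Each call to make_dedup_filter() starts with an empty set, so one filter instance should be created per mailbox run.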

Uwe Schmitt 19 years, 2 months ago

similar task, nicer user interface. Hi, I wrote a similar program for previewing mail and deleting it from the server without downloading it. It has a text-based interface built on effbot's Console module, so it runs on Windows only.

Look at PyPI: http://www.python.org/pypi?:action=display&name=PyPosta&version=1.2

Farhad Fouladi 19 years, 1 month ago

Changes for Python 2.1. To use this script with Python 2.1, change the following lines:

mb = mailbox.UnixMailbox (file(mailboxname_in,'r'))
fout = file(mailboxname_out, 'w')

so that open() is used in place of the "file" class, that is:

fin = open(mailboxname_in,"r")
mb = mailbox.UnixMailbox (fin)
fout = open(mailboxname_out,"w")

Tim Lesher 19 years, 1 month ago

Slight addition for Windows. If you use this on Windows, you'll need to change the 'r' argument to file() to 'rb', to make sure Python doesn't munge the end-of-line characters.

Favaz Farook 14 years, 9 months ago

Hi Guys,

I am using Python 2.6 on Windows XP. How can I run the script above? My Python knowledge is very poor, I am a newbie. I want to read mails from the Mozilla Thunderbird Inbox.

Thanks