
Changing Mailman Python Scripts for Virtual Host Support

Tuesday, 22 September 09, 12:49 pm
Mailman is a tried-and-tested Open Source mailing list manager. It's robust and reasonably efficient when running; however, it organises lists internally by their local name only. In other words, you can't have a list called maillist@domain.org on the same machine as another list called maillist@somewhereelse.com, unless you have a separate mailman installation for each domain.

Many production environments do appear to use the separate-installation-per-domain approach. However, with 8 'queue runner' processes running per installation, it would not take many domains before mailman starts gobbling unacceptable levels of resources.

One solution is to combine the domain name and the list name chosen by the user internally. Each list would then appear to the user as listname@domain.com, but internally would be listname-domain.com@domain.com.
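
A minimal sketch of this naming scheme in Python, assuming a hyphen as the separator. internal_list_name() is a hypothetical helper, not part of mailman. It also rejects list names that already contain the separator, since those could not be split back apart unambiguously.

```python
def internal_list_name(listname, domain):
    """Combine a user's list name and domain into one internal list name.

    Hypothetical helper: the hyphen separator is an assumption of this scheme.
    """
    if '-' in listname:
        # A hyphen in the list name would make the split point ambiguous
        raise ValueError('list names may not contain a hyphen: %r' % listname)
    return '%s-%s' % (listname, domain)


# Usage: the name mailman stores internally, and the address built on it
name = internal_list_name('maillist', 'domain.com')
print(name)                              # maillist-domain.com
print('%s@%s' % (name, 'domain.com'))    # maillist-domain.com@domain.com
```

Because the separator is forbidden in list names, the first hyphen in an internal name always marks the start of the domain, even when the domain itself contains hyphens.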

In order to pull this off transparently, the following needs to be accomplished:

email aliases:
The address {listname}@mydomain.com needs to be aliased to {listname}-{domain}@mydomain.com so that incoming mail for the list is delivered correctly

mailing list archives:
Some public web address such as www.domain.com/mailarchive/{listname} should be redirected to www.domain.com/pipermail/{listname}-{domain}

list creation:
a mechanism that prevents lists being created with a hyphen in the name (or whatever other character is used as a separator)

outgoing email messages:
these need to be changed so that email addresses and listnames in them appear in the 'friendly' format
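
The email alias requirement can be sketched as a generator of alias-table lines. This assumes a Postfix-style virtual alias table (one "address target" pair per line) and mailman 2.1's standard per-list addresses such as -request and -bounces; virtual_alias_lines() is a hypothetical helper, and how the internal address is finally handed to mailman depends on your MTA setup.

```python
# Extra addresses mailman 2.1 uses for every list, besides the list itself
LIST_SUFFIXES = ('admin', 'bounces', 'confirm', 'join', 'leave',
                 'owner', 'request', 'subscribe', 'unsubscribe')


def virtual_alias_lines(listname, domain):
    """Map each friendly list address to its internal counterpart."""
    internal = '%s-%s' % (listname, domain)
    lines = ['%s@%s %s@%s' % (listname, domain, internal, domain)]
    for suffix in LIST_SUFFIXES:
        lines.append('%s-%s@%s %s-%s@%s'
                     % (listname, suffix, domain, internal, suffix, domain))
    return lines


for line in virtual_alias_lines('maillist', 'domain.com'):
    print(line)
```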

There's a good synopsis of the basics of Python syntax at this site, which is quite suitable for experienced programmers wanting a quick overview, despite its name. IBM also provide a good primer.

As in C, strings can be treated like character arrays, so you can refer to a single character of a string using array notation, such as somestring[5] (although, unlike C strings, Python strings are immutable). Python extends this notation with slicing, where two indices are supplied, separated by a colon. This creates a new sequence containing the elements from the first index up to, but not including, the second. When dealing with strings, the slice is the Python way of specifying substrings, e.g. somestring[5:8] returns the characters at indices 5, 6 and 7.
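
A quick illustration of indexing and slicing, using an internal list name of the kind described above as the example string:

```python
somestring = 'maillist-domain.com'

print(somestring[5])     # i   (indexing starts at zero)
print(somestring[5:8])   # ist (indices 5, 6 and 7; the end index is excluded)
print(somestring[:8])    # maillist (an omitted start index means 0)
print(somestring[-3:])   # com (negative indices count back from the end)

# Unlike C character arrays, Python strings cannot be modified in place
try:
    somestring[0] = 'M'
except TypeError as err:
    print('TypeError:', err)
```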

Python does use a lot of syntax that's quite different from other programming languages such as C++ or Java. For instance, to indicate that a class is derived from another class, we simply add the name of the base class in parentheses after the name of the derived class. The derived class can override member functions (methods) of the base class.
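
A small example of that syntax; the class names are hypothetical and not taken from mailman:

```python
class MailingList:
    def describe(self):
        return 'a mailing list'

    def address(self):
        return 'list@example.com'


# The base class goes in parentheses after the derived class name
class VirtualDomainList(MailingList):
    # Overrides the base class method of the same name
    def describe(self):
        return 'a mailing list on a virtual domain'


lst = VirtualDomainList()
print(lst.describe())   # a mailing list on a virtual domain
print(lst.address())    # list@example.com (inherited unchanged)
```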

One thing in Python which we need to get our heads round for these virtual domain script changes is UserDict, a class wrapper around the built-in dictionary type that allows classes to inherit from it to override or add methods and data attributes. (In Python, class member variables are known as data attributes.) Since Python 2.2, this wrapper has been largely made unnecessary by the ability to inherit directly from the built-in dictionary type, using dict as the base class name; however, the mailman scripts use the older approach.

The dictionary type is in effect an associative array, i.e. a collection of name-value pairs. The UserDict class defines a single data attribute (cunningly called 'data'), which holds the underlying dictionary for the class.
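
Here's a hypothetical UserDict subclass (not from mailman) showing where data fits in. It's written for Python 3, where UserDict lives in the collections module; the Python 2 that mailman runs on imports it with from UserDict import UserDict.

```python
from collections import UserDict


class LowerCaseKeys(UserDict):
    """Hypothetical example: a dictionary whose keys are stored in lower case."""

    def __setitem__(self, key, value):
        # self.data is the plain dict that UserDict wraps
        self.data[key.lower()] = value


d = LowerCaseKeys()
d['ListName'] = 'maillist'
print(d['listname'])   # maillist
print(type(d.data))    # <class 'dict'>
```

With a subclass of dict instead, the override would call super().__setitem__(key.lower(), value), as there is no separate data attribute.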

Here's the standard Python way to use a dictionary to replace tokens in a string of text:
template_string = 'foo %(sometoken)s bar %(sometoken)s baz'
message = template_string % {'sometoken': variable}
This is exactly what the SafeDict.interpolate() method does:
def interpolate(self, template):
    return template % self
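
To see how interpolate() behaves, here is a cut-down Python 3 stand-in for mailman's SafeDict (str replaces Python 2's StringType). Like the original, it leaves unknown tokens in place rather than raising KeyError. It's a sketch for experimenting with, not mailman's own code.

```python
from collections import UserDict


class SafeDict(UserDict):
    """Cut-down stand-in for mailman's SafeDict, for experimenting."""

    def __getitem__(self, key):
        try:
            return self.data[key]
        except KeyError:
            # Put the token back so a later pass can fill it in
            if isinstance(key, str):
                return '%(' + key + ')s'
            return '<Missing key: %r>' % (key,)

    def interpolate(self, template):
        # '%' looks up each %(name)s token through __getitem__ above
        return template % self


d = SafeDict({'listname': 'maillist'})
print(d.interpolate('Welcome to %(listname)s, %(username)s!'))
# Welcome to maillist, %(username)s!
```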

compton

1:22 pm, Tuesday, 22 September 09

Script changes:

Here are the required changes, in principle. The assumption here is that some external script or web page appends a hyphen and the domain name to the end of the listname before the list is created. We then only need to change how the listname is output for display.

Note that find() returns -1 if the listname does not contain a hyphen, and the slice [0:-1] runs from the first (zeroth) character up to, but not including, the last character. Slicing at find('-') would therefore silently drop the final character of a name without a hyphen, rather than return the full internal listname. Using split('-', 1)[0] avoids this: it returns everything before the first hyphen, or the whole name if there is none.

Mailman/MailList.py
def getListAddress(self, extra=None):
    # Strip the '-{domain}' suffix from the internal name for display
    friendly_name = self.internal_name().split('-', 1)[0]
    if extra is None:
        return '%s@%s' % (friendly_name, self.host_name)
    return '%s-%s@%s' % (friendly_name, extra, self.host_name)
- returns the email address for the mailing list, used in the email templates.
- in various places, self.internal_name() is used to get the mailing list name, sidestepping the above fix. Where an address is what is wanted, such instances could be swapped for self.getListAddress() (with no arguments, since self is passed implicitly).

In many places, self.real_name (where self is an instance of MailList) is used to get the name, rather than the email address, of the list. This could be replaced with self.real_name.split('-', 1)[0], which strips the domain suffix and returns the whole name if there is no hyphen.
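
The difference between slicing at find('-') and using split() is easy to check directly. friendly_by_find() and friendly_by_split() are hypothetical helpers written for this comparison only:

```python
def friendly_by_find(name):
    # find() returns -1 when there is no hyphen, so this becomes name[0:-1]
    return name[0:name.find('-')]


def friendly_by_split(name):
    # split() returns the whole string when there is no hyphen
    return name.split('-', 1)[0]


for name in ('maillist-domain.com', 'maillist-my-domain.com', 'maillist'):
    print(name, friendly_by_find(name), friendly_by_split(name))
# maillist-domain.com maillist maillist
# maillist-my-domain.com maillist maillist
# maillist maillis maillist
```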

Perhaps quicker but slightly more kludgy would be to change Utils.maketext() so that any tokens called listname are truncated at the hyphen. To do this, we need to change the SafeDict.__getitem__() method defined at line 31:
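def __getitem__(self, key):
    try:
        value = self.data[key]
    except KeyError:
        # Leave unknown tokens in place so they can be filled in later
        if isinstance(key, StringType):
            return '%(' + key + ')s'
        else:
            return '<Missing key: %s>' % repr(key)
    if key in ('listname', 'realname', 'real_name'):
        # 'listname-domain.com' -> 'listname'; whole value if no hyphen
        return value.split('-', 1)[0]
    elif key == 'emailaddr':
        # 'listname-domain.com@domain.com' -> 'listname@domain.com',
        # but only when the hyphen is in the local part
        hyphen = value.find('-')
        at = value.find('@')
        if 0 <= hyphen < at:
            return value[:hyphen] + value[at:]
    return value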

compton

2:23 pm, Thursday, 15 October 09

So our sysadmins went on a whinging festival when they realised we intended to run python scripts via PHP using exec(). We do use exec() and its relatives in various places, but they want to outlaw it to make things more secure. It's not an unreasonable concern with the threat level we see these days.

The result is that, in order to manage the mailman lists, we'll need to make requests to mailman's CGI web interface via cURL. It's going to be a little kludgy, but as a quick and simple solution it will have to do.

One issue that needs to be dealt with is that certain administration pages of the web interface require authentication, which is tracked using session cookies. cURL is more than capable of dealing with cookies, but as we're already hacking scripts we may as well remove the need for them. When deployed, the CGI interface will be locked down so that all requests that do not originate from our web server are denied. We don't want the mailman pages to be publicly accessible anyway, so we needn't worry about the security risk of allowing lists to be managed without authentication.

The simple way to avoid the authentication process would be to comment out lines 85 to 94 of Mailman/Cgi/admin.py:
#    if not mlist.WebAuthenticate((mm_cfg.AuthListAdmin,
#                                  mm_cfg.AuthSiteAdmin),
#                                 cgidata.getvalue('adminpw', '')):
#        if cgidata.has_key('adminpw'):
#            # This is a re-authorization attempt
#            msg = Bold(FontSize('+1', _('Authorization failed.'))).Format()
#        else:
#            msg = ''
#        Auth.loginpage(mlist, 'admin', msg=msg)
#        return
While this does the job, it never feels good to remove a security measure, even if you know the mailman installation is going to be locked down and only accessible from your own IP. Instead, I can pass the list admin password to the CGI via cURL as an additional POST parameter, named either adminpw or password, depending on context.
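
For comparison, a rough Python equivalent of that cURL request using only the standard library. This is a sketch: the URL layout /mailman/admin/{listname}/{category} is assumed to be mailman 2.1's default, and build_admin_request() and its parameters are hypothetical. The request is only built here; urllib.request.urlopen(req) would send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_admin_request(base_url, listname, category, adminpw, fields):
    """Build a POST to a list's admin page, passing the password as adminpw."""
    url = '%s/admin/%s/%s' % (base_url.rstrip('/'), listname, category)
    # A list of pairs (rather than a dict) keeps repeated field names
    params = [('adminpw', adminpw)] + list(fields)
    return Request(url, data=urlencode(params).encode('ascii'), method='POST')


req = build_admin_request('http://127.0.0.1/mailman', 'maillist-domain.com',
                          'general', 'secret', [('real_name', 'maillist')])
print(req.full_url)   # http://127.0.0.1/mailman/admin/maillist-domain.com/general
print(req.data)       # b'adminpw=secret&real_name=maillist'
```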

compton

2:12 pm, Wednesday, 28 October 09

When managing lists, a customer will be in their control panel, where they can view all the domains on their account.

It's necessary for a customer to be able to choose any one of their domains and have a list created for that domain.

However, the CGI interface creates lists for the hostname it is being accessed under - i.e. if you are at www.mydomain.com/mailman and create a list, it will attempt to create the list using a URL host of mydomain.com.

We'll actually be accessing the mailman CGI interface using the box's IP address, which is cool because it means we can override the Host header sent by cURL with the domain we want the list set up on.
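
The same Host header trick in Python's urllib, as a sketch: the /mailman/create path and the listname field name are assumptions about a default mailman 2.1 setup, build_create_request() is hypothetical, and 192.0.2.10 is a documentation-only IP address.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_create_request(server_ip, domain, fields):
    """POST to the create CGI by IP address, naming the customer's domain."""
    url = 'http://%s/mailman/create' % server_ip
    # urllib only adds its own Host header when none has been set, so this
    # is the host mailman sees the list being created under
    return Request(url, data=urlencode(fields).encode('ascii'),
                   headers={'Host': domain}, method='POST')


req = build_create_request('192.0.2.10', 'mydomain.com',
                           [('listname', 'maillist-mydomain.com')])
print(req.get_header('Host'))   # mydomain.com
```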

This works stonkingly brilliantly.

There is still the problem that the hostname for a list must be defined in the dictionary which is built from the add_virtualhost(.., ..) lines in mm_cfg.py. In other words, for each hostname we want to create a list on, we need an add_virtualhost(.., ..) line in /usr/lib/mailman/Mailman/mm_cfg.py. If such a line does not already exist, the Cgi/create.py script throws up an error around line 161:
if mm_cfg.VIRTUAL_HOST_OVERVIEW and \
       not mm_cfg.VIRTUAL_HOSTS.has_key(hostname):
    safehostname = Utils.websafe(hostname)
    request_creation(doc, cgidata,
                     _('Unknown virtual host: %(safehostname)s'))
    return
We can simply replace this error message with some code that appends the new hostname to the mm_cfg.py file, and adds it to the hostname dictionary:
if mm_cfg.VIRTUAL_HOST_OVERVIEW and \
       not mm_cfg.VIRTUAL_HOSTS.has_key(hostname):
    # Persist the new host by appending an add_virtualhost() line
    fileHandle = open('/usr/lib/mailman/Mailman/mm_cfg.py', 'a')
    fileHandle.write('add_virtualhost(\'%(hostname)s\',\'%(hostname)s\')\n'
                     % {'hostname': hostname})
    fileHandle.close()
    # Register the host for the current request as well
    mm_cfg.VIRTUAL_HOSTS.update({hostname: hostname})
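
One caution: the hostname comes from the HTTP Host header and is written straight into Python source that mailman executes, so a value containing a quote could inject code. A minimal guard, assuming hostnames are limited to letters, digits, hyphens and dots, might look like this; virtualhost_line() is a hypothetical helper whose result could be passed to fileHandle.write() in place of the inline formatting.

```python
import re

# Dot-separated labels of letters and digits, with hyphens only inside a label
HOSTNAME_RE = re.compile(
    r'(?=.{1,253}\Z)'
    r'[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?'
    r'(?:\.[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?)*',
    re.IGNORECASE)


def virtualhost_line(hostname):
    """Return an add_virtualhost() line for mm_cfg.py, or raise ValueError."""
    if not HOSTNAME_RE.fullmatch(hostname):
        raise ValueError('refusing suspicious hostname: %r' % hostname)
    # %r quotes the string safely for inclusion in Python source
    return 'add_virtualhost(%r, %r)\n' % (hostname, hostname)


print(virtualhost_line('mydomain.com'), end='')
# add_virtualhost('mydomain.com', 'mydomain.com')
```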

compton

1:49 pm, Wednesday, 18 November 09

My completed solution comprises a class which uses cURL to request pages from the CGI interface. Parameters are sent via cURL to the CGI URL, and the HTML response is parsed using the PHP DOMDocument class. This class is pretty natty, and allows you to manipulate an HTML tree using XPath and other XML paradigms. The HTML it consumes doesn't even have to be XHTML, and it's quite happy scoffing up a load of old HTML 3, for instance, which is handy because that's what mailman's CGI produces.

The class I created allows mailing lists to be configured by accessing the configuration pages for the list, extracting the tables of options, which are then wrapped up in our control panel templates for presentation to the user. The DOMDocument class, as it uses an XML data structure internally, is great for this, because our control panel is XHTML.

Some elements of these config tables are rewritten or removed, for instance links to additional pages of a list's subscribers are rewritten so they work within the control panel, and form actions are also rewritten to post to a control panel page.

There is an issue with the POST data, however. In PHP, POST data is stored in the $_POST superglobal array. This is all groovy of course, but it brings with it the limitation that each parameter name can hold only a single value (unless the name ends in [], which mailman's field names don't), whereas Python's CGI handling allows the same name to appear multiple times. For instance, when editing lists of subscribers, the mailman CGI has a hidden HTML field called user for each user. PHP's superglobal array would 'concertina' all of these down so that only the last value for this parameter is accessible. Obviously not good.

PHP has another quirk when dealing with HTTP parameters, which this time stems from the language's infamous 'register globals' feature. Basically, every HTTP parameter name must also be a valid PHP variable name, and as PHP variable names cannot contain full stops, any full stops (and spaces) within HTTP parameter names are replaced with underscores. This is a problem because mailman uses email addresses as part of some parameter names on some of the config forms.

In order to get round this, we can parse the POST data ourselves, and thus avoid both the above problems. The following PHP snippet parses HTTP POST parameters into an array of the form Array ( Array ( key => value ), Array ( key => value ) ... ):
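$params = array();
$post_data = explode('&', file_get_contents('php://input'));
foreach ($post_data as $keyvalue) {
    if ($keyvalue === '') {
        continue; // skip empty pairs, e.g. from an empty request body
    }
    // Split before decoding, so an encoded '=' cannot move the split point
    $parts = explode('=', $keyvalue, 2);
    $key = urldecode($parts[0]);
    $value = isset($parts[1]) ? urldecode($parts[1]) : '';
    $params[] = array($key => $value);
}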
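
For comparison, Python's standard library already parses a form body this way: urllib.parse.parse_qsl() returns a list of (key, value) pairs, keeping repeated names and full stops intact, and decodes + as a space just as PHP's urldecode() does. The field names below are illustrative only.

```python
from urllib.parse import parse_qsl

# Illustrative form body: a repeated 'user' field and a key with full stops
body = ('user=alice%40example.com&user=bob.smith%40example.org'
        '&bob.smith%40example.org_nomail=on')

for key, value in parse_qsl(body):
    print(key, '=>', value)
# user => alice@example.com
# user => bob.smith@example.org
# bob.smith@example.org_nomail => on
```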