David Mertz covers the 2006 O'Reilly Open Source Convention: A Foray into Journalism

David Mertz
Roving reporter, developerWorks
July 2006

OSCON presenter and author of the developerWorks columns _Charming Python_ and _XML Matters_, David Mertz is perfectly positioned--both on the podium and off--to bring OSCON to us.

That's how IBM described me. These are my reports, originally published at: http://www-03.ibm.com/developerworks/blogs/page/davidmertz

SUNDAY JULY 30, 2006: INTERVIEW WITH JOSH BERKUS

Josh Berkus is another interesting fellow I had a chance to talk with at some length. Sun really made an effort, it seems, to get folks in the public eye. A lot of the vendors sent me solicitations to check out their booths, usually with blurbs about their products in the press releases. But Sun made the extra effort to schedule interviews between press members and Sun employees. I say that not only because I had such scheduled interviews with Josh Berkus and Tim Bray, but also because some other folks out there in the press (or blogosphere, if that is a real word) have posted comments from such interviews. For what it is worth, Simon Phipps is another prominent Sun employee who was slated in the interview track--I did not talk to him personally, but I did attend his talk in conjunction with Tom Marble (I might come back to that in another entry).

Let me get the last part of my talk with Josh out of the way first. The reason Sun was putting him forward was almost certainly to answer the question I asked him towards the end of our talk (or something closely along the same lines), namely: Why is Sun a good fit for maintenance and development of PostgreSQL? (For those not in-the-know, Josh has been one of the main developers of PostgreSQL for four years.) The somewhat vague answer is about the stability and scalability of Solaris and Sun hardware. True enough, but I think slightly at the level of a nicety.
Of more substance to my mind was Josh's specific statement on the benefits of the ZFS filesystem. In particular, ZFS allows dynamic use of multiple physical volumes, with a volume manager controlling virtual storage pools. Just what you want for growing databases.

What Josh and I talked about in more detail is probably idiosyncratic to my interview with him. Although I had not spoken with him directly before, Josh has also worked with the Open Voting Consortium, of which I am CTO, and roughly in affiliation with which I gave my paper. It was interesting to get Josh's perspective on these issues, and he is someone quite knowledgeable in this area. Clearly, in whatever area he enters, Josh does his homework. Last year, Josh testified before the CA legislature on FOSS in relation to voting systems, during a hearing considering legislation to mandate such use. Well, really the hearing followed up on the non-binding CA HR 242, which stated a preference for such systems and instructed the California Secretary of State to conduct hearings on the matter. The SoS wound up stonewalling on hearings, but the California Senate picked up on the gap. Initially, OVC had asked Brian Behlendorf to testify; but when Brian was unavailable, he recommended Josh. Lots of background that I just happen to know, but readers need not necessarily follow.

In our interview, Josh expressed some alarm at the conflict of interest faced by paid lobbyists who take money from proprietary vendors but work in elections. Some of them testified in the same committee. Josh was proud of a coup he accomplished in having on hand, during his testimony, a large list of FOSS vendors (in CA), in refutation of claims by the proprietary software lobby that no such companies existed. In a nicely strident statement, Josh observed that the main "trade secret" of current vendors of election systems is just how bad their source code is.
But in a more abstract tone he emphasizes the "many eyes" needed to make sure bugs/backdoors are caught; he believes, as I do, that it is not sufficient simply to reveal code in limited contexts. Concretely, as soon as "code auditors" who have signed NDAs start finding bugs in the proprietary systems they have been assigned to examine, they (the auditors) find themselves in court, facing lawsuits from vendors. Probably these are non-meritorious SLAPP actions; but how many programmers can afford lawyers to aid the public good?

One interesting claim Josh made was that in Canada and the UK, opponents of computerized voting feared that FOSS voting systems would legitimize such systems, despite their technical lack of readiness. This is certainly an interesting inversion of the "FOSS-poison" attitude in the USA. That is, here in the USA, a popular equation is of FOSS systems with vulnerabilities (an equation promoted by FUD and lobbying, in my opinion, and I am sure in Josh's). Josh made an interesting point that one computer security expert who made the claim about the non-readiness of FOSS systems was overly pessimistic about computer security, and overly optimistic about non-computer security. I think there is a nice point there: while computer systems have vulnerabilities, that does not mean that non-computerized systems are necessarily safe. At least as a general rule: I think the safeguards of the Australian ballot and padlocks on ballot boxes are relatively well understood, after 150 years.

I also attended one of the three sessions Josh gave (a busy guy), the one on FOSS press relations. He did a nice job with this as well. Probably a failing of many FOSS projects is not knowing exactly how to deal with the broader media, and how to formulate and time good press releases. Certainly these concerns are big for big and widely-used projects like PostgreSQL.
Many perfectly usable and useful smaller projects (like, say, my own little Gnosis Utilities) are actually probably fine with a sort of "let the release go out quietly" approach... some tools are meant for a narrow and technical audience who already know where to look. PostgreSQL is one of those tools used by millions, including by many big companies and organizations. For something like that, FOSS should show the same savvy (or better) as the big proprietary software vendors with PR departments have. In a lot of ways, FOSS projects can and do achieve better media relations than the unfree guys.

SUNDAY JULY 30, 2006: A QUICK NOTE ON TIM BRAY AND ATOM

One of the topics I interviewed Tim Bray about was the use of a globally unique identifier in Atom feeds. Basically, each Atom entry is required to have a name distinct from the name of any other entry in the world. However, the Atom standard (RFC4287) does not require a particular rule for assigning these identifiers. Obvious options one might use are UUIDs (RFC4122) or URIs (RFC3986). I suggested to Tim as well that something sensible might be an identifier that somehow hashes the content of the entry itself, hence providing a certain kind of integrity constraint.

My concern here is twofold. Basically, there are a couple of ways that non-unique identifiers might arise. One is that someone is going to write a bad Atom Publishing Protocol server that either assigns the same ID to multiple entries it holds, or where multiple installations of the same server fail to find appropriate unique components (e.g. a default prefix that is not site-configured). In response to this, Tim suggested it would happen less than I think, simply because it is pretty easy to get either URIs or UUIDs right. Fair enough. The more interesting problem is where people maliciously duplicate IDs, either to spoof entries, or to perform insertion attacks, or otherwise to disrupt the use of Atom (or to disrupt particular producers of feeds).
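To make the options concrete, here is a minimal Python sketch of a UUID-based identifier and of the content-hashing idea I floated. The function names and the urn:sha1: prefix are my own illustrative inventions; RFC 4287 mandates none of this, only that the id be a unique IRI.

```python
import hashlib
import uuid

def id_from_uuid():
    # An RFC 4122 UUID wrapped as a URN: globally unique by
    # construction, but says nothing about the entry it names.
    return "urn:uuid:%s" % uuid.uuid4()

def id_from_content(title, updated, body):
    # The hashing scheme I suggested to Tim: derive the ID from the
    # entry itself, so the identifier doubles as an integrity check --
    # any tampering with the content changes the ID.
    digest = hashlib.sha1(
        ("%s|%s|%s" % (title, updated, body)).encode("utf-8")
    ).hexdigest()
    return "urn:sha1:%s" % digest

print(id_from_uuid())
print(id_from_content("My entry", "2006-07-30T12:00:00Z", "Hello, Atom"))
```

Note the trade-off: the hash scheme deters spoofing (a forged entry cannot keep the original ID), but it also means an edited entry gets a new ID, which Atom consumers would interpret as a brand new entry.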
In support of my point, Tim noted that soon there will be feeds with substantial financial value, such as credit card transactions. At the same time, he made a point of the fact that Atom does not make anything worse in comparison with existing RSS feeds: in his example, if e.g. Technorati decided to become malicious, they could perfectly easily put words in his mouth. Part of Tim's attitude reflects what I noted before about his commitment to practicality over purity. He commented that he sees much of this as a social problem, not a technology problem. A nice quote from his comments is: "In general it's a good thing to name things using URIs; and in general it's good not to micro-manage how people use URIs." That has a nice sound to it. In fairness, I am sure that Tim does not fail to recognize that there is a technology component to security layers, authentication mechanisms, and so on... he just sees these questions as lying outside the concerns Atom itself addresses (and as reasonably described as "social"). Still, the issue of security attacks involving identifier falsification or spoofing interests me. Hopefully I will have a chance to write about this someday soon, in more detail (and once I have thought through the specific threat models).

SUNDAY JULY 30, 2006: MORE ON OPEN SOURCE VOTING PRESENTATION

In my initial entry, I mentioned in a general way my enthusiasm about my Open Source Voting presentation. But really, I did not say very much about its content. In part I was waiting to be able to provide relevant links for readers. I believe our slides will soon be available via the OSCon 2006 website, but the below resources are available now:

* Arthur M. Keller and David Mertz, "Open Source Voting," presented at the Open Source Convention (OSCON 2006), Portland, Oregon, July 24-28, 2006.

* Arthur's collection of papers on electronic voting; on most or all of them I am one of his coauthors.
* My own papers related to open source voting issues, again sometimes with coauthors, including Arthur.

My hope is that readers of this blog will decide to read some of those fuller papers, which generally reflect what I presented at OSCon. The presentation was something of a combination of the ideas in several papers, but informally structured. In fact, despite there being only 14 slides in the whole show (including one that contains just the name of the paper and its authors), I really only discussed about half the slides during the lively discussion.

One issue I did highlight in my talk is something that is not really emphasized in any of the papers, just implied. But this point is of growing importance in my mind, and also ties in especially well with the OSCon context. The idea is that issues about covert channels mean that FOSS is required for rigorous mathematical reasons, not simply out of general political desirability, or because of the positive "many eyes" effects that FOSS promotes. Sure, for me the first principle is that the technical mechanisms of elections should be disclosed to voters for the same fundamental democratic reasons that so-called Sunshine Laws reveal the workings of governance. However, even for readers (or audience members) who do not share my political sentiment, there is some basic mathematics to consider.

One of the principal considerations in designing voting systems is that it is important not to disclose the identity of voters. A vote does not simply need to be recorded accurately and reliably; it also needs to be recorded anonymously. Apart from the specifics of the OVC design, a voting system contains a variety of channels for transmission of information: some might be electronic, XML files and whatnot; others are simply pieces of paper that get moved around according to various rules and patterns (paper is an excellent steganographic medium).
The plain fact is that very few channels are at their Shannon limit; and what that tends to mean--almost always--is that multiple concrete encodings can represent the same semantic content. For example, an XML file can have slightly different forms that are reduced to a common meaning via whitespace normalization. Or a computer-printed paper ballot can have a pixel here and there that does not affect which vote is cast (for example, subtly moved around in an identificatory watermark; or even effects that superficially look like printer artifacts). The problem is that even a fully open and disclosed data API leaves this sort of wiggle room to hide some bits and bytes in a covert channel. Maybe that extra space character is an accident of how the outputter is coded; or maybe it is put there to deliberately leak information about voter identity (once the "black hats" know where to look).

Any closed source implementation--even one produced by (counterfactual) vendors whom we fully trust and who have shown a good prior record of best security practices (both very much contrary to the status of existing voting system vendors)--can fully conform with an open standards data API, while still containing a covert channel. An open source implementation, however, can be checked at the code level to make sure no such covert channel is encoded... and the proof of its operation is that all the channels contain exactly those bits that the open source should produce. That is, if someone were to substitute malicious source for the examined source, say during the installation or distribution process, that malicious code would have to produce slightly different bits if it were to produce a covert channel. So there we have it: closed source cannot, in principle, guard against this significant attack. Open source is required as a simple question of mathematics.
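A toy Python illustration of the whitespace point, with invented names (no real voting system works this way; the single-bit channel is deliberately crude): two byte-level serializations that any conforming consumer treats as identical after whitespace normalization, one of which leaks voter-identifying information.

```python
def encode_vote(candidate, covert_bit=0):
    # A malicious (closed source) serializer can vary insignificant
    # whitespace to smuggle one bit per record past any audit that
    # checks only conformance to the open data format.
    space = "  " if covert_bit else " "
    return '<vote%scandidate="%s"/>' % (space, candidate)

def normalize(xml):
    # Crude stand-in for the whitespace normalization a consuming
    # parser performs: both encodings pass as the "same" record.
    return " ".join(xml.split())

a = encode_vote("Smith", covert_bit=0)
b = encode_vote("Smith", covert_bit=1)
assert a != b                        # distinct bits on the wire
assert normalize(a) == normalize(b)  # semantically identical records
```

Only inspection of the serializer's source can establish that the extra space is an accident rather than a channel; the data format itself cannot rule it out.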
SATURDAY JULY 29, 2006: SECOND DAY: SOFTWARE LIBRE

I visited a couple of sessions that got at general notions of FOSS as acting in the service of political freedoms. In my mind, this ties fairly closely to the licensing issues I chatted about earlier.

A really fascinating and, to my mind, very optimistic talk was on FOSS in Venezuela. The movement towards FOSS has been quite strong in South America; in the Venezuelan context, two of the speakers were active members in an organization called SoLVe (Software Libre Venezuela), and had organized a conference similar to OSCon under its aegis. Jeff Zucker, who has worked with UNICEF and UNESCO on software issues, introduced the main speakers. Alejandro Imass gave a perfectly reasonable talk on developing FOSS ERM systems. I confess that the topic seemed slightly dry to me; worthwhile, but it did not grab me from either the political or the technical/theoretical point-of-view. He emphasized some good principles of component architectures and loose connections between related systems, but that is relatively common to good design principles.

Lino Ramirez, on the other hand, was quite fiery, or at least of great interest to me personally. He provided some background on Venezuela's FOSS bill, which has undergone an interesting process of democratic input from ordinary citizens, per some reformed mandates for participatory democracy in Venezuela. Ramirez also compared this bill (following on a presidential directive to similar effect, though the directive is less fixed than a law would be) to similar prior efforts in Brazil and Peru. In both of those cases, quite good bills were derailed by intensive lobbying by Microsoft, which is also running a massive campaign against the Venezuelan legislation. Apart from the specific outcome of this bill, Venezuela has implemented a number of technical outreach programs for poor and indigenous peoples. These include installation of FOSS software in schools and special training centers in remote locations.
Many towns and villages have gained computer centers where locals can learn computer skills and access the internet; all of this would have been impossible without FOSS. A nice case in point was the creation of a Linux distribution in the native Wayuu language. Having tools like OpenOffice.org in small-group languages like Wayuu aids in preserving the cultural heritage of such languages and peoples. A really nice upshot of this was shown in the question period, where one of the leading OpenOffice.org evangelists first learned of the translation at this session... and the interchange will presumably lead to good promotion and advertisement both for OpenOffice.org (which is accessible to more native peoples than closed source software ever can or will be), and for SoLVe's leadership in education and cultural preservation efforts in the developing world.

I also attended a session by Karl Fogel on the early history of copyright. This talk was interesting, but I guess familiar enough to me. After the development of the printing press in Europe (or really, of course, its transplantation from China), governments like the British Crown granted monopoly control of printing press technology to a limited guild of printers. Rationalizations of the "moral rights of authors" grew out of the base reality that publishers want the state to subsidize their profits... with authors having never played much of a role in any of this. None of that was really surprising to me; even if I had not specifically known it, I would have predicted as much from my knowledge of social and economic history... a Ph.D. in political philosophy, like I have, actually wins you some decent insights into how politics and economics actually work. Still, I am certain the lesson was valuable for many listeners, and the analogies with current issues around blogs, filesharing, and FOSS are worth drawing.
FRIDAY JULY 28, 2006: SECOND DAY: PYTHON 3000

One of the events I was especially looking forward to was Guido van Rossum's talk on what is coming up in Python 3.0. In truth, I knew there would not be anything in the talk that has not been discussed in more detail on the Python development lists, or that at least would be discussed there soon enough. Nonetheless, hearing the announcement from the BDFL himself carried a certain mystique. Unfortunately, Guido was developing a cold, or at least a cough, right about when he had to give his talk. So he had trouble speaking without hoarseness.

The presentation was still interesting: as the audience almost certainly hoped, he made a mildly comical disparagement of the Perl 6 process by way of comparison--but strictly in the friendliest manner, obviously without any hostility or competitive sentiment towards the Perl coders. His comment, though, was that the Perl 6 methodology appeared to be for a group of developers to travel to a distant island, and remain there until they invented a new programming language. In a somewhat more serious tone, he also contrasted Python 3.0 with C++, where the latter is completely unwilling to accept even the smallest backwards-compatibility breakage. Guido described Python 3.0 as falling in the middle of these extremes.

Moreover, our BDFL announced a pretty concrete schedule for 3.0: an alpha should be available near the beginning of 2007, with a release version before the end of the year. Python 2.6 will almost surely be released before the final 3.0, and the Python 2.x line will continue for a good while to overlap 3.0 (because 3.0 will not run all the older Python programs unmodified). Python 2.7 will probably contain some back-ports of 3.0 features, where they can be implemented without breakage; and 2.7 will also probably contain a collection of migration tools. Guido envisions migration as relying on two classes of tools:

1. Code analyzers along the lines of PyChecker and PyLint that can in many cases extract the intent of code, vis-a-vis the specific types of objects being handled. Most breakage will come about because particular types (think collections) behave somewhat differently than they used to. Guido gave the example of trying to determine whether f(x.keys()) represents code breakage. There are two points of concern here: (1)(a) Is x.keys() really a call to a method of a dictionary(-like) object, as you would tend to think? (1)(b) As mentioned below, this call on a dictionary will start returning either an iterator or a view in Python 3.0, rather than a fixed list. Depending on what you do with it, the change may or may not matter to the code in f(). I.e., if you just do "for thing in keys:", all is happy; if you mutate the expected list, problems occur. The exact fix is not generally automatable, since developers can reasonably want different behaviors in response to the change.

2. Warnings about likely changes. Presumably with 2.7 (and later 2.x versions), there will be a means of warning developers of constructs that are likely to cause porting issues. In the simplest case, this will include deprecated functions and syntax constructs. But presumably the warnings may cover "potential problems" like the above example.

So what is going to be new? And what is going to be removed? Removal is interesting. Some basic redundancy like dct.has_key(x) is going away, since nowadays you write "if x in dct" anyway. A few other relatively painless things along the same lines happen also. But more interesting is the fact that lambda is not going anywhere (it is also not being enhanced according to any of the numerous proposals). This little fact met with a surprising number of cheers (and probably some less audible rolled eyes among a different subset of the audience). Old style classes also go away, to everyone's approval; that is not 100% breakage free, but it is just simply a good thing.
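The f(x.keys()) hazard can be made concrete. Here is a small sketch of my own (not from Guido's talk), runnable as-is under Python 3, where keys() returns a view:

```python
d = {"b": 2, "a": 1}

def iterate_only(keys):
    # Safe either way: plain iteration works identically on a
    # 2.x list and on a 3.0 dict view.
    return sorted(k.upper() for k in keys)

def mutate_keys(keys):
    # Broken in 3.0 as written against 2.x: a dict view has no
    # .sort() or .append(). The mechanical fix is an explicit copy --
    # but whether a snapshot copy is what the developer *wanted* is
    # exactly the judgment an automated tool cannot make.
    keys = list(keys)
    keys.sort()
    return keys

assert iterate_only(d.keys()) == ["A", "B"]
assert mutate_keys(d.keys()) == ["a", "b"]

# And the has_key redundancy, already spelled the surviving way:
assert "a" in d
```

The first function needs no change at all; the second needs list(x.keys()), or perhaps something else entirely, depending on intent.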
Similarly with the removal of string exceptions, and the creation of a BaseException ancestor of all exceptions. A little bit of syntax is simplified too. I will lose my dear <> version of inequality, but that is an awfully easy update. Some new features include:

1. All strings become Unicode (breaky), and a new bytes type lets you encode mutable arrays of 8-bit bytes. Basically, one is "text", the other is "binary data". Accompanying this will probably be a variety of mechanisms to make I/O methods inherently handle Unicode, transparently dealing with decoding on open(fname) and the like (and also things like seeks).

2. Inequality comparisons become even more breaky than they have been (see my recent Charming Python bemoaning inequalities). I have mixed feelings myself, but in a certain way I think it is a reasonable approach. Python will give up (most of) its willingness to guess about what coders intend when comparing unlike types of things. At least that adds consistency. Rather than sometimes-but-who-knows-when having sorts break, we can just assume they do not work unless collections are homogeneous, or unless heroic measures are taken in advance (but as a known requirement).

3. As expected, the move towards iterators and variations on lazy objects continues apace. List comprehensions do not go away, but they become direct synonyms (syntax sugar) for a list() call wrapped around a generator comprehension. This changes the leakage of variables to the surrounding scope, which is a good thing.

There is more, some of it mildly incompatible. But overall it looks like a very conservative revision of the Python language, and one looking forward to the next 1000 years of Python programming (as Guido puts it). Another thing I completely failed to notice until Paul McGavin pointed it out to me: Guido said nary a word about optional type declarations. Given what a hot button this idea is, the lacuna was surprising.
I would not necessarily be surprised to hear he had decided against it; but hearing nothing at all, either way, is curious.

THURSDAY JULY 27, 2006: SECOND DAY: MICROFORMATS

I had the opportunity to talk with Tim Bray for about a half hour this morning, as I mentioned I would. He is an interesting guy, and I am going to scatter the topics we spoke about over several of these entries rather than simply report his comments verbatim and linearly. One of the things I asked Tim about was a topic that Dethe Elza has addressed in a recent guest column for XML Matters: microformats. Despite the wonderful article Dethe wrote, I have a certain suspicion in my attitude towards microformats. Specifically, they strike me as a way to smuggle in a brand new schema definition embedded within an existing schema (e.g. XHTML), while pretending not to need a schema. What, after all, is so much clearer about writing