About: http://conferences.oreillynet.com/cs/os2006/view/e_sess/8606
Arthur Keller, University of California, Santa Cruz
David Mertz, CTO, Open Voting Consortium
The hanging chads of 2000 showed that America's voting systems were out of date and unreliable. Yet, the electronic voting systems widely adopted since then are even worse. In the 2004 elections, nearly 50 million votes existed only in alterable electronic form. The software that processed them would have made tampering easy. Both the election data and the software were hidden from public view. There were serious allegations of fraud, but no possibility of a public audit to resolve them.
Electoral fraud disenfranchises everyone. To preserve the right to vote, the Open Voting Consortium (OVC) is working to establish a voting system worthy of public trust. While protecting voter anonymity, this Open Voting system makes all data and software auditable, publicly inspectable, permanent, and tamper-proof. The proposed project will develop the software and data systems needed to tabulate countywide voting. This project is a vital enabling step in a larger campaign, teaming OVF and OVC with government, business, and universities, to make open voting the norm in American elections.
In April 2004, OVC publicly demonstrated an open source precinct voting system. The system included:
We are developing a secure, reliable, auditable vote tabulation system that covers five main functions: security, auditing, vote tabulation, bulk optical ballot scanning, and web-based vote tally reporting.
About: http://conferences.oreillynet.com/cs/os2006/view/e_sess/8606
Arthur Keller, University of California, Santa Cruz
Fred McLain
David Mertz, CTO, Open Voting Consortium
The hanging chads of 2000 showed that America's voting systems were out of date and unreliable. Yet, the electronic voting systems widely adopted since then are even worse. In the 2004 elections, nearly 50 million votes existed only in alterable electronic form. The software that processed them would have made tampering easy. Suspicious undervotes on electronic voting machines affected the outcome of several congressional elections in 2006. Both the election data and the software were hidden from public view. There were serious allegations of fraud, but no possibility of a public audit to resolve them.
Electoral fraud disenfranchises everyone. To preserve the right to vote, the Open Voting Consortium (OVC) is working to establish a voting system worthy of public trust. While protecting voter anonymity, this Open Voting system makes all data and software auditable, publicly inspectable, permanent, and tamper-proof. The proposed project will develop the software and data systems needed to tabulate countywide voting. This project is a vital enabling step in a larger campaign, teaming OVF and OVC with government, business, and universities, to make open voting the norm in American elections.
In April 2004, OVC publicly demonstrated an open source precinct voting system. The system included:
We will update last year's presentation with progress in the development of a secure, reliable, auditable vote tabulation system that covers five main functions: security, auditing, vote tabulation, bulk optical ballot scanning, and web-based vote tally reporting.
Video: http://www.talkminer.com/viewtalk.jsp?videoid=bliptv3259746&q=
Slides: http://gnosis.cx/publish/Laziness.pdf
David Mertz
"The cheapest, fastest, and most reliable components of a computer system are those that aren't there" has a parallel in data structures. The fastest, most parsimonious, and best-performing data structure is one that is never concretized within a program run. A promise to create data when, or if, it is needed is often easy to make without needing to realize the data computationally.
The addition of iterators and generators to Python during the 2.x series, and their more systematic use in 3.x, provides an easy way to work with lazy computation. Using these facilities well can improve program performance, often even in terms of big-O complexity. However, more complex lazy data structures sometimes require special design in order to encapsulate more complex promises than one can make with list-like iterators.
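To make the idea concrete, here is a minimal sketch (illustrative, not taken from the talk itself) of such a promise: a generator that notionally holds a billion values but computes only the handful actually requested.

```python
from itertools import islice

# A lazy "promise" of data: the squares below are never concretized
# as a list; each value comes into existence only when requested.
def lazy_squares(n):
    for i in range(n):
        yield i * i

# Only five of the billion promised values are ever computed:
first_five = list(islice(lazy_squares(10**9), 5))
print(first_five)   # [0, 1, 4, 9, 16]
```

The eager equivalent, `[i * i for i in range(10**9)]`, would allocate gigabytes before the first element could be used.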
Talk outline
Slides: http://gnosis.cx/publish/Generators.pdf
Video: http://pyvideo.org/video/668/coroutines-event-loops-and-the-history-of-pytho
This talk traces lightweight concurrency from Python 2.2's generators, which enabled semi-coroutines as a mechanism for scheduling "weightless" threads; to PEP 342, which created true coroutines, and hence made event-driven programming easier; to 3rd party libraries built around coroutines, from older GTasklet and peak.events to the current Greenlet/gevent and Twisted Reactor.
This talk aims to provide both a practical guide and theoretical underpinnings to the use of generator-based lightweight concurrency in Python.
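As a minimal illustration of the PEP 342 machinery the talk covers (this example is assumed for this summary, not taken from the talk), `.send()` both resumes a coroutine and delivers a value into it, which is what makes two-way scheduling possible:

```python
# A PEP 342 coroutine: `yield` becomes an expression, so the
# scheduler can push values in as well as pull them out.
def averager():
    total, count, avg = 0.0, 0, None
    while True:
        value = yield avg        # suspend; receive the next value via send()
        total += value
        count += 1
        avg = total / count

coro = averager()
next(coro)            # "prime" the coroutine to its first yield
print(coro.send(10))  # 10.0
print(coro.send(20))  # 15.0
```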
About: http://www.oscon.com/oscon2012/public/schedule/detail/23880
Slides:
Diane Mueller (ActiveState), David Mertz (IBM developerWorks)
Is the US Patriot Act causing you to hesitate to leverage the cloud in your enterprise? Do you want to leverage the power of cloud computing but are unsure what the security and privacy implications are for sensitive corporate data?
Organizations are thinking long and hard about the legal and regulatory implications of cloud computing. When it comes to actual corporate data, no matter what the efficiency gains are, legal departments are often directing IT departments to steer clear of any service that eliminates their ability to keep potentially sensitive information out of the hands of Federal prosecutors.
Despite all the hype about every application moving into the cloud, some practical patterns are starting to emerge in the types of data corporations are willing to move to the cloud. Learn how to create a secure, compliant, private platform and cloud for developing, distributing and managing enterprise applications.
I will cover:
- Introduction to the US Patriot Act and data privacy issues
- Implications for cloud computing
- Jurisdictional issues
- Best practices & practical patterns
- Classes of applications that best leverage the cloud
- What types of applications should stay on-premise
- Private cloud model(s)
- Building a compliant cloud strategy
Video: http://www.youtube.com/watch?v=EJseJV6RLUg
Slides: http://www.slideshare.net/LuluLotus/election-security
The first part of this talk addressed Python Software Foundation administrative matters. The rest of this talk was about voting systems. It doesn't have anything to do with the PSF per se, although the PSF has used the described method.
This talk doesn't really have much to do with Python either. The systems presented and implemented were written in Python, but you could do it in a different programming language. ... for that matter, you could do most of it with pencils and paper.
Slides: https://speakerdeck.com/pyconslides/why-you-should-use-python-3-for-text-processing-by-david-mertz
Video: http://pyvideo.org/video/1704/why-you-should-use-python-3-for-text-processing
Python is a great language for text processing. Each new version of Python--but especially the 3.x series--has enhanced this strength of the language. String (and byte) objects have grown some handy methods and some built-in functions have improved or been added. More importantly, refinements and additions have been made to the standard library to cover the most common tasks in text processing.
This talk, by its nature, will be a somewhat impressionistic review of nice-to-have improvements to text processing that have come to Python--in part over the long time frame since my book on the topic, but with an emphasis on 3.x features.
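Two small tastes of the kind of 3.x niceties the talk surveys (these particular picks are illustrative, not the talk's own list):

```python
import textwrap

# str.casefold() (new in 3.3) goes beyond lower() for caseless
# comparison of international text; German ß folds to "ss":
print("straße".casefold() == "strasse")   # True

# textwrap.indent() (also 3.3) prefixes every line of a block:
print(textwrap.indent("one\ntwo", "> "))
```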
Video: http://youtu.be/t0Nyk5WSGuE
Slides: http://gnosis.cx/pycon-uk-2013/Keynote-Ideas.pdf
Opening keynote address by David Mertz at PyCon-UK 2013. Discusses threads on the python-ideas mailing list, and takes as example variations on the built-in sum() function to reveal surprising subtleties.
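A taste of the kind of subtlety the keynote mines from sum() (these particular examples are illustrative; the talk's own variations may differ):

```python
# Numeric sums are straightforward:
print(sum([1, 2, 3]))          # 6

# But sum() refuses strings explicitly...
try:
    sum(['a', 'b'], '')
except TypeError as e:
    print(e)                   # sum() can't sum strings [use ''.join(seq) instead]

# ...yet quietly accepts lists, at quadratic cost:
print(sum([[1], [2], [3]], []))   # [1, 2, 3]
```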
Video: http://pyvideo.org/pycon-za-2014/what-i-learned-about-python-and-about-guidos-t.html
Slides: http://gnosis.cx/pycon-za-2014/Keynote-Ideas.pdf
Keynote address by David Mertz at PyCon-ZA 2014. Discusses threads on the python-ideas mailing list, and takes as example variations on the built-in sum() function to reveal surprising subtleties.
Video: http://pyvideo.org/pycon-za-2014/pyconza-2014-closing-remarks.html
Slides: http://gnosis.cx/pycon-za-2014/PSF-PyCon-ZA.pdf
Slides: http://gnosis.cx/Python-LA-2014/PyPy-STM.pdf
Video: https://youtu.be/8QIYPws-51A
Slides: http://gnosis.cx/pycon-by-2015/Type-Annotations.pdf
Abstract:
Python is a dynamically (but strongly, for some value of "strongly") typed programming language. Notwithstanding its dynamism, checking types--or other behaviors--of variables has always been possible in Python code, and a steady stream of users have had a desire to do so.
At a conceptual level, enforcing a type is a subset of enforcing an invariant on a variable, and the broader demand for design by contract has been a recurrent theme in Python discussions. PEP 316 addressed this desire (but was not accepted) a decade ago, as did the long defunct library PyDBC. Currently maintained, however, is the PyContracts library, which allows documenting and enforcing both types narrowly, and predicates of variables more broadly. I myself wrote a simple recipe for basic type checking using PEP 3107 annotations at the Python Cookbook: Type checking using Python 3.x annotations (http://code.activestate.com/recipes/578528-type-checking-using-python-3x-annotations/).
Both assert statements and guarded conditionals are often used to declare and assure a type or other invariant. Enforcing a type can either be done strictly using isinstance(x, Type) or issubclass(X, Type); or via duck typing, often in the form of hasattr(x, method). The introduction of abstract base classes (ABCs) in Python 2.6 added an inheritance-based approach to duck typing, including the ability to register virtual subclasses. Of course, one can assert or do a boolean test on any predicate as well.
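The approaches just named can be sketched in a few lines (an illustrative example, not code from the talk):

```python
from collections.abc import Sized

# Strict nominal check:
assert isinstance([1, 2], list)

# Duck typing: any object answering len() will do:
def describe(x):
    if hasattr(x, '__len__'):
        return f"{len(x)} items"
    return "unsized"

print(describe("abc"))   # 3 items
print(describe(42))      # unsized

# ABC-based check: Sized matches anything defining __len__,
# via the virtual-subclass machinery introduced in 2.6:
assert isinstance({1, 2, 3}, Sized)
```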
The Mypy tool operates a bit differently from other type-checking or invariant libraries. There are two elements to this difference:
In August of 2014, our BDFL, GvR, proposed "blessing" Mypy's approach to type annotations as the official purpose of function annotations for Python 3.6+. Such a blessing is intended to lead to more widespread use of type annotations, and a standard about their syntax is intended to allow for an ecosystem of tools beyond Mypy to take advantage of the provided information.
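In the spirit of the Cookbook recipe mentioned above (this is a simplified sketch, not the recipe's actual code), a decorator can enforce PEP 3107 annotations at runtime:

```python
from functools import wraps

def typecheck(fn):
    """Raise TypeError when a positional argument's type does not
    match that parameter's annotation (unannotated params pass)."""
    @wraps(fn)
    def wrapper(*args):
        hints = fn.__annotations__
        for name, value in zip(fn.__code__.co_varnames, args):
            if name in hints and not isinstance(value, hints[name]):
                raise TypeError(f"{name} must be {hints[name].__name__}")
        return fn(*args)
    return wrapper

@typecheck
def repeat(s: str, n: int) -> str:
    return s * n

print(repeat("ab", 3))   # ababab
# repeat("ab", "3")      # would raise TypeError: n must be int
```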
Notebooks: http://gnosis.cx/pycon-cuba-2016/Functional_Programming_in_Python/
Slides: http://gnosis.cx/talks/pycon-cuba-2016/Teaching-Scientists.pdf
Notebooks: http://gnosis.cx/pycon-cuba-2016/Scientific_Programming_using_Anaconda/
Abstract
In an ideal world, all our large datasets would live in well optimized storage formats, such as RDBMS's, key-value NoSQL stores, HDF5 hierarchical datasets, or other formats that are well typed and fast to access. In our actual world, a great deal of our data lives in CSV, flat-file, or JSON formats, roughly stored on file systems, with little typing of data values. Moreover, data in these formats often have variably sized records making seeking data a linear scan operation.
Continuum Analytics has produced a custom optimized library called IOPro that includes a component called TextAdapter. TextAdapter provides abstractions for data access to these textual formats: it adds much better data typing, minimizes memory use, uses indexing for seeking, and offers other facilities for better, faster data access without requiring conversion of exploratory datasets into permanent optimized formats. We will be releasing this code as an Open Source project, and plan on enhancing the library to allow further performance optimizations and integration with the Dask project.
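TextAdapter's own API is not reproduced here; the following generic sketch merely illustrates the indexing idea described above: record byte offsets in a single pass, then seek to any record in constant time rather than rescanning the file linearly.

```python
import csv
import io

# Stand-in for a large on-disk CSV with variably sized records:
data = io.BytesIO(b"name,value\nalpha,1\nbeta,22\ngamma,333\n")

# One linear pass builds a byte-offset index of every record:
offsets = []
while True:
    pos = data.tell()
    if not data.readline():
        break
    offsets.append(pos)

# Later, jump straight to record 3 ("gamma") without scanning:
data.seek(offsets[3])
row = next(csv.reader([data.readline().decode()]))
print(row)   # ['gamma', '333']
```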
As well as looking at technical and performance details of TextAdapter, this talk will discuss the economic and social concerns of company-developed and company-supported Open Source projects. Continuum continues to explore some of these issues through our release of TextAdapter, following the company's trajectory of moving projects from proprietary to open source status whenever reasonable.
Slides: http://gnosis.cx/talks/pydata-sf-2016/TextAdapter.pdf
Video: https://www.youtube.com/watch?v=qZy-9chm3dk
Description
Dask is a flexible tool for parallelizing Python code on a single machine or across a cluster. It builds upon familiar tools in the PyData ecosystem (e.g. NumPy and Pandas) while allowing them to scale across multiple cores or machines. This tutorial will cover both the high-level use of dask collections, as well as the low-level use of dask graphs and schedulers.
We can think of dask at a high and a low level:
High level collections: Dask provides high-level Array, Bag, and DataFrame collections that mimic and build upon NumPy arrays, Python lists, and Pandas DataFrames, but that can operate in parallel on datasets that do not fit into main memory.
Low Level schedulers: Dask provides dynamic task schedulers that execute task graphs in parallel. These execution engines power the high-level collections mentioned above but can also power custom, user-defined workloads to expose latent parallelism in procedural code. These schedulers are low-latency and run computations with a small memory footprint.
Different users operate at different levels but it is useful to understand both. This tutorial will cover both the high-level use of dask.array and dask.dataframe and the low-level use of dask graphs and schedulers. Attendees will come away able to use dask.delayed to parallelize existing code, understanding the differences between the dask schedulers, and when to use one over another, and with a firm understanding of the different dask collections (dask.array and dask.dataframe) and how and when to use them.
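The low-level graph format mentioned above is simple enough to sketch with a toy serial executor (an illustration of the idea, not dask's implementation; dask's real schedulers run independent tasks in parallel):

```python
from operator import add, mul

# A task graph in the dask dict format: values are either data or
# tasks (a tuple of a callable followed by its arguments, where an
# argument may itself be a key in the graph).
graph = {
    'x': 1,
    'y': 2,
    'z': (add, 'x', 'y'),
    'w': (mul, 'z', 10),
}

def get(dsk, key):
    """Recursively evaluate one key of the graph, serially."""
    val = dsk[key]
    if isinstance(val, tuple) and callable(val[0]):
        fn, *args = val
        return fn(*(get(dsk, a) if a in dsk else a for a in args))
    return val

print(get(graph, 'w'))   # 30
```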
Video: https://www.youtube.com/watch?v=VAuFSo2cIhs&t=988s
Description
A GAN (Generative Adversarial Network) is a recent and powerful idea in the design of neural networks. While a GAN is technically a form of unsupervised learning, it cleverly captures much of the power of supervised learning models. These models seem to have been used most widely in image-generation contexts, but there is no reason they cannot be applied equally to other domains. When applied to images, GANs often produce "surreal" and sometimes disturbing resemblances to real images.
It takes just a few lines of PyTorch code to create the "dueling networks" that make up a GAN, and this talk walks attendees through that.
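A minimal sketch of those dueling networks, assuming PyTorch is available (illustrative code, not the speaker's; the architectures and the 2-D toy data are placeholders):

```python
import torch
import torch.nn as nn

latent_dim = 16
generator = nn.Sequential(
    nn.Linear(latent_dim, 64), nn.ReLU(),
    nn.Linear(64, 2),                # emits fake 2-D "data points"
)
discriminator = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),  # probability the input is real
)

# One adversarial step: the discriminator scores a batch of fakes;
# the generator's loss pushes those scores toward "real" (1).
z = torch.randn(8, latent_dim)
fake = generator(z)
score = discriminator(fake)
g_loss = nn.functional.binary_cross_entropy(score, torch.ones(8, 1))
g_loss.backward()
```

In full training these two losses alternate: the discriminator is also trained to score real samples as 1 and fakes as 0, which is the adversarial "duel."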
Slides: http://gnosis.cx/talks/PiterPy-2019/SimpleGAN.slides.html
Video: https://www.youtube.com/watch?v=MZy6BgAfVBE
Interview: Interview with PiterPy Organizer