futureutils — Introduces futures and promises into iterators

This module provides a few small wrappers for iterable objects that make it easy to run iterators in parallel. It is named futureutils because it introduces the concepts of futures and promises to Python iterators and generators.

It works on Python 2.5+ and has been tested on CPython 2.5+, PyPy 1.4+, and IronPython 2.6+. (Unfortunately there is no plan to support Python 3+ yet, but see also PEP 3148.)

Installation

The easiest way to install futureutils is to use pip or easy_install.

$ pip install futureutils  # or
$ easy_install futureutils

If you prefer, you can also install it from the source code in the Mercurial repository:

$ hg clone https://bitbucket.org/dahlia/futureutils
$ cd futureutils/
$ python setup.py install

Usage

The module contains only two functions: promise() and future_generator(). The former is a lower-level but more general interface. The latter is a higher-level, decorator-style interface that works only with generator functions, not with arbitrary iterators.

If your iterator is a generator created by a generator function, just apply the future_generator() decorator:

import lxml.html
from futureutils import future_generator

@future_generator
def list_hrefs(url):
    html = lxml.html.parse(url)
    for href in html.xpath('//a[@href]/@href'):
        href = href.strip()
        if href and not href.startswith('#'):
            yield href

Iterators created by your generator function then run in parallel automatically and yield items like normal iterators. The generator function behaves the same whether or not you apply the future_generator() decorator, because the decorator changes only its efficiency, not its semantics.

Warning

If your generator function is already parallelized, the decorator may change its semantics, so take care in that case.

If your iterator is not a generator, use the promise() function instead. It does one simple thing: it takes an iterable object and returns an iterator that wraps it.

import lxml.html
from futureutils import promise

def list_hrefs(url):
    html = lxml.html.parse(url)
    for href in html.xpath('//a[@href]/@href'):
        href = href.strip()
        if href and not href.startswith('#'):
            yield href

iterator = list_hrefs('http://dahlia.kr/')
parallelized_iterator = promise(iterator)

Read the following API reference for details.

API

futureutils.DEFAULT_BUFFER_SIZE

Every promised iterator has its own internal buffer queue, and every queue has a maximum size. The limit keeps an infinite iterator from consuming an unbounded amount of memory.

This constant is the default size of a queue.
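
To illustrate why a bounded queue matters, here is a minimal sketch. It is not the actual futureutils implementation: it only assumes that a background thread feeds items into a standard-library queue with a maximum size. The helper name fill_queue and the buffer size of 3 are hypothetical.

```python
import itertools
import threading
import time

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

def fill_queue(iterable, buf):
    """Push every item into ``buf``; put() blocks while the queue is full."""
    for item in iterable:
        buf.put(item)

buf = queue.Queue(maxsize=3)          # hypothetical buffer size
worker = threading.Thread(target=fill_queue, args=(itertools.count(), buf))
worker.daemon = True                  # do not keep the interpreter alive
worker.start()

time.sleep(0.3)                       # give the producer time to run ahead
buffered = buf.qsize()                # the producer is now blocked at the limit
first = buf.get()                     # consuming one item frees one slot
```

Although itertools.count() never ends, the producer stops as soon as the queue is full, so memory use stays bounded until the consumer takes an item.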

futureutils.SIGNAL_YIELD
futureutils.SIGNAL_RAISE
futureutils.SIGNAL_RETURN
futureutils.SIGNAL_CONTINUE

Flag constants for internal use only.

futureutils.promise(iterable, buffer_size=100)

Promises the passed iterable object and returns its future iterator.

>>> import time, datetime
>>> def myiter():
...     for x in xrange(5):
...         yield x
...         time.sleep(0.5)
...
>>> it = promise(myiter())
>>> time.sleep(2)
>>> start = datetime.datetime.now()
>>> list(it)
[0, 1, 2, 3, 4]
>>> delta = datetime.datetime.now() - start
>>> delta.seconds
0
>>> delta.microseconds > 500000
True

It can be used for simple parallelization of IO-bound iterable objects.
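
The idea behind this kind of parallelization can be sketched with the standard library alone. The following is an illustrative approximation under stated assumptions, not the code futureutils actually uses: sketch_promise, _DONE, and slow_items are hypothetical names, and a thread plus a bounded queue are assumed as the mechanism.

```python
import threading
import time

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

_DONE = object()  # hypothetical sentinel marking the end of iteration

def sketch_promise(iterable, buffer_size=100):
    """Start consuming ``iterable`` in a background thread right away and
    return an iterator over the buffered results (illustrative only)."""
    buf = queue.Queue(maxsize=buffer_size)

    def produce():
        for item in iterable:
            buf.put(item)
        buf.put(_DONE)

    worker = threading.Thread(target=produce)
    worker.daemon = True
    worker.start()            # work begins before the caller asks for items

    def consume():
        while True:
            item = buf.get()
            if item is _DONE:
                return
            yield item

    return consume()

def slow_items():
    for x in range(3):
        time.sleep(0.2)       # stands in for an IO-bound operation
        yield x

it = sketch_promise(slow_items())
time.sleep(0.8)               # the caller does other work in the meantime
start = time.time()
result = list(it)             # the items have already been buffered
elapsed = time.time() - start
```

This sketch deliberately ignores exceptions raised by the iterable; if one occurred, the consumer would wait forever.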

Like a normal iterator, it also propagates exceptions raised during iteration:

>>> def pooriter():
...     yield 1
...     raise Exception('future error')
...
>>> it = promise(pooriter())
>>> it.next()
1
>>> it.next()
Traceback (most recent call last):
  ...
Exception: future error
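
One plausible way to carry an exception from a background producer to the consumer is to tag every queued message. The sketch below assumes that design for illustration only. The tags YIELD, RAISE, and RETURN and the function sketch_promise_with_errors are hypothetical and are not the library's SIGNAL_* constants or its real code.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

# Hypothetical message tags; not the library's SIGNAL_* values.
YIELD, RAISE, RETURN = 'yield', 'raise', 'return'

def sketch_promise_with_errors(iterable, buffer_size=100):
    buf = queue.Queue(maxsize=buffer_size)

    def produce():
        try:
            for item in iterable:
                buf.put((YIELD, item))
        except Exception as exc:
            buf.put((RAISE, exc))      # hand the error over to the consumer
        else:
            buf.put((RETURN, None))    # normal end of iteration

    worker = threading.Thread(target=produce)
    worker.daemon = True
    worker.start()

    def consume():
        while True:
            tag, value = buf.get()
            if tag == YIELD:
                yield value
            elif tag == RAISE:
                raise value            # re-raise in the consuming thread
            else:
                return

    return consume()

def pooriter():
    yield 1
    raise ValueError('future error')

it = sketch_promise_with_errors(pooriter())
first = next(it)
try:
    next(it)
    error = None
except ValueError as exc:
    error = exc                        # the producer's exception arrives here
```

The consumer sees the value 1 first and the ValueError only on the following call, which matches the order in which the producer emitted them.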

It can also deal with infinite iterators:

>>> import itertools
>>> it = promise(itertools.cycle('Hong Minhee '))
>>> ''.join(itertools.islice(it, 23))
'Hong Minhee Hong Minhee'

Every future iterator has its own internal buffer queue that stores the iterator’s results, and every queue has a maximum size. The limit keeps an infinite iterator from consuming an unbounded amount of memory. You can tune the queue size through the buffer_size option.

>>> import itertools
>>> def infloop():
...     i = 0
...     while True:
...         print i
...         yield i
...         i += 1
...
>>> list(itertools.islice(promise(infloop(), buffer_size=5), 5))
0
1
2
3
4
[0, 1, 2, 3, 4]

Parameters:
  • iterable (iterable object) – an iterable object to promise
  • buffer_size (int, long) – the maximum size of the internal buffer queue that stores the iterator’s results; the limit avoids wasting too much memory. Defaults to the constant DEFAULT_BUFFER_SIZE
Returns: a promised future iterator
Return type: iterable object

See also

Decorator future_generator()

futureutils.future_generator(function)

A decorator that promises the result of the decorated generator function, so that calling the function returns a future iterator.

It is a simple decorator wrapper around promise() for generator functions.

>>> import time, datetime
>>> @future_generator
... def mygenerator():
...     for x in xrange(5):
...         yield x
...         time.sleep(0.5)
...
>>> it = mygenerator()
>>> time.sleep(2)
>>> start = datetime.datetime.now()
>>> list(it)
[0, 1, 2, 3, 4]
>>> delta = datetime.datetime.now() - start
>>> delta.seconds
0
>>> delta.microseconds > 500000
True

Parameters:
  • function (callable object) – a generator function to turn into a future generator function
Returns: a future generator function
Return type: callable object
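
A decorator of this kind could be sketched as follows. This is a hedged illustration rather than the library's source: sketch_future_generator and stand_in_promise are hypothetical names, and stand_in_promise merely returns a plain iterator so that the example runs without futureutils installed.

```python
import functools

def stand_in_promise(iterable):
    # Placeholder for futureutils.promise(); it only returns a plain iterator.
    return iter(iterable)

def sketch_future_generator(function):
    """Wrap a generator function so that each call passes the new
    generator through the promise function (illustrative only)."""
    @functools.wraps(function)         # keep the original name and docstring
    def wrapper(*args, **kwargs):
        return stand_in_promise(function(*args, **kwargs))
    return wrapper

@sketch_future_generator
def numbers(n):
    for x in range(n):
        yield x

values = list(numbers(3))
```

Because the wrapper only forwards its arguments and wraps the returned generator, callers use the decorated function exactly as they used the original one.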

See also

Function promise()

Author and distribution

It is written by Hong Minhee.

The source code is distributed under the MIT license and can be found in the Mercurial repository.

$ hg clone https://bitbucket.org/dahlia/futureutils

Bug reports are always welcome; please visit the issue tracker.