UnitTesting Ingredients: PyTest, Factory Boy, YAML and docstring

“[Separation of Concerns], even if not perfectly possible, is yet the only available technique for effective ordering of one’s thoughts, that I know of.” — Edsger W. Dijkstra http://deviq.com/separation-of-concerns/

Background

The project under test is a cinema ticket booking system. Users can issue queries about the schedules of upcoming movie showtimes. The system models include:

  • Cinema [1]: The geographic places you would go to watch movies
  • Theater [2]: These are the little rooms inside each cinema
  • Movie: "Nativity", "Star Wars", "Passion of the Christ"...
  • Schedule: aka, showtimes

Our focus here is the schedules.

The ingredients

PyTest

A Python unit testing facility which features:

  • Fixture dependency injection
  • Isolated
  • Composable
  • Plus unittest compatibility

See this slide for advanced features of PyTest. Also, I would recommend this site if you are really into testing, especially Python <3.

In my case, I use pytest's dependency injection to inject the Flask app test client and a dataset into each test method.

class TestQuerySchedules():  
    def test_query_by_movie_title(self, client, dataset_saigon_weekend):
        response = client.get('/api/query')
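Here is a minimal sketch of how that injection works. It assumes a hypothetical FakeClient standing in for the Flask test client and a tiny in-memory dataset; neither comes from the real project. pytest matches each test argument to a fixture of the same name, and fixtures can request other fixtures the same way:

```python
import pytest


class FakeClient:
    """Hypothetical stand-in for the Flask test client (not the project's code)."""

    def __init__(self, schedules):
        self.schedules = schedules

    def get(self, path):
        # Pretend every path answers with the schedules as a JSON-like payload.
        return {"status_code": 200, "json": self.schedules}


def build_dataset():
    # Plain helper so the data can also be built outside of pytest.
    return [{"movie": "Star Wars", "theater": "GALAXYTB-T000003"}]


@pytest.fixture
def dataset_saigon_weekend():
    return build_dataset()


@pytest.fixture
def client(dataset_saigon_weekend):
    # A fixture asks for another fixture simply by naming it as an argument.
    return FakeClient(dataset_saigon_weekend)


class TestQuerySchedules:
    def test_query_by_movie_title(self, client, dataset_saigon_weekend):
        response = client.get('/api/query')
        assert response["status_code"] == 200
        assert response["json"] == dataset_saigon_weekend
```

Each test gets a fresh client and dataset, so one test cannot leak state into another.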

YAML

If you have heard of JSON, then you should take a look at YAML [3]. It is much friendlier to read than JSON and by no means less expressive. Hence, it is much easier to maintain, especially when you have thousands of LOC.

The following fragment of YAML presents a list of movies, each of which has code, title and status:

movies:  
-   code: 10.5240/0067-DEFB-A9F6-DD23-70DA-1
    title: The Fox
    status: 2
-   code: 10.5240/0067-DEFB-A9F6-DD23-70DA-3
    title: Star Wars
    status: 2

No curly braces or double quotes whatsoever! And it also looks very Pythonic <3. FYI, Google App Engine uses .yaml files for application configuration.
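To see what that fragment turns into on the Python side, here is a small sketch using PyYAML's safe_load. The MOVIES_YAML name is mine; the JSON dump is only there for comparison:

```python
import json

import yaml  # provided by the PyYAML package

MOVIES_YAML = """
movies:
-   code: 10.5240/0067-DEFB-A9F6-DD23-70DA-1
    title: The Fox
    status: 2
-   code: 10.5240/0067-DEFB-A9F6-DD23-70DA-3
    title: Star Wars
    status: 2
"""

# safe_load builds plain dicts, lists, strings and numbers.
data = yaml.safe_load(MOVIES_YAML)

# Unquoted scalars are typed by YAML: codes and titles stay strings,
# while status is loaded as an integer.
first = data["movies"][0]
print(first["title"], type(first["status"]).__name__)

# The same structure serialised as JSON, braces and quotes included.
print(json.dumps(data, indent=2))
```

The loaded structure is exactly what json.loads would give you for the equivalent JSON document, so nothing is lost by switching formats.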

Factory Boy

Initially I used Factory Boy to replace the need for file-based fixtures. I do enjoy the concepts behind building test fixtures with factories:

  • Using custom sequence to generate unique yet meaningful values
  • Faker to generate human friendly fields
  • Built-in integration with SQLAlchemy, Google Datastore, Django...
  • Fixture dependency support with SubFactory

from factory import SubFactory
from factory.alchemy import SQLAlchemyModelFactory

class ScheduleFactory(SQLAlchemyModelFactory):
    class Meta:
        model = Schedule
        sqlalchemy_session = db.session

    theater = SubFactory(TheaterFactory)
    movie = SubFactory(MovieFactory, status=Movie.STATUS_PUBLISHED)

While testing features, we do not really care about a field's exact value; we care more about whether the values are logically consistent with one another. For example, given a fixture with the full name "Elton John", we would expect:

  • This is a person
  • This person's email is "elton.john@gmail.com"
  • This person's job is "singer"
  • He works at a company named "Rocket Music Entertainment Group"

Factory Boy fills in meaningful default values for fields unless you override them with your own.

You can find more about Factory Boy and its inner workings here.

Docstring

In Python, a docstring is a string literal placed right beneath the definition line of a class, method or function, conventionally quoted with triple quotation marks. Its purpose is to describe the class, method or function it belongs to.

def demo():  
    """
    Demo is short for demonstration
    """ 
    pass

It is very nice of Python <3 that it lets you access this piece of information at runtime, out of the box, through the __doc__ attribute.
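A quick sketch of reading the docstring back at runtime, using only the standard library. The raw and clean variable names are mine:

```python
import inspect


def demo():
    """
    Demo is short for demonstration
    """
    pass


# The raw docstring; depending on the Python version it may still carry
# the indentation of the source code and the surrounding newlines.
raw = demo.__doc__

# inspect.getdoc() removes the common indentation and the blank lines
# at both ends.
clean = inspect.getdoc(demo)

print(repr(raw))
print(repr(clean))
```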

And yes - you can pass this block of text to PyYAML to complete the big picture of UnitTesting Ingredients: PyTest, Factory Boy, YAML and docstring.

See this thread on Stack Overflow

See more on docstrings in PEP 8 and PEP 257.
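Putting docstrings and YAML together could look like the sketch below. The names dataset_tiny and load_docstring_yaml are hypothetical helpers for illustration, and the one-movie dataset is a cut-down example, not the project's fixture:

```python
import inspect

import yaml  # provided by the PyYAML package


def dataset_tiny():
    """
    movies:
    -   code: 10.5240/0067-DEFB-A9F6-DD23-70DA-3
        title: Star Wars
        status: 2
    """


def load_docstring_yaml(func):
    # getdoc() removes the indentation the docstring inherits from the
    # function body, leaving a YAML document that starts at column zero.
    return yaml.safe_load(inspect.getdoc(func))


dataset = load_docstring_yaml(dataset_tiny)
print(dataset["movies"][0]["title"])
```

I use safe_load here because the docstring only needs plain mappings, lists and scalars.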

The mix

You need to pip install PyYAML as a dependency of your project.

Now, in order to test our showtime query features, we really need a lot of data. Unlike the other CRUD operations, ad-hoc queries need a manageable, well-controlled dataset so we can verify that a given combination of filtering conditions returns the correct subset of data, while the data still satisfies the integrity constraints enforced by the DBMS.

In other words, you have to fake the data consistently, and fake a lot of it. I set a few acceptance criteria for our testing setup:

  • Manageable
  • Well controlled
  • Large dataset

Imagine all of that can be achieved with the following chunk of text. You can skip it, though. Just know that:

  • Three movies are created
  • Three cinemas are created, each has three theaters
  • Six schedules are created, three of which are approved
  • All are managed under one well-known name, dataset_saigon_weekend
  • All are visible in one Python file, test_schedule_api.py

in a readable format, with ~133 LOC including data:

  • ~100 lines of data in YAML format
  • ~34 LOC. Let's call it an overhead
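
from datetime import datetime

import pytest
import yaml

# Assumed format for the start_at/end_at strings; the original listing does not show it.
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S UTC'

@pytest.fixture(scope='function')
def dataset_saigon_weekend(request, db):
    """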
    movies:
    -   code: 10.5240/0067-DEFB-A9F6-DD23-70DA-1
        title: The Red Fox
        status: 2
    -   code: 10.5240/0067-DEFB-A9F6-DD23-70DA-2
        title: Kramus
        status: 2
    -   code: 10.5240/0067-DEFB-A9F6-DD23-70DA-3
        title: Star Wars
        status: 2

    cinemas:
    -   name: Galaxy Tan Binh
        group: Galaxy
        prefix: GALAXYTB
        district: Tân Bình
        city: Ho Chi Minh
        country: Vietnam
        status: 2
        theaters:
        -   code: GALAXYTB-T000001
            name: Theater One
        -   code: GALAXYTB-T000002
            name: Theater Two
        -   code: GALAXYTB-T000003
            name: Theater Three

    -   name: Galaxy Nguyen Trai
        group: Galaxy
        prefix: GALAXYNT
        district: Quan 1
        city: Ho Chi Minh
        country: Vietnam
        status: 2
        theaters:
        -   code: GALAXYNT-T000001
            name: Theater One
        -   code: GALAXYNT-T000002
            name: Theater Two
        -   code: GALAXYNT-T000003
            name: Theater Three

    -   name: Lotte Cong Hoa
        group: Lotte
        prefix: LOTTCONG
        district: Tân Bình
        city: Ho Chi Minh
        country: Vietnam
        status: 2
        theaters:
        -   code: LOTTCONG-T000001
            name: Theater One
        -   code: LOTTCONG-T000002
            name: Theater Two
        -   code: LOTTCONG-T000003
            name: Theater Three

    schedules:
    -   movie_code: 10.5240/0067-DEFB-A9F6-DD23-70DA-1
        theater_code: GALAXYTB-T000001
        start_at: 2016-01-01 09:00:00 UTC
        end_at: 2016-01-01 10:30:00 UTC
        status: 2

    -   movie_code: 10.5240/0067-DEFB-A9F6-DD23-70DA-2
        theater_code: GALAXYTB-T000002
        start_at: 2016-01-01 09:00:00 UTC
        end_at: 2016-01-01 10:30:00 UTC
        status: 2

    -   movie_code: 10.5240/0067-DEFB-A9F6-DD23-70DA-3
        theater_code: GALAXYTB-T000003
        start_at: 2016-01-01 09:00:00 UTC
        end_at: 2016-01-01 10:30:00 UTC
        status: 2

    # The same movies are not published at Galaxy Nguyen Trai (GalaxyNT)
    -   movie_code: 10.5240/0067-DEFB-A9F6-DD23-70DA-1
        theater_code: GALAXYNT-T000001
        start_at: 2016-01-01 09:00:00 UTC
        end_at: 2016-01-01 10:30:00 UTC
        status: 1

    -   movie_code: 10.5240/0067-DEFB-A9F6-DD23-70DA-2
        theater_code: GALAXYNT-T000002
        start_at: 2016-01-01 09:00:00 UTC
        end_at: 2016-01-01 10:30:00 UTC
        status: 1

    -   movie_code: 10.5240/0067-DEFB-A9F6-DD23-70DA-3
        theater_code: GALAXYNT-T000003
        start_at: 2016-01-01 09:00:00 UTC
        end_at: 2016-01-01 10:30:00 UTC
        status: 1
    """

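    # Assumption: the listing never shows how `dataset` is built. Here the YAML
    # docstring above is parsed with PyYAML (needs `import yaml` at module level).
    dataset = yaml.safe_load(dataset_saigon_weekend.__doc__)

    from tests.fixtures.simplefactories import (CinemaFactory,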
                                                TheaterFactory,
                                                MovieFactory,
                                                ScheduleFactory
                                                )
    for cinema in dataset['cinemas']:
        inserted = CinemaFactory(**{key:value for key, value in cinema.items() if key != 'theaters'})

        for theater in cinema['theaters']:
            theater['cinema'] = inserted
            TheaterFactory(**theater)

    for movie in dataset['movies']:
        MovieFactory(**movie)

    for schedule in dataset['schedules']:
        schedule['start_at'] = datetime.strptime(schedule['start_at'], DATETIME_FORMAT)
        schedule['end_at'] = datetime.strptime(schedule['end_at'], DATETIME_FORMAT)
        ScheduleFactory(**schedule)

Imagine how you would achieve the same goals otherwise. Keep in mind that with this setup we do not need to add to the ~34 LOC as we extend the dataset with more, and more varied, data.
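The loading loop parses the start_at and end_at strings with datetime.strptime. Here is a minimal sketch of that step, assuming a format string that matches the timestamps in the dataset and using two cut-down schedule entries. It also shows why a controlled dataset pays off: the expected result of a filter is known before the query runs.

```python
from datetime import datetime

# Assumed format; it matches strings such as "2016-01-01 09:00:00 UTC".
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S UTC"

# Two entries in the shape of the dataset's schedules (trimmed for brevity).
schedules = [
    {"theater_code": "GALAXYTB-T000001",
     "start_at": "2016-01-01 09:00:00 UTC", "status": 2},
    {"theater_code": "GALAXYNT-T000001",
     "start_at": "2016-01-01 09:00:00 UTC", "status": 1},
]

for schedule in schedules:
    # Turn the YAML string into a (naive) datetime object.
    schedule["start_at"] = datetime.strptime(schedule["start_at"], DATETIME_FORMAT)

# Status 2 is the value the dataset uses for the approved schedules.
approved = [s for s in schedules if s["status"] == 2]
print([s["theater_code"] for s in approved])
```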

Conclusion

Since I started with "Separation of Concerns", let me recap likewise: I have observed a few concerns while doing unit testing:

  • Manageability concern
  • Controllability concern
  • Scalability concern
  • Readability concern

One must treat these as mutually orthogonal vectors. Unit testing is a must; the only question is how to keep our own sanity while maintaining the test cases. Keep the concerns separate, so that a change along one vector does not mess with the others.

My special thanks to:

  • Holger Krekel (@hpk42) and pytest-dev team
  • Raphaël Barrois, Mark Sandstrom for Factory Boy
  • Kirill Simonov for PyYAML (249kB of awesomeness)
  • Guido van Rossum for the snake, I mean Python <3

This blog article is a part of an upcoming series: Building thebox: A cinema ticket booking system


  1. For the sake of giving real world object names while maintaining readers' sanity, let's say cinema is the house and

  2. ... theaters are the rooms inside

  3. http://www.yaml.org/spec/1.2/spec.html