Python Bytes is a weekly podcast hosted by Michael Kennedy and Brian Okken. The show is a short discussion on the headlines and noteworthy news in the Python, developer, and data science space.

#288 Performance benchmarks for Python 3.11 are amazing

June 14, 2022 00:33:05 27.91 MB

Watch the live stream:

Watch on YouTube

About the show

Sponsored by us! Support our work through:

Brian #1: Polars: Lightning-fast DataFrame library for Rust and Python

  • Suggested by several listeners
  • “Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as memory model.
    • Lazy | eager execution
    • Multi-threaded
    • SIMD (Single Instruction/Multiple Data)
    • Query optimization
    • Powerful expression API
    • Rust | Python | ...”
  • The Python API syntax is set up to allow parallel execution while sidestepping GIL issues, for both lazy and eager use cases. From the docs: “Do not kill parallelization”
  • The syntax is very functional and pipeline-esque:

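    import polars as pl

    # Build a lazy query; nothing is read until collect() is called.
    q = (
        pl.scan_csv("iris.csv")
        .filter(pl.col("sepal_length") > 5)
        # groupby is the spelling used at the time; newer Polars releases name it group_by
        .groupby("species")
        .agg(pl.all().sum())
    )
    df = q.collect()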
    
  • Polars User Guide is excellent and looks like it’s entirely written with Python examples.

  • Includes a 30 min intro video from PyData Global 2021

Michael #2: PSF Survey is out

  • Have a look; their page summarizes it better than my bullet points will.

Brian #3: Gin Config: a lightweight configuration framework for Python

  • Found through Vincent D. Warmerdam’s excellent intro videos on gin on calmcode.io
  • Quickly make parts of your code configurable through a configuration file with the @gin.configurable decorator.
  • It’s an interesting take on config files. (Example from Vincent)

        # simulate.py
        import gin

        @gin.configurable
        def simulate(n_samples):
          ...

        # config.gin (Gin binding syntax, not Python)
        simulate.n_samples = 100
    
  • You can specify:

    • required settings: def simulate(n_samples=gin.REQUIRED)
    • blacklisted settings: @gin.configurable(blacklist=["n_samples"])
    • external configurations (specify values to functions your code is calling)
    • references to other functions: dnn.activation_fn = @tf.nn.tanh
  • Documentation suggests that it is especially useful for machine learning.
  • From motivation section:
    • “Modern ML experiments require configuring a dizzying array of hyperparameters, ranging from small details like learning rates or thresholds all the way to parameters affecting the model architecture.
    • Many choices for representing such configuration (proto buffers, tf.HParams, ParameterContainer, ConfigDict) require that model and experiment parameters are duplicated: at least once in the code where they are defined and used, and again when declaring the set of configurable hyperparameters.
    • Gin provides a lightweight dependency injection driven approach to configuring experiments in a reliable and transparent fashion. It allows functions or classes to be annotated as @gin.configurable, which enables setting their parameters via a simple config file using a clear and powerful syntax. This approach reduces configuration maintenance, while making experiment configuration transparent and easily repeatable.”
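
To make the dependency injection idea concrete, here is a toy sketch using only the standard library. It is not how Gin is implemented and it does not use Gin's API: the names `configurable`, `parse_config`, and `_BINDINGS` are made up for illustration, and the parser only understands simple `name.param = literal` lines.

```python
import ast
import functools
import inspect

# Hypothetical registry mapping "function_name.parameter" to a configured value.
_BINDINGS = {}


def parse_config(text):
    """Read lines such as `simulate.n_samples = 100` into the registry."""
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, value = line.split("=", 1)
        _BINDINGS[key.strip()] = ast.literal_eval(value.strip())


def configurable(func):
    """Inject configured values for any parameter the caller left out."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind_partial(*args, **kwargs)
        for name in signature.parameters:
            key = f"{func.__name__}.{name}"
            # Explicit arguments win; the registry only fills the gaps.
            if name not in bound.arguments and key in _BINDINGS:
                kwargs[name] = _BINDINGS[key]
        return func(*args, **kwargs)

    return wrapper


@configurable
def simulate(n_samples, seed=0):
    return {"n_samples": n_samples, "seed": seed}


parse_config("simulate.n_samples = 100")
result = simulate()  # n_samples comes from the config text, not the call site
print(result)
```

The point of the sketch is that the parameter is declared once, in the function signature, and the config only names it. That is the duplication the motivation section says Gin avoids.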

Michael #4: Performance benchmarks for Python 3.11 are amazing

  • via Eduardo Orochena
  • Performance may be the biggest feature of all
  • Python 3.11 has
    • task groups in asyncio
    • fine-grained error locations in tracebacks
    • the Self type, for annotating methods that return an instance of their class
  • The "Faster CPython Project" aims to speed up the reference implementation.
    • See my interview with Guido and Mark: talkpython.fm/339
    • Python 3.11 is 10–60% faster than Python 3.10 according to the official figures
    • And a 1.22x speed-up with their standard benchmark suite.
  • Not arriving as stable until October
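
If you want to see what the speed-up looks like on your own machine, one rough approach is to run the same small timing script under both interpreters and compare the numbers. The sketch below is a microbenchmark on a made-up workload (`sum_of_squares` is hypothetical), not the benchmark suite behind the official figures, so expect the ratio to vary with the code being timed.

```python
import platform
import timeit


def sum_of_squares(limit):
    """A small pure-Python loop, the kind of code the interpreter work targets."""
    total = 0
    for value in range(limit):
        total += value * value
    return total


def best_time(repeats=5, calls=50, limit=1_000):
    """Return the best observed seconds per call across several repeats."""
    timings = timeit.repeat(
        lambda: sum_of_squares(limit), repeat=repeats, number=calls
    )
    # The minimum is the least noisy estimate of the achievable time.
    return min(timings) / calls


seconds_per_call = best_time()
print(f"Python {platform.python_version()}: {seconds_per_call * 1e6:.1f} µs per call")
```

Run it once with each Python version you have installed and divide the two results to get your own speed-up factor for this particular loop.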

Extras

Michael:

Joke: Why wouldn't you choose a parrot for your next application?