Python Bytes is a weekly podcast hosted by Michael Kennedy and Brian Okken. The show is a short discussion on the headlines and noteworthy news in the Python, developer, and data science space.
#434 Most of OpenAI’s tech stack runs on Python
- Making PyPI’s test suite 81% faster
- People aren’t talking enough about how most of OpenAI’s tech stack runs on Python
- PyCon Talks on YouTube
- Optimizing Python Import Performance
- Extras
- Joke
About the show
Sponsored by Digital Ocean: pythonbytes.fm/digitalocean-gen-ai Use code DO4BYTES and get $200 in free credit
Connect with the hosts
- Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky)
- Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
- Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky)
Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.
Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list. We'll never share it.
Brian #1: Making PyPI’s test suite 81% faster
- Alexis Challande
- The PyPI backend is a project called Warehouse
- It’s tested with pytest, and it’s a large project, thousands of tests.
- Steps for speedup
- Parallelizing test execution with pytest-xdist
- 67% time reduction
- --numprocesses=auto allows for using all cores
- DB isolation - cool example of how to configure Postgres to give each test worker its own db
- They used pytest-sugar to help with visualization, as xdist defaults to quite terse output
- Use Python 3.12’s sys.monitoring to speed up coverage instrumentation
- 53% time reduction
- Nice example of using COVERAGE_CORE=sysmon
- Optimize test discovery
- Always use testpaths
- Sped up collection time. 66% reduction (collection was 10% of time)
- Not a huge savings, but it’s 1 line of config
- Eliminate unnecessary imports
- Use python -X importtime
- Examine dependencies not used in testing.
- Their example: ddtrace
- A tool they use in production, but it also has a couple pytest plugins included
- Those plugins caused ddtrace to get imported
- Using -p no:ddtrace turns off the plugin bits
- Notes from Brian:
- I often get questions about whether pytest is useful for large projects.
- Short answer: Yes!
- Longer answer: But you’ll probably want to speed it up
- I need to extend this article with a general purpose “speeding up pytest” post or series.
- -p no:<plugin name> can also be used to turn off any plugin, even builtin ones.
- Examples include
- nice to have developer focused pytest plugins that may not be necessary in CI
- CI reporting plugins that aren’t needed by devs running tests locally
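A minimal sketch of the per-worker DB isolation idea mentioned above. This is not Warehouse's actual configuration: the helper name worker_database_name and the database name "warehouse_test" are hypothetical. It relies only on the PYTEST_XDIST_WORKER environment variable, which pytest-xdist sets in each worker process (for example "gw0").

```python
import os
from typing import Optional


def worker_database_name(base_name: str, worker_id: Optional[str] = None) -> str:
    """Return a database name unique to the current pytest-xdist worker.

    Hypothetical helper for illustration; not taken from Warehouse.
    """
    if worker_id is None:
        # pytest-xdist sets PYTEST_XDIST_WORKER (e.g. "gw0") in each worker.
        # Fall back to "master", the value xdist's worker_id fixture reports
        # when tests are not being distributed.
        worker_id = os.environ.get("PYTEST_XDIST_WORKER", "master")
    if worker_id == "master":
        # Not running under xdist: a single shared database is fine.
        return base_name
    # One database per worker, so parallel tests never touch the same tables.
    return f"{base_name}_{worker_id}"
```

In a conftest.py, a session-scoped fixture could pass pytest-xdist's worker_id fixture value to this helper and create that database before any test runs.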
Michael #2: People aren’t talking enough about how most of OpenAI’s tech stack runs on Python
- Original article: Building, launching, and scaling ChatGPT Images
- Tech stack: The technology choices behind the product are surprisingly simple; dare I say, pragmatic!
- Python: most of the product’s code is written in this language.
- FastAPI: the Python framework used for building APIs quickly, using standard Python type hints. As the name suggests, FastAPI’s strength is that it takes less effort to create functional, production-ready APIs to be consumed by other services.
- C: for parts of the code that need to be highly optimized, the team uses the lower-level C programming language
- Temporal: used for asynchronous workflows and operations inside OpenAI. Temporal is a neat workflow solution that makes multi-step workflows reliable even when individual steps crash, without much effort by developers. It’s particularly useful for longer-running workflows like image generation at scale
Michael #3: PyCon Talks on YouTube
- Some talks that jumped out to me:
- Keynote by Cory Doctorow
- 503 days working full-time on FOSS: lessons learned
- Going From Notebooks to Scalable Systems
- And my Talk Python conversation around it. (edited episode pending)
- Unlearning SQL
- The Most Bizarre Software Bugs in History
- The PyArrow revolution in Pandas
- And my Talk Python episode about it.
- What they don't tell you about building a JIT compiler for CPython
- And my Talk Python conversation around it (edited episode pending)
- Design Pressure: The Invisible Hand That Shapes Your Code
- Marimo: A Notebook that "Compiles" Python for Reproducibility and Reusability
- And my Talk Python episode about it.
- GPU Programming in Pure Python
- And my Talk Python conversation around it (edited episode pending)
- Scaling the Mountain: A Framework for Tackling Large-Scale Tech Debt
Brian #4: Optimizing Python Import Performance
- Mostly pay attention to #'s 1-3
- This is related to speeding up a test suite, speeding up necessary imports.
- Finding what’s slow
- Use python -X importtime <the rest of the command>
- Ex: python -X importtime -m pytest
- Techniques
- Lazy imports
- move slow-to-import imports into functions/methods
- Avoiding circular imports
- hopefully you’re doing that already
- Optimize __init__.py files
- Avoid unnecessary imports, heavy computations, complex logic
- Notes from Brian
- Some questions remain open for me
- Does module aliasing really help much?
- This applies to testing in a big way
- Test collection imports your test suite, so anything imported at the top level of a file gets imported at test collection time, even if you are only running a subset of tests using filtering like -k or -m or other filter methods.
- Run -X importtime on test collection.
- Move slow imports into fixtures, so they get imported when needed, but NOT at collection.
- See also:
- option -X in the standard docs
- Consider using import_profile
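For the "finding what's slow" step, the -X importtime output can be long, so here is a rough sketch of ranking it. The function and class names are made up for illustration. It assumes the stderr format CPython prints for this option: "import time: <self us> | <cumulative us> | <module>".

```python
import re
import subprocess
import sys
from typing import List, NamedTuple, Sequence


class ImportTiming(NamedTuple):
    module: str
    self_us: int
    cumulative_us: int


# Matches data lines such as "import time:       271 |        271 |   zipimport".
# The header line has no numbers in those columns, so it does not match.
_LINE = re.compile(r"^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(\S+)\s*$")


def parse_importtime(stderr_text: str) -> List[ImportTiming]:
    """Turn raw -X importtime output into a list of timing records."""
    timings = []
    for line in stderr_text.splitlines():
        match = _LINE.match(line)
        if match:
            self_us, cumulative_us, module = match.groups()
            timings.append(ImportTiming(module, int(self_us), int(cumulative_us)))
    return timings


def slowest_imports(stderr_text: str, limit: int = 5) -> List[ImportTiming]:
    """Return the `limit` modules with the highest cumulative import time."""
    ranked = sorted(
        parse_importtime(stderr_text),
        key=lambda timing: timing.cumulative_us,
        reverse=True,
    )
    return ranked[:limit]


def profile_command(args: Sequence[str]) -> str:
    """Run `python -X importtime <args>` and return what it wrote to stderr."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", *args],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stderr
```

To look at test collection specifically, something like slowest_imports(profile_command(["-m", "pytest", "--collect-only", "-q"])) should surface the imports worth moving into functions or fixtures. Note that stderr may also contain other output, which the parser ignores.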
Extras
Brian:
- PEPs & Co.
- PEP is a “backronym”, an acronym where the words it stands for are filled in after the acronym is chosen. Barry Warsaw made this one up.
- There are a lot of “enhancement proposal” and “improvement proposal” acronyms now from other communities
- pythontest.com has a new theme
- More colorful. Neat search feature
- Now it’s excruciatingly obvious that I haven’t blogged regularly in a while
- I gotta get on that
- Code highlighting might need tweaking for dark mode
Michael:
Joke: There is hope.