Python Bytes is a weekly podcast hosted by Michael Kennedy and Brian Okken. The show is a short discussion on the headlines and noteworthy news in the Python, developer, and data science space.

Similar Podcasts

24H24L
A 24-hour online event consisting of 24 audio broadcasts on a variety of GNU/Linux topics. These are the event's audio recordings in podcast format.

The Infinite Monkey Cage
Brian Cox and Robin Ince host a witty, irreverent look at the world through scientists' eyes.

Talking Kotlin
A bimonthly podcast that covers the Kotlin programming language by JetBrains, as well as related technologies. Hosted by Hadi Hariri.

#285 Where we talk about UIs and Python

May 25, 2022 00:50:54 42.93 MB Downloads: 0

Watch the live stream: Watch on YouTube

About the show
Sponsored: RedHat: Compiler Podcast
Special guests: Mark Little, Ben Cosby

Michael #1: libgravatar
A library that provides a Python 3 interface to the Gravatar APIs.
If you have users and want to show some sort of an image, Gravatar is OK.
PyPI uses this, for example (Gravatar, not necessarily this lib).
Usage:
    >>> g = Gravatar('myemailaddress@example.com')
    >>> g.get_image()
    'https://www.gravatar.com/avatar/0bc83cb571cd1c50ba6f3e8a78ef1346'

Brian #2: JSON to Pydantic Converter
Suggested by Chun Ly: “this awesome JSON to @samuel_colvin's pydantic is so useful. It literally saved me days of work with a complex nested JSON schema.”
“JSON to Pydantic is a tool that lets you convert JSON objects into Pydantic models.”
It’s a live site where you can plop JSON on the left, and Pydantic models show up on the right.
There are a couple of options:
  Specify every field as Optional
  Alias camelCase fields as snake_case
It’s also an open source project, built with FastAPI, Create React App, and a project called datamodel-code-generator.

Mark #3: tailwindcss, tailwindui
Not Python, but helpful for web UI and an example of an open source business model.
tailwindcss generates CSS.
Used on the Lexchart app.
Benefits of tailwindcss and tailwindui:
  Just-in-Time makes it fast. Output includes only the classes used for the project.
  Stand on the shoulders of design thinking from Steve Schoger and Adam Wathan. See also refactoringui.com.
  Use in current projects without CSS conflicts. Custom namespace with prefix in tailwind.config.js.
  Bonus: custom namespace prefixes work with the tailwind plug-ins for VS Code and PyCharm.
  Works well with template engines like Chameleon.
We use tailwind for our app UI. Toolbar template example.
Another example of docs and tutorials being a strategic business asset.
Resources:
  tailwindcss.com
  tailwindlabs on YouTube, great tutorials from Simon at Tailwind
  Beginner friendly tutorials: Thirus, example of tailwind install methods

Michael #4: PEP 690 – Lazy Imports
From Itamar
Discussion at https://discuss.python.org/t/pep-690-lazy-imports/15474
The PEP proposes a feature to transparently defer the execution of imported modules until the moment when an imported object is used.
PEP 8 says imports go at the top, which means you pay the full price of importing code.
This means that importing the main module of a program typically results in an immediate cascade of imports of most or all of the modules that may ever be needed by the program.
Lazy imports also mostly eliminate the risk of import cycles or crashes.
The implementation in this PEP has already demonstrated startup time improvements up to 70% and memory-use reductions up to 40% on real-world Python CLIs.

Brian #5: Two small items
pytest-rich
  Suggested by Brian Skinn
  Created by Bruno Oliveira as a proof of concept
  pytest + rich, what’s not to love?
  Now we just need a maintainer or two or three….
Embedding images in GitHub README
  Suggested by Henrik Finsberg
  Video by Anthony Sottile
  This is WITHOUT putting the image in the repo.
  Upload or drop an image to an issue comment.
  Don’t save the comment, just wait for GitHub to upload it to their CDN.
  GH will add a markdown link in the comment text box with a link to the now uploaded image.
  Now you can use that image in a README file.
  You can do the same while editing the README in the online editor.

Ben #6: pyotp
A library for generating and verifying one-time passwords (OTP).
Helpful for implementing multi-factor authentication (MFA) in web applications.
Supports HMAC-based one-time passwords (HOTP) and time-based one-time passwords (TOTP).
While HOTP delivered via SMS text messages is a common approach to implementing MFA, SMS is not really secure.
TOTP using an authenticator app on the user’s device, such as Google Authenticator or Microsoft Authenticator, is more secure, fairly easy to implement, and free (no SMS messaging fees, and multiple free authenticator apps are available for users).
TOTP works best by making a QR code available to simplify the setup for the user in their authenticator app.
Lots of easy to implement QR code generators to choose from (qrcode is a popular one if you use JavaScript on the front end).
TOTP quick reference:
    import pyotp

    def generate_shared_secret():
        # securely store this shared secret with user account data
        return pyotp.random_base32()

    def generate_provisioning_uri(secret, email):
        # generate uri for a QR code from the user's shared secret and email address
        return pyotp.totp.TOTP(secret).provisioning_uri(name=email, issuer_name='YourApp')

    def verify_otp(secret, otp):
        # verify user's one-time password entry with their shared secret
        totp = pyotp.TOTP(secret)
        return totp.verify(otp)

Extras
Brian:
  PyConUS 2022 videos now up
  A few more Python related extensions for VSCode: black, pylint, isort, and Jupyter PowerToys
  Work has begun on a pytest course. Saying this in public to inspire me to finish it. No ETA yet.
  Sad Python Girls Club podcast
Michael:
  PyTorch M1
  Mission Encodable
  PWAs and pyscript
  Michael's now released pyscript PWA YouTube video
  cal.com (open source calendly)
  Supabase (open source Firebase)

Joke: Beginner problems
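
For anyone curious what happens underneath the pyotp item in this episode, here is a minimal standard-library sketch of HOTP (RFC 4226) and TOTP (RFC 6238). It is not pyotp's code; the function names `hotp` and `totp` and their parameters are chosen here for illustration, and it assumes SHA-1 with a 30-second step, the defaults most authenticator apps use.

```python
import base64
import hashlib
import hmac
import struct
import time


def hotp(secret_b32, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) for a base32 shared secret."""
    key = base64.b32decode(secret_b32, casefold=True)
    # The moving factor is the counter packed as an 8-byte big-endian integer.
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret_b32, for_time=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238): HOTP over the time-step count."""
    if for_time is None:
        for_time = time.time()
    return hotp(secret_b32, int(for_time) // step, digits)
```

A real deployment would probably still use a maintained library such as pyotp, which also handles provisioning URIs and comparing codes safely.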

#284 Spicy git for Engineers

May 18, 2022 00:41:12 34.73 MB Downloads: 0

Watch the live stream: Watch on YouTube

About the show
Sponsored by us! Support our work through:
  Our courses at Talk Python Training
  Test & Code Podcast
  Patreon Supporters

Brian #1: distinctipy
“distinctipy is a lightweight python package providing functions to generate colours that are visually distinct from one another.”
Small, focused tool, but really cool.
Say you need to plot a dynamic number of lines. Why not let distinctipy pick colors for you that will be distinct?
It can also display the color swatches.
Some example palettes here: https://github.com/alan-turing-institute/distinctipy/tree/main/examples
    from distinctipy import distinctipy

    # number of colours to generate
    N = 36

    # generate N visually distinct colours
    colors = distinctipy.get_colors(N)

    # display the colours
    distinctipy.color_swatch(colors)

Michael #2: Soda SQL
Soda SQL is a free, open-source command-line tool.
It utilizes user-defined input to prepare SQL queries that run tests on datasets in a data source to find invalid, missing, or unexpected data.
Looks good for data pipelines and other CI/CD work!

Daniel #3: Python in Nature
There’s a review article from Sept 2020 on array programming with NumPy in the research journal Nature.
For reference, in grad school we had a fancy paper on quantum entanglement that got rejected from Nature Communications, a sub-journal to Nature. Nature is hard to get into.
The list of authors includes Travis Oliphant, who started NumPy.
Covers NumPy as the foundation, building up to specialized libraries like QuTiP for quantum computing.
If you search “Python” on their site, many papers come up. Interesting to see their take on publishing software work.
Brian #4: Supercharging GitHub Actions with Job Summaries
From a tweet by Simon Willison and an article: GH Actions job summaries
Also, Ned Batchelder is using it for Coverage reports.
“You can now output and group custom Markdown content on the Actions run summary page.”
“Custom Markdown content can be used for a variety of creative purposes, such as:
  Aggregating and displaying test results
  Generating reports
  Custom output independent of logs”
Coverage.py example:
    - name: "Create summary"
      run: |
        echo '### Total coverage: ${{ env.total }}%' >> $GITHUB_STEP_SUMMARY
        echo '[${{ env.url }}](${{ env.url }})' >> $GITHUB_STEP_SUMMARY

Michael #5: Language Summit write-up is out
via Itamar, by Alex Waygood
  Python without the GIL: A talk by Sam Gross
  Reaching a per-interpreter GIL: A talk by Eric Snow
  The "Faster CPython" project: 3.12 and beyond: A talk by Mark Shannon
  WebAssembly: Python in the browser and beyond: A talk by Christian Heimes
  F-strings in the grammar: A talk by Pablo Galindo Salgado
  Cinder Async Optimisations: A talk by Itamar Ostricher
  The issue and PR backlog: A talk by Irit Katriel
  The path forward for immortal objects: A talk by Eddie Elizondo and Eric Snow
  Lightning talks, featuring short presentations by Carl Meyer, Thomas Wouters, Kevin Modzelewski, Samuel Colvin and Larry Hastings

Daniel #6: AllSpice is Git for EEs
Software engineers have Git/SVN/Mercurial/etc.
None of the other engineering disciplines (mechanical, electrical, optical, etc.) have it nearly as good.
Altium has their Vault and “365,” but there’s nothing with a Git-like UX.
Supports version history, diffs, all the things you expect. Even self-hosting and a Gov Cloud version.
“Bring your workflow to the 21st century, finally.”

Extras
Brian:
  Will McGugan talks about Rich, Textual, and Textualize on Test & Code 188
  Also 3 other episodes since last week. (I have a backlog I’m working through.)
Michael:
  Power On-Xbox Documentary | Full Movie
  The 4 Reasons To Branch with Git - Illustrated Examples with Python
  A Python spotting - via Jason Pecor
  2022 StackOverflow Developer Survey is live, via Brian
  TextSniper macOS App
  PandasTutor on webassembly
Daniel:
  I know Adafruit’s a household name, shout-out to Sparkfun, Seeed Studio, OpenMV, and other companies in the field.

Joke: A little awkward
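
The job summary item in this episode uses shell `echo` lines; the same idea can be sketched from a Python step. GitHub Actions exposes the summary file path in the `GITHUB_STEP_SUMMARY` environment variable, and appending Markdown to that file adds it to the run summary page. The helper names `append_job_summary` and `coverage_summary` are made up for this sketch, and the `env` parameter exists only to make it easy to try outside of Actions.

```python
import os


def coverage_summary(total, url):
    """Build the same two Markdown lines as the Coverage.py workflow example."""
    return f"### Total coverage: {total:g}%\n[{url}]({url})"


def append_job_summary(markdown, env=os.environ):
    """Append Markdown to the job summary file; return False when not in Actions."""
    path = env.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    # Append rather than overwrite, so earlier steps keep their summaries.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(markdown.rstrip("\n") + "\n")
    return True
```

Returning False instead of raising when the variable is missing keeps the same script usable on a laptop.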

#283 The sports episode

May 12, 2022 00:32:58 27.88 MB Downloads: 0

Watch the live stream: Watch on YouTube

About the show
Sponsored: RedHat: Compiler Podcast
Special guest: Tonya Sims

Michael #1: Pathy: a Path interface for local and cloud bucket storage
via Spencer
Pathy is a Python package (with type annotations) for working with Cloud Bucket storage providers using a pathlib interface.
It provides an easy-to-use API bundled with a CLI app for basic file operations between local files and remote buckets.
It enables a smooth developer experience by letting developers work against the local file system during development and only switch over to live APIs for deployment.
Also has optional local file caching.
From Spenser: The really cool function is "Pathy.fluid", which can take any type of local, GCS, or S3 path string and then just give you back a Path object that you can interact with, agnostic of what platform it was. So this has worked amazingly for me in local testing, since I can just change the file path from the "s3://bucket/path" that I use in prod to a local "test_dir/path" and it works automatically.

Brian #2: Robyn
“Robyn is a fast, high-performance Python web framework with a Rust runtime.”
Hello, Robyn! - intro article
docs, repo
Neat things:
  doesn’t need WSGI or ASGI
  async
  very Flask-like
Early, so still needs some TLC: docs, etc. Getting started and demo apps would be good.

Tonya #3: Python package 'nba_api' is a package to access data for NBA.com
This package is maintained by Swar Patel.
API client package for NBA.com, with more accessible endpoints and better documentation.
The NBA.com APIs are not well documented and change frequently (player traded, injured, retired, points per game, stats, etc.).
The nba_api package has tons of features:
The nba_api starts with static data on players and teams (full name, team name, etc.). Each player and team has an id.
Can get game data from the playergamelog API endpoint.
The package also has many different API endpoints that it can hit by passing in features from the static data to the API endpoints as parameters.

Michael #4: Termshot
From Jay Miller
Creates screenshots based on terminal command output.
Just run termshot YOUR_CMD or termshot --show-cmd -- python program.py
Even termshot /bin/zsh for full interactive “recording”.
Example I made:

Brian #5: When Python can’t thread: a deep-dive into the GIL’s impact
Itamar Turner-Trauring
Building a mental model of the GIL using profiler graphs of simple two-thread applications.
The graphs really help a lot to see when the CPU is active or waiting on each thread.

Tonya #6: Sportsipy: A free sports API written for Python
Free Python API that pulls the stats from www.sports-reference.com
sports-reference.com: great website for getting sports stats for professional sports (NBA, NFL, NHL, MLB, college sports)
Looks like an HTML website from the 90s, which is great for scraping (email site owners).
You can get API queries for every sport (North American sports), like the list of teams for that sport, the date and time of a game, the total number of wins for a team during the season, and many more team-related metrics.
You can also get stats from players and box scores, so you can build cool stuff around how a team performed during a game or during a season.

Extras
Michael:
  Python 3.11.0 beta 1 is out
  Test with GitHub Actions against Python 3.11

Joke: Finding my family
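
To get a feel for the GIL item in this episode without a profiler, a rough sketch like the one below times the same CPU-bound work run back to back and then in two threads. It is not from the article; the helper names are invented here. On a standard CPython build with the GIL the two timings tend to come out close to each other, because only one thread executes Python bytecode at a time, but the numbers depend on the machine and interpreter build.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop: it holds the GIL while it runs."""
    while n > 0:
        n -= 1


def run_sequential(n, jobs=2):
    """Run the loop `jobs` times in the current thread and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, jobs=2):
    """Run the loop in `jobs` threads at once and return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential(5_000_000):.2f}s")
    print(f"threaded:   {run_threaded(5_000_000):.2f}s")
```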

#282 Don't Embarrass Me in Front of The Wizards

May 03, 2022 00:28:32 24.1 MB Downloads: 0

Watch the live stream: Watch on YouTube

About the show
Sponsored by us! Support our work through:
  Our courses at Talk Python Training
  Test & Code Podcast
  Patreon Supporters

Brian #1: pyscript
Python in the browser, from Anaconda. Repo here.
Announced at PyConUS.
“During a keynote speech at PyCon US 2022, Anaconda’s CEO Peter Wang unveiled quite a surprising project: PyScript. It is a JavaScript framework that allows users to create Python applications in the browser using a mix of Python and standard HTML. The project’s ultimate goal is to allow a much wider audience (for example, front-end developers) to benefit from the power of Python and its various libraries (statistical, ML/DL, etc.).”
From a nice article on it, PyScript - unleash the power of Python in your browser
PyScript is built on Pyodide, which is a port of CPython based on WebAssembly.
Demos are cool.
Note included in README: “This is an extremely experimental project, so expect things to break!”

Michael #2: Memray from Bloomberg
Memray is a memory profiler for Python.
It can track memory allocations in Python code, native extension modules, and the Python interpreter itself.
Works both via CLI and focused app calls.
Memray can help with the following problems:
  Analyze allocations in applications to help discover the cause of high memory usage.
  Find memory leaks.
  Find hotspots in code which cause a lot of allocations.
Notable features:
  🕵️‍♀️ Traces every function call so it can accurately represent the call stack, unlike sampling profilers.
  ℭ Also handles native calls in C/C++ libraries so the entire call stack is present in the results.
  🏎 Blazing fast! Profiling causes minimal slowdown in the application. Tracking native code is somewhat slower, but this can be enabled or disabled on demand.
  📈 It can generate various reports about the collected memory usage data, like flame graphs.
  🧵 Works with Python threads.
  👽🧵 Works with native-threads (e.g. C++ threads in native extensions)
Has a live view in the terminal.
Linux only.

Brian #3: pytest-parallel
I’ve often sped up tests that can be run in parallel by using -n from pytest-xdist.
I was recommending this to someone on Twitter, and Bruno Oliveira suggested a couple of alternatives. One was pytest-parallel, so I gave it a try.
pytest-xdist runs using multiprocessing.
pytest-parallel uses both multiprocessing and multithreading.
This is especially useful for test suites containing threadsafe tests. That is, mostly, pure software tests. Lots of unit tests are like this. System tests are often not.
Use the --workers flag for multiple processors; --workers auto works great.
Use --tests-per-worker for multi-threading; --tests-per-worker auto lets it pick.
Very cool alternative to xdist.

Michael #4: Pooch: A friend for data files
via Matthew Feickert
Just want to download a file without messing with requests and urllib?
Who is it for? Scientists/researchers/developers looking to simply download a file.
Pooch makes it easy to download a file (one function call). On top of that, it also comes with some bonus features:
  Download and cache your data files locally (so it’s only downloaded once).
  Make sure everyone running the code has the same version of the data files by verifying cryptographic hashes.
  Multiple download protocols HTTP/FTP/SFTP and basic authentication.
  Download from Digital Object Identifiers (DOIs) issued by repositories like figshare and Zenodo.
  Built-in utilities to unzip/decompress files upon download.
    # known_hash=None skips the hash check; pass a hash to verify the download
    file_path = pooch.retrieve(url, known_hash=None)

Extras
Michael:
  New course! Up and Running with Git - A Pragmatic, UI-based Introduction.

Joke: Don’t embarrass me in front of the wizards
Michael’s crashing github is embarrassing him in front of the wizards!
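
The "verifying cryptographic hashes" feature from the Pooch item in this episode boils down to an idea that is easy to sketch with the standard library. The code below is not Pooch's implementation; `sha256_of` and `verify_file` are hypothetical helpers, and SHA-256 is assumed as the hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large data files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, known_hash):
    """Return the path if its SHA-256 matches known_hash, else raise ValueError."""
    actual = sha256_of(path)
    if actual != known_hash.lower():
        raise ValueError(f"hash mismatch for {path}: expected {known_hash}, got {actual}")
    return Path(path)
```

Publishing the expected hash next to the download URL is what lets everyone running the code confirm they have the same version of the data.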

#281 ohmyzsh + ohmyposh + mcfly + pls + nerdfonts = wow

April 28, 2022 00:46:34 39.65 MB Downloads: 0

Watch the live stream: Watch on YouTube

About the show
Sponsored: RedHat: Compiler Podcast
Special guest: Anna Astori

Michael #1: Take Your Github Repository To The Next Level 🚀️
  Step 0. Make Your Project More Discoverable
  Step 1. Choose A Name That Sticks
  Step 2. Display A Beautiful Cover Image
  Step 3. Add Badges To Convey More Information
  Step 4. Write A Convincing Description
  Step 5. Record Visuals To Attract Users 👀
  Step 6. Create A Detailed Installation Guide (if needed)
  Step 7. Create A Practical Usage Guide 🏁
  Step 8. Answer Common Questions
  Step 9. Build A Supportive Community
  Step 10. Create Contribution Guidelines
  Step 11. Choose The Right License
  Step 12. Plan Your Future Roadmap
  Step 13. Create Github Releases (know release drafter)
  Step 14. Customize Your Social Media Preview
  Step 15. Launch A Website

Brian #2: Fastero
“Python timeit CLI for the 21st century.”
Arian Mollik Wasi, @wasi_master
Colorful and very usable benchmarking/comparison tool.
Time or compare one or more:
  code snippet
  Python file
  mix and match, even
Allows setup code before snippets run.
Multiple output export formats: markdown, html, csv, json, images, …
Lots of customization possible.
Takeaway: especially for comparing two+ options, this is super handy.

Anna #3: langid vs langdetect
langdetect
  This library is a direct port of Google's language-detection library from Java to Python.
  langdetect supports 55 languages out of the box (ISO 639-1 codes).
  Basic usage: detect() and detect_langs()
  Great to work with noisy data like social media and web blogs.
  Being statistical, works better on larger pieces of text vs short posts.
langid
  Hasn't been updated for a few years.
  97 languages.
  Can use Python's built-in wsgiref.simple_server (or fapws3 if available) to provide language identification as a web service. To do this, launch python langid.py -s, and access http://localhost:9008/detect . The web service supports GET, POST and PUT.
  The actual calculations are implemented in the log-probability space, but can also have a "confidence" score for the probability prediction between 0 and 1:
    > from langid.langid import LanguageIdentifier, model
    > identifier = LanguageIdentifier.from_modelstring(model, norm_probs=True)
    > identifier.classify("This is a test")
    ('en', 0.9999999909903544)
  Minimal dependencies.
  Relatively fast.
  NB algo, can train on user data.

Michael #4: Watchfiles
by Samuel Colvin (of Pydantic fame)
Simple, modern and high performance file watching and code reload in Python.
Underlying file system notifications are handled by the Notify Rust library.
Supports sync watching but also async watching.
CLI example: running and restarting a command.
Let's say you want to re-run failing tests whenever files change. You could do this with watchfiles using:
    watchfiles 'pytest --lf'

Brian #5: Slipcover: Near Zero-Overhead Python Code Coverage
From the coverage.py twitter account, which I’m pretty sure is Ned Batchelder.
Coverage numbers with “3% or less overhead”.
Early stages of the project.
It does seem pretty zippy though.
Mixed results when trying it out with a couple different projects:
  flask:
    just pytest: 2.70s
    with slipcover: 2.88s
    with coverage.py: 4.36s
  flask with xdist n=4:
    pytest: 2.11 s
    coverage: 2.60s
    slipcover: doesn’t run (seems to load pytest plugins)
Again, still worth looking at and watching.
It’s good to see some innovation in the coverage space aside from Ned’s work.

Anna #6: scrapy vs robox
scrapy
  shell to try out things: fetch url, view response object, response.text
  extract using css selectors or xpath
  lets you navigate between levels, e.g. the parent of an element with id X
  crawler to crawl websites and spider to extract data
  startproject for project structure and templates like settings and pipelines
  some advanced features like specifying user-agents etc. for large scale scraping.
  various options to export and store the data
  nice features like LinkExtractor to determine specific links to extract, already deduped.
  FormRequest class
robox
  layer on top of httpx and beautifulsoup4
  allows to interact with forms on pages: check, choose, submit

Extras
Michael:
  ohmyzsh + ohmyposh + mcfly + iterm2 + pls + nerdfonts = wow
  Watch the video we discussed here

Joke: Out for a byte
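
The langid note in this episode about working in log-probability space but reporting a confidence between 0 and 1 can be illustrated with a small sketch. This is not langid's code and the scores below are invented; it only shows one common way to turn per-language log scores into normalized probabilities (a softmax with the maximum subtracted for numerical stability).

```python
import math


def normalize_log_probs(log_probs):
    """Convert log scores to probabilities that sum to 1 without underflow."""
    peak = max(log_probs.values())
    # Subtracting the peak keeps exp() away from tiny values that round to 0.
    weights = {lang: math.exp(score - peak) for lang, score in log_probs.items()}
    total = sum(weights.values())
    return {lang: weight / total for lang, weight in weights.items()}


def classify(log_probs):
    """Return (best language, normalized confidence) for a dict of log scores."""
    probs = normalize_log_probs(log_probs)
    best = max(probs, key=probs.get)
    return best, probs[best]


if __name__ == "__main__":
    # Hypothetical scores for one text; real values would come from a model.
    print(classify({"en": -1000.0, "de": -1012.5, "fr": -1020.0}))
```

Without the subtraction, `math.exp(-1000.0)` would underflow to 0.0 and the division would fail, which is one reason the raw calculations stay in log space.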

#280 Easy terminal scripts by sourcing your Py

April 21, 2022 00:37:36 32.39 MB Downloads: 0

Watch the live stream: Watch on YouTube

About the show
Sponsored by Mergify!
Special guest: Pat Decker

Michael #0: New live stream / recording time: 12pm US PT on Tuesdays. Please subscribe to our YouTube channel to get notified and be part of the episodes.

Brian #1: BTW, don’t make a public repo private
How we lost 54k GitHub stars
Jakub Roztočil
HTTPie kinda sorta accidentally flipped their main repo to private for a sec, and dropped the star count from 54k to 0. Oops.
They’re back up to 16k, as of today. But ouch.
“HTTPie is a command-line HTTP client. Its goal is to make CLI interaction with web services as human-friendly as possible. HTTPie is designed for testing, debugging, and generally interacting with APIs & HTTP servers. The http & https commands allow for creating and sending arbitrary HTTP requests. They use simple and natural syntax and provide formatted and colorized output.”
Actually, pretty cool tool to use for developing and testing APIs.

Michael #2: The counter-intuitive rise of Python in scientific computing
via Galen Swint
In our laboratory, a polarizing debate has raged since around 2010, summarized by this question: Why are more and more time-critical scientific computations formerly performed in Fortran now written in Python, a slower language?
Python has the reputation of being slow, i.e. significantly slower than compiled languages such as Fortran, C or Rust.
So yes, plain Python is much slower than Fortran. However, this comparison makes little sense, as scientific uses of Python do not rely on plain Python.
Used the right way, Python is slightly slower than compiled code.
Pat #3: Meta donates $300,000 to PSF to add a second year for the Developer in Residence

Brian #4: Dashboards in Python
Two suggestions from Marc Skov Madsen
The Easiest Way to Create an Interactive Dashboard in Python
  Sophia Yang & Marc Skov Madsen
  Includes an animated gif showing the dashboard and a video of Sophia walking through the article in under 6 minutes.
  “Turn Pandas pipelines into a dashboard using hvPlot .interactive"
  hvPlot is part of HoloViz, and this example is pretty short and amazing to get a great dashboard with controls up very quickly.
Python Dashboarding Shootout and Showdown | PyData Global 2021
  5 speakers, 4 dashboard libraries, nice for comparison.
  Nice clickable index posted by Duy Nguyen:
    00:00 - Begin and Welcome
    03:15 - Intro to the 4 Dashboarding libraries
    07:04 - Plotly - Nicolas Kruchten
    22:01 - Panel - Marc Skov Madsen
    37:38 - voila - Sylvain Corlay
    51:36 - Streamlit - Adrien Treuille
    01:10:52 - Discussion Topics

Michael #5: sourcepy
by Dave Chevell
Sourcepy lets you source Python scripts natively inside your shell.
Imagine a Python script with functions in it. This converts those to CLI commands (kind of like entrypoints, but simpler).
Type hints can be used to coerce input values into their corresponding types.
Standard IO type hints can be used to target stdin at different arguments and to receive the sys.stdin.
Sourcepy has full support for asyncio syntax.

Pat #6: Xonsh
Xonsh Shell Combines the Best of Bash Shell and Python in Linux Terminal
Awesome demo video (50 min): https://youtu.be/x85LSyCxiw8

Extras
Pat:
  Donate to the PSF by using https://rewards.microsoft.com

Joke: Can you really quit vim?
Joke: Forgetting how to count
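
The sourcepy point in this episode about type hints coercing input values can be sketched in a few lines. This is not how Sourcepy itself is implemented; `call_with_cli_args` and `multiply` are hypothetical names, and the sketch only handles simple positional arguments whose annotations are callables such as `int` or `float`.

```python
import inspect
from typing import get_type_hints


def call_with_cli_args(func, argv):
    """Call func with string arguments coerced according to its type hints."""
    hints = get_type_hints(func)
    names = list(inspect.signature(func).parameters)
    if len(argv) > len(names):
        raise TypeError(f"{func.__name__} takes at most {len(names)} arguments")
    coerced = []
    for name, raw in zip(names, argv):
        # Parameters without an annotation are passed through as plain strings.
        convert = hints.get(name, str)
        coerced.append(convert(raw))
    return func(*coerced)


def multiply(x: int, y: float = 2.0) -> float:
    """Example function: shell input arrives as text, the hints say otherwise."""
    return x * y


if __name__ == "__main__":
    print(call_with_cli_args(multiply, ["3", "1.5"]))
```

One caveat with this naive approach: `bool("False")` is `True`, so booleans and container types would need dedicated parsing.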

#279 Autocorrect and other Git Tricks

April 15, 2022 00:41:52 37.18 MB Downloads: 0

Watch the live stream: Watch on YouTube

About the show
Sponsored by Datadog: pythonbytes.fm/datadog
Special guest: Brian Skinn (Twitter | Github)

Michael #1: OpenBB wants to be an open source challenger to Bloomberg Terminal
OpenBB Terminal provides a modern Python-based integrated environment for investment research that allows an average joe retail trader to leverage state-of-the-art Data Science and Machine Learning technologies.
As a modern Python-based environment, OpenBBTerminal opens access to numerous Python data libraries in:
  Data Science (Pandas, Numpy, Scipy, Jupyter)
  Machine Learning (Pytorch, Tensorflow, Sklearn, Flair)
  Data Acquisition (Beautiful Soup, and numerous third-party APIs)
They have a discord community too.
BTW, seems to be a successful open source project: OpenBB Raises $8.5M in Seed Round Funding Following Open Source Project Gamestonk Terminal's Success
Great graphics / gallery here.
Way more affordable than the $1,900/mo/user for the Bloomberg Terminal.

Brian #2: Python f-strings
https://fstring.help
  Florian Bruhin
  Quick overview of cool features of f-strings, made with Jupyter
Python f-strings Are More Powerful Than You Might Think
  Martin Heinz
  More verbose discussion of f-strings
Both are great to up your string formatting game.

Brian S. #3: pyproject.toml and PEP 621 Support in setuptools
PEP 621: “Storing project metadata in pyproject.toml”
  Authors: Brett Cannon, Dustin Ingram, Paul Ganssle, Pradyun Gedam, Sébastien Eustace, Thomas Kluyver, Tzu-ping Chung (Jun-Oct 2020)
  Covers build-tool-independent fields (name, version, description, readme, authors, etc.)
Various tools had already implemented pyproject.toml support, but not setuptools.
  Including: Flit, Hatch, PDM, Trampolim, and Whey (h/t: Scikit-HEP)
  Not Poetry yet, though it's under discussion
setuptools support had been discussed pretty extensively, and had been included on the PSF’s list of fundable packaging improvements.
Initial experimental implementation spearheaded by Anderson Bravalheri, recently completed.
  Seeking testing and bug reports from the community (Discuss thread)
  I tried it on one of my projects: it mostly worked, but revealed a bug that Anderson fixed super-quick (proper handling of a dynamic long_description, defined in setup.py)
Related tools (all early-stage/experimental AFAIK):
  ini2toml (Anderson Bravalheri): Can convert setup.cfg (which is in INI format) to pyproject.toml
    Mostly worked well for me, though I had to manually fix a couple things, most of which were due to limitations of the INI format
    INI has no list syntax!
  validate-pyproject (Anderson Bravalheri): Automated pyproject.toml checks
  pyproject-fmt (Bernát Gábor): Autoformatter for pyproject.toml
Don’t forget to use it with build, instead of via a python setup.py invocation!
    $ pip install build
    $ python -m build
Will also want to constrain your setuptools version in the build-system.requires key of pyproject.toml (you are using PEP517/518, right??)

Michael #4: JSON Web Tokens @ jwt.io
JSON Web Tokens are an open, industry standard RFC 7519 method for representing claims securely between two parties.
Basically a visualizer and debugger for JWTs:
  Enter an encoded token
  Select a signing algorithm
  See the payload data
  Verify the signature
List of libraries, grouped by language.

Brian #5: Autocorrect and other Git Tricks
Waylon Walker
Use `git config --global help.autocorrect 10` to have git automatically run the command you meant in 1 second. The `10` is 10 x 1/10 of a second. So `50` for 5 seconds, etc.
Automatically set upstream branch if it’s not there:
    git config --global push.default current
You may NOT want to do this if you are not careful with your branches. From https://stackoverflow.com/a/22933955
git commit -a: automatically “add” all changed and deleted files, but not untracked files. From https://git-scm.com/docs/git-commit#Documentation/git-commit.txt--a
Now most of my interactions with the git CLI, especially for quick changes, are:
    $ git checkout main
    $ git pull
    $ git checkout -b okken_something
    $ git commit -a -m 'quick message'
    $ git push
With these working, with autocorrect:
    $ git chkout main
    $ git pll
    $ git comit -a -m 'quick message'
    $ git psh

Brian S. #6: jupyter-tempvars
Jupyter notebooks are great, and the global namespace of the Python kernel backend makes it super easy to flow analysis from one cell to another.
BUT, that global namespace also makes it super easy to footgun, when variables leak into/out of a cell when you don’t want them to.
jupyter-tempvars notebook extension
  Built on top of the tempvars library, which defines a TempVars context manager for handling temporary variables.
  When you create a TempVars context manager, you provide it patterns for variable names to treat as temporary.
  In its simplest form, TempVars (1) clears matching variables from the namespace on entering the context, and then (2) clears them again upon exiting the context and restores their prior values, if any.
  TempVars works great, but it’s cumbersome and distracting to manually include it in every notebook cell where it’s needed.
  With jupyter-tempvars, you instead apply tags with a specific format to notebook cells, and the extension automatically wraps each cell’s code in a TempVars context before execution.
Javascript adapted from existing extensions:
  Patching CodeCell.execute, from the jupyter_contrib_nbextensions ‘Execution Dependencies’ extension, to enclose the cell code with the context manager
  Listening for the ‘kernel ready’ event, from [jupyter-black](https://github.com/drillan/jupyter-black/blob/d197945508a9d2879f2e2cc99cafe0cedf034cf2/kernel_exec_on_cell.js#L347-L350), to import the [TempVars](https://github.com/bskinn/jupyter-tempvars/blob/491babaca4f48c8d453ce4598ac12aa6c5323181/src/jupyter_tempvars/extension/jupyter_tempvars.js#L42-L46) context manager upon kernel (re)start
See the README (with animated GIFs!) for installation and usage instructions.
It’s on PyPI:
    $ pip install jupyter-tempvars
And, I made a shortcut install script for it:
    $ jupyter-tempvars install && jupyter-tempvars enable
Please try it out, find/report bugs, and suggest features!
Future work:
  Publish to conda-forge (definitely)
  Adapt to JupyterLab, VS Code, etc. (pending interest)

Extras
Brian:
  Ok. Python issues are now on GitHub. Seriously. See for yourself.
  Lorem Ipsum is more interesting than I realized.
  O RLY Cover Generator. Example:
Michael:
  New course: Secure APIs with FastAPI and the Microsoft Identity Platform
  Pyenv Virtualenv for Windows (Sorta'ish)
  Hipster Ipsum
Brian S.:
  PSF staff is expanding
    PSF hiring an Infrastructure Engineer. Link now 404s, perhaps they’ve made their hire?
    Last year’s hire of the Packaging Project Manager (Shamika Mohanan)
    Steering Council supports PSF hiring a second developer-in-residence
  PSF has chosen its new Executive Director: Deb Nicholson!
  PyOhio 2022 Call for Proposals is open
  Teaser tweet for performance improvements to pydantic

Jokes:
  https://twitter.com/CaNerdIan/status/1512628780212396036
  https://www.reddit.com/r/ProgrammerHumor/comments/tuh06y/i_guess_we_all_have_been_there/
  https://twitter.com/PR0GRAMMERHUM0R/status/1507613349625966599
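
What jwt.io shows for the JWT item in this episode (three dot-separated parts, a JSON payload, a signature check) can be reproduced with a short standard-library sketch. It assumes the HS256 algorithm only, the helper names are invented here, and it deliberately skips claim checks such as expiry, so a maintained JWT library remains the safer choice for real applications.

```python
import base64
import hashlib
import hmac
import json


def _b64url_encode(raw):
    """Base64url without padding, as used for each JWT segment."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    """Restore the stripped padding before decoding a JWT segment."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input, secret):
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def encode_hs256(payload, secret):
    """Build header.payload.signature for an HS256-signed token."""
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    ]
    signing_input = ".".join(segments)
    return f"{signing_input}.{_b64url_encode(_sign(signing_input, secret))}"


def decode_hs256(token, secret):
    """Verify the signature, then return the payload claims as a dict."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has exactly three dot-separated parts")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("signature mismatch")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    return json.loads(_b64url_decode(payload_b64))
```

Note that the payload is only encoded, not encrypted: anyone holding the token can read the claims, and the signature only proves they were not altered.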

#278 Multi-tenant Python applications

April 08, 2022 00:33:34 28.32 MB Downloads: 0

Watch the live stream: Watch on YouTube About the show Sponsored by: Microsoft for Startups Founders Hub. Special guest: Vuyisile Ndlovu Brian #1: dunk - a prettier git diff Darren Burns Uses Rich “⚠️ This project is very early stages” - whatever, I like it. Recommendation is to use less as a pager for it git diff | dunk | less -R Michael #2: Is your Python code vulnerable to log injection? via Adam Parkin Let’s just appreciate log4jmemes.com for a moment Ok, now we can talk about Python We can freak out the logging with line injection "hello'.\nINFO:__main__:user 'alice' commented: 'I like pineapple pizza" Results in two lines for one statement INFO:__main__:user 'bob' commented: 'hello'. INFO:__main__:user 'alice' commented: 'I like pineapple pizza'. The safest solution is to simply not log untrusted text. If you need to store it for an audit trail, use a database. Alternatively, structured logging can prevent newline-based attacks. Padding a ton? Another attack abuses padding syntax. Consider this message: *"%(user)999999999s"* This will pad the user with almost a gigabyte of whitespace. Mitigation: To eliminate these risks, you should always let logging handle string formatting. See this discussion: Safer logging methods for f-strings and new-style formatting Vuyisile #3: Building multi tenant applications with Django Free book by Agiliq, covers different approaches to building Software as a service applications in Python/Django. Covers four approaches to multi tenancy, namely: Shared database with shared schema Shared database with isolated schema Isolated database with a shared app server Completely isolated tenants using Docker Brian #4: Should you pre-allocate lists in Python?
Redowan Delowar Discussion of 3 ways to build up a list Start empty and append: l=[]; l.append(1); … Pre-allocate: l = [None] * 10_000; … List comprehension: l = [i for i in range(10_000)] Interesting discussion and results The times (filling the list with the index): append: 499 µs ± 1.23 µs pre-allocate: 321 µs ± 71.1 comprehension: 225 µs ± 711 Python lists dynamically allocate extra memory when they run out, and it’s pretty fast at doing this. Pre-allocation can save a little time. Conclusion: use comprehensions when you can, otherwise, don’t sweat it unless you really need to shave off as much time as possible Of note: this was just measuring time, no discussion of memory usage. Michael #5: mockaroo and tonic Do you need to generate fake data? Mockaroo lets you generate realistic data based on data types (car registrations, credit cards, dates, etc) Tonic takes your actual production data and reworks it into test data (possibly stripping out PII) Vuyisile #6: Brachiograph: the cheapest, simplest possible Python powered pen plotter by Daniele Procida Low tech Raspberry Pi project that can be built for < $50 using common household objects like a clothes peg and an ice cream stick Extras Brian: April 8 new date for Python Issues migrating to GH Michael: ngrok has a detailed web explorer Vuyisile: Thunder Client : VS Code extension, Lightweight client for testing REST APIs Postman alternative Joke: Linux world in tatters Related: Origin of the joke - Lapsus$ claims to leak 90% of Microsoft Bing's source code
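The log injection item in this episode (Michael #2) can be illustrated with a small standard-library sketch. It is not taken from the linked article: the logger name, the `make_logger` helper and the `sanitize` helper are hypothetical names chosen for illustration, and escaping CR/LF is only one possible mitigation (the notes above also mention structured logging or not logging untrusted text at all).

```python
import io
import logging


def make_logger(stream):
    """Return a demo logger that writes 'LEVEL:name:message' lines to `stream`."""
    logger = logging.getLogger("log_injection_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()      # drop handlers left over from a previous call
    logger.propagate = False     # keep demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    return logger


def sanitize(text):
    """Escape CR/LF so untrusted text cannot start a new log line."""
    return text.replace("\r", "\\r").replace("\n", "\\n")


# Untrusted input that smuggles in a fake second log record.
malicious = "hello'.\nINFO:__main__:user 'alice' commented: 'I like pineapple pizza"

# Unsafe: the raw text is logged, so one call produces two lines.
unsafe_stream = io.StringIO()
make_logger(unsafe_stream).info("user 'bob' commented: '%s'.", malicious)
unsafe_lines = unsafe_stream.getvalue().splitlines()

# Safer: newlines are escaped first, so the record stays on one line.
safe_stream = io.StringIO()
make_logger(safe_stream).info("user 'bob' commented: '%s'.", sanitize(malicious))
safe_lines = safe_stream.getvalue().splitlines()

print(len(unsafe_lines), len(safe_lines))
```

In both calls the values are passed as arguments rather than pre-formatted into the message, so `logging` handles the string formatting, which is the mitigation the notes recommend for the padding problem.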

#277 It's a Python package showdown!

April 02, 2022 00:45:01 37.94 MB Downloads: 0

Watch the live stream: Watch on YouTube About the show Sponsored by: Microsoft for Startups Founders Hub. Special guest: Thomas Gaigher, creator/maintainer pypyr taskrunner Michael #1: March Package Madness via Chris May Start with 16 packages They battle it out 2-on-2 in elimination rounds Voting is once a week So go vote! Brian #2: nbpreview “A terminal viewer for Jupyter notebooks. It’s like cat for ipynb files.” Some cool features pretty colors by default piping strips formatting, so you can pass it to grep or other post processing automatic paging syntax highlighting line numbers and wrapping work nicely markdown rendering images converted to block, character, or dots (braille) dataframe rendering clickable links Thomas #3: pyfakefs A fake file system! It intercepts all calls that involve the filesystem in Python - e.g open(), shutil, or pathlib.Path. This is completely transparent - your functional code does not know or need to know that under the hood it's been disconnected from the actual filesystem. The nice thing about this is that you don't have to go patching open using mock_open - which works fine, but gets annoying quickly for more complex test scenarios. E.g Doing a mkdir -p before a file write to ensure parent dirs exist. What it looks like without a fake filesystem: in_bytes = b"""[table] foo = "bar" # String """ # read with patch('pypyr.toml.open', mock_open(read_data=in_bytes)) as mocked_open: payload = toml.read_file('arb/path.in') # write with io.BytesIO() as out_bytes: with patch('pypyr.toml.open', mock_open()) as mock_output: mock_output.return_value.write.side_effect = out_bytes.write toml.write_file('arb/out.toml', payload) out_str = out_bytes.getvalue().decode() mock_output.assert_called_once_with('arb/out.toml', 'wb') assert out_str == """[table] foo = "bar" """ If you've ever tried to patch/mock out pathlib, you'll know the pain! 
Also, no more annoying test clean-up routines or tempfile - as soon as the fake filesystem goes out of scope, it's gone, no clean-up required. Not a flash in the pan - long history: originally developed by Mike Bland at Google back in 2006. Open sourced in 2011 on Google Code. Moved to Github and nowadays maintained by John McGehee. This has been especially useful for pypyr, because as a task-runner or automation tool pypyr deals with wrangling config files on disk a LOT (reading, generating, editing, token replacing, globs, different encodings), so this makes testing so much easier. Especially to keep on hitting the 100% test coverage bar! Works great with pytest with the provided fs fixture. Just add the fs fixture to a test, and all code under test will use the fake filesystem. Dynamically switch between Linux, MacOs & Windows filesystems. Set up paths/files in your fake filesystem as part of test setup with some neat helper functions. Very responsive maintainers - I had a PR merged in less than half a day. Shoutout to mrbean-bremen. 
Docs here: http://jmcgeheeiv.github.io/pyfakefs/release/ GitHub here: https://github.com/jmcgeheeiv/pyfakefs Real world example: @patch('pypyr.config.config.default_encoding', new='utf-16') def test_json_pass_with_encoding(fs): """Relative path to json should succeed with encoding.""" # arrange in_path = './tests/testfiles/test.json' fs.create_file(in_path, contents="""{ "key1": "value1", "key2": "value2", "key3": "value3" } """, encoding='utf-16') # act context = pypyr.parser.jsonfile.get_parsed_context([in_path]) # assert assert context == { "key1": "value1", "key2": "value2", "key3": "value3" } def test_json_parse_not_mapping_at_root(fs): """Not mapping at root level raises.""" # arrange in_path = './tests/testfiles/singleliteral.json' fs.create_file(in_path, contents='123') # act with pytest.raises(TypeError) as err_info: pypyr.parser.jsonfile.get_parsed_context([in_path]) # assert assert str(err_info.value) == ( "json input should describe an object at the top " "level. You should have something like\n" "{\n\"key1\":\"value1\",\n\"key2\":\"value2\"\n}\n" "at the json top-level, not an [array] or literal.") Michael #4: strenum A Python Enum that inherits from str. To complement enum.IntEnum in the standard library. Supports Python 3.6+. Example usage: class HttpMethod(StrEnum): GET = auto() HEAD = auto() POST = auto() PUT = auto() DELETE = auto() assert HttpMethod.GET == "GET" Use wherever you can use strings, basically: ## You can use StrEnum values just like strings: import urllib.request req = urllib.request.Request('https://www.python.org/', method=HttpMethod.HEAD) with urllib.request.urlopen(req) as response: html = response.read() Can auto-translate casing with LowercaseStrEnum and UppercaseStrEnum. Brian #5: Code Review Guidelines for Data Science Teams Tim Hopper Great guidelines for any team What is code review for?
correctness, familiarity, design feedback, mutual learning, regression protection NOT opportunities for reviewer to impose their idiosyncrasies dev to push correctness responsibility to reviewers demands for perfection Opening a PR informative commit messages consider change in context of project keep them short write a description that helps reviewer include tests with new code Reviewing Wait for CI before starting I would also add “wait at least 10 min or so, requester might be adding comments” Stay positive, constructive, helpful Clarify when a comment is minor or not essential for merging, preface with “nit:” for example If a PR is too large, ask for it to be broken into smaller ones What to look for does it look like it works is new code in the right place unnecessary complexity tests Thomas #6: Shell Power is so over. Leave the turtles in the late '80s. Partly inspired by/continuation of last week’s episode’s mention of running subprocesses from Python. Article by Itamar Turner-Trauring Please Stop Writing Shell Scripts https://pythonspeed.com/articles/shell-scripts/ Aims mostly at bash, but I'll happily include Bourne, zsh etc. under the same dictum If nothing else, solid listing of common pitfalls/gotchas with bash and their remedies, which is educational enough in and of itself already. TLDR; Error handling in shell is hard, but also surprising if you're not particularly steeped in the ways of the shell. Execution resumes after an error, unset vars don't raise errors, errors in pipes & subshells get thrown away If you really-really HAVE to shell, you prob want this boilerplate on top (aka unofficial bash strict mode): #!/bin/bash set -euo pipefail IFS=$'\n\t' This will, -e: fail immediately on error -u: fail on Unset vars -o pipefail: fail a pipeline if any command in it fails IFS: set Internal Field Separator to newline | tab, rather than space | newline | tab.
Prevents surprises when iterating over strings with spaces in them Itamar lists common counter-arguments from shell script die-hards: It's always there! But so is the runtime of whatever you're actually coding in, and in the case of a build CI server. . .almost by definition. Git gud! (I'm paraphrasing) Shell-check (linting for bash, basically) The article is short & sweet - mercifully so in these days of padded content. The rest is going to be me musing out loud, so don't blame the OG author. So expanding on this, I think there're a couple of things going on here: If anything, the author is going a bit soft on your average shell script. If you’re just calling a couple of commands in a row, okay, fine. But the moment you start worrying about retrying on failure, parsing some values into or out of some json, conditional branching - which, if you are writing any sort of automation script that interacts with other systems, you WILL be doing - shell scripts are an unproductive malarial nightmare. Much the same point applies to Makefile. It’s an amazing tool, but it’s also misused for things it was never really meant to do. You end up with Makefiles that call shell scripts that call Makefiles. . . Given that coding involves automating stuff, amazingly often the actual automation of the development process itself is deprioritized & unbudgeted. Sort of like the shoemaker's kid not having shoes. Partly because when management has to choose between shiny new features and automation, shiny new features win every time. Partly because techies will just "quickly" do a thing in shell to solve the immediate problem… Which then becomes part of the firmament like a dead dinosaur that fossilises and more and more inscrutable layers accrete on top of the original "simple" script. 
Partly because coders would rather get on with clever but marginal micro-optimisations and arguing over important stuff like spaces vs tabs, rather than do the drudge work of automating the development/deployment workflow. There's the glimmering of a point in there somewhere: when you have to choose between shiny new features & more backoffice automation, shiny new features probably win. Your competitiveness in the marketplace might well depend on this. BUT, we shouldn’t allow the false idea that shell scripts are "quicker" or "lighter touch" to sneak in there alongside the brutal commercial reality of trade-offs on available budget & time. If you have to automate quickly, it's more sensible to use a task-runner or just your actual programming language. If you're in python already, you're in luck, python's GREAT for this. Don’t confuse excellent cli programs like git , curl , awscli, sed or awk with a shell script. These are executables, you don’t need the shell to invoke these. Aside from these empirical factors, a couple of psychological factors also. Dealing with hairy shell scripts is almost a Technocratic rite of passage - coupled with imposter syndrome, it's easy to be intimidated by the Shell Bros who're steeped in the arcana of bash. It's the tech equivalent of "back in my day, we didn't even have <<>>", as if this is a justification for things being more difficult than they need to be ever thereafter. This isn't Elden Ring, the extra difficulty doesn't make it more fun. You're trying to get business critical work done, reliably & quickly, so you can get on with those new shiny features that actually pay the bills. 
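To make the "you don't need the shell to invoke these" point concrete, here is a minimal sketch of calling an executable directly from Python with errors surfaced as exceptions. It is not from Itamar's article: the `run` helper is a hypothetical name, and `sys.executable` only stands in for whichever CLI program (git, curl and so on) you would actually call.

```python
import subprocess
import sys


def run(args):
    """Run a command without a shell; raise CalledProcessError on a non-zero exit."""
    completed = subprocess.run(args, check=True, capture_output=True, text=True)
    return completed.stdout.strip()


# sys.executable is a stand-in for any CLI program, so the sketch runs anywhere.
greeting = run([sys.executable, "-c", "print('hello from a child process')"])

# A failing command raises instead of being silently ignored, which is the
# behaviour `set -e` tries to approximate in bash.
try:
    run([sys.executable, "-c", "import sys; sys.exit(3)"])
    failed = False
    exit_code = 0
except subprocess.CalledProcessError as err:
    failed = True
    exit_code = err.returncode

print(greeting, failed, exit_code)
```

Because the arguments are passed as a list and no shell is involved, values with spaces in them need no quoting tricks, and retries or branching can be written as ordinary Python around `run`.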
Extras Michael: A changing of the guard Firefox → Vivaldi (here’s a little more info on the state of Firefox/Mozilla financially) (threat team is particularly troubling) Google email/drive/etc → Zoho @gmail.com to @customdomain.com Google search → DuckDuckGo BTW Calendar apps/integrations and email clients are trouble Joke: A missed opportunity - and cybersecurity

#276 Tracking cyber intruders with Jupyter and Python

March 23, 2022 00:45:04 39.44 MB Downloads: 0

Watch the live stream: Watch on YouTube About the show Sponsored by FusionAuth: pythonbytes.fm/fusionauth Special guest: Ian Hellen Brian #1: gensim.parsing.preprocessing Problem I’m working on Turn a blog title into a possible url example: “Twisted and Testing Event Driven / Asynchronous Applications - Glyph” would like, perhaps: “twisted-testing-event-driven-asynchronous-applications” Sub-problem: remove stop words ← this is the hard part I started with an article called Removing Stop Words from Strings in Python It covered how to do this with NLTK, Gensim, and SpaCy I was most successful with remove_stopwords() from Gensim from gensim.parsing.preprocessing import remove_stopwords It’s part of a gensim.parsing.preprocessing package I wonder what all is in there? a treasure trove gensim.parsing.preprocessing.preprocess_string is one this function applies filters to a string, with the defaults almost being just what I want: strip_tags() strip_punctuation() strip_multiple_whitespaces() strip_numeric() remove_stopwords() strip_short() stem_text() ← I think I want everything except this this one turns “Twisted” into “Twist”, not good. There’s lots of other text processing goodies in there also. Oh, yeah, and Gensim is also cool. topic modeling for training semantic NLP models So, I think I found a really big hammer for my little problem. But I’m good with that Michael #2: DevDocs via Loic Thomson Gather and search a bunch of technology docs together at once For example: Python + Flask + JavaScript + Vue + CSS Has an offline mode for laptops / tablets Installs as a PWA (sadly not on Firefox) Ian #3: MSTICPy MSTICPy is a toolset for CyberSecurity investigations and hunting in Jupyter notebooks. What is CyberSec hunting/investigating? - responding to security alerts and threat intelligence reports, trawling through security logs from cloud services and hosts to determine if it’s a real threat or not. Why Jupyter notebooks?
SOC (Security Ops Center) tools can be excellent but all have limitations You can get data from anywhere Use custom analysis and visualizations Control the workflow…. workflow is repeatable Open source pkg - created originally to support MS Sentinel Notebooks but now supports lots of providers. When I started this 3+ yrs ago I thought a lot of this would be on PyPI - but no 😞 MSTICPy has 4 main functional areas: Data querying - import log data (Sentinel, Splunk, MS Defender, others…working on Elastic Search) Enrichment - is this IP Address or domain known to be malicious? Analysis - extract more info from data, identify anomalies (simple example - spike in logon failures) Visualization - more specialized than traditional graphs - timelines, process trees. All components use pandas, Bokeh for visualizations Current focus on usability, discovery of functionality and being able to chain Always looking for collaborators and contributors - code, docs, queries, critiques https://github.com/microsoft/msticpy https://msticpy.readthedocs.io/ Brian #4: The Right Way To Compare Floats in Python David Amos Definitely an easier read than the classic What Every Computer Scientist Should Know About Floating-Point Arithmetic What many of us remember floating point numbers aren’t exact due to representation limitations and rounding error, errors can accumulate comparison is tricky Be careful when comparing floating point numbers, even simple comparisons, like: >>> 0.1 + 0.2 == 0.3 False >>> 0.1 + 0.2 <= 0.3 False David has a short but nice introduction to the problems of representation and rounding. Three reasons for rounding more significant digits than floating point allows irrational numbers rational but non-terminating So how do you compare: math.isclose() be aware of rel_tol and abs_tol and when to use each.
numpy.allclose(), returns a boolean comparing two arrays numpy.isclose(), returns an array of booleans pytest.approx(), used a bit differently 0.1 + 0.2 == pytest.approx(0.3) Also allows rel and abs comparisons Discussion of Decimal and Fraction types And the memory and speed hit you take on when using them. Michael #5: Pypyr Task runner for automation pipelines For when your shell scripts get out of hand. Less tricky than makefile. Script sequential task workflow steps in yaml Conditional execution, loops, error handling & retries Have a look at the getting started. Ian #6: Pygments Python package that’s useful for anyone who wants to display code Jupyter notebook Markdown and GitHub markdown let you display code with syntax highlighting. (Jupyter uses Pygments behind the scenes to do this.) There are tools that convert code to image format (PNG, JPG, etc) but you lose the ability to copy/paste the code Pygments can intelligently render syntax-highlighted code to HTML (and other formats) Applications: Documentation (used by Sphinx/ReadtheDocs) - render code to HTML + CSS Displaying code snippets dynamically in readable form Lots (maybe 100s) of code lexers - Python (code, traceback), Bash, C, JS, CSS, HTML, also config and data formats like TOML, JSON, XML Easy to use - 3 lines of code - example: from IPython.display import display, HTML from pygments import highlight from pygments.lexers import PythonLexer from pygments.formatters import HtmlFormatter code = """ def print_hello(who="World"): message = f"Hello {who}" print(message) """ display(HTML( highlight(code, PythonLexer(), HtmlFormatter(full=True, nobackground=True)) )) # use HtmlFormatter(style="stata-dark", full=True, nobackground=True) # for dark themes Output to HTML, Latex, image formats. We use it in MSTICPy for displaying scripts used in attacks. 
Example: Extras Brian: smart-open one of the 3 Gensim dependencies It’s for streaming large files, from really anywhere, and looks just like Python’s open(). Michael: Python 3.10.3 is out. git fixup (follow up from last week, via Adam Parkin) Joke: What’s your secret?
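The float comparison item in this episode (Brian #4) can be sketched with the standard library alone. This is an illustrative sketch rather than code from David Amos's article; the variable names are made up, and the tolerance values are just examples of when `rel_tol` versus `abs_tol` matters.

```python
import math
from decimal import Decimal
from fractions import Fraction

total = 0.1 + 0.2

# Exact equality fails because of binary representation and rounding.
exact_equal = total == 0.3

# math.isclose() compares with a relative tolerance (default rel_tol=1e-09).
close = math.isclose(total, 0.3)

# Near zero a relative tolerance is not enough: anything times zero is zero,
# so an absolute tolerance is needed as well.
near_zero_rel_only = math.isclose(1e-12, 0.0)
near_zero_with_abs = math.isclose(1e-12, 0.0, abs_tol=1e-9)

# Decimal and Fraction represent these values exactly, at a memory and speed cost.
decimal_equal = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
fraction_equal = Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)

print(exact_equal, close, near_zero_rel_only, near_zero_with_abs,
      decimal_equal, fraction_equal)
```

A reasonable rule of thumb under these assumptions: use `rel_tol` for values away from zero, and add `abs_tol` whenever one side of the comparison can be zero or very close to it.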

#275 Airspeed velocity of an unladen astropy

March 16, 2022 00:42:43 36.89 MB Downloads: 0

Watch the live stream: Watch on YouTube About the show Sponsored by Microsoft for Startups Founders Hub. Special guest: Emily Morehouse-Valcarcel Michael #1: Async and await with subprocesses by Fredrik Averpil People know I do all sorts of stuff with async Lots of cool async things are not necessarily built into Python, but are instead third-party packages E.g. files via aiofiles But asyncio has asyncio.create_subprocess_exec Fredrik’s article has some nice examples I started using this for mp3 uploads and behind the scenes processing for us Brian #2: Typesplainer Arian Mollik Wasi, @wasi_master Suggested by Will McGugan Now released a VS Code extension for that! Available on VS Code as typesplainer Emily #3: Ibis Project via Marlene Mhangami “Productivity-centric Python data analysis framework for SQL engines and Hadoop” focused on: Type safety Expressiveness Composability Familiarity Marlene wrote an excellent blog post as an introduction Works with tons of different backends, either directly or via compilation Depending on the backend, it actually uses SQLAlchemy under the hood There’s a ton of options for interacting with a SQL database from Python, but Ibis has some interesting features geared towards performance and analyzing large sets of data. It’s a great tool for simple projects, but an excellent tool for anything data science related since it plays so nicely with things like pandas Michael #4: ASV via Will McGugan AirSpeed Velocity (asv) is a tool for benchmarking Python packages over their lifetime. Runtime, memory consumption and even custom-computed values may be tracked. See quickstart Example of astropy here. Finding a commit that produces a large regression Brian #5: perflint Anthony Shaw pylint extension for performance anti patterns curious why a pylint extension and not a flake8 plugin. I think the normal advice of “beware premature optimization” is good advice.
But also, having a linter show you some code habits you may have that just slow things down is a nice learning tool. Many of these items are also not going to be the big show stopper performance problems, but they add unnecessary performance hits. To use this, you also have to use pylint, and that can be a bit painful to start up with, as it’s pretty picky. Tried it on a tutorial project today, and it complained about any variable, or parameter under 3 characters. Seems a bit picky to me for tutorials, but probably good advice for production code. These are all configurable though, so you can dial back the strictness if necessary. perflint checks: W8101 : Unnecessary list() on already iterable type W8102: Incorrect iterator method for dictionary W8201: Loop invariant statement (loop-invariant-statement) ←- very cool W8202: Global name usage in a loop (loop-invariant-global-usage) R8203 : Try..except blocks have a significant overhead. Avoid using them inside a loop (loop-try-except-usage). W8204 : Looped slicing of bytes objects is inefficient. Use a memoryview() instead (memoryview-over-bytes) W8205 : Importing the "%s" name directly is more efficient in this loop. (dotted-import-in-loop) Emily #6: PEP 594 Acceptance “Removing dead batteries from the standard library” Written by Christian Heimes and Brett Cannon back in 2019, though the conversation goes back further than that It’s a very thin line for modules that might still be useful to someone versus the development effort needed to maintain them. Recently accepted, targeting Python 3.11 (final release planned for October 2022, development began in May 2021. See the full release schedule) Deprecations will begin in 3.11 and modules won’t be fully removed until 3.13 (~October 2024) See the full list of deprecated modules Bonus: new PEP site and theme!
Extras Brian: Michael: Emily: Riff off of one of Brian’s topics from last week: Automate your interactive rebases with fixups and auto-squashing Cool award that The PSF just received PSF Spring Fundraiser Cuttlesoft is hiring! Jokes: *Changing * (via Ruslan) Please hire me
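The async subprocess item in this episode (Michael #1) mentions asyncio.create_subprocess_exec. A minimal sketch of that call follows; it is not taken from Fredrik's article, the `run` and `main` helpers are hypothetical names, and `sys.executable` stands in for whatever external program you would really launch (an audio encoder, for example).

```python
import asyncio
import sys


async def run(*args):
    """Start a child process, wait for it, and return (exit code, stdout text)."""
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _stderr = await proc.communicate()
    return proc.returncode, stdout.decode().strip()


async def main():
    # Both children run concurrently; gather() returns results in call order.
    return await asyncio.gather(
        run(sys.executable, "-c", "print('first')"),
        run(sys.executable, "-c", "print('second')"),
    )


results = asyncio.run(main())
print(results)
```

While a child process is running, the event loop stays free to serve other coroutines, which is what makes this useful for behind-the-scenes processing in an async web app.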

#274 12 Questions You Should Be Asking of Your Dependencies

March 09, 2022 00:39:54 33.64 MB Downloads: 0

Watch the live stream: Watch on YouTube About the show Sponsored by Microsoft for Startups Founders Hub. Special guest: Anne Barela Brian #1: The Adam Test : 12 Questions for New Dependencies Found through a discussion with Ryan Cheley, who will be on an upcoming episode of Test & Code, talking about Managing Software Teams. The Joel Test dates back to 2000, and some of it is a bit dated. I should probably do a Test & Code episode or pythontest article on my opinions of this at some point. Nice shameless plugs, don’t you think? The Joel Test is 12 questions and is a “highly irresponsible, sloppy test to rate the quality of a software team.” “The Adam Test” is 12 questions “to decide whether a new package we’re considering depending on is well-maintained.” He’s calling it “The Well-Maintained Test”, but I like “The Adam Test” Here’s the test: Is it described as “production ready”? Is there sufficient documentation? Is there a changelog? Is someone responding to bug reports? Are there sufficient tests? Are the tests running with the latest language version? like Python 3.10, of course Are the tests running with the latest integration version? Examples include Django, PostgreSQL, etc. Is there a Continuous Integration (CI) configuration? Is the CI passing? Does it seem relatively well used? Has there been a commit in the last year? Has there been a release in the last year? Article has a short discussion of each. What score is good? That’s up to you to decide. But these questions are good to think about for your dependencies. I also think I’ll use these questions for my own projects. I’ve got a README.md, but do I need more examples in it? Should I have RTD docs for it? Have I updated the test matrix to include the newest versions of Python, etc? Have I hooked up CI? 
Michael #2: Validate emails with email-validator When you think about validating emails, you probably think regex (or just nothing) Regex is fine but so is this email: jane_doe@domain_that_doesnt_exist.com Problem is (at the time of the recording), domain_that_doesnt_exist.com is not a website. What about unicode variations that are technically the same but visually different? If the passed email address is valid, the validate_email() method will return an object containing a normalized form of the passed email address. Anne #3: The Python on Microcontrollers Newsletter One of my main focuses at Adafruit since the pandemic started is as editor of the Python on Microcontrollers Newsletter. With a weekly distribution of almost 9,400 subscribers, it’s the largest newsletter of its kind. It mainly focuses on CircuitPython and MicroPython and also discusses Python on single board computers (SBC) like Raspberry Pi. News about Python with a small computer emphasis Folks may subscribe by going to https://www.adafruitdaily.com/ which is separate from adafruit.com. The information is not sold or used for marketing and it’s easy to unsubscribe (no “do you really want to do this, please reconsider…”) The challenge, like for Python Bytes and other publications, is to find content. I scour the internet, with a bit of a focus on Twitter as I have an active account there. We encourage others to put in issues and Pull Requests on the newsletter GitHub, email information to cpnews@adafruit.com and use the hashtag CircuitPython or MicroPython on Twitter. Brian #4: Git Organized: A Better Git Flow Annie Sexton Found through Changelog episode 480: Get your reset on A possible and common git workflow Branch off of a main branch to a personal dev branch Commit and Push during development to save your work When ready to merge, make a PR Problems Commits are hard to follow and messy, not ever really intended to separate parts of the workflow or anything.
Commits are therefore useless in helping someone code review large changes. Annie’s workflow Branch off of a main branch to a personal dev branch Commit and Push during development to save your work. But don’t worry too much about commit messages, “WIP” is sufficient. Or a note to yourself. When ready to merge git reset origin/main Re-commit all changes in a logical order that makes more sense than the way the work actually happened. These will be several commits, with descriptive messages. Even partial commits, if there are unrelated changes in a file, work with this process Push all the new commits. (Is --force going to be necessary?) Create a PR. Now there is a set of commits that are actually helpful to break up large PRs into small chunks that tell a story. I’m looking to try this soon to see how it goes Michael #5: CPython issues moving to GitHub soon Update by The Python Developer in Residence, Łukasz Langa The Steering Council is working on migrating the data that is currently residing in Roundup at https://bugs.python.org/ (BPO) into the GitHub issues of the CPython repository hosted there. Laid out in PEP 581 -- Using GitHub Issues for CPython The ultimate goal is to move user- and core developer-provided issue-reporting entirely to GitHub. Each issue that currently exists on BPO will include metadata indicating where it was moved on GitHub. New issues will only exist on GitHub. Feedback, please: At the current stage, we’re asking you to take a look at the links and important dates below, and share any feedback you might have. Timeline: Friday, March 11th 2022: GitHub starts transfer of the issues in the temporary repository to github.com/python/cpython/ . The migration is estimated to take anywhere from 3 to 7 days, depending on the load on GitHub.com. Anne #6: MicroPython, CircuitPython and GitHub What are Microcontrollers and Single Board Computers (SBCs)? Why not use CPython on Microcontrollers?
MicroPython was originally created by the Australian programmer and theoretical physicist Damien George, after a successful Kickstarter backed campaign in 2013. Originally it only ran on a number of boards and was based on Python 3.4. CircuitPython was forked from MicroPython in 2017 by Adafruit Industries. Both MicroPython and CircuitPython are Open Source under MIT Licenses so adoption and modification by anyone is easy. Why fork CircuitPython? 1) Make a requirement that CircuitPython boards can enumerate to computers as a USB thumb drive to add or change code files with any text editor. 2) Aim to make CircuitPython use CPython library syntax whenever possible. 3) Make it easy to use and understand for beginners yet powerful for more advanced users. All CircuitPython code is on GitHub. GitHub Actions is used on repos like the Adafruit Learning System code to automate CI with Pylint, Black, and ensuring code has proper SPDX author and license tags, which is a new addition this year. Currently there are 283 microcontroller boards compatible with CircuitPython and 87 single board computers can use CircuitPython libraries in CPython via the Adafruit Blinka abstraction layer. Code portability between boards requires little if any changes. There are 346 CircuitPython libraries (all on PyPI / pip as well as GitHub) covering a wide range of hardware and real world needs. From blinking LEDs to using ulab (microlab), a subset of numpy, for data crunching. I just counted and there are exactly 1,000 Adafruit Learning System guides referencing CircuitPython, all free and open source/MIT licensed. https://learn.adafruit.com/ Extras Brian: Quick read: The Thirty Minute Rule, by Daniel Roy Greenfield summary: Stuck on a software problem for 30 min? Ask for help. Michael: The CircuitPython Show by Paul Cutler Follow up from my Python 3 == Active Python 3? 
James wrote: In episode #273, you guys were discussing supporting "Python 3" to mean any currently supported version of Python rather than "Python 3.7+" or similar. That's a really bad idea. There are still tons of people using unsupported versions of Python, and they're not all invalid use cases. For example, I am one of the upstream maintainers for cloud-init, and I was only recently able to remove Python 3.5 in order to make 3.6 our minimum supported version (which will continue for the next year). The reason is that our main consumers are downstream distro packagers (ubuntu, red hat, fedora, etc), and it's not uncommon for software released into long-term supported OS releases to be supported for 5-10 years or more. If I fire up an Ubuntu Trusty container, which still receives extended support until 2024, I get Python 3.4. So even though 3.4 is unsupported by Python upstream, it is still absolutely in use and supported by OS manufacturers. Joke: A case of the Mondays

#273 Getting dirty with __eq__(self, other)

March 04, 2022 00:37:05 31.27 MB Downloads: 0

Watch the live stream: Watch on YouTube About the show Sponsored by Datadog: pythonbytes.fm/datadog Michael #1: Physics Breakthrough as AI Successfully Controls Plasma in Nuclear Fusion Experiment Interesting breakthrough using AI Is Python at the center of it? With enough digging, the answer is yes, and we love it! Brian #2: PEP 680 -- tomllib: Support for Parsing TOML in the Standard Library Accepted for Python 3.11 This PEP proposes basing the standard library support for reading TOML on the third-party library tomli Michael #3: Thread local threading.local: A class that represents thread-local data. Thread-local data are data whose values are thread specific. Just create an instance of local (or a subclass) and store attributes on it You can even subclass it. Brian #4: What is a generator function? Trey Hunner Super handy, and way easier than you think, if you’ve never written your own. Really, it’s just a function that uses yield instead of return and supplies one element at a time instead of returning a list or dict or tuple or other large structure. Some details generator functions return generator objects generator objects start out paused; use the built-in next() function to get the next item. they raise StopIteration when done. Most generally used from for loops. Generator objects cannot be re-used when exhausted, but you get a new one each time you call the generator function, for example in your next for loop. So, it’s all good. Michael #5: dirty-equals via Will McGugan, by Samuel Colvin Doing dirty (but extremely useful) things with equals.
    from dirty_equals import IsPositive
    assert 1 == IsPositive
    assert -2 == IsPositive  # this will fail!

    user_data = db_conn.fetchrow('select * from users')
    assert user_data == {
        'id': IsPositiveInt,
        'username': 'samuelcolvin',
        'avatar_file': IsStr(regex=r'/[a-z0-9\-]{10}/example\.png'),
        'settings_json': IsJson({'theme': 'dark', 'language': 'en'}),
        'created_ts': IsNow(delta=3),
    }
Brian #6: Commitizen from the docs Command-line utility to create commits with your rules.
Defaults: Conventional commits Display information about your commit rules (commands: schema, example, info) Bump version automatically using semantic versioning based on the commits. Read More Generate a changelog using Keep a changelog considering using for consistent commit message formatting can be used with python-semantic-release for automatic semantic versioning learned about it in 10 Tools I Wish I Knew When I Started Working with Python questions anyone using this or something similar? does this make sense for small to medium sized projects? or overkill? Extras: pytest book 40% off sale continues through March 19 for eBook Amazon lists the book as “shipping in 1-2 days”, as of March 2 Michael: Pronouncing the Python Walrus operator := as “becomes” Via John Sheehan: String methods startswith() and endswith() can take a tuple as their first argument, which lets you check for multiple values with one call: >>> x = "abcdefg" >>> x.startswith(("ab", "cd", "ef"), 2) True Joke: CS Background
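
To make the generator details from Brian's item #4 concrete, here is a minimal sketch. The function name countdown is made up for illustration and does not come from Trey Hunner's article.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, one value at a time."""
    while start > 0:
        yield start
        start -= 1


# Calling the function does not run its body yet; it returns a paused generator object.
gen = countdown(3)
first = next(gen)   # runs the body up to the first yield
rest = list(gen)    # drains the remaining values

# The generator is now exhausted, so next() raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

# Calling the generator function again gives a brand-new generator object.
fresh = list(countdown(3))
```

A for loop does the next() calls and the StopIteration handling for you, which is why generators are mostly consumed that way.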

#272 The tools episode

February 24, 2022 00:48:09 41.55 MB Downloads: 0

Watch the live stream: Watch on YouTube About the show Sponsor: Brought to you by FusionAuth - check them out at pythonbytes.fm/fusionauth Special guest: Calvin Hendryx-Parker Brian #1: Why your mock still doesn’t work Ned Batchelder Some back story: Why your mock doesn’t work a quick tour of Python name assignment The short version of Python Names and Values talk importing difference between from foo import bar and import foo w.r.t mocking punchline: “Mock it where it’s used” Now, Why your mock still doesn’t work talks about using @patch decorator (also applies to @patch.object decorator) and utilizing mock_thing1, mock_thing2 parameters to test you can change the return value or an attribute or whatever. normal mock stuff. But…. the punchline…. be careful about the order of patches. Decorators are applied bottom-up, so the patch closest to the function supplies the first mock parameter. It needs to be
    @patch("foo.thing2")
    @patch("foo.thing1")
    def test_(mock_thing1, mock_thing2):
        ...
Further reading: https://docs.python.org/3/library/unittest.mock.html#patch https://docs.python.org/3/library/unittest.mock.html#patch-object Michael #2: pls via Chris May Are you a developer who uses the terminal? (likely!) ls/l are not super helpful. There are replacements and alternatives But if you’re a dev, you might want the most relevant info for you, so enter pls See images in Michael’s tweets [1, 2]. You must install nerdfonts and set your terminal’s font to them Calvin #3: Kitty Cross platform GPU accelerated terminal (written in Python) Extended with Kittens written in Python Super fast rendering Has a rich set of plugins available for things like searching the buffer with fzf Brian #4: Futures and easy parallelisation Jaime Buelta Code example for quick scripts to perform slow tasks in parallel. Uses concurrent.futures and ThreadPoolExecutor. Starts with a small toy example, then goes on to a requests example to grab a bunch of pages in parallel. The call to executor.submit() sets up the job. This is done in a list comprehension, generating a list of futures.
The call to future.result() on each future within the list is where the blocking happens. Since we want to wait for all results, it’s ok to block on the first, second, etc. Nice small snippet for easy parallel work. Example:
    from concurrent.futures import ThreadPoolExecutor
    import time
    import requests
    from urllib.parse import urljoin

    NUM_WORKERS = 2
    executor = ThreadPoolExecutor(NUM_WORKERS)

    def retrieve(root_url, path):
        url = urljoin(root_url, path)
        print(f'{time.time()} Retrieving {url}')
        result = requests.get(url)
        return result

    arguments = [('https://google.com/', 'search'),
                 ('https://www.facebook.com/', 'login'),
                 ('https://nyt.com/', 'international')]
    # submit() schedules each job and returns a future immediately
    futures_array = [executor.submit(retrieve, *arg) for arg in arguments]
    # result() blocks until that particular job has finished
    result = [future.result() for future in futures_array]
    print(result)
Michael #5: pgMustard So you have a crappy web app that is slow but don’t know why. Is it an N+1 problem with an ORM? Is it a lack of indexes? If you’re using postgres, check out pgMustard: A simple yet powerful tool to help you speed up queries This is a paid product but might be worthwhile if you live deeply in postgres. Calvin #6: bpytop Great way to see what is going on in your system/server Shows nice graphs in the terminal for system performance such as CPU and Network traffic Supports themes and is fast and easy to install with pipx Michael uses Glances which is fun too. Calvin used to be a heavy Glances user until he saw the light 🙂 Extras Brian: pytest book is officially no longer Beta, next is printing, expected paper copy ship date is March 22, although I’m hoping earlier. For a limited time, to celebrate, 40% off the eBook PyCamp Spain is April 15-18: a weekend that includes 4 days and 3 nights with full board (breakfast, lunch and dinner) in Girona, Spain Calvin: Python Web Conference 2022 ← bigger and better than ever! Michael: witch macOS switcher list comprehensions vs. loops [[video](https://www.youtube.com/watch?v=uVQVn8z8kxo), [code](https://gist.github.com/mikeckennedy/2ddb5ad84d6e116e6d14b5c2eef4245a)] syncify.run and nesting asyncio Joke: Killer robots
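
The patch-ordering punchline from Brian's item #1 is easy to check for yourself. The sketch below is not from Ned Batchelder's post: it patches two standard library functions (os.getcwd and os.listdir, chosen only so the example runs without a foo module), and check_order is a made-up name.

```python
import os
from unittest.mock import patch


# Decorators are applied bottom-up, so the patch closest to the
# function supplies the first mock argument.
@patch("os.listdir")  # outermost patch -> second argument
@patch("os.getcwd")   # innermost patch -> first argument
def check_order(mock_getcwd, mock_listdir):
    mock_getcwd.return_value = "/fake/cwd"
    mock_listdir.return_value = ["a.txt"]
    # While the function runs, the real os functions are replaced by the mocks.
    return (
        os.getcwd is mock_getcwd,
        os.listdir is mock_listdir,
        os.getcwd(),
        os.listdir("."),
    )


result = check_order()
```

If the parameter names were swapped, the identity checks would come back False, which is exactly the silent mix-up the article warns about.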

#271 CPython: Async Task Groups in Python 3.11

February 16, 2022 00:57:21 48.91 MB Downloads: 0

Watch the live stream: Watch on YouTube About the show Sponsored by us: Check out the courses over at Talk Python And Brian’s book too! Special guest: Steve Dower Michael #1: fastapi-events Asynchronous event dispatching/handling library for FastAPI and Starlette Features: straightforward API to emit events anywhere in your code events are handled after responses are returned (doesn't affect response time) support event piping to remote queues powerful built-in handlers to handle events locally and remotely coroutine functions (async def) are the first-class citizen write your handlers, never be limited to just what fastapi_events provides Brian #2: Ways I Use Testing as a Data Scientist Peter Baumgartner “In my work, writing tests serves three purposes: making sure things work, documenting my understanding, preventing future errors.” Test The results of some analysis process (using assert) Code that operates on data (using hypothesis) Aspects of the data (using pandera or Great Expectations) Code for others (using pytest) Use asserts liberally even within the code use on as many intermediate calculations and processes as you can embed expressions in f-strings as the last argument to assert to help debug failures check calculations and arithmetic check the obvious Notebooks: “One practice I’ve started is that whenever I visually investigate some aspect of my data by writing some disposable code in a notebook, I convert that validation into an assert statement.” utilize numpy and pandas checks, especially for arrays and floating point values hypothesis can help you think of edge cases that should work, but don’t, like empty Series, and NaN values. Write tests on the data itself pandera useful for lightweight cases, checking schema on datasets. Great Expectations if we’re expecting to repeatedly read new data with the same structure. Use pytest, especially for code you are sharing with other people, like libraries.
TDD works great for API development Arrange-Act-Assert is a great structure. “Even if we’re not sure what to assert, writing a test that executes the code is still valuable. “ At least you’ll catch when you’ve forgotten to implement something. Steve #3: PEP 654 Exception groups and except* A necessary building block for more advanced asyncio helpers Mainly for use by scheduler libraries to handle raising multiple errors “simultaneously” except*: “a single exception group can cause several except clauses to execute, but each such clause executes at most once (for all matching exceptions from the group)” Necessary for complex scheduling, such as task groups Michael #4: py-overload A Runtime method override decorator. Python lacks method overloading (do_it(7) vs. do_it("7")) Probably due to lack of typing in the early days Go from this:
    from typing import Union

    def _func_str(a: str): ...
    def _func_int(a: int): ...
    def func(a: Union[str, int]):
        if isinstance(a, str):
            _func_str(a)
        else:
            _func_int(a)
To this:
    @overload
    def func(a: str): ...
    @overload
    def func(a: int): ...
Brian #5: Next-generation seaborn interface Love the background and goals section “This work grew out of long-running efforts to refactor the seaborn internals so that its functions could rely on common code-paths. At a certain point, I decided that I was developing an API that would also be interesting for external users too.” “seaborn was originally conceived as a toolbox of domain-specific statistical graphics to be used alongside matplotlib.” I’ve always wondered about this Some people now reach for, or learn, seaborn first.
As seaborn has grown, reproducing with raw matplotlib to change something seaborn doesn’t expose is sometimes painful goal : “expose seaborn’s core features — integration with pandas, automatic mapping between data and graphics, statistical transformations — within an interface that is more compositional, extensible, and comprehensive.” I also like interface discussions that have phrases like “This is a clean namespace, and I’m leaning towards recommending from seaborn.objects import * for interactive usecases. But let’s not go so far just yet.” I like clean namespaces, and use some of my own libs like this, but import * always is a red flag for me. The new interface exists as a set of classes that can be accessed through a single namespace import: import seaborn.objects as so Start with so.Plot, add layers, like so.Scatter(), even multiple layers. layers have a Mark object, which defines how to draw the plot, like so.Line or so.Dot There’s a lot more detail in there. The discussion is great. Also a neat understanding that established libraries can change their mind on APIs. This is a good way to discuss it, in the open. Note included at the top: “This is very much a work in progress. It is almost certain that code patterns demonstrated here will change before an official release. I do plan to issue a series of alpha/beta releases so that people can play around with it and give feedback, but it’s not at that point yet.” Steve #6: Compile CPython to WebAssembly Allows fully in-browser use of CPython (demo at https://repl.ethanhs.me/) Currently uses Emscripten as its runtime environment, to fill in gaps that browsers don’t normally offer (like an in-memory file system), or WASI to more carefully add system functionality Still the CPython runtime, and a lot of work to do before you’ll see it as part of client-side web apps, but the possibility is now there.
Extras Michael: Get minutes, hours, and days from Python timedelta - A Python Short Did you know ohmyzsh is kind of local? Django reformatted code with Black (via PyCoders) Steve: Python 3.11’s latest alpha now has Windows ARM64 installers. These aren’t the dominant devices yet, but they’re out there, and if you’ve got one the CPython team would love to hear about your experience. Steve just released a new version of Deck, which started as a way to help people who misspelled collections.deque, but has grown into a useful building block for traditional 52-card games (or 54 including jokers). Joke: Help is coming