Python Bytes is a weekly podcast hosted by Michael Kennedy and Brian Okken. The show is a short discussion on the headlines and noteworthy news in the Python, developer, and data science space.
#319 CSS-Style Queries for... JSON?
About the show
Sponsored by Microsoft for Startups Founders Hub.
Connect with the hosts
- Michael: @mkennedy@fosstodon.org
- Brian: @brianokken@fosstodon.org
- Show: @pythonbytes@fosstodon.org
Join us on YouTube at pythonbytes.fm/stream/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too.
Michael #1: Secure maintainer workflow
- by Ned Batchelder
- We are the magicians, but also the gatekeepers for our users
- Terminal sessions have implicit access to credentials, which could be misused in two ways:
- The first is unlikely: a bad guy gets onto my computer and uses the credentials to cause havoc.
- The second is a more serious concern: I could unknowingly run evil or buggy code that uses my credentials in bad ways.
- Mitigations
- 1Password: where possible, I store credentials in 1Password, and use tooling to get them into environment variables.
- Sidebar: do not use LastPass; see the Extras segment at the end.
- I can have the credentials in the environment for just long enough to use them. This works well for things like PyPI credentials, which are used rarely and could cause significant damage.
- Docker: To really isolate unknown code, I use a Docker container.
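To make the "credentials in the environment for just long enough" idea concrete, here is a minimal sketch. It is not Ned's actual tooling: `run_with_secret` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for whatever password-manager lookup you really use. The point is only that the secret lives in a copy of the environment handed to one child process, never in your shell session.

```python
import os
import subprocess
import sys
from typing import Callable, Sequence


def run_with_secret(
    cmd: Sequence[str],
    env_name: str,
    fetch_secret: Callable[[], str],
) -> subprocess.CompletedProcess:
    """Run cmd with a secret that exists only in the child's environment."""
    child_env = dict(os.environ)          # a copy; the parent environment is untouched
    child_env[env_name] = fetch_secret()  # the secret is added to the copy only
    return subprocess.run(
        cmd, env=child_env, capture_output=True, text=True, check=True
    )


def fake_fetch() -> str:
    # Hypothetical stand-in for a real password-manager lookup.
    return "not-a-real-token"


if __name__ == "__main__":
    result = run_with_secret(
        [sys.executable, "-c", "import os; print(os.environ['DEMO_TOKEN'])"],
        "DEMO_TOKEN",
        fake_fetch,
    )
    print("child saw:", result.stdout.strip())
    print("still in parent env?", "DEMO_TOKEN" in os.environ)
```

Once the child process exits, nothing in the terminal session holds the credential, so code you run later cannot pick it up by accident.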
Brian #2: Tools for parsing HTML and JSON
- Learned these from A Year of Writing about Web Scraping in Review
- Parsel - extract and remove data from HTML using XPath and CSS selectors
- jmespath - “James Path” - declaratively specify how to extract elements from a JSON document
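For a feel of what these two look like in practice, here is a small sketch using made-up sample data (the names, ages, and links are invented for illustration). It assumes both packages are installed with pip install parsel jmespath. The imports sit inside the functions so the file still loads if one of them is missing.

```python
SAMPLE_JSON = {
    "people": [
        {"name": "Ada", "age": 36},
        {"name": "Linus", "age": 28},
        {"name": "Grace", "age": 45},
    ]
}

SAMPLE_HTML = (
    '<ul class="links">'
    '<li><a href="/a">A</a></li>'
    '<li><a href="/b">B</a></li>'
    "</ul>"
)


def names_over(data: dict, min_age: int) -> list:
    """Names of everyone older than min_age, via a JMESPath filter expression."""
    import jmespath  # third-party package

    # Backticks wrap a JSON literal inside a JMESPath expression.
    return jmespath.search(f"people[?age > `{min_age}`].name", data)


def link_targets(html: str) -> list:
    """All href values under ul.links, via a CSS selector."""
    from parsel import Selector  # third-party package

    return Selector(text=html).css("ul.links a::attr(href)").getall()


if __name__ == "__main__":
    try:
        print(names_over(SAMPLE_JSON, 30))
        print(link_targets(SAMPLE_HTML))
    except ImportError as exc:
        print(f"Install the missing package first: {exc.name}")
```

The JMESPath expression plays the same role for JSON that the CSS selector plays for HTML: you describe the shape of what you want and let the library walk the document.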
Michael #3: git-sizer
- Compute various size metrics for a Git repository, flagging those that might cause problems.
Tip, partial clone: git clone --filter=blob:none URL
# Stats for training.talkpython.fm

# Full: git clone repo
Receiving objects: 100% (118820/118820), 514.31 MiB | 28.83 MiB/s, done.
Resolving deltas: 100% (71763/71763), done.
Updating files: 100% (10792/10792), done.
1.01 GB on disk

# Partial: git clone --filter=blob:none repo
Receiving objects: 100% (10120/10120), 220.25 MiB | 24.92 MiB/s, done.
Resolving deltas: 100% (1454/1454), done.
Updating files: 100% (10792/10792), done.
694.4 MB on disk
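To put those numbers side by side, here is a quick bit of arithmetic on the figures above. The helper name `percent_saved` is made up for this sketch, and the on-disk comparison assumes 1 GB means 1000 MB, which the git output does not confirm.

```python
def percent_saved(full: float, partial: float) -> float:
    """Percentage reduction going from the full clone to the partial clone."""
    return (1 - partial / full) * 100


# Figures copied from the clone output above.
download_saved = percent_saved(514.31, 220.25)   # MiB received
objects_saved = percent_saved(118820, 10120)     # objects received
disk_saved = percent_saved(1010.0, 694.4)        # MB on disk, assuming 1 GB = 1000 MB

print(f"data received: {download_saved:.1f}% less")
print(f"objects:       {objects_saved:.1f}% fewer")
print(f"on disk:       {disk_saved:.1f}% less")
```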
Partial clone is a performance optimization that “allows Git to function without having a complete copy of the repository. The goal of this work is to allow Git better handle extremely large repositories.” When changing branches, Git may download more missing files.
- Not the same as shallow clones or sparse checkouts
- Consider shallow clones for CI/CD/deployment
- Sparse checkouts for a slice of a monorepo
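Since the three options are easy to mix up, here is a small sketch that just builds the git command lines for each one without running anything. The function name `clone_commands` and its `mode`, `target`, and `sparse_dir` parameters are invented for illustration; the git flags themselves (--filter=blob:none, --depth=1, --sparse, and git sparse-checkout set) are standard git options.

```python
from typing import List


def clone_commands(
    url: str, mode: str = "full", target: str = "repo", sparse_dir: str = ""
) -> List[List[str]]:
    """Return the git commands (as argument lists) for a given clone style."""
    if mode == "full":
        # Everything: all history and all file contents.
        return [["git", "clone", url, target]]
    if mode == "partial":
        # All commits and trees, but file contents (blobs) fetched on demand.
        return [["git", "clone", "--filter=blob:none", url, target]]
    if mode == "shallow":
        # Only the most recent commit; handy for CI/CD and deployment.
        return [["git", "clone", "--depth=1", url, target]]
    if mode == "sparse":
        # Check out only one directory of a monorepo.
        if not sparse_dir:
            raise ValueError("sparse mode needs sparse_dir")
        return [
            ["git", "clone", "--filter=blob:none", "--sparse", url, target],
            ["git", "-C", target, "sparse-checkout", "set", sparse_dir],
        ]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("full", "partial", "shallow"):
        print(mode, "->", " ".join(clone_commands("URL", mode)[0]))
    for cmd in clone_commands("URL", "sparse", sparse_dir="services/api"):
        print("sparse ->", " ".join(cmd))
```

Each list can be handed to subprocess.run as-is if you want to script it.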
Brian #4: Dataclasses without type annotations
- Probably file this under “don’t try this at home”.
- Or maybe “try this at home, but not at work”.
- Or just “that Brian fella is a bad influence”.
- What! It’s not me. It’s Adrian, the dude that wrote the article.
- Unless you’re using a type checker, for dataclasses, “… use any type you want. If you're not using a static type checker, no one is going to care what type you use.”
@dataclass
class Literally:
    anything: ("can go", "in here")
    as_long_as: lambda: "it can be evaluated"
    # Now, I've noticed a tendency for this program to get rather silly.
    hell: with_("from __future__ import annotations")
    it_s: not even.evaluated
    it: just.has(to=be) * syntactically[valid]
    # Right! Stop that! It's SILLY!
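If you want a version you can actually run, here is a tamer sketch of the same idea. The `Episode` class and its fields are made up for illustration. It shows that @dataclass only needs an annotation to be present on each field: the annotation can be any object, and nothing is checked at runtime.

```python
from dataclasses import dataclass, fields


@dataclass
class Episode:
    number: ("any", "object", "works")  # a tuple as the "type"
    title: len                           # a builtin function as the "type"
    live: ... = False                    # Ellipsis as the "type", plus a default


ep = Episode(319, "CSS-Style Queries for... JSON?")
print(ep)
print([f.name for f in fields(Episode)])

# No runtime type checking happens, so this is accepted as well.
wrong = Episode("not a number", 3.14, live="maybe")
print(wrong)
```

The generated __init__, __repr__, and __eq__ all work as usual. A static type checker would complain loudly, which is the "not at work" part.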
Extras
Michael:
- LastPass story just keeps getting worse
- We will see problems in supply chains because of this too
- A whole 2-hour discussion diving into what I touched on: twit.tv/shows/security-now
- Got your new Mac mini yet? Or MacBook Pro?