#319 CSS-Style Queries for... JSON?

Python Bytes is a weekly podcast hosted by Michael Kennedy and Brian Okken. The show is a short discussion on the headlines and noteworthy news in the Python, developer, and data science space.

January 18, 2023 00:32:44 23.8 MB

About the show

Sponsored by Microsoft for Startups Founders Hub.

Join us on YouTube at pythonbytes.fm/stream/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too.

Michael #1: Secure maintainer workflow

by Ned Batchelder
We are the magicians, but also the gatekeepers for our users
Terminal sessions have implicit access to credentials, and that can go wrong in two ways:
- The first is unlikely: a bad guy gets onto my computer and uses the credentials to cause havoc.
- The second is a more serious concern: I could unknowingly run evil or buggy code that uses my credentials in bad ways.
Mitigations
- 1Password: where possible, I store credentials in 1Password, and use tooling to get them into environment variables.
  - Sidebar: do not use LastPass; see the Extras segment at the end.
  - I can have the credentials in the environment for just long enough to use them. This works well for things like PyPI credentials, which are used rarely and could cause significant damage.
- Docker: To really isolate unknown code, I use a Docker container.
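
To make the "just long enough to use them" idea concrete, here is a minimal Python sketch. It is not Ned's actual tooling: `run_with_secret`, `fake_fetch`, and the `DEMO_TOKEN` variable are hypothetical names, and the secret lookup is a stand-in for whatever a password manager would provide. The point is only that the credential is placed in the child process's environment and never in the parent session.

```python
import os
import subprocess
import sys
from typing import Callable, Sequence


def run_with_secret(
    cmd: Sequence[str],
    var_name: str,
    fetch_secret: Callable[[], str],
) -> subprocess.CompletedProcess:
    """Run cmd with a secret that is visible only in the child's environment."""
    # Work on a copy so the parent's os.environ is never modified.
    child_env = dict(os.environ)
    child_env[var_name] = fetch_secret()
    return subprocess.run(
        cmd, env=child_env, capture_output=True, text=True, check=True
    )


def fake_fetch() -> str:
    # Hypothetical stand-in for a real lookup (for example, a password manager CLI).
    return "not-a-real-token"


# The child process can read the variable...
result = run_with_secret(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_TOKEN'])"],
    "DEMO_TOKEN",
    fake_fetch,
)
print(result.stdout.strip())

# ...but the parent session still has no trace of it (assuming it was not set beforehand).
print("DEMO_TOKEN" in os.environ)
```

Any other code run later in the same terminal session does not inherit the secret, which addresses the second concern above.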

Brian #2: Tools for parsing HTML and JSON

Learned about these from “A Year of Writing about Web Scraping in Review”
Parsel - extract and remove data from HTML using XPath and CSS selectors
jmespath (pronounced “James Path”) - declaratively specify how to extract elements from a JSON document

Michael #3: git-sizer

Compute various size metrics for a Git repository, flagging those that might cause problems.

Tip: try a partial clone with git clone --filter=blob:none URL

    # Stats for training.talkpython.fm
    # Full: git clone repo
    Receiving objects: 100% (118820/118820), 514.31 MiB | 28.83 MiB/s, done.
    Resolving deltas: 100% (71763/71763), done.
    Updating files: 100% (10792/10792), done.
    1.01 GB on disk

    # Partial: git clone --filter=blob:none repo
    Receiving objects: 100% (10120/10120), 220.25 MiB | 24.92 MiB/s, done.
    Resolving deltas: 100% (1454/1454), done.
    Updating files: 100% (10792/10792), done.
    694.4 MB on disk

Partial clone is a performance optimization that “allows Git to function without having a complete copy of the repository. The goal of this work is to allow Git [to] better handle extremely large repositories.” With --filter=blob:none, file contents are fetched on demand, so changing branches may make Git download more of the missing files.
Not the same as shallow clones or sparse checkouts
- Consider shallow clones for CI/CD/deployment
- Sparse checkouts for a slice of a monorepo
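
To compare the three approaches side by side, here is a small Python sketch that only builds the `git clone` command line for each one. The `clone_command` helper and the example.com URL are hypothetical; the git flags themselves (`--filter=blob:none`, `--depth`, `--sparse`) are standard `git clone` options.

```python
from typing import List, Optional

# Extra git clone flags per strategy.
CLONE_FLAGS = {
    "full": [],
    # Partial clone: all commits and trees, file contents fetched on demand.
    "partial": ["--filter=blob:none"],
    # Shallow clone: history truncated to the most recent commit.
    "shallow": ["--depth", "1"],
    # Sparse checkout on top of a partial clone: only top-level files are
    # checked out at first; `git sparse-checkout set <dir>` then picks the slice.
    "sparse": ["--filter=blob:none", "--sparse"],
}


def clone_command(
    url: str, strategy: str = "full", dest: Optional[str] = None
) -> List[str]:
    """Build (but do not run) a git clone command for the given strategy."""
    if strategy not in CLONE_FLAGS:
        raise ValueError(f"unknown strategy: {strategy!r}")
    cmd = ["git", "clone", *CLONE_FLAGS[strategy], url]
    if dest is not None:
        cmd.append(dest)
    return cmd


url = "https://example.com/some/repo.git"  # placeholder URL
for name in CLONE_FLAGS:
    print(name.ljust(8), " ".join(clone_command(url, name)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to actually perform the clone.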

Brian #4: Dataclasses without type annotations

Probably file this under “don’t try this at home”.
- Or maybe “try this at home, but not at work”.
- Or just “that Brian fella is a bad influence”.
  - What! It’s not me. It’s Adrian, the dude that wrote the article.

For dataclasses, unless you’re using a static type checker: “… use any type you want. If you're not using a static type checker, no one is going to care what type you use.”

    # With this import, annotations are stored as strings and never evaluated.
    from __future__ import annotations
    from dataclasses import dataclass

    @dataclass
    class Literally:
        anything: ("can go", "in here")
        as_long_as: lambda: "it can be evaluated"
        # Now, I've noticed a tendency for this program to get rather silly.
        hell: with_("from __future__ import annotations")
        it_s: not even.evaluated
        it: just.has(to=be) * syntactically[valid]
        # Right! Stop that! It's SILLY!
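
For a tamer version of the same idea, here is a minimal sketch that uses `...` as a placeholder annotation. This is one possible placeholder, not necessarily what the article recommends, and the `Episode` class is made up for illustration. The dataclass machinery only needs an annotation to be present to treat a name as a field; the few annotations it does inspect are special markers such as `ClassVar` and `InitVar`.

```python
from dataclasses import dataclass, fields


@dataclass
class Episode:
    # `...` (Ellipsis) is a valid expression, so it works as a placeholder annotation.
    number: ...
    title: ...
    extras: ... = ()  # fields with defaults still work as usual


ep = Episode(319, "CSS-Style Queries for... JSON?")
print(ep)

# The generated __init__, __repr__, and __eq__ behave as with real type hints.
print(ep == Episode(319, "CSS-Style Queries for... JSON?"))

# The placeholder is simply stored as the field's "type".
print([f.name for f in fields(Episode)])
print(fields(Episode)[0].type is ...)
```

A static type checker would complain about these annotations, which is exactly the trade-off the quote describes.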

Extras

Michael:

LastPass story just keeps getting worse
- We will see problems in supply chains because of this too
- A whole 2-hour discussion diving into what I touched on: twit.tv/shows/security-now
Got your new Mac mini yet? Or MacBook Pro?

Joke: Developer/maker, what’s my purpose?
