Python Bytes is a weekly podcast hosted by Michael Kennedy and Brian Okken. The show is a short discussion on the headlines and noteworthy news in the Python, developer, and data science space.
Similar Podcasts
The Infinite Monkey Cage
Brian Cox and Robin Ince host a witty, irreverent look at the world through scientists' eyes.
The Top Shelf
ThePrimeagen and teej_dv are on a quest to find the best possible technical speakers and ask the best possible questions we can find. You all know ThePrimeagen can't read, so this is a great format for him to really shine. Teej is here to make sure that Prime knows who the guest is and also to interrupt Prime wherever possible
24H24L
A 24-hour online event consisting of the broadcast of 24 audio sessions on a variety of GNU/Linux topics. These are the event's recordings in podcast format.
#290 Sentient AI? If so, then what?
Watch the live stream: Watch on YouTube

About the show
Sponsored by us! Support our work through: our courses at Talk Python Training, the Test & Code Podcast, and Patreon Supporters.
Special guest: Nick Muoh

Brian #1: picologging
From a tweet by Anthony Shaw.
From README.md: an “early-alpha” stage project with some incomplete features. (Cool to be so up front about that.)
“Picologging is a high-performance logging library for Python. picologging is 4-10x faster than the logging module in the standard library.”
“Picologging is designed to be used as a drop-in replacement for applications which already use logging, and supports the same API as the logging module.”
Now you’ve definitely got my attention. For many common use cases, it’s just way faster.
Sounds great, why not use it? A few limitations are listed:
  Process and thread name are not captured.
  Some logging globals are not observed: logging.logThreads, logging.logMultiprocessing, logging.logProcesses.
  Logger will always default to sys.stderr and not observe emittedNoHandlerWarning.

Michael #2: CheekyKeys
Via Prayson Daniel.
What if you could silently talk to your computer?
CheekyKeys uses OpenCV and MediaPipe's Face Mesh to perform real-time detection of facial landmarks from video input.
The primary input is to "type" letters, digits, and symbols via Morse code by opening and closing your mouth quickly for . and slightly longer for -.
Most of the rest of the keyboard and other helpful actions are included as modifier gestures, such as:
  shift: close right eye
  command: close left eye
  arrow up/down: raise left/right eyebrow
  …
Watch the video where he does a coding interview for a big tech company using no keyboard.

Nick #3: Is Google’s LaMDA Model Sentient?
Authored by Richard Luscombe (The Guardian).
The Google engineer who thinks the company’s AI has come to life
Transcript of conversation

Brian #4: richbench
Also from Anthony.
“A little Python benchmarking tool.”
Give it a list of (first_func, second_func, “label”) tuples, and it times them and prints out a comparison.
Simple and awesome.

    def sort_seven():
        """Sort a list of seven items"""
        for _ in range(10_000):
            sorted([3,2,4,5,1,5,3])

    def sort_three():
        """Sort a list of three items"""
        for _ in range(10_000):
            sorted([3,2,4])

    __benchmarks__ = [
        (sort_seven, sort_three, "Sorting 3 items instead of 7")
    ]

Michael #5: typeguard
A run-time type checker for Python.
Three principal ways to do type checking are provided, each with its pros and cons:
  Manually with function calls
  @typechecked decorator
  import hook (typeguard.importhook.install_import_hook())
Example:

    from typeguard import typechecked

    @typechecked
    def some_function(a: int, b: float, c: str, *args: str) -> bool:
        ...
        return retval

Nick #6: CustomTkinter
A modern and customizable Python UI library based on Tkinter.

Extras
Michael: OpenSSF Funds Python and Eclipse Foundations - OpenSSF’s Alpha-Omega Project has committed $400K to the Python Software Foundation (PSF), in order to create a new role which will provide security expertise for Python, the Python Package Index (PyPI), and the rest of the Python ecosystem, as well as funding a security audit. (via Python Weekly)
Nick: Terms of Service; Didn’t Read - “Terms of Service; Didn't Read” (short: ToS;DR) is a young project started in June 2012 to help fix the “biggest lie on the web”: almost no one really reads the terms of service we agree to all the time.

Joke: Serverless
A DevOps approach to COVID-19
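To make the @typechecked idea from the typeguard item concrete, here is a rough stdlib-only sketch of a decorator that checks arguments and the return value at call time. It is not how typeguard is implemented: simple_typechecked is a hypothetical helper, it only enforces plain classes (generics and special forms such as Any are out of scope), and the function body is made up for illustration.

```python
import functools
import inspect
from typing import get_origin, get_type_hints


def simple_typechecked(func):
    """Hypothetical helper: enforce plain-class annotations on every call."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    def check(label, value, expected):
        # Only plain classes are enforced; parameterized generics are skipped.
        if get_origin(expected) is None and isinstance(expected, type):
            if not isinstance(value, expected):
                raise TypeError(
                    f"{label} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            if name not in hints:
                continue
            kind = signature.parameters[name].kind
            if kind is inspect.Parameter.VAR_POSITIONAL:
                # *args: the annotation applies to each extra positional value.
                for item in value:
                    check(name, item, hints[name])
            elif kind is inspect.Parameter.VAR_KEYWORD:
                # **kwargs: the annotation applies to each extra keyword value.
                for item in value.values():
                    check(name, item, hints[name])
            else:
                check(name, value, hints[name])
        result = func(*args, **kwargs)
        if "return" in hints:
            check("return value", result, hints["return"])
        return result

    return wrapper


@simple_typechecked
def some_function(a: int, b: float, c: str, *args: str) -> bool:
    # Illustrative body only; the show notes elide the real one.
    return len(c) + len(args) > a
```

Note that this sketch is strict about float: passing the int 2 for b raises TypeError, which may or may not match what a full checker does.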
#289 Textinator is coming for your text, wherever it is
Special guest: Gina Häußge, creator & maintainer of OctoPrint

Michael #1: beanita
Local MongoDB-like database prepared to work with Beanie ODM.
So, you know Beanie: Pydantic + async + MongoDB.
And you know Mongita: Mongita is to MongoDB as SQLite is to SQL.
Beanita lets you use Beanie, but against Mongita rather than a server-based MongoDB.

Brian #2: The Good Research Code Handbook
By Patrick J Mineault.
“for grad students, postdocs and PIs (principal investigators) who do a lot of programming as part of their research.”
Lessons:
  setup: git, virtual environments, project layout, packaging, cookie cutter
  style: style guides, keeping things clean
  coding: separating concerns, separating pure functions and those with side effects, pythonic-ness
  testing: unit testing, testing with side effects, … (incorrect definition of end-to-end tests, but a good job at covering the other bits)
  documentation: comments, tests, docstrings, README.md, usage docs, tutorials, websites; documenting pipelines and projects
  social aspects: various reviews, pairing, open source, community
  sample project
  extras: testing example, good tools to use

Gina #3: CadQuery
Python lib to build parametric 3D CAD models.
Can output STL, STEP, AMF, SVG and some more.
Uses the same geometry kernel as FreeCAD (OpenCascade).
Also available: desktop editor, Jupyter extension, CLI.
Would recommend the Jupyter extension; the app seems a bit behind the latest development.
Jupyter extension is easy to set up on Docker and comes with a nice 3D preview pane.
Was able to create a basic parametric design of an insert for an assortment box easily.
Python 3.8+, not yet 3.11, OpenCascade related.

Michael #4: Textinator
Like TextSniper, but in Python.
Simple macOS StatusBar / Menu Bar app to automatically detect text in screenshots.
Built with RUMPS: Ridiculously Uncomplicated macOS Python Statusbar apps.
Take a screenshot of a region of the screen using ⌘ + ⇧ + 4 (Cmd + Shift + 4). The app will automatically detect any text in the screenshot and copy it to your clipboard.
How Textinator works:
  At startup, Textinator starts a persistent NSMetadataQuery Spotlight query (using the pyobjc Python-to-Objective-C bridge) to detect when a new screenshot is created.
  When the user creates a screenshot, the NSMetadataQuery query is fired and Textinator performs text detection using a Vision VNRecognizeTextRequest call.

Brian #5: Handling Concurrency Without Locks
“How to not let concurrency cripple your system”
By Haki Benita.
“…common concurrency challenges and how to overcome them with minimal locking.”
Starts with a Django web app: a URL shortener that generates a unique short URL and stores the result in a database so it doesn’t get re-used.
Discussions of:
  collisions when two users check, then store, keys at the same time
  locking problems in general
  using the database's ability to make sure some items are unique, in this case PostgreSQL
  updating your code to take advantage of database constraint support so you can do less locking within your code

Gina #6: TatSu
Generates parsers from EBNF grammars (or ANTLR).
Can compile the model (similar to regex) for quick reuse or generate Python source.
Many examples provided.
Active development, Python 3.10+.

Extras
Michael:
  Back on 285 we spoke about PEP 690. Now there is a proper blog post about it.
  Expedited release of Python 3.11.0b3 - Due to a known incompatibility with pytest and the previous beta release (Python 3.11.0b2), and after some deliberation, the Python release team decided to do an expedited release of Python 3.11.0b3 so the community can continue testing their packages with pytest and therefore testing the betas as expected. (via Python Weekly)
  Kagi search, via Daniel Hjertholm: Not really Python related, but if I know Michael right, he'll love the new completely ad-free and privacy-respecting search engine kagi.com. I've used kagi.com since their public beta launched, mainly to search for solutions to Python issues at work. The results are way better than DuckDuckGo's results, and even better than Google's! Love the Programming lens and the ability to up/down prioritize domains in the results. Their FAQ explains everything you need to know: https://kagi.com/faq
  Looks great but not sure about the pricing justification (32 sec of compute = $1); that’s either 837x more than all of Talk Python + Python Bytes or more than 6,700x more than just one of our sites/services. (We spend about $100/mo on 8 servers.) But they may be buying results from Google and Bing, and that could be the cost.
  Here's a short interview with the man who started Kagi.
Gina:
  rdserialtool: Reads out low-cost USB power monitors (UM24C, UM25C, UM34C) via BLE/pybluez. Amazing if you need to monitor the power consumption/voltage/current of some embedded electronics on a budget. Helped me solve a very much OctoPrint-development-specific problem. Python 3.4+
  nodejs-bin, by Sam Willis: https://twitter.com/samwillis/status/1537787836119793667 Install nodejs via PyPI/as a dependency. Still very much an alpha but looks promising. Makes it easier to obtain a full stack environment. Very interesting for end-to-end testing with JS-based tooling, or packaging a frontend with your Python app. See also nodeenv, which does a similar thing, but with additional steps.

Joke: Rejected Github Badges
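The core idea in the concurrency item (let a database uniqueness constraint reject duplicates instead of checking first and locking) can be sketched without Django or PostgreSQL. This version swaps in the stdlib sqlite3 module so it runs anywhere; the table name, column names, and the key_factory parameter are assumptions made for illustration, not taken from the article.

```python
import secrets
import sqlite3


def make_connection():
    """Create an in-memory database whose primary key enforces uniqueness."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE short_url (short_key TEXT PRIMARY KEY, target TEXT NOT NULL)"
    )
    return conn


def create_short_url(conn, target, key_factory=None, max_attempts=5):
    """Insert first and retry on collision, rather than check-then-insert."""
    if key_factory is None:
        key_factory = lambda: secrets.token_urlsafe(4)
    for _ in range(max_attempts):
        key = key_factory()
        try:
            with conn:  # commits on success, rolls back if the insert fails
                conn.execute(
                    "INSERT INTO short_url (short_key, target) VALUES (?, ?)",
                    (key, target),
                )
        except sqlite3.IntegrityError:
            # Someone else already owns this key: generate another one.
            continue
        return key
    raise RuntimeError("could not find a free short key")
```

Because the database makes the final call on uniqueness, there is no window between "check" and "store" for a second user to slip into.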
#288 Performance benchmarks for Python 3.11 are amazing
Brian #1: Polars: Lightning-fast DataFrame library for Rust and Python
Suggested by several listeners.
“Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as memory model.
  Lazy | eager execution
  Multi-threaded
  SIMD (Single Instruction/Multiple Data)
  Query optimization
  Powerful expression API
  Rust | Python | ...”
Python API syntax is set up to allow parallel execution while sidestepping GIL issues, for both lazy and eager use cases. From the docs: Do not kill parallelization.
The syntax is very functional and pipeline-esque:

    import polars as pl

    q = (
        pl.scan_csv("iris.csv")
        .filter(pl.col("sepal_length") > 5)
        .groupby("species")
        .agg(pl.all().sum())
    )

    df = q.collect()

The Polars User Guide is excellent and looks like it’s entirely written with Python examples.
Includes a 30 min intro video from PyData Global 2021.

Michael #2: PSF Survey is out
Have a look; their page summarizes it better than my bullet points will.

Brian #3: Gin Config: a lightweight configuration framework for Python
Found through Vincent D. Warmerdam’s excellent intro videos on gin on calmcode.io.
Quickly make parts of your code configurable through a configuration file with the @gin.configurable decorator.
It’s an interesting take on config files. (Example from Vincent)

    # simulate.py
    import gin

    @gin.configurable
    def simulate(n_samples):
        ...

    # config.py
    simulate.n_samples = 100

You can specify:
  required settings: def simulate(n_samples=gin.REQUIRED)
  blacklisted settings: @gin.configurable(blacklist=["n_samples"])
  external configurations (specify values for functions your code is calling)
  references to other functions: dnn.activation_fn = @tf.nn.tanh
Documentation suggests that it is especially useful for machine learning. From the motivation section:
“Modern ML experiments require configuring a dizzying array of hyperparameters, ranging from small details like learning rates or thresholds all the way to parameters affecting the model architecture. Many choices for representing such configuration (proto buffers, tf.HParams, ParameterContainer, ConfigDict) require that model and experiment parameters are duplicated: at least once in the code where they are defined and used, and again when declaring the set of configurable hyperparameters. Gin provides a lightweight dependency injection driven approach to configuring experiments in a reliable and transparent fashion. It allows functions or classes to be annotated as @gin.configurable, which enables setting their parameters via a simple config file using a clear and powerful syntax. This approach reduces configuration maintenance, while making experiment configuration transparent and easily repeatable.”

Michael #4: Performance benchmarks for Python 3.11 are amazing
Via Eduardo Orochena.
Performance may be the biggest feature of all.
Python 3.11 has:
  task groups in asyncio
  fine-grained error locations in tracebacks
  the Self type for methods that return an instance of their class
The "Faster CPython Project" aims to speed up the reference implementation. See my interview with Guido and Mark: talkpython.fm/339
Python 3.11 is 10-60% faster than Python 3.10 according to the official figures, with a 1.22x speed-up on their standard benchmark suite.
Arriving as a stable release in October.

Extras
Michael:
  Python 3.10.5 is available (changelog)
  Raycast (vs Spotlight), e.g. CMD+Space => pypi search

Joke: Why wouldn't you choose a parrot for your next application
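For a feel of what "dependency injection driven" configuration means in the Gin Config item, here is a toy stdlib-only sketch. It is not gin's implementation or API: configurable, parse_config, and the module-level _BINDINGS dict are hypothetical names, and the config format is reduced to name = literal lines.

```python
import ast
import functools
import inspect

# Hypothetical global registry: "function.parameter" -> configured value.
_BINDINGS = {}


def parse_config(text):
    """Read lines such as 'simulate.n_samples = 100' into the registry."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, _, raw_value = line.partition("=")
        _BINDINGS[name.strip()] = ast.literal_eval(raw_value.strip())


def configurable(func):
    """Fill in parameters from the registry when the caller omits them."""
    signature = inspect.signature(func)
    prefix = func.__name__ + "."

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Parameters the caller passed explicitly always win.
        supplied = signature.bind_partial(*args, **kwargs).arguments
        for name, value in _BINDINGS.items():
            parameter = name[len(prefix):]
            if name.startswith(prefix) and parameter not in supplied:
                kwargs[parameter] = value
        return func(*args, **kwargs)

    return wrapper


@configurable
def simulate(n_samples):
    return n_samples


parse_config("simulate.n_samples = 100")
print(simulate())   # value injected from the config text
print(simulate(5))  # explicit argument overrides the config
```

The point of the pattern is that the parameter is declared once, in the function signature, and the config file only names it.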
#287 Surprising ways to use Jupyter Notebooks
Michael #1: auto-py-to-exe
Converts .py to .exe using a simple graphical interface.
A good candidate to install via pipx.
For me, just point it at the top-level app.py file and click go.
Can add icons, etc.
Got a .app version and CLI version (I think 😉 ).
Required brew install python-tk to get tkinter on my Mac.
I tested it against my URLify app. Oddly, it only ran on Python 3.9 but not 3.10.

Brian #2: 8 surprising ways how to use Jupyter Notebook
By Aleksandra Płońska, Piotr Płoński.
Fun romp through ways you can use and abuse notebooks:
  package development
  web app
  slides
  book
  blog
  report
  dashboard
  REST API

Michael #3: piptrends
By Tankala Ashok.
Use piptrends.com for comparing Python package downloads and GitHub statistics.
When researching which Python package to use, I would check multiple places before deciding, so I thought of putting all of those things in a single place.
Inspired by npmtrends.com.

Brian #4: Is it a class or a function? It's a callable!
By Trey Hunner.
It’s kinda hard to tell in Python. Actually, impossible to tell from staring at the calling code.
“Of the 69 “built-in functions” listed in the Python Built-In Functions page, only 42 are actually implemented as functions: 26 are classes and 1 (help) is an instance of a callable class. Of the 26 classes among those built-in “functions”, four were actually functions in Python 2 (the now-lazy map, filter, range, and zip) but have since become classes. The Python built-ins and the standard library are both full of maybe-functions-maybe-classes.”
len - yep, that’s a function.
zip - that’s a class.
reversed, enumerate, range, and filter “functions” are all classes. But callable classes.
Cool discussion of callable objects: partials, itemgetters, iterators, generators, factory functions, …

Extras
Brian: What’s in which Python - Ned Batchelder. Brief bullet list of a few memorable changes in versions 2.1 through 3.11.
Michael:
  Orion Browser, via Dan Bader
  PSF 2021 Survey Results are out (full analysis next week)

Joke: async problems
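You can check the class-or-function question from the callable item yourself with the stdlib inspect module. A small sketch follows; describe and Greeter are made-up names for illustration.

```python
import inspect


class Greeter:
    """A class whose instances are themselves callable."""

    def __call__(self, name):
        return f"Hello, {name}!"


def describe(obj):
    """Classify an object by how it becomes callable."""
    if inspect.isclass(obj):
        return "class"
    if inspect.isroutine(obj):
        return "function"
    if callable(obj):
        return "callable instance"
    return "not callable"


for candidate in (len, zip, reversed, enumerate, range, filter, Greeter, Greeter()):
    name = getattr(candidate, "__name__", type(candidate).__name__)
    print(f"{name}: {describe(candidate)}")
```

From the calling side all of these look identical (name, parentheses, arguments), which is the article's point.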
#286 Unreasonable f-strings
Brian #1: The Python GIL: Past, Present, and Future
By Barry Warsaw and Paweł Polewicz.

Michael #2: Announcing the PyOxy Python Runner
PyOxy is all of the following:
  An executable program used for running Python interpreters.
  A single file and highly portable (C)Python distribution.
  An alternative python driver providing more control over the interpreter than what python itself provides.
  A way to make some of PyOxidizer's technology more broadly available without using PyOxidizer.
PyOxidizer is often used to generate binaries embedding a Python interpreter and a custom Python application. However, its configuration files support additional functionality, such as the ability to produce Windows MSI installers, macOS application bundles, and more.
The pyoxy executable also embeds a copy of the Python standard library and imports it from memory using the oxidized_importer Python extension module.

Brian #3: The unreasonable effectiveness of f-strings and re.VERBOSE

Michael #4: PyCharm PR Management
Really nice but not very discoverable.
Not covered in the docs, but super useful.
Available in the pro and free community editions.
Steps:
  Open a project that has an associated GitHub git repo.
  If the GitHub repo has a PR, you’ll see it in the Pull Requests tab.
  Browse the PRs, and open them for details.
  There you can see the comments, close or merge it, and more.
  Most importantly, check it out to see how it works.

Extras
Brian: Pandas Tutor: Using Pyodide to Teach Data Science at Scale
Michael:
  Python + pyscript + WebAssembly: Python Web Apps, Running Locally with pyscript video is out
  And an iOS Python Apps video too

Joke: Losing an orm!
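The notes above only give the title of the f-strings and re.VERBOSE item, so the following is a guess at the general technique rather than a summary of the article: build a regex from small named pieces with an f-string, and use re.VERBOSE so the pattern can carry whitespace and comments. The date pattern and the names YEAR, MONTH, DAY, and ISO_DATE are invented for this sketch.

```python
import re

# Small reusable pieces; quantifier braces live here, outside the f-string,
# so they do not need to be doubled.
YEAR = r"(?P<year>\d{4})"
MONTH = r"(?P<month>0[1-9]|1[0-2])"
DAY = r"(?P<day>0[1-9]|[12]\d|3[01])"

# re.VERBOSE ignores the layout whitespace and the trailing comments.
ISO_DATE = re.compile(
    rf"""
    ^
    {YEAR}   # four-digit year
    -
    {MONTH}  # month, 01 to 12
    -
    {DAY}    # day, 01 to 31
    $
    """,
    re.VERBOSE,
)

match = ISO_DATE.match("2022-06-28")
print(match.groupdict() if match else None)
```

Each piece can be tested on its own and reused in other patterns, which is hard to do with one long regex literal.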
#285 Where we talk about UIs and Python
Sponsored by Red Hat: Compiler Podcast
Special guests: Mark Little, Ben Cosby

Michael #1: libgravatar
A library that provides a Python 3 interface to the Gravatar APIs.
If you have users and want to show some sort of an image, Gravatar is OK.
PyPI uses this, for example (Gravatar, not necessarily this lib).
Usage:

    >>> from libgravatar import Gravatar
    >>> g = Gravatar('myemailaddress@example.com')
    >>> g.get_image()
    'https://www.gravatar.com/avatar/0bc83cb571cd1c50ba6f3e8a78ef1346'

Brian #2: JSON to Pydantic Converter
Suggested by Chun Ly: “this awesome JSON to @samuel_colvin's pydantic is so useful. It literally saved me days of work with a complex nested JSON schema.”
“JSON to Pydantic is a tool that lets you convert JSON objects into Pydantic models.”
It’s a live site, where you can plop JSON on the left, and Pydantic models show up on the right.
There are a couple of options:
  Specify every field as Optional
  Alias camelCase fields as snake_case
It’s also an open source project, built with FastAPI, Create React App, and a project called datamodel-code-generator.

Mark #3: tailwindcss, tailwindui
Not Python, but helpful for web UI and an open source business model example.
tailwindcss generates CSS.
Used on the Lexchart app.
Benefits of tailwindcss and tailwindui:
  Just-in-Time makes it fast.
  Output includes only classes used for the project.
  Stand on the shoulders of design thinking from Steve Schoger and Adam Wathan. See also refactoringui.com.
  Use in current projects without CSS conflicts. Custom namespace with prefix in tailwind.config.js.
  Bonus: custom namespace prefixes work with the tailwind plug-ins for VS Code and PyCharm.
  Works well with template engines like Chameleon.
We use tailwind for our app UI. Toolbar template example.
Another example of docs and tutorials being a strategic business asset.
Resources:
  tailwindcss.com
  tailwindlabs on YouTube, great tutorials from Simon at Tailwind
  Beginner-friendly tutorials: Thirus, example of tailwind install methods

Michael #4: PEP 690 – Lazy Imports
From Itamar.
Discussion at https://discuss.python.org/t/pep-690-lazy-imports/15474
The PEP proposes a feature to transparently defer the execution of imported modules until the moment when an imported object is used.
PEP 8 says imports go at the top, which means you pay the full price of importing code up front.
This means that importing the main module of a program typically results in an immediate cascade of imports of most or all of the modules that may ever be needed by the program.
Lazy imports also mostly eliminate the risk of import cycles or crashes.
The implementation in this PEP has already demonstrated startup time improvements up to 70% and memory-use reductions up to 40% on real-world Python CLIs.

Brian #5: Two small items
pytest-rich
  Suggested by Brian Skinn.
  Created by Bruno Oliveira as a proof of concept.
  pytest + rich, what’s not to love?
  Now we just need a maintainer or two or three….
Embedding images in GitHub README
  Suggested by Henrik Finsberg.
  Video by Anthony Sottile.
  This is WITHOUT putting the image in the repo.
  Upload or drop an image to an issue comment.
  Don’t save the comment, just wait for GitHub to upload it to their CDN.
  GH will add a markdown link in the comment text box with a link to the now uploaded image.
  Now you can use that image in a README file.
  You can do the same while editing the README in the online editor.

Ben #6: pyotp
A library for generating and verifying one-time passwords (OTP).
Helpful for implementing multi-factor authentication (MFA) in web applications.
Supports HMAC-based one-time passwords (HOTP) and time-based one-time passwords (TOTP).
While HOTP delivered via SMS text messages is a common approach to implementing MFA, SMS is not really secure.
TOTP using an authenticator app on the user’s device such as Google Authenticator or Microsoft Authenticator is more secure, fairly easy to implement, and free (no SMS messaging fees and multiple free authenticator apps available for users).
TOTP works best by making a QR code available to simplify the setup for the user in their authenticator app.
Lots of easy-to-implement QR code generators to choose from (qrcode is a popular one if you use JavaScript on the front end).
TOTP quick reference:

    import pyotp

    def generate_shared_secret():
        # securely store this shared secret with user account data
        return pyotp.random_base32()

    def generate_provisioning_uri(secret, email):
        # generate uri for a QR code from the user's shared secret and email address
        return pyotp.totp.TOTP(secret).provisioning_uri(name=email, issuer_name='YourApp')

    def verify_otp(secret, otp):
        # verify user's one-time password entry with their shared secret
        totp = pyotp.TOTP(secret)
        return totp.verify(otp)

Extras
Brian:
  PyConUS 2022 videos now up
  A few more Python-related extensions for VSCode: black, pylint, isort, and Jupyter PowerToys
  Work has begun on a pytest course. Saying this in public to inspire me to finish it. No ETA yet.
  Sad Python Girls Club podcast
Michael:
  PyTorch M1
  Mission Encodable
  PWAs and pyscript: Michael's now released pyscript PWA YouTube video
  cal.com (open source calendly)
  Supabase (open source Firebase)

Joke: Beginner problems
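If you are curious what a library like pyotp computes under the hood, HOTP (RFC 4226) and TOTP (RFC 6238) fit in a few lines of stdlib Python. This is a sketch of the published algorithms, not pyotp's code: it takes the shared secret as raw bytes (pyotp works with base32 strings), uses the default HMAC-SHA1 with a 30-second step, and the function names are my own.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, candidate: str, at=None) -> bool:
    """Compare in constant time to avoid leaking how many digits matched."""
    return hmac.compare_digest(totp(secret, at=at), candidate)


# Shared secret from the RFC test vectors; a real one must be random.
RFC_SECRET = b"12345678901234567890"
print(totp(RFC_SECRET, at=59))
```

For real applications, stick with a maintained library; it also handles things this sketch ignores, such as provisioning URIs and accepting codes from adjacent time windows.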
#284 Spicy git for Engineers
Brian #1: distinctipy
“distinctipy is a lightweight python package providing functions to generate colours that are visually distinct from one another.”
Small, focused tool, but really cool.
Say you need to plot a dynamic number of lines. Why not let distinctipy pick colors for you that will be distinct?
Also can display the color swatches.
Some example palettes here: https://github.com/alan-turing-institute/distinctipy/tree/main/examples

    from distinctipy import distinctipy

    # number of colours to generate
    N = 36

    # generate N visually distinct colours
    colors = distinctipy.get_colors(N)

    # display the colours
    distinctipy.color_swatch(colors)

Michael #2: Soda SQL
Soda SQL is a free, open-source command-line tool.
It utilizes user-defined input to prepare SQL queries that run tests on datasets in a data source to find invalid, missing, or unexpected data.
Looks good for data pipelines and other CI/CD work!

Daniel #3: Python in Nature
There’s a review article from Sept 2020 on array programming with NumPy in the research journal Nature.
For reference, in grad school we had a fancy paper on quantum entanglement that got rejected from Nature Communications, a sub-journal to Nature. Nature is hard to get into.
List of authors includes Travis Oliphant, who started NumPy.
Covers NumPy as the foundation, building up to specialized libraries like QuTiP for quantum computing.
If you search “Python” on their site, many papers come up. Interesting to see their take on publishing software work.

Brian #4: Supercharging GitHub Actions with Job Summaries
From a tweet by Simon Willison and an article: GH Actions job summaries.
Also, Ned Batchelder is using it for Coverage reports.
“You can now output and group custom Markdown content on the Actions run summary page.”
“Custom Markdown content can be used for a variety of creative purposes, such as:
  Aggregating and displaying test results
  Generating reports
  Custom output independent of logs”
Coverage.py example:

    - name: "Create summary"
      run: |
        echo '### Total coverage: ${{ env.total }}%' >> $GITHUB_STEP_SUMMARY
        echo '[${{ env.url }}](${{ env.url }})' >> $GITHUB_STEP_SUMMARY

Michael #5: Language Summit write-up is out
Via Itamar, by Alex Waygood.
  Python without the GIL: A talk by Sam Gross
  Reaching a per-interpreter GIL: A talk by Eric Snow
  The "Faster CPython" project: 3.12 and beyond: A talk by Mark Shannon
  WebAssembly: Python in the browser and beyond: A talk by Christian Heimes
  F-strings in the grammar: A talk by Pablo Galindo Salgado
  Cinder Async Optimisations: A talk by Itamar Ostricher
  The issue and PR backlog: A talk by Irit Katriel
  The path forward for immortal objects: A talk by Eddie Elizondo and Eric Snow
  Lightning talks, featuring short presentations by Carl Meyer, Thomas Wouters, Kevin Modzelewski, Samuel Colvin and Larry Hastings

Daniel #6: AllSpice is Git for EEs
Software engineers have Git/SVN/Mercurial/etc.
None of the other engineering disciplines (mechanical, electrical, optical, etc.) have it nearly as good.
Altium has their Vault and “365,” but there’s nothing with a Git-like UX.
Supports version history, diffs, all the things you expect. Even self-hosting and a Gov Cloud version.
“Bring your workflow to the 21st century, finally.”

Extras
Brian:
  Will McGugan talks about Rich, Textual, and Textualize on Test & Code 188
  Also 3 other episodes since last week. (I have a backlog I’m working through.)
Michael:
  Power On-Xbox Documentary | Full Movie
  The 4 Reasons To Branch with Git - Illustrated Examples with Python
  A Python spotting - via Jason Pecor
  2022 StackOverflow Developer Survey is live, via Brian
  TextSniper macOS App
  PandasTutor on webassembly
Daniel: I know Adafruit’s a household name; shout-out to Sparkfun, Seeed Studio, OpenMV, and other companies in the field.

Joke: A little awkward
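The job summary mechanism from Brian's GitHub Actions item is just a file: the runner sets the GITHUB_STEP_SUMMARY environment variable to a path, and any Markdown appended there shows up on the run summary page. So a Python step can write to it directly instead of using echo. A minimal sketch, where append_job_summary is a hypothetical helper name and the environ parameter exists only to make it testable outside Actions:

```python
import os


def append_job_summary(markdown, environ=None):
    """Append Markdown to the Actions job summary file, if one is configured."""
    environ = os.environ if environ is None else environ
    summary_path = environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        # Not running under GitHub Actions: nothing to write to.
        return False
    with open(summary_path, "a", encoding="utf-8") as handle:
        handle.write(markdown.rstrip("\n") + "\n")
    return True


if __name__ == "__main__":
    # The coverage figure here is a placeholder value for illustration.
    append_job_summary("### Total coverage: 98%")
```

Returning False instead of raising keeps the same script usable on a developer machine.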
#283 The sports episode
Sponsored by Red Hat: Compiler Podcast
Special guest: Tonya Sims

Michael #1: Pathy: a Path interface for local and cloud bucket storage
Via Spencer.
Pathy is a Python package (with type annotations) for working with Cloud Bucket storage providers using a pathlib interface.
It provides an easy-to-use API bundled with a CLI app for basic file operations between local files and remote buckets.
It enables a smooth developer experience by letting developers work against the local file system during development and only switch over to live APIs for deployment.
Also has optional local file caching.
From Spenser: The really cool function is "Pathy.fluid", which can take any type of local, GCS, or S3 path string and then just give you back a Path object that you can interact with, agnostic of what platform it was. So this has worked amazingly for me in local testing since I can just change the file path from the "s3://bucket/path" that I use in prod to a local "test_dir/path" and it works automatically.

Brian #2: Robyn
“Robyn is a fast, high-performance Python web framework with a Rust runtime.”
Hello, Robyn! - intro article
docs, repo
Neat things:
  doesn’t need WSGI or ASGI
  async
  very Flask-like
Early, so still needs some TLC: docs, etc. Getting started and demo apps would be good.

Tonya #3: Python package 'nba_api' is a package to access data for NBA.com
This package is maintained by Swar Patel.
API client package for NBA.com, with more accessible endpoints and better documentation.
The NBA.com APIs are not well documented and change frequently (player traded, injured, retired, points per game, stats, etc.).
The nba_api package has tons of features:
  The nba_api starts with static data on players and teams (full name, team name, etc.). Each player and team has an id.
  Can get game data from the playergamelog API endpoint.
  The package also has many different API endpoints that it can hit by passing in features from the static data to the API endpoints as parameters.

Michael #4: Termshot
From Jay Miller.
Creates screenshots based on terminal command output.
Just run termshot YOUR_CMD or termshot --show-cmd -- python program.py
Even termshot /bin/zsh for full interactive “recording”.

Brian #5: When Python can’t thread: a deep-dive into the GIL’s impact
By Itamar Turner-Trauring.
Building a mental model of the GIL using profiler graphs of simple two-thread applications.
The graphs really help a lot to see when the CPU is active or waiting on each thread.

Tonya #6: Sportsipy: A free sports API written for Python
Free Python API that pulls the stats from www.sports-reference.com
sports-reference.com - great website for getting sports stats for professional sports (NBA, NFL, NHL, MLB, college sports).
Looks like an HTML website from the 90s - great for scraping (email site owners).
You can get API queries for every sport (North American sports), like the list of teams for that sport, the date and time of a game, the total number of wins for a team during the season, and many more team-related metrics.
You can also get stats from players and box scores, so you can build cool stuff around how a team performed during a game or during a season.

Extras
Michael:
  Python 3.11.0 beta 1 is out
  Test with GitHub Actions against Python 3.11

Joke: Finding my family
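A rough way to see the effect from the GIL deep-dive item on your own machine: run the same CPU-bound loop twice in a row, then in two threads, and compare wall-clock time. This is not code from the article, and the numbers depend on your interpreter; on a standard CPython build the threaded run usually takes about as long as the sequential one, while a free-threaded build may behave differently.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound work; holds the GIL while it runs."""
    while n > 0:
        n -= 1
    return n


def sequential(n):
    count_down(n)
    count_down(n)


def threaded(n):
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def timed(label, work):
    """Run work() once and report the elapsed wall-clock time."""
    start = time.perf_counter()
    work()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return elapsed


if __name__ == "__main__":
    N = 2_000_000
    timed("sequential", lambda: sequential(N))
    timed("two threads", lambda: threaded(N))
```

Swap count_down for something that waits on I/O (for example time.sleep) and the threaded version pulls ahead, which matches the active-versus-waiting picture in the profiler graphs.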
#282 Don't Embarrass Me in Front of The Wizards
Watch the live stream: Watch on YouTube About the show Sponsored by us! Support our work through: Our courses at Talk Python Training Test & Code Podcast Patreon Supporters Brian #1: pyscript Python in the browser, from Anaconda. repo here Announced at PyConUS “During a keynote speech at PyCon US 2022, Anaconda’s CEO Peter Wang unveiled quite a surprising project — PyScript. It is a JavaScript framework that allows users to create Python applications in the browser using a mix of Python and standard HTML. The project’s ultimate goal is to allow a much wider audience (for example, front-end developers) to benefit from the power of Python and its various libraries (statistical, ML/DL, etc.).” from a nice article on it, PyScript — unleash the power of Python in your browser PyScript is built on Pyodide, which is a port of CPython based on WebAssembly. Demos are cool. Note included in README: “This is an extremely experimental project, so expect things to break!” Michael #2: Memray from Bloomberg Memray is a memory profiler for Python. It can track memory allocations in Python code native extension modules the Python interpreter itself Works both via CLI and focused app calls Memray can help with the following problems: Analyze allocations in applications to help discover the cause of high memory usage. Find memory leaks. Find hotspots in code which cause a lot of allocations. Notable features: 🕵️♀️ Traces every function call so it can accurately represent the call stack, unlike sampling profilers. ℭ Also handles native calls in C/C++ libraries so the entire call stack is present in the results. 🏎 Blazing fast! Profiling causes minimal slowdown in the application. Tracking native code is somewhat slower, but this can be enabled or disabled on demand. 📈 It can generate various reports about the collected memory usage data, like flame graphs. 🧵 Works with Python threads. 👽🧵 Works with native-threads (e.g. C++ threads in native extensions) Has a live view in the terminal. 
Linux only Brian #3: pytest-parallel I’ve often sped up tests that can be run in parallel by using -n from pytest-xdist. I was recommending this to someone on Twitter, and Bruno Oliveira suggested a couple of alternatives. One was pytest-parallel, so I gave it a try. pytest-xdist runs using multiprocessing pytest-parallel uses both multiprocessing and multithreading. This is especially useful for test suites containing threadsafe tests. That is, mostly, pure software tests. Lots of unit tests are like this. System tests are often not. Use the --workers flag for multiple processes, --workers auto works great. Use --tests-per-worker for multi-threading. --tests-per-worker auto lets it pick. Very cool alternative to xdist. Michael #4: Pooch: A friend for data files via Matthew Feickert Just want to download a file without messing with requests and urllib? Who is it for? Scientists/researchers/developers looking to simply download a file. Pooch makes it easy to download a file (one function call). On top of that, it also comes with some bonus features: Download and cache your data files locally (so it’s only downloaded once). Make sure everyone running the code has the same version of the data files by verifying cryptographic hashes. Multiple download protocols HTTP/FTP/SFTP and basic authentication. Download from Digital Object Identifiers (DOIs) issued by repositories like figshare and Zenodo. Built-in utilities to unzip/decompress files upon download file_path = pooch.retrieve(url, known_hash=None) Extras Michael: New course! Up and Running with Git - A Pragmatic, UI-based Introduction. Joke: Don’t embarrass me in front of the wizards Michael’s crashing github is embarrassing him in front of the wizards!
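To make the "download once, then verify a hash" idea from the Pooch item concrete, here is a rough standard-library sketch. It is not Pooch's implementation: fetch_cached, fake_download, and the file name are hypothetical names made up for illustration, and the "download" is an in-memory stand-in so nothing touches the network.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def fetch_cached(name: str, cache_dir: Path, expected_sha256: str, download) -> Path:
    """Return a verified local copy of `name`, downloading only if it is missing.

    `download` is a hypothetical callable that returns the file's bytes; in
    real code it would wrap an HTTP/FTP client.
    """
    target = cache_dir / name
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        target.write_bytes(download())
    # Verify on every call so a corrupted or swapped file is caught.
    if sha256_of(target) != expected_sha256:
        raise ValueError(f"hash mismatch for {name}")
    return target


# Demo with an in-memory "download" (an assumption for the example).
calls = []


def fake_download() -> bytes:
    calls.append(1)
    return b"temperature,1.5\n"


expected = hashlib.sha256(b"temperature,1.5\n").hexdigest()
cache = Path(tempfile.mkdtemp())
first = fetch_cached("data.csv", cache, expected, fake_download)
second = fetch_cached("data.csv", cache, expected, fake_download)
print(first == second, len(calls))
```

The second call finds the cached file and skips the download, while the hash check still runs both times.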
#281 ohmyzsh + ohmyposh + mcfly + pls + nerdfonts = wow
Watch the live stream: Watch on YouTube About the show Sponsored by Red Hat: Compiler Podcast Special guest: Anna Astori Michael #1: Take Your Github Repository To The Next Level 🚀️ Step 0. Make Your Project More Discoverable Step 1. Choose A Name That Sticks Step 2. Display A Beautiful Cover Image Step 3. Add Badges To Convey More Information Step 4. Write A Convincing Description Step 5. Record Visuals To Attract Users 👀 Step 6. Create A Detailed Installation Guide (if needed) Step 7. Create A Practical Usage Guide 🏁 Step 8. Answer Common Questions Step 9. Build A Supportive Community Step 10. Create Contribution Guidelines Step 11. Choose The Right License Step 12. Plan Your Future Roadmap Step 13. Create Github Releases (know release drafter) Step 14. Customize Your Social Media Preview Step 15. Launch A Website Brian #2: Fastero “Python timeit CLI for the 21st century.” Arian Mollik Wasi, @wasi_master Colorful and very usable benchmarking/comparison tool Time or compare one or more code snippets or Python files (mix and match, even) Allows setup code before snippets run Multiple output export formats: markdown, html, csv, json, images, … Lots of customization possible Takeaway: especially for comparing two+ options, this is super handy Anna #3: langid vs langdetect langdetect This library is a direct port of Google's language-detection library from Java to Python langdetect supports 55 languages out of the box (ISO 639-1 codes): Basic usage: detect() and detect_langs() great to work with noisy data like social media and web blogs being statistical, works better on larger pieces of text vs short posts langid hasn't been updated for a few years 97 languages can use Python's built-in wsgiref.simple_server (or fapws3 if available) to provide language identification as a web service. To do this, launch python langid.py -s, and access http://localhost:9008/detect . The web service supports GET, POST and PUT. 
the actual calculations are implemented in the log-probability space but can also have a "confidence" score for the probability prediction between 0 and 1: > from langid.langid import LanguageIdentifier, model > identifier = LanguageIdentifier.from_modelstring(model, norm_probs=True) > identifier.classify("This is a test") > ('en', 0.9999999909903544) - minimal dependencies - relatively fast - NB algo, can train on user data. Michael #4: Watchfiles by Samuel Colvin (of Pydantic fame) Simple, modern and high performance file watching and code reload in Python. Underlying file system notifications are handled by the Notify Rust library. Supports sync watching but also async watching CLI example: running and restarting a command. Let's say you want to re-run failing tests whenever files change. You could do this with watchfiles by running a command: watchfiles 'pytest --lf' Brian #5: Slipcover: Near Zero-Overhead Python Code Coverage From coverage.py twitter account, which I’m pretty sure is Ned Batchelder coverage numbers with “3% or less overhead” Early stages of the project. It does seem pretty zippy though. Mixed results when trying it out with a couple different projects flask: just pytest: 2.70s with slipcover: 2.88s with coverage.py: 4.36s flask with xdist n=4 pytest: 2.11 s coverage: 2.60s slipcover: doesn’t run (seems to load pytest plugins) Again, still worth looking at and watching. It’s good to see some innovation in the coverage space aside from Ned’s work. Anna #6: scrapy vs robox scrapy shell to try out things: fetch url, view response object, response.text extract using css selectors or xpath lets you navigate between levels e.g. the parent of an element with id X crawler to crawl websites and spider to extract data startproject for project structure and templates like settings and pipelines some advanced features like specifying user-agents etc for large scale scraping. 
various options to export and store the data nice features like LinkExtractor to determine specific links to extract, already deduped. FormRequest class robox layer on top of httpx and beautifulsoup4 allows to interact with forms on pages: check, choose, submit Extras Michael: ohmyzsh + ohmyposh + mcfly + iterm2 + pls + nerdfonts = wow Watch the video we discussed here Joke: Out for a byte
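On the langid item in this episode: the notes mention that scores live in log-probability space but can be normalized into a 0 to 1 "confidence". Here is a small sketch of what that kind of normalization generally looks like. It is not langid's code; normalize_log_probs and the scores are invented for illustration.

```python
import math


def normalize_log_probs(log_probs: dict[str, float]) -> dict[str, float]:
    """Turn per-language log-scores into probabilities that sum to 1."""
    peak = max(log_probs.values())
    # Subtract the largest score before exponentiating to avoid underflow.
    weights = {lang: math.exp(lp - peak) for lang, lp in log_probs.items()}
    total = sum(weights.values())
    return {lang: weight / total for lang, weight in weights.items()}


# Made-up log-scores for three candidate languages (assumption for the demo).
scores = {"en": -120.0, "de": -135.0, "fr": -140.0}
probs = normalize_log_probs(scores)
best = max(probs, key=probs.get)
print(best, probs[best])
```

Raw log-scores like -120 are hard to interpret on their own; after normalization the winning language gets a value close to 1 when it clearly beats the alternatives.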
#280 Easy terminal scripts by sourcing your Py
Watch the live stream: Watch on YouTube About the show Sponsored by Mergify! Special guest: Pat Decker Michael #0: New live stream / recording time: 12pm US PT on Tuesdays. Please subscribe to our YouTube channel to get notified and be part of the episodes. Brian #1: BTW, don’t make a public repo private How we lost 54k GitHub stars Jakub Roztočil HTTPie kinda sorta accidentally flipped their main repo to private for a sec. And dropped the star count from 54k to 0 oops They’re back up to 16k, as of today. But ouch. “HTTPie is a command-line HTTP client. Its goal is to make CLI interaction with web services as human-friendly as possible. HTTPie is designed for testing, debugging, and generally interacting with APIs & HTTP servers. The http & https commands allow for creating and sending arbitrary HTTP requests. They use simple and natural syntax and provide formatted and colorized output.” Actually, pretty cool tool to use for developing and testing APIs. Michael #2: The counter-intuitive rise of Python in scientific computing via Galen Swint In our laboratory, a polarizing debate rages since around 2010, summarized by this question: Why are more and more time-critical scientific computations formerly performed in Fortran now written in Python, a slower language? Python has the reputation of being slow, i.e. significantly slower than compiled languages such as Fortran, C or Rust. So yes, plain Python is much slower than Fortran. However, this comparison makes little sense, as scientific uses of Python do not rely on plain Python. Used the right way, Python is slightly slower than compiled code. 
Pat #3: Meta donates $300,000 to PSF to add a second year for the Developer in Residence Brian #4: Dashboards in Python Two suggestions from Marc Skov Madsen The Easiest Way to Create an Interactive Dashboard in Python Sophia Yang & Marc Skov Madsen Includes animated gif showing the dashboard video of Sophia walking through the article in under 6 minutes “Turn Pandas pipelines into a dashboard using hvPlot .interactive” hvPlot is part of HoloViz and this example is pretty short and amazing to get a great dashboard with controls up very quickly. Python Dashboarding Shootout and Showdown | PyData Global 2021 5 speakers, 4 dashboard libraries, nice for comparison. Nice clickable index posted by Duy Nguyen 00:00 - Begin and Welcome 03:15 - Intro to the 4 Dashboarding libraries 07:04 - Plotly - Nicolas Kruchten 22:01 - Panel - Marc Skov Madsen 37:38 - voila - Sylvain Corlay 51:36 - Streamlit - Adrien Treuille 01:10:52 - Discussion Topics Michael #5: sourcepy by Dave Chevell Sourcepy lets you source python scripts natively inside your shell Imagine a Python script with functions in it. This converts those to CLI commands (kind of like entrypoints, but simpler) Type hints can be used to coerce input values into their corresponding types. standard IO type hints can be used to target stdin at different arguments and to receive the sys.stdin Sourcepy has full support for asyncio syntax Pat #6: Xonsh Xonsh Shell Combines the Best of Bash Shell and Python in Linux Terminal Awesome demo video (50 min) https://youtu.be/x85LSyCxiw8 Extras Pat: Donate to the PSF by using https://rewards.microsoft.com Joke: Can you really quit vim? Joke: Forgetting how to count
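The sourcepy item mentions type hints being used to coerce command-line input into the right types. As a rough illustration of that general idea (not sourcepy's actual mechanism; call_with_coercion and multiply are hypothetical names), a function's annotations can be read with inspect and applied to string arguments:

```python
import inspect


def call_with_coercion(func, *raw_args: str):
    """Call `func`, converting each string argument using its annotation."""
    params = list(inspect.signature(func).parameters.values())
    converted = []
    for param, raw in zip(params, raw_args):
        hint = param.annotation
        if hint is inspect.Parameter.empty:
            # No annotation: pass the raw string through unchanged.
            converted.append(raw)
        else:
            # Only simple callable hints such as int, float, or str work here.
            converted.append(hint(raw))
    return func(*converted)


def multiply(x: int, y: float) -> float:
    return x * y


# Shell arguments always arrive as strings.
result = call_with_coercion(multiply, "3", "2.5")
print(result)
```

This toy version only handles hints that are themselves callables taking a string. Something like bool would need special handling, since bool("False") is True.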
#279 Autocorrect and other Git Tricks
Watch the live stream: Watch on YouTube About the show Sponsored by Datadog: pythonbytes.fm/datadog Special guest: Brian Skinn (Twitter | Github) Michael #1: OpenBB wants to be an open source challenger to Bloomberg Terminal OpenBB Terminal provides a modern Python-based integrated environment for investment research, that allows an average joe retail trader to leverage state-of-the-art Data Science and Machine Learning technologies. As a modern Python-based environment, OpenBBTerminal opens access to numerous Python data libraries in Data Science (Pandas, Numpy, Scipy, Jupyter) Machine Learning (Pytorch, Tensorflow, Sklearn, Flair) Data Acquisition (Beautiful Soup, and numerous third-party APIs) They have a discord community too BTW, seem to be a successful open source project: OpenBB Raises $8.5M in Seed Round Funding Following Open Source Project Gamestonk Terminal's Success Great graphics / gallery here. Way more affordable than the $1,900/mo/user for the Bloomberg Terminal Brian #2: Python f-strings https://fstring.help Florian Bruhin Quick overview of cool features of f-strings, made with Jupyter Python f-strings Are More Powerful Than You Might Think Martin Heinz More verbose discussion of f-strings Both are great to up your string formatting game. Brian S. #3: pyproject.toml and PEP 621 Support in setuptools PEP 621: “Storing project metadata in pyproject.toml” Authors: Brett Cannon, Dustin Ingram, Paul Ganssle, Pradyun Gedam, Sébastien Eustace, Thomas Kluyver, Tzu-ping Chung (Jun-Oct 2020) Covers build-tool-independent fields (name, version, description, readme, authors, etc.) 
Various tools had already implemented pyproject.toml support, but not setuptools Including: Flit, Hatch, PDM, Trampolim, and Whey (h/t: Scikit-HEP) Not Poetry yet, though it's under discussion setuptools support had been discussed pretty extensively, and had been included on the PSF’s list of fundable packaging improvements Initial experimental implementation spearheaded by Anderson Bravalheri, recently completed Seeking testing and bug reports from the community (Discuss thread) I tried it on one of my projects — it mostly worked, but revealed a bug that Anderson fixed super-quick (proper handling of a dynamic long_description, defined in setup.py) Related tools (all early-stage/experimental AFAIK) ini2toml (Anderson Bravalheri) — Can convert setup.cfg (which is in INI format) to pyproject.toml Mostly worked well for me, though I had to manually fix a couple things, most of which were due to limitations of the INI format INI has no list syntax! validate-pyproject (Anderson Bravalheri) — Automated pyproject.toml checks pyproject-fmt (Bernát Gábor) — Autoformatter for pyproject.toml Don’t forget to use it with build, instead of via a python setup.py invocation! $ pip install build $ python -m build Will also want to constrain your setuptools version in the build-backend.requires key of pyproject.toml (you are using PEP517/518, right??) Michael #4: JSON Web Tokens @ jwt.io JSON Web Tokens are an open, industry standard RFC 7519 method for representing claims securely between two parties. Basically a visualizer and debugger for JWTs Enter an encoded token Select a decryption algorithm See the payload data verify the signature List of libraries, grouped by language Brian #5: Autocorrect and other Git Tricks - Waylon Walker - Use `git config --global help.autocorrect 10` to have git automatically run the command you meant in 1 second. The `10` is 10 x 1/10 of a second. So `50` for 5 seconds, etc. 
Automatically set upstream branch if it’s not there git config --global push.default current You may NOT want to do this if you are not careful with your branches. From https://stackoverflow.com/a/22933955 git commit -a Automatically “add” all changed and deleted files, but not untracked files. From https://git-scm.com/docs/git-commit#Documentation/git-commit.txt--a Now most of my interactions with git CLI, especially for quick changes, is: $ git checkout main $ git pull $ git checkout -b okken_something $ git commit -a -m 'quick message' $ git push With these working, with autocorrect $ git chkout main $ git pll $ git comit -a -m 'quick message' $ git psh Brian S. #6: jupyter-tempvars Jupyter notebooks are great, and the global namespace of the Python kernel backend makes it super easy to flow analysis from one cell to another BUT, that global namespace also makes it super easy to footgun, when variables leak into/out of a cell when you don’t want them to jupyter-tempvars notebook extension Built on top of the tempvars library, which defines a TempVars context manager for handling temporary variables When you create a TempVars context manager, you provide it patterns for variable names to treat as temporary In its simplest form, TempVars (1) clears matching variables from the namespace on entering the context, and then (2) clears them again upon exiting the context, and restoring their prior values, if any TempVars works great, but it’s cumbersome and distracting to manually include it in every notebook cell where it’s needed With jupyter-tempvars, you instead apply tags with a specific format to notebook cells, and the extension automatically wraps each cell’s code in a TempVars context before execution Javascript adapted from existing extensions Patching CodeCell.execute, from the jupyter_contrib_nbextensions ‘Execution Dependencies’ extension, to enclose the cell code with the context manager Listening for the ‘kernel ready’ event, from 
[jupyter-black](https://github.com/drillan/jupyter-black/blob/d197945508a9d2879f2e2cc99cafe0cedf034cf2/kernel_exec_on_cell.js#L347-L350), to import the [TempVars](https://github.com/bskinn/jupyter-tempvars/blob/491babaca4f48c8d453ce4598ac12aa6c5323181/src/jupyter_tempvars/extension/jupyter_tempvars.js#L42-L46) context manager upon kernel (re)start See the README (with animated GIFs!) for installation and usage instructions It’s on PyPI: $ pip install jupyter-tempvars And, I made a shortcut install script for it: $ jupyter-tempvars install && jupyter-tempvars enable Please try it out, find/report bugs, and suggest features! Future work Publish to conda-forge (definitely) Adapt to JupyterLab, VS Code, etc. (pending interest) Extras Brian: Ok. Python issues are now on GitHub. Seriously. See for yourself. Lorem Ipsum is more interesting than I realized. O RLY Cover Generator Example: Michael: New course: Secure APIs with FastAPI and the Microsoft Identity Platform Pyenv Virtualenv for Windows (Sorta'ish) Hipster Ipsum Brian S.: PSF staff is expanding PSF hiring an Infrastructure Engineer Link now 404s, perhaps they’ve made their hire? Last year’s hire of the Packaging Project Manager (Shamika Mohanan) Steering Council supports PSF hiring a second developer-in-residence PSF has chosen its new Executive Director: Deb Nicholson! PyOhio 2022 Call for Proposals is open Teaser tweet for performance improvements to pydantic Jokes: https://twitter.com/CaNerdIan/status/1512628780212396036 https://www.reddit.com/r/ProgrammerHumor/comments/tuh06y/i_guess_we_all_have_been_there/ https://twitter.com/PR0GRAMMERHUM0R/status/1507613349625966599
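For the JWT item in this episode: a token is three base64url-encoded segments (header, payload, signature) joined by dots, which is why a debugger can show you the payload. Below is a minimal standard-library sketch that reads the header and payload without verifying anything. The helper names and the demo token are assumptions for illustration, and real code should verify the signature with a proper JWT library before trusting any claim.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url-encode and strip the '=' padding, as JWT segments do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding, then base64url-decode."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def peek_jwt(token: str) -> tuple[dict, dict]:
    """Return (header, payload) WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload


# Build a demo token by hand; the signature segment is a placeholder.
header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "1234567890", "name": "Example User"}
token = ".".join([
    b64url_encode(json.dumps(header).encode()),
    b64url_encode(json.dumps(payload).encode()),
    "signature-not-checked",
])

decoded_header, decoded_payload = peek_jwt(token)
print(decoded_header, decoded_payload)
```

The payload is only encoded, not encrypted, so anyone holding the token can read it.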
#278 Multi-tenant Python applications
Watch the live stream: Watch on YouTube About the show Sponsored by: Microsoft for Startups Founders Hub. Special guest: Vuyisile Ndlovu Brian #1: dunk - a prettier git diff Darren Burns Uses Rich “⚠️ This project is very early stages” - whatever, I like it. Recommendation is to use less as a pager for it git diff | dunk | less -R Michael #2: Is your Python code vulnerable to log injection? via Adam Parkin Let’s just appreciate log4jmemes.com for a moment Ok, now we can talk about Python We can freak out the logging with line injection "hello'.\nINFO:__main__:user 'alice' commented: 'I like pineapple pizza" Results in two lines for one statement INFO:__main__:user 'bob' commented: 'hello'. INFO:__main__:user 'alice' commented: 'I like pineapple pizza'. The safest solution is to simply not log untrusted text. If you need to store it for an audit trail, use a database. Alternatively, structured logging can prevent newline-based attacks. Padding a ton? One such case is abusing padding syntax. Consider this message: *"%(user)999999999s"* This will pad the user with almost a gigabyte of whitespace. Mitigation: To eliminate these risks, you should always let logging handle string formatting. See this discussion: Safer logging methods for f-strings and new-style formatting Vuyisile #3: Building multi tenant applications with Django Free book by Agiliq, covers different approaches to building Software as a service applications in Python/Django. Covers four approaches to multi tenancy, namely: Shared database with shared schema Shared database with isolated schema Isolated database with a shared app server Completely isolated tenants using Docker Brian #4: Should you pre-allocate lists in Python? 
Redowan Delowar Discussion of 3 ways to build up a list Start empty and append: l=[]; l.append(1); … Pre-allocate: l = [None] * 10_000; … List comprehension: l = [i for i in range(10_000)] Interesting discussion and results The times (filling the list with the index): append: 499 µs ± 1.23 µs pre-allocate: 321 µs ± 71.1 comprehension: 225 µs ± 711 Python lists dynamically allocate extra memory when they run out, and it’s pretty fast at doing this. Pre-allocation can save a little time. Conclusion: use comprehensions when you can, otherwise, don’t sweat it unless you really need to shave off as much time as possible Of note: this was just measuring time, no discussion of memory usage. Michael #5: mockaroo and tonic Do you need to generate fake data? Mockaroo lets you generate realistic data based on data types (car registrations, credit cards, dates, etc) Tonic takes your actual production data and reworks it into test data (possibly stripping out PII) Vuyisile #6: Brachiograph: the cheapest, simplest possible Python powered pen plotter by Daniele Procida Low tech Raspberry Pi project that can be built for < $50 using common household objects like a clothes peg and an ice cream stick Extras Brian: April 8 new date for Python Issues migrating to GH Michael: ngrok has a detailed web explorer Vuyisile: Thunder Client : VS Code extension, Lightweight client for testing REST APIs Postman alternative Joke: Linux world in tatters Related: Origin of the joke - Lapsus$ claims to leak 90% of Microsoft Bing's source code
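The log injection item in this episode is easy to reproduce with just the standard library. The sketch below logs an attacker-controlled comment into an in-memory stream, then logs it again after escaping line breaks. The logger name injection_demo and the escape_newlines helper are made up for this example, and escaping is only one possible mitigation alongside the ones mentioned in the notes.

```python
import io
import logging

# Capture log output in memory so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("injection_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger


def escape_newlines(text: str) -> str:
    """Replace raw line breaks with visible escape sequences."""
    return text.replace("\r", "\\r").replace("\n", "\\n")


# Untrusted input that smuggles in a fake log record.
comment = "hello'.\nINFO:injection_demo:user 'alice' commented: 'I like pineapple pizza"

logger.info("user 'bob' commented: '%s'.", comment)
raw_lines = stream.getvalue().splitlines()

# Reset the buffer and log the same text with line breaks escaped.
stream.seek(0)
stream.truncate(0)
logger.info("user 'bob' commented: '%s'.", escape_newlines(comment))
safe_lines = stream.getvalue().splitlines()

print(len(raw_lines), len(safe_lines))
```

The unescaped call produces two lines that both look like genuine records, while the escaped call stays on one line.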
#277 It's a Python package showdown!
Watch the live stream: Watch on YouTube About the show Sponsored by: Microsoft for Startups Founders Hub. Special guest: Thomas Gaigher, creator/maintainer pypyr taskrunner Michael #1: March Package Madness via Chris May Start with 16 packages They battle it out 2-on-2 in elimination rounds Voting is once a week So go vote! Brian #2: nbpreview “A terminal viewer for Jupyter notebooks. It’s like cat for ipynb files.” Some cool features pretty colors by default piping strips formatting, so you can pass it to grep or other post processing automatic paging syntax highlighting line numbers and wrapping work nicely markdown rendering images converted to block, character, or dots (braille) dataframe rendering clickable links Thomas #3: pyfakefs A fake file system! It intercepts all calls that involve the filesystem in Python - e.g open(), shutil, or pathlib.Path. This is completely transparent - your functional code does not know or need to know that under the hood it's been disconnected from the actual filesystem. The nice thing about this is that you don't have to go patching open using mock_open - which works fine, but gets annoying quickly for more complex test scenarios. E.g Doing a mkdir -p before a file write to ensure parent dirs exist. What it looks like without a fake filesystem: in_bytes = b"""[table] foo = "bar" # String """ # read with patch('pypyr.toml.open', mock_open(read_data=in_bytes)) as mocked_open: payload = toml.read_file('arb/path.in') # write with io.BytesIO() as out_bytes: with patch('pypyr.toml.open', mock_open()) as mock_output: mock_output.return_value.write.side_effect = out_bytes.write toml.write_file('arb/out.toml', payload) out_str = out_bytes.getvalue().decode() mock_output.assert_called_once_with('arb/out.toml', 'wb') assert out_str == """[table] foo = "bar" """ If you've ever tried to patch/mock out pathlib, you'll know the pain! 
Also, no more annoying test clean-up routines or tempfile - as soon as the fake filesystem goes out of scope, it's gone, no clean-up required. Not a flash in the pan - long history: originally developed by Mike Bland at Google back in 2006. Open sourced in 2011 on Google Code. Moved to Github and nowadays maintained by John McGehee. This has been especially useful for pypyr, because as a task-runner or automation tool pypyr deals with wrangling config files on disk a LOT (reading, generating, editing, token replacing, globs, different encodings), so this makes testing so much easier. Especially to keep on hitting the 100% test coverage bar! Works great with pytest with the provided fs fixture. Just add the fs fixture to a test, and all code under test will use the fake filesystem. Dynamically switch between Linux, MacOs & Windows filesystems. Set up paths/files in your fake filesystem as part of test setup with some neat helper functions. Very responsive maintainers - I had a PR merged in less than half a day. Shoutout to mrbean-bremen. 
Docs here: http://jmcgeheeiv.github.io/pyfakefs/release/ Github here: https://github.com/jmcgeheeiv/pyfakefs Real world example: @patch('pypyr.config.config.default_encoding', new='utf-16') def test_json_pass_with_encoding(fs): """Relative path to json should succeed with encoding.""" # arrange in_path = './tests/testfiles/test.json' fs.create_file(in_path, contents="""{ "key1": "value1", "key2": "value2", "key3": "value3" } """, encoding='utf-16') # act context = pypyr.parser.jsonfile.get_parsed_context([in_path]) # assert assert context == { "key1": "value1", "key2": "value2", "key3": "value3" } def test_json_parse_not_mapping_at_root(fs): """Not mapping at root level raises.""" # arrange in_path = './tests/testfiles/singleliteral.json' fs.create_file(in_path, contents='123') # act with pytest.raises(TypeError) as err_info: pypyr.parser.jsonfile.get_parsed_context([in_path]) # assert assert str(err_info.value) == ( "json input should describe an object at the top " "level. You should have something like\n" "{\n\"key1\":\"value1\",\n\"key2\":\"value2\"\n}\n" "at the json top-level, not an [array] or literal.") Michael #4: strenum A Python Enum that inherits from str. To complement enum.IntEnum in the standard library. Supports python 3.6+. Example usage: from enum import auto from strenum import StrEnum class HttpMethod(StrEnum): GET = auto() HEAD = auto() POST = auto() PUT = auto() DELETE = auto() assert HttpMethod.GET == "GET" Use wherever you can use strings, basically: ## You can use StrEnum values just like strings: import urllib.request req = urllib.request.Request('https://www.python.org/', method=HttpMethod.HEAD) with urllib.request.urlopen(req) as response: html = response.read() Can auto-translate casing with LowercaseStrEnum and UppercaseStrEnum. Brian #5: Code Review Guidelines for Data Science Teams Tim Hopper Great guidelines for any team What is code review for? 
correctness, familiarity, design feedback, mutual learning, regression protection NOT opportunities for reviewer to impose their idiosyncrasies dev to push correctness responsibility to reviewers demands for perfection Opening a PR informative commit messages consider change in context of project keep them short write a description that helps reviewer include tests with new code Reviewing Wait for CI before starting I would also add “wait at least 10 min or so, requester might be adding comments” Stay positive, constructive, helpful Clarify when a comment is minor or not essential for merging, preface with “nit:” for example If a PR is too large, ask for it to be broken into smaller ones What to look for does it look like it works is new code in the right place unnecessary complexity tests Thomas #6: Shell Power is so over. Leave the turtles in the late 80s. Partly inspired by/continuation of last week’s episode’s mention of running subprocesses from Python. Article by Itamar Turner-Trauring Please Stop Writing Shell Scripts https://pythonspeed.com/articles/shell-scripts/ Aims mostly at bash, but I'll happily include bourne, zsh etc. under the same dictum If nothing else, solid listing of common pitfalls/gotchas with bash and their remedies, which is educational enough in and of itself already. TL;DR: Error handling in shell is hard, but also surprising if you're not particularly steeped in the ways of the shell. Execution resumes after an error, unset vars don't raise errors, and errors in pipes & subshells are thrown away If you really-really HAVE to shell, you prob want this boilerplate on top (aka unofficial bash strict mode): #!/bin/bash set -euo pipefail IFS=$'\n\t' This will, -e: fail immediately on error -u: fail on unset vars -o pipefail: fail the whole pipeline if any command in it fails IFS: set Internal Field Separator to newline | tab, rather than space | newline | tab. 
Prevents surprises when iterating over strings with spaces in them Itamar lists common counter-arguments from shell script die-hards: It's always there! But so is the runtime of whatever you're actually coding in, and in the case of a build CI server. . .almost by definition. Git gud! (I'm paraphrasing) Shell-check (linting for bash, basically) The article is short & sweet - mercifully so in these days of padded content. The rest is going to be me musing out loud, so don't blame the OG author. So expanding on this, I think there're a couple of things going on here: If anything, the author is going a bit soft on your average shell script. If you’re just calling a couple of commands in a row, okay, fine. But the moment you start worrying about retrying on failure, parsing some values into or out of some json, conditional branching - which, if you are writing any sort of automation script that interacts with other systems, you WILL be doing - shell scripts are an unproductive malarial nightmare. Much the same point applies to Makefile. It’s an amazing tool, but it’s also misused for things it was never really meant to do. You end up with Makefiles that call shell scripts that call Makefiles. . . Given that coding involves automating stuff, amazingly often the actual automation of the development process itself is deprioritized & unbudgeted. Sort of like the shoemaker's kid not having shoes. Partly because when management has to choose between shiny new features and automation, shiny new features win every time. Partly because techies will just "quickly" do a thing in shell to solve the immediate problem… Which then becomes part of the firmament like a dead dinosaur that fossilises and more and more inscrutable layers accrete on top of the original "simple" script. 
Partly because coders would rather get on with clever but marginal micro-optimisations and arguing over important stuff like spaces vs tabs, rather than do the drudge work of automating the development/deployment workflow. There's the glimmering of a point in there somewhere: when you have to choose between shiny new features & more backoffice automation, shiny new features probably win. Your competitiveness in the marketplace might well depend on this. BUT, we shouldn’t allow the false idea that shell scripts are "quicker" or "lighter touch" to sneak in there alongside the brutal commercial reality of trade-offs on available budget & time. If you have to automate quickly, it's more sensible to use a task-runner or just your actual programming language. If you're in python already, you're in luck, python's GREAT for this. Don’t confuse excellent cli programs like git , curl , awscli, sed or awk with a shell script. These are executables, you don’t need the shell to invoke these. Aside from these empirical factors, a couple of psychological factors also. Dealing with hairy shell scripts is almost a Technocratic rite of passage - coupled with imposter syndrome, it's easy to be intimidated by the Shell Bros who're steeped in the arcana of bash. It's the tech equivalent of "back in my day, we didn't even have <<>>", as if this is a justification for things being more difficult than they need to be ever thereafter. This isn't Elden Ring, the extra difficulty doesn't make it more fun. You're trying to get business critical work done, reliably & quickly, so you can get on with those new shiny features that actually pay the bills. 
Extras Michael: A changing of the guard Firefox → Vivaldi (here’s a little more info on the state of Firefox/Mozilla financially) (threat team is particularly troubling) Google email/drive/etc → Zoho @gmail.com to @customdomain.com Google search → DuckDuckGo BTW Calendar apps/integrations and email clients are trouble Joke: A missed opportunity - and cybersecurity
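Back on the shell-scripts item from this episode: the point that CLI programs are just executables, and that Python can call them without a shell and with real error handling, can be sketched with subprocess from the standard library. To keep the example self-contained, the "command" here is the Python interpreter itself (sys.executable); the run helper is a hypothetical name, and in practice the argument list would name git, curl, or similar.

```python
import subprocess
import sys


def run(args: list[str]) -> str:
    """Run an executable directly (no shell) and return its stdout.

    check=True raises CalledProcessError on a non-zero exit code, which is
    roughly the behaviour `set -e` tries to give you in bash.
    """
    completed = subprocess.run(args, check=True, capture_output=True, text=True)
    return completed.stdout


# A command that succeeds.
ok_output = run([sys.executable, "-c", "print('hello from a child process')"])

# A command that fails: the error surfaces as an exception, not a silent skip.
try:
    run([sys.executable, "-c", "import sys; sys.exit(3)"])
    failed_code = None
except subprocess.CalledProcessError as err:
    failed_code = err.returncode

print(ok_output.strip(), failed_code)
```

Passing the arguments as a list also sidesteps shell word-splitting and quoting, so the IFS workaround from the strict-mode boilerplate is not needed.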
#276 Tracking cyber intruders with Jupyter and Python
Watch the live stream: Watch on YouTube About the show Sponsored by FusionAuth: pythonbytes.fm/fusionauth Special guest: Ian Hellen Brian #1: gensim.parsing.preprocessing Problem I’m working on Turn a blog title into a possible url example: “Twisted and Testing Event Driven / Asynchronous Applications - Glyph” would like, perhaps: “twisted-testing-event-driven-asynchronous-applications” Sub-problem: remove stop words ← this is the hard part I started with an article called Removing Stop Words from Strings in Python It covered how to do this with NLTK, Gensim, and SpaCy I was most successful with remove_stopwords() from Gensim from gensim.parsing.preprocessing import remove_stopwords It’s part of the gensim.parsing.preprocessing package I wonder what’s all in there? a treasure trove gensim.parsing.preprocessing.preprocess_string is one this function applies filters to a string, with the defaults almost being just what I want: strip_tags() strip_punctuation() strip_multiple_whitespaces() strip_numeric() remove_stopwords() strip_short() stem_text() ← I think I want everything except this this one turns “Twisted” into “Twist”, not good. There are lots of other text processing goodies in there also. Oh, yeah, and Gensim is also cool. topic modeling for training semantic NLP models So, I think I found a really big hammer for my little problem. But I’m good with that Michael #2: DevDocs via Loic Thomson Gather and search a bunch of technology docs together at once For example: Python + Flask + JavaScript + Vue + CSS Has an offline mode for laptops / tablets Installs as a PWA (sadly not on Firefox) Ian #3: MSTICPy MSTICPy is a toolset for CyberSecurity investigations and hunting in Jupyter notebooks. What is CyberSec hunting/investigating? - responding to security alerts and threat intelligence reports, trawling through security logs from cloud services and hosts to determine if it’s a real threat or not. Why Jupyter notebooks? 
SOC (Security Ops Center) tools can be excellent but all have limitations. You can get data from anywhere. Use custom analysis and visualizations. Control the workflow…. workflow is repeatable. Open source pkg - created originally to support MS Sentinel Notebooks but now supports lots of providers. When I started this 3+ yrs ago I thought a lot of this would be in PyPI - but no 😞. MSTICPy has 4 main functional areas: Data querying - import log data (Sentinel, Splunk, MS Defender, others…working on Elasticsearch). Enrichment - is this IP address or domain known to be malicious? Analysis - extract more info from data, identify anomalies (simple example - spike in logon failures). Visualization - more specialized than traditional graphs - timelines, process trees. All components use pandas, Bokeh for visualizations. Current focus on usability, discovery of functionality and being able to chain. Always looking for collaborators and contributors - code, docs, queries, critiques. https://github.com/microsoft/msticpy https://msticpy.readthedocs.io/ Brian #4: The Right Way To Compare Floats in Python, by David Amos. Definitely an easier read than the classic What Every Computer Scientist Should Know About Floating-Point Arithmetic. What many of us remember: floating point numbers aren’t exact due to representation limitations and rounding error; errors can accumulate; comparison is tricky. Be careful when comparing floating point numbers, even simple comparisons, like:
    >>> 0.1 + 0.2 == 0.3
    False
    >>> 0.1 + 0.2 <= 0.3
    False
David has a short but nice introduction to the problems of representation and rounding. Three reasons for rounding: more significant digits than floating point allows; irrational numbers; rational but non-terminating. So how do you compare: math.isclose(), be aware of rel_tol and abs_tol and when to use each.
numpy.allclose(), returns a boolean comparing two arrays. numpy.isclose(), returns an array of booleans. pytest.approx(), used a bit differently: 0.1 + 0.2 == pytest.approx(0.3). Also allows rel and abs comparisons. Discussion of Decimal and Fraction types, and the memory and speed hit you take when using them. Michael #5: Pypyr. Task runner for automation pipelines. For when your shell scripts get out of hand. Less tricky than a makefile. Script sequential task workflow steps in YAML. Conditional execution, loops, error handling & retries. Have a look at the getting started. Ian #6: Pygments. Python package that’s useful for anyone who wants to display code. Jupyter notebook Markdown and GitHub Markdown let you display code with syntax highlighting. (Jupyter uses Pygments behind the scenes to do this.) There are tools that convert code to image format (PNG, JPG, etc) but you lose the ability to copy/paste the code. Pygments can intelligently render syntax-highlighted code to HTML (and other formats). Applications: Documentation (used by Sphinx/ReadtheDocs) - render code to HTML + CSS. Displaying code snippets dynamically in readable form. Lots (maybe 100s) of code lexers - Python (code, traceback), Bash, C, JS, CSS, HTML, also config and data formats like TOML, JSON, XML. Easy to use - 3 lines of code - example:
    from IPython.display import display, HTML
    from pygments import highlight
    from pygments.lexers import PythonLexer
    from pygments.formatters import HtmlFormatter

    code = """
    def print_hello(who="World"):
        message = f"Hello {who}"
        print(message)
    """

    display(HTML(
        highlight(code, PythonLexer(), HtmlFormatter(full=True, nobackground=True))
    ))
    # use HtmlFormatter(style="stata-dark", full=True, nobackground=True)
    # for dark themes
Output to HTML, LaTeX, image formats. We use it in MSTICPy for displaying scripts used in attacks.
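Circling back to Brian's item #4 on comparing floats: here is a minimal sketch, standard library only, of the options mentioned there. The variable names and the chosen tolerance of 1e-9 are illustrative assumptions, not values taken from David Amos's article.

```python
import math
from decimal import Decimal
from fractions import Fraction

total = 0.1 + 0.2

# Exact equality fails: 0.1, 0.2 and 0.3 have no exact binary representation.
exact_equal = (total == 0.3)

# rel_tol (default 1e-09) scales with the magnitude of the inputs.
close_by_default = math.isclose(total, 0.3)

# Near zero a relative tolerance alone is useless, because rel_tol * 0 is 0.
# abs_tol (default 0.0) supplies a fixed floor instead.
tiny = 1e-12  # illustrative value
near_zero_rel_only = math.isclose(tiny, 0.0)
near_zero_with_abs = math.isclose(tiny, 0.0, abs_tol=1e-9)

# Decimal and Fraction represent these values exactly,
# at a cost in speed and memory compared with float.
decimal_equal = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
fraction_equal = Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)

print(exact_equal, close_by_default)
print(near_zero_rel_only, near_zero_with_abs)
print(decimal_equal, fraction_equal)
```

Roughly: reach for rel_tol when the values can be large or vary in scale, and add abs_tol whenever one side of the comparison may be zero or very close to it.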
Extras. Brian: smart-open, one of the 3 Gensim dependencies. It’s for streaming large files, from really anywhere, and looks just like Python’s open(). Michael: Python 3.10.3 is out. git fixup (follow up from last week, via Adam Parkin). Joke: What’s your secret?
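For Brian's title-to-URL problem in item #1, the overall shape of the pipeline can be sketched without any third-party packages. This is a stand-in, not Gensim: STOP_WORDS is a deliberately tiny, hypothetical list (Gensim's remove_stopwords() ships a much larger one), slugify_title is a made-up helper name, and treating a trailing " - Name" as the speaker to drop is an assumption based on the single example title.

```python
import re

# Hypothetical, tiny stop-word set for illustration only.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "for", "on", "with"})


def slugify_title(title: str) -> str:
    """Turn a blog title into a hyphenated, lowercase URL slug."""
    # Assumption: anything after the last " - " is a speaker/author name.
    main_part = title.rsplit(" - ", 1)[0]
    # Lowercase and keep only runs of letters and digits (drops punctuation).
    words = re.findall(r"[a-z0-9]+", main_part.lower())
    # Drop stop words, keep everything else unstemmed.
    kept = [word for word in words if word not in STOP_WORDS]
    return "-".join(kept)


slug = slugify_title("Twisted and Testing Event Driven / Asynchronous Applications - Glyph")
print(slug)
```

No stemming step is included, in line with Brian's note that stemming mangles words like “Twisted”.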