Find all Headings with BeautifulSoup
====================================

Date: February 1, 2022

BeautifulSoup is a DOM-like library for Python. It's quite useful for manipulating HTML. Here is an example that uses `find_all` to find every HTML heading. I stole the regex from Stack Overflow, but who doesn't.

## Make an example _sample.html_

Let's make a `sample.html` file with the following contents. It mainly has some headings, `<h1>` and `<h2>` tags, that I want to be able to find.

```html
<h1>hello</h1>
<p>this is a paragraph</p>
<h2>second heading</h2>
<p>this is also a paragraph</p>
<h2>third heading</h2>
<p>this is the last paragraph</p>
```

## Get the headings with BeautifulSoup

Let's import our packages, read in our `sample.html` using pathlib, and find all headings using BeautifulSoup.

```python
import re
from pathlib import Path

from bs4 import BeautifulSoup

# Parse the file with the lxml parser (requires the lxml package).
soup = BeautifulSoup(Path('sample.html').read_text(), features="lxml")
# A compiled regex is matched against tag names, so this finds h1 through h6.
headings = soup.find_all(re.compile("^h[1-6]$"))
```

And what we get is a list of `bs4.element.Tag`s.

```python
>>> print(headings)
[<h1>hello</h1>, <h2>second heading</h2>, <h2>third heading</h2>]
```

I recently added a heading_link plugin to markata. You might notice the 🔗's next to each heading on this page; that is powered by this exact technique.
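
To give a feel for how heading links can be built on top of `find_all`, here is a minimal sketch. It is not markata's actual plugin code: `slugify` and `add_heading_links` are hypothetical helpers made up for this example, and it uses the built-in `html.parser` instead of lxml so it runs without extra dependencies beyond `bs4`.

```python
import re

from bs4 import BeautifulSoup

# Same regex as above: matches the tag names h1 through h6.
HEADING = re.compile(r"^h[1-6]$")


def slugify(text: str) -> str:
    """Hypothetical helper: lowercase the text and join word runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def add_heading_links(html: str) -> str:
    """Give every heading an id and append a link that points at that id."""
    # Assumption: html.parser is good enough here; swap in "lxml" if you prefer.
    soup = BeautifulSoup(html, "html.parser")
    for heading in soup.find_all(HEADING):
        slug = slugify(heading.get_text())
        heading["id"] = slug
        # Build <a href="#slug">🔗</a> and place it at the end of the heading.
        link = soup.new_tag("a", href=f"#{slug}")
        link.string = "🔗"
        heading.append(link)
    return str(soup)


linked = add_heading_links("<h1>hello</h1><p>text</p><h2>second heading</h2>")
print(linked)
```

Two headings with the same text would get the same `id` in this sketch, so a real plugin would probably need to de-duplicate the slugs.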