Find all Headings with BeautifulSoup
====================================

Date: February 1, 2022

BeautifulSoup is a DOM-like library for Python that is quite useful for
manipulating HTML. Here is an example that uses `find_all` to grab every
HTML heading. I stole the regex from Stack Overflow, but who doesn't?

## Make an example
_sample.html_

Let's make a `sample.html` file with the following contents. It mainly has
some headings, `<h1>` and `<h2>` tags, that I want to be able to find.

```html
<h1>hello</h1>
<p>this is a paragraph</p>
<h2>second heading</h2>
<p>this is also a paragraph</p>
<h2>third heading</h2>
<p>this is the last paragraph</p>
```

## Get the headings with BeautifulSoup

Let's import our packages, read in our `sample.html` using pathlib, and find
all headings using BeautifulSoup.

```python
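import re
from pathlib import Path

from bs4 import BeautifulSoup

# Parse the file with the lxml parser (requires the lxml package).
soup = BeautifulSoup(Path('sample.html').read_text(), features="lxml")
# Match any tag whose name is h1 through h6.
headings = soup.find_all(re.compile("^h[1-6]$"))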
```

And what we get is a list of `bs4.element.Tag` objects.

```python
>>> print(headings)
[<h1>hello</h1>, <h2>second heading</h2>, <h2>third heading</h2>]
```
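
Each item in that list is a full `Tag`, so the heading level and text are easy to pull out. Here is a minimal sketch of that. It assumes the sample markup is inlined as a string (the exact heading levels are my guess) and uses the stdlib `html.parser` in place of `lxml` so nothing extra needs installing. The `html`, `outline`, and `by_name` names are only for illustration.

```python
import re

from bs4 import BeautifulSoup

# Inline stand-in for sample.html (assumed markup, for illustration only).
html = """
<h1>hello</h1>
<p>this is a paragraph</p>
<h2>second heading</h2>
<p>this is also a paragraph</p>
<h2>third heading</h2>
<p>this is the last paragraph</p>
"""

# html.parser ships with Python; swap in "lxml" if you have it installed.
soup = BeautifulSoup(html, features="html.parser")
headings = soup.find_all(re.compile("^h[1-6]$"))

# Pair each heading's level (taken from the tag name) with its text.
outline = [(int(h.name[1]), h.get_text(strip=True)) for h in headings]

# Print an indented outline, two spaces per heading level.
for level, text in outline:
    print("  " * (level - 1) + text)

# find_all also accepts a list of tag names, which finds the same tags here.
by_name = soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])
same = by_name == headings
```

The regex is shorter to type, while the list of names may be easier to read if you only care about a couple of heading levels.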

I recently added a heading_link plugin to markata. You might notice the
🔗 icons next to each heading on this page; they are powered by this exact
technique.
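
For a feel of how heading links could be built on top of `find_all`, here is a rough sketch. This is not markata's actual plugin code: the `slugify` helper, the `add_heading_links` function, and the `heading-link` class are hypothetical names made up for illustration, and it uses the stdlib `html.parser` rather than `lxml`.

```python
import re

from bs4 import BeautifulSoup


def slugify(text):
    """Hypothetical helper: turn heading text into a URL-friendly id."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def add_heading_links(html, symbol="🔗"):
    """Give every heading an id and append a link that points at it."""
    soup = BeautifulSoup(html, features="html.parser")
    for heading in soup.find_all(re.compile("^h[1-6]$")):
        # Build the slug before the link text is added to the heading.
        slug = slugify(heading.get_text())
        heading["id"] = slug
        # new_tag creates an <a> element that belongs to this soup.
        link = soup.new_tag("a", href=f"#{slug}")
        link["class"] = "heading-link"
        link.string = symbol
        heading.append(link)
    return str(soup)


linked = add_heading_links("<h2>second heading</h2><p>some text</p>")
```

Two headings with the same text would end up with the same id in this sketch, so a real plugin would likely need to de-duplicate the slugs.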