# SSP Spider and Notification Prototyping

Once again I'm using Jupyter to prototype a solution. It uses the following Python packages (all installable via pip):

- Requests (HTTP)
- BS4 (HTML parsing)
- DiskCache (Results Cache)

If we only wanted to pull the data down via a notebook we could use pandas and just run something like:

```python
import pandas as pd

tables = pd.read_html(ssp_url)  # one DataFrame per <table> on the page
active_ssps, expired_ssps = tables[0], tables[1]
```

We could then write each table to a CSV and have a script take the CSV path as an argument. The spider approach is slightly more complex but 'cleaner'. Having said that, I'm not going to judge anyone for taking the quick-and-dirty route when pandas is *that good* at parsing HTML tables.

In [1]:
import pprint
import requests

from diskcache import Index
from bs4 import BeautifulSoup
from unicodedata import normalize


In [2]:
site = 'https://www.nhsbsa.nhs.uk'
base_url = f"{site}/pharmacies-gp-practices-and-appliance-contractors/serious-shortage-protocols-ssps"
active_ssps = Index('ssps/active')
expired_ssps = Index('ssps/expired')

In [3]:
# A timeout stops the cell hanging forever if the site does not respond
request = requests.get(base_url, headers={'User-Agent': 'friendly_python'}, timeout=30)
request.status_code

200

In [4]:
data = request.text
soup = BeautifulSoup(data, 'html.parser')
tables = soup.find_all('table')
len(tables)

2

## Active SSPs

Data we want to capture includes:

- SSP Name/Ref
- Link to SSP Name/Ref
- Start date
- End date
- Whether the SSP was reactivated
- Supporting guidance
- Link to supporting guidance PDF

In [5]:
# normalize() is applied because the page text contains non-breaking spaces (\xa0) and other awkward Unicode

def extract_table(table, ssp_list):
    """Parse one SSP table, store unseen rows in ssp_list and return them."""
    new_ssps = []
    for row in table.tbody.find_all('tr'):
        columns = row.find_all('td')
        if not columns:
            # Skip rows without <td> cells (e.g. header rows)
            continue
        # SSP Name/Ref
        ssp_name = normalize('NFKD', columns[0].text.strip())
        ssp_link = normalize('NFKD', f"{site}{columns[0].find('a').get('href')}")
        # Start and end date: split on ' to ' (with spaces) so 'October' is not cut in half
        dates = normalize('NFKD', columns[1].text.strip())
        ds = dates.split('\n')[0].split(' to ')
        start_date = ds[0].strip()
        end_date = ds[1].strip() if len(ds) > 1 else ''
        # Guidance
        guidance_name = normalize('NFKD', columns[2].text.strip())
        guidance_link = normalize('NFKD', f"{site}{columns[2].find('a').get('href')}")
        item = {
            'name': ssp_name,
            'url': ssp_link,
            'start_date': start_date,
            'end_date': end_date,
            'guidance': guidance_name,
            'guidance_url': guidance_link,
        }
        # The SSP link doubles as the cache key, so only unseen entries are reported
        if ssp_link not in ssp_list:
            ssp_list[ssp_link] = item
            new_ssps.append(item)
    return new_ssps
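
The date column arrives as a single 'START to END' string. Splitting on the bare substring `to` would also cut a month name such as 'October' in two, so a slightly more defensive version might split on the standalone word and parse the results into real dates. This is only a sketch: `parse_date` and `split_date_range` are hypothetical helpers, and the two formats are assumed from the date strings that show up in the outputs of this notebook.

```python
import re
from datetime import datetime

# Formats assumed from the page output: '24 May 2024' and '26 Mar 2024'
DATE_FORMATS = ('%d %B %Y', '%d %b %Y')

def parse_date(text):
    """Try each assumed format in turn; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

def split_date_range(text):
    """Split 'START to END' on the standalone word 'to' and parse both halves."""
    first_line = text.strip().split('\n')[0]
    parts = re.split(r'\s+to\s+', first_line, maxsplit=1)
    start = parse_date(parts[0])
    end = parse_date(parts[1]) if len(parts) > 1 else None
    return start, end

print(split_date_range('1 October 2024 to 22 November 2024'))
```

Having real `date` objects would make it easy to sort entries or check whether an SSP is close to its end date, at the cost of needing to extend `DATE_FORMATS` if the page ever changes its date style.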

In [6]:
actives = extract_table(tables[0], active_ssps)
if len(actives) > 0:
    print("New Active SSPs")
    for i in actives:
        pprint.pp(i)

New Active SSPs
{'name': 'SSP061 Creon® 25000 capsules (PDF:193KB)',
 'url': 'https://www.nhsbsa.nhs.uk/sites/default/files/2024-05/SSP061%20Creon%2025000%20restriction%20FINAL%2023052024%20-%20Signed.pdf',
 'start_date': '24 May 2024',
 'end_date': '22 November 2024',
 'guidance': 'Creon® 25000 capsules supporting guidance plus Q&A (PDF:125KB)',
 'guidance_url': 'https://www.nhsbsa.nhs.uk/sites/default/files/2024-05/Endorsement%20Guidance%20SSP061%20Creon%2025000%20restriction%20FINAL%2024052024.pdf'}
{'name': 'SSP060 Creon® 10000 capsules (PDF:193KB)',
 'url': 'https://www.nhsbsa.nhs.uk/sites/default/files/2024-05/SSP060%20Creon%2010000%20restriction%20FINAL%2023052024%20-%20Signed.pdf',
 'start_date': '24 May 2024',
 'end_date': '22 November 2024',
 'guidance': 'Creon® 10000 capsules supporting guidance plus Q&A (PDF:123KB)',
 'guidance_url': 'https://www.nhsbsa.nhs.uk/sites/default/files/2024-05/Endorsement%20Guidance%20SSP060%20Creon%2010000%20restriction%20FINAL%2024052024.pdf'}
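
Because `extract_table` only returns rows whose URL is not already in the cache, running it again over an unchanged page should report nothing new. The sketch below illustrates that idea in isolation. It makes two assumptions: a plain `dict` stands in for `diskcache.Index` (which is used the same way with `in` and item assignment in this notebook, but persists to disk), and the names and URLs are made up for the example.

```python
def record_new(items, seen):
    """Store unseen items in `seen`, keyed by URL, and return only those."""
    fresh = []
    for item in items:
        if item['url'] not in seen:
            seen[item['url']] = item
            fresh.append(item)
    return fresh

# A plain dict stands in for diskcache.Index; the entries are invented
seen = {}
scraped = [
    {'name': 'SSP A', 'url': 'https://example.org/a.pdf'},
    {'name': 'SSP B', 'url': 'https://example.org/b.pdf'},
]

first_run = record_new(scraped, seen)
second_run = record_new(scraped, seen)
print(len(first_run), len(second_run))  # prints: 2 0
```

With a persistent cache the same behaviour carries across separate runs, which is what makes the returned list usable as a trigger for notifications.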


In [7]:
# And Expired SSPs
expireds = extract_table(tables[1], expired_ssps)
if len(expireds) > 0:
    print("Newly Expired SSPs")
    for i in expireds:
        pprint.pp(i)

Newly Expired SSPs
{'name': 'SSP059 Monomil® XL 60mg tablets (PDF:164KB)',
 'url': 'https://www.nhsbsa.nhs.uk/sites/default/files/2024-03/Annex%20A%20SSP059%20Monomil%20XL%2060mg%20signed%2026032024.pdf',
 'start_date': '26 Mar 2024',
 'end_date': '07 Jun 2024',
 'guidance': 'Monomil® XL 60mg tablets supporting guidance plus Q&A '
             '(PDF:150KB)',
 'guidance_url': 'https://www.nhsbsa.nhs.uk/sites/default/files/2024-03/Endorsement%20guidance%20SSP059%20Monomil%20XL%2060mg%2026032024.pdf'}
{'name': 'SSP058 Jext® 300micrograms/0.3ml (1 in 1000) solution for injection '
         'auto-injector pen (PDF:200KB)',
 'url': 'https://www.nhsbsa.nhs.uk/sites/default/files/2023-08/SSP058%20Jext%20300mcg%20autoinjector.pdf',
 'start_date': '21 Aug 2023',
 'end_date': '27 Oct 2023',
 'guidance': 'Jext® 300mcg auto-injector supporting guidance plus Q&A '
             '(PDF:122KB)',
 'guidance_url': 'https://www.nhsbsa.nhs.uk/sites/default/files/2023-08/Endorsement%20guidance%20SSP058%20Jex

In [8]:
len(actives)

4

In [45]:
columns[2].text

'Creon®\xa025000 capsules\xa0supporting guidance plus Q&A (PDF:125KB)'
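
The raw cell text above still contains `\xa0` (a non-breaking space), which is why the extraction code runs everything through `normalize`. A small check using the string from this output shows what NFKD does to it:

```python
from unicodedata import normalize

raw = 'Creon®\xa025000 capsules\xa0supporting guidance plus Q&A (PDF:125KB)'
clean = normalize('NFKD', raw)

# The non-breaking spaces become ordinary spaces; the ® sign is left alone
print('\xa0' in raw, '\xa0' in clean)
print(clean)
```

One caveat worth keeping in mind: NFKD also decomposes accented letters into a base letter plus a combining mark, so if a drug name ever contains one, `'NFKC'` may be the safer form to use.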