Web browsers and scraper in Python

From 12 – HTTP C – Python for Everybody Course – YouTube
Learning about the socket library, sockets, UTF-8, HTTP, HTML, and ASCII/Unicode

import socket

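# Open a TCP connection to the web server on port 80 (plain HTTP)
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))

# Sockets carry bytes, so the request string is encoded (UTF-8 by default)
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

# Read the reply in chunks of up to 512 bytes until the server closes the connection
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(), end='')
mysock.close()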
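
The loop above prints the response headers and the body together, because the socket only delivers raw bytes. As a rough sketch of how the two parts could be separated, assuming the received chunks have first been joined into a single bytes object, the helper below splits on the blank line that ends the headers. The name split_http_response and the sample response are made up for illustration and are not from the course.

```python
def split_http_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body_text)."""
    # The headers end at the first blank line, i.e. the first CRLF CRLF pair
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Assumption: the body is UTF-8 text, as in the example above
    return status_line, headers, body.decode("utf-8")


# Hand-written sample response (hypothetical, not captured from a real server)
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

status, headers, body = split_http_response(sample)
print(status)
print(headers["content-type"])
print(body)
```

With the socket script, the chunks from recv could be collected in a list and joined with b"".join(...) before being passed to this helper.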

From 12 – HTTP E – Python for Everybody Course – YouTube
Now using the urllib library in Python

import urllib.request

# urlopen handles the socket and the HTTP headers, and returns a file-like object of bytes
file_handle = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
for line in file_handle:
    print(line.decode().strip())
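
Because the handle from urlopen can be looped over like a file, the same loop can feed ordinary text processing. The sketch below counts words, assuming the page is UTF-8 text. The function names count_words and count_words_at are hypothetical, and the demonstration uses hand-written bytes so that it runs without a network connection.

```python
import urllib.request
from collections import Counter


def count_words(lines):
    """Count words across an iterable of byte lines, such as a urlopen handle."""
    counts = Counter()
    for line in lines:
        # Each line arrives as bytes, so decode it before splitting into words
        counts.update(line.decode().split())
    return counts


def count_words_at(url):
    """Fetch url and count its words (needs network access when called)."""
    with urllib.request.urlopen(url) as file_handle:
        return count_words(file_handle)


# Offline demonstration: two byte lines standing in for a downloaded file
demo = count_words([b"the quick fox\n", b"the lazy dog\n"])
print(demo["the"])
```

Calling count_words_at('http://data.pr4e.org/romeo.txt') would apply the same counting to the file used above.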

From 12 – HTTP F – YouTube
Using Beautiful Soup
Beautiful Soup Documentation — Beautiful Soup 4.12.0 documentation (crummy.com)

My link-scraping script

import requests
from bs4 import BeautifulSoup

# Ask for the address of the website to scrape (entered without the scheme)
url = 'https://' + input("Website please : ")

# Send a GET request to the URL, giving up after 10 seconds
response = requests.get(url, timeout=10)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract all the links on the page
    links = soup.find_all('a')

    # Print each link
    for link in links:
        print(link.get('href'))
else:
    print(f"Failed to retrieve the page. Status code: {response.status_code}")
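
The script prints each href exactly as it appears in the page, so the output can include None (an anchor with no href), relative paths and repeats. One possible clean-up step, sketched here with only the standard library, resolves each value against the page URL with urllib.parse.urljoin. The function name absolute_links and the sample values are invented for illustration.

```python
from urllib.parse import urljoin


def absolute_links(base_url, hrefs):
    """Resolve hrefs against base_url, skipping missing and repeated entries."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            # link.get('href') gives None when an <a> tag has no href attribute
            continue
        full = urljoin(base_url, href)
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


# Hypothetical href values of the kind the script might print
hrefs = ["/about", None, "page.html", "https://example.org/x", "/about"]
cleaned = absolute_links("https://example.com/blog/", hrefs)
for link in cleaned:
    print(link)
```

In the script, the list could be built with [link.get('href') for link in links] and passed in together with url.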

Installing libraries

pip install beautifulsoup4
pip install requests
pip install lxml
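
A quick way to confirm the installs worked is to check that each library can be found by Python. This is only a sketch, and the helper name missing_modules is made up. Note that the beautifulsoup4 package is imported under the name bs4, and that lxml is an optional parser which the script above does not use, since it passes 'html.parser'.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that Python cannot locate."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names, which can differ from the pip package names
print(missing_modules(["bs4", "requests", "lxml"]))
```

An empty list means all three libraries are available.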

Published by jamesrivers.tech on 23 November 2023