From 12 – HTTP C – Python for Everybody Course – YouTube
Learning about the socket library, sockets, UTF-8, HTTP, HTML, and ASCII/Unicode.
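The `.encode()` and `.decode()` calls in the socket code convert between a Python `str` (Unicode text) and the raw `bytes` that a socket actually sends and receives. Here is a small illustration, assuming Python 3, where both methods default to UTF-8. ASCII characters stay one byte each, while a character like 'é' becomes two bytes. The sample string is hypothetical.

```python
# Hypothetical sample string; 'é' is outside the ASCII range
text = 'GET é'

raw = text.encode()                  # str -> bytes, UTF-8 by default
print(raw)                           # b'GET \xc3\xa9'
print(len(text), len(raw))           # 5 6  ('é' takes two bytes)
print(raw.decode())                  # GET é  (bytes -> str again)

# ASCII characters keep their one-byte value in UTF-8
print(ord('H'), 'H'.encode())        # 72 b'H'
print(ord('é'), list('é'.encode()))  # 233 [195, 169]
```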

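import socket

# Open a TCP (stream) socket and connect to the web server on port 80
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))

# HTTP/1.0 request: the request line, then a blank line (\r\n\r\n) ending the headers
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.sendall(cmd)  # sendall() keeps sending until every byte has gone out

# Read the response (headers + body) in 512-byte chunks until the server closes the connection
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(), end='')
mysock.close()
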
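The loop above prints the HTTP headers and the body together. Here is a minimal sketch of separating them. It assumes the chunks from `recv()` are first collected into one `bytes` object (for example `raw += data` inside the loop) instead of being printed. The headers end at the first blank line (`\r\n\r\n`), and everything after it is the body. The function name `split_http_response` and the sample response are hypothetical.

```python
def split_http_response(raw):
    """Split raw HTTP response bytes into (status_line, headers, body)."""
    # The first blank line separates the header block from the body
    head, _, body = raw.partition(b'\r\n\r\n')
    # Header bytes are decoded as ISO-8859-1, which maps every byte to a character
    lines = head.decode('iso-8859-1').split('\r\n')
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return status_line, headers, body


# Hypothetical response, shaped like what the socket loop receives
sample = (b'HTTP/1.1 200 OK\r\n'
          b'Content-Type: text/plain\r\n'
          b'Content-Length: 14\r\n'
          b'\r\n'
          b'But soft what\n')
status, headers, body = split_http_response(sample)
print(status)                   # HTTP/1.1 200 OK
print(headers['content-type'])  # text/plain
print(body.decode(), end='')    # But soft what
```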
From 12 – HTTP E – Python for Everybody Course – YouTube
Now using the urllib library in Python

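import urllib.request

# urlopen() sends the GET request and handles the headers; iterating yields the body line by line as bytes
file_handle = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
for line in file_handle:
    print(line.decode().strip())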
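Because the `urlopen()` handle can be looped over like a file, an ordinary dictionary word count works on it directly. Here is a sketch with a hypothetical `count_words()` helper. It is tested on a list of bytes lines so that it runs offline.

```python
def count_words(byte_lines):
    """Count word frequencies over an iterable of bytes lines (e.g. an urlopen handle)."""
    counts = {}
    for line in byte_lines:
        # Each line arrives as bytes, so decode before splitting into words
        for word in line.decode().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


# Offline stand-in for the urlopen handle: a hypothetical list of bytes lines
sample_lines = [b'But soft what light\n', b'soft light\n']
counts = count_words(sample_lines)
print(counts['soft'], counts['light'], counts['But'])  # 2 2 1
```

With the real page, it would be called as `count_words(urllib.request.urlopen('http://data.pr4e.org/romeo.txt'))`.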

From 12 – HTTP F – Python for Everybody Course – YouTube
Using Beautiful Soup
Beautiful Soup Documentation — Beautiful Soup 4.12.0 documentation (crummy.com)

My link-scraping script

import requests
from bs4 import BeautifulSoup

# Ask for the website to scrape (typed without the https:// prefix)
url = 'https://' + input("Website please: ").strip()

# Send a GET request to the URL (the timeout stops a dead host from hanging the script)
response = requests.get(url, timeout=10)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract all the <a> tags on the page
    links = soup.find_all('a')

    # Print each link's href, skipping <a> tags that have none
    for link in links:
        href = link.get('href')
        if href:
            print(href)
else:
    print(f"Failed to retrieve the page. Status code: {response.status_code}")
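The script prints each `href` exactly as it appears in the page, so many results are relative paths such as `/about`. Here is a sketch that uses the standard library's `urllib.parse.urljoin` and `urldefrag` to resolve them against the page URL and remove duplicates. The `absolute_links()` helper and the sample values are hypothetical.

```python
from urllib.parse import urljoin, urldefrag


def absolute_links(base_url, hrefs):
    """Resolve href values against base_url, dropping empty ones and duplicates."""
    result, seen = [], set()
    for href in hrefs:
        if not href:  # <a> tags without an href give None
            continue
        # Make the link absolute, then strip any #fragment so same-page links collapse
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


page = 'https://example.com/blog/post.html'
hrefs = ['/about', 'contact.html', None, 'https://example.org/', '#top', '/about']
resolved = absolute_links(page, hrefs)
for url in resolved:
    print(url)
```

Inside the script, this could be called as `absolute_links(response.url, [link.get('href') for link in links])`. `response.url` is the final address after any redirects, which makes it a safer base than the typed URL.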

Installing libraries

pip install beautifulsoup4
pip install requests
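pip install lxml  # optional: a faster parser, used as BeautifulSoup(html, 'lxml'); the script above uses the built-in 'html.parser'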

Categories: Project