This Python script fetches and parses patch notes from the official Dota 2 website. The extracted information includes the patch number, release timestamp, general notes, item changes, and hero changes (including new additions).

- Fetches patch notes for multiple patch versions.
- Handles different HTML structures for various patch note pages.
- Extracts the following data from each patch note:
  - Patch number
  - Release timestamp (in epoch format)
  - General notes
  - Item changes
  - Hero changes (including new heroes)

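The epoch conversion mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the script's actual code: the helper name `to_epoch`, the date format `"%B %d, %Y"`, and the choice to interpret dates as UTC are all assumptions, and the real patch pages may present dates differently.

```python
from datetime import datetime, timezone


def to_epoch(date_text: str, fmt: str = "%B %d, %Y") -> int:
    """Convert a date string to a Unix epoch timestamp (seconds).

    Assumption: the date is interpreted as midnight UTC. The default
    format is hypothetical and must match how the page prints dates.
    """
    parsed = datetime.strptime(date_text.strip(), fmt)
    # Attach an explicit time zone so the result does not depend on
    # the local time zone of the machine running the script.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


print(to_epoch("January 1, 2020"))  # 1577836800
```

Attaching an explicit time zone keeps the stored timestamps reproducible across machines; without it, `timestamp()` falls back to the local time zone.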
The script depends on the following Python libraries:
- requests: For sending HTTP requests.
- bs4 (BeautifulSoup): For parsing HTML.
- datetime: For converting date strings to epoch timestamps (standard library).
- json: For handling JSON data (standard library).

Only the third-party packages need to be installed with pip; datetime and json ship with Python:

    pip install requests beautifulsoup4

First, define a list of patch versions you want to scrape. For example:

    patch_versions = ["6.00","6.01","6..."]

Then, run the script:

    python patch_scraper.py

The script will fetch and parse the patch notes for each version, then save the data to a JSON file named patch_notes.json.
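
The fetch, parse, and save flow described above might look roughly like the sketch below. It is an assumption-laden outline rather than the script's real implementation: `scrape_patches`, `fetch_html`, and `parse_patch` are hypothetical names, and the fetch and parse steps are passed in as callables because their details depend on the page structure (in practice `fetch_html` would wrap an HTTP request and `parse_patch` would use BeautifulSoup).

```python
import json
from typing import Callable, Dict, List


def scrape_patches(
    patch_versions: List[str],
    fetch_html: Callable[[str], str],
    parse_patch: Callable[[str, str], Dict],
    out_path: str = "patch_notes.json",
) -> List[Dict]:
    """Fetch and parse each version, then write all results to one JSON file.

    fetch_html(version) returns the page HTML for a version.
    parse_patch(version, html) returns one patch-note record.
    Both are hypothetical hooks supplied by the caller.
    """
    results = []
    for version in patch_versions:
        html = fetch_html(version)
        results.append(parse_patch(version, html))

    # Write the whole array at once so the file is always valid JSON.
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, indent=2, ensure_ascii=False)
    return results
```

Keeping the network and parsing steps behind plain functions also makes the loop easy to exercise with stub data before pointing it at the live site.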

The output file, patch_notes.json, contains a JSON array of patch note entries. Each element in the array is an object (a dictionary once loaded in Python) with the following structure:

    {
      "patch_number": "<version number>",
      "patch_timestamp": "<release timestamp>",
      "general": ["<general note 1>", "<general note 2>", ...],
      "items": {
        "<item name>": ["<change 1>", "<change 2>", ...],
        ...
      },
      "heroes": {
        "Added": ["<new hero 1>", "<new hero 2>", ...],
        "<hero name>": ["<change 1>", "<change 2>", ...],
        ...
      }
    }

- The script might not work correctly if the structure of the Dota 2 website changes.
- The script might not fetch or parse data correctly if there are inconsistencies in the structure or content of the patch notes.
- The script doesn't handle errors or exceptions beyond basic error logging. It's recommended to monitor the script's output for any error messages.
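
Given the limitations above, one way to keep a single bad page from stopping a whole run is to wrap the per-version work, log the failure, and continue. The sketch below is a suggestion under assumptions, not a description of what the script currently does: `safe_process`, `process_all`, and the `process` callable are hypothetical names.

```python
import logging
from typing import Callable, Dict, List, Optional, Tuple

logger = logging.getLogger("patch_scraper")


def safe_process(version: str, process: Callable[[str], Dict]) -> Optional[Dict]:
    """Run process(version); log and return None instead of raising."""
    try:
        return process(version)
    except Exception as exc:  # broad on purpose: any failure skips this version
        logger.error("Failed to process patch %s: %s", version, exc)
        return None


def process_all(
    versions: List[str], process: Callable[[str], Dict]
) -> Tuple[List[Dict], List[str]]:
    """Return (successful records, versions that failed)."""
    results: List[Dict] = []
    failed: List[str] = []
    for version in versions:
        record = safe_process(version, process)
        if record is None:
            failed.append(version)
        else:
            results.append(record)
    return results, failed
```

Returning the list of failed versions alongside the results makes it possible to retry or inspect just those pages instead of scanning the log by hand.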