
Scrape Data Tool — Website Scraper
A Python & Playwright-based web scraping tool that fully renders dynamic JavaScript content and downloads complete websites — HTML, CSS, JS and images — packaged into a downloadable ZIP archive.

Santosh Gautam
Full Stack Developer · India
Features
- Full JS Rendering: Playwright (Chromium) renders dynamic JavaScript content.
- Complete Asset Download: downloads CSS, JS, images, and all other linked assets.
- Local Link Rewriting: HTML links are rewritten to point to the local files.
- ZIP Download: the full website is bundled into a downloadable ZIP.
- REST API: /scrape and /download endpoints included.
- CORS Enabled: cross-origin requests supported out of the box.
- Auto-Cleanup: ZIP files are auto-deleted after download.
- Built-in Logging: detailed logs for easier debugging.
Requirements
- Python 3
- Flask
- Playwright (with the Chromium browser)
- BeautifulSoup
Installation & Setup
Clone the repository and navigate into the project folder.
Create and activate a Python virtual environment.
Install all required dependencies.
Install Playwright browsers (Chromium).
You're ready! Start the server.
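On a Unix-like shell, the steps above might look like this (the repository URL is a placeholder, and a requirements.txt in the repo is assumed):

```shell
# Clone the repository (placeholder URL) and enter it
git clone https://github.com/<your-username>/scrape-data-tool.git
cd scrape-data-tool

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate    # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install the Chromium browser for Playwright
playwright install chromium
```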
Running the Server
Start the Flask development server:
    python app.py

The server starts at http://127.0.0.1:5000.
API Usage
/scrape: scrape a website and return a ZIP archive.

    GET http://127.0.0.1:5000/scrape?url=https://example.com&type=zip

/download/<filename>: download the generated ZIP.

    GET http://127.0.0.1:5000/download/example_com_1a2b3c4d.zip
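As a sketch, the endpoints above can be called from Python using only the standard library (the helper names are illustrative, not part of the project):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://127.0.0.1:5000"  # local dev server address from above

def build_scrape_url(target: str) -> str:
    """Build the /scrape request URL asking for a ZIP of the target site."""
    return f"{BASE}/scrape?" + urlencode({"url": target, "type": "zip"})

def download_zip(filename: str, dest: str) -> None:
    """Fetch a generated archive via /download/<filename> and save it locally."""
    with urlopen(f"{BASE}/download/{filename}") as resp, open(dest, "wb") as out:
        out.write(resp.read())
```

With the server running, `urlopen(build_scrape_url("https://example.com"))` triggers a scrape; the ZIP filename it returns is then passed to `download_zip`.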
How It Works
1. Playwright loads the full page, including JS-rendered content.
2. BeautifulSoup parses the HTML structure.
3. External assets (CSS, JS, images) are downloaded locally.
4. HTML links are rewritten to point to the local assets.
5. All files are bundled into a ZIP archive in the downloads/ folder.
6. The ZIP path is returned via the API and the user downloads the file.
7. The ZIP is auto-deleted after download to save disk space.
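Steps 3 and 4 above can be sketched with BeautifulSoup: the function below collects asset URLs from href/src attributes and rewrites them to local paths (the function name and the assets/ layout are illustrative, not the project's actual implementation):

```python
import os
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup

def rewrite_links(html: str, base_url: str, asset_dir: str = "assets"):
    """Rewrite href/src attributes of link, script, and img tags to local
    paths. Returns the rewritten HTML and a list of
    (remote_url, local_path) pairs still to be downloaded."""
    soup = BeautifulSoup(html, "html.parser")
    to_download = []
    for tag in soup.find_all(["link", "script", "img"]):
        attr = "href" if tag.name == "link" else "src"
        url = tag.get(attr)
        if not url:
            continue
        # Resolve relative URLs against the page's base URL.
        absolute = urljoin(base_url, url)
        filename = os.path.basename(urlparse(absolute).path) or "index"
        local = f"{asset_dir}/{filename}"
        tag[attr] = local
        to_download.append((absolute, local))
    return str(soup), to_download
```

The returned pairs can then be fetched one by one and written under assets/ before everything is zipped.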
Notes & Tips
- Supports CSS, JS, and image assets linked via href and src.
- Always verify that scraping is permitted on the target website.
- Large sites may take longer — consider timeout settings.
- ZIP files are auto-deleted after download to save storage.
- Use rate limiting or proxies to avoid IP blocking on large-scale scraping.
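On the last tip, a minimal client-side rate limiter (illustrative, not part of this project) can space out requests to stay polite and avoid IP blocks:

```python
import time

class RateLimiter:
    """Enforce a minimum interval between successive requests."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self) -> float:
        """Sleep just long enough to keep min_interval between calls;
        returns the time actually slept."""
        now = time.monotonic()
        delay = max(0.0, self._last + self.min_interval - now)
        if delay:
            time.sleep(delay)
        self._last = time.monotonic()
        return delay
```

Calling `limiter.wait()` before each asset download caps the request rate at roughly one per `min_interval` seconds.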