Desent Solutions – Developer Assessment

Website Archiver Tool

Build a tool that archives any website in its current state — including all assets and server-side API responses — so the archived version works even when the original site is completely offline, similar to the Wayback Machine.

Get Started

Download the test website package. It includes a Docker-based sample site with static content, images, and API endpoints you can spin up and tear down to verify your archiver works.

Download Test Website (ZIP)

Contains: Docker setup, Express server, HTML/CSS/JS, and image generator script

Requirements

Must Have (Core)

  • Accept a URL as input and crawl the full page
  • Download all assets: HTML, CSS, images, JS, fonts, media
  • Intercept and cache API calls — capture XHR/fetch requests server-side and replay their responses in the archive
  • Serve the archived version via a local HTTP server
  • The archived site must work 100% offline
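The last two core requirements can be met with a single small server: static assets are served from the archive directory, and any request matching a captured API route is answered from the cache instead. A minimal Python sketch, assuming captured responses were saved as JSON files keyed by a sanitized request path (the directory names `archive/` and `api_cache/` are illustrative, not part of the spec):

```python
# Sketch: serve an archived site fully offline. Static files come from
# ARCHIVE_DIR; captured API responses are replayed from API_CACHE.
# Query strings would also need sanitizing in a real implementation.
import http.server
from pathlib import Path

ARCHIVE_DIR = Path("archive")    # downloaded HTML/CSS/JS/images (assumption)
API_CACHE = Path("api_cache")    # captured API responses (assumption)

def cache_key(path: str) -> str:
    """Turn a request path like /api/reviews/3 into a safe filename."""
    return path.strip("/").replace("/", "__") + ".json"

class ArchiveHandler(http.server.SimpleHTTPRequestHandler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, directory=str(ARCHIVE_DIR), **kwargs)

    def do_GET(self):
        cached = API_CACHE / cache_key(self.path)
        if cached.is_file():
            body = cached.read_bytes()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            super().do_GET()  # fall through to archived static files

# To serve the archive locally:
# http.server.ThreadingHTTPServer(("localhost", 8080), ArchiveHandler).serve_forever()
```

Because the handler checks the API cache before the file system, parameterized routes like /api/reviews/3 replay correctly as long as each captured URL maps to its own cache file.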

Should Have

  • Multi-page archiving (crawl depth ≥ 1)
  • Simple web UI or CLI to trigger archiving and browse archives
  • Metadata storage (archive date, original URL, page title)

Nice to Have (Bonus)

  • Client-side rendering support (SPA/React/Vue) via headless browser
  • Scheduled re-archiving
  • Diff view between archive versions
  • Export as .warc or .zip
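For the .zip export bonus, the standard library already does most of the work; a WARC export would need a dedicated library or manual record writing. A minimal sketch (all paths below are examples):

```python
# Sketch: pack a snapshot directory into a .zip for the export bonus.
import shutil
import tempfile
from pathlib import Path

def export_zip(archive_dir: str, out_base: str) -> str:
    """Pack archive_dir into <out_base>.zip and return the zip's path."""
    return shutil.make_archive(out_base, "zip", root_dir=archive_dir)

# Demo with a throwaway directory standing in for a real archive:
with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "snapshot"
    snap.mkdir()
    (snap / "index.html").write_text("<h1>archived</h1>")
    zip_path = export_zip(str(snap), str(Path(tmp) / "snapshot-export"))
    print(Path(zip_path).name)  # snapshot-export.zip
```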

How to Test

# 1. Unzip and start the test website

cd archiver-test-website

node create-images.js

docker-compose up -d

# 2. Archive the site while it's running

your-tool archive http://localhost:3777

# 3. Shut down the original

docker-compose down

# 4. Serve the archive and verify everything works

your-tool serve

What the Test Site Covers

Feature                      | What to Verify
Static HTML + CSS            | Layout and styles render correctly
Multiple images (SVG)        | All images load from the archive
Google Fonts (external CDN)  | Fonts display correctly offline
Fetch API → /api/products    | Product list appears without the server
Fetch API → /api/stats       | Stats bar shows data offline
Fetch API → /api/reviews/:id | Parameterized API responses are cached
JavaScript interactivity     | Click handlers work in the archive
CSS background-image url()   | Background pattern renders
Responsive design            | Media queries still apply

Tech Stack

Free choice — but justify your decisions. Some suggestions:

Crawling: Puppeteer / Playwright (headless browser) or wget/httrack

API interception: Puppeteer request interception, mitmproxy, or custom proxy

Storage: File system or SQLite

Serving: Express.js, FastAPI, or similar lightweight server

Language: Node.js, Python, or Go
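To make the crawling step concrete: in Python (one of the suggested languages), asset discovery can start with nothing but the standard library. A rough sketch — a real crawler would also handle `srcset`, CSS `url()` references, and nested pages:

```python
# Sketch: collect absolute URLs of assets referenced by a page,
# using only the Python standard library.
from html.parser import HTMLParser
from urllib.parse import urljoin

class AssetExtractor(HTMLParser):
    """Collect absolute URLs of assets referenced by a page."""
    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # src covers <img>/<script>/<audio>/<video>; href covers <link>
        url = attrs.get("src") or (attrs.get("href") if tag == "link" else None)
        if url:
            self.assets.append(urljoin(self.base_url, url))

page = '<img src="/img/logo.svg"><link rel="stylesheet" href="style.css">'
parser = AssetExtractor("http://localhost:3777/")
parser.feed(page)
print(parser.assets)
# → ['http://localhost:3777/img/logo.svg', 'http://localhost:3777/style.css']
```

Note that this only sees server-rendered markup; for SPAs you would need the headless-browser route mentioned above.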

Evaluation Criteria

Completeness (30%) – Does the archived site work fully offline?

API Interception (25%) – Are server-side responses captured and replayed?

Code Quality (20%) – Clean, readable, well-structured code

Architecture (15%) – Sensible tech choices, documented trade-offs

Bonus Features (10%) – SPA support, UI, export, etc.

Deliverables

1. Source code in a Git repository

2. README with setup instructions

3. Architecture document — what approach you chose and why

4. Demo — screen recording or live demo showing: archive → shutdown → offline browsing

Time Budget

5–7 working days

Tips

• Start with a simple wget --mirror approach to understand the baseline, then improve from there

• Puppeteer/Playwright with request interception is probably the most powerful approach for API capture

• Don't forget relative vs. absolute URL rewriting — this is where most archivers break

• Look at how the Wayback Machine / SingleFile / HTTrack solve similar problems

• Edge case: inline <style> blocks with url() references
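The URL-rewriting tip deserves emphasis, since it is where most archivers break. One sketch of the core rule, assuming same-origin assets are saved under their URL path so root-relative links keep working (function and parameter names are illustrative):

```python
# Sketch: rewrite a reference found in an archived page so it resolves
# inside the archive. Same-origin URLs become root-relative paths the
# local server can serve; external URLs are left untouched here.
from urllib.parse import urljoin, urlparse

def to_local_path(page_url: str, ref: str, origin: str) -> str:
    """Resolve ref against page_url; keep external URLs as-is."""
    absolute = urljoin(page_url, ref)
    parsed = urlparse(absolute)
    if f"{parsed.scheme}://{parsed.netloc}" == origin:
        return parsed.path or "/"
    return absolute  # external URL: left as-is (or mirrored separately)

page = "http://localhost:3777/products/index.html"
print(to_local_path(page, "../img/logo.svg", "http://localhost:3777"))
# → /img/logo.svg
```

The same resolution logic applies to `url()` references inside CSS and inline `<style>` blocks, except those are relative to the stylesheet's URL rather than the page's.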

Ready to Submit?

Once your archiver is complete, submit your work for review.

Submit Your Work →

Happy hacking and good luck!

Questions? Reach out to your contact Ramsey or Lukas.