URL Extractor
Extract all URLs from any text or source code instantly.
Why use it
Discover all links in seconds
HTTP and HTTPS
Detects URLs with both schemes, including ports, paths, query strings, and fragments.
100% private
Your text never leaves the browser. Ideal for source code and internal data.
No duplicates
Automatic deduplication. Each URL appears only once in the result.
Instant
Extraction typically completes in milliseconds, even for long text or HTML.
How it works
Three steps, no hassle
Paste your text or HTML
Paste the content you want to extract URLs from: plain text, HTML code, API responses, logs.
Automatic extraction
The extractor detects all http and https URLs, including query strings, fragments, and encoded characters.
Copy the URL list
Get the deduplicated URL list, one per line, ready for analysis or auditing.
FAQ
Got questions?
Which URL formats does the extractor detect?
The extractor detects URLs with the http:// and https:// schemes. The pattern covers the domain (including subdomains), the path, query string parameters (after ?), and fragments (after #). It also detects URLs with explicit ports (https://example.com:8080/path) and URL-encoded characters (%20, %2F, etc.). Scheme-less URLs (example.com/path) are not detected, to avoid false positives in regular text.
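As a rough illustration of these detection rules, here is a minimal Python sketch. The pattern, the `extract_urls` name, and the trailing-punctuation rule are assumptions made for this example, not the extractor's actual implementation.

```python
import re

# Hypothetical approximation of the detection rules described above,
# not the extractor's actual pattern.
URL_RE = re.compile(
    r"https?://"                          # only http:// and https:// schemes
    r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*"  # host, including subdomains
    r"(?::\d{1,5})?"                      # optional explicit port, e.g. :8080
    r"(?:[/?#][^\s\"'<>]*)?"              # optional path, ?query and #fragment (may contain %XX)
)

# Punctuation that usually ends a sentence rather than the URL (assumed heuristic).
TRAILING = ".,;:!?)]}"


def extract_urls(text: str) -> list[str]:
    """Return every http(s) URL found in text, in order of appearance."""
    return [m.group(0).rstrip(TRAILING) for m in URL_RE.finditer(text)]


print(extract_urls("See https://api.example.com:8080/v1/items?q=a%20b#top, or example.com/path."))
```

For that input the sketch returns only `https://api.example.com:8080/v1/items?q=a%20b#top`: the trailing comma is stripped and the scheme-less `example.com/path` is ignored. Stripping trailing `)` is a trade-off that will truncate URLs ending in a closing parenthesis.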
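How are query strings, fragments, and encoded characters handled?
Query strings (https://example.com/search?q=term&page=2) are included in full in the result. URL fragments (https://example.com/doc#section) are also preserved. Percent-encoded characters defined by RFC 3986, such as %20 for a space and %2F for a slash, are kept exactly as they appear in the text. Keeping the encoding intact matters for REST API URLs whose parameters carry complex values, because decoding a value such as %26 into & or %2F into / can change how the URL is parsed.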
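If you need the decoded values after extraction, Python's standard `urllib.parse` module can split a URL and decode it while the raw text stays untouched. The `describe` helper and the sample URL below are hypothetical, for illustration only.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def describe(url: str) -> dict:
    """Split an extracted URL, keeping the raw query and also showing decoded values."""
    parts = urlsplit(url)
    return {
        "raw_query": parts.query,           # exactly as written, still percent-encoded
        "fragment": parts.fragment,         # text after '#', without the '#'
        "params": parse_qs(parts.query),    # decoded values: %20 -> space, %2F -> '/'
        "decoded_path": unquote(parts.path),
    }


info = describe("https://example.com/docs/my%20file?q=rest%20api&dir=%2Fv1%2Fusers&page=2#section")
print(info["params"])
```

Here `params` comes out as `{'q': ['rest api'], 'dir': ['/v1/users'], 'page': ['2']}` while `raw_query` still contains `%20` and `%2F`, which is why decoding is best left to the consumer of the list.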
Are duplicate URLs removed?
Yes. The extractor compares full URLs, including query strings and fragments, to identify duplicates. URLs that share the same domain and path but differ in their query strings are treated as distinct (https://example.com?id=1 and https://example.com?id=2 are different URLs). Comparison is case-insensitive for the domain, which RFC 3986 defines as case-insensitive, and case-sensitive for the path.
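The comparison rule above can be sketched in Python by building a comparison key that lower-cases only the host. This is a minimal sketch under that assumption; the `dedup_key` and `deduplicate` names are hypothetical. Note that `urlsplit` also lower-cases the scheme, which RFC 3986 likewise treats as case-insensitive.

```python
from urllib.parse import urlsplit


def dedup_key(url: str) -> tuple:
    """Comparison key: host lower-cased, path/query/fragment kept exactly as written."""
    parts = urlsplit(url)  # urlsplit already lower-cases the scheme
    # netloc may be "userinfo@host:port"; lower-case only the host:port part.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    return (parts.scheme, f"{userinfo}{at}{hostport.lower()}", parts.path, parts.query, parts.fragment)


def deduplicate(urls: list[str]) -> list[str]:
    """Keep the first spelling of each URL, preserving the original order."""
    seen: set[tuple] = set()
    unique: list[str] = []
    for url in urls:
        key = dedup_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

With this key, `https://Example.com/Docs?id=1` and `https://example.COM/Docs?id=1` collapse into one entry, while `/docs` versus `/Docs`, or `?id=1` versus `?id=2`, stay separate.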
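How do I extract all URLs from a web page?
To extract URLs from a web page: 1) In Chrome or Firefox, press Ctrl+U to view the page source (on macOS, Cmd+Option+U in Chrome or Cmd+U in Firefox). 2) Select all with Ctrl+A (Cmd+A on macOS) and copy. 3) Paste into the extractor. This captures every absolute http(s) URL in href, src, action, and data-* attributes, plus URLs in comments and scripts; relative links such as href="/about" have no scheme and are not detected. The page source shows the HTML as delivered by the server, so for pages that load content with JavaScript, use the Network tab in the developer tools to capture the requests actually made.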
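If you also need relative links, a script can read the attributes directly and resolve them against the page's address. Below is a minimal sketch using Python's standard `html.parser` and `urllib.parse.urljoin`; the `LinkCollector` class, the attribute set, and the base URL `https://example.com/blog/` are assumptions for illustration, and the sketch does not scan comments, scripts, or data-* attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href/src/action attribute values, resolved against a base URL."""

    URL_ATTRS = {"href", "src", "action"}  # assumed subset of URL-bearing attributes

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                # urljoin turns relative links like "/about" into absolute URLs
                # and leaves absolute URLs unchanged.
                self.urls.append(urljoin(self.base_url, value))


collector = LinkCollector("https://example.com/blog/")
collector.feed('<a href="/about">About</a><img src="logo.png">'
               '<form action="https://example.com/search"></form>')
print(collector.urls)
```

This prints `https://example.com/about`, `https://example.com/blog/logo.png`, and `https://example.com/search`; the first two would be missed by a scheme-based text extractor.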
What is it typically used for?
The most common use cases are: SEO audits to find all links on a page, broken link detection by requesting each extracted URL and checking its HTTP status code, server log analysis to see which URLs are requested most often, extracting resource sources (images, scripts, styles) from HTML pages, XML sitemap analysis, URL verification in technical documentation, and collecting sources for structured scraping.
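For the log-analysis case, once URLs are extracted, counting them is a one-liner with `collections.Counter`. The sketch below assumes a hypothetical log format in which each line contains a full URL (as in many proxy logs or referrer fields) and uses a deliberately simplified pattern; `most_common_urls` is an illustrative name.

```python
import re
from collections import Counter

# Simplified pattern for this sketch: scheme plus everything up to whitespace or a quote.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def most_common_urls(log_lines: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Count every URL found in the log lines and return the n most frequent."""
    counts: Counter[str] = Counter()
    for line in log_lines:
        counts.update(URL_RE.findall(line))
    return counts.most_common(n)


# Hypothetical log lines, for illustration only.
logs = [
    "10.0.0.1 GET https://example.com/api/users 200",
    "10.0.0.2 GET https://example.com/api/users 200",
    "10.0.0.1 GET https://example.com/login 302",
    "10.0.0.3 GET https://example.com/api/users 500",
    "10.0.0.2 GET https://example.com/login 200",
    "10.0.0.4 GET https://cdn.example.com/app.js 200",
]
print(most_common_urls(logs, 2))
```

For these lines the result is `[('https://example.com/api/users', 3), ('https://example.com/login', 2)]`.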
URL structure per RFC 3986 and link analysis in SEO
The structure of URLs (Uniform Resource Locators) is defined by RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax), published in 2005. The specification defines the components: scheme (http, https, ftp), authority ([userinfo@]host[:port]), path, query, and fragment. RFC 3986 also defines percent-encoding, which represents a character that is reserved or not allowed in a component as % followed by two hexadecimal digits (a space becomes %20). It obsoletes RFC 2396 (1998) and updates RFC 1738 (1994), the RFC that first standardized the URL format.
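To see these components on a concrete URL, Python's `urllib.parse.urlsplit` splits a string along the same lines. The `components` and `encode_component` helpers and the sample URL are illustrative assumptions; the sample uses only a user name in the userinfo part, since RFC 3986 deprecates the "user:password" form.

```python
from urllib.parse import quote, urlsplit


def components(url: str) -> dict:
    """Break a URL into the RFC 3986 components listed above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # e.g. "https"
        "authority": parts.netloc,   # [userinfo@]host[:port]
        "user": parts.username,      # user name from the userinfo part, or None
        "host": parts.hostname,      # lower-cased by urlsplit
        "port": parts.port,          # int, or None when no port is given
        "path": parts.path,
        "query": parts.query,        # text after "?", without the "?"
        "fragment": parts.fragment,  # text after "#", without the "#"
    }


def encode_component(value: str) -> str:
    """Percent-encode a value; safe='' also encodes '/', so 'a b/c' -> 'a%20b%2Fc'."""
    return quote(value, safe="")


print(components("https://user@www.example.com:8080/path/to/page?q=term&page=2#section"))
print(encode_component("a b/c"))
```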
The distinction between URI, URL, and URN is frequently misunderstood. A URI (Uniform Resource Identifier) is the broadest concept: it identifies a resource. A URL (Uniform Resource Locator) is a URI that also specifies how to access the resource, through an access scheme such as http://. A URN (Uniform Resource Name) is a URI that identifies a resource by name within a namespace, such as urn:isbn: for books identified by their ISBN. In practice, the terms URL and URI are used interchangeably on the web, though technically URLs are a subset of URIs.
Link analysis is a fundamental SEO technique. Web crawlers like Googlebot extract URLs from pages to discover new content. PageRank, the algorithm behind Google's original ranking, for which a patent application was filed in 1998, scores pages by the quantity and quality of the links they receive. Tools like Screaming Frog, Ahrefs, and Semrush rely in part on mass URL extraction from web pages to build link graphs, which show how authority is distributed across a site.
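To make the idea concrete, here is a textbook PageRank computed by power iteration over a small link graph, such as one built from extracted URLs. This is a minimal sketch, not Google's current ranking system or any SEO tool's method; the `pagerank` name, the example graph, and the 50 fixed iterations are assumptions, while the damping factor 0.85 is the value commonly cited from the original PageRank paper.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Score pages by incoming links; links maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page passes its rank in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: the home page links to /a and /b, and every page links back home.
graph = {"/": ["/a", "/b"], "/a": ["/"], "/b": ["/"], "/c": ["/"]}
scores = pagerank(graph)
print(max(scores, key=scores.get))
```

The home page `/` ends up with the highest score because every other page links to it, while `/c`, which receives no links, keeps only the baseline share `(1 - damping) / n`.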