Extract URLs
During the last week, two projects needed URLs extracted (selected and copied, not removed) from within text content. One was for a client project, the other was for the new Pro URL (still in development).
I'll show you how I did it. Perhaps you have, or will have, a project where you need to extract the URLs.
The code is PHP. The script finds URLs within plain text, including text that is marked up with HTML tags. When text is copied straight from the browser window of a web page and pasted into the text box, the script should extract all absolute HTTP and HTTPS URLs within it, including the URLs of linked text. Relative URLs won't be found, only absolute URLs beginning with http:// or https://.
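To illustrate what gets matched, here is a minimal sketch that applies the same pattern the script uses to a short piece of text. The sample text and its example.com addresses are made up for the illustration.

```php
<?php
// Made-up sample: one linked URL, one plain-text URL, one relative link.
$sample = 'See <a href="https://example.com/page?id=7">this page</a>, '
        . 'the plain address http://example.org/img/logo.png and '
        . 'a relative link <a href="/about.html">About</a>.';

// Same pattern as in the script: "http://" or "https://", followed by
// characters up to the next quote, whitespace, or angle bracket.
$matches = array();
preg_match_all('!https?://[^"\'\s<>]+!', $sample, $matches);

// $matches[0] holds the full matches in the order they appear.
print_r($matches[0]);
```

With this sample, the two absolute URLs are found and the relative /about.html link is skipped. Because the pattern stops only at quotes, whitespace, and angle brackets, punctuation that directly follows a URL in running text, such as a sentence-ending period, would be captured as part of the match.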
The text with URLs is pasted into a text box. It is not a textarea box but an editable div box. An editable div can hold source code in ways a textarea field can not.
After the content with URLs is pasted into the text box, click the button to extract the URLs and list them in your browser window.
The software is all in one PHP file. No customization is required. Simply upload it to your server and use it.
Here is a screenshot with the text box. In the live software, tapping the button lists the extracted URLs below the text box.
Click here for a live implementation.
If you like what it does and want the functionality on your website, here is the source code.
<?php
// URL Extractor
// April 14, 2023
// Will Bontrager Software LLC
if( isset($_POST['extractURLs']) )
{
   // Extracts any http... URLs and returns them to browser.
   $matches = array();
   preg_match_all('!https?://[^"\'\s<>]+!',$_POST['extractText'],$matches);
   if( empty($matches[0]) ) { echo 'No URLs found.'; }
   else { echo implode("\n",$matches[0]); }
   exit;
}
?>
<!doctype html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Extract URLs</title>
<style>body { font-size:100%; }</style>
</head>
<body><div style="max-width:500px; margin:.5in auto;">

<h1 class="interior-page-header">Extract URLs</h1>

<div id="text-container" style="border:2px solid #ccc; border-radius:.5em; padding:.5em; min-height:1in; margin:1em 0;" contenteditable="true"></div>

<input type="button" onclick="SubmitFormWithTextToExtract()" style="width:100%;" value="Extract URLs">

<div id="url-container" style="border:1px dotted #ccc; border-radius:.5em; padding:.5em; min-height:.2in; margin:1em 0; white-space:pre; font-family:monospace; overflow:auto;"></div>

<script type="text/javascript">
function SubmitFormWithTextToExtract()
{
   // Submits text with URLs to the PHP code at the top of this page and posts the response.
   document.getElementById("url-container").innerHTML = "";
   let http = new XMLHttpRequest();
   if(! http)
   {
      alert("Sorry, unable to connect to the internet. Perhaps tapping again will get through.");
      return;
   }
   var params = new Array();
   params.push( "extractURLs=yes" );
   params.push( "extractText=" + encodeURIComponent(document.getElementById("text-container").innerHTML) );
   http.onreadystatechange = function()
   {
      if(http.readyState == 4)
      {
         if(http.status == 200) { document.getElementById("url-container").innerHTML = http.responseText; }
         else { alert('Content request error, status code: '+http.status+' '+http.statusText); }
      }
   }
   http.open("POST","<?php echo($_SERVER['PHP_SELF']) ?>",true);
   http.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
   http.send( params.join("&") );
} // function SubmitFormWithTextToExtract();
</script>

</div></body>
</html>
Save the source code as extractURLs.php or other suitable *.php file name. Upload it to your server. To use the software, type its URL into your browser.
URLs beginning with http:// or https:// will be extracted and listed. URLs may be to web pages, images, or elsewhere. If a URL is found more than once, it will be listed more than once.
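If you would rather have each URL listed only once, here is one way it might be done. This is a sketch, not part of the software above. The function name extract_unique_urls and the sample text are made up for the example.

```php
<?php
// Hypothetical helper, not part of the original script: it uses the same
// pattern as the script but drops repeated URLs.
function extract_unique_urls($text)
{
    $matches = array();
    preg_match_all('!https?://[^"\'\s<>]+!', $text, $matches);
    // array_unique() keeps the first occurrence of each URL;
    // array_values() renumbers the keys starting at 0.
    return array_values(array_unique($matches[0]));
}

// Made-up sample in which the first URL appears twice.
$text = 'http://example.com/a https://example.com/b http://example.com/a';
echo implode("\n", extract_unique_urls($text)), "\n";
```

In the software itself, wrapping $matches[0] in array_unique() before the implode() call should give the same result. Note that the comparison is exact, so URLs that differ only in letter case or a trailing slash are still treated as different.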
(This content first appeared in Possibilities newsletter.)
Will Bontrager