PHP URL/URI validation with IDN (Internationalized Domain Name) support
One thing up front:
This is not an implementation of Internationalized Resource Identifiers (IRI). It does, however, accept internationalized domain names (umlaut domains) as already offered by more than 40 top-level domain registries (Denic, Verizon, ...).
Solution:
My function first extracts the host name.
The parse_url function handles this surprisingly reliably.
Then the first occurrence of that string in the URL is replaced with the host name in Punycode form.
This is done by the function "encodeDomain", which you can find here: http://hx3.de/software-webentwicklun...support-17398/
This way the URL can be treated like any URL that does not contain an IDN domain.
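The host replacement step can be sketched in isolation like this. It is only a minimal sketch: "replaceHostInUrl" and the "$encode" callback are hypothetical names introduced here, and the callback merely stands in for "encodeDomain", whose source is not part of this post.

```php
<?php
/**
 * Replace the first occurrence of the host in $url with an encoded form.
 * $encode is a hypothetical stand-in for encodeDomain().
 */
function replaceHostInUrl(string $url, callable $encode): string
{
    $parts = parse_url($url);
    if (!is_array($parts) || !isset($parts['host'])) {
        return $url; // no host found, leave the URL untouched
    }

    $host = trim($parts['host']);
    if ($host === '') {
        return $url;
    }

    // Position of the first occurrence of the host string in the URL
    $pos = strpos($url, $host);
    if ($pos === false) {
        return $url;
    }

    // Swap exactly that occurrence for the encoded host
    return substr_replace($url, $encode($host), $pos, strlen($host));
}
```

One thing to keep in mind with this approach: because the first occurrence is replaced, a URL whose user info contains the same string as the host (for example "http://example.org@example.org/") would get the user info rewritten instead of the host.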
Now we can check the URL with an ordinary regex script. I found two right away and put both into the function.
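To illustrate why the Punycode step is needed before the regex check, here is a small sketch. It uses only the host character class from pattern 1 of the function in this post; "hostMatches" is a hypothetical helper name, and "münchen.de" with its Punycode form "xn--mnchen-3ya.de" is just an example host.

```php
<?php
// Host character class as used in pattern 1 ("Domain name or IPv4")
$hostPattern = '/^(?:[a-z0-9\-\.]|%[0-9a-f]{2})+$/i';

// Hypothetical helper: true if the whole host matches the pattern
function hostMatches(string $pattern, string $host): bool
{
    return preg_match($pattern, $host) === 1;
}

var_dump(hostMatches($hostPattern, 'münchen.de'));        // raw IDN host: bool(false)
var_dump(hostMatches($hostPattern, 'xn--mnchen-3ya.de')); // Punycode form: bool(true)
```

The raw umlaut host fails because the character class only allows ASCII letters, digits, hyphen and dot, while the Punycode form consists of exactly those characters.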
Reliability:
With this kind of validation there are only approximately perfect solutions. This one also has cases where it fails.
But it is better than any solution without IDN support.
Here is the source code:
PHP code:
/**
 * Validate a URI.
 * Supports international domain names (IDN).
 *
 * @param string $url         URL/URI to validate
 * @param int    $use_pattern Regex to use: 1 or 2 (default: 2)
 * @author Herbert Walde
 * @see http://hx3.de/software-webentwicklung-23/php-url-uri-validation-idn-internationalized-domain-name-support-17404
 * @return boolean success
 */
function validateURL($url, $use_pattern = 2) {
    // Replace the first occurrence of the host name with its Punycode form
    $parts = parse_url($url);
    if (is_array($parts) && isset($parts["host"])) {
        $hostname = trim($parts["host"]);
        if (!empty($hostname)) {
            $pos = strpos($url, $hostname);
            if ($pos !== false) {
                // There is data to be replaced
                $left_seg  = substr($url, 0, $pos);
                $right_seg = substr($url, $pos + strlen($hostname));
                $url = $left_seg . encodeDomain($hostname) . $right_seg;
            }
        }
    }

    if ($use_pattern == 1) {
        /*
         * PATTERN 1 - Source: http://www.mattfarina.com/2009/01/08/rfc-3986-url-validation
         */
        # Start at the beginning of the text
        $pattern = "/^";
        # The scheme
        $pattern .= "([a-z][a-z0-9\*\-\.]*):\/\/";
        # Userinfo (optional)
        $pattern .= "(?:";
        $pattern .= "(?:(?:[\w\.\-\+!$&'\(\)*\+,;=]|%[0-9a-f]{2})+:)*";
        $pattern .= "(?:[\w\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})+@";
        $pattern .= ")?";
        # The domain
        $pattern .= "(?:";
        # Domain name or IPv4
        $pattern .= "(?:[a-z0-9\-\.]|%[0-9a-f]{2})+";
        # or IPv6
        $pattern .= "|(?:\[(?:[0-9a-f]{0,4}:)*(?:[0-9a-f]{0,4})\])";
        $pattern .= ")";
        # Server port number (optional)
        $pattern .= "(?::[0-9]+)?";
        # The path (optional)
        $pattern .= "(?:[\/|\?]";
        $pattern .= "(?:[\w#!:\.\?\+=&@!$'~*,;\/\(\)\[\]\-]|%[0-9a-f]{2})";
        $pattern .= "*)?";
        $pattern .= "$/xi";
    } else if ($use_pattern == 2) {
        /*
         * PATTERN 2 - Source: http://www.yiiframework.com/extension/urivalidator/
         *
         * License:
         *
         * Copyright © 2008 by MetaYii. All rights reserved.
         *
         * Redistribution and use in source and binary forms, with or without
         * modification, are permitted provided that the following conditions are met:
         *
         * - Redistributions of source code must retain the above copyright notice, this
         *   list of conditions and the following disclaimer.
         * - Redistributions in binary form must reproduce the above copyright notice,
         *   this list of conditions and the following disclaimer in the documentation
         *   and/or other materials provided with the distribution.
         * - Neither the name of MetaYii nor the names of its contributors may
         *   be used to endorse or promote products derived from this software without
         *   specific prior written permission.
         *
         * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
         * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
         * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
         * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
         * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
         * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
         * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
         * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
         * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
         * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
         * POSSIBILITY OF SUCH DAMAGE.
         */
        $pattern = "/^([a-z0-9\+\.\-]+):(?:\/\/(?:((?:[a-z0-9\-\._\~!\$\&\'\(\)\*\+\,\;\=\:]|%[0-9A-F]{2})*)\@";
        $pattern .= ")?((?:[a-z0-9-\.\_\~\!\$\&\'\(\)\*\+\,\;\=]|%[0-9A-F]{2})*)(?::(\d*))?(\/(?:[a-z0-9\-\._\~\!";
        $pattern .= "\$\&\'\(\)\*\+\,\;\=\:\@\/]|%[0-9A-F]{2})*)?|(\/?(?:[a-z0-9\-\._\~\!\$\&\'\(\)\*\+\,\;\=\:\@";
        $pattern .= "]|%[0-9A-F]{2})+(?:[a-z0-9\-\._\~\!\$\&\'\(\)\*\+\,\;\=\:\@\/]|%[0-9A-F]{2})*)?)(?:\?((?";
        $pattern .= ":[a-z0-9\-\._\~\!\$\&\'\(\)\*\+\,\;\=\:\/\?\@]|%[0-9A-F]{2})*))?(?:\#((?:[a-z0-9\-\._\~\!\$";
        $pattern .= "\&\'\(\)\*\+\,\;\=\:\/\?\@]|%[0-9A-F]{2})*))?$/i";
    } else {
        // Without a known pattern there is nothing to match against
        trigger_error("Unknown pattern!");
        return false;
    }

    // preg_match returns 1 on a match, 0 on no match and false on error
    return preg_match($pattern, $url) === 1;
}
//////////////////////////////////////////////////////////////////////////
///////////////////////////////// Tests /////////////////////////////////
//////////////////////////////////////////////////////////////////////////

define("URL_IS_VALID", 0);
define("INVALID_URL", 1);

$urls = array(
    array(URL_IS_VALID, "http://username:password@hostname/path?arg=value#anchor"),
    array(URL_IS_VALID, "http://username:password@süß.name/path?arg=value&sid=123jkjbkjb#anchor"),
    array(INVALID_URL,  "http://username:password@ süß.name/path?arg=value&sid=123jkjbkjb#anchor"),
    array(URL_IS_VALID, "http://username:password@süß.name"),
    array(URL_IS_VALID, "http://username:password@süß.name?arg=value&sid=123jkjbkjb#anchor"),
    array(URL_IS_VALID, "http://username:password@süß.name#anchor"),
    array(URL_IS_VALID, "http://süß.name"),
);

foreach ($urls as $url) {
    // Print the expectation for this test case
    if ($url[0] == URL_IS_VALID) {
        echo "<hr>Test valid url: <font color=\"#0000FF\">" . $url[1] . "</font><br>Result: ";
    } else {
        echo "<hr>Test <strong>invalid</strong> url: <font color=\"#0000FF\">" . $url[1] . "</font><br>Result: ";
    }

    // Print the actual result
    if (validateURL($url[1])) {
        echo "URL is valid!";
    } else {
        echo "URL is <strong>invalid</strong>!";
    }
}
You can find the function "encodeDomain" here: http://hx3.de/software-webentwicklun...support-17398/
Good luck
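If the linked "encodeDomain" is not at hand, PHP's intl extension offers "idn_to_ascii", which might serve as a substitute. The following is only a sketch under that assumption: "encodeDomainIntl" is a hypothetical name, it is not the author's "encodeDomain", and it may behave differently for some characters (for example "ß").

```php
<?php
/**
 * Hypothetical alternative to encodeDomain(), based on the intl extension.
 * Converts a host name to its ASCII (Punycode) form.
 */
function encodeDomainIntl(string $host): string
{
    // Assumption: ext-intl is installed; otherwise fail loudly
    if (!function_exists('idn_to_ascii')) {
        throw new RuntimeException('The intl extension is not available.');
    }

    $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($ascii === false) {
        throw new InvalidArgumentException("Cannot convert host: $host");
    }

    return $ascii;
}
```

To try it with the function above, the call to "encodeDomain" inside "validateURL" would have to be swapped for "encodeDomainIntl".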