One thing up front:
This is not an implementation of
Internationalized Resource Identifiers (IRI). It does, however, accept internationalized domain names (IDNs, such as umlaut domains), which more than 40 top-level-domain providers (Denic, Verizon, ...) already offer.
Solution:
My function first extracts the host name.
The function parse_url handles this surprisingly reliably.
Then the first occurrence of that string in the URL is replaced by the host name in
Punycode form.
This is done by the function "encodeDomain". You can find it here:
PHP E-Mail-Validation mit IDN (Internationalized domain name) Support
This way the URL can be treated like a URL that does not contain an IDN domain.
Now the URL can be checked with an ordinary regex. I found two suitable patterns right away and put both into the function; the second parameter selects which one is used.
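The host replacement step described above can be sketched roughly like this. It is an illustration, not the code from this post: "replaceHost" is a hypothetical helper name, and the encoder is passed in as a callable because "encodeDomain" itself is not shown here.

```php
<?php
/**
 * Replace the first occurrence of the URL's host with an encoded form.
 * Hypothetical helper for illustration; $encode stands in for encodeDomain().
 */
function replaceHost($url, callable $encode)
{
    $parts = parse_url($url);
    // parse_url() returns false for seriously malformed URLs
    if ($parts === false || !isset($parts['host']) || $parts['host'] === '') {
        return $url;
    }
    $host = $parts['host'];
    $pos = strpos($url, $host);
    if ($pos === false) {
        return $url;
    }
    // substr_replace() swaps only the matched span and leaves the rest untouched
    return substr_replace($url, $encode($host), $pos, strlen($host));
}

// Usage with a dummy encoder that just upper-cases the host
echo replaceHost('http://user:pw@example.org/path?x=1', 'strtoupper'), "\n";
```

Because only the first occurrence is replaced, a URL whose userinfo happens to contain the host name would have that earlier match rewritten instead of the actual host.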
Reliability:
With this kind of validation there are only approximately perfect solutions. This one also has cases where it fails.
But it is better than any solution without IDN support.
Here is the source code:
PHP code:
/**
* Validate a URI
* Supports International Domain Names
*
* @param string $url URL/URI to validate
* @param int $use_pattern regex to use: 1 (mattfarina.com) or 2 (Yii urivalidator, default)
* @author Herbert Walde
* @see http://hx3.de/software-webentwicklung-23/php-url-uri-validation-idn-internationalized-domain-name-support-17404
* @return boolean success
*/
function validateURL($url, $use_pattern=2){
$hostname = parse_url($url);
if(isset($hostname) and $hostname!==false and isset($hostname["host"])){
$hostname = trim($hostname["host"]);
if(!empty($hostname)){
$res = strpos($url, $hostname);
if($res !== false) {
// Replace the first occurrence of the host name with its Punycode form
$left_seg = substr($url, 0, $res);
$right_seg = substr($url, $res + strlen($hostname));
$url = $left_seg . encodeDomain($hostname) . $right_seg;
}
}
}
if($use_pattern == 1){
/*
* PATTERN 1 - Source: http://www.mattfarina.com/2009/01/08/rfc-3986-url-validation
*/
# Start at the beginning of the text
$pattern = "/^";
# The scheme (RFC 3986: a letter, then letters, digits, "+", "-" or ".")
$pattern = $pattern . "([a-z][a-z0-9\+\-\.]*):\/\/";
# Userinfo (optional)
$pattern .= "(?:";
$pattern .= "(?:(?:[\w\.\-\+!$&'\(\)*\+,;=]|%[0-9a-f]{2})+:)*";
$pattern .= "(?:[\w\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})+@";
$pattern .= ")?";
# The domain
$pattern .= "(?:";
# Domain name or IPv4
$pattern .= "(?:[a-z0-9\-\.]|%[0-9a-f]{2})+";
# or IPv6
$pattern .= "|(?:\[(?:[0-9a-f]{0,4}:)*(?:[0-9a-f]{0,4})\])";
$pattern .= ")";
# Server port number (optional)
$pattern .= "(?::[0-9]+)?";
# The path (optional)
$pattern .= "(?:[\/\?]";
$pattern .= "(?:[\w#!:\.\?\+=&@!$'~*,;\/\(\)\[\]\-]|%[0-9a-f]{2})";
$pattern .= "*)?";
$pattern .= "$/xi";
} else if($use_pattern == 2){
/*
* PATTERN 2 - Source: http://www.yiiframework.com/extension/urivalidator/
*
* License:
*
* Copyright © 2008 by MetaYii. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* - Redistributions of source code must retain the above copyright notice, this
* list of conditions and the following disclaimer.
* - Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
* - Neither the name of MetaYii nor the names of its contributors may
* be used to endorse or promote products derived from this software without
* specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
$pattern = "/^([a-z0-9\+\.\-]+):(?:\/\/(?:((?:[a-z0-9\-\._\~!\$\&\'\(\)\*\+\,\;\=\:]|%[0-9A-F]{2})*)\@";
$pattern .= ")?((?:[a-z0-9-\.\_\~\!\$\&\'\(\)\*\+\,\;\=]|%[0-9A-F]{2})*)(?::(\d*))?(\/(?:[a-z0-9\-\._\~\!";
$pattern .= "\$\&\'\(\)\*\+\,\;\=\:\@\/]|%[0-9A-F]{2})*)?|(\/?(?:[a-z0-9\-\._\~\!\$\&\'\(\)\*\+\,\;\=\:\@";
$pattern .= "]|%[0-9A-F]{2})+(?:[a-z0-9\-\._\~\!\$\&\'\(\)\*\+\,\;\=\:\@\/]|%[0-9A-F]{2})*)?)(?:\?((?";
$pattern .= ":[a-z0-9\-\._\~\!\$\&\'\(\)\*\+\,\;\=\:\/\?\@]|%[0-9A-F]{2})*))?(?:\#((?:[a-z0-9\-\._\~\!\$";
$pattern .= "\&\'\(\)\*\+\,\;\=\:\/\?\@]|%[0-9A-F]{2})*))?$/i";
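} else {
trigger_error("Unknown pattern!");
// No pattern was built, so there is nothing to match against
return false;
}
// preg_match() returns 1, 0 or false; convert to the documented boolean
return preg_match($pattern, $url) === 1;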
}
//////////////////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////// Tests /////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////////////////
define("URL_IS_VALID",0);
define("INVALID_URL",1);
$urls = array(
array(URL_IS_VALID, "http://username:password@hostname/path?arg=value#anchor"),
array(URL_IS_VALID, "http://username:password@süß.name/path?arg=value&sid=123jkjbkjb#anchor"),
array(INVALID_URL, "http://username:password@ süß.name/path?arg=value&sid=123jkjbkjb#anchor"),
array(URL_IS_VALID, "http://username:password@süß.name"),
array(URL_IS_VALID, "http://username:password@süß.name?arg=value&sid=123jkjbkjb#anchor"),
array(URL_IS_VALID, "http://username:password@süß.name#anchor"),
array(URL_IS_VALID, "http://süß.name"),
);
foreach ($urls as $url){
if($url[0]==URL_IS_VALID){
echo "<hr>Test valid url: <font color=\"#0000FF\">".$url[1]."</font><br>Result: ";
} else {
echo "<hr>Test <strong>invalid</strong> url: <font color=\"#0000FF\">".$url[1]."</font><br>Result: ";
}
if(validateURL($url[1])){
echo "URL is valid!";
} else {
echo "URL is <strong>invalid</strong>!";
}
}
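The loop above only prints each result; it does not compare it with the expected flag stored in the first array element. A minimal sketch of a harness that does that comparison might look like this. "checkUrlCases" is a hypothetical name, expectations are given as plain booleans, and the validator is passed in as a callable so the snippet runs on its own; the stub validator is only a placeholder.

```php
<?php
/**
 * Run a validator over test cases and collect the URLs whose result
 * differs from the expectation. Hypothetical helper for illustration.
 *
 * @param array    $cases     list of array(bool $expectedValid, string $url)
 * @param callable $validator takes a URL, returns a truthy/falsy value
 * @return array   URLs for which the validator disagreed with the expectation
 */
function checkUrlCases(array $cases, callable $validator)
{
    $failures = array();
    foreach ($cases as $case) {
        list($expectedValid, $url) = $case;
        if ((bool) $validator($url) !== $expectedValid) {
            $failures[] = $url;
        }
    }
    return $failures;
}

// Placeholder validator so the sketch runs without validateURL():
// it only rejects URLs that contain a space.
$stub = function ($url) {
    return strpos($url, ' ') === false;
};

$cases = array(
    array(true,  'http://username:password@hostname/path?arg=value#anchor'),
    array(false, 'http://username:password@ hostname/path'),
);

$failures = checkUrlCases($cases, $stub);
echo count($failures) === 0
    ? "All cases passed\n"
    : "Failed: " . implode(', ', $failures) . "\n";
```

With the code from this post you would pass 'validateURL' as the validator and translate URL_IS_VALID / INVALID_URL into true / false when building the case list.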
The function "encodeDomain" can be found here:
PHP E-Mail-Validation mit IDN (Internationalized domain name) Support
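If you do not have that function at hand, PHP's intl extension offers idn_to_ascii(), which performs the Punycode conversion. The following is a stand-in sketch under that assumption, not the "encodeDomain" from the linked post; "encodeDomainSketch" is a made-up name, and returning the input unchanged when conversion is not possible is my own choice.

```php
<?php
/**
 * Convert a host name to its ASCII (Punycode) form.
 * Assumption: stand-in for encodeDomain(), which is not shown in this post.
 */
function encodeDomainSketch($domain)
{
    // idn_to_ascii() is only available when the intl extension is loaded
    if (!function_exists('idn_to_ascii')) {
        return $domain;
    }
    $ascii = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    // idn_to_ascii() returns false on failure; keep the input in that case
    return $ascii === false ? $domain : $ascii;
}

echo encodeDomainSketch('example.org'), "\n";
echo encodeDomainSketch('süß.name'), "\n";
```

How characters such as "ß" are mapped depends on the IDNA flags and the ICU version behind intl, so check the output against the registry rules of the TLD you care about.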
Good luck