
Writing a UTF-8 split function - PHP version

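For various reasons I needed a split function that works on UTF-8 strings. Unfortunately PHP only has the byte-oriented str_split() and no UTF-8 version (mb_str_split() only arrived in PHP 7.4).
So I tried writing one myself. The code is below.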

<?php
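// Split a UTF-8 string into chunks of $split_len characters (not bytes).
function utf_str_split($utf_str, $split_len = 1){
    // A chunk size of 0 would divide by zero in the loop condition,
    // so reject anything below 1.
    if ($split_len < 1) return FALSE;
    $len = mb_strlen($utf_str, 'UTF-8');
    $arr = array();
    $temp_str = $utf_str;
    for($i = 0 ; $i<$len/$split_len ; $i++){
        // Take the next chunk from the front of the remaining string...
        $arr[] = mb_substr($temp_str, 0, $split_len, 'UTF-8');
        // ...then drop that chunk from the remainder.
        $temp_str = mb_substr($temp_str, $split_len, $len, 'UTF-8');
    }
    return $arr;
}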
?>
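
To see why the byte-oriented str_split() is not enough here, it helps to run both approaches on a short UTF-8 string. The following is only a sketch: utf_str_split_offset() is a hypothetical variant that is not part of the original post. It reads each chunk by character offset instead of trimming the remaining string, and it assumes the mbstring extension is loaded and the file is saved as UTF-8.

```php
<?php
// Hypothetical variant of utf_str_split() above: it reads each chunk by
// character offset instead of repeatedly trimming the remaining string.
// Assumes the mbstring extension is loaded.
function utf_str_split_offset($utf_str, $split_len = 1) {
    if ($split_len < 1) {
        return false; // guard against a zero or negative chunk size
    }
    $len = mb_strlen($utf_str, 'UTF-8');
    $arr = array();
    for ($offset = 0; $offset < $len; $offset += $split_len) {
        $arr[] = mb_substr($utf_str, $offset, $split_len, 'UTF-8');
    }
    return $arr;
}

$text = '你好世界啊'; // 5 characters, 15 bytes in UTF-8

// str_split() counts bytes, so the 3-byte characters get cut apart.
$bytes = str_split($text, 2);

// The multibyte-aware version counts characters.
$chars = utf_str_split_offset($text, 2);

echo count($bytes), " byte-based pieces\n";
print_r($chars);
```

The byte-based pieces are mostly invalid UTF-8 fragments, while the character-based result keeps every character intact.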

Below is Joomla's utf8_str_split()
Original link: Source code for file /phputf8/str_split.php


<?php
function utf8_str_split($str, $split_len = 1){  
    if (!preg_match('/^[0-9]+$/', $split_len) || $split_len < 1) return FALSE; 
    $len = mb_strlen($str, 'UTF-8'); 
    if ($len <= $split_len) return array($str); 
    preg_match_all('/.{'.$split_len.'}|[^\x00]{1,'.$split_len.'}$/us', $str, $ar); 
    return $ar[0]; 
}
?>
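
The pattern in utf8_str_split() can be read as two alternatives: `.{n}` takes a full chunk of n characters, and `[^\x00]{1,n}$` picks up the shorter remainder at the end of the string. The `u` modifier makes PCRE treat the subject as UTF-8, and `s` lets `.` match newlines as well. A small sketch that runs the same pattern directly, with a sample string made up for illustration:

```php
<?php
// Same pattern as in utf8_str_split(), with the chunk size fixed at 2.
$split_len = 2;
$pattern = '/.{' . $split_len . '}|[^\x00]{1,' . $split_len . '}$/us';

// Illustrative input: 5 characters, so one character is left over
// and has to be caught by the second alternative.
preg_match_all($pattern, '你好世界啊', $matches);
$chunks = $matches[0];

print_r($chunks);
```

Here the first two chunks come from the first alternative and the final single character comes from the second.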


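Tested with a string of a little over six thousand characters
Split length 1
utf_str_split:0.712316989899 sec
utf8_str_split:1.12109088898 sec
Split length 2
utf_str_split:0.525295972824 sec
utf8_str_split:0.536201000214 sec
Split length 10
utf_str_split:0.109910964966 sec
utf8_str_split:0.109711885452 sec
In these runs the only noticeable gap is at a split length of 1, where utf_str_split is faster; for lengths greater than 1 the two perform about the same.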
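
For anyone who wants to repeat the comparison, here is a rough sketch of how such timings could be taken. The post does not show the test script, so everything here is an assumption: time_split() is a hypothetical helper, the 6000-character sample is built by repetition rather than taken from the original test, and each function is timed for a single call. Absolute numbers will differ by machine and PHP version.

```php
<?php
// The two functions from this post, repeated so the sketch runs on its own.
function utf_str_split($utf_str, $split_len = 1) {
    $len = mb_strlen($utf_str, 'UTF-8');
    $arr = array();
    $temp_str = $utf_str;
    for ($i = 0; $i < $len / $split_len; $i++) {
        $arr[] = mb_substr($temp_str, 0, $split_len, 'UTF-8');
        $temp_str = mb_substr($temp_str, $split_len, $len, 'UTF-8');
    }
    return $arr;
}

function utf8_str_split($str, $split_len = 1) {
    if (!preg_match('/^[0-9]+$/', $split_len) || $split_len < 1) return false;
    $len = mb_strlen($str, 'UTF-8');
    if ($len <= $split_len) return array($str);
    preg_match_all('/.{' . $split_len . '}|[^\x00]{1,' . $split_len . '}$/us', $str, $ar);
    return $ar[0];
}

// Hypothetical helper: time a single call using microtime(true).
function time_split($fn, $str, $split_len) {
    $start = microtime(true);
    call_user_func($fn, $str, $split_len);
    return microtime(true) - $start;
}

// Assumed test input: 6000 characters built by repetition.
$sample = str_repeat('測試字串', 1500);

foreach (array(1, 2, 10) as $split_len) {
    echo "Split length $split_len\n";
    foreach (array('utf_str_split', 'utf8_str_split') as $fn) {
        printf("%s: %.6f sec\n", $fn, time_split($fn, $sample, $split_len));
    }
}
```

Timing one call per function is noisy; averaging over several runs would give steadier figures.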
