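I need to highlight keywords in a paragraph, the way Google does in its search results. Suppose I have a MySQL database with blog posts. When a user searches for a keyword, I want to return the posts that contain those keywords, but show only part of each post (the paragraph containing the searched keywords) with those keywords highlighted.
My plan is this:
Could you help me with the logic, or at least tell me whether my logic is OK? I am learning PHP.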
If it contains HTML (note that this is a fairly robust solution):
    $string = '<p>foo<b>bar</b></p>';
    $keyword = 'foo';

    $dom = new DomDocument();
    $dom->loadHtml($string);
    $xpath = new DomXpath($dom);

    // Every element whose text content contains the keyword
    $elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
    foreach ($elements as $element) {
        // Iterate over a snapshot: childNodes is a live list that changes as nodes are replaced
        foreach (iterator_to_array($element->childNodes) as $child) {
            if (!$child instanceof DomText) {
                continue;
            }
            $fragment = $dom->createDocumentFragment();
            $text = $child->textContent;
            while (($pos = stripos($text, $keyword)) !== false) {
                // Text before the match, then the match itself wrapped in a span
                $fragment->appendChild(new DomText(substr($text, 0, $pos)));
                $word = substr($text, $pos, strlen($keyword));
                $highlight = $dom->createElement('span');
                $highlight->appendChild(new DomText($word));
                $highlight->setAttribute('class', 'highlight');
                $fragment->appendChild($highlight);
                $text = substr($text, $pos + strlen($keyword));
            }
            // Compare against '' instead of using empty(), which would drop a trailing "0"
            if ((string) $text !== '') {
                $fragment->appendChild(new DomText($text));
            }
            $element->replaceChild($fragment, $child);
        }
    }
    $string = $dom->saveXml($dom->getElementsByTagName('body')->item(0)->firstChild);
This results in:
    <p><span class="highlight">foo</span><b>bar</b></p>
And with:
    $string = '<body><p>foobarbaz<b>bar</b></p></body>';
    $keyword = 'bar';
You get (broken onto multiple lines for readability):
    <p>foo
        <span class="highlight">bar</span>
        baz
        <b>
            <span class="highlight">bar</span>
        </b>
    </p>
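One caveat with the query above: the keyword is concatenated straight into the XPath expression, so a keyword containing a double quote would produce an invalid query. XPath 1.0 has no escape character inside string literals, so the usual workaround is to choose the other quote type or fall back to concat(). A minimal sketch, assuming a hypothetical helper named xpathLiteral() (not part of the original answer):

```php
<?php
// Hypothetical helper: turns an arbitrary string into a valid XPath 1.0 string literal.
function xpathLiteral($value)
{
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    // Both quote types are present: split on double quotes and rejoin with concat().
    $quoted = array();
    foreach (explode('"', $value) as $part) {
        $quoted[] = '"' . $part . '"';
    }
    return 'concat(' . implode(", '\"', ", $quoted) . ')';
}

// Usage: search for a keyword that itself contains double quotes.
$dom = new DOMDocument();
$dom->loadHTML('<p>say "foo" now</p>');
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//*[contains(., ' . xpathLiteral('"foo"') . ')]');
echo $elements->length, "\n";
```

The rest of the highlighting loop would stay the same; only the way the query string is built changes.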
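Beware of non-DOM solutions (such as regex or str_replace): highlighting something like "div" has a tendency to completely break your HTML. The DOM approach only "highlights" strings in the body text, never inside a tag.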
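To make the warning concrete, here is a small sketch of what the naive approach does when the keyword happens to be a tag name. The sample markup is made up for illustration:

```php
<?php
// Made-up sample input; the keyword is also the name of the enclosing tag.
$html = '<div class="post">A div is a block element.</div>';
$keyword = 'div';

// Naive approach: replaces every occurrence, including the ones inside the tags.
$broken = str_replace(
    $keyword,
    '<span class="highlight">' . $keyword . '</span>',
    $html
);

echo $broken, "\n";
```

The opening and closing tags get a span injected into their names, so the result is no longer valid markup. The DOM-based version never touches tag names because it only rewrites text nodes.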
Edit: Since you want Google-style results, here is one way of doing it:
    function getKeywordStubs($string, array $keywords, $maxStubSize = 10) {
        $dom = new DomDocument();
        $dom->loadHtml($string);
        $xpath = new DomXpath($dom);
        $results = array();
        // Number of words to keep on each side of the keyword
        $maxStubHalf = (int) ceil($maxStubSize / 2);
        foreach ($keywords as $keyword) {
            $elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
            $replace = '<span class="highlight">'.$keyword.'</span>';
            foreach ($elements as $element) {
                $stub = $element->textContent;
                // Capture up to $maxStubHalf words before and after the first match
                $regex = '#^.*?((\w*\W*){'.$maxStubHalf.'})('
                    .preg_quote($keyword, '#')
                    .')((\w*\W*){'.$maxStubHalf.'}).*?$#ims';
                // Keep only the captured context and the keyword itself
                $stub = preg_replace($regex, '\\1\\3\\4', $stub);
                $stub = str_ireplace($keyword, $replace, $stub);
                $results[] = $stub;
            }
        }
        // Nested elements yield identical stubs; array_unique() keeps the original keys
        $results = array_unique($results);
        return $results;
    }
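OK, so what that does is return an array of matches, each with up to $maxStubSize words around the keyword (up to half that number before it and half after).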
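To see what the pattern captures, it can be run on its own against a plain string. This is only an illustrative sketch: the sample text is made up, and $half stands in for $maxStubHalf with a small value.

```php
<?php
$text = 'us to foo bar baz replace out'; // made-up sample text
$keyword = 'bar';
$half = 2; // stands in for $maxStubHalf

// Same pattern as in getKeywordStubs(): group 1 is the context before the
// keyword, group 3 the keyword, group 4 the context after it.
$regex = '#^.*?((\w*\W*){' . $half . '})('
    . preg_quote($keyword, '#')
    . ')((\w*\W*){' . $half . '}).*?$#ims';

$stub = preg_replace($regex, '\\1\\3\\4', $text);
var_dump($stub); // string(15) "to foo bar baz "
```

Note that the trailing group spends its first repetition on the space right after the keyword, so one word fewer is kept after the keyword than before it.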
So, given a string:
    <p>a whole <b>bunch of</b> text <a>here for</a> us to foo bar baz replace out from this string <b>bar</b> </p>
Calling getKeywordStubs($string, array('bar', 'bunch')) will result in:
    array(4) {
      [0]=>
      string(75) "here for us to foo <span class="highlight">bar</span> baz replace out from "
      [3]=>
      string(34) "<span class="highlight">bar</span>"
      [4]=>
      string(62) "a whole <span class="highlight">bunch</span> of text here for "
      [7]=>
      string(39) "<span class="highlight">bunch</span> of"
    }
You could then build your result blurb by sorting the list by strlen and picking the two longest matches (this assumes PHP 5.3+):
    usort($results, function($str1, $str2) {
        return strlen($str2) - strlen($str1);
    });
    $description = implode('...', array_slice($results, 0, 2));
    here for us to foo <span class="highlight">bar</span> baz replace out...a whole <span class="highlight">bunch</span> of text here for
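One possible variation on the sorting step above is to rank stubs by their visible text rather than by raw string length, so the highlight markup does not count toward the ranking. A sketch, assuming a hypothetical helper named buildBlurb() (not part of the original answer) and using the stubs from the dump above as input:

```php
<?php
// Hypothetical helper: joins the $count longest stubs into a single blurb.
function buildBlurb(array $stubs, $count = 2, $glue = '...')
{
    usort($stubs, function ($a, $b) {
        // Compare visible text only, ignoring the <span> markup.
        return strlen(strip_tags($b)) - strlen(strip_tags($a));
    });
    // Trim each picked stub so the glue sits directly against the words.
    $picked = array_map('trim', array_slice($stubs, 0, $count));
    return implode($glue, $picked);
}

$stubs = array(
    'here for us to foo <span class="highlight">bar</span> baz replace out from ',
    '<span class="highlight">bar</span>',
    'a whole <span class="highlight">bunch</span> of text here for ',
    '<span class="highlight">bunch</span> of',
);
$blurb = buildBlurb($stubs);
echo $blurb, "\n";
```

Whether to measure raw or visible length is a judgment call; with a single highlight per stub the ordering is usually the same either way.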
I hope that helps. (I do feel this is a bit bloated, and I'm sure there are better ways to do it, but this is one way.)