For a given string (usually a paragraph), I want to replace some words/phrases, but ignore them if they happen to be enclosed in tags in some way. This also needs to be case-insensitive.

Take this as an example:

```html
You can find a link here <a href="#">link</a> and a lot of things in different styles. Public platform can appear in bold: <b>public platform</b>, and we also have italics here too: <i>italics</i>. While I like soft pillows I am picky about soft <i>pillows</i>. While I want to find fox, I din't want foxes to show up. The text "shiny fruits" is in a span tag: one of the <span>shiny fruits</span>.
```
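Say I want to replace these words:

- `link`: appears twice. The first is plain text (match); the second is inside an A tag (ignore).
- `public platform`: plain text (match, case-insensitive); the second is inside a B tag (ignore).
- `soft pillows`: 1 plain-text match.
- `fox`: 1 plain-text match. It has to look at whole words, so `foxes` does not count.
- `fruits`: plain text (match); the second is inside a span tag together with other text (ignore).

For context, I'm searching for phrase matches (not single words) and linking the matches to relevant pages.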
I want to avoid nested HTML (no links inside bold tags or vice versa) and other errors, e.g. `the <a href="#">phrase <b>goes</a> here</b>`.
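I've tried a few approaches, such as searching a sanitized copy of the text with the HTML stripped out. That tells me a match exists, but then I run into a whole new problem: mapping it back to the original content.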
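One way around the mapping-back problem is to never separate the text from its markup: split the string into tags and text runs, replace only in runs that sit outside every element, and join the pieces again. This is a different technique from the regex in the answer, shown as a minimal sketch; `replaceOutsideElements()` is a hypothetical name, and it assumes well-formed HTML with any literal `<` escaped as `&lt;`.

```php
<?php
// Sketch of a different technique (not the answer's regex): split the HTML into
// tags and text runs, replace only in text that is outside every element, and
// join the pieces back. Assumes well-formed HTML with literal "<" escaped as &lt;.
function replaceOutsideElements(string $html, string $phrase, string $replacement): string
{
    // The capture group keeps every tag as its own piece in the result.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $voidTags = ['br', 'hr', 'img', 'input', 'meta', 'link', 'wbr']; // never closed
    $pattern = '/\b' . preg_quote($phrase, '/') . '\b/i'; // whole phrase, case-insensitive
    $depth = 0; // number of currently open elements

    foreach ($parts as $i => $part) {
        if ($part !== '' && $part[0] === '<') {
            if (strncmp($part, '</', 2) === 0) {
                $depth = max(0, $depth - 1); // closing tag
            } elseif (preg_match('/^<([a-z][a-z0-9]*)/i', $part, $m) === 1
                && !in_array(strtolower($m[1]), $voidTags, true)
                && substr($part, -2) !== '/>') {
                $depth++; // opening tag (comments, doctypes and void tags are skipped)
            }
        } elseif ($depth === 0) {
            // Plain text outside any element: the only place a replacement happens.
            $parts[$i] = preg_replace($pattern, $replacement, $part);
        }
    }
    return implode('', $parts);
}

$html = 'You can find a link here <a href="#">link</a> and a lot of things.';
echo replaceOutsideElements($html, 'link', '<a href="/link">$0</a>'), "\n";
// prints: You can find a <a href="/link">link</a> here <a href="#">link</a> and a lot of things.
```

Because only text at depth 0 is touched, a new link can never land inside `<b>` or another `<a>`. The flip side is that text inside *any* element is skipped, so a paragraph wrapped in `<p>...</p>` would get no replacements unless the wrapper's depth is treated as top level.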
P粉594941301, 2024-03-28 12:56:47
I found mentions of negative lookahead in regex, and after racking my brain I ended up with this regex (assuming you have VALID HTML tag pairing):
```php
<?php
// Made the function a bit ugly just to show how the pattern comes together.
// Assumes VALID, properly paired HTML tags.
function replaceTextOutsideTags($sourceText = null, $toReplace = 'inner text', $dummyText = '(REPLACED TEXT HERE)')
{
    // Demo input. Markup follows the question's sample; the <b> around
    // "inner inner text" is an assumed tag.
    $string = $sourceText ?? "Inner text
You can find a link here <a href=\"#\">link</a> and a lot
of things in different styles. Public platform can appear in bold:
<b>public platform</b>, and we also have italics here too: <i>italics</i>.
While I like soft pillows I am picky about soft <i>pillows</i>.
While I want to find fox, I din't want foxes to show up.
The text \"shiny fruits\" is in a span tag: one of the <span>shiny fruits</span>.
The inner text like this <b>inner inner text</b> here to test too, event inner text
omg thats sad... or not
";
    // [[:punct:]] would be nicer, but it also contains < and >, which must stay out of the class.
    $punctuation = "\.,!\?:;\|\/=\"#"; // this part might need more attention, but you get the point
    // Whole words/phrases only; preg_quote() escapes regex metacharacters in the phrase.
    $stringPart = "\b" . preg_quote($toReplace, '/') . "\b";
    // Characters allowed between the match and "</" or ">"; "<" is deliberately absent.
    $excludeSequence = "(?![\w\n\s>$punctuation]*?";
    $excludeOutside = "$excludeSequence<\/)"; // note the closing ): skip text right before a closing tag
    $excludeTag = "$excludeSequence>)";       // note the closing ): skip text inside a tag's own markup
    // i = case-insensitive; m is harmless here (no ^ or $ anchors)
    $pattern = "/" . $stringPart . $excludeOutside . $excludeTag . "/im";
    return preg_replace($pattern, $dummyText, $string);
}
```
Example output with the default parameters (the `<b>` around "inner inner text" is an assumed tag):

```
(REPLACED TEXT HERE)
You can find a link here <a href="#">link</a> and a lot
of things in different styles. Public platform can appear in bold:
<b>public platform</b>, and we also have italics here too: <i>italics</i>.
While I like soft pillows I am picky about soft <i>pillows</i>.
While I want to find fox, I din't want foxes to show up.
The text "shiny fruits" is in a span tag: one of the <span>shiny fruits</span>.
The (REPLACED TEXT HERE) like this <b>inner inner text</b> here to test too, event (REPLACED TEXT HERE)
omg thats sad... or not
```
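For the question's actual use case (linking matches rather than inserting dummy text), the same lookahead pattern can be wrapped in a helper that builds each link from the match itself. A sketch assuming the answer's punctuation set, with `\s` covering newlines; `linkPhrase()` and the URLs are hypothetical. `preg_replace_callback()` is used so that a `$` or `\` in a URL is never read as a back-reference.

```php
<?php
// Sketch: the answer's lookahead pattern, used to wrap plain-text matches in a link.
// linkPhrase() and the example URL are hypothetical.
function linkPhrase(string $html, string $phrase, string $url): string
{
    // Characters allowed between a match and "</" or ">"; "<" is deliberately missing,
    // so a lookahead can never run past the start of another tag.
    $allowed = '[\w\s>\.,!\?:;\|\/="#]*?';
    $pattern = '/\b' . preg_quote($phrase, '/') . '\b'
        . '(?!' . $allowed . '<\/)'   // not followed by a closing tag (text inside an element)
        . '(?!' . $allowed . '>)'     // not followed by ">" (text inside a tag's own markup)
        . '/i';                       // case-insensitive
    // $m[0] keeps the original casing of the matched text inside the link.
    return preg_replace_callback($pattern, function (array $m) use ($url): string {
        return '<a href="' . htmlspecialchars($url) . '">' . $m[0] . '</a>';
    }, $html);
}

$html = 'Public platform can appear in bold: <b>public platform</b>.';
echo linkPhrase($html, 'public platform', '/public-platform'), "\n";
// prints: <a href="/public-platform">Public platform</a> can appear in bold: <b>public platform</b>.
```

One limitation carries over from the pattern: a match inside an element that is followed by another tag before its closing tag, such as `<b>public platform <i>x</i></b>`, is not excluded, so a link can still end up nested inside `<b>`.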
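Now, step by step:

1. `\b$toReplace\b`: word boundaries, so that searching for `pillow` does not also hit `pillowS`.
2. `[\w\n\s>$punctuation]*?`: `\w` word characters, `\s` spaces or `\n` newlines, `>`, and the punctuation in `$punctuation` (`.,!?:;|/="#`), followed by the start of a closing tag `</`. We don't want that match, so it goes into a negative lookahead: `(?![\w\n\s>$punctuation]*?<\/)`. Here we can be sure the match does not run into a new tag, because `<` is not part of the allowed sequence (the `$excludeOutside` variable).
3. The `$excludeTag` variable is basically the same as `$excludeOutside`, but covers the case where `$toReplace` can be part of an HTML tag itself, e.g. `a`, which also appears in `<a href="#">` and `</a>`.
4. Keep `<` and `>` out of the text being replaced; using these symbols may lead to unexpected behaviour.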
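The walkthrough can be checked by building the pattern one step at a time and counting the matches that survive each step. A minimal sketch reusing the answer's punctuation set (with `\s` covering `\n`); `countMatches()` and the sample string are hypothetical.

```php
<?php
// Sketch: rebuild the answer's pattern one piece at a time and count the matches
// that survive each step. countMatches() and the sample string are hypothetical.
function countMatches(string $html, string $phrase, bool $excludeOutside, bool $excludeTag): int
{
    $allowed = '[\w\s>\.,!\?:;\|\/="#]*?';                // "<" deliberately missing
    $pattern = '/\b' . preg_quote($phrase, '/') . '\b';   // step 1: whole words only
    if ($excludeOutside) {
        $pattern .= '(?!' . $allowed . '<\/)';            // step 2: drop matches followed by "</"
    }
    if ($excludeTag) {
        $pattern .= '(?!' . $allowed . '>)';              // step 3: drop matches followed by ">"
    }
    return preg_match_all($pattern . '/i', $html);
}

$html = 'Find a fox, not foxes: <a href="#">fox</a>.';
foreach (['fox', 'a'] as $phrase) {
    printf("%s: %d -> %d -> %d\n", $phrase,
        countMatches($html, $phrase, false, false),
        countMatches($html, $phrase, true, false),
        countMatches($html, $phrase, true, true));
}
// prints:
// fox: 2 -> 1 -> 1
// a: 3 -> 2 -> 1
```

The `a` row shows why both lookaheads are needed: the `a` in `<a href="#">` is already dropped by the `</` lookahead (the attribute characters and `>` are all in the allowed set), but the `a` in `</a>` is only dropped by the `>` lookahead of `$excludeTag`.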