How to Extract HtmlElements from Frames and IFrames in WebBrowser Control?
As described in the question, the goal is to collect video clip links from a specific website with the WebBrowser control, but the page contains iframes that host their own documents and elements. Searching only the HtmlElements of the main document will not find the video element, so you have to look inside each iframe's document to find the required information.
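To retrieve HtmlElements from a frame or iframe, take the following steps:
1. Identify the frames: enumerate `browser.Document.Window.Frames`, which returns an `HtmlWindow` for each frame or iframe on the page.
2. Parse the frame's document: each `HtmlWindow` exposes its own `Document`, which can be searched with methods such as `GetElementsByTagName`.
3. Extract the HtmlElement attributes: read the values you need (here `src` and `poster`) with `GetAttribute`.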
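The steps above only look one level down, at the frames of the top-level window. An iframe can host further frames of its own, so a depth-first walk over the frame tree is one way to make step 1 cover nested frames as well. The sketch below is written generically so it can be tried without a browser; `FrameTree` and `Walk` are hypothetical names, not part of the WebBrowser API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: enumerates a root node and all of its descendants.
public static class FrameTree
{
    // Yields the root first, then every descendant in depth-first order,
    // so frames are visited in the order they appear in the page.
    public static IEnumerable<T> Walk<T>(T root, Func<T, IEnumerable<T>> getChildren)
    {
        yield return root;
        foreach (T child in getChildren(root))
        {
            foreach (T descendant in Walk(child, getChildren))
            {
                yield return descendant;
            }
        }
    }
}
```

With the WebBrowser control, the root would be `browser.Document.Window` and the child selector `w => w.Frames.Cast<HtmlWindow>()` (the `HtmlWindowCollection` returned by `Frames` is non-generic, hence the `Cast` from `System.Linq`). Frames served from another domain may throw `UnauthorizedAccessException` when accessed, so the selector and the per-frame processing should be wrapped in the same kind of try/catch the sample uses.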
Here is a sample implementation that parses HtmlElements from the frames of a page:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

public class FrameHtmlElementParser
{
    private readonly List<MovieLink> movieLinks = new List<MovieLink>();

    // The links collected so far.
    public IReadOnlyList<MovieLink> MovieLinks => movieLinks;

    public void ParseMovies(WebBrowser browser)
    {
        browser.DocumentCompleted += Browser_DocumentCompleted;
    }

    private void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        var browser = (WebBrowser)sender;

        // DocumentCompleted is raised once per frame; wait until the whole page has loaded.
        if (browser.ReadyState != WebBrowserReadyState.Complete)
        {
            return;
        }

        foreach (HtmlWindow frame in browser.Document.Window.Frames)
        {
            try
            {
                // Each frame/iframe hosts its own HtmlDocument.
                HtmlElement body = frame.Document?.Body;
                if (body == null) continue;

                var videoElement = body.GetElementsByTagName("video")
                                       .OfType<HtmlElement>()
                                       .FirstOrDefault();
                if (videoElement == null) continue;

                string videoLink = videoElement.GetAttribute("src");
                if (string.IsNullOrEmpty(videoLink)) continue;

                int hash = videoLink.GetHashCode();
                // Already stored: skip this frame and keep checking the others.
                if (movieLinks.Any(m => m.Hash == hash)) continue;

                movieLinks.Add(new MovieLink
                {
                    Hash = hash,
                    VideoLink = videoLink,
                    ImageLink = videoElement.GetAttribute("poster")
                });
            }
            // Cross-domain frames deny access to their document. Unavoidable: ignore.
            catch (UnauthorizedAccessException) { }
            // Unavoidable: ignore.
            catch (InvalidOperationException) { }
        }
    }
}

public class MovieLink
{
    public int Hash { get; set; }
    public string VideoLink { get; set; }
    public string ImageLink { get; set; }
}
```
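The sample stores the `src` and `poster` values exactly as `GetAttribute` returns them. If a value comes back as a relative URL (whether it does can depend on the attribute and the browser engine, so treat this as an assumption to check), it has to be resolved against the URL of the frame document it came from rather than the top-level page, because relative URLs inside an iframe are relative to the iframe's own document. A minimal sketch using `System.Uri`; `LinkResolver` and `Resolve` are hypothetical names:

```csharp
using System;

// Hypothetical helper for turning attribute values into absolute URLs.
public static class LinkResolver
{
    // Resolves a src/poster value against the URL of the frame document it came from.
    // Absolute values come back unchanged; null is returned for empty or unparsable input.
    public static string Resolve(Uri frameUrl, string attributeValue)
    {
        if (frameUrl == null || string.IsNullOrEmpty(attributeValue))
        {
            return null;
        }

        return Uri.TryCreate(frameUrl, attributeValue, out Uri resolved)
            ? resolved.AbsoluteUri
            : null;
    }
}
```

Inside the frame loop, the frame's address is available as `frame.Url`, so a call would look like `LinkResolver.Resolve(frame.Url, videoLink)`.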
To avoid storing the same link twice, the sample uses a custom `MovieLink` class that records the hash code of each video link alongside the link itself. Before adding a new item to the `movieLinks` list, the code compares that hash with the hashes already stored and skips the link if it is a duplicate.
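Comparing hash codes has two weaknesses: two different URLs can produce the same `GetHashCode` value, in which case a new link would be wrongly treated as a duplicate, and on .NET Core and later string hash codes are randomized per process, so they should not be persisted between runs. A `HashSet<string>` that compares the URLs themselves avoids both issues. A minimal sketch; `VideoLinkStore` and `TryAdd` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical alternative to the hash-code check in the sample:
// duplicates are detected by comparing the full URL strings.
public class VideoLinkStore
{
    // Ordinal comparison: URLs that differ only in case are kept as distinct links.
    private readonly HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);
    private readonly List<(string VideoLink, string ImageLink)> links =
        new List<(string VideoLink, string ImageLink)>();

    public IReadOnlyList<(string VideoLink, string ImageLink)> Links => links;

    // Stores the link and returns true if it is new; returns false for
    // empty values and for links that were already stored.
    public bool TryAdd(string videoLink, string imageLink)
    {
        if (string.IsNullOrEmpty(videoLink) || !seen.Add(videoLink))
        {
            return false;
        }

        links.Add((videoLink, imageLink));
        return true;
    }
}
```

In the sample's loop, a call such as `store.TryAdd(videoLink, videoElement.GetAttribute("poster"))` could replace both the `GetHashCode` call and the `movieLinks.Any(...)` check.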
The following modifications were made to the code:
1. `movielink` was corrected to `MovieLink`, and the tag name `VIDEO` was corrected to `video`.
2. A `MovieLink` class definition was added. This makes the code complete, so it compiles and runs, easier to understand, and closer to C# coding conventions.