How to Extract HtmlElements from Frames and IFrames in WebBrowser Control?

Barbara Streisand
2025-01-18 23:26:10

Identify and parse HTML elements within frames and iframes in the WebBrowser control

Problem Overview

The goal is to collect video clip links from a website using the WebBrowser control. The page embeds iframes, each hosting its own document and elements, so the video element cannot be found by searching only the main document. The elements inside each iframe's document have to be inspected as well.

Solution: parse the HtmlDocument of each frame

To efficiently retrieve Html elements from a frame/iframe, you can take the following steps:

1. Identify the frames:

  • Use the WebBrowser.Document.Window.Frames property to access the HtmlWindowCollection containing all frames in the main document.

2. Parse each frame's document:

  • Iterate through each HtmlWindow in the collection.
  • For each frame, use its Document property (an HtmlDocument) to inspect the frame's HTML elements.

3. Extract Html element attributes:

  • Use the HtmlElement.GetAttribute method to read the relevant attributes (for example src and poster) from each matching element.

Sample code snippet:

Here is a sample implementation demonstrating how to parse Html elements from a frame:

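<code class="language-c#">using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

public class FrameHtmlElementParser
{
    // Links collected so far; Hash is used to detect repeats.
    private readonly List<MovieLink> movieLinks = new List<MovieLink>();

    public void ParseMovies(WebBrowser browser)
    {
        // DocumentCompleted is raised for each frame as well as for the main document.
        browser.DocumentCompleted += Browser_DocumentCompleted;
    }

    private void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        var browser = sender as WebBrowser;

        // Wait until the whole page, including its frames, has finished loading.
        if (browser == null || browser.Document == null || browser.ReadyState != WebBrowserReadyState.Complete)
        {
            return;
        }

        var documentFrames = browser.Document.Window.Frames;

        foreach (HtmlWindow frame in documentFrames)
        {
            try
            {
                // First <video> element in this frame's document, or null if there is none
                // (also null when the frame has no document or body yet).
                var videoElement = frame.Document?.Body?.GetElementsByTagName("video").OfType<HtmlElement>().FirstOrDefault();

                if (videoElement != null)
                {
                    string videoLink = videoElement.GetAttribute("src");
                    if (string.IsNullOrEmpty(videoLink))
                    {
                        continue; // No src attribute on this element: nothing to store.
                    }

                    int hash = videoLink.GetHashCode();

                    if (movieLinks.Any(m => m.Hash == hash))
                    {
                        // This URL has already been parsed. Remove the handler
                        // or take other appropriate action here.
                        return;
                    }

                    string sourceImage = videoElement.GetAttribute("poster");
                    movieLinks.Add(new MovieLink
                    {
                        Hash = hash,
                        VideoLink = videoLink,
                        ImageLink = sourceImage
                    });
                }
            }
            catch (UnauthorizedAccessException) { } // Frame not accessible (e.g. cross-domain): ignore.
            catch (InvalidOperationException) { }   // Frame not in a usable state: ignore.
        }
    }
}

// One collected video link with its poster image.
public class MovieLink
{
    public int Hash { get; set; }
    public string VideoLink { get; set; }
    public string ImageLink { get; set; }
}</code>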
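
The sample handler inspects only the frames directly under the main document. Frames can themselves contain frames, so some pages may need a recursive walk. The following is a minimal sketch under that assumption; `FrameWalker` and `CollectVideoElements` are hypothetical names introduced here, not part of the original code or of Windows Forms.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helper: walks a window and all nested frames.
public static class FrameWalker
{
    // Returns every <video> element found in the given window's document
    // and in the documents of all frames nested below it.
    public static List<HtmlElement> CollectVideoElements(HtmlWindow window)
    {
        var found = new List<HtmlElement>();
        if (window != null)
        {
            Collect(window, found);
        }
        return found;
    }

    private static void Collect(HtmlWindow window, List<HtmlElement> found)
    {
        try
        {
            HtmlDocument document = window.Document;
            if (document != null)
            {
                foreach (HtmlElement element in document.GetElementsByTagName("video"))
                {
                    found.Add(element);
                }
            }

            HtmlWindowCollection frames = window.Frames;
            if (frames != null)
            {
                foreach (HtmlWindow child in frames)
                {
                    Collect(child, found); // Recurse into nested frames.
                }
            }
        }
        catch (UnauthorizedAccessException) { } // Inaccessible frame: skip it and its children.
        catch (InvalidOperationException) { }   // Frame not in a usable state: skip it.
    }
}
```

Inside a DocumentCompleted handler this could be called as `FrameWalker.CollectVideoElements(browser.Document.Window)`. Because the walk starts at the window passed in, the result also includes any video elements in the main document itself.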

Avoid duplicate data:

To avoid storing the same link twice, the sample code keeps the hash code of each video URL in a custom MovieLink object. Before adding a new item to the movieLinks list, it checks whether an entry with the same hash code already exists.
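
Comparing hash codes alone can treat two different URLs as the same link if their hash codes happen to collide, so comparing the URL strings themselves may be safer. Below is a minimal sketch of that alternative. `VideoLinkRegistry` and `TryAdd` are hypothetical names introduced for illustration, and MovieLink is repeated in simplified form (without the Hash property) so the snippet stands alone.

```csharp
using System;
using System.Collections.Generic;

// Simplified copy of the article's MovieLink, without the Hash property.
public class MovieLink
{
    public string VideoLink { get; set; }
    public string ImageLink { get; set; }
}

// Hypothetical helper that stores each video URL at most once.
public class VideoLinkRegistry
{
    // Exact, case-sensitive comparison of the URL strings.
    private readonly HashSet<string> seenUrls = new HashSet<string>(StringComparer.Ordinal);
    private readonly List<MovieLink> links = new List<MovieLink>();

    public IReadOnlyList<MovieLink> Links
    {
        get { return links; }
    }

    // Returns true if the link was new and has been stored;
    // false if it was empty or had been seen before.
    public bool TryAdd(string videoLink, string imageLink)
    {
        if (string.IsNullOrEmpty(videoLink) || !seenUrls.Add(videoLink))
        {
            return false;
        }

        links.Add(new MovieLink { VideoLink = videoLink, ImageLink = imageLink });
        return true;
    }
}
```

In the handler, the values read with GetAttribute("src") and GetAttribute("poster") could be passed to `TryAdd`, and a false return value would play the role of the duplicate check.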
