How to Extract HtmlElements from Frames and IFrames in WebBrowser Control?

Barbara Streisand

Jan 18, 2025 pm 11:26 PM

Identifying and parsing HTML elements within frames and iframes in the WebBrowser control

Problem Overview

The goal is to collect video clip links from a specific website using the WebBrowser control. The web page contains iframes, and each iframe hosts its own document and elements. Searching only the main document's HTML elements therefore does not find the video element; the document inside each iframe has to be searched as well.

Solving the problem: parsing the HtmlDocuments of the frame

To retrieve HTML elements from a frame or iframe, take the following steps:

1. Identify the frames:

Use the WebBrowser.Document.Window.Frames property to access the HtmlWindowCollection containing all frames in the main document.

2. Parse each frame's document:

Iterate through each HtmlWindow in the collection.
For each frame, access its Document property (an HtmlDocument) to inspect the HTML elements it contains.

3. Extract HTML element attributes:

Use the HtmlElement.GetAttribute method to read the relevant attributes from the element you located.

Sample code snippet:

Here is a sample implementation demonstrating how to parse Html elements from a frame:

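using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

public class FrameHtmlElementParser
{
    // Links collected so far, across all DocumentCompleted events.
    private readonly List<MovieLink> movieLinks = new List<MovieLink>();

    // Subscribes to DocumentCompleted; parsing runs when the event fires.
    public void ParseMovies(WebBrowser browser)
    {
        browser.DocumentCompleted += Browser_DocumentCompleted;
    }

    private void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        var browser = sender as WebBrowser;

        // Skip events raised before the whole page has finished loading.
        if (browser == null || browser.Document == null ||
            browser.ReadyState != WebBrowserReadyState.Complete)
        {
            return;
        }

        var documentFrames = browser.Document.Window.Frames;

        foreach (HtmlWindow frame in documentFrames)
        {
            try
            {
                // A frame without a body has nothing to search.
                var body = frame.Document == null ? null : frame.Document.Body;
                if (body == null)
                {
                    continue;
                }

                // Take the first <video> element in this frame, if any.
                var videoElement = body.GetElementsByTagName("video").OfType<HtmlElement>().FirstOrDefault();

                if (videoElement != null)
                {
                    string videoLink = videoElement.GetAttribute("src");
                    int hash = videoLink.GetHashCode();

                    if (movieLinks.Any(m => m.Hash == hash))
                    {
                        // This URL has already been parsed. Remove the handler
                        // or take other appropriate action.
                        return;
                    }

                    string sourceImage = videoElement.GetAttribute("poster");
                    movieLinks.Add(new MovieLink
                    {
                        Hash = hash,
                        VideoLink = videoLink,
                        ImageLink = sourceImage
                    });
                }
            }
            catch (UnauthorizedAccessException) { } // Cannot be avoided: ignore
            catch (InvalidOperationException) { }   // Cannot be avoided: ignore
        }
    }
}

// Holds one extracted video link together with its poster image.
public class MovieLink
{
    public int Hash { get; set; }
    public string VideoLink { get; set; }
    public string ImageLink { get; set; }
}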

Avoid duplicate data:

To avoid storing the same link more than once, the sample code uses a custom MovieLink class that keeps the hash code of each video link. Before adding a new item to the movieLinks list, it compares hash codes to check whether the link is already present. This check matters because the DocumentCompleted event can fire several times for a page that contains frames.
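
Two different strings can produce the same hash code, so a comparison based only on GetHashCode may occasionally treat a new link as a duplicate. A minimal sketch of an alternative, assuming it is acceptable to compare the full URL text instead: the MovieLinkStore class and its TryAdd method below are hypothetical names introduced for illustration and are not part of the original sample.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the original sample): remembers which
// video links have already been stored, comparing the full URL text.
public class MovieLinkStore
{
    // Ordinal comparison: two links are equal only if every character matches.
    private readonly HashSet<string> seenLinks = new HashSet<string>(StringComparer.Ordinal);

    // Number of distinct links recorded so far.
    public int Count
    {
        get { return seenLinks.Count; }
    }

    // Returns true if the link was new and has been recorded;
    // returns false for a null or empty link, or for a duplicate.
    public bool TryAdd(string videoLink)
    {
        if (string.IsNullOrEmpty(videoLink))
        {
            return false;
        }

        return seenLinks.Add(videoLink);
    }
}
```

In the event handler, a call such as store.TryAdd(videoLink) could replace the movieLinks.Any(...) check, with a new MovieLink added only when it returns true.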

The following modifications were made to the code:

Corrected spelling and capitalization errors: for example, movielink is corrected to MovieLink, and VIDEO is corrected to video.
Added the MovieLink class definition, which makes the code more complete and easier to understand.
Made minor adjustments to the comments to make them clearer and more accurate.

These changes make the code easier to compile and run, and bring it closer to C# coding conventions.
