How to Retrieve Dynamically Generated HTML via .NET WebBrowser Effectively?

By DDD (original), 2024-10-18 08:37:29

How to Extract Dynamically Generated HTML Using .NET WebBrowser

The challenge is to retrieve, from a .NET application, the HTML of a page as the browser has rendered it, including content that scripts generate after the initial load.

Problem:

Existing solutions rely on the System.Windows.Forms.WebBrowser class or the mshtml.HTMLDocument interface, without satisfactory results. Raw HTML downloaded with WebClient, or parsed through mshtml.HTMLDocument, does not include the dynamic content that the browser produces while rendering.

Investigated Approaches:

  • Reading the document through the WebBrowser class did not return the rendered HTML.
  • Downloading the raw HTML and parsing it with mshtml.HTMLDocument was equally unsatisfactory.

Elegant Solution:

The right approach depends on the specific requirements, but combining the following techniques gives a robust result:

  1. WebBrowser Control: Embed a WebBrowser control and navigate to the desired URL.
  2. State Monitoring: Wait for the DocumentCompleted event, then poll the IsBusy property and the current markup until the page stops changing.
  3. async/await: Use async/await for the asynchronous polling so the code keeps a linear flow without blocking the UI thread.
  4. HTML5 Rendering: Enable HTML5 rendering through Browser Feature Control (FEATURE_BROWSER_EMULATION); otherwise the control falls back to a legacy IE7 compatibility mode.
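For step 4, the DWORD written under FEATURE_BROWSER_EMULATION selects the document mode used by the hosted control. The helper below is a minimal sketch of that mapping. The class and method names (`BrowserEmulation`, `ValueFor`) are hypothetical and not part of any library. The values are the DOCTYPE-driven ones listed in Microsoft's Internet Feature Controls documentation, and the requested version is only honored if that Internet Explorer version is installed.

```csharp
using System;
using System.Collections.Generic;

static class BrowserEmulation
{
    // FEATURE_BROWSER_EMULATION values keyed by IE major version.
    // With these values, pages that declare a standards-based DOCTYPE
    // are rendered in the standards mode of that IE version.
    static readonly Dictionary<int, int> DoctypeDriven = new Dictionary<int, int>
    {
        { 7, 7000 },
        { 8, 8000 },
        { 9, 9000 },
        { 10, 10000 },
        { 11, 11000 },
    };

    // Hypothetical helper: returns the registry value for a given IE major version.
    public static int ValueFor(int ieMajorVersion)
    {
        int value;
        if (DoctypeDriven.TryGetValue(ieMajorVersion, out value))
            return value;

        throw new ArgumentOutOfRangeException(
            "ieMajorVersion", "Only IE 7 through 11 are covered by this sketch.");
    }
}
```

Under these assumptions, `BrowserEmulation.ValueFor(10)` yields the 10000 that the full sample writes to the registry.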

Code Sample:

The following code sample combines these techniques to extract dynamic HTML content. It assumes a Windows Forms form that hosts a WebBrowser control named webBrowser:

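<code class="csharp">using System;
using System.ComponentModel; // LicenseManager, LicenseUsageMode
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;
using Microsoft.Win32; // Registry, RegistryValueKind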

namespace HtmlExtractor
{
    public partial class MainForm : Form
    {
        public MainForm()
        {
            SetFeatureBrowserEmulation();
            InitializeComponent();
            this.Load += MainForm_Load;
        }

        async void MainForm_Load(object sender, EventArgs e)
        {
            try
            {
                var cts = new CancellationTokenSource(10000); // cancel in 10s
                var html = await LoadDynamicPage("https://www.google.com/#q=where+am+i", cts.Token);
                // show only the first 1024 characters; Math.Min avoids an exception on short pages
                MessageBox.Show(html.Substring(0, Math.Min(html.Length, 1024)) + "...");
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
            }
        }

        async Task<string> LoadDynamicPage(string url, CancellationToken token)
        {
            var tcs = new TaskCompletionSource<bool>();
            WebBrowserDocumentCompletedEventHandler handler = (s, arg) =>
                tcs.TrySetResult(true);

            using (token.Register(() => tcs.TrySetCanceled(), useSynchronizationContext: true))
            {
                this.webBrowser.DocumentCompleted += handler;
                try
                {
                    this.webBrowser.Navigate(url);
                    await tcs.Task; // wait for DocumentCompleted
                }
                finally
                {
                    this.webBrowser.DocumentCompleted -= handler;
                }
            }

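            // get the root element
            var documentElement = this.webBrowser.Document.GetElementsByTagName("html")[0];

            // poll the current HTML until it stops changing
            var html = documentElement.OuterHtml;
            while (true)
            {
                // wait asynchronously; this throws if cancellation is requested
                await Task.Delay(500, token);

                // keep polling while the WebBrowser is still busy
                if (this.webBrowser.IsBusy)
                    continue;

                var htmlNow = documentElement.OuterHtml;
                if (html == htmlNow)
                    break; // no changes since the last poll, so stop

                html = htmlNow;
            }

            // treat the page as fully rendered
            token.ThrowIfCancellationRequested();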
            return html;
        }

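        // Opt this executable into IE10 standards mode for the hosted WebBrowser control.
        // The setting is per-user (HKCU) and keyed by the executable's file name.
        static void SetFeatureBrowserEmulation()
        {
            // do nothing when running inside the designer
            if (LicenseManager.UsageMode != LicenseUsageMode.Runtime)
                return;
            var appName = System.IO.Path.GetFileName(System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName);
            Registry.SetValue(@"HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION",
                appName, 10000, RegistryValueKind.DWord); // 10000 = IE10 (assumes IE10 or later is installed)
        }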
    }
}</code>
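The polling loop in LoadDynamicPage can also be expressed without any dependency on the WebBrowser control, which makes the "wait until the markup settles" idea easier to reuse and test. The following is a sketch only: `StabilityPoller` and `WaitUntilStableAsync` are hypothetical names, and the snapshot and busy checks are passed in as delegates rather than read from a control.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class StabilityPoller
{
    // Polls 'snapshot' every 'interval' and returns the first value that is
    // identical in two consecutive polls taken while 'isBusy' returns false.
    public static async Task<string> WaitUntilStableAsync(
        Func<string> snapshot,
        Func<bool> isBusy,
        TimeSpan interval,
        CancellationToken token)
    {
        var previous = snapshot();
        while (true)
        {
            // Throws an OperationCanceledException once the token is cancelled.
            await Task.Delay(interval, token);

            // Skip this round if the source reports it is still working.
            if (isBusy())
                continue;

            var current = snapshot();
            if (current == previous)
                return current; // unchanged since the last poll

            previous = current;
        }
    }
}
```

Inside the form, the loop could then be replaced by a call such as `await StabilityPoller.WaitUntilStableAsync(() => documentElement.OuterHtml, () => this.webBrowser.IsBusy, TimeSpan.FromMilliseconds(500), token)`. The delegates run on the UI thread there, because the awaits resume on the captured synchronization context.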

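Because it waits for DocumentCompleted and then polls until the markup stops changing, this approach returns the HTML as rendered, including script-generated content, rather than the raw response. The stability check is a heuristic: a page that keeps updating itself never settles, which is why the sample cancels the operation after 10 seconds.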
