Example of using C# to obtain the HTML source code of a web page-C#.Net Tutorial-php.cn

Home

Backend Development

C#.Net Tutorial

Example of using C# to obtain the HTML source code of a web page

高洛峰

Jan 14, 2017 pm 01:29 PM

I'm working on a project recently, and one of the functions is to get the source code of a web page based on a URL address. In ASP.NET (C#), there seem to be many ways to obtain the source code of a web page. I just made a simple WebClient, which is very simple and easy. But a very annoying problem came out later, and that was the garbled Chinese characters.

After careful study, Chinese web pages are nothing more than GB2312 and UTF-8. So we have the following code:

       /// <summary>
       /// 根据网址的URL，获取源代码HTML
       /// </summary>
       /// <param name="url"></param>
       /// <returns></returns>
       public static string GetHtmlByUrl(string url)
       {
           using (WebClient wc = new WebClient())
           {
               try
               {
                   wc.UseDefaultCredentials = true;
                   wc.Proxy = new WebProxy();
                   wc.Proxy.Credentials = CredentialCache.DefaultCredentials;
                   wc.Credentials = System.Net.CredentialCache.DefaultCredentials;
                   byte[] bt = wc.DownloadData(url);
                   string txt = System.Text.Encoding.GetEncoding("GB2312").GetString(bt);
                   switch (GetCharset(txt).ToUpper())
                   {
                       case "UTF-8":
                           txt = System.Text.Encoding.UTF8.GetString(bt);
                           break;
                       case "UNICODE":
                           txt = System.Text.Encoding.Unicode.GetString(bt);
                           break;
                       default:
                           break;
                   }
                   return txt;
               }
               catch (Exception ex)
               {
                   return null;
               }
           }
       }

To explain a little bit, WebClient is used here to create a wc object (the naming is a bit awkward). Then call the DownloadData method of the wc object, pass in the URL value, and return a byte array. By default, GB2312 is used to read this byte array and convert it into a string. Find the characteristic characters of the encoding format of the web page from the string of the web page source code, such as finding information such as charset="utf-8", to determine the encoding format of the current web page.

The GetCharset function is used to obtain the encoding format of the current web page. The specific code is as follows:

      /// <summary>
       /// 从HTML中获取获取charset
       /// </summary>
       /// <param name="html"></param>
       /// <returns></returns>
       public static string GetCharset(string html)
       {
           string charset = "";
           Regex regCharset = new Regex(@"content=[""'].*\s*charset\b\s*=\s*""?(?<charset>[^""']*)", RegexOptions.IgnoreCase);
           if (regCharset.IsMatch(html))
           {
               charset = regCharset.Match(html).Groups["charset"].Value;
           }
           if (charset.Equals(""))
           {
               regCharset = new Regex(@"<\s*meta\s*charset\s*=\s*[""']?(?<charset>[^""']*)", RegexOptions.IgnoreCase);
               if (regCharset.IsMatch(html))
               {
                   charset = regCharset.Match(html).Groups["charset"].Value;
               }
           }
           return charset;
       }

For more related articles on examples of using C# to obtain the HTML source code of web pages, please pay attention to the PHP Chinese website!

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Developing with C# .NET: A Practical Guide and ExamplesMay 12, 2025 am 12:16 AM

C# and .NET provide powerful features and an efficient development environment. 1) C# is a modern, object-oriented programming language that combines the power of C and the simplicity of Java. 2) The .NET framework is a platform for building and running applications, supporting multiple programming languages. 3) Classes and objects in C# are the core of object-oriented programming. Classes define data and behaviors, and objects are instances of classes. 4) The garbage collection mechanism of .NET automatically manages memory to simplify the work of developers. 5) C# and .NET provide powerful file operation functions, supporting synchronous and asynchronous programming. 6) Common errors can be solved through debugger, logging and exception handling. 7) Performance optimization and best practices include using StringBuild

C# .NET: Understanding the Microsoft .NET FrameworkMay 11, 2025 am 12:17 AM

.NETFramework is a cross-language, cross-platform development platform that provides a consistent programming model and a powerful runtime environment. 1) It consists of CLR and FCL, which manages memory and threads, and FCL provides pre-built functions. 2) Examples of usage include reading files and LINQ queries. 3) Common errors involve unhandled exceptions and memory leaks, and need to be resolved using debugging tools. 4) Performance optimization can be achieved through asynchronous programming and caching, and maintaining code readability and maintainability is the key.

The Longevity of C# .NET: Reasons for its Enduring PopularityMay 10, 2025 am 12:12 AM

Reasons for C#.NET to remain lasting attractive include its excellent performance, rich ecosystem, strong community support and cross-platform development capabilities. 1) Excellent performance and is suitable for enterprise-level application and game development; 2) The .NET framework provides a wide range of class libraries and tools to support a variety of development fields; 3) It has an active developer community and rich learning resources; 4) .NETCore realizes cross-platform development and expands application scenarios.

Mastering C# .NET Design Patterns: From Singleton to Dependency InjectionMay 09, 2025 am 12:15 AM

Design patterns in C#.NET include Singleton patterns and dependency injection. 1.Singleton mode ensures that there is only one instance of the class, which is suitable for scenarios where global access points are required, but attention should be paid to thread safety and abuse issues. 2. Dependency injection improves code flexibility and testability by injecting dependencies. It is often used for constructor injection, but it is necessary to avoid excessive use to increase complexity.

C# .NET in the Modern World: Applications and IndustriesMay 08, 2025 am 12:08 AM

C#.NET is widely used in the modern world in the fields of game development, financial services, the Internet of Things and cloud computing. 1) In game development, use C# to program through the Unity engine. 2) In the field of financial services, C#.NET is used to develop high-performance trading systems and data analysis tools. 3) In terms of IoT and cloud computing, C#.NET provides support through Azure services to develop device control logic and data processing.

C# .NET Framework vs. .NET Core/5/6: What's the Difference?May 07, 2025 am 12:06 AM

.NETFrameworkisWindows-centric,while.NETCore/5/6supportscross-platformdevelopment.1).NETFramework,since2002,isidealforWindowsapplicationsbutlimitedincross-platformcapabilities.2).NETCore,from2016,anditsevolutions(.NET5/6)offerbetterperformance,cross-

The Community of C# .NET Developers: Resources and SupportMay 06, 2025 am 12:11 AM

The C#.NET developer community provides rich resources and support, including: 1. Microsoft's official documents, 2. Community forums such as StackOverflow and Reddit, and 3. Open source projects on GitHub. These resources help developers improve their programming skills from basic learning to advanced applications.

The C# .NET Advantage: Features, Benefits, and Use CasesMay 05, 2025 am 12:01 AM

The advantages of C#.NET include: 1) Language features, such as asynchronous programming simplifies development; 2) Performance and reliability, improving efficiency through JIT compilation and garbage collection mechanisms; 3) Cross-platform support, .NETCore expands application scenarios; 4) A wide range of practical applications, with outstanding performance from the Web to desktop and game development.

See all articles