


Code example of how C# uses regular expressions to crawl website information
This article mainly introduces the use of CregularExpression Capture website information, combined with examples to analyze C#'s techniques related to regular crawling operations for web page information. It has certain reference value. Friends in need can refer to the following
Examples in this article The method of using regular expressions to capture website information in C# is shared with you for your reference. The details are as follows:
Here is an example of capturing Jingdong Mall product details. ##1. Create JdRobber.cs program class
public class JdRobber { /// <summary> /// 判断是否京东链接 /// </summary> /// <param name="param"></param> /// <returns></returns> public bool ValidationUrl(string url) { bool result = false; if (!String.IsNullOrEmpty(url)) { Regex regex = new Regex(@"^http://item.jd.com/\d+.html$"); Match match = regex.Match(url); if (match.Success) { result = true; } } return result; } /// <summary> /// 抓取京东信息 /// </summary> /// <param name="param"></param> /// <returns></returns> public void GetInfo(string url) { if (ValidationUrl(url)) { string htmlStr = WebHandler.GetHtmlStr(url, "Default"); if (!String.IsNullOrEmpty(htmlStr)) { string pattern = ""; //正则表达式 string sourceWebID = ""; //商品关键ID string title = ""; //标题 decimal price = 0; //价格 string picName = ""; //图片 //提取商品关键ID pattern = @"http://item.jd.com/(?<Object>\d+).html"; sourceWebID = WebHandler.GetRegexText(url, pattern); //提取标题 pattern = @"<p.*id=\""name\"".*>[\s\S]*<h1 id="Object">(?<Object>.*?)</h1>"; title = WebHandler.GetRegexText(htmlStr, pattern); //提取图片 int begin = htmlStr.IndexOf("<p id=\"spec-n1\""); int end = htmlStr.IndexOf("</p>", begin + 1); if (begin > 0 && end > 0) { string subPicHtml = htmlStr.Substring(begin, end - begin); pattern = @"<img .*src=\""(?<Object alt="Code example of how C# uses regular expressions to crawl website information" >.*?)\"".*/>"; picName = WebHandler.GetRegexText(subPicHtml, pattern); } //提取价格 if (sourceWebID != "") { string priceUrl = @"http://p.3.cn/prices/get?skuid=J_" + sourceWebID + "&type=1"; string priceJson = WebHandler.GetHtmlStr(priceUrl, "Default"); pattern = @"\""p\"":\""(?<Object>\d+(\.\d{1,2})?)\"""; price = WebHandler.GetValidPrice(WebHandler.GetRegexText(priceJson, pattern)); } Console.WriteLine("商品名称:{0}", title); Console.WriteLine("图片:{0}", picName); Console.WriteLine("价格:{0}", price); } } } }
2. Create WebHandler.cs public method class
/// <summary> /// 公共方法类 /// </summary> public class WebHandler { /// <summary> /// 获取网页的HTML码 /// </summary> /// <param name="url">链接地址</param> /// <param name="encoding">编码类型</param> /// <returns></returns> public static string GetHtmlStr(string url, string encoding) { string htmlStr = ""; try { if (!String.IsNullOrEmpty(url)) { WebRequest request = WebRequest.Create(url); //实例化WebRequest对象 WebResponse response = request.GetResponse(); //创建WebResponse对象 Stream datastream = response.GetResponseStream(); //创建流对象 Encoding ec = Encoding.Default; if (encoding == "UTF8") { ec = Encoding.UTF8; } else if (encoding == "Default") { ec = Encoding.Default; } StreamReader reader = new StreamReader(datastream, ec); htmlStr = reader.ReadToEnd(); //读取数据 reader.Close(); datastream.Close(); response.Close(); } } catch { } return htmlStr; } /// <summary> /// 获取正则表达式中的关键字 /// </summary> /// <param name="input">文本</param> /// <param name="pattern">表达式</param> /// <returns></returns> public static string GetRegexText(string input, string pattern) { string result = ""; if (!String.IsNullOrEmpty(input) && !String.IsNullOrEmpty(pattern)) { Regex regex = new Regex(pattern, RegexOptions.IgnoreCase); Match match = regex.Match(input); if (match.Success) { result = match.Groups["Object"].Value; } } return result; } /// <summary> /// 返回有效价格 /// </summary> /// <param name="strPrice"></param> /// <returns></returns> public static decimal GetValidPrice(string strPrice) { decimal price = 0; try { if (!String.IsNullOrEmpty(strPrice)) { Regex regex = new Regex(@"^\d+(\.\d{1,2})?$", RegexOptions.IgnoreCase); Match match = regex.Match(strPrice); if (match.Success) { price = decimal.Parse(strPrice); } } } catch { } return price; } }
The above is the detailed content of Code example of how C# uses regular expressions to crawl website information. For more information, please follow other related articles on the PHP Chinese website!

C# and .NET adapt to the needs of emerging technologies through continuous updates and optimizations. 1) C# 9.0 and .NET5 introduce record type and performance optimization. 2) .NETCore enhances cloud native and containerized support. 3) ASP.NETCore integrates with modern web technologies. 4) ML.NET supports machine learning and artificial intelligence. 5) Asynchronous programming and best practices improve performance.

C#.NETissuitableforenterprise-levelapplicationswithintheMicrosoftecosystemduetoitsstrongtyping,richlibraries,androbustperformance.However,itmaynotbeidealforcross-platformdevelopmentorwhenrawspeediscritical,wherelanguageslikeRustorGomightbepreferable.

The programming process of C# in .NET includes the following steps: 1) writing C# code, 2) compiling into an intermediate language (IL), and 3) executing by the .NET runtime (CLR). The advantages of C# in .NET are its modern syntax, powerful type system and tight integration with the .NET framework, suitable for various development scenarios from desktop applications to web services.

C# is a modern, object-oriented programming language developed by Microsoft and as part of the .NET framework. 1.C# supports object-oriented programming (OOP), including encapsulation, inheritance and polymorphism. 2. Asynchronous programming in C# is implemented through async and await keywords to improve application responsiveness. 3. Use LINQ to process data collections concisely. 4. Common errors include null reference exceptions and index out-of-range exceptions. Debugging skills include using a debugger and exception handling. 5. Performance optimization includes using StringBuilder and avoiding unnecessary packing and unboxing.

Testing strategies for C#.NET applications include unit testing, integration testing, and end-to-end testing. 1. Unit testing ensures that the minimum unit of the code works independently, using the MSTest, NUnit or xUnit framework. 2. Integrated tests verify the functions of multiple units combined, commonly used simulated data and external services. 3. End-to-end testing simulates the user's complete operation process, and Selenium is usually used for automated testing.

Interview with C# senior developer requires mastering core knowledge such as asynchronous programming, LINQ, and internal working principles of .NET frameworks. 1. Asynchronous programming simplifies operations through async and await to improve application responsiveness. 2.LINQ operates data in SQL style and pay attention to performance. 3. The CLR of the NET framework manages memory, and garbage collection needs to be used with caution.

C#.NET interview questions and answers include basic knowledge, core concepts, and advanced usage. 1) Basic knowledge: C# is an object-oriented language developed by Microsoft and is mainly used in the .NET framework. 2) Core concepts: Delegation and events allow dynamic binding methods, and LINQ provides powerful query functions. 3) Advanced usage: Asynchronous programming improves responsiveness, and expression trees are used for dynamic code construction.

C#.NET is a popular choice for building microservices because of its strong ecosystem and rich support. 1) Create RESTfulAPI using ASP.NETCore to process order creation and query. 2) Use gRPC to achieve efficient communication between microservices, define and implement order services. 3) Simplify deployment and management through Docker containerized microservices.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

SublimeText3 Chinese version
Chinese version, very easy to use

SublimeText3 Mac version
God-level code editing software (SublimeText3)

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.

EditPlus Chinese cracked version
Small size, syntax highlighting, does not support code prompt function

DVWA
Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software