


Write Java with zero foundation and practice Zhihu crawler first with Baidu homepage (2)
Ah, wrong, small example.
// 定义一个样式模板,此中使用正则表达式,括号中是要抓的内容 // 相当于埋好了陷阱匹配的地方就会掉下去 Pattern pattern = Pattern.compile("href=\"(.+?)\""); // 定义一个matcher用来做匹配 Matcher matcher = pattern.matcher("<a href=\"index.html\">我的主页</a>"); // 如果找到了 if (matcher.find()) { // 打印出结果 System.out.println(matcher.group(1)); }
Run result:
index.html
Yes, this is our first regular code.
The link for grabbing pictures from such an application must be at your fingertips.
We encapsulate the regular matching into a function, and then modify the code as follows:
import java.io.*; import java.net.*; import java.util.regex.*; public class Main { static String SendGet(String url) { // 定义一个字符串用来存储网页内容 String result = ""; // 定义一个缓冲字符输入流 BufferedReader in = null; try { // 将string转成url对象 URL realUrl = new URL(url); // 初始化一个链接到那个url的连接 URLConnection connection = realUrl.openConnection(); // 开始实际的连接 connection.connect(); // 初始化 BufferedReader输入流来读取URL的响应 in = new BufferedReader(new InputStreamReader( connection.getInputStream())); // 用来临时存储抓取到的每一行的数据 String line; while ((line = in.readLine()) != null) { // 遍历抓取到的每一行并将其存储到result里面 result += line; } } catch (Exception e) { System.out.println("发送GET请求出现异常!" + e); e.printStackTrace(); } // 使用finally来关闭输入流 finally { try { if (in != null) { in.close(); } } catch (Exception e2) { e2.printStackTrace(); } } return result; } static String RegexString(String targetStr, String patternStr) { // 定义一个样式模板,此中使用正则表达式,括号中是要抓的内容 // 相当于埋好了陷阱匹配的地方就会掉下去 Pattern pattern = Pattern.compile(patternStr); // 定义一个matcher用来做匹配 Matcher matcher = pattern.matcher(targetStr); // 如果找到了 if (matcher.find()) { // 打印出结果 return matcher.group(1); } return ""; } public static void main(String[] args) { // 定义即将访问的链接 String url = "http://www.baidu.com"; // 访问链接并获取页面内容 String result = SendGet(url); // 使用正则匹配图片的src内容 String imgSrc = RegexString(result, "即将的正则语法"); // 打印结果 System.out.println(imgSrc); } }
Okay, now everything is ready, only a regular grammar is left!
So what regular statement is more appropriate?
We found that as long as we grab the string src="xxxxxx", we can grab the entire src link,
So a simple regular statement: src="(.+?)"
The above is zero To learn the basics of writing Java crawlers, first practice with the Baidu homepage (2). For more related content, please pay attention to the PHP Chinese website (www.php.cn)!

JVMmanagesgarbagecollectionacrossplatformseffectivelybyusingagenerationalapproachandadaptingtoOSandhardwaredifferences.ItemploysvariouscollectorslikeSerial,Parallel,CMS,andG1,eachsuitedfordifferentscenarios.Performancecanbetunedwithflagslike-XX:NewRa

Java code can run on different operating systems without modification, because Java's "write once, run everywhere" philosophy is implemented by Java virtual machine (JVM). As the intermediary between the compiled Java bytecode and the operating system, the JVM translates the bytecode into specific machine instructions to ensure that the program can run independently on any platform with JVM installed.

The compilation and execution of Java programs achieve platform independence through bytecode and JVM. 1) Write Java source code and compile it into bytecode. 2) Use JVM to execute bytecode on any platform to ensure the code runs across platforms.

Java performance is closely related to hardware architecture, and understanding this relationship can significantly improve programming capabilities. 1) The JVM converts Java bytecode into machine instructions through JIT compilation, which is affected by the CPU architecture. 2) Memory management and garbage collection are affected by RAM and memory bus speed. 3) Cache and branch prediction optimize Java code execution. 4) Multi-threading and parallel processing improve performance on multi-core systems.

Using native libraries will destroy Java's platform independence, because these libraries need to be compiled separately for each operating system. 1) The native library interacts with Java through JNI, providing functions that cannot be directly implemented by Java. 2) Using native libraries increases project complexity and requires managing library files for different platforms. 3) Although native libraries can improve performance, they should be used with caution and conducted cross-platform testing.

JVM handles operating system API differences through JavaNativeInterface (JNI) and Java standard library: 1. JNI allows Java code to call local code and directly interact with the operating system API. 2. The Java standard library provides a unified API, which is internally mapped to different operating system APIs to ensure that the code runs across platforms.

modularitydoesnotdirectlyaffectJava'splatformindependence.Java'splatformindependenceismaintainedbytheJVM,butmodularityinfluencesapplicationstructureandmanagement,indirectlyimpactingplatformindependence.1)Deploymentanddistributionbecomemoreefficientwi

BytecodeinJavaistheintermediaterepresentationthatenablesplatformindependence.1)Javacodeiscompiledintobytecodestoredin.classfiles.2)TheJVMinterpretsorcompilesthisbytecodeintomachinecodeatruntime,allowingthesamebytecodetorunonanydevicewithaJVM,thusfulf


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SublimeText3 English version
Recommended: Win version, supports code prompts!

ZendStudio 13.5.1 Mac
Powerful PHP integrated development environment

Safe Exam Browser
Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

SublimeText3 Chinese version
Chinese version, very easy to use

EditPlus Chinese cracked version
Small size, syntax highlighting, does not support code prompt function
