
Where is web page text information generally stored?

WBOY (original post), 2016-06-24 12:10:26, 1873 views

My graduation project topic is statistics-based extraction of text information from web pages, so I need to know which elements web pages generally put their text in.
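One way to answer this empirically rather than by convention is to count how many characters of text sit inside each kind of tag, then aggregate the counts over a sample of pages. The following is a minimal sketch using only Python's standard-library html.parser; the names TagTextCounter and text_by_tag are hypothetical, and crediting text to the innermost open tag is an assumption (text in an <a> inside a <p> counts toward a, not p).

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
SKIP = {"script", "style"}  # their content is code, not readable text


class TagTextCounter(HTMLParser):
    """Count text characters by the innermost enclosing tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags, innermost last
        self.counts = Counter()  # tag name -> characters of stripped text

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching start tag (tolerates unclosed tags).
        if tag in self.stack:
            while self.stack:
                if self.stack.pop() == tag:
                    break

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack or self.stack[-1] in SKIP:
            return
        self.counts[self.stack[-1]] += len(text)


def text_by_tag(html):
    """Return a Counter mapping tag name to the amount of text it directly holds."""
    parser = TagTextCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

For example, on `<div><p>Hello world</p><table><tr><td>abc</td></tr></table></div>`, `text_by_tag(...).most_common()` gives `[('p', 11), ('td', 3)]`. Summing such Counters over many pages shows where text tends to live in that sample.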


Replies to the discussion

Haha, it's hard to say. It's inside the <body> anyway, haha.

Quoting the reply from xming4321 on the 1st floor:
Haha
It's hard to say, it's inside the body anyway
Haha

I read a paper saying that text is usually placed in a table.

<table> is the table element. In the past, web pages commonly used tables for layout and for placing text. Now many websites use DIV+CSS layouts, so the text may be placed in <div> elements instead of tables.

Or it can be stored in a database, which makes it easy to update and maintain.

I feel the question is a bit vague... There are two possibilities: 1. The displayed text, which of course means the content between <body> and </body>. 2. The full text of the page, that is, all the content that makes up the page, between <html> and </html> (the code before <html> is probably the same on most pages, right? Not sure). The latter seems to be what web crawlers search. Judging from your title (statistics-based web page text information extraction), I'd guess you extract the page content and then search it for the specified content to compute statistics... So it should be the second case... Haha
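The two readings in this reply can be made concrete. Case 2 is simply the raw HTML string. Below is a rough sketch of case 1, assuming "displayed text" means the text nodes inside <body> minus <script> and <style> content; it ignores CSS that hides elements, and the names BodyTextExtractor and visible_text are hypothetical.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect text between <body> and </body>, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Text in <head> (e.g. <title>) is dropped because in_body is False there.
        if self.in_body and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Join the displayed text runs with single spaces."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

On `<html><head><title>T</title></head><body><h1>Hi</h1><script>x()</script><p>there</p></body></html>`, `visible_text` returns `"Hi there"`: the title and the script are excluded.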

This requires "analyzing each site on its own terms". Some websites keep their main content in tables, while others may put it in <div>, or even in <dl>, <ol>, or <ul> lists.
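Because the main content can sit in a table, a div, or a list, a statistics-based extractor might score containers instead of assuming one tag. The following is a rough sketch of one such heuristic, not the method of any particular paper: each run of text is credited to its nearest container (div, table, ul, ol, dl, and a few others), text inside links is discounted on the assumption that navigation menus are mostly links, and the container with the most remaining characters wins. The container set, the link penalty, and all names are assumptions.

```python
from html.parser import HTMLParser

# Hypothetical set of containers that may hold a page's main content.
CONTAINERS = {"body", "div", "table", "ul", "ol", "dl", "article", "section"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class BlockScorer(HTMLParser):
    """Credit each text run to its nearest container ancestor."""

    def __init__(self):
        super().__init__()
        self.stack = []        # open tags, innermost last
        self.blocks = []       # [tag, text_chars, link_chars] per container
        self.open_blocks = []  # indices into self.blocks for open containers

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self.stack.append(tag)
        if tag in CONTAINERS:
            self.blocks.append([tag, 0, 0])
            self.open_blocks.append(len(self.blocks) - 1)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return
        while self.stack:
            popped = self.stack.pop()
            if popped in CONTAINERS:
                self.open_blocks.pop()
            if popped == tag:
                break

    def handle_data(self, data):
        n = len(data.strip())
        if not n or not self.open_blocks or self.stack[-1] in ("script", "style"):
            return
        block = self.blocks[self.open_blocks[-1]]
        block[1] += n
        if "a" in self.stack:  # text inside a link counts against the block
            block[2] += n


def main_block(html):
    """Return (tag, non_link_chars) of the best container, or None if there is none."""
    scorer = BlockScorer()
    scorer.feed(html)
    scorer.close()
    if not scorer.blocks:
        return None
    best = max(scorer.blocks, key=lambda b: b[1] - b[2])
    return best[0], best[1] - best[2]
```

On a page whose navigation `<div>` holds only links and whose `<ul>` holds the article text, `main_block` picks the `<ul>`, which matches the reply's point that the answer depends on the site.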

It's placed in the HTML, haha.

in is placed in




This is all nonsense

Just put it wherever you like

Quoting the reply from xming4321 on the 1st floor:
Haha
It's hard to say, it's inside the body anyway
Haha

I saw a paper saying that it is usually placed in a table.

Generally, text information is in paragraphs (<p>), because <p> is the standards-compliant terminal block element: it may contain only inline content, not other block elements.
Modern web pages use DIV+CSS for layout,
so the data placed in <table> is information that genuinely has a row-and-column tabular relationship.
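If this reply's distinction holds for the pages being studied, an extractor can treat the two sources differently: <p> text as prose and <table> cells as row/column data. Here is a minimal sketch under stated assumptions: every <p> has an explicit </p>, tables are not nested, and the names ParagraphTableSplitter and split_prose_and_tables are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphTableSplitter(HTMLParser):
    """Collect <p> text as prose and <table> cells as row/column data."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # one string per <p>
        self.tables = []      # each table: list of rows; each row: list of cell strings
        self.in_p = False
        self.cell = None      # text pieces of the open <td>/<th>, or None
        self.row = None       # the row the open cell belongs to

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")
        elif tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self.cell = []
            self.row = self.tables[-1][-1]

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False
        elif tag in ("td", "th") and self.cell is not None:
            self.row.append(" ".join(self.cell))
            self.cell = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.cell is not None:   # cell text takes priority over <p> text
            self.cell.append(text)
        elif self.in_p:
            self.paragraphs[-1] = (self.paragraphs[-1] + " " + text).strip()


def split_prose_and_tables(html):
    parser = ParagraphTableSplitter()
    parser.feed(html)
    parser.close()
    return parser.paragraphs, parser.tables
```

For `<p>Hello <b>world</b></p>` followed by a two-row table, this yields `["Hello world"]` and `[[["Name", "Qty"], ["apple", "3"]]]`, keeping the tabular relationship that a flat text dump would lose.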

Quoting the original post:
The topic of the graduation project is to extract text information from web pages based on statistics. Therefore, we need to know what components general web pages put text information in.
Have you finished the text extraction program? Could you send me a copy for reference? Thank you very much!!
