Query() main method



    Query() Static method

    Return value: QueryList object

    Query() is the sole main method of QueryList and is called statically.

    Prototype:

    QueryList::Query($page, array $rules, $range = '', $outputEncoding = null, $inputEncoding = null, $removeHead = false)

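    Annotated prototype:

    QueryList::Query(target page, collection rules[, range selector][, output encoding][, input encoding][, remove head])

    // Collection rules
    $rules = array(
        'rule name'   => array('jQuery selector', 'attribute to collect'[, "tag filter list"][, "callback function"]),
        'rule name 2' => array('jQuery selector', 'attribute to collect'[, "tag filter list"][, "callback function"]),
        ..........
        [, "callback" => "global callback function"]
    );
    // Note: parameters enclosed in square brackets are optional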
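    The prototype above can be illustrated with a minimal sketch. It assumes QueryList V3 installed through Composer and exposed as QL\QueryList; the HTML snippet and the rule names (title, link) are made up for illustration. An inline HTML snippet is passed as $page so that no network request is needed, and the rows are read through getData().

```php
<?php
// Assumption: QueryList V3 is installed via Composer and autoloaded as QL\QueryList.
require 'vendor/autoload.php';

use QL\QueryList;

// An inline HTML snippet stands in for a page URL; the markup is illustrative only.
$html = <<<HTML
<div id="post">
    <h1 class="title">Hello QueryList</h1>
    <a class="more" href="/posts/1">Read more</a>
</div>
HTML;

$rules = array(
    // 'rule name' => array('jQuery selector', 'attribute to collect')
    'title' => array('.title', 'text'),
    'link'  => array('.more', 'href'),
);

// Query() is called statically and returns a QueryList object;
// getData() hands back the collected rows as an array.
$data = QueryList::Query($html, $rules)->getData();

print_r($data);
```

    Passing a URL string instead of $html would make Query() fetch the page before applying the same rules.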

    Parameter explanation:

    $page The target page to collect

    Type: string

    The URL of the web page to crawl (https is supported), or an HTML code snippet.

    $rules Collection rules

    Type: array

    • Rule name: any name you like, as long as it is not repeated within the rules.
    • jQuery selector: any CSS3 selector; the syntax is fully compatible with jQuery selectors.
    • Attribute to collect: one of the following three kinds of value:
      1. text: returns the plain text inside the selected tag
      2. html: returns the HTML fragment inside the selected tag
      3. [HTML tag attribute]: any HTML tag attribute name, such as src, href, name or data-src
    • Tag filter list

      Set this parameter to use QueryList's content filtering; separate multiple values with spaces.
      1. A tag name prefixed with a minus sign (-) removes that tag together with its content. In this form the tag can be any jQuery selector.
      2. A tag name without a minus sign depends on the [attribute to collect] value: with text it names the HTML tags to keep, and with html it names the HTML tags to filter out.

      Explanation: with a minus sign, the tag is removed along with everything inside it; without a minus sign, only the tag itself is removed and the content inside it is kept.

      Example: Content filtering

    • Callback function / global callback function
      Type: callback
      The callback can do any extra processing, such as replacing content, completing links, or downloading pictures.
      The callback receives two parameters: the first is the selected content, and the second is the key of the rule in the rules array (that is, the rule name). A rule's own callback function overrides the global callback function.
      Note: QueryList cannot be called inside these callbacks for nested, multi-level collection. Defer such operations to the callback function of the getData() method.
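    The rule elements described above (tag filter list, rule-level callback, global callback) can be combined as in the following sketch. It assumes QueryList V3 loaded as QL\QueryList, that callbacks may be given as closures, and that getData() passes each collected row to its callable; the markup, the rule names and the $base URL are hypothetical.

```php
<?php
// Assumption: QueryList V3 is installed via Composer and autoloaded as QL\QueryList.
require 'vendor/autoload.php';

use QL\QueryList;

// Made-up markup: two list items, each with a link and an advert span.
$html = <<<HTML
<ul>
    <li><a href="/a.html"> First <b>post</b> </a><span class="ad">Sponsored</span></li>
    <li><a href="/b.html"> Second <b>post</b> </a><span class="ad">Sponsored</span></li>
</ul>
HTML;

$base = 'http://example.com'; // hypothetical site root used to complete links

$rules = array(
    // Tag filter list '-.ad': remove elements matching .ad together with their content.
    'item'  => array('li', 'html', '-.ad'),
    // No filter and no rule-level callback, so the global callback below applies.
    'title' => array('li a', 'text'),
    // Empty filter list plus a rule-level callback; it overrides the global
    // callback for this rule only.
    'url'   => array('li a', 'href', '', function ($content, $key) use ($base) {
        return $base . $content;
    }),
    // Global callback: receives the selected content and the rule name.
    'callback' => function ($content, $key) {
        return trim($content);
    },
);

$data = QueryList::Query($html, $rules)->getData(function ($row) {
    // Assumed behaviour: getData() calls this once per collected row and keeps
    // the returned value. Work that needs a nested QueryList::Query() call
    // belongs here rather than in the rule callbacks.
    $row['title_length'] = strlen($row['title']);
    return $row;
});

print_r($data);
```

    Keeping per-field clean-up in the rule callbacks and per-row work in getData() follows the note above about nested collection.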

    $range Range selector (optional)

    Type: string
    Default value: ''

    The range selector (also called the area selector) first selects several large blocks, and the rules are then applied inside each block separately. Setting this parameter is recommended when collecting lists.

    See the range selector example: http://doc.querylist.cc/site/index/doc/29
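    As a rough sketch of how $range is used when collecting a list, the example below selects each .item block first and then applies the rules inside it. It assumes QueryList V3 loaded as QL\QueryList; the markup and the rule names are illustrative only.

```php
<?php
// Assumption: QueryList V3 is installed via Composer and autoloaded as QL\QueryList.
require 'vendor/autoload.php';

use QL\QueryList;

// Made-up list markup: every .item block is one entry.
$html = <<<HTML
<div class="list">
    <div class="item"><h2>First</h2><a href="/1">more</a></div>
    <div class="item"><h2>Second</h2><a href="/2">more</a></div>
</div>
HTML;

// The selectors in the rules are evaluated inside each block chosen by $range.
$rules = array(
    'title' => array('h2', 'text'),
    'link'  => array('a', 'href'),
);

// Third argument: the range selector. Each matching block yields one row.
$data = QueryList::Query($html, $rules, '.item')->getData();

print_r($data);
```

    Because every row comes from a single block, the title and link of an entry stay paired.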

    $outputEncoding Output encoding (optional)

    Type: string
    Default value: null

    The encoding to use for the output (UTF-8, GB2312, ...), set to prevent garbled characters. If null, the original string encoding is left unchanged.

    $inputEncoding Input encoding (optional)

    Type: string
    Default value: null

    Explicitly specifies the encoding of the input page (UTF-8, GB2312, ...) to prevent garbled characters. If null, the encoding is detected automatically.

    $removeHead Whether to remove the head (optional)

    Type: bool
    Default value: false

    Whether to remove the page's head area; the last-resort fix for garbled characters.
    Note: When this parameter is set to true, content in the head area of the page cannot be selected.
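    The three encoding-related parameters can be passed together, as in the sketch below. It assumes QueryList V3 loaded as QL\QueryList and the mbstring extension; the page is built in memory and converted to GB2312 purely for illustration, and the rule name msg is made up.

```php
<?php
// Assumptions: QueryList V3 autoloaded as QL\QueryList; mbstring is available.
require 'vendor/autoload.php';

use QL\QueryList;

// Build a GB2312-encoded page in memory so the example needs no network access.
$utf8Page = '<html><head><meta charset="gb2312"><title>demo</title></head>'
          . '<body><p class="msg">你好</p></body></html>';
$page = mb_convert_encoding($utf8Page, 'GB2312', 'UTF-8');

$rules = array(
    'msg' => array('.msg', 'text'),
);

$data = QueryList::Query(
    $page,
    $rules,
    '',        // $range: not used here
    'UTF-8',   // $outputEncoding: encoding wanted for the collected data
    'GB2312',  // $inputEncoding: stated explicitly instead of auto-detected
    true       // $removeHead: strip the head area; its content can no longer be selected
)->getData();

print_r($data);
```

    In practice $inputEncoding and $removeHead are usually left at their defaults and only set when the output still comes back garbled.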