
javascript - jQuery :first-child selector returns nothing when crawling a web page

When crawling a website, the h2 and h3 elements appear to have the same structure. Why does h2:first-child return data while h3:first-child does not?

The results h2_1 and h2_2 are identical, as expected.
h3_1 is fine, but h3_2 is empty. Why is this?

The code is as follows:

const jsdom = require('jsdom');
const jquery = require('jquery');

const url = 'https://www.osram.com/os/news-and-events/spotlights/index.jsp';

// jsdom.env is the legacy callback API (jsdom versions before 10).
jsdom.env(url, [], {
    defaultEncoding: 'utf-8'
}, function(err, window) {
    if (err) {
        // Pass the URL so the %s placeholder is actually filled in.
        console.error('failed to get news url from page [%s]', url);
        return;
    }

    let $ = jquery(window);

    // First teaser container on the page.
    let el = $('p.col-xs-6.col-sm-7.colalign:first');

    // With and without :first-child, the h2 lookup returns the same text.
    let h2_1 = $(el).find('h2.font-headline-teaser').text();
    console.log('h2_1=' + h2_1);
    let h2_2 = $(el).find('h2.font-headline-teaser:first-child').text();
    console.log('h2_2=' + h2_2);

    // Without :first-child the h3 is found; with it, the result is empty.
    let h3_1 = $(el).find('h3.font-sub-headline').text();
    console.log('h3_1=' + h3_1);
    let h3_2 = $(el).find('h3.font-sub-headline:first-child').text();
    console.log('h3_2=' + h3_2);

    window.close();
});
巴扎黑  2834 days ago  556

Replies (1)

  • 为情所困  2017-05-16 13:30:41

    The selector xxx:first-child selects an element only when two conditions are met at the same time: the element matches xxx, and it is the first child element of its parent.

    It does not mean the first child element of xxx, nor the first xxx among the child elements of xxx's parent.

    The first child element of the parent of h2.font-headline-teaser is h2.font-headline-teaser itself, so it is selected.

    The first child element of the parent of h3.font-sub-headline is not h3.font-sub-headline, so the result is empty.
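
The rule described above can be sketched without any DOM library. The snippet below is only a model, not jQuery's implementation: it assumes a hypothetical container whose first child is the h2 and whose second child is the h3 (the real page markup is not shown in the question), and the helper names `matchesSimple`, `selectFirstChild` and `selectFirstMatch` are made up for illustration.

```javascript
// Hypothetical markup, modelled as plain objects: a container whose
// first child is the h2 and whose second child is the h3.
const container = {
  tag: 'div',
  children: [
    { tag: 'h2', classes: ['font-headline-teaser'], text: 'Headline' },
    { tag: 'h3', classes: ['font-sub-headline'], text: 'Sub-headline' },
  ],
};

// True when the node has the given tag and class (the "xxx" part).
function matchesSimple(node, tag, cls) {
  return node.tag === tag && node.classes.includes(cls);
}

// Model of "tag.cls:first-child": the node must match tag.cls AND
// be the first child of its parent. Both conditions are required.
function selectFirstChild(parent, tag, cls) {
  return parent.children.filter(
    (node, index) => index === 0 && matchesSimple(node, tag, cls)
  );
}

// Model of "the first tag.cls among the children", which is what the
// question expected to get.
function selectFirstMatch(parent, tag, cls) {
  const hit = parent.children.find((node) => matchesSimple(node, tag, cls));
  return hit ? [hit] : [];
}

// Concatenate the text of all selected nodes, like jQuery's .text().
const textOf = (nodes) => nodes.map((n) => n.text).join('');

const h2FirstChild = textOf(selectFirstChild(container, 'h2', 'font-headline-teaser'));
const h3FirstChild = textOf(selectFirstChild(container, 'h3', 'font-sub-headline'));
const h3FirstMatch = textOf(selectFirstMatch(container, 'h3', 'font-sub-headline'));

console.log('h2:first-child =', JSON.stringify(h2FirstChild));
console.log('h3:first-child =', JSON.stringify(h3FirstChild));
console.log('first h3       =', JSON.stringify(h3FirstMatch));
```

If the goal in the original code is simply the first matching h3, jQuery's `.first()` method, as in `$(el).find('h3.font-sub-headline').first().text()`, or the `:first-of-type` selector are probably closer to the intent than `:first-child`.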
