Getting pages indexed by Baidu:
A friend's website had just gone online. After downloading a web log analysis tool, we analyzed the logs from the past few days, and the crawling looked like this:
1. Visited robots.txt
2. Visited the home page
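You can reproduce a log check like this with a short script instead of a dedicated tool. The sketch below is a minimal example under some assumptions. The server is assumed to write the common Apache/Nginx "combined" log format, and the spider is assumed to identify itself with `Baiduspider` in its user agent. The sample lines are made up for illustration. The script counts the spider's requests per path and per top-level section, so you can see whether crawling goes to content pages or to pages such as comments.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Apr/2024:08:00:01 +0800] "GET /robots.txt HTTP/1.1" 200 52 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
    '203.0.113.5 - - [10/Apr/2024:08:00:03 +0800] "GET / HTTP/1.1" 200 8120 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
    '203.0.113.5 - - [10/Apr/2024:08:00:09 +0800] "GET /comment/1 HTTP/1.1" 200 930 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
    '203.0.113.5 - - [10/Apr/2024:08:00:15 +0800] "GET /comment/2 HTTP/1.1" 200 944 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
    '198.51.100.7 - - [10/Apr/2024:08:05:00 +0800] "GET /article/1.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]


def spider_hits(lines, spider="Baiduspider"):
    """Count requests per path made by user agents containing `spider`."""
    hits = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and spider in m.group("agent"):
            hits[m.group("path")] += 1
    return hits


def by_section(hits):
    """Group path counts by first path segment, e.g. /comment/2 -> /comment."""
    sections = Counter()
    for path, n in hits.items():
        first = path.split("?", 1)[0].strip("/").split("/", 1)[0]
        sections["/" + first] += n
    return sections


if __name__ == "__main__":
    hits = spider_hits(SAMPLE_LOG)
    for section, n in by_section(hits).most_common():
        print(section, n)
```

On a real log, read the lines from the downloaded log file, for example with `open("access.log")`. The file name is hypothetical. If most of the spider's hits fall in sections such as `/comment`, you are looking at the problem described here.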
Several earlier checks showed the following:
Spider visits were increasing every day, and so was the number of pages crawled. However, the crawled pages were not the regular content pages. They were almost all useless pages, such as comment pages. Most crawls hit the home page and the rest went to useless pages, so it is hard to see how the real pages will ever get indexed. Only the home page had been indexed, and even the published articles had not. When we opened the site's pages, we saw that the important positions were filled with links to useless pages. A site's page layout is therefore also an important factor in indexing.
This site launched in March. It publishes new articles every day and keeps building external links, yet Baidu still has not indexed it.
So, to get articles indexed, organizing the content is not enough. You also have to make sure the crawler can actually crawl it.
First, a search engine uses a large number of crawlers to fetch website content. A crawler follows links as it downloads pages: it extracts the links from each page and then filters and deduplicates them. The downloaded pages are then indexed, and a series of algorithms produces the search results. Newcomers can refer to Baidu's official "Search Engine Basics" guide. Once you understand this process, the order in which Baidu indexes pages becomes clear.
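The fetch, extract, filter, and deduplicate loop described above can be sketched as a toy breadth-first crawler. This is a minimal illustration and does not model how Baidu actually schedules crawling. Several pieces are assumptions: the site is an in-memory dict, `DEMO_SITE` and `skip_comments` are hypothetical, and the crawler has a fixed page budget (`max_pages`) standing in for limited crawl effort.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for all links found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, href) for href in parser.links]


def crawl(start_url, fetch, max_pages, is_allowed=lambda url: True):
    """Breadth-first crawl with a page budget.

    fetch(url) returns HTML or None. Extracted links are filtered
    (same host, is_allowed) and deduplicated before being queued.
    Returns the fetched URLs in crawl order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        fetched.append(url)
        for link in extract_links(url, html):
            link = link.split("#", 1)[0]       # drop fragments
            if urlparse(link).netloc != host:  # filter: stay on this site
                continue
            if not is_allowed(link):           # filter: blocked pages
                continue
            if link in seen:                   # deduplicate
                continue
            seen.add(link)
            queue.append(link)
    return fetched


# Hypothetical toy site: the home page links to comment pages first.
DEMO_SITE = {
    "https://example.com/": '<a href="/comment/1">c1</a> <a href="/comment/2">c2</a> '
                            '<a href="/list/">list</a>',
    "https://example.com/comment/1": '<a href="/comment/2">c2</a>',
    "https://example.com/comment/2": '<a href="/comment/1">c1</a>',
    "https://example.com/list/": '<a href="/article/1.html">article</a>',
    "https://example.com/article/1.html": "<p>content</p>",
}


def skip_comments(url):
    """Example filter: treat /comment/ pages as useless."""
    return not urlparse(url).path.startswith("/comment/")


if __name__ == "__main__":
    print(crawl("https://example.com/", DEMO_SITE.get, max_pages=3))
    print(crawl("https://example.com/", DEMO_SITE.get, max_pages=3,
                is_allowed=skip_comments))
```

With a budget of three pages, the first call spends its whole budget on the home page and the two comment pages and never reaches `/article/1.html`. The second call filters out comment links and does reach the article. The fixed budget is a deliberate simplification to show why links to useless pages in prominent positions can crowd out content pages.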
So for sites that are not indexed, the most important reason is that spiders do not crawl deeply enough and nothing effective has been done about it. The fixes are to modify the site program to block useless pages from being crawled, or to switch to a template that is easier to crawl. Many new SEO practitioners run into this problem, and downloading the website logs is the fastest way to diagnose it.
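Changing the site program is one way to block useless pages. Another common way to keep crawlers off them is a `Disallow` rule in robots.txt, which the spider requests first. The sketch below uses Python's standard `urllib.robotparser` to check which URLs a given rule set blocks, without fetching anything over the network. It is a minimal example with assumptions: the rules and the `/comment/` path are hypothetical, and you would replace them with the site's real low-value URL prefixes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes comment pages live under /comment/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /comment/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of downloading it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def report(parser: RobotFileParser, urls, agent="Baiduspider"):
    """Return {url: allowed?} for the given crawler user agent."""
    return {url: parser.can_fetch(agent, url) for url in urls}


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    urls = [
        "https://example.com/",
        "https://example.com/comment/42",
        "https://example.com/article/1.html",
    ]
    for url, allowed in report(rules, urls).items():
        print(("allowed " if allowed else "blocked ") + url)
```

Note that `urllib.robotparser` matches `Disallow` paths as plain prefixes, so it does not evaluate wildcard patterns. Also, robots.txt controls crawling rather than indexing, so treat it as a way to redirect crawl effort, not as a guarantee about what gets indexed.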
A search engine crawler visits a site in at least the following order:
3. Column pages / list pages / site map
5. Content pages
After fetching the pages, the search engine uses its algorithms to decide whether to index them. Of course, this is only the spider's crawl order. Steps 2, 3, and 4 may happen in a different order, but a page generally goes through this process at least once before it is indexed. I won't explain here why the spider visits pages in this order; I cover it in the Phoenix VIP training in Shanghai. Page value is also a key factor in whether a page gets indexed, but it lies outside the scope of crawl order.
Many people new to SEO say their newly built sites have gone a long time without being indexed. In fact, a site has to meet certain conditions before it is indexed. First, let's look at a friend's website: