jolomic posted on 2024-2-20 14:00:43

Control crawl behavior on domain

The downside is that it is somewhat limited in terms of customizability.

Where to put the robots.txt file

Place the robots.txt file in the root directory of the subdomain to which it applies. For example, to control crawl behavior on domain.com, you must have access to the robots.txt file at domain.com/robots.txt. If you want to control crawling on a subdomain, such as blog.domain.com, you must have access to the robots.txt file at blog.domain.com/robots.txt.

Best practices for robots.txt files

Keep these points in mind to avoid common mistakes.

Use a new line for each directive

Each directive must be placed on its own line. Otherwise, search engines may misparse your file.

Bad example:

User-agent: * Disallow: /directory/ Disallow: /another-directory/

Good example:

User-agent: *
Disallow: /directory/
Disallow: /another-directory/
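As a quick way to sanity-check rules like these, Python's standard-library urllib.robotparser can parse a robots.txt and answer whether a given URL is crawlable. This is a minimal sketch; the domain.com URLs are placeholders, and note that, as far as I know, urllib.robotparser implements the original prefix-matching rules, so it will not reliably handle the * and $ wildcards discussed further below.

from urllib.robotparser import RobotFileParser

# The "good example" rules above, one directive per line
robots_txt = """\
User-agent: *
Disallow: /directory/
Disallow: /another-directory/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Blocked: the path falls under /directory/
print(rp.can_fetch("*", "https://domain.com/directory/page"))  # False
# Crawlable: no rule matches this path
print(rp.can_fetch("*", "https://domain.com/allowed-page"))    # True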


Use wildcards to simplify instructions

The wildcard (*) not only allows a directive to apply to all user agents, but also lets you match URL patterns when declaring directives. For example, if you want to prevent search engines from accessing parameterized product category URLs on your site, you could list them like this:

User-agent: *
Disallow: /products/t-shirts?
Disallow: /products/hoodies?
Disallow: /products/jackets?
…

But that is not very efficient. I recommend using wildcards to simplify things:

User-agent: *
Disallow: /products/*?

This example blocks search engines from crawling all URLs under the /products/ subfolder that contain a question mark. In other words, it blocks parameterized product category URLs.
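To see concretely how the * wildcard matches, here is a rough sketch of the matching logic in Python, translating a robots.txt path rule into a regular expression. This is an illustration only, not a full robots.txt parser, and rule_matches is a hypothetical helper name.

import re

def rule_matches(rule: str, path: str) -> bool:
    # Sketch of robots.txt pattern matching: "*" matches any
    # character sequence and a trailing "$" anchors the URL end.
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, path) is not None

# The single wildcard rule covers every parameterized product URL:
print(rule_matches("/products/*?", "/products/t-shirts?color=red"))  # True: blocked
print(rule_matches("/products/*?", "/products/hoodies?size=m"))      # True: blocked
print(rule_matches("/products/*?", "/products/t-shirts"))            # False: crawlable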




Use "$" to specify the end of the URL

Include a "$" symbol to indicate the end of a URL. For example, if you want to prevent search engines from accessing all .pdf files on your site, your robots.txt file might look like this:

User-agent: *
Disallow: /*.pdf$

In this example, search engines cannot access URLs that end in .pdf. In other words, /file.pdf is blocked, but /file.pdf?id=68937586 is still accessible because it does not end with ".pdf".
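Reusing the rule_matches sketch from the wildcard section above, the effect of the "$" anchor can be checked the same way:

print(rule_matches("/*.pdf$", "/file.pdf"))              # True: blocked, ends in .pdf
print(rule_matches("/*.pdf$", "/file.pdf?id=68937586"))  # False: crawlable, does not end in .pdf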


