Control crawl behavior on domain

The downside is that it is somewhat limited in terms of customizability.

Where to put the robots.txt file

Place the robots.txt file in the root directory of the subdomain to which it applies. For example, to control crawl behavior on domain.com, you must have access to the robots.txt file at domain.com/robots.txt. If you want to control crawling on a subdomain, such as blog.domain.com, you must have access to the robots.txt file at blog.domain.com/robots.txt.

Best practices for robots.txt files

Keep these points in mind to avoid common mistakes.

Use a new line for each directive

Each directive must be placed on its own line; otherwise, search engines will misread the file.

Bad example:

User-agent: * Disallow: /directory/ Disallow: /another-directory/

Good example:

User-agent: *
Disallow: /directory/
Disallow: /another-directory/
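To sanity-check placement, Python's standard-library robots.txt parser can be pointed at the root of a (sub)domain; here is a minimal sketch, using placeholder domain names from the examples above. Note that this parser implements the original robots.txt standard and does not understand the wildcard and "$" syntax discussed below.

from urllib.robotparser import RobotFileParser

# Crawlers (and this parser) only look for the file at the root of the
# subdomain; the domain names here are placeholders.
parser = RobotFileParser()
parser.set_url("https://blog.domain.com/robots.txt")
parser.read()  # fetches and parses the file over HTTP

# Ask whether a given user agent may fetch a URL on that subdomain.
print(parser.can_fetch("*", "https://blog.domain.com/directory/page.html"))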
Use wildcards to simplify instructions

The wildcard (*) not only allows directives to apply to all user agents, it also lets you match URL patterns when declaring directives. For example, if you want to prevent search engines from accessing parameterized product category URLs on your site, you could list them all out:

User-agent: *
Disallow: /products/t-shirts?
Disallow: /products/hoodies?
Disallow: /products/jackets?
…

But that is not very efficient. It is simpler to use a wildcard:

User-agent: *
Disallow: /products/*?

This example blocks search engines from crawling all URLs under the /products/ subfolder that contain a question mark. In other words, it blocks the parameterized product category URLs, as the sketch below illustrates.
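To see how such a rule matches, here is an illustrative Python sketch that hand-translates Disallow: /products/*? into a regular expression; the sample paths are hypothetical, and real crawlers apply additional URL normalization.

import re

# "*" in a robots.txt rule matches any run of characters, and the rule
# is anchored to the start of the URL path; a rough regex equivalent:
rule = re.compile(r"^/products/.*\?")

# Hypothetical paths for illustration:
for path in ["/products/t-shirts?color=red",   # parameterized      -> blocked
             "/products/hoodies",              # no "?"             -> allowed
             "/category/jackets?sort=price"]:  # outside /products/ -> allowed
    verdict = "blocked" if rule.search(path) else "allowed"
    print(f"{path}: {verdict}")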
Use "$" to specify the end of the URL

Include a "$" symbol to indicate the end of the URL. For example, if you want to prevent search engines from accessing all .pdf files on your site, your robots.txt file might look like this:

User-agent: *
Disallow: /*.pdf$

In this example, search engines cannot access any URL that ends in .pdf. In other words, /file.pdf is not accessible, but /file.pdf?id=68937586 is accessible because it does not end with ".pdf".
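The same hand-translation shows what the "$" anchor does; an illustrative sketch using the example URLs from the text above.

import re

# "$" in a robots.txt rule anchors the match to the end of the URL,
# so a trailing query string defeats the rule; a rough regex equivalent:
rule = re.compile(r"^/.*\.pdf$")

for path in ["/file.pdf",               # ends in .pdf            -> blocked
             "/file.pdf?id=68937586"]:  # query string after .pdf -> allowed
    verdict = "blocked" if rule.search(path) else "allowed"
    print(f"{path}: {verdict}")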

