Under the indexing approach adopted by Google, search results take into account the completeness and objectivity of the information and how well it matches the search query. If a site with illegal content gets into the index, or the resource exists to spread spam, the pages of that site will not be listed in the search engine's general database. Here we will look at how to remove a site from search results.
Options for blocking indexing in Google
The crawler, a program that collects information about new resources, scans the site page by page; if the site meets the requirements of Google's crawling policy, it is indexed. We will also describe how to hide your site, or individual parts of it, from search engines using robots.txt, a file that both guides crawlers and tells them where to stop.
To exclude the entire resource from the search results, a special text file, the aforementioned robots.txt, is created in the root folder of the server that hosts the site. Search engines read this file and act on the instructions it contains.
Keep in mind that Google may index a page even if users are not allowed to view it. A 401 or 403 response ("Access denied") in the browser concerns visitors only; it is not an instruction to this search engine's crawlers.
To remove a site from search indexing, add the following lines to robots.txt:
User-agent: Googlebot
Disallow: /
This tells the search robot that it is forbidden to index any of the site's content. That is how to remove a site from Google so that the search engine does not cache the resource in its list of discovered sites.
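To see what these two lines do, they can be fed to a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`, which is not Google's own parser and only approximates its behaviour. The helper name `is_allowed`, the second agent name `OtherBot`, and the page URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from the article: Googlebot may not crawl anything.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://yourserver.com/index.html"  # placeholder URL
    for agent in ("Googlebot", "OtherBot"):    # OtherBot is hypothetical
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Because the rules name only Googlebot, a crawler with a different name is not affected by them; `User-agent: *` would be needed to address every crawler.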
Crawling options for different protocols
If you want Google to apply different indexing rules to different kinds of links, for example separately for the http and https protocols, this is also written in robots.txt, as in the following example.
For the http protocol (http://yourserver.com/robots.txt, where yourserver.com stands for your site's domain name):
User-agent: *    # for any search engine
Allow: /         # allow full indexing
To remove the site from the search results completely for the https protocol
(https://yourserver.com/robots.txt):
User-agent: *
Disallow: /      # full prohibition on indexing
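The reason separate files work is that a crawler looks for robots.txt at the root of the same protocol and host as the page it wants to fetch. A minimal sketch of that lookup follows; the function name `robots_url_for` and the example page URLs are assumptions, not part of any Google tool.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Build the robots.txt address that governs page_url.

    The scheme (http or https) and the host are kept; the path, query
    string and fragment are replaced by /robots.txt at the root.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url_for("http://yourserver.com/catalog/item?id=7"))
    print(robots_url_for("https://yourserver.com/catalog/item?id=7"))
```

The same page path therefore falls under two different rule sets depending on whether it is requested over http or https.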
Urgent removal of a resource's URL from Google search
If you do not want to wait for re-indexing and the site needs to be hidden as soon as possible, I recommend using the service http://services.google.com/urlconsole/controller. Before you do, robots.txt must already be in the root directory of the site's server, with the instructions written in it.
If for some reason robots.txt cannot be edited in the root directory, it is enough to create it in the folder containing the objects you want to hide from search engines. Once you do this and submit a request to the automatic URL removal service, Google will not crawl the folders listed in robots.txt.
This removal lasts for a fixed period of 3 months. After that, the directory removed from the search results will be processed by Google's servers again.
How to exclude part of a site from crawling
When the search bot reads robots.txt, it decides what to do based on the file's contents. For example, suppose you need to exclude the entire directory named anatom from the search results. The following instructions are enough:
User-agent: Googlebot
Disallow: /anatom
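A quick way to check which paths such a rule covers is to test a few of them, assuming the rule is written without a space as `Disallow: /anatom`. The sketch again relies on Python's standard `urllib.robotparser` as a stand-in for a real crawler; the helper `googlebot_may_fetch` and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Rules for the directory example; the path has no space after the slash.
RULES = ["User-agent: Googlebot", "Disallow: /anatom"]


def googlebot_may_fetch(path: str) -> bool:
    """Return True if Googlebot may fetch path under RULES."""
    parser = RobotFileParser()
    parser.parse(RULES)
    return parser.can_fetch("Googlebot", path)


if __name__ == "__main__":
    for path in ("/anatom/page.html", "/anatomy.html", "/index.html"):
        print(path, googlebot_may_fetch(path))
```

Disallow rules are matched as path prefixes, so `/anatom` also covers a file such as `/anatomy.html`. Writing the rule as `/anatom/` limits it to the directory itself.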
Or suppose you want to keep all .gif images out of the index. To do this, add the following lines:
User-agent: Googlebot
Disallow: /*.gif$
Here is another example. To exclude dynamically generated pages from crawling, add the following entry to robots.txt:
User-agent: Googlebot
Disallow: /*?
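The last two examples rely on two special characters that Google's crawler recognises: `*` stands for any sequence of characters, and `$` marks the end of the URL. Python's standard `urllib.robotparser` does not interpret them, so the sketch below translates a pattern into a regular expression by hand. The function `rule_matches` is a hypothetical helper and a simplified approximation, not Google's actual matching code; the patterns are written without spaces, as `/*.gif$` and `/*?`.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Check whether a robots.txt path pattern covers the given URL path.

    '*' matches any run of characters; a trailing '$' requires the path
    to end there. Without '$' the pattern is treated as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


if __name__ == "__main__":
    print(rule_matches("/*.gif$", "/images/logo.gif"))      # image file
    print(rule_matches("/*.gif$", "/images/logo.gif?v=2"))  # does not end in .gif
    print(rule_matches("/*?", "/catalog?page=3"))           # dynamic page
    print(rule_matches("/*?", "/catalog"))                  # no query string
```

With these patterns, a .gif address followed by a query string is not covered by the first rule, but it is covered by the second one because it contains a question mark.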
Those, roughly, are the rules for search engines. That said, it is often much more convenient to use the META tag for all of this, and webmasters frequently rely on that standard to control how search engines behave. But we'll talk about that in upcoming articles.