What a parser is: idea and movement

The Internet has made information available, but choosing the right information from it still takes serious effort and considerable time. Hypertext languages have formalized the presentation of information, but the problem of parsing (recognition) has not become simpler, and in some areas it has even become more complicated. A parser has to "know and be able to handle" the whole set of presentation formats, languages, styles, access options and data markup methods in order to decide that "this is exactly what is needed".

A person sees and hears above all through the prism of their own knowledge and experience. Having formalized that knowledge as an algorithm, they get a static mechanism and come to see that the ideal solution is still a long way off.

Parsing Tool Palette

A parser's task can be defined as follows: find the necessary information in search engine results, site content, documents, spreadsheets and files of other formats. More formally: define and form a stream of information, then apply a set of keywords to it according to certain rules for a specific purpose.
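The formal definition can be illustrated with a minimal sketch. It assumes the "stream of information" is simply an iterable of text fragments and that the "rule" is a case-insensitive substring match; the name parse_stream and the require_all flag are hypothetical and exist only for this illustration.

```python
from typing import Iterable, Iterator


def parse_stream(
    documents: Iterable[str],
    keywords: Iterable[str],
    require_all: bool = False,
) -> Iterator[str]:
    """Yield the documents from the stream that satisfy the keyword rule.

    Rule (an assumption of this sketch): case-insensitive substring match,
    where either any keyword or all keywords must be present.
    """
    wanted = [k.lower() for k in keywords]
    match = all if require_all else any
    for text in documents:
        lowered = text.lower()
        if match(k in lowered for k in wanted):
            yield text


# A tiny, made-up information stream.
stream = [
    "Restaurant opening hours and menu",
    "Tour operator: summer offers",
    "Weather forecast for the weekend",
]

# Keep every fragment that mentions at least one keyword.
hits = list(parse_stream(stream, ["menu", "tour"]))
```

Even this toy version shows the three moving parts of the definition: the stream, the keyword set and the rule that connects them. Changing any one of them changes what counts as "the necessary information".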

Algorithms are traditionally divided into syntactic and semantic ones, each covering a certain number of languages. The parsing tool can be a program, a site or a plugin. There are many implementation options, each with its own advantages and disadvantages. In particular, the content parser X-Parser works from a list of keywords. As a result it produces clean text, lists of snippets, links, URLs, ... It also offers a developed system of filters, language settings and formatting of the result.

The DataCol program is focused on collecting information to fill a site with content. For example, to create a site on a specific subject (restaurants, shops, a tour operator, ...) you always need general information, and to save time it is quicker to find it on the Internet than to scan it or type it in manually.

Mailagent Parser is focused on collecting email addresses; SlimerJs makes it possible to quickly analyze complex dynamic sites. The WordPress site management system offers its own parsing module, which can be set up, for example, to maintain a constantly updated news feed.

There are many tools, but the amount of work on forming, parsing and formatting information flows is steadily growing.

Using the available tools is more like a process of working out which mechanism a particular parsing task needs than an attempt to attach something that already exists to your own resource.

The main areas of parsing

The mass customer usually says that a parser is a filter, and confidently insists on it. Indeed, to fulfill a visitor's request, a search site analyzes a variety of information sources, although most often it digs into its own databases, which it nevertheless replenishes systematically. Any decent site also offers a search across its own content, its information and related sites. This too belongs to the topic of "what is a parser", but the true substance of the problem lies on a different plane.

Credit must be given to hypertext languages: their numerous but strict tags and data markup conventions make it possible to formalize strictly what the browser should recognize, and that is already parsing. Many information search tools use browser engines. Regular expressions are also an effective way to find the right information. A jQuery implementation is a special form of parsing: the script lies inside the document, forms part of it or controls it.
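Both approaches named here, tag-based recognition and regular expressions, can be sketched with the Python standard library alone. The example assumes a small hand-written HTML fragment; the class name LinkCollector and the deliberately simplified EMAIL_RE pattern are illustrative choices, not part of any tool mentioned in this article.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag as the markup is parsed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Simplified pattern for illustration; real address syntax is more permissive.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

page = (
    '<p>Write to <a href="mailto:info@example.com">info@example.com</a> '
    'or see <a href="https://example.com/news">news</a>.</p>'
)

# Syntactic parsing driven by the strict tag structure.
collector = LinkCollector()
collector.feed(page)

# Pattern-based search over the same text, with duplicates removed.
emails = sorted(set(EMAIL_RE.findall(page)))
```

The tag-based pass relies on the formal structure of the markup, while the regular expression ignores structure and looks only at the shape of the text. In practice the two are often combined.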

What is a parser? It is PHP, and the browser, and the JavaScript built into it. These tools perform a largely syntactic function. But what is real and essential is this: a parser is defined by its meaning, that is, by its scope and purpose.

For a travel agency, for example, the task could be to develop a parser of recreation areas that provides up-to-date information on living conditions, weather, food prices and museum opening hours. When developing a news site, you would write something that analyzes a specific set of sites and collects fresh information from them.

The structure and content of the process

Before giving a meaningful answer to the question "Parser: what is it?", you need to form a stream of information and define a set of keywords. The algorithm for analyzing search results, despite its seeming formality, receives varied input in which the search words and their sequences may fall outside the desired semantics.

Even prestigious search engines, when handling a user request, often return something quite different from what the meaning requires, and on top of that they pad whatever they return with a significant amount of advertising and spam.

It is far too early to claim that a parser is equivalent to artificial intelligence, even though we do have to build algorithms that adapt to changing information flows and to shifting rules for forming and using keywords.

The lion's share of the "parsing" that a person performs automatically and unconsciously every second is very simple; the logic of this process can be formalized fairly easily, and the existing tools partly demonstrate this.

From statics to dynamics

One can also say that a parser is a combination of an algorithm for forming the flow of information, rules for defining keywords and rules for applying them. But these three foundations are as unsteady as sand, and in a particular application they can be interpreted in different ways.

A plain Google search, as one variant of parsing, for the word "key" (in Russian, "ключ" means both a key and a natural spring) has practically zero chance of finding even one article about a spring that murmurs peacefully somewhere in a wonderful place. The probability does not increase even if you specify "key in the meadow". Google will faithfully return:

  • The key to start!
  • Resting places in nature - Official site of the administration ...
  • Goryachy Klyuch, official website "Hot Key", forum "Hot Key" ... On the Meadow Sights Taganay - Taganai National Park
  • Guest house on Krasnaya Polyana, rent a house (cottage) on the New ...
  • "Heavenly Key" - Result from Google Books

...

Naturally, a parsing algorithm should improve on these results and return information about the key as a spring: what kinds there are, where they are found, why they are interesting and useful. Obviously, even the most advanced parsing of Google's results will not achieve anything here.

Active knowledge

To solve the problem properly, you must parse not the search engine results but the content of many sites and of an unspecified number of articles. How do you get a meaningful information flow from the word "key"?

There can be only one option: the keywords must be made active, that is, the search for a specific word should be expanded by its meaning. The search rule must be active as well: what is initially defined as a thing in itself turns into a preliminary clarification of the meaning, and then movement begins, both in forming an appropriate source of information (the stream being analyzed) and in what is parsed from it.
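One very rough way to picture an "active" keyword is to attach context words to each of its senses and rank texts by how many of those words accompany the keyword. The sketch below is only an assumption about what such a rule might look like: the SENSES table, its context words and the functions score and rank are all invented for this illustration, and a real system would need the table to change during use rather than stay fixed.

```python
import re

# Hypothetical sense table: each sense of an ambiguous keyword is described
# by context words that tend to appear next to it.
SENSES = {
    "key": {
        "spring": {"water", "stream", "source", "meadow", "murmurs"},
        "lock": {"door", "lock", "keyring"},
        "keyboard": {"press", "keyboard", "button"},
    }
}


def tokens(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def score(text, keyword, sense):
    """Count the context words of the chosen sense that occur in the text.

    Texts that do not contain the keyword at all score zero.
    """
    words = tokens(text)
    if keyword not in words:
        return 0
    return len(words & SENSES[keyword][sense])


def rank(documents, keyword, sense):
    """Return the documents that match the sense, best match first."""
    scored = [(score(doc, keyword, sense), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for points, doc in scored if points > 0]


docs = [
    "Press any key to start",
    "A key murmurs in the meadow, a source of cold water",
    "The key to the door is on the keyring",
    "The stream begins at a key near the village",
]

springs = rank(docs, "key", "spring")
```

Here the keyword alone matches all four texts, while the keyword together with its clarified meaning keeps only the two that speak about a spring. The static table is exactly the weak point the article describes: the clarification of meaning has to move with the stream being analyzed.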

Active knowledge belongs to the field Man > Intellect > Programming; some kind of "chipiotics" is obtained. It is not just a rule and not just a keyword. A person has acquired intellect and formalized it through programming, not statically but dynamically, giving parsing a new meaning: variability at the input and mobility in the process.

This concept assumes an element of self-development, which is difficult. But if popular search engines have "learned" to analyze search queries and to send suitable advertising to each browser, it is quite possible to channel that success in a more useful direction.

The perfect solution: your own knowledge and experience > the prism of the correct rules

Parsing has become a serious, tangible task and has built up specific experience in forming information flows and in rules for using keywords. Character recognition, recognition of scanned images and almost "perfect" translation from one language to another, against the background of developing interaction interfaces (APIs of sites, search engines, parsers), make it possible to determine the right direction of movement.

How all this will be implemented is still hard to say, but it is absolutely certain that the rules for forming information flows, the structure of keywords and the development of the tool must be active, and that this component, because of the generally static and formal nature of modern programming languages, must be determined in the process of use.

This is a case where the natural human factor, in the course of solving urgent problems, can and will contribute to the training and development of the field of parsing and to forming a prism of certain rules.