Data science

Data is becoming the fundamental substrate of science, claiming the legitimate position once held only by facts. In the process of cognition, data now carries more weight than sensory evidence. Data, not facts, is the common currency of science. Many scientists cannot resist the temptation of correlation and run unverified data through the powerful mills of statistical software.

Big Data seems to fulfill a modern promise: a form of governance built on an incorruptible, anti-authoritarian foundation. In a universe quantified in bits and bytes, data is surrounded by an aura of social neutrality. Science seems to have reached its goal. Yet the "raw data" of a dataset is far less raw and untouched than the name suggests.

Data is largely the product of collection and production methods chosen for specific purposes. Social networking platforms that record the online behavior of their users exert a decisive influence on the nature and composition of the data they collect. By allowing only a limited repertoire of behaviors, promoting the ones the company prefers, placing them prominently, or making them easier to perform, platforms ensure that some data is generated very frequently while other data is excluded from the outset.

In this sense, the algorithms in use are selective, and the data they collect is manufactured: the data itself deserves scrutiny. This is one more reason why scientists are becoming proponents of "free data," demanding open access to all data in the expanding data universe. Above all, they want to prevent user data from ending up in the hands of a few companies whose analyses cannot be publicly verified. But scientists do not merely demand the release of some data; they demand the release of all of it, asking in return for public trust in their responsible handling of that data.

There is good reason for skepticism. First, anonymity can hardly be guaranteed; this is in the nature of identification technologies. Already, 87 percent of the population can be identified from so-called quasi-identifiers such as gender, date of birth, and ZIP code. The more records there are, the more confidently one can infer who is doing what.
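As a minimal sketch of this re-identification risk, the snippet below counts how many records in a toy dataset are unique on exactly those three quasi-identifiers; the column names and values are hypothetical, not taken from any real dataset. A unique combination means a single outside lookup (a voter roll, say) suffices to re-identify the person.

```python
# Minimal sketch: re-identification risk from quasi-identifiers.
# All column names and values below are hypothetical.
import pandas as pd

def reidentification_rate(df: pd.DataFrame, quasi_ids: list[str]) -> float:
    """Fraction of records whose quasi-identifier combination is unique,
    i.e. records a single outside lookup could re-identify."""
    class_sizes = df.value_counts(subset=quasi_ids)    # size of each equivalence class
    unique = int((class_sizes == 1).sum())             # classes containing one record
    return unique / len(df)

df = pd.DataFrame({
    "gender":     ["f", "m", "f", "m", "f"],
    "birth_date": ["1980-01-02", "1975-05-05", "1980-01-02",
                   "1990-09-09", "1962-03-30"],
    "zip_code":   ["12345", "12345", "12345", "54321", "99999"],
})
print(reidentification_rate(df, ["gender", "birth_date", "zip_code"]))
# -> 0.6: three of the five records are unique on these three attributes alone
```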

Second, even the slightest attempt to anonymize a dataset distorts its analysis: mean values shift, variances change, and with them the correlation coefficients. And one thing is certain: data, once created, remains in the world forever. Consequently, there is ever more of it.
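This distortion is easy to demonstrate. The sketch below is a hypothetical example, using additive Laplace noise in the style of differential-privacy releases rather than any particular anonymization product; it shows how noising two correlated attributes inflates their variances and attenuates the correlation between them.

```python
# Sketch: noise-based anonymization measurably distorts summary statistics.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(50, 10, n)             # an age-like attribute (var = 100)
y = 0.8 * x + rng.normal(0, 6, n)     # a correlated attribute (corr ~ 0.80)

scale = 5.0                           # noise scale; a hypothetical choice
x_anon = x + rng.laplace(0, scale, n) # Laplace noise adds variance 2*scale^2
y_anon = y + rng.laplace(0, scale, n)

print("var(x):    %.1f -> %.1f" % (x.var(), x_anon.var()))   # ~100 -> ~150
print("corr(x,y): %.2f -> %.2f" %
      (np.corrcoef(x, y)[0, 1], np.corrcoef(x_anon, y_anon)[0, 1]))
# correlation attenuates from ~0.80 toward ~0.53
```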

Digital data storage has gigantic capacity. The worldwide volume of data in 2013 was estimated at four zettabytes, that is, 4 × 10²¹ bytes. And the data hunters and gatherers are only just gaining momentum. Everything is recorded and saved; meanwhile, more sensors are produced than potatoes are grown. At the forefront: science. The Large Hadron Collider contributes a hundred terabytes (10¹⁴ bytes) per day; from 2024, the radio telescope is expected to generate more data per day than is exchanged on the entire Internet, more than 90 exabytes, or 9 × 10¹⁹ bytes; the 1.5 gigabytes (1.5 × 10⁹ bytes) of the human genome can be sequenced in an hour and a half.
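A quick back-of-the-envelope check of these figures, using the decimal SI prefixes the text itself uses:

```python
# Sanity-checking the scales quoted above (decimal SI units).
ZB, EB, TB, GB = 10**21, 10**18, 10**12, 10**9

world_2013  = 4 * ZB      # estimated global data volume, 2013
lhc_per_day = 100 * TB    # Large Hadron Collider output per day
ska_per_day = 90 * EB     # projected radio-telescope output per day
genome      = 1.5 * GB    # one human genome

print(world_2013 / lhc_per_day)  # 40,000,000 days: the LHC would need
                                 # roughly 110,000 years to produce it
print(world_2013 / ska_per_day)  # ~44 days at the telescope's projected rate
```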

As a result, third parties not only know a person better than that person knows himself. The image of humanity itself is undergoing a fundamental change: the individual mutates from subject into project, from an autonomous legal entity whose future actions are a matter of free will into a calculable quantity that merely reproduces the actions already stored about it. And those actions are by no means limited to consumer behavior.

The future, therefore, belongs to the science of information. Data has become an integral part of life; it carries within it everything our world is today and every process that takes place in it. It is crucial to understand how this data is transmitted and processed, not least to make better sense of the data we receive from marketing campaigns.