The term “big data” came into use in 2008, thanks to Clifford Lynch. In a special issue of the journal Nature, he used it to describe the explosive growth of information flows and applied it to any body of heterogeneous data growing by more than 150 GB per day.
According to the estimates of analytical agencies, in 2005 the world operated on 4-5 exabytes of information (4-5 billion gigabytes). Five years later, the volume of big data had grown to 0.19 zettabytes (1 ZB = 1024 EB). In 2012 the figure rose to 1.8 ZB, and in 2015 to 7 ZB. Experts predict that by 2020, big data systems will operate on 42-45 zettabytes of information.
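To make these magnitudes easier to compare, here is a small Python sketch that converts the quoted zettabyte figures into exabytes and computes a growth factor. It assumes the convention given in the text (1024 exabytes per zettabyte); the function and variable names are illustrative only.

```python
# Unit conversion for the volumes quoted in the text.
# Assumption: 1 zettabyte = 1024 exabytes, as stated in the article.
EB_PER_ZB = 1024

# Volumes quoted in the text, in zettabytes, keyed by year.
volumes_zb = {2010: 0.19, 2012: 1.8, 2015: 7.0}


def zb_to_eb(zettabytes: float) -> float:
    """Convert zettabytes to exabytes."""
    return zettabytes * EB_PER_ZB


def growth_factor(start_year: int, end_year: int) -> float:
    """How many times the quoted volume grew between two years."""
    return volumes_zb[end_year] / volumes_zb[start_year]


if __name__ == "__main__":
    for year, zb in volumes_zb.items():
        print(f"{year}: {zb} ZB = {zb_to_eb(zb):.2f} EB")
    print(f"2010 to 2015: about {growth_factor(2010, 2015):.1f} times larger")
```

Putting the numbers in the same unit shows that the 2010 figure is already well over a hundred exabytes, against the 4-5 exabytes quoted for 2005.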
Until 2011, big data technologies were treated only as a subject of scientific analysis and had no practical application. But data volumes kept growing, and by early 2012 the problem of huge arrays of unstructured and heterogeneous information had become pressing.
The heavyweights of digital business, including Microsoft, IBM, Oracle, and EMC, joined the development of the new field. Since 2014, big data has been studied at universities and applied in engineering, physics, sociology, and other applied sciences.
What is Big Data?
Big data is a modern technological field concerned with processing large, constantly growing amounts of data. The term covers the information itself as well as the methods for processing and analyzing it. The opportunities that Big Data opens up are of interest to business, marketing, science, and the state.
First of all, big data is still information, only so large that it is difficult to work with using conventional software tools. It can be structured (processed) or unstructured (fragmented).
Here are some examples of it:
- Data from seismological stations throughout the Earth.
- The database of user accounts on Facebook.
- Geolocation information of all photos posted today on Instagram.
- Databases of mobile operators.
The field of Big Data develops its own algorithms, software tools, and even machines. Processing constantly growing information requires new, innovative solutions. That is why big data has become a separate area of technology.
What is VVV?
To make the definition of Big Data less fuzzy, a set of characteristics has been formulated that such data must have. Each one starts with the letter V, so the scheme is called VVV:
- Volume – The amount of information is measurable.
- Velocity – The amount of information is not static – it is constantly increasing, and processing tools should take this into account.
- Variety – Information does not have to be in a single format. It can be unstructured, partially structured, or completely structured.
As the industry develops, additional Vs are added to these three, for example veracity (reliability), value (usefulness), or viability.
But the first three are enough for understanding: big data is measurable, constantly growing, and heterogeneous.
What is big data for?
The main goal of working with big data is to tame it (analyze it) and put it to use. Humanity has learned to produce and collect huge amounts of information, but still struggles to manage it.
Right now, big data is helping with tasks such as:
- Increase in labor productivity;
- Accurate advertising and sales optimization;
- Forecasting situations in domestic and global markets;
- Improvement of goods and services;
- Improvement of logistics;
- High-quality targeting of customers in any area of business.
Big data makes services more convenient and profitable for both sellers and buyers. Enterprises can find out which products are more popular, how to set prices, when the best time for sales is, and how to use production resources more efficiently. As a result, customers receive a precise offer with no filler.
Where else is big data used?
- Cloud storage. Storing everything on local computers, disks, and servers is inconvenient and costly. Large cloud data centers are becoming a reliable way to store information that is available at any time.
- Blockchain. This technology, which has shaken up the world in recent years, simplifies transactions, makes them safer, and, most importantly, copes well with processing operations between a huge number of counterparties thanks to its mathematical algorithm.
- Self-service. Robotization and industrial automation reduce the cost of doing business and the cost of goods or services.
- Artificial intelligence and deep learning. Imitating the way the brain thinks helps create responsive systems that are effective in science and business.
These areas emerge and advance through data collection and analysis. The pioneers of such developments are search engines, mobile operators, online commerce giants, and banks.
Big Data will be an integral part of Industry 4.0 and the Internet of Things, where complex systems made up of a huge number of devices work as a whole. Here are some simple examples that are no longer futuristic:
- An automated plant changes its product line on its own, based on an analysis of demand, supply, cost, and the market situation.
- A smart home gives recommendations on how to dress according to the weather and which route is the fastest to get to work in the morning.
- The company analyzes production and distribution channels, taking into account changes in the real market situation.
- Road safety is enhanced by collecting data on driving style and violations of individual drivers, as well as the condition of their cars.
Who uses big data?
All large companies use this technology: IBM, Google, Facebook, financial corporations such as VISA and Mastercard, and ministries around the world. For example, Germany reduced its spending on unemployment benefits after calculating that some citizens were receiving them without grounds. This returned about 15 billion euros to the budget.
The recent Facebook scandal over leaked user data suggests that the volume of unstructured information is growing and that even the giants of the digital era cannot always keep it fully confidential.
Mastercard, for example, uses big data to prevent fraudulent transactions on customer accounts. This saves more than 3 billion US dollars a year from theft.
The industry’s greatest progress is in the US and Europe. Here are the largest foreign companies and departments that use Big Data:
- HSBC uses it to improve security for its card customers. The company claims a tenfold improvement in recognizing fraudulent transactions and a threefold improvement in fraud protection overall.
- The Watson supercomputer, developed by IBM, analyzes financial transactions in real time. This reduces the frequency of false alarms from the security system by 50% and identifies 15% more fraudulent activity.
- Procter & Gamble conducts market research using Big Data, more accurately predicting customer wishes and demand for new products.
- The German Ministry of Labor targets its spending by analyzing big data when processing benefit applications. This helps direct money to those who need it (it turned out that 20% of benefits had been paid inappropriately). The ministry claims that Big Data tools cut costs by €10 billion.
How does big data technology work?
For an array of information to earn the prefix “big”, it must have the following features:
- Volume. Data is measured by its physical size and the space it occupies on a digital medium. Arrays that grow by more than 150 GB per day are referred to as “big”.
- Velocity. Information is updated regularly, so real-time processing requires intelligent technologies.
- Variety. Information in an array can come in heterogeneous formats, be partially or completely structured, and accumulate haphazardly. For example, social networks use big data in the form of texts, videos, audio, financial transactions, pictures, and more.
In modern systems, two additional factors are considered:
- Data streams can have peaks and dips, seasonality, and periodicity. Bursts of unstructured information are difficult to manage and require powerful processing technologies.
- Information can vary in how complex it is to interpret and process, which complicates the work of intelligent systems. For example, an array of social network messages is one level of data, and transactional operations are another. The task of the machines is to determine the importance of incoming information so that it can be structured quickly.
Big data technology works by giving the user as much information as possible about a subject or phenomenon. The purpose is to help weigh the pros and cons and make the right decision. Intelligent machines build a model of the future from an array of information, then simulate various options and track the results.
Modern analytical agencies run millions of such simulations when they test an idea or hypothesis or solve a problem. The process is automated.
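The idea of building a model and then simulating different options can be illustrated with a tiny Monte Carlo simulation. The sketch below is only an illustration under assumed conditions: the demand model (normally distributed daily demand), the prices, and the names simulate_profit and best_option are all hypothetical and do not come from any particular system.

```python
import random
import statistics


def simulate_profit(stock: int, rng: random.Random,
                    mean_demand: float = 100.0, sd_demand: float = 20.0,
                    unit_cost: float = 4.0, unit_price: float = 10.0) -> float:
    """One simulated day: draw a random demand and compute the profit.

    Hypothetical model: demand is normally distributed, unsold stock is lost.
    All numbers are illustrative assumptions.
    """
    demand = max(0, round(rng.gauss(mean_demand, sd_demand)))
    sold = min(stock, demand)
    return sold * unit_price - stock * unit_cost


def best_option(options, runs: int = 10_000, seed: int = 0):
    """Simulate every option many times and return the one with the
    highest average profit, together with all the averages."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    averages = {
        stock: statistics.fmean(simulate_profit(stock, rng) for _ in range(runs))
        for stock in options
    }
    best = max(averages, key=averages.get)
    return best, averages


if __name__ == "__main__":
    best, averages = best_option([50, 100, 150])
    for stock, avg in averages.items():
        print(f"stock {stock}: average profit {avg:.1f}")
    print("best option:", best)
```

Each option is tried thousands of times against random demand, and the decision is based on the average outcome rather than on a single guess. Real systems replace the toy demand model with one fitted to actual data.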
The sources of big data include:
- The Internet: blogs, social networks, websites, media, and various forums;
- Corporate information: archives, transactions, databases;
- Instrument readings: meteorological instruments, cellular network sensors, and others.
Working with data arrays rests on three main principles:
- Scalability. This usually means horizontal scalability of storage: when the volume of incoming data grows, the capacity and number of servers that store it are increased.
- Fault tolerance. The number of storage devices and machines can be increased in proportion to data volumes, but some of those machines will inevitably fail or become obsolete. Stable work with big data therefore requires fault-tolerant servers.
- Data locality. Individual arrays of information are stored and processed on the same dedicated server, which saves time, resources, and data transmission costs.
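The three principles above can be sketched in a few lines of Python. This is a deliberately simplified toy, not how any specific product works: the Cluster class, its method names, and the modulo-based placement are assumptions made for illustration. Production systems typically use more careful schemes (for example, consistent hashing) so that adding a server moves only a small share of the data.

```python
import hashlib
from collections import defaultdict


class Cluster:
    """Toy in-memory cluster (hypothetical names, illustration only)."""

    def __init__(self, nodes, replicas: int = 2):
        if replicas > len(nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.nodes = list(nodes)           # scaling out = a longer node list
        self.replicas = replicas           # copies kept of every record
        self.failed = set()                # nodes currently unavailable
        self.storage = defaultdict(dict)   # node name -> {key: value}

    def _owners(self, key: str):
        """Pick the nodes that hold a key, based on a hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key: str, value) -> None:
        """Fault tolerance: write the record to several nodes."""
        for node in self._owners(key):
            self.storage[node][key] = value

    def fail(self, node: str) -> None:
        """Mark a node as broken."""
        self.failed.add(node)

    def get(self, key: str):
        """Read from the first replica that is still alive."""
        for node in self._owners(key):
            if node not in self.failed:
                return self.storage[node][key]
        raise RuntimeError("all replicas are unavailable")

    def count_local(self, node: str) -> int:
        """Data locality: compute on the node that stores the data,
        so only the small result travels over the network."""
        return len(self.storage[node])


if __name__ == "__main__":
    cluster = Cluster(["node-a", "node-b", "node-c"], replicas=2)
    cluster.put("user:42", {"name": "example"})
    first_owner = cluster._owners("user:42")[0]
    cluster.fail(first_owner)          # one server goes down
    print(cluster.get("user:42"))      # the record is still readable
```

The record survives the failure of one server because a second copy exists, and per-node computations such as count_local never need to move the raw data.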
What is it used for?
The more we know about a subject or phenomenon, the better we understand its essence and the better we can predict the future. By collecting and processing data streams from sensors, the Internet, and transactional operations, companies can predict product demand, and emergency services can prevent man-made disasters. Here are some examples of how big data technologies are used outside business and marketing:
- Healthcare. More knowledge about diseases, more treatment options, and more information about medications make it possible to fight diseases that were considered incurable 40-50 years ago.
- Prevention of natural and man-made disasters. The most accurate forecast in this area saves thousands of lives. The task of intelligent machines is to collect and process a variety of sensor readings and, based on them, help people determine the date and place of a possible cataclysm.
- Law enforcement agencies. Big data is used to predict the upsurge of crime in different countries and take deterrent measures where the situation requires it.
Analysis and processing techniques
The main methods for analyzing large amounts of information include the following:
- Data mining and classification. These techniques come from technologies for working with ordinary structured information in small arrays, but use more advanced mathematical algorithms made possible by advances in digital technology.
- Crowdsourcing. This technique relies on the ability to receive and process billions of bytes of data streaming in from many sources. The number of “suppliers” is limited only by the capacity of the system.
- Split testing. Several elements are selected from the array and compared with one another, before and after a change. A/B tests help determine which factors have the greatest effect on the elements. With big data, a huge number of split-test iterations can be run, bringing the result closer to a reliable one.
- Forecasting. Analysts set certain parameters for the system in advance and then check how the object behaves as large amounts of information arrive.
- Machine learning. Artificial intelligence can absorb and process large volumes of unsystematized data and then use it to learn on its own.
- Network activity analysis. Big data techniques are used to study social networks and the relationships between account owners, groups, and communities. Based on this, target audiences are defined by interests, geolocation, age, and other metrics.
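As an illustration of the split testing mentioned in the list above, here is one common way to check whether the difference between two variants is larger than chance: a two-proportion z-test. This is one standard statistical technique, not necessarily the one any given company uses, and the function name ab_test and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ab_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-proportion z-test for an A/B experiment.

    conv_a, conv_b: number of conversions in variants A and B
    n_a, n_b:       number of visitors shown each variant
    Returns (z, p_value) for a two-sided test.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the assumption that A and B do not differ.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("conversion rate of 0% or 100% in both groups")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical numbers: 1000 visitors per variant.
    z, p = ab_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value suggests the change really affected the result. The larger the array of observations, the smaller the differences such a test can detect, which is why split testing pairs naturally with big data.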
Big data in business and marketing
Business development strategies, marketing campaigns, and advertising are built on the analysis of available data. Big data tools make it possible to sift through gigantic amounts of information and, as a result, adjust the direction of a brand, product, or service as accurately as possible.
For example, real-time bidding (RTB) auctions in contextual advertising work with big data. This makes it possible to advertise commercial offers to a selected target audience rather than to everyone.
What are the Business Benefits?
- Creating projects that are likely to be in demand among users and customers.
- Studying and analyzing how well the company's existing service meets customer requirements. The work of the service staff is adjusted based on the results.
- Identifying customer loyalty and dissatisfaction by analyzing a variety of information from blogs, social networks, and other sources.
- Attracting and retaining the target audience through analytical work with large amounts of information.
Today, businesses know more about us as customers than we know about ourselves, which is why the advertising campaigns of Coca-Cola and other corporations are a resounding success.
In 2019, the importance of understanding and, most importantly, working with arrays of information was 4-5 times greater than at the beginning of the decade. With widespread adoption, big data reached small and medium-sized businesses and startups:
- Cloud storage. Storing and working with data online solves many problems for small and medium-sized businesses: buying cloud capacity is cheaper than maintaining a data center, staff can work remotely, and no office is needed.
- Deep learning and artificial intelligence. Analytical machines imitate the human brain by using artificial neural networks. They train themselves on large amounts of information.
- Dark data. The collection and storage of non-digitized company data that plays no significant role in business development but is needed for technical and legal purposes.
- Blockchain. Simplifies Internet transactions and reduces their cost.
- Self-service systems. Since 2016, special platforms have been introduced for small and medium-sized businesses, where they can store and organize data on their own.
We have learned what big data is, examined how the technology works and what information arrays are used for, and looked at the principles and methods of working with big data.