Did you know that around 90 percent of all the data available around the world today was generated in recent years? Due to the numerous new information and communication technologies, the volume of data worldwide has grown incredibly and offers previously unknown possibilities. Big Data stands for this volume of structured and unstructured data, which cannot be processed with conventional software or hardware due to its size.
These data volumes are created, among other things, with each of our clicks on the Internet: a purchase on Amazon, a search query on Google, or activity on social networks such as Instagram or Facebook.
However, large amounts of data alone do not make Big Data. It is the analysis and processing of these data volumes, e.g. by a company, that turns them into Big Data. In 2001, analyst Doug Laney created a definition of Big Data with his 3-V model that is still recognised today. According to Laney, Big Data has the following three characteristics:
- Volume: Companies collect large volumes of data from various sources. These include smart devices (IoT), mobile phones, videos, social media, etc. In the past, it would not have been possible to store these large volumes of data; today, storage platforms exist for this purpose.
- Velocity: Companies are currently being flooded with data streams at unprecedented speeds that need to be processed quickly.
- Variety: The data collected is diverse and has a wide variety of formats: numerical data, which is available in structured form and stored in ordinary databases, can be part of Big Data, as well as unstructured text documents, data from financial transactions or e-mails.
…stands for a large amount of available data that is analysed and processed for a specific purpose. According to Doug Laney, Big Data is characterised by volume, velocity and variety.
Big Data vs. Small Data
Unlike Big Data, Small Data refers to data in a volume and format accessible to humans. The following points show how Big Data can be distinguished from Small Data:
- Targets: Small Data is used for a defined goal, whereas the use of Big Data often develops unexpectedly.
- Location: Small Data is generally stored in one place, usually in one file on the PC, while Big Data is usually spread across numerous files on different servers located in different countries.
- Data structure: Small Data has a simple, uniform structure, whereas Big Data can be unstructured and can come in many file formats from different fields.
- Data preparation: only one end user is usually involved in the preparation of Small Data. With Big Data, however, one group of people often prepares the data, another group analyses it and a third group finally uses it. Each of these groups may have different objectives.
- Durability: Small Data is generally retained for a certain period of time after the completion of a project. In the case of Big Data, however, the data remains stored for an unlimited period of time.
- Origin: Small Data is usually collected within a short time and in specific units of measurement. Big Data, on the other hand, originates from different places, countries, companies, organisations, etc.
- Reproducibility: Small Data can generally be completely reproduced. Big Data, by contrast, originates from so many different sources and is available in so many forms that reproduction is impossible.
- Quality: the meanings of the data in a Small Data set are unambiguous; the data is therefore self-describing. Big Data, conversely, is much more complex and may also contain unidentifiable information that has no specific meaning. This can reduce the quality of the data.
- Analysis: a single process is usually sufficient for the analysis of Small Data, since the data is analysed from only one computer file. In the case of Big Data, the data must be extracted, checked, reduced, etc. in a time-consuming process.
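
The "extracted, checked, reduced" steps named under Analysis can be sketched in a few lines of Python. This is only an illustrative sketch, not a real Big Data setup: the two in-memory "shards", their file names, their contents and the helper functions extract, check, reduce_counts and analyse are all hypothetical stand-ins for files that would, in practice, sit on different servers and be far too large for one machine.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical shards standing in for files stored on different servers.
# Names, formats and contents are invented for illustration only.
SHARDS = {
    "server_a.csv": "user,action\nalice,purchase\nbob,search\n,search\n",
    "server_b.jsonl": (
        '{"user": "carol", "action": "purchase"}\n'
        "not valid json\n"
        '{"user": "alice", "action": "like"}\n'
    ),
}


def extract(name, text):
    """Extract: turn one shard into dict records, depending on its format."""
    if name.endswith(".csv"):
        yield from csv.DictReader(io.StringIO(text))
    elif name.endswith(".jsonl"):
        for line in text.splitlines():
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # unreadable line, skip it
            if isinstance(record, dict):
                yield record


def check(records):
    """Check: keep only records that have a non-empty user and action."""
    for record in records:
        if record.get("user") and record.get("action"):
            yield record


def reduce_counts(records):
    """Reduce: collapse the records of one shard into counts per action."""
    return Counter(record["action"] for record in records)


def analyse(shards):
    """Run extract, check and reduce per shard, then merge the results."""
    totals = Counter()
    for name, text in shards.items():
        totals += reduce_counts(check(extract(name, text)))
    return totals


result = analyse(SHARDS)
print(dict(result))
```

A Small Data analysis would correspond to reading one clean file in a single step. Here each shard has its own format and its own defects (an empty user field, an unparseable line), so every shard passes through the same three stages before the partial results are merged.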
As the distinction between Big Data and Small Data shows, Big Data is often quite literally difficult to grasp.