In the previous chapters you have witnessed how complex it is to analyse and use Big Data. At least as complex are the challenges and risks associated with large amounts of data. Probably the biggest challenge for companies in connection with Big Data is data protection:
Although companies have been paying more attention to data protection in recent years, there are still problems. For example, personal data of Internet users is sometimes used without their consent, and the persons concerned can be identified, monitored and, in the worst case, blackmailed.
Definition
Personal data
… refers to data that relates to an identified or identifiable person and allows conclusions to be drawn about that person. This includes, for example, Werner Kogler’s licence plate number, your neighbour’s date of birth or Bill Gates’ account balance.
An example of a data protection violation in connection with Big Data is the case of the Ashley Madison fling portal, which was already mentioned in chapter 2. In this case, the personal data became public and was used to blackmail the persons concerned.
Data protection regulations and laws help to protect consumers from abuse. The basis of the general data protection law in the European Union and in Austria is the General Data Protection Regulation, which became effective on 25 May 2018.
Excursus
The General Data Protection Regulation
The General Data Protection Regulation, or GDPR for short, is called in its entirety “Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC”. It is directly applicable in Austria and is supplemented by the Data Protection Act (DPA) and the Directive on Data Protection.
This regulation enables EU citizens to better control the collection and use of their personal data. This should strengthen consumer confidence in the individual companies. Existing rights of EU citizens are consolidated in the GDPR, and new rights are also established. The rights established in the GDPR include:
simplified access to personal data – this includes providing comprehensive, clear and comprehensible information on the processing of the data
a new right to data portability – personal data can be transferred more easily from one provider to another
a clearer right to erasure (“right to be forgotten”) – data are deleted if a citizen does not agree to his or her data being processed and there is no legitimate reason to keep them
a right to be informed about hacked personal data – companies and organisations shall immediately inform the persons concerned about serious violations of the protection of personal data. The responsible data protection supervisory authority must also be notified
For companies, the GDPR is intended to create more business opportunities and to promote innovation with numerous measures. These include:
the creation of uniform EU-wide rules, which will lead to major savings
the appointment of a data protection officer within authorities and companies dealing with large data sets
the designation of a single point of contact in their own country to which businesses must turn
the creation of EU rules that companies from third countries must also follow when they offer goods or services in the EU or monitor the behaviour of people in the EU
the creation of rules that promote innovation and ensure that data protection rules are taken into account at an early stage in the development of services or products
the use of data protection-compatible techniques such as pseudonymisation (replacement of passages in a data record that make it possible to identify the associated person) and encryption (data is encrypted so that it can only be read by authorised persons)
removing reporting obligations for companies in order to promote the free movement of personal data within the European Union
carrying out impact assessments when the processing of the data is likely to threaten the rights and freedoms of individuals
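Pseudonymisation, as described in the list above, can be illustrated with a small sketch. It is only one possible approach under stated assumptions, not a method prescribed by the GDPR: the field names, the sample record and the secret key are hypothetical, and a keyed hash (HMAC) is used here as the replacement technique. The same input always yields the same pseudonym, so records can still be linked for analysis, while the original value cannot be read from the record.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept separately from the data.
SECRET_KEY = b"replace-with-a-secret-key"

# Hypothetical names of the fields that directly identify a person.
IDENTIFYING_FIELDS = ("name", "email")


def pseudonymise(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Return a copy of the record with identifying fields replaced by keyed hashes."""
    result = dict(record)
    for field in IDENTIFYING_FIELDS:
        if field in result:
            value = str(result[field]).encode("utf-8")
            digest = hmac.new(key, value, hashlib.sha256)
            # A shortened hex digest serves as the pseudonym.
            result[field] = digest.hexdigest()[:16]
    return result


# Hypothetical customer record, used only for illustration.
customer = {"name": "Jane Doe", "email": "jane@example.com", "purchase": "black sneakers"}
pseudonymised = pseudonymise(customer)
print(pseudonymised)
```

Anyone who holds the key can recompute the pseudonym for a known person, which is why pseudonymised data still counts as personal data and the key has to be protected.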
A further challenge is that the existing employees in companies do not always have the necessary expertise and are not open to the possibilities that the analysis of large amounts of data offers the company.
Time and resources are often wasted because those involved are not clear about the goal of a Big Data project or what infrastructure is required for it. Finding and retaining competent employees is usually difficult for companies, as such employees are in great demand.
Moreover, Big Data technology is diverse and confusing for beginners. Have you ever heard of Spark, Hadoop MapReduce, Cassandra or HBase? These are Big Data technologies with different features and benefits.
In addition, technologies are evolving at a rapid pace, so companies often simply cannot keep up with adopting them. Therefore, for companies that are considering using a Big Data analysis, expert advice is useful.
Another point is that Big Data projects are very expensive. This applies both to companies that choose an on-premise model and to those that prefer a cloud model. The difference is that with an on-premise model, the company uses the Big Data software in its own data centre and is responsible for its operation and maintenance. In a cloud model, on the other hand, the software is only rented by the company and the data remains with the provider.
Definition
On-Premise-Model
…refers to a solution where the company buys or leases Big Data software and deploys it in its own data centre. The company has to take care of the hardware itself and is responsible for operating the software. The data is stored at the company.
Definition
Cloud-Model
…refers to a solution in which a company purchases the Big Data software as a service; the provider takes responsibility for maintenance and operation. The company pays a rental price which includes the hardware, operation and maintenance costs. With this solution, the data is stored at the provider.
If a company decides to use an on-premise solution, it must invest in new hardware and hire new employees to operate and maintain the system. In the case of a cloud solution, no new hardware is needed, because the provider operates and maintains the system; the company only needs employees who work with the system, and it must pay for the cloud services.
Finally, the quality of data is often poor, and companies are faced with the challenge of harmonising data from different sources of varying quality. For example, an online merchant analyses data from social media, website logs, call centres and websites, all of which have different formats.
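What harmonising data from different sources can look like is sketched below. This is a minimal illustration under stated assumptions: the two input formats (a JSON post from social media and a semicolon-separated call centre log), all field names and all sample values are hypothetical. Each source gets its own small converter that maps its records to one common schema with a normalised customer name and an ISO date.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical raw inputs: a JSON social media post and a CSV call centre log.
SOCIAL_MEDIA_JSON = '{"user": "anna", "text": "Love the new sneakers", "ts": 1700000000}'
CALL_CENTRE_CSV = "customer;date;note\nANNA ;14.11.2023;Asked about delivery\n"


def from_social_media(raw: str) -> dict:
    """Map one JSON post (Unix timestamp) to the common schema."""
    post = json.loads(raw)
    date = datetime.fromtimestamp(post["ts"], tz=timezone.utc).date().isoformat()
    return {
        "customer": post["user"].strip().lower(),
        "date": date,
        "source": "social_media",
        "text": post["text"],
    }


def from_call_centre(raw: str) -> list[dict]:
    """Map semicolon-separated rows (dates as DD.MM.YYYY) to the common schema."""
    rows = csv.DictReader(io.StringIO(raw), delimiter=";")
    return [
        {
            "customer": row["customer"].strip().lower(),
            "date": datetime.strptime(row["date"], "%d.%m.%Y").date().isoformat(),
            "source": "call_centre",
            "text": row["note"],
        }
        for row in rows
    ]


# All records now share the same keys, name format and date format.
records = [from_social_media(SOCIAL_MEDIA_JSON)] + from_call_centre(CALL_CENTRE_CSV)
for record in records:
    print(record)
```

Once every source is mapped to the same schema, records belonging to the same customer can be compared and combined, and quality problems such as inconsistent spelling or date formats are handled in one place per source.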
But even when all the problems mentioned have been solved, it is often not that simple for companies to gain useful insights from the large amounts of data. If information is linked together and wrong conclusions are drawn, for instance, this can be dangerous.
For example, a person may be considered uncreditworthy by a bank that performs a Big Data analysis because he or she lives in the same neighbourhood as many uncreditworthy people and drives the same car as many people who are considered uncreditworthy. The following example also shows why the correct use of the large amounts of data is crucial:
Example
An online retailer relies on Big Data Analytics, which is based on historical data about customer behaviour. It turns out that people who buy black sneakers often add a pair of black sneaker socks. The retailer adjusts his range for the spring accordingly. However, just before the beginning of spring, a well-known rapper posts a photo of himself with black sneakers and yellow socks on Instagram. Many young people are therefore looking for yellow socks to go with their black sneakers, but unfortunately the online retailer soon runs out of them because he was not prepared for the rush. The retailer simply used the wrong Big Data strategy, relying only on historical results and ignoring other important data sources such as social media, shops of competitors, etc.
Remember
In summary, these are the main challenges that companies face when using Big Data:
ensuring data protection – compliance with the General Data Protection Regulation (GDPR)
professional competence of the employees – proficient use of the diverse and rapidly developing Big Data technology
high costs of Big Data projects (hardware and software or rental costs, staff, maintenance etc.)
poor quality of data, standardisation of data in different formats and with different quality
correct interpretation of the results
As you have noticed, Big Data offers enormous possibilities and opportunities that companies have not even come close to exploiting. However, the large volumes of data are also associated with challenges and risks that should not be underestimated and are unsettling for many people. The decisive factor in ensuring that Big Data is used successfully without causing harm to other people is therefore responsible and proficient handling of the large volumes of data.
The complete General Data Protection Regulation can be accessed at https://eur-lex.europa.eu/legal-content/DE/ALL/?uri=CELEX%3A32016R0679.