Characteristics of Big Data
The characteristics of Big Data are commonly summarized as the “10 V’s”, although the exact list varies between authors; the sections below cover eleven such terms.
Data sources are becoming more complex than traditional ones because they are driven by artificial intelligence (AI), mobile devices, social media, and the Internet of Things (IoT). For example, data now comes from sensors, devices, video/audio, networks, log files, transactional applications, the web, and social media, and much of it is generated in real time and on a vast scale. It is also essential to remember that Big Data is not just about the amount of data we generate but also about the many different types of data (text, video, browsing logs, sensor logs, customer transactions, etc.).
Volume
Big Data is big, indeed! With the dramatic growth of the internet, mobile devices, social media, and Internet of Things (IoT) technologies, the amount of data generated by all sources has grown accordingly. Volume refers to the amount of data held. Data volume is increasingly measured not in gigabytes but in zettabytes (ZB) and yottabytes (YB). According to industry trends, data volume will continue to increase substantially in the coming years.
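To get a feel for these units, here is a minimal Python sketch that converts a raw byte count into a readable size. It assumes decimal (SI) units, where each unit is 1000 times the previous one; the names UNITS, format_bytes, and gigabytes_per_zettabyte are made up for this illustration.

```python
# Decimal (SI) byte units; each one is 1000 times larger than the previous.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_bytes(num_bytes):
    """Format a byte count using the largest unit that keeps the value readable."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the largest unit in the list.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000


def gigabytes_per_zettabyte():
    """Number of gigabytes (10**9 bytes) in one zettabyte (10**21 bytes)."""
    return 10**21 // 10**9
```

For example, format_bytes(1500) returns "1.50 KB" and format_bytes(10**21) returns "1.00 ZB". One zettabyte works out to 10**12 gigabytes, that is, a trillion gigabytes.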
Velocity
In addition to the larger amount of data, data is being created faster, and organizations must be able to process it faster as well. Velocity refers to the speed at which data is generated and processed. High speed is critical to the performance of any big data process. Velocity covers rates of change, bursts of activity, and the flow of incoming data sets. Big data can therefore change very quickly in its variables and data types, and processing it requires a special approach.
Variety
Data can be called big data if it is not homogeneous but diverse, with many variables and multiple types of data, both structured data in a database and data not organized in a database. In the past, most data could be neatly captured as rows in structured tables. In the world of Big Data, data often comes in unstructured formats such as social media posts, server log data, geographic coordinates, photos, audio, video, and free text. Variety refers to these different types of data collected from different types of sources. It is one of the biggest problems the big data industry faces because it affects performance, so it is essential to organize and manage the various kinds of data properly.
Variability
The meaning of words in unstructured data can change based on context, and Big Data itself keeps changing. Data collected from a source a day ago may differ from what you find today. This is called the variability of the data, and it makes the data harder to homogenize.
Veracity
With many different types of data and data sources, data quality issues are always present in Big Data sets. Veracity refers to the accuracy of the data. It involves checking data sets for quality problems and systematically cleaning the data so that it is worth analyzing. This is one of the essential characteristics of Big Data because low veracity can seriously undermine the accuracy of your results. Big Data is vulnerable in terms of accuracy and validity, so it requires in-depth analysis before it can support the right decisions.
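As an illustration of what systematic cleaning can look like, here is a minimal Python sketch. It is only one possible approach: the record fields (sensor_id, timestamp, temperature_c), the plausible temperature range, and the function name clean_records are all assumptions made up for this example.

```python
def clean_records(records):
    """Drop records that are incomplete, implausible, or exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        sensor = rec.get("sensor_id")
        temp = rec.get("temperature_c")
        # Rule 1: required fields must be present.
        if sensor is None or temp is None:
            continue
        # Rule 2: the value must be numeric and inside an assumed
        # plausible range of -50 to 60 degrees Celsius.
        if not isinstance(temp, (int, float)) or not -50 <= temp <= 60:
            continue
        # Rule 3: skip exact duplicates of a reading we already kept.
        key = (sensor, rec.get("timestamp"), temp)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned
```

Real pipelines usually need rules tailored to the data source, but the idea is the same: decide what counts as a trustworthy record, then filter out everything else before analysis.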
Visualization
After the data has been analyzed, it needs to be presented visually so that end users can understand and act on it. Visualization refers to displaying the insights generated by Big Data through visual representations such as charts and graphs. This has become increasingly important as Big Data professionals regularly share their insights with non-technical audiences.
Value
Data must be combined with rigorous processing and analysis to be useful. Value refers to the benefits an organization derives from data. Is the data compatible with the goals of the organization? Does it help the organization improve itself? This is one of the most important core characteristics of Big Data: big data is highly valuable only if it is processed appropriately.
Validity
Validity refers to whether the data is valid and relevant for its intended purpose.
Venue
Venue refers to heterogeneous data distributed across several platforms.
Vocabulary
Vocabulary refers to the existence of a data model and the semantics that describe the structure of the data.
Vagueness
Vagueness refers to confusion over the meaning of Big Data and the tools used.
In addition, it is useful to know the types of Big Data. They are not much different from data types in general and fall into three categories:
Structured
Structured data refers to data that can be processed, stored, and retrieved in a fixed format. The information is so well organized that it can be quickly and seamlessly stored in and accessed from databases using simple algorithms. This is the easiest type of data to manage because the format of the data is known in advance. For example, the data that companies store in database tables and spreadsheets is structured data.
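To make the idea of a fixed format concrete, here is a small Python sketch using the standard sqlite3 module with an in-memory database. The table name, the columns, and the function total_by_customer are assumptions invented for this example.

```python
import sqlite3


def total_by_customer(rows):
    """Load (customer, amount) rows into a fixed-schema table and sum per customer."""
    conn = sqlite3.connect(":memory:")  # temporary database held in memory
    try:
        # The schema is declared up front: every row has the same two columns.
        conn.execute(
            "CREATE TABLE transactions (customer TEXT NOT NULL, amount REAL NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO transactions (customer, amount) VALUES (?, ?)", rows
        )
        # Because the format is known in advance, a simple query is enough.
        query = (
            "SELECT customer, SUM(amount) FROM transactions "
            "GROUP BY customer ORDER BY customer"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

Because every row follows the same schema, the database can answer a question such as "how much did each customer spend?" with a single short query.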
Unstructured
Data with an unknown structure is called unstructured data. It is typically much larger than structured data and is heterogeneous. The results of a Google search are one example: they include web pages, videos, images, text, and other data formats of varying, non-fixed sizes.
Semi-Structured
As the name suggests, semi-structured data has elements of both structured and unstructured data. This data is not organized under a fixed database schema but contains tags or markers that separate the individual elements. For example, XML and JSON documents are semi-structured data.
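JSON is one widely used format in which keys act as the tags that separate individual elements. The Python sketch below, which is only an illustration, reads two JSON records that do not share the same fields and flattens them into fixed rows. The sample data, the field names, and the function to_rows are assumptions made up for this example.

```python
import json

# Two records with different shapes: the second has no tags and no contact.
RAW = """
[
  {"id": 1, "name": "Ana", "tags": ["vip"], "contact": {"email": "ana@example.com"}},
  {"id": 2, "name": "Ben"}
]
"""


def to_rows(raw_json):
    """Flatten JSON records into (id, name, email, tag_count) tuples."""
    rows = []
    for item in json.loads(raw_json):
        # Optional fields may be missing, so fall back to empty defaults.
        contact = item.get("contact", {})
        tags = item.get("tags", [])
        rows.append((item["id"], item["name"], contact.get("email"), len(tags)))
    return rows
```

Calling to_rows(RAW) gives one tuple per record, with None standing in for the missing email. The keys make each element identifiable, but nothing forces every record to carry the same fields, which is what separates this kind of data from a fixed table.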