What is Big Data?
Large and complicated data collections that are difficult to manage, handle, or analyse using conventional data processing techniques are referred to as "big data." Usually, it involves data that is too large, varied, or changing too quickly for conventional data processing tools and procedures to handle well.
Big Data categories
Structured data: Data that conforms to a fixed schema, with every record supplying the same set of columns. It is organised in tables (also called relations) and is typically stored in a relational database management system. Online transaction processing (OLTP) systems are designed to handle this kind of data.
Semi-structured data: Data whose schema is not well defined, such as JSON, XML, CSV, TSV, and email. It carries some organising markers, such as tags, keys, or delimiters, but is not bound to a rigid table layout.
Unstructured data: Data with no predefined schema, including log files, audio files, and image files. Many organisations have access to such data but, because it is raw, cannot readily determine its worth.
Quasi-structured data: Textual data that is formatted in a variety of inconsistent ways.
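The difference between these categories can be seen in how much work it takes to read the data. The following is a minimal sketch using only the Python standard library; the sample records, field names, and log formats are invented for illustration and do not come from any real system.

```python
import csv
import io
import json
import re

# Structured: every row follows the same fixed schema (id, name, amount).
structured_text = "id,name,amount\n1,Asha,250\n2,Ben,120\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: records describe themselves, but their fields may differ.
semi_structured_text = (
    '[{"id": 1, "tags": ["sale"]}, {"id": 2, "email": "ben@example.com"}]'
)
records = json.loads(semi_structured_text)
all_keys = sorted({key for record in records for key in record})

# Quasi-structured: the same kind of information written in inconsistent
# formats, so a single parsing rule only matches some of the lines.
quasi_lines = ["2024-01-05 user=asha page=/home", "05/01/2024 | ben | /cart"]
iso_date = re.compile(r"\d{4}-\d{2}-\d{2}")
iso_matches = [bool(iso_date.match(line)) for line in quasi_lines]

print(rows[0]["name"])  # column access works because the schema is fixed
print(all_keys)         # union of fields seen across the JSON records
print(iso_matches)      # which lines fit the ISO date rule
```

Unstructured data such as audio or images is left out of the sketch because it has no fields to parse at all; it usually needs specialised processing before any analysis is possible.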
Big data is commonly described by three main characteristics, known as the three V's (volume, velocity, and variety), to which veracity and value are often added:
Volume: Large-scale data is created by numerous sources, including social media, sensors, financial activities, and more. In most cases, the amount of data is expressed in terabytes, petabytes, or even bigger units.
Velocity: Big data is produced quickly, frequently in real time or very close to it. This includes data streams from online transactions, log files, sensors, and other sources, as well as social media updates. Handling big data therefore requires rapid data collection and processing capabilities.
Variety: Structured, semi-structured, and unstructured data are all included in the concept of big data. Structured data refers to the conventionally organised data found in databases, whereas semi-structured data comprises formats like XML or JSON. Text, pictures, videos, social media posts, and other kinds of data without a predetermined structure are referred to as unstructured data.
Veracity: Veracity refers to how trustworthy the data is. Data may need to be filtered or converted in a variety of ways before it can be relied on, so veracity also covers the ability to handle and manage data effectively. User-generated content such as Instagram posts with hashtags is one instance where trustworthiness has to be assessed.
Value: Big data is worth keeping only if it yields value. Data is not stored or processed for its own sake; organisations store, process, and analyse data that is trustworthy and valuable, which is why big data is crucial for corporate growth. Facebook posts with hashtags are one such source.
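Veracity and value can be illustrated together: untrustworthy records are filtered out first, and only the remaining ones are analysed. The sketch below is a simplified illustration; the `is_trustworthy` rule, the field names `user_id` and `amount`, and the sample records are all hypothetical.

```python
def is_trustworthy(record):
    """Hypothetical veracity rule: a user id is present and the amount
    is a non-negative number."""
    amount = record.get("amount")
    return (
        bool(record.get("user_id"))
        and isinstance(amount, (int, float))
        and not isinstance(amount, bool)
        and amount >= 0
    )


# Invented sample data containing typical quality problems.
raw_records = [
    {"user_id": "u1", "amount": 40.0},
    {"user_id": "", "amount": 15.0},     # missing user id
    {"user_id": "u3", "amount": -5},     # impossible value
    {"user_id": "u4", "amount": "n/a"},  # wrong type
]

# Veracity: keep only records that pass the rule.
clean_records = [r for r in raw_records if is_trustworthy(r)]

# Value: analyse only the trusted records.
total_value = sum(r["amount"] for r in clean_records)

print(len(clean_records), total_value)
```

Real pipelines use far richer validation, but the order of operations is the same: establish trust in the data before drawing conclusions from it.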
Big data has grown in significance across a range of sectors and professions, including commerce, medicine, finance, marketing, and science. By analysing big data, organisations can gain important insights, make well-informed decisions, spot patterns and trends, enhance workflows, and develop predictive models. Big data, however, also presents difficulties in terms of data protection, processing speed, and analysis methods, necessitating the use of specialised tools and technologies like distributed computing, cloud computing, and machine learning algorithms.
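The idea behind distributed computing can be sketched on a single machine: the data is split into partitions, each partition is processed independently, and the partial results are merged. The following is a minimal illustration of that map-and-reduce pattern in plain Python, not the API of Hadoop or any other framework; the function names and sample text are made up for the example.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Count words in one partition; in a real cluster each partition
    would be handled by a different worker."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Merge two partial results into one."""
    return left + right


# Invented sample data, already split into two partitions.
partitions = [
    ["big data needs big tools"],
    ["data tools scale out"],
]

partial_counts = [map_partition(p) for p in partitions]
word_counts = reduce(reduce_counts, partial_counts, Counter())

print(word_counts["big"], word_counts["data"], word_counts["tools"])
```

Because each partition is counted without reference to the others, the map step could run in parallel on many machines, which is what makes this pattern suitable for data too large for one computer.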
Students will also study:
- Volume
- Velocity
- Variety
- Data analytics
- Data mining
- Data processing
- Data storage
- Data integration
- Data visualization
- Data-driven decision-making
- Machine learning
- Artificial intelligence
- Predictive analytics
- Hadoop
- Distributed computing
- Cloud computing
- Data warehousing
- Data cleansing
- Data governance
- Data privacy
- Data security
- Streaming data
- Real-time analytics
- Scalability
- Internet of Things (IoT)