Unstructured Data Base
An Unstructured Data Base is a data base composed of unstructured data items (with mainly implicit semantics).
- Context:
- It can range from being a Small Unstructured Dataset to being a Large Unstructured Dataset.
- It can be an input to an Unstructured Data Processing Task (such as information extraction (IE) from unstructured data).
- Example(s):
- a Text Document Corpus.
- a Video Library.
- an Audio Library.
- …
- Counter-Example(s):
- a Semi-Structured Database, such as a citation database, an XML dataset, or a Web snapshot.
- a Fully-Structured Database, such as a relational database.
- See: Text Mining Benchmark Task, Digital Library.
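As a rough illustration of the context item about Unstructured Data Processing Tasks, the following minimal sketch treats a tiny text corpus as an unstructured data base and extracts date mentions into structured records. The corpus contents, the `DATE_PATTERN` regular expression, and the `extract_dates` helper are all hypothetical and chosen only for this example; real information extraction systems are considerably more involved.

```python
import re

# Hypothetical mini-corpus: free text with no predefined data model.
corpus = [
    "Meeting moved to 2020-03-07 after the vendor call.",
    "Invoice 1132 was paid; no date was recorded.",
    "Release planned for 2021-11-30, pending review on 2021-11-15.",
]

# Assumption for this sketch: dates appear in ISO-like YYYY-MM-DD form.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_dates(documents):
    """Return one structured record per date mention found in the documents."""
    records = []
    for doc_id, text in enumerate(documents):
        for match in DATE_PATTERN.finditer(text):
            year, month, day = (int(group) for group in match.groups())
            records.append(
                {"doc_id": doc_id, "year": year, "month": month, "day": day}
            )
    return records


records = extract_dates(corpus)
for record in records:
    print(record)
```

The resulting list of dictionaries has a fixed set of fields, so it could be loaded into a relational table, whereas the original strings carry the same information only implicitly.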
2020
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Unstructured_data Retrieved:2020-3-7.
- Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated (semantically tagged) in documents.
In 1998, Merrill Lynch said "unstructured data comprises the vast majority of data found in an organization, some estimates run as high as 80%." It's unclear what the source of this number is, but nonetheless it is accepted by some. Other sources have reported similar or higher percentages of unstructured data. IDC and Dell EMC project that data will grow to 40 zettabytes by 2020, resulting in a 50-fold growth from the beginning of 2010. More recently, IDC and Seagate predict that the global datasphere will grow to 163 zettabytes by 2025 and majority of that will be unstructured. The Computer World magazine states that unstructured information might account for more than 70%–80% of all data in organizations.