Business intelligence
==Unstructured data==
Business operations can generate a very large amount of [[data]] in the form of e-mails, memos, notes from call centers, news, user groups, chats, reports, web pages, presentations, image files, video files, and marketing material. According to [[Merrill Lynch]], more than 85% of all business information exists in these forms, and a company might use such a document only once.<ref name="rao">{{cite journal|last1=Rao|first1=R.|year=2003|title=From unstructured data to actionable intelligence|url=http://www.ramanarao.com/papers/rao-itpro-2003-11.pdf|journal=IT Professional|volume=5|issue=6|pages=29–35|doi=10.1109/MITP.2003.1254966}}</ref> Because of the way it is produced and stored, this information is either [[Unstructured data|unstructured]] or [[semi-structured data|semi-structured]]. The management of semi-structured data has been described as an unsolved problem in the information technology industry.<ref name="blumberg">{{cite journal|author1=Blumberg, R.|author2=S. Atre|name-list-style=amp|year=2003|title=The Problem with Unstructured Data|url=http://soquelgroup.com/Articles/dmreview_0203_problem.pdf|url-status=dead|journal=DM Review|pages=42–46|archive-url=https://web.archive.org/web/20110125033210/http://soquelgroup.com/Articles/dmreview_0203_problem.pdf|archive-date=25 January 2011}}</ref> According to projections from Gartner (2003), white-collar workers spend 30–40% of their time searching, finding, and assessing unstructured data. BI uses both structured and unstructured data. The former is easy to search, while the latter contains a large quantity of the information needed for analysis and decision-making.<ref name = blumberg /><ref name="negash">{{cite journal|author=Negash, S|year=2004|title=Business Intelligence|journal=Communications of the Association for Information Systems|volume=13|pages=177–195|doi=10.17705/1CAIS.01315|doi-access=free}}</ref> Because unstructured and semi-structured data are difficult to search, find, and assess, organizations may not draw upon these vast reservoirs of information, even though they could inform a particular decision, task, or project. This can ultimately lead to poorly informed decision-making.<ref name = rao /> Therefore, a business intelligence or data warehouse (DW) solution must be designed to accommodate the specific problems associated with semi-structured and unstructured data as well as those associated with structured data.

===Limitations of semi-structured and unstructured data===
{{update|part=section|reason=It's dubious that searchability and semantic analysis are still limitations at the current stage of NLP and AI development|date=December 2023}}
There are several challenges to developing BI with semi-structured data. According to Inmon & Nesavich,<ref name = inmon>Inmon, B. & A. Nesavich, "Unstructured Textual Data in the Organization" from "Managing Unstructured data in the organization", Prentice Hall 2008, pp. 1–13</ref> some of these are:
* Physically accessing unstructured textual data – unstructured data is stored in a huge variety of formats.
* [[Terminology]] – among researchers and analysts, there is a need to develop standardized terminology.
* Volume of data – as stated earlier, more than 85% of business information is unstructured or semi-structured, and this volume is compounded by the need for word-to-word and semantic analysis.
* Searchability of unstructured textual data – a simple search on some data, e.g. ''apple'', returns only links to places where that precise search term appears. Inmon & Nesavich (2008)<ref name = inmon /> give an example: "a search is made on the term felony. In a simple search, the term felony is used, and everywhere there is a reference to felony, a hit to an unstructured document is made. But a simple search is crude. It does not find references to crime, arson, murder, embezzlement, vehicular homicide, and such, even though these crimes are types of felonies".

===Metadata===
To solve problems with the searchability and assessment of data, it is necessary to know something about their content. This can be done by adding context through the use of [[metadata]].<ref name = rao />{{Needs independent confirmation|reason=The article is written by a founder of a company that made automatic categorization software. Not sufficient to establish that using automatically generated metadata is a mainstream approach of applying BI to unstructured data.|date=December 2023}} Many systems already capture some metadata (e.g. file name, author, and size), but metadata about the actual content, such as summaries, topics, and the people or companies mentioned, would be more useful. Two technologies designed for generating metadata about content are [[Multiclass classification|automatic categorization]] and [[information extraction]].
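The idea of using content metadata to overcome the limits of simple search can be illustrated with a minimal Python sketch built on Inmon & Nesavich's felony example. It assumes a hypothetical, hand-written taxonomy (<code>TAXONOMY</code>) and a simple keyword-rule categorizer rather than a trained classifier; the document IDs and texts are invented for illustration. A literal search for ''felony'' finds only the document that uses that word, whereas a search on the generated category metadata also finds the documents about arson and embezzlement.

```python
import re

# Hypothetical taxonomy for illustration only: each category label maps to
# terms that indicate it. A real system would use a curated thesaurus or a
# trained classifier instead of this hand-written dictionary.
TAXONOMY = {
    "felony": {"felony", "arson", "murder", "embezzlement", "vehicular homicide"},
    "sales": {"sales", "revenue"},
}


def normalize(text: str) -> str:
    """Lower-case the text, keep only letter runs, and pad with spaces
    so that whole words and multi-word phrases can be matched with `in`."""
    return " " + " ".join(re.findall(r"[a-z]+", text.lower())) + " "


def categorize(text: str) -> set[str]:
    """Rule-based automatic categorization: return every label whose
    indicator terms occur in the text. The result is stored as metadata."""
    normalized = normalize(text)
    return {label for label, terms in TAXONOMY.items()
            if any(f" {term} " in normalized for term in terms)}


def simple_search(docs: dict[str, str], term: str) -> list[str]:
    """Literal keyword search: hits only documents containing the exact term."""
    return [doc_id for doc_id, text in docs.items()
            if f" {term} " in normalize(text)]


def metadata_search(metadata: dict[str, set[str]], label: str) -> list[str]:
    """Search on the generated category metadata instead of the raw text."""
    return [doc_id for doc_id, labels in metadata.items() if label in labels]


# Invented example documents (unstructured text such as memos).
docs = {
    "memo-1": "The warehouse fire is being investigated as arson.",
    "memo-2": "Audit found evidence of embezzlement in the Q3 accounts.",
    "memo-3": "Quarterly sales of apple juice rose by 4%.",
    "memo-4": "Legal notes: the charge was reduced from a felony.",
}

# Generate content metadata once, at ingestion time.
metadata = {doc_id: categorize(text) for doc_id, text in docs.items()}

print(simple_search(docs, "felony"))        # ['memo-4']
print(metadata_search(metadata, "felony"))  # ['memo-1', 'memo-2', 'memo-4']
```

The design choice is to do the categorization work once, when documents are ingested, so that later searches run against compact metadata rather than rescanning the full text; the quality of the results then depends entirely on the quality of the taxonomy or classifier.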