Sunday, March 18, 2012

Chapter 6 - Foundations of Business Intelligence: Databases and Information Management

For an information system to be effective, it must provide users with accurate, timely, and relevant information: information that is free of errors, available to decision makers when it is needed, and useful and appropriate for the types of work and decisions that require it.  Information systems arrange data in computer files in a hierarchy that starts with bits and bytes and progresses to fields, records, files, and databases.  The traditional approach to file processing encourages each department or area in a company to develop its own systems and data files, known as specialized applications.  Each application requires its own data files and its own computer program to operate.  Over time this leads to data that is difficult to maintain and manage.  The result is data redundancy and inconsistency, program-data dependence, processing inflexibility, poor data security, and a lack of data sharing and availability.

Database technology has evolved to reduce the many problems of the traditional file organization.  A database is defined as a collection of data organized to serve many applications efficiently by centralizing the data and controlling redundant data.  A single database can service multiple applications.  A database management system (DBMS) is a type of software that allows an organization to centralize data, manage them efficiently, and provide access to the stored data by application programs.  This minimizes redundant and inconsistent files. 

The most common type of DBMS used on PCs, larger computers, and mainframes is the relational DBMS.  Relational databases organize data in two-dimensional tables called relations, and each table consists of rows and columns.  As long as two tables share a common data element, they can be combined easily to deliver the data that users require.
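
To make the idea of a shared data element concrete, here is a minimal sketch using Python's built-in sqlite3 module.  The table names, column names, and sample rows are made up for illustration and are not taken from the chapter.  Two tables share a supplier_number column, and a join on that column combines them.

```python
import sqlite3


def parts_with_suppliers():
    """Join two tables on their shared supplier_number column."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    # Hypothetical tables: each is a relation made of rows and columns.
    conn.executescript("""
        CREATE TABLE supplier (
            supplier_number INTEGER PRIMARY KEY,
            supplier_name   TEXT
        );
        CREATE TABLE part (
            part_number     INTEGER PRIMARY KEY,
            part_name       TEXT,
            supplier_number INTEGER
        );
        INSERT INTO supplier VALUES (8259, 'CBM Inc.'), (8261, 'B. R. Molds');
        INSERT INTO part VALUES
            (137, 'Door latch', 8259),
            (145, 'Side mirror', 8261),
            (150, 'Door seal', 8259);
    """)
    # The common data element (supplier_number) links the two tables.
    rows = conn.execute("""
        SELECT part.part_name, supplier.supplier_name
        FROM part
        JOIN supplier ON part.supplier_number = supplier.supplier_number
        ORDER BY part.part_number
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for part_name, supplier_name in parts_with_suppliers():
        print(part_name, "is supplied by", supplier_name)
```

Note that the supplier name is stored only once in the supplier table, which is the kind of redundancy control a database is meant to provide.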

Object-oriented databases are used to handle graphics-based or multimedia applications.  An object-oriented DBMS stores the data and the procedures that act on those data as objects that can be automatically retrieved and shared.  Such databases can store more complex types of information than a relational DBMS; however, they are relatively slow at processing large numbers of transactions.

A DBMS has capabilities and tools for organizing, managing, and accessing the data in a database.  The data definition capability specifies the structure of the content of the database.  It is used to create database tables and to define the characteristics of the fields in each table.  This information is documented in a data dictionary, an automated or manual file that stores definitions of data elements and their characteristics.  A third capability is the data manipulation language, a specialized language found in most DBMS that is used to add, change, delete, and retrieve the data in a database.  It contains commands that allow end users and programmers to extract data from the database to satisfy information requests and develop applications.  The most prominent data manipulation language used today is Structured Query Language (SQL).
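
The three capabilities can be seen side by side in a small sketch, again using Python's sqlite3 module.  The employee table and its rows are hypothetical.  SQLite has no formal data dictionary tool, so the sketch uses its PRAGMA table_info catalog query as a stand-in for one.

```python
import sqlite3


def ddl_dml_demo():
    """Show data definition, a data-dictionary lookup, and data manipulation."""
    conn = sqlite3.connect(":memory:")

    # Data definition: create a table and describe its fields.
    conn.execute("""
        CREATE TABLE employee (
            employee_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            department  TEXT
        )
    """)

    # Data dictionary (stand-in): ask the catalog for field names and types.
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]

    # Data manipulation: add, change, delete, and retrieve data with SQL.
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1, "Ana", "Sales"), (2, "Ben", "Finance"), (3, "Cy", "Sales")],
    )
    conn.execute("UPDATE employee SET department = 'Marketing' WHERE employee_id = 3")
    conn.execute("DELETE FROM employee WHERE employee_id = 2")
    remaining = conn.execute(
        "SELECT name, department FROM employee ORDER BY employee_id"
    ).fetchall()

    conn.close()
    return columns, remaining


if __name__ == "__main__":
    columns, remaining = ddl_dml_demo()
    print(columns)
    print(remaining)
```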

In order to create a database, the relationships among the data, the type of data that will be maintained in the database, how the data will be used, and how the organization will need to change to manage data from a company-wide perspective must be clearly understood.  A database requires a conceptual, or logical, design and a physical design.  The conceptual design is an abstract model of the database from a business perspective.  It describes how the data elements in the database are to be grouped to meet business information requirements.  The physical design shows how the database is actually arranged on direct-access storage devices. 

Businesses use databases to keep track of their day-to-day activities and to provide information that helps the company run more efficiently and helps managers and employees make better decisions.  Special capabilities and tools are required for analyzing large quantities of data and for accessing data from multiple systems.  One capability is data warehousing.  A data warehouse is a database that stores current and historical data of potential interest to decision makers throughout the company.  It makes the data available for anyone to access as needed, but the data cannot be altered.  A data mart is a subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specific set of users.  It typically focuses on a single subject area or line of business.

After the data is in data warehouses and data marts, it is available for further analysis using tools for business intelligence, such as multidimensional data analysis and data mining.  These tools enable users to analyze data to see new patterns, relationships, and insights that are useful to assist in decision making.  Online Analytical Processing (OLAP) is the capability for manipulating and analyzing large volumes of data from multiple perspectives, i.e., using multiple dimensions.  Data mining finds hidden patterns and relationships in large databases and deduces rules from them that are used to guide decision making and forecast the effect of those decisions.
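
As a rough illustration of looking at the same data from multiple perspectives, the sketch below totals a handful of made-up sales facts along whichever dimensions are requested.  The SALES records and the roll_up function are hypothetical; a real OLAP tool works on far larger data and offers much more than this.

```python
from collections import defaultdict

# Hypothetical sales facts; product, region, and quarter are the dimensions.
SALES = [
    {"product": "nuts", "region": "East", "quarter": "Q1", "units": 50},
    {"product": "nuts", "region": "West", "quarter": "Q1", "units": 30},
    {"product": "bolts", "region": "East", "quarter": "Q1", "units": 20},
    {"product": "bolts", "region": "West", "quarter": "Q2", "units": 40},
    {"product": "nuts", "region": "East", "quarter": "Q2", "units": 60},
]


def roll_up(facts, *dimensions):
    """Total the units sold for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact["units"]
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(SALES, "product"))            # one dimension
    print(roll_up(SALES, "product", "region"))  # two dimensions
```

Calling roll_up with a different set of dimensions gives a different view of the same facts, which is the core idea behind multidimensional analysis.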

A third capability for analyzing large quantities of data is the set of tools used for accessing internal databases through the Web.  Text mining tools are able to extract key elements from large unstructured data sets, discover patterns and relationships, and summarize the information.  Web mining is the discovery and analysis of useful patterns and information from the Web.
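
A very naive sketch of the text mining idea is to count which terms appear most often in a set of free-text comments.  The top_terms function, the stop-word list, and the sample comments are all made up for illustration; real text mining tools go well beyond word counts.

```python
import re
from collections import Counter

# Hypothetical list of common words to ignore.
STOP_WORDS = {"the", "a", "is", "was", "and", "to", "my", "of", "it", "for", "in"}


def top_terms(documents, n=3):
    """Return the n most frequent non-stop-word terms across the documents."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    comments = [
        "The delivery was late and the box was damaged",
        "Late delivery again",
        "Refund for the late order",
    ]
    print(top_terms(comments, 2))
```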

Once a database is set up, special policies and procedures for data management need to be put in place.  These ensure that the data for the business remain accurate, reliable, and readily available to those who need and use them.  All businesses need an information policy.  This specifies the organization’s rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying information.  These policies lay out the specific procedures and accountabilities, identifying which users and organizational units can share information, where information can be distributed, and who is responsible for updating and maintaining the information.  In addition, steps such as data quality audits and data cleansing must be taken to ensure that the data in organizational databases are accurate and remain reliable.
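
Data cleansing can be pictured with a small sketch that standardizes customer records, sets aside ones that fail a basic check, and drops duplicates.  The cleanse function, the field names, and the rule that an email must contain "@" are assumptions made for this example, not rules from the chapter.

```python
def cleanse(records):
    """Standardize records, reject invalid ones, and drop duplicate emails."""
    seen_emails = set()
    clean, rejected = [], []
    for record in records:
        # Standardize: collapse extra spaces, fix capitalization, lowercase email.
        name = " ".join(record.get("name", "").split()).title()
        email = record.get("email", "").strip().lower()
        # Audit: set aside records that fail a basic validity check.
        if not name or "@" not in email:
            rejected.append(record)
            continue
        # Remove redundancy: keep only the first record for each email.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        clean.append({"name": name, "email": email})
    return clean, rejected


if __name__ == "__main__":
    raw = [
        {"name": "  ana  LOPEZ", "email": "Ana@Example.com "},
        {"name": "Ana Lopez", "email": "ana@example.com"},
        {"name": "Ben", "email": "not-an-email"},
    ]
    clean, rejected = cleanse(raw)
    print(clean)
    print(rejected)
```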
