The Business Intelligence and Data Warehousing Glossary

 

The Business Intelligence and Data Warehousing Glossary

I have collected and compiled this from a very nice website.Some of the terms are relating to Microsoft’s tool (Microsoft’s Analysis Services ) but that can be generalised to other non-microsoft tools also.

Link >>

Agent

An application that searches the data and sends an alert when a particular pattern is found.

 

Aggregations

Information stored in a data warehouse in a summarized form.

Instead of recording the date and time each time a certain product is sold, the data warehouse could store the quantity of the product sold each hour, each day, or each week.

 

Aggregations are used for two primary reasons:

To save storage space. Data warehouses can get large. The use of aggregations greatly reduces the space needed to store data.
To improve the performance of business intelligence tools. When queries run faster they take up less processing time and the users get their information back more quickly.

Some data warehouses store both the detailed information and aggregated information. This takes even more space, but gives users the possibility of looking at all the details while still having good query performance when looking at summaries.

Some systems use aggregations for historical data. Perhaps detailed data is kept on-line for a year. After that the detailed data is kept in a less accessible, permanent storage format, and only the aggregated, summary data is kept on-line.

Aggregations are often created as the sum of the individual records. You can also have aggregations for count, distinct count, maximum value, and minimum value.

 

Alert

A message that is sent automatically by a computer system when a certain situation occurs.

One of the greatest benefits of data warehousing is the ability to set alerts.

A store manager can be automatically informed when a certain product’s sales fall below or rise above a specified range.
A factory manager can be automatically informed when the failure rate of a product exceeds a specified level.

A sales manager can be automatically informed when a member of his staff achieves a personal high level of sales for a time period.

Alerts allow a company to receive critical business information in the quickest possible time.

 

Analysis Services

Business Intelligence tools included with Microsoft SQL Server 2000.

The same product in Microsoft SQL Server 7.0 was called OLAP Services. The name was changed because the Microsoft SQL Server 2000 version included data mining capabilities as well as the OLAP capabilities.

Attribute

Additional information included with a dimension, that is not used in defining the levels of the dimension.

Dimensions become more useful when there are many descriptive attributes that can be used for analyzing the data. In Microsoft Analysis Services attributes are used to create member properties and those member properties can be used to create virtual dimensions.

Business Intelligence Tools

Software that enables business users to see and use large amounts of complex data.

The following three types of tools are referred to as Business Intelligence Tools:

1. Multidimensional Analysis Software - Also Known As OLAP (Online Analytical Processing) - Software that gives the user the opportunity to look at the data from a variety of different dimensions.
2. Query Tools - Software that allows the user to ask questions about patterns or details in the data.

3. Data Mining Tools - Software that automatically searches for significant patterns or correlations in the data.

Calculated Member

In Microsoft Analysis Services, a member that is calculated from one or more other members using an MDX formula.

Calculated members are often measures, but they can also be members of levels from other dimensions. In an MDX query, a calculated member can be created in the WITH clause.

 

Cell

A cell is a single point in a cube.

Cubes have cells for all of the possible combinations of points from all of the cube’s dimensions.

 

Cellset

A set of data returned from a cube with an MDX query.

A cellset in the multidimensional world is equivalent to a rowset in the relational database world.

Changed Data Capture

In a database replication, changed data capture occurs when only the data that has changed since the previous replication is copied.

 

Changing Dimensions

A dimension that has level or attribute data that needs to be updated.

Data changes in the fields for levels or attributes is one of the most challenging design issues for multidimensional (star schema) data modeling. This issue is often referred to as the handling of Slowly Changing Dimensions. There are three ways outlined by Ralph Kimball (and others) to handle this situation:

Type #1. Change the data in the dimension table.

Type #2. Add new records to the dimension table that contain the new data.

Type #3. Add new fields to the dimension table to contain the values before and after the change.

Various authors (including Ralph Kimball) also suggest the splitting of dimensions into separate dimensions when a consolidated dimension would result in a high level of dimension change.

 

Clickstream Data

Data regarding web browsing.

Web servers capture a large amount of data in the process of receiving requests for web pages. This data includes page served, time, source of the request, type of browser making request, etc. When analyzed, this data provides information about the behavior of individuals who are browsing the internet. It can help businesses analyze where visitors are coming from, what type of visitors are most likely to buy certain products, what type of web pages are most attractive, etc. This information is essential for analyzing the effectiveness of internet ad campaigns and, in general, for finding ways to improve the effectiveness of internet commerce.

Clickstream data typically requires a significant amount of transformation as it is loaded into a data warehouse. Once in the warehouse it can be used for standard reports, for OLAP, and for data mining.

 

Conformed Dimension

A dimension that is used in more than one cube.

The use of conformed dimensions and shared measures is the primary way a set of data marts can be united into one consolidated data warehouse.

 

Cube

Also Known As Multidimensional Cube

The fundamental structure for data in a multidimensional (OLAP) system.

A cube contains dimensions, hierarchies, levels, and measures. Each individual point in a cube is referred to as a cell.

 

Data - SEE Data, Information, and Knowledge and Data-Based Knowledge

Data-Based Knowledge

Knowledge derived from data through the use of Business Intelligence Tools and the process of Data Warehousing.

Most of our knowledge is based on a combination of our experience, perception, and intuition. Business Intelligence and Data Warehousing give us a new kind of knowledge based on data.

Data-based knowledge can have several advantages over experience/intuition-based knowledge:

1. It can be more accurate because it is based on so many detailed facts.
2. It can be more current because the data warehousing and business intelligence tools can so quickly analyze new data.

3. It can be more comprehensive because so many different perspectives are available through the rapid recombination of elements from different dimensions and different levels of the data hierarchy.

4. It can give new insights because there are complex patterns in the data that can be discovered by data mining that would never be detected by human analysis.

5. It can be less subjective because conclusions are tied directly to the physical data.

SEE ALSO - Data, Information, and Knowledge

 

Data Cleansing

Removing errors and inconsistencies from data being imported into a data warehouse.

SEE ALSO Data Quality Assurance

 

Data, Information, and Knowledge

Data is the reality that a computer records, stores, and processes.

The use of computers can be referred to as data processing. At the lowest level data has no significance for people. This lowest level in the perception of reality is sometimes referred to as “raw data”.

Information is what a person is able to understand about reality.

Information systems use computers to organize data in such a way that people can understand the results.

Knowledge is what a business uses to make decisions.

The process of organizing information in such a way as to create data-based knowledge is called Data Warehousing. The software products that present this knowledge to users are sometimes called Business Intelligence Tools.

The goal of business intelligence and data warehousing - changing data into information and knowledge.

Organizations are gathering and storing more and more data. Every year the amount of data in the world is approximately doubling. This data is of little benefit unless it can be turned into useful information and knowledge.

Information by itself is an inadequate basis for business decisions because the amount of information, like the amount of data, is overwhelming. Business Intelligence Tools are designed to find what is significant - what really adds to our useful knowledge - in the piles of data and information.

 

Data Mart

Also Known As: Local Data Warehouse or Datamart

A database that has the same characteristics as a data warehouse, but is usually smaller and is focused on the data for one division or one workgroup within an enterprise.

There are three different (and somewhat contradictory) views of the place of the data mart in the world of data warehousing.

1. The data warehouse gathers all the information from the various legacy systems. Specialized data marts are then created with a subset of the information in the data warehouse. These data marts are easier to use because they only have the particular information the specific user group needs. The use of several data marts also allows the querying load to be spread among several different computers. This can reduce network traffic.

2. Free-standing data marts are created, independent from a data warehouse. The information for the data mart probably comes from just one legacy system. It is quicker and cheaper to build a separate data mart instead of building an enterprise-wide data warehouse with data marts derived from it. The drawback of this solution is that the company’s data is not integrated (and thereby violates one of Bill Inmon’s original defining characteristics of the data warehouse). If several separate data marts are built using this strategy, they will usually contain data that is duplicated and inconsistent.

3. The data mart is the prototype or the first step of a data warehousing process. An enterprise picks the division or group that would most benefit from data-based knowledge. A data mart is built with that group’s data. Additional types of information are added to the data mart as time goes on until it is turned into a data warehouse.

New terminology is often created and developed for marketing purposes. The term ‘data mart’ probably has a marketing advantage over the term ‘data warehouse’. The whole data warehousing process is about creating data-based knowledge and bringing that knowledge to people. A warehouse is a place where things are stored away. A mart is a convenient place to buy something. Most data warehousing professionals (including myself) include ready access to information as a defining characteristic of the term ‘data warehouse’. I think, though, that the term ‘data mart’ captures this sense of data availability more effectively.

 

Data Migration

The movement of data from one environment to another.

This happens when data is brought from a legacy system into a data warehouse.

 

Data Mining

The process of finding hidden patterns and relationships in the data.

Analyzing data involves the recognition of significant patterns. Human analysts can see patterns in small data sets. Specialized data mining tools are able to find patterns in large amounts of data. These tools are also able to analyze significant relationships that exist only when several dimensions are viewed at the same time.

Users can ask data questions using standard queries when they know what they’re looking for. Queries can be written for questions like this: “Which of our out-of-town customers have given us the most business in the last year?”

Data mining is needed when the user’s questions are more vague or general in nature. Data mining questions would include: “What attributes characterize the customers that gave us the most business in the past year?”

 

Data Quality Assurance

Also Known As: Data Cleansing or Data Scrubbing

The process of checking the quality of the data being imported into the data warehouse.

Data quality assurance is one of the greatest challenges in the process of data warehousing. If the data-based knowledge generated by the data warehouse is to be trusted, the data entered into the warehouse must be complete and accurate - “garbage in, garbage out”.

Data quality can be a challenge for several reasons:

The data is being consolidated from a variety of legacy sources that may have differing definitions of key concepts such as “customer” or “profit”.
The legacy data was not originally collected for the purpose of decision support so some of the key data might be missing, incomplete, or not as accurate as desired.

There might be times when all the data is not received from one of the legacy systems. This could make comparisons between time periods invalid.

A significant portion of time in the development process should be set aside for setting up the data quality assurance process and implementing whatever data cleansing is needed..

In a production environment, there should be a data quality report generated after each data warehouse import. There should be provision for rolling back an import if data quality testing indicates that the data is unacceptable.

 

Data Scrubbing

Removing errors and inconsistencies from data being imported into a data warehouse.

SEE ALSO Data Quality Assurance

 

Data Transformation

The modification of data as it is moved into the data warehouse.

This modification can include:

  • Data Cleansing - Part of the Process of Data Quality Assurance
    Dimensionalization - Organizing the data into the multidimensional (OLAP) structure of a star schema.
  • Normalization - Organizing the data into the normal structure of a relational database
  • Processing Calculations
  • Changing Data Types
  • Making the Data More Readable
  • Replacing Codes with Actual Values

Data Warehouse

Also Known As: Datawarehouse or Information Warehouse

A database where data is collected for the purpose of being analyzed.

The defining characteristic of a data warehouse is its purpose.

Most data is collected to handle a company’s on-going business. This type of data can be called “operational data”. The systems used to collect operational data are referred to as OLTP (On-Line Transaction Processing).

A data warehouse collects, organizes, and makes data available for the purpose of analysis - to give management the ability to access and analyze information about its business. This type of data can be called “informational data”. The systems used to work with informational data are referred to as OLAP (On-Line Analytical Processing).

Bill Inmon coined the term “data warehouse” in 1990. His definition is:

“A (data) warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.”

Subject-oriented - Data that gives information about a particular subject instead of about a company’s on-going operations.
Integrated - Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.

Time-variant - All data in the data warehouse is identified with a particular time period.

Non-volatile - Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business.

 

Data Warehousing Management

The on-going supervision of the data warehousing process.

Data warehousing is an on-going process. All of the issues that need to be addressed when a data warehousing project is started also need to be addressed as the data warehouse is used and, most likely, expanded.

The types of data warehousing management issues that need to be addressed are:

Deciding on Management - Who is sponsoring the project? Who is making the tough decisions? Who is going to mediate conflicts?
Deciding on Scope - Which business processes are going to be included? What granularity of data is going to be used?

Training - Of management personnel, technical personnel, and end users.

Staffing - Who is coordinating the project? Who is doing the technical work? Who is doing the training?

Budgeting - For hardware, software, personnel, training, consulting.

 

 

Data Warehousing

The process of visioning, planning, building, using, managing, maintaining, and enhancing data warehouses and/or data marts.

Whether we’re building a data warehouse, a data mart, or both, we are taking part in a complex, on-going process. The emphasis in the data-based knowledge business needs to be kept on the process. That’s why you’re reading a glossary of “data warehousing terminology” instead of a glossary of “data warehouse terminology”.

There are many steps in the data warehousing process -

Visioning - Having an idea about what could be accomplished.
Learning - Studying the potential of data warehousing.

Justifying - Developing a business purpose for the process.

Budgeting - Counting the cost.

Deciding - Making a commitment to develop and use data-based knowledge.

Gathering Information - Examining legacy systems.

Interviewing Users - Finding what information is needed.

Choosing Tools - Choosing the hardware, the database management system, the data extraction tools, and the Business Intelligence tools..

Building, Using, Testing, and Evaluating the Prototype - Repeat this step and the above steps as necessary.

Deploying - Putting the system into operation.

Training - Helping users make full use of the Business Intelligence tools.

Managing - Keeping track of scheduled data replication, system usage, and query performance.

Adding, Modifying, On-Going Development - As the system is used, new possibilities will be discovered.

Consider also all the actions that take place as a part of the data warehousing process -

Data Replication - Periodic copying of legacy data.
Data Transformation - Transforming the legacy data into the form in which it will be stored in the data warehouse.

Data Quality Assurance - Testing the data for inconsistencies and errors.

Data Storage - Storing the data in a DBMS (Database Management System).

Metadata Storage - Storing the description of the data - the data about the data.

Data Mart Population - Populating all the data marts that receive their data from the warehouse.

Setting Up Business Intelligence Tools - Giving users access to the data through multidimensional analysis, querying, and data mining.

Setting Alerts - Establishing conditions that result in an automatic message being sent.

Data Warehousing Management - Keeping track of how well all the other actions are being carried out.

Data Warehousing Information Center

A great source for data warehousing information. Contains links to information and companies in the data warehousing business.

 

 

Data Warehousing Institute

An organization for data warehousing professionals.

 

 

 

Database Management System (DBMS)

The software that is used to store, access, and manage data.

There are two main types of Database Management Systems used for business intelligence and data warehousing - specialized Multidimensional Database Management Systems (MDBMS) and the more widely used general purpose Relational Database Management Systems (RDBMS).

 

Datamart - SEE Data Mart

Datawarehouse - SEE Data Warehouse

DBMS - SEE Database Management System

 

 

Decision Support System (DSS)

A computer system designed to assist an organization in making decisions.

The Decision Support Systems and Enterprise Information Systems of the 1980’s and early 1990’s were forerunners of today’s Business Intelligence Tools.

Density or Dense - SEE Sparsity

 

 

Dimension

One of the perspectives that can be used to analyze the data in an OLAP cube.

When you are browsing the data in a cube, you can view the data from the perspective of different combinations of dimensions.

For a Sales database, the dimensions could include Product, Time, Store, and Promotion.

Dimensions contain one or more hierarchies, which have levels for drilling up and drilling down in the the cube. When a dimension has just one hierarchy (which is quite common ), people often refer to the dimension itself having levels.

 

 

Dimension Table

In a star schema, a table which contains the data for one of the cube’s dimensions.

The dimension table has a primary key which is used to connect it to the fact table.

The dimension table has one field for each level of each hierarchy contained in the dimension. The data values in these fields become the members of each of the dimension’s levels.

The dimension table has as many attribute fields as possible. These fields describe individual characteristics of the dimension.

If there are multiple hierarchies in the dimension, there is one level field for each distinct level in each of the hierarchies. If the hierarchies share some levels in common, they are represented by a single field For Calendar and Fiscal hierarchies in a Time dimension, the level fields could be Fiscal Year, Calendar Year, Fiscal Quarter, Calendar Quarter, Month, and Day.

For the Product dimension table, some of the attribute fields could include Description, Product Number, Product Type, Department, Package Size, Weight, Shelf Length, etc.

The dimension tables in a star schema are intentionally de-normalized. The level fields and the attribute fields contain data that is duplicated in many of the records. This normally does not add a significant amount to the amount of storage space needed in the database, because the overall size of each dimension table is very small when compared to the size of the fact table.

 

 

Dimensionalization

The process of transforming data into a multidimensional (or star) schema.

Drill Down and Drill Up

The ability to move between levels of the hierarchy when viewing data with an OLAP browser.

Drill Down - Changing the view of the data to a greater level of detail.

Drill Up - Changing the view of the data to a higher level of aggregation.

Multidimensional analysis (OLAP) tools organize the data in two primary ways: in multiple dimensions and in hierarchies.

Drilling down and drilling up allow an analyst to move down and up the hierarchies to see how the information at the various levels is related. After looking at the sales totals for a store’s departments, the analyst may want to drill down to see the individual sales for each employee in one of the departments. Then the analyst may choose to drill up to view how this store’s total sales compare to other stores in the same region.

DSS (See Decision Support System)

DTS (Data Transformation Services)

An ETL tool provided as a part of Microsoft SQL Server.

DTS was first released with SQL Server 7.0. It provides a design environment for creating data transformation applications.

Enterprise Information System/Executive Information System (EIS)

Also Known As: Decision Support System (DSS)

A computer system that presents a summary of a company’s important data.

ETL (Extract, Transform, and Load)

ETL refers to the process of getting data out of one data store (Extract), modifiying it (Transform), and inserting it into a different data store (Load).

 

Fact table

In a star schema, the central table which contains the individual facts being stored in the database.

There are two types of fields in a fact table:

1. The fields storing the foreign keys which connect each particular fact to the appropriate value in each dimension.
2. The fields storing the individual facts (or measures) - such as number, amount, or price.

The granularity of the fact table is one of the most significant design decisions in creating a data warehouse. The facts should be as detailed as possible to allow for the data to be viewed from the greatest number of perspectives.

Granularity

The level of detail of the facts stored in a data warehouse.

 

Hierarchy

Organization of data into a logical tree structure.

Dimensions can have one or more hierarchies. A Time dimension, for example, could have a Calendar hierarchy and a Fiscal hierarchy. Hierarchies contain levels, which organize data into a logical structure.

It is the combination of a multidimensional with a hierarchical view in Business Intelligence Software that allows users to grasp large amounts of data. If each member in a level has 5 to 10 children that are members at the next lower level, the user has a better chance of understanding the significance of the data.

Moving between the levels of a hierarchy is called drilling up and drilling down.

 

Hybrid OLAP (HOLAP)

A combined use of Relational OLAP (ROLAP) and Multidimensional OLAP (MOLAP).

In HOLAP, the source data is usually stored using a ROLAP strategy and aggregations are stored using a MOLAP strategy. This combination usually results in the least amount of storage space and the fastest cube processing.

 

 

Hyper-Cube

Also Known As Cube and Multidimensional Cube

A cube with more than three dimensions.

A cube is an object with three dimensions. A hyper-cube is a cube-like structure with more than three dimensions. In the world of OLAP, hyper-cubes are nearly always simply referred to as cubes.

 

Information - SEE Data-Information-Knowledge Spectrum

International Data Warehousing Association, The

An organization for data warehousing professionals.

 

Knowledge - SEE Data-Information-Knowledge Spectrum

 

Legacy System

A computer system that’s been around for a while.

Sometimes organizations have several legacy systems that have been developed at different times by different people for a variety of purposes. The data in these systems is usually mutually incompatible and sometimes inaccurate. One of the biggest challenges of the data warehousing process is to bring data out of the variety of systems where it currently is located and organize it so it all fits together in the data warehouse.

 

Level

The hierarchies in dimensions have levels which can be used to view data at various levels of detail.

A Time dimension could have levels for Year, Quarter, Month, and Day.

A Product dimension could have levels for Product Family, Product Category, Product Subcategory, and Product Name.

A Customer Geography dimension could have levels for Region, Country, District, State, City, and Neighborhood.

 

Local Cube

A cube contained in a file.

Microsoft Analysis Services (OLAP Services) provides the ability to take all or a subset of a server cube and create a local cube file. The local cube can be used to analyze OLAP data while the user is disconnected from the network.

Local Data Warehouse - SEE Data Mart

 

Multidimensional Schema SEE Star Schema

MDD (Multidimensional Database) SEE Multidimensional Database Management System (MDBMS)

MDX (Multidimensional Expressions)

The querying language for OLAP cubes.

MDX has some similarities to SQL, but has many unique features. The following query returns a cellset with the names of the store regions on the columns, the names of product families on the rows, and the profit displayed in the cells:

select

[Stores].[Region].Members on columns,

[Products].{Product Family].Members on rows

from SalesCube

where ([Measures].[Profit])

 

Measure

A numeric value stored in a fact table and in an OLAP cube.

Sales Count, Sales Price, Cost, Discount, and Profit could all be measures in an OLAP cube.

 

Member

One of the data points for a level of a hierarchy of a dimension.

Some of the members of the Month level of the Time dimension are January, February, March, and April.

 

Member Property

An attribute of a level that is available for OLAP querying.

In Microsoft Analysis Services you can create member properties for any level. These member properties can be referenced directly in MDX queries and they can also be used for creating virtual dimensions.

 

Metadata

Also Known As: Meta Data or Meta-data

Data that describes the data in the warehouse.

Metadata includes the following:

A description of tables and fields in the warehouse, including data types and the range of acceptable values.
A similar description of tables and fields in the source databases, with a mapping of fields from the source to the warehouse.

A description of how the data has been transformed, including formulae, formatting, currency conversion, and time aggregation.

Any other information that is needed to support and manage the operation of the data warehouse.

There are a number of companies and organizations attempting to standardize the use of metadata. A standard metadata model would greatly aid the process of integrating data warehousing tools from different companies. Some data warehousing experts believe that the standardization of metadata is impossible.

 

Metric - ANOTHER TERM USED FOR Dimension

Multidimensional Analysis

Also Known As: OLAP (On-Line Analytical Processing)

A process of analysis that involves organizing and summarizing data in a multiple number of dimensions.

People can comprehend a far greater amount of information if that information is organized into dimensions and into hierarchies. The wide use of spreadsheets and graphs illustrates the need for people to have their information organized.

A spreadsheet is a two-dimensional analysis tool. If a person could comprehend 10 individual facts, they could possibly comprehend 100 facts if they were arranged in a spreadsheet.

If 3 or 4 or 5 dimensions could be displayed, the amount of information that could be comprehended would be increased exponentially - to 1000 facts, 10,000 facts, and 100,000 facts.

Multidimensional data is also organized hierarchically, allowing users to “drill down” for more detailed information, “drill up” to see a broader, more summarized view, and “slice and dice” to dynamically change the combinations of dimensions that are being viewed.

 

Multidimensional Cube - ANOTHER TERM USED FOR Cube

Multidimensional Database (MDD) - SEE DEFINITION FOR MDBMS

Multidimensional Database Management System (MDBMS)

A database management system that organizes data multidimensionally.

A multidimensional database management system organizes data specifically so it can be viewed with a multidimensional analysis (OLAP) tool. Because it is optimized for this purpose, it has the potential to deliver the information quickly and efficiently.

Multidimensional Online Analytical Programming (MOLAP)

OLAP that stores data and aggregations in a multidimensional database structures.

Non-Volatile

Data that does not change.

Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business. Non-volatility is one of the original defining characteristics of a data warehouse.

 

Normalization

The process of organizing data in accordance with the rules of a relational database.

In a completely de-normalized database the customer name and address information would be stored every time a customer made a purchase.

In a normalized database each customer’s name and address would be stored only once, in a separate table. Every purchase record would have a reference to the customer table to indicate which customer was involved.

Many individual decisions have to be made in the process of normalizing a de-normalized database. How do we know which customer information refers to the same person? When there is contradictory address information, how do we choose between the various alternatives?

A fully normalized database is usually the most efficient design for an On-Line Transaction Processing System. A data warehouse, with its emphasis on efficient retrieval of data, often benefits from some intentional de-normalization. See the discussion of the Star Schema.

OLAP (On-Line Analytical Processing)

The use of computers to analyze an organization’s data.

“OLAP” is the most widely used term for multidimensional analysis software. The term “On-Line Analytical Processing” was developed to distinguish data warehousing activities from “On-Line Transaction Processing” - the use of computers to run the on-going operation of a business.

In its broadest usage the term “OLAP” is used as a synonym of “data warehousing”. In a more narrow usage, the term OLAP is used to refer to the tools used for Multidimensional Analysis.

“Think of an OLAP data structure as a Rubik’s Cube of data that users can twist and twirl in different ways to work through what-if and what-happened scenarios.” - Lee The, Editor, Datamation (May 1995)

 

OLAP Browser

A tool used for multidimensional (OLAP) browsing.

OLAP Services

Business Intelligence tools included with Microsoft SQL Server 7.0.

OLAP Services was extended and renamed as Analysis Services in SQL Server 2000.

OLAP System

Term that is used as a synonym for datawarehousing system.

OLTP (OnLine Transaction Processing)

The use of computers to run the on-going operation of a business.

PivotTable Services

The tools for client access to Microsoft’s Analysis Services (OLAP Services).

Private Dimension

In Microsoft Analysis Services, a dimension that is restricted in use to one particular cube.

Shared (conformed) dimensions are very useful in creating a unified data warehousing structure. You can create a dimension once and use it in several different cubes.

Private dimensions are useful in those situations where you want independent cubes. If you change a private dimension, that change only affects a single cube, whereas the change of a shared dimensions can have implications for many cubes.

 

Processing

In Microsoft Analysis Services, the building of cube and dimension structures and cube aggregations.

Cubes need to be processed before they can be browsed. For cubes, an incremental process only adds new data into the cube, while a full process reprocesses all the cube’s data.

Relational Database Management System (RDBMS)

A Database Management System based on relational theory.

Most modern Database Management Systems (Oracle, Sybase, Microsoft SQL Server) are relational databases. These databases support a standard language - SQL (Structured Query Language).

Relational On-Line Analytical Processing (ROLAP)

OLAP that stores data and aggregations in a relational database.

 

Replication

The physical copying of data from one database to another.

In data warehousing replication takes place as data is moved from the on-line transaction processing system into the data warehouse. Replication also takes place if one or more data marts is being populated with data from the data warehouse.

There are several software tools that have been developed to handle replication into a data warehouse. These tools give the ability to transfer data out of and into a variety of database management systems. Many of these tools also provide data transformation and data cleansing capabilities.

Heterogeneous replication occurs when the source and the target database are not the same database management system.

Data migration is the movement of data from one environment to another - as happens when data is brought from a legacy system into a data warehouse.

Bi-directional replication is the ability to copy data in both directions between two databases.

In changed data capture only the data that has been changed since the last replication is copied.

In synchronization all the data stored in the database is replicated.

ROLAP - SEE Relational On-Line Analytical Processing

 

Scale, Scalable, and Scalability

Having to do with the ability of a computer system or a database to operate efficiently with larger quantities of data.

Scalability is often discussed in situations when multiple processors are joined together. The system scales well (or is scalable) if doubling the number of processors also doubles the speed at which the system performs its tasks. The extra work involved in coordinating larger systems usually prevents them from being fully scalable - so that going from one to two processors would increase the total speed by less than a factor of two.

 

Schema

The logical organization of data in a database.

 

 

Shared Dimension

In Microsoft Analysis Services, a dimension used by more than one cube.

In general, a dimension that is used by more than one cube is called a conformed dimension.

 

 

Slice, Slicer, Slicing

The limiting of a cellset to data for a single member from a particular dimension.

Slicing in MDX is similar to filtering in a relational database. In an MDX query, the WHERE clause is called the slicer.

 

 

Slice and Dice

The ability to move between different combinations of dimensions when viewing data with an OLAP browser.

Multidimensional analysis tools organize the data in two primary ways: in multiple dimensions and in hierarchies.

Slicing and dicing refers to the ability to combine and re-combine the dimensions to see different slices of the information. Picture slicing a three-dimensional cube of information, in order to see what values are contained in the middle layer. Slicing and dicing a cube allows an end-user to do the same thing with multiple dimensions.

 

 

Slowly Changing Dimensions (SCD)

A dimension that has levels or attributes that are changing on an occasional basis. SEE Changing Dimensions.

 

 

SQL (Structured Query Language)

The standard language for accessing relational databases.

 

 

Snowflaking

Normalization applied to the dimension tables of a star schema.

The star schema is a very simple database design, which clearly presents the multidimensional character of the data and allows for rapid querying of the data in a data warehouse. In snowflaking, some of the fields of the dimension tables are split off into separate tables. This achieves a higher level of normalization, but makes the database design more complex and can reduce the performance and ease of use for Business Intelligence Tools.

 

 

Sparsity and Density, Sparse and Dense

The degree to which the cells of a cube are filled with data.

One of the primary challenges of storing multidimensional data is the degree of sparsity that is often encountered. When many dimensions are considered with a fine grain of detail, most of the cells will be empty.

It is not uncommon for large cubes to have data in fewer than one in a million cells. Expressed numerically, that cube would have a density of less than .0001%.

 

Star Schema (Business Definition)

A method of organizing information in a data warehouse that allows the business information to be viewed from many perspectives.

The star is a picture of the way the data is being stored. The basic factual information is in the middle of the star. The points of the star represent various perspectives from which the factual information can be viewed. In Figure 1, the Shipments can be grouped and viewed from the perspective of any of the five points of the star.

Image

Figure 1. Shipment Information Stored in a Star Schema

 

Star Schema (Technical Definition)

A database design that consists of a fact table and one or more dimension tables.

Each of the dimension tables has a single field primary key which has a one-to-many relationship with a foreign key in the fact table. The star schema is an intentional simplification of the database design that would be achieved by following the standard rules of normalization. The dimension tables are often flattened, to allow for more efficient querying (see ’snowflaking’).

 

Structured Query Language (SQL)

The standard language for accessing relational databases.

 

 

Summary Tables

Tables used to store summarized or aggregated data.

 

Synchronization

The copying of all data in a database replication.

Time-variant data

Data that is identified with a particular time period.

Time-variant is one of the original defining characteristics of a data warehouse.

Virtual Cube

The term used in Microsoft’s Analysis Services (OLAP Services) for a cube that is created from portions of one or more base cubes.

A virtual cube is similar to a view in a relational database. It can be used for security purposes, giving users access to only some of the dimensions and measures. It can also be used to show data from separate cubes at the same time.

Virtual cubes are much more useful when you have shared dimensions and measures that are common to all the base cubes that are used.

Virtual Dimension

The term used in Microsoft’s Analysis Services (OLAP Services) for a dimension that is created from one or more member properties in another dimension.

XML (eXtensible Markup Language)

A method of sharing data between disparate data systems, without needing a direct connection between them.

 

XML for Analysis Services

An XML schema that can be used to communicate with a Microsoft Analysis Server.

Y

Z

more details find here


Add your opinion - this will move the pointer
get dashboard plugin?

Free Dashboards

Here are some more interesting Articles
  • When is a Data Warehouse Appliance not a Data Warehouse Appliance?
  • Is Data Warehousing Essential to Business Intelligence?
  • Top 5 Ways To Increase BI Value
  • How Valuable is Business Intelligence to the Enterprise?
  • Benefits of data warehousing
  • Four Tips for Better Business Intelligence in 2008
  • Leave a Reply

    You must be logged in to post a comment.