Life cycle of a DW ( Data Warehouse ) project

Life cycle of a DW ( Data Warehouse ) project

After many requests ,I am compiling everything in one page for easy page navigation and easy learning.hope you like it.

1 )Requirement Gathering

Requirements Gathering is started by directly discussing or with JAD sessions (Joint Application Development) with the End users .The help of Business leader or Business managers is required for this meeting and department co-operation is needed since individual departments will need to share their part of information.
The goal of this discussion is to identify objects necessary for the Reporting and Analysis requirements.
The various data sources (in the form of flat file, data file, excel file or tape drive etc) are identified. This process usually takes easily around 2-8 weeks of time. At the end of this phase, List of Reports to be delivered to the end users. The project plan also has to be updated with estimated time of completion for the Data warehouse project. Other tasks after this phase are also tracked on the Project-plan.

2)Physical Environment Setup
After the Requirements are gathered, its necessary to set physical Envioronment by installing database and maintaining the Physical servers.The usual set of servers include 3 sets of instances.
Development servers: where the developers work on the database and roll it for testing.
Testing Servers: the objects developed by the developers are tested on this server which has the database and data replica to that of Production server.
Production server: this is the real time data containing the Production data of any transaction sources. The testing on this is not advised by the internal testers. Testing on this is done only by end clients doing transactions.
Along with these separate instances, it is also necessary to have separate process of ETL, OLAP and Reporting tools on separate servers and all of them to have their own Application as well as Database servers running separately. The servers are setup with database with the help of Network admin and database Administrators (read DBA). They in turn will develop detailed document of the servers handed to the projects managers.

3)Data Modeling

Data modeling process is nothing but to identify the data sources and many a times this step is delayed
The data identification step is important and is better to be done by an experienced person and in this phase, else it might become tough to solve it in the later stages. Until the ETL process.It is advisable to estimate good amount of time in identifying thedata and it should be easily estimated in 2-10 weeks depending on the complexity.After this process,the Data modeler should produce following deliverables:
· Logical data model.
· Physical data model.
· Identification of data sources.

4)ETL

This phase of ETL ( Extraction ,Transformation and Loading ) takes the most of any Datawarehousing project.
This phase has to get the source data and following:
· Understand the Business Rules
· Understand the necessary Columns of tables.
· Understand the Logical data model
· Understand the Physical data model
The Developer has to produce following deliverables:
· ETL Script / ETL Package in the ETL tool
· Data Mapping Document

5)OLAP Cube Design

OLAP databases provide aggregated summary information quickly using a schema that is easily understood by end users.
In the OLAP design it is not required to join tables together the way it is done in relational database. It contains all the relationships required to enable a specific sets of queries for a database. At the center there’s an object called Cube, which may contain many Cubes. Usually a small DW(Data warehouse) project consist of only one such Cube.
You can use a client browsing tools such as Microsoft Excel or a 3rd party application to attach to the data cube. After you attach the client browser, you can drag and drop the cube elements onto a design surface or pivot table to formulate questions and retrieve answers quickly. The cube pre-aggregates information, and is optimized to answer questions like “What is the total number of bugs on each day that meet a set of criteria?”
The cube consists of two primary concepts: measures and dimensions. The measures are the (usually) numeric values that provide summaries at various different levels of aggregation. The dimensions are the way in which the numeric values are summarized.
For example, Work Item Count is a measure that shows the total number of work items. When disaggregated by dimensions, such as Priority, Assigned To, Date, or State, you can use this measure to answer a wide range of questions.
Within the cube, measures are organized within measure groups. A measure group is associated with a single fact or event that is tracked by the OLAP database. Also, the measures can be summarized by various dimensions, some of which are common across the various measure groups. For example, Date, Build, Project, Person, Area, and Iteration are common across all of the measure groups.

data warehousing is an iterative process - Its difficult to get all the requirements at once. This Process usually takes around 3 weeks. To ensure Optimal performance, it is worth to try and test the OLAP Cube generation path. The OLAP cube is to be refreshed after the loading of the Data warehouse.
Following documents to be delivered:
· Actual OLAP cube / report.
· Documentation specifying the OLAP cube dimensions and measures.

6)Front End Development (Reports )
In this phase of DW project if the reports are not bringing any value to the end user or executive, then the efforts to build the OLAP cube is wasted. so its very important to think about the end reports and timely delivery of reports to the end user. It is the trend to have reports seen through a standard web browser like Internet explorer or Mozzila firefox. It is not a good idea to install a report viewing software on each and every machine of the end user.
For Rapid prototyping there is an efficient tool for creating a report called Infocaptor (www.infocaptor.com) .It is very simple and easy to create reports with its powerful ability to create Dashboards, reports with Drilling ability. The reports can also be printed after saving to PDF format. The report’s output can also be saved in PDF or HTML and hence finally easily transferred to the end users.
The other front-end options ranges from an internal front-end development using scripting languages such as JSP,ASP, PHP, or Perl, to off-the-shelf products such as Discoverer (from Oracle),Seagate Crystal Reports, to the more higher-level products such as Actuate.The best in this category is once again my favorite Infocaptor ( www.infocaptor.com).The usual time to create Reports is arround 4 weeks.
The documents delivered to the client should state the Deployment of reports on their server.”Front end Deployment documentation”

7)Performance Tuning
The Performance tuning of 3 areas to look for as below. The task can easily take around a week.
1) ETL - Since Loading data is very time consuming ,its best to put that activity in a night load job.The ETL process needs to be tuned more and more as possible,since some problems are out of control of DW team.
2) Query Processing -Query performance is a big issue in cases where the reports are run directly against Relationship database.(especially ROLAP).for this situation the DW team has to spend considerate amount of time in tuning the query.
3) Report Delivery - End users can experience delays in receiving their reports due to factors other than the query performance. For example, The type of server settings, network traffic, end user’s own computer and the Reporting tool used. It is important for the data warehouse team to look into these areas for performance tuning.
The DW team has to create document on
· Performance tuning document - Goal and Result

8)Quality Assurance
This phase is shown in the form of flowchart of the process of QA to Production. Once the DW team completes the development work, the QA team which is from the Client side starts doing testing and the reports. If there is any changes to be done, the Development team will work on the CR (Change request) and after this CR is completed, then its ready for Production.

 

 

 

Deliverables after this phase completed.
· QA Test Plan
· QA verification (that the data warehousing system is ready to go to production )
Rollout To Production

After the QA team completes the testing ,they will give thumbs up(signoff document) to go to Production servers. It will take less then a week to go to live once everything is tested and approved.
Deliverables
· Delivery of the data warehousing system to the end users.

9) Production Maintenance

After the DW project goes live activities like taking backup on regular time period and crisis mangement becomes very important and needs to be planned well in advance.
The other task is to monitor the Users and to know how much they utilize the DW; so future enhancements can be planned.
Incremental Enhancements

After the system is up and running in the Production environment,there are situations where small and minor changes arises.(Incremental Enhancements).The task can be as simple as to do the changes in the production environment,but is highly risky to do on live (production)systems.Do the chnagses on the Developmetn and roll out the changes in the production systems.
Documents of interests
· Actual change to the data warehousing system
· Change management documentation


Add your opinion - this will move the pointer
get dashboard plugin?

Free Dashboards

Here are some more interesting Articles
  • Life cycle of a DW ( Data Warehouse ) project
  • Agile Software Development
  • Requirement Gathering in Data warehouse Project
  • The Business Intelligence Life Cycle
  • Benefits of data warehousing
  • When is a Data Warehouse Appliance not a Data Warehouse Appliance?
  • One Response to “Life cycle of a DW ( Data Warehouse ) project”

    1. […] Davebrin wrote an interesting post today onHere’s a quick excerpt Life cycle of a DW ( Data Warehouse ) project After many requests ,I am compiling everything in one page for easy page navigation and easy learning.hope you like it. 1 )Requirement Gathering Requirements Gathering is started by directly discussing or with JAD sessions (Joint Application Development) with the End users .The help of Business leader or Business managers is required for this meeting and department co-operation is needed since individual departments will need to share their part […]

    Leave a Reply

    You must be logged in to post a comment.