Monday, 6 January 2014

Performance Tuning of BI system



Performance Tuning of BI system :-


1. What do you do to improve Query Performance ?
To query performance:
1. by compressing the requests in cube
2. Having aggregates on a cube
3. Having index on cube
4. Partition of cube

2. When do you create Aggregates (thumb rule) ? Can we create Aggregates on DSO/Multi Provider ?
if Aggregation ratio > 10 I.e. 10 times more records are read than are displayed,
 (Aggregation ratio = number of records read from DB / number of records transferred)
And
Percentage of DB time > 30% i.e. the time spent on database is a substantial part of the whole query runtime
Aggregates can be created only on top of basic InfoCubes. Aggregates cannot be built on MultiProvider, SAP Remote Cubes, Remote Cubes or DSO.

3. How do you design the CUBE ?
Keep the small dimensions together
Keep line item dimensions wherever needed
Group related characteristics into one dimension only
Remove high cardinality dimensions
Characteristics in the 'right' order (technical order >>> most selective first)

4. What are Advantages and Disadvantages of aggregates ? Is there any limitation in creation of aggregates ?
Aggregates can improve query performance considerably, but keep in mind that they also impact the load performance.
Aggregates take up disk space and processing resources, and do not allow you to see the most recently loaded data. There is no limitation.

5. Explain me the following options of aggregate:
 Switch off the aggregate. The aggregate is still filled with data, but is not used by the OLAP processor. You still need to roll up new data.
 Deactivate the aggregate The aggregate data is deleted, but the aggregate definition remains.
 Delete the aggregate. The aggregate data and the definition of the aggregate are deleted.

6. Where we can see the query performance time in aggregates ?
Run the query in RSRT using the Execution & Debug option.
Select the option - Display Statistics Data - and run the query this provide info on how much DB time the query took and how many rows were selected from the DB and how many were transferred to the OLAP processor.
Then go back and run the query again, and also check the - Do Not Use Aggregate option and compare the statistics of the two runs.
As mentioned, you can check the Display Aggregates Found option if you want to verify if/what aggregate is being selected

7. How would you optimize the dimensions ?
 We should define as many dimensions as possible and we have to take care that no single dimension crosses more than 20% of the fact table size.

8. What is Logical and Physical Partitioning ?
Physical partitioning done on database level and Logical partitioning done with MultiProvider.
Ex: Cube partitioning with 0calmonth/0fiscper is physical partitioning.

9. What are the advantages of CUBE Partition ?
Parallel accesses to partitions.
Read smaller sets of data.
Fast Deletion of partitions (DROP PARTITION instead DELETE FROM WHERE).

10. On which fact table we can do partition ?
We do partitioning on E –fact table.
Automatically Partitioned Database Tables are:
    InfoCube F-Fact table: partitioned by request
    PSA table: partitioned by request
    ODS Change Log: similar to PSA table

11. What is Line –Item Dimension ? And tell me thumb rule to create Line-item Dimension ?
Line item: This means the dimension contains precisely one characteristic. This means that the system does not create a dimension table. Instead, the SID table of the characteristic takes on the role of dimension table (normally key in fact table is dimension ID, then this dimension ID link to dimension table to get the SID). Removing the dimension table has the following advantages:

When loading transaction data, no IDs are generated for the entries in the dimension table. This number range operation can compromise performance precisely in the case where a degenerated dimension is involved.

A table- having a very large cardinality- is removed from the star schema. As a result, the SQL-based queries are simpler. In many cases, the database optimizer can choose better execution plans.

Advantages:
  Saves one table join at query runtime
  Saves determination of dimension ID’s at data load time.
Dis Advantages:
  F4 –Help not possible on dimension values (only on master data values)
  A dimension marked as a line item cannot subsequently include additional characteristics. This is only possible with normal dimensions.

12. What is flat Aggregate ?
If any aggregate is having less than 15 chars, then system will allocate each char to different dimension (as line item). These types of aggregates are called flat aggregates.

13. What is High cardinality dimension ?
If you have Fact Vs Dimension table ratio more than 20 % then it will reduces the Performance of the Report, so at that case we make it as a high cardinality dimension. We have FM to check the Ratio  .. Go to SE37…. RSDEW_INFOCUBE_DESIGNS.

14. How do you know size of the CUBE ?
There are two basic approaches to find the size of the data target.
1. via Transaction Code (DB02…Detailed Analysis).
2. Via Functional Module (Go to SE37… Give functional module name ‘DB_GET_TABLE_SIZE’).

15. Can we create indices to the DSO ? If yes… How do you create ?
Yes. We can create secondary indexes by using the context menu in order to improve the load and query performance of the ODS object. By drag and drop required key fields into the index folder.

16. Maximum, how many indices you can create ? And have you ever created Primary Indices ?
We can create a maximum of 16 secondary indexes. Primary indexes are created automatically by the system.
NOTE: Indexes are use to improve data reading performance but decreases data writing performance. That’s why we drop them during the data loading to data target and create again after loading finished

17. What is BITMAP, B-TREE indices ?
Bitmap indexes link to data by providing a map of zeros and ones indicating if a combination of values exist. B-tree indexes link to data via a hierarchical structure (therefore the name).B-tree indexes are most effective for high-cardinality data.
Bitmap Indices - Well-suited for few characteristic values
B-Tree Indices - Good for range queries and for high cardinality

18. What is Compression ?
Compression transfers the data from the F fact table to the E fact table while eliminating the request information in the InfoCube. Compression usually reduces the number of records by combining records with the same key that has been loaded in separate requests.
Advantage Query performance is usually improved significantly
Disadvantage Note that you lose the request information after the compression.

19. For Example…..if I’m running the report …..Does it gets the data from the E-Fact table/F-fact table ?
It picks from the both E and F- Fact tables.

20. What is the use of BI/BW Statistics ?
BI Statistics traces and records query and load information in support of BI performance tuning.

21. How to know whether a particular query is using the aggregates ?
Transaction RSRT, chooses your query, and clicks 'execute and debug' and mark checkbox 'display aggregate found' (under 'aggregates').

22. How Number Ranges will improve loading performance ?
During the Master Data Loading, each record will go to database table and pick the new SID number. Similarly, during the Info Cube data loading also each record will go to database table and gets the new DIM-ID. Since we have huge amount of data, the performance of the loading will be decrease. Because all the records will go to database table and gets new either the SID (or) DIM ID numbers.

23. What do you do to improve Loading Performance ?
Load master data before Transaction data
Delete indices (dropping indices) before loading to the targets.
Buffering number ranges

24. Tell me Sequence of Query Execution
1. First checks data availability in Local Cache (Which is session dependent, stores in Memory & available in All SAP BW Releases).
2. If Local Cache is NOT available then will check Global OLAP Cache (OLAP Cache buffers Query result set data in order to improve performance of subsequent query executions. It is session/user independent, stores in memory/files/tables & is available from SAP BW 3.X).
3. If Global OLAP Cache is NOT available then will look for relevant Aggregates (Aggregates are subsets of fact table data where summary data is stored as part of new, transparent InfoCubes. Aggregates have the same structure as an InfoCube, with transparent database tables surrounded by the dimensions tables).
4. If No relevant Aggregates are found then it will look into database through InfoProvider.

25. Why do we use Change run ?
When master data changes, the changes of the navigational attributes/hierarchies must be applied to the depending aggregates; this process is called change run. Newly loaded master data is not active until the change run has been applied the changes to all aggregates.

26. General Tips :
a. Better use inclusion and avoid exclusion of char values in reporting.
b. Exclusion cannot access database indices.
c. Indices speed up data retrieval for a given selection tremendously. If no indices are available, full table scans have to be performed.
d. Performance can be affected when compounded characteristics are used extensively, particularly when a large number of characteristics are included in a compounding.
e. The extensive use of navigation attributes leads to a large number of tables and joins, which can reduce performance.
f. It is possible to bypass the PSA. This is not recommended, as the PSA is important for backup purposes and its use has, in most cases, performance advantages.

27. How to Check if an Index is Used or not ?
a. Open a second session and choose System -> Utilities -> Performance trace. The Trace Requests screen appears.
b. Select Trace on. The SQL trace is activated for your user that is all the database operations under your user are recorded.
c. In the first window, perform the action in which the index should be used. If your database system uses a cost-based optimizer, you should perform this action with as representative data as possible. A cost-based optimizer tries to determine the best index based on the statistics.
d. In the second session, choose Trace off and then trace list.




No comments:

Post a Comment