Resource Analyzer Report

The Resource Analyzer report provides “profiling” and an “analyzer report” of your pipeline so that you can evaluate how efficiently its allocated resources are used.

This helps you identify the “optimum resources required” and the “scalability limits” of the pipeline.

The Resource Analyzer report has four pages that show different parameters of pipeline performance.


Efficiency Statistics

The graphs under Efficiency Statistics show the current performance of pipelines.

It has three types of graphs:

Driver vs Executor graph

The Driver vs Executor graph shows the wall clock time of the driver and the executors. It has the following properties:

Field | Description
Driver wall clock time | Time spent in the driver alone.
Executor wall clock time | Time spent in the executors.
Total wall clock time | Total wall clock time of the Spark application, which can be divided into time spent in the driver and time spent in the executors.
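The relationship between the three fields above can be sketched as simple arithmetic. This is only an illustration: the function name `wall_clock_split` and the sample values are hypothetical and are not part of the Resource Analyzer.

```python
def wall_clock_split(driver_ms: float, executor_ms: float) -> dict:
    """Split total wall clock time into driver and executor shares.

    Assumes (as the table states) that total time is the sum of the
    time spent in the driver and the time spent in the executors.
    """
    total_ms = driver_ms + executor_ms
    if total_ms <= 0:
        raise ValueError("total wall clock time must be positive")
    return {
        "total_ms": total_ms,
        "driver_pct": 100.0 * driver_ms / total_ms,
        "executor_pct": 100.0 * executor_ms / total_ms,
    }


# Hypothetical sample: 30 s in the driver, 90 s in the executors.
split = wall_clock_split(driver_ms=30_000, executor_ms=90_000)
print(split)  # driver share is 25%, executor share is 75%
```

A large driver share usually means that adding executors will not shorten the run much, because executors sit idle while the driver works.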

Critical and Ideal graph

Critical and Ideal graph shows the time taken by a pipeline in different cases.


Field | Description
Actual runtime | Time taken by the pipeline when executed with a single executor and one core.
Critical path time | Minimum time the application will take even if it is given infinite executors.
Ideal application time | Time computed by assuming ideal partitioning (tasks = cores and no skew) of data in all stages of the application.
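To make the three cases concrete, here is a rough sketch of how such times could be estimated from per-task durations. It rests on simplifying assumptions that are not taken from the product: stages run strictly one after another, every stage has at least one task, and driver time is a fixed overhead. The function name `estimate_times` and the sample numbers are hypothetical.

```python
def estimate_times(stage_task_ms, driver_ms, total_cores):
    """Estimate the three times from per-stage task durations (in ms).

    stage_task_ms: list of stages, each a non-empty list of task durations.
    Assumption: stages execute sequentially with no overlap.
    """
    # One executor with one core: every task runs back to back.
    single_core = driver_ms + sum(sum(tasks) for tasks in stage_task_ms)
    # Infinite executors: each stage still waits for its longest task.
    critical_path = driver_ms + sum(max(tasks) for tasks in stage_task_ms)
    # Ideal partitioning: work spreads evenly over all cores, no skew.
    ideal = driver_ms + sum(sum(tasks) / total_cores for tasks in stage_task_ms)
    return {
        "actual_runtime_ms": single_core,
        "critical_path_ms": critical_path,
        "ideal_ms": ideal,
    }


# Hypothetical application: two stages, 1 s of driver time, 2 cores.
times = estimate_times([[4000, 4000, 4000, 8000], [2000, 2000]],
                       driver_ms=1000, total_cores=2)
print(times)
```

In this sample the single-core time is 25000 ms, the critical path time is 11000 ms, and the ideal time is 13000 ms. The gap between the ideal time and the observed run time indicates how much skew and poor partitioning are costing.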

OCCH Graph (One Core Compute Hours)

A core-hour is a measurement of computational time. In OnScale, if you run one CPU core for one hour, that is one core-hour. If you run 1000 CPU cores for 1 hour, that is 1000 core-hours.


Field | Description
OCCH driver wastage | OCCH wasted by the driver, as a percentage.
OCCH executor wastage | OCCH wasted by the executors, as a percentage.
Total wastage | Driver wastage + executor wastage.
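The core-hour arithmetic and the wastage percentages can be sketched as follows. The accounting here is an assumption made for illustration, not a documented formula: all executor cores are treated as idle while the driver runs, and executor wastage is whatever capacity the tasks did not use during executor time. The names `core_hours` and `occh_wastage` are hypothetical.

```python
def core_hours(cores: int, hours: float) -> float:
    """One core running for one hour is one core-hour."""
    return cores * hours


def occh_wastage(total_cores, driver_hours, executor_hours, used_core_hours):
    """Return driver, executor and total wastage as percentages.

    Assumption: cores are idle during driver time, and executor wastage
    is the capacity left unused by tasks during executor time.
    """
    available = core_hours(total_cores, driver_hours + executor_hours)
    driver_wasted = core_hours(total_cores, driver_hours)
    executor_wasted = core_hours(total_cores, executor_hours) - used_core_hours
    driver_pct = 100.0 * driver_wasted / available
    executor_pct = 100.0 * executor_wasted / available
    return {
        "driver_pct": driver_pct,
        "executor_pct": executor_pct,
        "total_pct": driver_pct + executor_pct,
    }


# Hypothetical run: 4 cores, 0.5 h in the driver, 1.5 h in the executors,
# with tasks consuming 4.5 core-hours.
wastage = occh_wastage(4, driver_hours=0.5, executor_hours=1.5,
                       used_core_hours=4.5)
print(core_hours(1000, 1))  # 1000 core-hours
print(wastage)              # 25% driver, 18.75% executor, 43.75% total
```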

Aggregate Metrics

This page aggregates the data reported for each task in the application.

For each of the metrics listed below, it shows the min, max, sum and mean across all the stages and jobs in the application.

Field | Description
Disk Bytes Spilled | Size of spilled bytes on disk (can be different if compressed)
Executor Runtime | Total time spent by the executor core running this task
Input Bytes Read | Number of bytes read by a task (using read APIs)
JvmGCTime | Amount of time spent in GC while this task was in progress
Output Bytes Written | Number of bytes written by a task (using write APIs)
Peak Execution Memory | Maximum execution memory used by a task
Result Size | Number of bytes sent by the task back to the driver
Shuffle Read Bytes Read | Total bytes read by a task for shuffle data
Shuffle Read Fetch WaitTime | Time spent by the task waiting for shuffle data
Shuffle Read Records Read | Total records read by a task for shuffle data
Shuffle Read Local Blocks | Shuffle blocks fetched from the local machine (disk access)
Shuffle Read Remote Blocks | Shuffle blocks fetched from a remote machine (network access)
Memory Bytes Spilled | Number of bytes that were spilled to disk during the task
Shuffle Write Bytes Written | Total shuffle bytes written by a task
Shuffle Write Records Written | Total shuffle records written by a task
Shuffle Write Time | Amount of time spent in a task writing shuffle data
Task Duration | Total time spent by the task starting from its creation
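As an illustration of how the min, max, sum and mean figures relate to the per-task data, the sketch below aggregates a few metrics over a list of task records. The record layout, the key names (`executorRuntime`, `jvmGCTime`) and the function `aggregate_metrics` are hypothetical and chosen only for this example.

```python
from statistics import mean


def aggregate_metrics(tasks, metric_names):
    """Compute min, max, sum and mean of each metric over all tasks."""
    summary = {}
    for name in metric_names:
        values = [task[name] for task in tasks]
        summary[name] = {
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "mean": mean(values),
        }
    return summary


# Hypothetical per-task records (times in ms).
tasks = [
    {"executorRuntime": 1200, "jvmGCTime": 40},
    {"executorRuntime": 800, "jvmGCTime": 10},
    {"executorRuntime": 1000, "jvmGCTime": 10},
]
summary = aggregate_metrics(tasks, ["executorRuntime", "jvmGCTime"])
print(summary["executorRuntime"])  # min 800, max 1200, sum 3000, mean 1000
print(summary["jvmGCTime"])        # min 10, max 40, sum 60, mean 20
```

A max that is far above the mean for a metric such as Executor Runtime is a quick hint that a few tasks dominate the stage.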

Simulation

Using the fine-grained task-level data and the relationship between stages, this page simulates how the application will behave when the number of executors is changed. Specifically, it predicts the wall clock time and the cluster utilization.

Note that cluster utilization is not cluster CPU utilization. It only means that some task was scheduled on a core. The CPU utilization further depends on whether the task is CPU bound or IO bound.
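The idea behind such a simulation can be sketched with a small greedy scheduler. This is not the analyzer's actual algorithm; it assumes that stages run one after another, that each task goes to whichever core frees up first, and that utilization is the share of core time on which some task was scheduled. The function `simulate` and the sample durations are hypothetical.

```python
import heapq


def simulate(stage_task_ms, executors, cores_per_executor):
    """Predict wall clock time (ms) and cluster utilization (%).

    stage_task_ms: list of stages, each a list of task durations in ms.
    Assumption: stages run sequentially; tasks are placed greedily on
    the core that becomes free first.
    """
    total_cores = executors * cores_per_executor
    wall_clock = 0
    busy = 0
    for tasks in stage_task_ms:
        free_at = [0] * total_cores  # min-heap of times at which cores free up
        for duration in tasks:
            start = heapq.heappop(free_at)
            heapq.heappush(free_at, start + duration)
        wall_clock += max(free_at)   # the stage ends when its last task ends
        busy += sum(tasks)
    if wall_clock == 0:
        return 0, 0.0
    utilization = 100.0 * busy / (total_cores * wall_clock)
    return wall_clock, utilization


# Hypothetical application: 8 equal tasks, then a skewed 2-task stage.
stages = [[4000] * 8, [6000, 2000]]
for n in (1, 2, 4):
    wall, util = simulate(stages, executors=n, cores_per_executor=2)
    print(n, wall, round(util, 2))
```

With these sample numbers, going from 1 to 4 executors cuts the predicted wall clock time from 22000 ms to 10000 ms, while utilization falls from about 91% to 50%, because the skewed second stage cannot use the extra cores.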


Stage Metrics

This page shows detailed information about each stage.

Field | Description
Stage id | ID of the stage.
wallclock% | Time taken by the stage as a percentage of the overall time taken by the pipeline.
task count | Number of tasks in the stage.
wallclock time | Total time taken by the stage, in ms.
maximum Task memory | Maximum memory used by any task in the stage.
PRatio | Number of tasks in the stage divided by the number of cores. Represents the degree of parallelism in the stage.
TaskSkew | Duration of the largest task in the stage divided by the duration of the median task. Represents the degree of skew in the stage.
IO% | Total memory used by the stage as a percentage of the total memory used by the pipeline during execution.
Task Runtime% | Total time spent by the task during execution on the executor.
Input | Input bytes read by the stage.
Output | Output bytes written by the stage.
Shuffle-Input | Shuffle data read by the stage.
Shuffle-Output | Shuffle data written by the stage.
WallClockTime Ideal | Time required by the stage in the ideal application.
OneCoreComputeHours Available | Total compute hours available for this stage.
OneCoreComputeHours Used% | Percentage of compute hours used in this stage.
OneCoreComputeHours Wasted% | Percentage of compute hours wasted in this stage.
Task StageSkew | Duration of the largest task in the stage divided by the total duration of the stage. Represents the impact of the largest task on stage time.
OIRatio | Output to input ratio. Total output of the stage (results + shuffle write) divided by total input (input data + shuffle read).
ShuffleWrite% | Time spent in shuffle writes across all tasks in the stage, as a percentage.
ReadFetch% | Time spent in shuffle reads across all tasks in the stage, as a percentage.
GC% | Time spent in GC across all tasks in the stage, as a percentage.
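The ratio fields defined above (PRatio, TaskSkew, Task StageSkew and OIRatio) follow directly from their descriptions. The sketch below computes them for one stage; the function name `stage_ratios`, its parameters and the sample values are hypothetical.

```python
from statistics import median


def stage_ratios(task_ms, total_cores, stage_wall_ms,
                 input_bytes, shuffle_read_bytes,
                 output_bytes, shuffle_write_bytes):
    """Compute the ratio metrics of one stage from its raw numbers."""
    largest = max(task_ms)
    total_in = input_bytes + shuffle_read_bytes
    total_out = output_bytes + shuffle_write_bytes
    return {
        # Tasks per core: degree of parallelism in the stage.
        "PRatio": len(task_ms) / total_cores,
        # Largest task versus the median task: degree of skew.
        "TaskSkew": largest / median(task_ms),
        # Largest task versus stage duration: impact of that task.
        "TaskStageSkew": largest / stage_wall_ms,
        # Output to input ratio; undefined when the stage reads nothing.
        "OIRatio": total_out / total_in if total_in else None,
    }


# Hypothetical stage: 4 tasks on 2 cores, one task three times longer.
ratios = stage_ratios(task_ms=[1000, 1000, 1000, 3000], total_cores=2,
                      stage_wall_ms=4000,
                      input_bytes=100, shuffle_read_bytes=0,
                      output_bytes=0, shuffle_write_bytes=50)
print(ratios)
```

For this sample the result is PRatio 2.0, TaskSkew 3.0, Task StageSkew 0.75 and OIRatio 0.5. A Task StageSkew close to 1 means a single task accounts for almost the whole stage duration.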