
Interpretation of results

This page will help you interpret metrics correctly and identify real problems in teams.

Basic principles

Important notice

Metrics are indicators, not absolute truth. Always consider context and verify suspicions before jumping to conclusions.

What the metrics say

| The metrics say | The metrics don't say |
|---|---|
| Quantity of activities | Code quality |
| Distribution of work | Individual talent |
| Adherence to processes | Real benefit |
| Patterns of behavior | Circumstances of the student |

Compliance score

Interpretation of the total score

flowchart LR
    A["Compliance 90-100%"] -->|"Excellent team"| B["Monitor"]
    C["Compliance 70-90%"] -->|"Good team"| D["Occasional check"]
    E["Compliance 50-70%"] -->|"Problems"| F["Active intervention"]
    G["Compliance < 50%"] -->|"Critical"| H["Immediate action"]

Context is key

| Score | Possible interpretations |
|---|---|
| High score (>80%) | + Well-functioning team OR ! Gaming of metrics |
| Average score (60-80%) | + Team just getting started OR ! Problems with cooperation |
| Low score (<60%) | ! Serious problems OR + Different work style |

Best practice

Compare teams within the same course rather than against absolute values; each course has a different baseline.

Interpretation of individual metrics

R01: Issue Assigned

What we measure: Student has at least 1 assigned issue in the project

Healthy patterns:

  • Each student works on their own issue
  • Issue is created before starting work

Warning signs:

  • Student has no assigned issue -> score 0 %
  • Issue assigned only formally, without actual work

R02: Branch + MR Created

What we measure: MR source branch follows naming convention (default issue-123-description)

Healthy patterns:

  • Branch named according to convention: issue-5-add-search
  • MR created from feature branch

Warning signs:

  • Branch named generically: my-branch, fix, test
  • Direct push to main without MR
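A minimal sketch of how such a branch-name check could look, assuming the default issue-123-description convention quoted above; the exact regex the engine uses may differ:

```python
import re

# Assumed default convention: issue-<number>-<kebab-case-description>
BRANCH_PATTERN = re.compile(r"^issue-\d+-[a-z0-9]+(-[a-z0-9]+)*$")

for branch in ["issue-5-add-search", "my-branch", "fix"]:
    print(branch, "->", bool(BRANCH_PATTERN.match(branch)))
# issue-5-add-search -> True, my-branch -> False, fix -> False
```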

R03: Tests Written

What we measure: At least 1 MR contains test file changes

Configuration: Test file patterns are set via test_file_patterns
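For illustration, a hedged sketch of how test-file detection might work; the pattern values below are placeholders, and the engine's actual matching rules may differ from fnmatch semantics:

```python
from fnmatch import fnmatch

# Placeholder values; the real patterns come from test_file_patterns in the course config.
test_file_patterns = ["tests/*", "test_*.py", "*_test.py", "*.spec.ts"]


def mr_contains_tests(changed_paths: list[str]) -> bool:
    """Return True if any changed file matches a configured test pattern."""
    return any(
        fnmatch(path, pattern)
        for path in changed_paths
        for pattern in test_file_patterns
    )


print(mr_contains_tests(["src/app.py", "tests/test_app.py"]))  # True
```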

R04: MR Linked to Issue

What we measure: MR description references an issue (e.g. Closes #5, Fixes #12)
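A rough approximation of the issue-reference check; GitLab's full set of closing keywords is broader, and the engine's actual pattern may differ:

```python
import re

# Common closing keywords followed by an issue number, e.g. "Closes #5" or "Fixes #12".
ISSUE_REF = re.compile(r"\b(close[sd]?|fix(e[sd])?|resolve[sd]?)\s+#\d+", re.IGNORECASE)

print(bool(ISSUE_REF.search("Closes #5")))       # True
print(bool(ISSUE_REF.search("Fixes #12")))       # True
print(bool(ISSUE_REF.search("Related to #7")))   # False
```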

R05: MR Description

What we measure: MR description includes required sections (default ## Description and ## Testing)
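Conceptually, the check only needs to find the required headings in the description text. A sketch, assuming the defaults quoted above; the section names are configurable:

```python
# Assumed defaults from the text; the required sections are configurable.
REQUIRED_SECTIONS = ["## Description", "## Testing"]


def has_required_sections(mr_description: str) -> bool:
    return all(section in mr_description for section in REQUIRED_SECTIONS)


mr_body = "## Description\nAdds full-text search.\n\n## Testing\nCovered by new unit tests.\n"
print(has_required_sections(mr_body))  # True
```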

R06: Code Review Received (partial scoring)

What we measure: Student's MR received reviews from ≥ N distinct reviewers

Partial scoring: If a student needs 2 reviewers and has only 1, they receive 50 % of the check's weight.
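Partial scoring can be pictured as a simple proportional formula; a sketch, assuming linear scaling capped at 100 % (the engine's exact rounding rules may differ):

```python
def partial_score(actual: int, required: int, weight: float) -> float:
    """Award a proportional share of the check's weight, capped at 100 %."""
    return min(actual / required, 1.0) * weight


print(partial_score(actual=1, required=2, weight=10.0))  # 5.0 -> half the weight
```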

Healthy patterns:

  • Each member reviews and is reviewed
  • Constructive comments (≥ 30 words)
  • Different reviewers (not always the same person)

Gaming detection:

  • LGTM reviews - Blank approval without comments
  • Review rings - Alice reviews Bob, Bob reviews Alice, no one else

flowchart TB
    subgraph "Healthy Review "
        A1["Alice"] -->|review| B1["Bob's MR"]
        B2["Bob"] -->|review| C1["Carol's MR"]
        C2["Carol"] -->|review| A2["Alice's MR"]
    end

    subgraph "Review Ring "
        X1["Alice"] <-->|"always"| Y1["Bob"]
        Z1["Carol"] -.->|"no review"| X1
    end

R07: Code Review Given (partial scoring)

What we measure: Student meaningfully reviewed ≥ N distinct peer MRs

Minimum review length: 30 words (configurable via min_review_word_count)
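A minimal sketch of the length test for a "meaningful" review, assuming whitespace tokenization; the engine's counting rules may differ:

```python
MIN_REVIEW_WORD_COUNT = 30  # default from the text, configurable via min_review_word_count


def is_meaningful_review(comment: str) -> bool:
    return len(comment.split()) >= MIN_REVIEW_WORD_COUNT


print(is_meaningful_review("LGTM"))  # False
```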

R08: Review Response (partial scoring)

What we measure: Author responded to review threads and referenced commits

R09: MR Approved (partial scoring)

What we measure: Student's MR received ≥ N approvals

R10: Merged by Author

What we measure: Author merged their own MR (not another team member)

R11: MR + Issue Closed

What we measure: Both MR and linked issue are closed

R12: Pipeline Green

What we measure: At least 1 pipeline succeeded with a test job

Non-Contributing Members

When is a member excluded from the team score?

The compliance engine uses three-tier member classification:

  1. Teachers (is_teacher=true) - get an individual snapshot but are never included in the team aggregate
  2. Non-contributing - Guest/Reporter with no activity OR inherited members with no activity -> get an individual snapshot but are excluded from the team average
  3. Students - Developer+ or any member with activity -> included in the team score

What counts as "activity"?

Activity means the member has at least one of the following:

  • Authored merge request
  • Assigned issue
  • Given code review
  • Triggered pipeline
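Putting the classification and activity rules together, a hedged sketch (the Member fields and the handling of inherited members are simplifications, not the engine's data model):

```python
from dataclasses import dataclass


@dataclass
class Member:
    is_teacher: bool
    access_level: str            # e.g. "guest", "reporter", "developer"
    authored_mrs: int = 0
    assigned_issues: int = 0
    reviews_given: int = 0
    pipelines_triggered: int = 0


def has_activity(m: Member) -> bool:
    return any([m.authored_mrs, m.assigned_issues, m.reviews_given, m.pipelines_triggered])


def classify(m: Member) -> str:
    if m.is_teacher:
        return "teacher"            # individual snapshot, never in the team aggregate
    if not has_activity(m) and m.access_level in ("guest", "reporter"):
        return "non-contributing"   # individual snapshot, excluded from the team average
    return "student"                # included in the team score
```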

Weekly Check Filtering

If a course uses weekly_deadlines with expected_checks, the engine evaluates only checks relevant to the current week. This prevents penalizing students for checks not yet introduced in the curriculum.
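To illustrate the idea (the actual shape of weekly_deadlines and expected_checks in your configuration may differ), a sketch of week-based filtering:

```python
# Hypothetical structure: checks introduced per week of the course.
weekly_deadlines = {
    1: {"expected_checks": ["R01", "R02"]},
    2: {"expected_checks": ["R03", "R04", "R05"]},
    3: {"expected_checks": ["R06", "R07"]},
}


def checks_for_week(current_week: int) -> list[str]:
    """Collect every check introduced up to and including the current week."""
    return [
        check
        for week, cfg in sorted(weekly_deadlines.items())
        if week <= current_week
        for check in cfg["expected_checks"]
    ]


print(checks_for_week(2))  # ['R01', 'R02', 'R03', 'R04', 'R05']
```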

Force-recheck

The endpoint POST /api/v1/teams/{team_id}/recheck triggers a recomputation with all checks (R01-R13 + custom), bypassing weekly filtering.
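For example, triggering a recheck from a script (the base URL and authentication header are placeholders; use whatever your deployment requires):

```python
import requests

BASE_URL = "https://your-instance.example.com"  # placeholder
TEAM_ID = 42                                    # placeholder

response = requests.post(
    f"{BASE_URL}/api/v1/teams/{TEAM_ID}/recheck",
    headers={"Authorization": "Bearer <token>"},  # placeholder credentials
    timeout=30,
)
response.raise_for_status()
```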

Gaming Detection

Types of gaming

1. Commit Spam

Definition: Large number of insignificant commits

Signals:

  • More than 30 commits per hour
  • Commit messages like "fix", "update", ".", "asdf"
  • Changes touch only whitespace or comments

Example:

10:00 - "fix"
10:01 - "update"
10:02 - "changes"
10:03 - "more changes"
...
10:30 - "final fix"

Action

Interview with the student about the purpose of commits and DevOps practices.

2. LGTM Reviews

Definition: Code review without actual inspection

Signals:

  • Comments contain only "LGTM", "", or "ok"
  • Review takes less than 1 minute
  • Approval without any comments

3. Review Rings

Definition: Mutual review without diversity

Signals:

  • Gini coefficient of review pairs > 0.8
  • Always the same review pairs
  • No cross-review with other members
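The Gini coefficient here measures how unevenly reviews are spread across reviewer-author pairs; a value near 1 means a few pairs do almost all the reviewing. A small sketch using the standard formula over per-pair review counts (the exact pair definition the engine uses is an assumption):

```python
def gini(counts: list[int]) -> float:
    """Standard Gini coefficient over non-negative counts (0 = even, 1 = concentrated)."""
    counts = sorted(counts)
    n, total = len(counts), sum(counts)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * c for i, c in enumerate(counts))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Alice and Bob only review each other; the other four possible pairs get nothing.
pair_counts = [6, 6, 0, 0, 0, 0]
print(round(gini(pair_counts), 2))  # 0.67 -> heavily concentrated review pairs
```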

How to react to gaming

flowchart TD
    A["Gaming detected"] --> B{"Severity"}
    B -->|Low| C["Dashboard warning"]
    B -->|Medium| D["Email students"]
    B -->|High| E["Personal interview"]

    E --> F{"Repeated?"}
    F -->|Yes| G["Penalty"]
    F -->|No| H["Monitoring"]

Contextual factors

When metrics can lie

| Situation | Impact on metrics | How to verify |
|---|---|---|
| Pair programming | Low distribution of work | Ask about the workflow |
| Refactoring | Big changes, few features | Look at the diff |
| Documentation | Few commits | Look at the contents |
| Final phase | Burst of activity | Compare with the plan |

Verification questions

Before drawing conclusions, ask yourself:

  1. Is this pattern consistent? (not a one-time anomaly)
  2. Are external factors influencing it? (exams, holidays)
  3. Does it match the workflow of the team? (pair programming, mob programming)
  4. What does the code say? (not just metrics)

Practical examples

Example 1: Low activity of one member

Situation: David has 5 commits per semester, the others 50+

Possible causes:

  • David really isn't contributing
  • David does code reviews (not commits)
  • David does documentation offline
  • David has personal problems

Action: Check his review activity, compare it with his issues, or interview him

Example 2: High compliance, but poor result

Situation: The team has 90% compliance, but the project is not working

Possible causes:

  • Gaming of metrics
  • Good process, poor implementation
  • Technical debt

Action: Code review, project demo, technical interview

Example 3: Burst activity before the deadline

Situation: 80% commits in the last 2 days

Possible causes:

  • Bad time management
  • Underestimating the task
  • External factors (other projects)
  • Common for some types of tasks

Action: Planning discussion, mentoring

Checklist for assessment

  • I compared the team to the course average
  • I looked at the trend, not just the current state
  • I checked the gaming flags
  • I considered the context (project type, semester phase)
  • I looked at the code, not just the metrics
  • I have enough data for a conclusion

Further reading