3. Exploring possible solutions
The analysis in the previous chapter showed that none of the existing solutions covers all the requirements placed on a tool for supporting DevOps teaching. Before designing my own solution, I consider it sensible to check two practical questions directly against data. The first is whether the GitLab event interface gives a rich enough picture of what is happening in the repository to allow the teacher's checklist from Chapter 2 to be evaluated automatically. The second is how much of the same checklist would actually be covered by existing tools picked from the categories in sections 2.3.1 to 2.3.3, were they deployed directly on a SEF project. I am not trying to pick one specific tool and recommend it; the goal is to obtain an empirically grounded picture of what already exists and what will need to be designed.
3.1 Goal and method of the experiment
I split the experiment into two parts. The first verifies coverage of the checklist by events that GitLab can emit on repository state changes. The second verifies how the same checklist is handled by selected existing tools. The independent variable is in both cases the checklist item; the dependent variable is the degree of coverage.
I assess coverage on a three-step scale: an item is fully covered (✓) if it can be evaluated automatically without supplementary data; partially covered (~) if only part of the data is available or further processing is required; and not covered (-) if the data is missing or not machine-capturable. This grading mirrors the convention used in the analysis table in Section 2.4 and preserves comparability between the two parts of the experiment.
As input data for both parts I used my own GitLab project with a small number of synthetic merge requests and comments simulating typical situations observed in SEF: a trivial merge request without a description, a high-quality merge request with an issue link and tests, as well as deliberately suspicious behaviour such as immediate approval without comment. This approach follows the recommendation of Bass, Weber and Zhu [1] that the observability of DevOps practices be verified against concrete, narrowly bounded scenarios.
3.2 Coverage of the checklist by webhooks
GitLab webhooks emit JSON messages on selected events (Push, Merge Request, Note, Pipeline, Issue, Tag Push). For each event I recorded the contents of the delivered payload and then verified which items of the checklist from Chapter 2 can be derived from it.
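Recording the payloads needs only a minimal receiver. The sketch below shows one possible form of such a logging endpoint, assuming Node.js with Express; the port, the endpoint path and the environment variable holding the secret token are illustrative choices, not part of the original setup.

```typescript
import express from "express";
import { appendFileSync } from "fs";

const app = express();
app.use(express.json());

// GitLab repeats the configured secret in the X-Gitlab-Token header; reject anything else.
const SECRET_TOKEN = process.env.GITLAB_WEBHOOK_TOKEN ?? "";

app.post("/webhook", (req, res) => {
  if (req.header("X-Gitlab-Token") !== SECRET_TOKEN) {
    return res.status(401).send("invalid token");
  }
  // Event name as sent by GitLab, e.g. "Merge Request Hook", "Note Hook", "Push Hook".
  const event = req.header("X-Gitlab-Event") ?? "unknown";
  // Append one JSON line per delivery so the payloads can be inspected later.
  appendFileSync("payloads.jsonl", JSON.stringify({ event, body: req.body }) + "\n");
  return res.status(200).send("ok");
});

app.listen(8080, () => console.log("listening on :8080"));
```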
The measurement showed that the richest source of information is the Merge Request event, which contains the assignee and reviewer identifiers, the title and description of the request, links to associated issues, the list of changed files and the current pipeline status. The second key source is the Note event, which alongside the comment text carries the discussion thread and the position in the diff, allowing general comments to be distinguished from line-bound review comments. The Push event provides a complete list of commits including messages, author and timestamp, which also enables analysis of activity distribution over time. For checking branch-naming conventions, the source-branch reference is already available in the Merge Request event, so verification does not require an additional API call.
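To illustrate how these payload fields map onto checklist items, the following sketch derives three of the checks from a Merge Request event. The field names follow the GitLab webhook payload; the branch-naming convention, the issue-reference pattern and the reading of the "two reviewers" item are assumptions introduced only for this example.

```typescript
// Shape of the fields used below, as delivered in the Merge Request event
// (only the parts relevant to the checklist are declared here).
interface MergeRequestEvent {
  object_kind: "merge_request";
  object_attributes: {
    iid: number;
    title: string;
    description: string;
    source_branch: string;
  };
  assignees?: { username: string }[];
  reviewers?: { username: string }[];
}

// The branch convention and the issue-reference pattern are illustrative
// assumptions, not rules prescribed by SEF.
const BRANCH_PATTERN = /^(feature|bugfix)\/\d+-[a-z0-9-]+$/;
const ISSUE_REFERENCE = /#\d+/;

function checkMergeRequest(event: MergeRequestEvent) {
  const mr = event.object_attributes;
  const participants = [...(event.assignees ?? []), ...(event.reviewers ?? [])];
  return {
    // Merge request contains a description and a link to an issue.
    hasDescription: mr.description.trim().length > 0,
    linksIssue: ISSUE_REFERENCE.test(mr.description) || ISSUE_REFERENCE.test(mr.title),
    // Working branch follows the naming convention.
    branchFollowsConvention: BRANCH_PATTERN.test(mr.source_branch),
    // At least two distinct people involved in the assignee and reviewer roles
    // (this interpretation of the checklist item is an assumption of the sketch).
    hasTwoReviewers: new Set(participants.map((u) => u.username)).size >= 2,
  };
}
```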
Some items, however, were not covered. Static-analysis results and exact test coverage are not part of any payload; although these data are visible in the CI/CD pipeline, the webhook does not carry them, so they must be obtained through supplementary REST API [3] calls. Likewise, whether a reviewer phrased a comment constructively rather than merely confirming approval cannot be determined from the payload structure alone, but only by analysing the comment text.
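A minimal sketch of such a supplementary call is shown below. It assumes that the overall coverage percentage can be read from the single-pipeline resource of the GitLab REST API (v4) and that a token with API read access is available; the helper name and parameters are chosen for illustration.

```typescript
// Fetch the overall coverage reported for a pipeline via the REST API,
// since the Pipeline webhook payload does not carry this value.
// Assumes Node 18+ (global fetch) or an equivalent fetch implementation.
async function fetchPipelineCoverage(
  baseUrl: string,   // e.g. "https://gitlab.example.org/api/v4"
  projectId: number,
  pipelineId: number,
  token: string,     // personal or project access token with API read access
): Promise<number | null> {
  const response = await fetch(
    `${baseUrl}/projects/${projectId}/pipelines/${pipelineId}`,
    { headers: { "PRIVATE-TOKEN": token } },
  );
  if (!response.ok) {
    throw new Error(`GitLab API returned ${response.status}`);
  }
  const pipeline = await response.json();
  // The coverage field is a string percentage, or null when not configured.
  return pipeline.coverage != null ? parseFloat(pipeline.coverage) : null;
}
```

A similar supplementary call would be needed for static-analysis results, which are likewise produced only inside the pipeline.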
The coverage table below summarises the results of this part of the experiment for the ten most important items of the checklist.
| Checklist item | Webhook | REST API |
|---|---|---|
| Issue assignment to a team member | ✓ | ✓ |
| Creation of a working branch following conventions | ✓ | ✓ |
| Merge request contains description and link to issue | ✓ | ✓ |
| Merge request has two reviewers in assignee and reviewer roles | ✓ | ✓ |
| Review comments are bound to a specific line of code | ✓ | ✓ |
| Meaningfulness of comment text | - | ~ |
| Unit tests added in the change | ~ | ✓ |
| CI/CD pipeline status for the merge request | ✓ | ✓ |
| Test coverage for changed files | - | ~ |
| Distribution of commits over the semester | ✓ | ✓ |
✓ = fully covered, ~ = partially covered, - = not covered.
It is clear from the table that the combination of webhooks and supplementary REST API calls covers all quantitatively verifiable items. The qualitative judgement of comment meaningfulness and review constructiveness remains an area where metadata alone are insufficient and an additional mechanism - for example heuristic text analysis combined with simple machine-learning models - needs to be designed. This finding directly shapes the direction of the proposed solution.
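To make the intended direction concrete, the sketch below shows the kind of shallow heuristic meant here: a comment is treated as potentially constructive when it is long enough, is not a bare approval phrase, and carries at least one signal of substantive feedback. The thresholds and keyword lists are illustrative assumptions, not a validated classifier.

```typescript
// A deliberately shallow heuristic for comment meaningfulness. It only
// demonstrates the idea of combining simple textual features; it is not
// a validated model, and its output would be a suggestion, not a verdict.
const APPROVAL_ONLY = /^(lgtm|ok|\+1|looks good( to me)?|approved?)[.!]?$/i;

function commentLooksConstructive(text: string): boolean {
  const trimmed = text.trim();
  if (trimmed.length < 15) return false;          // too short to carry content
  if (APPROVAL_ONLY.test(trimmed)) return false;  // bare approval phrase
  const asksQuestion = trimmed.includes("?");
  const suggestsChange = /\b(suggest|consider|instead|rename|extract|why not)\b/i.test(trimmed);
  const referencesCode = /`[^`]+`/.test(trimmed); // inline code span in the comment
  // At least one signal of substantive feedback is required.
  return asksQuestion || suggestsChange || referencesCode;
}
```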
3.3 Trial with existing tools
In the second part of the experiment I verified to what extent existing tools can cover the checklist when deployed directly against the same GitLab project. I included GitLab Analytics, the Danger [4] tool with the rules description-required, linked-issue and tests-changed, and the built-in coverage report that GitLab makes available in the pipeline. I deliberately excluded Artemis and CodeGrade from this part because they require their own infrastructure and submitted code, which is incompatible with the SEF working model.
GitLab Analytics confirmed the finding from the analysis chapter: it provides aggregate charts of commits and merge requests, but does not let the teacher answer whether a particular student fulfilled a rule. The statistics are also bound to a single project, so with more than a dozen project repositories across both observed courses the teacher would have to walk through dozens of separate screens.
Danger covered the rules that can be expressed in JavaScript at the moment a merge request is opened surprisingly well. The script reliably flagged merge requests without a description, merge requests without a linked issue, and changes that added no test files. The tool, however, works only at the level of a single merge request: it cannot answer whether a student performed at least one authoring activity and two review activities over the semester. Attempting to extend the rules to detect fast approvals (a request approved within tens of seconds of being opened) required a custom script calling the API outside Danger, which negated the benefit of using it.
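For reference, the three rules can be reconstructed in a dangerfile.ts roughly as follows. The issue-reference pattern and the test-file patterns are assumptions made for this sketch, not the exact rules used in the trial.

```typescript
// dangerfile.ts - approximate reconstruction of the three rules from the trial.
import { danger, warn, fail } from "danger";

const mr = danger.gitlab.mr;

// Rule "description-required": the merge request must have a description.
if (!mr.description || mr.description.trim().length === 0) {
  fail("The merge request has no description.");
}

// Rule "linked-issue": the description must reference an issue (e.g. "#42").
if (!/#\d+/.test(mr.description ?? "")) {
  warn("The merge request does not reference any issue.");
}

// Rule "tests-changed": a change should add or modify at least one test file.
// The path and suffix patterns below are illustrative assumptions.
const touchedFiles = [...danger.git.created_files, ...danger.git.modified_files];
const touchesTests = touchedFiles.some(
  (f) => /(^|\/)(test|tests|spec)\//.test(f) || /\.(test|spec)\.[jt]sx?$/.test(f),
);
if (!touchesTests) {
  warn("No test files were added or modified in this change.");
}
```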
The built-in coverage report worked technically correctly, but it requires every team to configure it separately in the pipeline, which in practice would mean maintaining and distributing a pipeline template across all project repositories. It therefore makes sense to handle this element centrally in the proposed tool rather than leave the configuration to the teams, which matches the recommendation of Sadowski et al. [2] that a centrally scaled tooling platform is more stable than delegating configuration to teams.
3.4 Conclusions from the experiment
The empirical exploration confirmed two key starting points for the design of my own solution. The first is technical feasibility: the required dataset is available through a combination of webhooks and the REST API, and acquiring it does not require any change to the GitLab installation on which the projects already run. The second is the inadequacy of partial tools: none of the examined tools provides an aggregated view across teams, and none covers the qualitative aspect of review communication.
These observations lead to three design decisions that I elaborate in detail in the next chapter. The solution will be an event-driven system receiving GitLab webhooks and enriching them with REST calls. Assessment will work at the level of an individual student over the whole semester, not at the level of a single merge request. Finally, the qualitative aspect of review communication will be assessed by an additional heuristic, whose result will be presented to the teacher as a suggestion rather than a final decision. This way pedagogical responsibility remains with the teacher and the solution stays in line with the principle that the tool is meant to relieve the teacher, not replace them.