Today we released a new word count report feature. Counting words sounds simple, but once you dive deeper into the subject you quickly find out it is not.
Our previous implementation was simple and did its job. It gave project administrators an overview of what happened in the project and who made how many text changes. The problem was that it didn’t respect our proposed workflow and didn’t include status information: the system counted words no matter what the status of the text segment was. Our new implementation is much smarter and uses the segment’s status to decide whether to count or not. We also managed to recalculate the reports for previous work.
Workflow of translation and count behaviour
On Lingohub we established the translation workflow of
These statuses indicate the progress of a text segment. All collaborators can set the status at least up to translated; administrators and reviewers can set it up to approved. Hence, there are two different word counts to track: translated words and reviewed words. Please note that the word counts are always calculated from the source language.
As a general rule, the word count works like this:
When a target segment is set to the status translated for the first time, the words of the source segment are counted and associated with the user who set the status. Further changes to the target segment are only counted if the source segment has changed in the meantime. The same applies to the review status.
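The general rule can be sketched in a few lines of Python. This is only an illustration under assumptions, not Lingohub's actual implementation: the names `Segment`, `Report`, `set_status` and `source_changed`, the status ranking, and the whitespace-based word count are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical status order; only the two thresholds matter for counting.
STATUS_RANK = {"draft": 0, "translated": 1, "approved": 2}


def count_words(text: str) -> int:
    """Naive whitespace word count (an assumption, not the real tokenizer)."""
    return len(text.split())


@dataclass
class Segment:
    source_text: str
    status: str = "draft"
    # Whether each count was already recorded for the current source text.
    translated_counted: bool = False
    reviewed_counted: bool = False


@dataclass
class Report:
    translated: dict = field(default_factory=lambda: defaultdict(int))
    reviewed: dict = field(default_factory=lambda: defaultdict(int))


def set_status(segment: Segment, new_status: str, user: str, report: Report) -> None:
    """Record counts for the user who first moves the segment past a threshold."""
    words = count_words(segment.source_text)  # always counted from the source
    rank = STATUS_RANK[new_status]
    if rank >= STATUS_RANK["translated"] and not segment.translated_counted:
        report.translated[user] += words
        segment.translated_counted = True
    if rank >= STATUS_RANK["approved"] and not segment.reviewed_counted:
        report.reviewed[user] += words
        segment.reviewed_counted = True
    segment.status = new_status


def source_changed(segment: Segment, new_source_text: str) -> None:
    """A source change resets the status and makes the segment countable again.

    How many words are counted after a source change is simplified here:
    this sketch simply recounts the whole new source text.
    """
    segment.source_text = new_source_text
    segment.status = "draft"
    segment.translated_counted = False
    segment.reviewed_counted = False


report = Report()
segment = Segment(source_text="Hello wonderful world")
set_status(segment, "translated", "alice", report)  # counted for alice
set_status(segment, "draft", "alice", report)       # no effect on counts
set_status(segment, "translated", "bob", report)    # not counted again
```

Keeping one flag per count type on the segment is what makes later status changes free of charge until the source text changes.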
A picture says more than words:
That is pretty straightforward, but to make things clearer, let's consider some less common cases and examples:
- 1. Translator A translated Segment X to German (target) and set the status to “Draft”. Translator B sets Segment X to “Approved”. Both the translated and the reviewed count are associated with Translator B. The system only counts your work if you do the translation and also set the status to at least “Translated”.
1.a Translator A sets the status of Segment X back to “Draft”. One day later, Translator A changes the content of Segment X and sets the status to “Reviewed”. No counts are recorded for Translator A, since the words were already counted and the source of Segment X hasn't changed.
1.b The source text of Segment X is changed, which automatically resets the target text’s status to “Draft”. Translator A sees the source change, then translates and reviews the German (target) segment again. Now the translated word count is the difference between the old source text and the new source text, while the reviewed count is the number of words in the source text (since the whole text had to be reviewed again).
- 2. When a file is uploaded, new and updated target texts are associated with the uploader (or, in the case of the GitHub integration, with the connected user). However, the report shows how each text change was made, e.g. via the editor or a file upload.
- 3. Segment X has English as its source language and German as its target language. The German segment is still empty, yet Translator A already sets the status to reviewed. Word counts are calculated for both translated and reviewed, and Translator A is associated with them. This may seem strange because no work was done. Nevertheless, it is impossible for the system to know whether "real" work was done or not. We believe that trust is a key component in working with translators.
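Case 1.b can be illustrated with a small Python sketch. The post does not spell out how the "difference" between the old and the new source text is measured, so this example assumes it means the words that were inserted or replaced in the new source, found with the standard library's `difflib.SequenceMatcher`. The function names are hypothetical and the word split is a naive whitespace split.

```python
from difflib import SequenceMatcher


def changed_word_count(old_source: str, new_source: str) -> int:
    """Count words in the new source that were inserted or replaced.

    Assumption: this is one possible reading of "the difference between
    the old source text and the new source text". Pure deletions add
    nothing under this reading.
    """
    old_words = old_source.split()
    new_words = new_source.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changed = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            changed += j2 - j1  # words on the new-source side of the change
    return changed


def counts_after_source_change(old_source: str, new_source: str) -> dict:
    """Translated count is the difference; reviewed count is the full source."""
    return {
        "translated": changed_word_count(old_source, new_source),
        "reviewed": len(new_source.split()),
    }


counts = counts_after_source_change(
    "Save your changes",
    "Save all your changes now",
)
print(counts)  # {'translated': 2, 'reviewed': 5}
```

Under this reading, the translator is credited only for the two added words, while the reviewer is credited for all five words of the changed source.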
Our new reports system is just the start. We have established a base that lets us provide more statistics and data in the future. Together with the new design, we are working on data visualization and export options.