docs: add OEP-0058 - Translations Management #367

Merged
56 commits
6a7f323
docs: add OEP - Translations Management
Aug 8, 2022
d4f6b7a
docs: rename oep & reformat table
Aug 8, 2022
2f97f83
docs: add Ned as Arbiter
Aug 9, 2022
bd779ed
docs: demote title to subtile and add new title
Aug 10, 2022
6f575d0
docs: change status to "Under Review"
Aug 10, 2022
8826ec0
docs: add Type section to table
Aug 10, 2022
89f2284
docs: add Review Period row to table
Aug 10, 2022
58c39ac
docs: lowercase the 't' in tCRIL
Aug 10, 2022
0ae30b1
docs: change Description section to Current State
Aug 10, 2022
a72f901
docs: add widths to table
Aug 10, 2022
228e3cd
docs: remove subtitle
Aug 10, 2022
77dbb0e
docs: rearrange and rename sections
Aug 10, 2022
01cbfba
docs: move OEP-58 to Arch and rename with "arch"
Aug 10, 2022
87ab993
docs: expand Rejected Alternative question
Aug 11, 2022
3638508
docs: update Rejected Alternatives question 1
Aug 11, 2022
588e3a6
docs: change type to Architecture Decision
Aug 11, 2022
973cf3e
fix: add correct file name reference for OEP
Aug 11, 2022
116ded7
docs: change Pros & Cons section to Impacts
Sep 6, 2022
2836cde
docs: change header level of Locations section
Sep 6, 2022
0119570
docs: expand Proposed Implementation section
Sep 6, 2022
20b9e6c
docs: replace repo with full word
Sep 6, 2022
8f63bbe
docs: update oep-0058
Sep 8, 2022
224a5df
docs: correct format to follow RST guidelines
Sep 8, 2022
de68683
docs: condense sub-bullet point into parent
Sep 8, 2022
cfac57d
docs: split Rationale section into two sections
Sep 8, 2022
8c1a57a
docs: standardize to "translation files"
Sep 8, 2022
0061121
docs: clarify GH app permissions behavior
Sep 8, 2022
b9e5abc
docs: update final implementation step
Sep 8, 2022
59bf579
docs: clarify Translation WG instruction
Sep 8, 2022
b501020
docs: add example to Move Translation Files sect
Sep 8, 2022
c0d2755
docs: split bullet point and add link to PR
Sep 16, 2022
b0a0a06
docs: add translation labor cost
Sep 16, 2022
4587332
docs: add new repo translations instructions
Sep 16, 2022
59a257d
Merge branch 'openedx:master' into Carlos-Muniz/translations-management
Sep 23, 2022
8fcb11e
docs: add Final State section
Sep 23, 2022
e00b253
docs: update language to be clearer
Sep 26, 2022
a550fc8
docs: add Transifex github app & config links
Sep 26, 2022
f726086
fix: spelling errors
Sep 28, 2022
6a9df8f
Merge pull request #1 from openedx/nedbat/oep-58-spelling
Sep 28, 2022
9474f40
docs: add changes based on @sarina's review
Sep 29, 2022
7c67da0
Merge branch 'openedx:master' into Carlos-Muniz/translations-management
Nov 1, 2022
e9a2c70
style: correct formatting
Nov 1, 2022
17190ad
docs: update OEP based on prototype and newinfo
Nov 1, 2022
daf503c
Merge branch 'openedx:master' into Carlos-Muniz/translations-management
Nov 1, 2022
c839424
fix: remove most mentions of transifex-client deprecation
Nov 3, 2022
fc07406
fix: update OEP-58 based on suggestion
Nov 8, 2022
3b40a06
feat: add review period
Nov 7, 2022
1d27a59
fix: change Transifex Memory to Translation Memory
Nov 8, 2022
620e6b3
fix: add clarification on impact on developers
Nov 8, 2022
7a06a16
fix: update based on suggestion
Nov 9, 2022
6e2de68
Merge branch 'openedx:master' into Carlos-Muniz/translations-management
Nov 9, 2022
e86e67d
feat: add Follow-up Work link
Nov 14, 2022
4fbff9e
fix: apply suggestions from review
Nov 16, 2022
eeaf460
feat: add notes from threads & fix structure
Nov 22, 2022
df76645
Merge branch 'openedx:master' into Carlos-Muniz/translations-management
Nov 22, 2022
3e14956
Merge branch 'openedx:master' into Carlos-Muniz/translations-management
Dec 1, 2022
263 changes: 263 additions & 0 deletions oeps/architectural-decisions/oep-0058-arch-translations-management.rst
@@ -0,0 +1,263 @@
OEP-58: Translations Management
###############################

.. list-table::
   :widths: 25 75

   * - OEP
     - :doc:`OEP-58 <oep-0058-arch-translations-management>`
   * - Title
     - Translations Management
   * - Last Modified
     - 2022-11-07
   * - Authors
     -
       * Carlos Muniz <[email protected]>
       * Feanil Patel <[email protected]>
       * Sarina Canelake <[email protected]>
   * - Arbiter
     - Ned Batchelder <[email protected]>
   * - Status
     - Under Review
@robrap (Contributor), Nov 9, 2022:

Based on the Provisional definition and the docs for status changes, I'd recommend:

  1. Updating the status to Under Review (=> Provisional) to clarify the intended status upon merge, assuming that status is agreed upon. And,

  2. Adding a Follow-up Work page to this header as noted in the docs above. This link might also provide some thoughts on how this work will come to be.

  3. Additionally, this isn't a blocker to this OEP, but is related to "Follow-up Work". I'm wondering what thought has been given to how companies will need to handle private repos and translations? Maybe a separate Discuss thread could be started for this, but has thought already been given to this? Is the idea that others could easily replicate this pattern and re-use tooling against a private repo of translations? How much work will that involve? Etc.

Contributor Author:

Thank you for your comments @robrap.

  1. If I understand correctly, you want to clarify that the next status after "Under Review" should be "Provisional" and not "Accepted", because it hasn't been vetted and adopted in the platform. This makes sense to me, but I don't think we need to specify this in the document.

  2. I will add a link named “Follow-up Work” to the References section of the OEP header.

  3. I think this is a good first question for the Follow-up Work link. The openedx-atlas CLI tool can pull translation files from any repository that works with git. And the GitHub Action that generates translations and moves them to openedx-translations can also be reappropriated for private repos. The work involved will be: copying the github action that generates the translations to another repository, changing the organization and names of the repos, and attaching it to a separate Transifex project via the app.

Member:

@Carlos-Muniz on point 1, @robrap is suggesting that you change the status from:

  • Under Review

to one of these:

  1. Under Review (=> Provisional)
  2. Under Review (=> Accepted)

in order to clarify whether you want this to merge in as a Provisional or Accepted, which is information the reviewers should know. I agree with him.

Member:

Aside, @robrap : I think this is evidence that the OEP-1 status guidance that we collaborated on could be improved :) OEP-1 presents Under Review as a valid status even though we'd really prefer that folks always include the (=> ...) part.

Contributor Author:

Thank you for the clarification. I've changed it.

   * - Type
     - Architecture Decision
   * - Created
     - 2022-08-08
   * - Review Period
     - 2022-11-07 - 2022-11-21
.. * - Resolution
..   -

.. contents::
   :local:
   :depth: 1

Context
Member:

Most OEPs are structured in some variation of:

  • Abstract (very short)
  • Motivation
  • Specification (aka Decision)
  • Impact (aka Consequences)
  • Rejected Alternatives
  • References

I find that this template makes it really easy to digest OEPs, both as a first time reader and as a returning reference-seeker. I find the same thing to be true with PEPs. I think the current draft of this OEP could be mapped into the normal OEP template without any major edits:

  • Decision => Abstract
  • Context, Current State, Rationale => Motivation
  • Decision, Proposed Implementation => Specification
  • Impacts, Locations => Impact

Contributor Author:

Thank you for this suggestion. I will reorganize the OEP this way at the end of the review period so comments on specific sections are referenced correctly by GitHub.

*******

The current method of managing and organizing translation files is overly complicated
and unavailable to the majority of the Open edX community. For example, the
edx-transifex-bot performs the automated upload of English translation source files to
Transifex and the download of translation files to GitHub. It currently runs on legacy
infrastructure originally provided by a community member, and it is difficult to track
why some PRs are merged by the bot and some are not, or where the bot is creating and
merging PRs. Most recently, it was discovered that the translations were not uploading
properly, but it has been impossible for most members of the Open edX community to debug
exactly why. In the week before the Nutmeg release, this was a significant pain point.

Decision
********

To alleviate these issues, we will switch from the edx-transifex-bot to the
`Transifex GitHub App`_, a stable app provided by Transifex. This change is easier to
maintain and resolves many of the pain points detailed below. Translations will be moved
into their own repository, which makes using the `Transifex GitHub App`_ more
straightforward and makes organizing and using the up-to-date translations simpler.
Member:

Style nit: When writing OEPs/PEPs/RFCs, you don't need to politely suggest things nor write in a conditional voice. The entire OEP is understood as a polite suggestion. So, your language can be more concise & direct. For example, instead of:

As part of this proposal, we suggest moving translations into their own repository, to make
using the Transifex GitHub App_ more streamlined and straightforward, and in order to
make organizing and using the up to date translations simpler.

you can say:

Translations will be moved into their own repository. This will make it easier to use the Transifex GitHub App_, organize translations, and use translations.

Contributor Author:

This makes sense. I'll update the language to be more concise & direct.


Current State
*************

* The edx-transifex-bot is a potential security issue: the edx-transifex-bot requires
  admin rights on Transifex in order to function. Admin rights give access to
  private/sensitive information as well as the ability to permanently delete translation
  and configuration files. At some point, the login to the edx-transifex-bot user was
  lost, and without access to the infrastructure that the bot uses to function, the
  edx-transifex-bot is a security issue most of the Open edX community cannot control or
  debug.
* The edx-transifex-bot is a black box for most of the community: the code for the
  edx-transifex-bot is in the `ecommerce-scripts`_ repository, but it is impossible for
  most of the community to observe the work it is doing, or whether it is doing it
  correctly. In addition, there is no documentation for these important scripts.
Comment on lines +68 to +71
@pshiu (Contributor), Nov 15, 2022:

Where are the errors of the Transifex GitHub App shown? I understand that many of the errors will be caught on extraction by the GitHub workflow (and thus will be visible via PR), but I wonder if there are errors that will only be caught at the Transifex GitHub App level.

Assuming such errors exist (they may not), two thought experiments:

  1. What happens if two projects happen to use the same translation key for a string? (Of course, we have a naming convention to prevent that, but say a copy and paste error?)
  2. Will an error caused by one project block translations for all other projects?

Contributor Author:

I think this is a valid concern. Transifex has a very robust notification system, and errors occurring there will notify the Transifex Admins.

For your thought experiments, Transifex projects are isolated from one another. What happens in one does not affect the others, similar to how GitHub treats repositories. So an error in any Transifex project other than openedx-translations will not affect the translations in the openedx-translations Transifex project.

As a side note, translation files were kept in their original file structures to combat the non-unique file name errors we have experienced before, as the entire path is used as the unique translation slug. So as long as files are kept in separate directory structures within the openedx-translations repository, this will not be a problem.

Contributor:

Ok, great, thanks for the clarification! My sense is that we should be good as long as there is a way for maintainers to know that changes they made caused errors at the Transifex level. It sounds like the Translation WG or Transifex Admins are taking up this role and will be responsible for notifying the relevant maintainer?

I should note that I was being a little confusing in my parent comment because I used "project" to refer to an Open edX Project (a repository) instead of a Transifex Project. I was just trying to make sure an error caused by one repository will not block translations for all other repositories. (I suspect that may have been one of the reasons the original Jenkins jobs that pushed/pulled translations were done on a per-repository basis.)

* The translations for the Open edX Maple Release were never uploaded to Transifex,
  because the automation handled by the edx-transifex-bot never uploaded them.
* The underlying transifex-client library and Transifex API (V2) are being deprecated on
  November 30th, 2022. Prior to removing the transifex-client as a dependency, this led
  to inconsistent behavior by our tooling when we tried to automatically manage
  translations. See `this pull request`_ for more details.
* We have a complex process for managing translations for the named releases. As a
  result of the black box nature of the edx-transifex-bot and the unclear state of the
  underlying tooling, this process has become more laborious to keep running. Because
  there are few people with Admin rights to Transifex and knowledge of the Transifex API,
  this could become a recurring problem with each Open edX release.

.. _ecommerce-scripts: https://github.com/openedx/ecommerce-scripts/tree/master/transifex
.. _this pull request: https://github.com/openedx/edx-platform/pull/30567

Rationale for migrating to the `Transifex GitHub App`_
******************************************************

* This is an upgrade of a system we use regularly, but do not want to have to maintain
  regularly.
* Upgrading from a bot (machine user) to an app/workflow is recommended by GitHub and
  makes the translation process more open.
* The `Transifex GitHub App`_ is developed and supported by Transifex.
* The `Transifex GitHub App`_ is very simple to configure and has many options. We can
  set Transifex Projects to automatically upload/download translation files from a
  repository once the translations are reviewed and accepted.
* By using an app that is maintained by Transifex, we reduce the maintenance burden and
  are better protected against changes they might make, since they maintain both the API
  and the `Transifex GitHub App`_.

Rationale for consolidating translation files centrally
@pshiu (Contributor), Nov 15, 2022:

Is anything changing about what powers or responsibilities repository maintainers have over their strings? For example:

  • Who would have the power/responsibility to review/approve translations for each repo?
  • Will maintainers also have power/responsibility over their repo's folder within openedx-translations?

Contributor:

I'm curious:

(1) What are you asking for?
(2) What "powers" do you want and why?
(3) What "powers" do you currently exercise?

Contributor Author:

The Translation WG is in charge of translating and reviewing translations, and to my knowledge these translations are just automatically merged in, e.g. openedx/paragon#1744. With the changes proposed in OEP-58, the Translation WG would still be in charge of translating and reviewing translations. Approval limits (once a file hits X% translated, it is automatically committed to openedx-translations) can be set per translation file. I can see a situation where maintainers ask the Translation WG to set the approval limit below the standard set by the Translation WG. That being said, our translators are top-notch and keep translations at 90% or higher.

I think answers to @sarina's questions can help shed light on what type of input maintainers are looking for.

Contributor:

What are you asking for?

I am trying to understand what the interaction should be between Open edX developers and/or maintainers with the Translation WG. For example, I currently do not know where to find information on the roles and responsibilities of maintainers, developers, and the Translation WG. It may be helpful to point to that information in this OEP.

What "powers" do you want and why? What "powers" do you currently exercise?

The Translation WG is in charge of translating and reviewing translations

This is new to me! I greatly appreciate the Translation WG's work, don't get me wrong. However, I thought the status quo was that maintainers ultimately owned their repo's translations. For example, I think many maintainers were routinely given Project Maintainer access to Transifex, and would frequently help add/change/remove strings or review translations. If the ownership of translations is changing or has already changed, it may be helpful to point to that change in this OEP.

Also, previously, maintainers were able to modify their translation files. Now, via this OEP, it appears those translation files are being moved to a centralized repository, and I am unsure who will own that repository and if any of the access maintainers previously had to change those files will change. In short, "will there ever be non-bot PRs for the translation files in openedx-translations?" and "who will approve and/or merge those PRs?".

Contributor Author:

@pshiu brought up some important questions.

  • Who will maintain the openedx-translations repository?
  • How should errors/bugs in translations in the openedx-translations repository be fixed?
  • Who is in charge of approving and/or merging PRs in the openedx-translations repository?

It may be unreasonable to expect any one group to maintain openedx-translations as there are lots of stakeholders. Translators are expected to translate, but may not get to see if their translations will cause errors. Developers may see that there are errors, but may not understand the language of the translation. There will have to be cooperation.

We should set up an issue request form on openedx-translations (much like the GitHub Requests in tcril-engineering) that notifies the Translations WG on Slack when there is a translation error that needs to be fixed. This way, translators can be notified of an error in a form they can digest, and developers can describe the problem and proposed solution in terms familiar to them.

We will have to look further into how the GitHub Transifex app manages changes to translation files outside of Transifex (openedx/openedx-translations#29) to confirm that making PRs is a possible fix that developers can make. In the case that it is allowed, we can treat it as any other repository and nominate Open edX Community members to be Core Contributors to openedx-translations based on their technical experience and participation in the Translations WG.

@brian-smith-tcril is currently looking at this issue and will update with the Transifex App's capability.

@brian-smith-tcril (Contributor), Nov 17, 2022:

i commented on the issue on the openedx-translations repo, but i'll reiterate here:

based on transifex docs (and @Carlos-Muniz and my testing of the Transifex github app integration)

For existing resources that have already been synced with GitHub, if translations are found in GitHub, they will be sent to Transifex only if no translations are found for that language in Transifex

knowing this, it seems we should stick to the idea of transifex as a source of truth for all translated strings, and any updates should happen in transifex

********************************************************

* Transifex only allows a one-to-one relationship between repositories and Transifex
  Projects. Organizing all of the translation files into one repository and one Transifex
  Project has a lower labor cost: because projects are managed separately, we spend less
  time tracking translation progress and debugging translation issues when all
  translation files are in the same place. By decreasing the number of projects we need
  to maintain, we can add more content, such as the MFE translations.
* A repository that only contains text/binary files, and uses branches to separate
  translations related to Open edX releases, makes interactions with translations quick
  and simple: a consumer can clone the repository and sparse-checkout the branch of a
  specific release and the directory (named after the source repository) that holds its
  translation files.
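
As an illustration of the sparse-checkout idea in the last bullet, the sketch below builds
the git commands a consumer might run to fetch only one repository's directory from one
branch. It is a sketch under stated assumptions, not part of the proposal: the function
name, the ``translations`` destination directory, and the ``open-release/nutmeg.master``
branch name are hypothetical, and the commands are only printed, not executed.

```python
import shlex


def sparse_checkout_commands(repo_url, branch, directory, dest="translations"):
    """Build git commands that fetch a single directory from a single branch.

    Nothing is executed here; the caller decides how to run the commands.
    """
    return [
        # Clone only the requested branch, without populating the working tree yet.
        ["git", "clone", "--no-checkout", "--depth", "1",
         "--branch", branch, repo_url, dest],
        # Restrict the working tree to the directory named after the source repo.
        ["git", "-C", dest, "sparse-checkout", "set", directory],
        # Populate the working tree with just that directory.
        ["git", "-C", dest, "checkout", branch],
    ]


# Hypothetical example: the branch name is an assumption, not defined by this OEP.
for command in sparse_checkout_commands(
    "https://github.com/openedx/openedx-translations",
    "open-release/nutmeg.master",
    "edx-platform",
):
    print(shlex.join(command))
```

Building the commands as argument lists keeps the sketch easy to pass to
``subprocess.run`` without shell quoting concerns.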

Proposed Implementation
***********************

Move Translation Files to a New Repo
====================================

Translation files (of types .mo and .po) currently live alongside the code/documentation
they translate. We will move these translation files into their own repository. For
example, a translation file for the openedx repository `edx-platform`_ located at
``edx-platform/conf/locale/en/LC_MESSAGES/django.po`` would be moved to the new
repository, named openedx-translations_, and located at
``openedx-translations/edx-platform/conf/locale/en/LC_MESSAGES/django.po``. For easier
@regisb, Nov 11, 2022:

Can I suggest to create an intermediate "repos" folder, such that we have openedx-translations/repos/edx-platform? Otherwise we will have a lot of repos located at the root, as we already do: https://github.com/openedx/openedx-translations This will quickly become unwieldy. And if we need extra translations-specific folders at the root (e.g: "docs"), they will be mixed with repo names.

Contributor Author:

This is a good idea that we can implement once the review period is complete.

reintegration, translation files will be kept in the same directory structure as the
code/documentation they translate.
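
The path mapping in the example above can be sketched as a small helper. This is a
minimal illustration that assumes source paths are written relative to a root that
starts with the source repository name, as in the ``edx-platform`` example; the function
and constant names are hypothetical.

```python
from pathlib import PurePosixPath

# Name of the central repository, taken from the example in the text.
TRANSLATIONS_REPO = "openedx-translations"


def destination_path(source_path: str) -> str:
    """Map 'repo/path/to/file.po' to its location in the translations repository.

    The original directory structure is preserved below the repository name.
    """
    path = PurePosixPath(source_path)
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"expected a relative path inside a repository: {source_path}")
    return str(PurePosixPath(TRANSLATIONS_REPO) / path)


print(destination_path("edx-platform/conf/locale/en/LC_MESSAGES/django.po"))
# prints: openedx-translations/edx-platform/conf/locale/en/LC_MESSAGES/django.po
```

Because the full original path is kept, two repositories that both contain a
``django.po`` file still map to distinct destinations.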

A GitHub workflow will generate the translation files for each repository that has them
and commit them to the openedx-translations repository via a pull request. Once the
translation files from edx-platform and other repositories are moved
Comment on lines +131 to +133
Contributor:

I'm interested to know a little more about how the integration from openedx repos to openedx-translations will work.

Will it be something like (for example):

  1. PR merged to master on edx-platform
  2. GitHub workflow triggered to extract & compile updated source translations.
  3. If source translations differ, a new PR is created and merged into openedx-translations master.

Or is there some other cadence?

What about source translation changes made to the release branches already cut on edx-platform? Will they too have GitHub workflow triggers that keep their source translations in sync with the appropriate release branch on openedx-translations?

Contributor Author:

It's almost exactly like that. Check out a run of the GitHub Action: https://github.com/openedx/openedx-translations/actions/runs/3415232508

That's a good point about translation changes made to release branches.
Though the above is just the current iteration of the action, in the future we can make the action run on the latest release branch as well to keep everything updated while it is supported.

Contributor:

How will the release branch logistics work exactly? I was worried because the Transifex docs state that "the GitHub integration works on top of a single branch."

Contributor Author:

This is a good topic for Follow-up Work.

As a first stab: when a new release is cut, a new branch of the translations corresponding to the release branch will also be made. And then a new project can track that branch on Transifex. We can update the GitHub Action to run for multiple branches to keep the translation source files for the new release up to date. And with every new release, the old release's transifex project will be archived, but the translations will still be available on GitHub in the old release's branch.

The first outstanding issue to this approach would be making sure that Translators have access to this new project as well. It will be up to the Translations WG to manage the latest translation files for the entire platform as well as minor fixes to the latest release.

Contributor:

I am wildly guessing here as I'm not involved in BTR, but I had assumed current state is that translations are frozen at the time the release is cut, and any needed changes are made directly to the translation files in the release's branch for that repo.

(Not saying this is what we should do in the future, just wanted to make sure I am on the right page.)

It might also be nice to include a way for openedx-atlas to sideload translation files for the correct openedx-translation named release branch being deployed/developed on. Not sure how that would work. A --branch parameter? Some sort of auto-detection?

Contributor:

It might also be nice to include a way for openedx-atlas to sideload translation files for the correct openedx-translation named release branch being deployed/developed on. Not sure how that would work. A --branch parameter? Some sort of auto-detection?

the readme on https://github.com/openedx/openedx-atlas documents 2 ways to specify which branch to pull:

  • using --config and specifying a branch in a config file
  • directly specifying the branch using -b/--branch

Comment on lines +131 to +133
Contributor:

What will this GitHub workflow be called?

Contributor Author:

Contributor:

Thank you! Would it be worth adding that link into the OEP?

Member:

@pshiu from OEP-1:

OEPs are not used to dictate small decisions made in every day feature work. See OEP-19 for more detail.

I think the workflow name is a small decision. I wouldn't want to lock Carlos et al into that specific workflow name if it's not critical to this OEP's design. Do you see any harm in leaving it out?

Contributor:

Not at all! Thought it might be a nice add if it was already decided for reference.

Contributor Author:

Since OEP-58's next target status is "Provisional", once it is back "Under Review" with the target status of "Accepted", we can update the details of OEP-58 to reflect the specifics of implementation.

to the openedx-translations repository, the `Transifex GitHub App`_ will link a Transifex
project with a name such as "openedx-translations" to the openedx-translations repository.
A `Transifex GitHub Integration configuration file`_, naming the files that are to be
translated and the trigger that pulls translation files back into the repository, will be
created in the openedx-translations repository. This link will allow the
`Transifex GitHub App`_ to automatically manage the push/pull of the translation files
without the need for human intervention.

.. _edx-platform: https://github.com/openedx/edx-platform
.. _openedx-translations: https://github.com/openedx/openedx-translations

Add `Transifex GitHub App`_ to openedx Organization
===================================================

The `Transifex GitHub App`_ will need to be added to the openedx GitHub organization in
order to grant the app permissions to push/pull the translation files. Currently, we
manage the push/pull permissions for the edx-transifex-bot through a number of GitHub
user groups. Once installed in an organization, the `Transifex GitHub App`_ is granted
permissions to push/pull on a per-repository basis, so by moving all the translation files
to a single repository we eliminate the need for separate translations user groups.

Connect the New Translation Repository to Transifex
===================================================

The Transifex web-app accepts a `Transifex GitHub Integration configuration file`_ for
each Transifex project. By connecting the single repository containing all translation
files, we only need to make a single `Transifex GitHub Integration configuration file`_
that allows the `Transifex GitHub App`_ to manage the translation files. Based on the
Translation Working Group's instruction on acceptable translation/review percentages, we
can set parameters that automatically push and pull translation files.

Copy Transifex's Translation Memory and Combine Translators
===========================================================

As a last step, we will reorganize the openedx Transifex organization by combining
translators and reviewers across Transifex projects into the new project associated with
the new repository. In addition, we can preserve all the progress the Open edX translators
have made by copying Transifex's Translation Memory, the auto-translation feature that
allows projects with similar strings to be translated automatically, from the old
projects to the new one. Once older projects are made redundant by the new project, they
will be deprecated. By moving all the translation files to the same repository, we can
increase the reach of Transifex's Translation Memory feature to help translate similar
strings across the entire code/documentation base.

Get Translations Back for Deployment/Development
================================================

A new Python library, called openedx-atlas, will be created. It will place the
translation files kept in openedx-translations into locally cloned repositories for
development and into the containers that hold the code the translation files were
extracted from. This tool will manage the placement of translation files through an
editable atlas configuration file (atlas.yml) kept in the repositories that have

This may be an implementation detail, but I'm wary of configuration files. If there is a hardcoded atlas.yml file in a repo, then it becomes difficult to use settings that are different from the defaults. For instance, if a user wants to download translation strings for a language that is not present in edx-platform (a very common request) they will have to modify atlas.yml, and thus fork edx-platform. Nobody wants that.

Instead, I recommend implementing good defaults for openedx-atlas and making it possible to override them easily via the CLI. For instance, by default openedx-atlas could download strings for all languages with a 50%+ verified rate (as i18-tool currently does), and we could add extra languages or modify the minimum required rate from the CLI.

Furthermore, my experience with openedx-i18n makes me doubt whether a CLI tool is even necessary. In Tutor translation strings are simply downloaded and moved to a dedicated directory with a single curl command.
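To make the "good defaults, overridable from the CLI" suggestion concrete, here is a minimal sketch. The flag names (`--min-reviewed`, `--extra-language`), the program name, and the statistics are all hypothetical and are not part of openedx-atlas; only the 50% default comes from the comment above.

```python
import argparse

# Default threshold taken from the comment above: 50%+ verified rate.
DEFAULT_MIN_REVIEWED = 50.0


def select_languages(reviewed_percent, min_reviewed=DEFAULT_MIN_REVIEWED, extra=()):
    """Return sorted language codes meeting the threshold, plus explicit extras."""
    selected = {lang for lang, pct in reviewed_percent.items() if pct >= min_reviewed}
    selected.update(extra)
    return sorted(selected)


def build_parser():
    """Hypothetical flags for illustration; not the real openedx-atlas interface."""
    parser = argparse.ArgumentParser(prog="pull-translations")
    parser.add_argument("--min-reviewed", type=float, default=DEFAULT_MIN_REVIEWED)
    parser.add_argument("--extra-language", action="append", default=[])
    return parser


# Made-up review statistics, for illustration only.
stats = {"ar": 92.0, "es_419": 100.0, "fr": 61.5, "ja_JP": 12.0}

# A user asks for one extra language without editing any file in the repository.
args = build_parser().parse_args(["--extra-language", "ja_JP"])
print(select_languages(stats, args.min_reviewed, args.extra_language))
```

With this shape, running with no flags gives the default behaviour, and nobody needs to fork a repository to add a language.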

Contributor Author

openedx-atlas already provides the ability to override default parameter values and an atlas.yml configuration file by using the CLI's flags.

atlas pull -r "openedx/openedx-translations" -b "main" -d "credentials"

This would pull translation files from the credentials directory in the main branch
from the repository openedx/openedx-translations.

There are many other features that could be beneficial, such as the ones you described in your comment. We can hash out these implementation details further in Follow-up Work.
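For scripts that need to call the tool, the `atlas pull` invocation quoted above could be assembled as follows. This is only a sketch: the helper name is hypothetical, and the `-r`, `-b`, and `-d` flags are taken from the comment above rather than from any released documentation.

```python
import shlex


def atlas_pull_command(repository, branch, directory):
    """Build the argv for the `atlas pull` call quoted in the comment above."""
    return ["atlas", "pull", "-r", repository, "-b", branch, "-d", directory]


cmd = atlas_pull_command("openedx/openedx-translations", "main", "credentials")

# Print the command for inspection; a caller could pass `cmd` to subprocess.run.
print(shlex.join(cmd))
```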

Member

semi-related thread: #367 (comment)

translation files kept in openedx-translations. The atlas.yml file will support
options that allow for the concatenation, reorganization, and reformatting of translation
files as they are copied to their locations amongst the code. The atlas.yml file
will also support selecting which languages are included in an Open edX deployment. The
tool will have to be run as part of the setup of a repository, whether for development
or deployment.

Contributor

Adapting our development & deployment tooling seems like one of the biggest impact areas of this OEP. Has any research been done in regards to how openedx-atlas will be integrated into Open edX's supported development & deployment environments (devstack, Tutor, tubular, edx_ansible), and what work it will take?

Contributor Author

Since openedx-atlas works as a configurable wrapper to git's sparse-checkout, openedx-atlas can be used to pull translations from openedx-translations right after git clone is used to pull the repository from GitHub. This should cover most supported development & deployment environments, though I'm curious if there are any others that may not work like this.
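As a rough illustration of what "a wrapper to git's sparse-checkout" can mean, the sketch below builds the git commands that fetch a single directory from a repository. It is an assumption about the general technique, not a description of how openedx-atlas is actually implemented; the function name and the destination directory are hypothetical.

```python
def sparse_checkout_commands(repo_url, branch, directory, dest):
    """Return git commands that check out only `directory` of `repo_url` into `dest`."""
    return [
        # Shallow, blob-less clone; --sparse limits the working tree to root-level files.
        ["git", "clone", "--depth", "1", "--filter=blob:none", "--sparse",
         "--branch", branch, repo_url, dest],
        # Widen the sparse checkout to include the requested directory.
        ["git", "-C", dest, "sparse-checkout", "set", directory],
    ]


commands = sparse_checkout_commands(
    "https://github.com/openedx/openedx-translations",
    "main",
    "credentials",
    "translations-checkout",  # hypothetical destination directory
)
for command in commands:
    # Printed rather than executed so the sketch has no side effects.
    print(" ".join(command))
```

Each command could be passed to `subprocess.run(command, check=True)` right after the main repository has been cloned.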

Contributor

This should cover most supported development & deployment environments, though I'm curious if there are any others that may not work like this.

Having atlas built in python adds complexity to the deployment of MFEs. It seems there are a few paths that we could go down

The first option would be to keep atlas in python, and to use the python implementation everywhere

Pros:
* Flexible/Extensible
* Easy to maintain (only need to maintain one implementation)
Cons:
* Requires python in build/deployment envs

Another option would be to write atlas in bash, that way python would not be required in build/deployment environments for MFEs.

Pros:
* Runs (pretty much) everywhere
Cons:
* Bash can get messy when projects get complex

A third option would be to also write atlas functionality in js and package for NPM so MFEs (or frontend-build) could import it.

Pros:
* Easy integration for MFEs
* Flexible/extensible
Cons:
* Requires maintaining duplicated functionality (need to implement everything in both js and python)

If the scope of atlas stays limited, I'd lean towards the bash option. If there's an expectation of increased complexity within atlas, however, there are definite benefits to using a full-fledged programming language.

Member

@kdmccormick kdmccormick Nov 15, 2022

@bmtcril mentioned a fourth option: get rid of openedx-atlas, and publish releases of .tar.gz/.zip files from the openedx-translations repository

Pros:
* No sparse checkout logic needed, no separate CLI tool needed.
* All the complexity would be in the openedx-translations repo; other repos would just download & unpack the .tar.gzs.
Cons:
* Would need to add automation to the openedx-translations repo to regularly publish these
* The .tar.gzs would contain every language (or every supported language?)
  * It's unclear whether the .tar.gzs would be too big or not.
     edx-platform's locale directory is currently 60 MB, which gzips down to 14 MB.
* Is there other stuff we think openedx-atlas would do that we'd be leaving out with this option?
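A minimal sketch of the automation this option would need, assuming the translations repository has one top-level directory per source repository (that layout and the function name are assumptions for illustration). It writes one `.tar.gz` per directory using only the standard library.

```python
import tarfile
import tempfile
from pathlib import Path


def publish_archives(translations_root, output_dir):
    """Write one <name>.tar.gz per top-level directory under translations_root."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for repo_dir in sorted(Path(translations_root).iterdir()):
        if not repo_dir.is_dir():
            continue
        archive = output_dir / f"{repo_dir.name}.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            # arcname keeps paths inside the archive relative to the repo directory.
            tar.add(repo_dir, arcname=repo_dir.name)
        archives.append(archive)
    return archives


# Demo with a made-up layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "translations"
    po_file = root / "credentials" / "locale" / "fr" / "LC_MESSAGES" / "django.po"
    po_file.parent.mkdir(parents=True)
    po_file.write_text('msgid "Hello"\nmsgstr "Bonjour"\n', encoding="utf-8")
    built = publish_archives(root, Path(tmp) / "dist")
    print([archive.name for archive in built])
```

Consumers would then only need to download and unpack the archive for their repository, with no sparse checkout logic.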

Contributor

Solely from the MFE repo perspective, I'm for gzips. GitHub workflows for publishing them on every commit would be pretty simple. And I very much doubt download sizes will be problematic. For all I know tx pull could be even less efficient. 🤷🏼

What would stop us from storing translations per-repo in openedx-translations, though? That would make each individual pull pretty tiny.

If that doesn't work, for whatever reason, I'd vote for bash.

Member

@arbrandes

What would stop us from storing translations per-repo in openedx-translations, though? That would make each individual pull pretty tiny.

That is indeed the suggestion on the table -- one gzip published per repo 🙂 It's just that each gzip would have every language unless we wanted to publish R*L gzips, where R is the number of repos and L is the number of languages.


Impacts
*******

Impact on Translators
=====================

As we approach the end of the translation upgrade process, we will need to tactically
move from multiple Transifex projects to a single project. This will require coordination
with our translators to ensure that moving forward they are providing translations in the
right place.

Impact on Site Operators
========================

Currently, the translation files for any given service or library are stored in the same
place as the code, which has generally simplified the deployment story in the past. With
this change, the translation files will move to their own repository. As we deprecate
the old translation files, the relevant deployment tooling will need to be updated to
pull down the translations from the new repository as part of the deployment process.
This will impact both the old Ansible-based tooling and any new Docker-based tooling.

Impact on Developers
====================

While this change won’t directly impact the day-to-day workflow of developers (unless
they are developing or testing with translation files), for the same reason that it impacts
Comment on lines +217 to +218
Contributor

I'd like to suggest there might be a non-negligible direct impact to a developer's workflow, which is to learn to use the new openedx-atlas tooling and to run (or re-run) it as necessary on pulling each repo a developer works in.

Contributor Author

Learning how and when to use openedx-atlas may be a non-negligible learning task, but then again so is learning any new package. In the future, it can be added to Makefiles to minimize the learning load just as i18n-tools commands were.

Contributor

I get the impression that for most dev workflows the atlas usage will be pretty straightforward. Any friction can also be mitigated by ensuring repos include a section in the readme describing any simple atlas commands a dev may need to run. Furthermore, atlas use can be integrated into any scripts that currently depend on translation files existing within the repo.

Contributor

@brian-smith-tcril, I really like the README addition idea and integrating openedx-atlas into any scripts we currently use within our repos.

I definitely agree the impact will be minimal; it's clear openedx-atlas was designed with that in mind. But minimal impact does exist, and I think developer education is a great way to address that impact.

site operators (new translations location), we will have to update development tools as
well. In addition, we will create new instructions for developers on how to enable
translations for a new service/repo when it comes online.

Locations
*********

Dumps of the translation/localization files from Transifex for the Open edX Release
project already exist in a repository with the name of openedx/openedx-i18n. A new

Contributor

Can we deprecate this openedx/openedx-i18n repo as part of this OEP too?

Contributor Author

I'd like to get @regisb's input on whether the proposed solution would allow us to also deprecate openedx/openedx-i18n before we commit to removing it in this OEP.


If the new openedx-translations repo contains strings for all languages and repos and it follows the regular minor/major release schedule, then yes we will be able to get rid of the openedx-i18n repo. I'm looking forward to it :)

repository named openedx/openedx-translations will be similarly structured, but it will
contain the translation files for all repositories within openedx. The
`Transifex GitHub App`_ will be installed in the openedx organization. Similar to how the
Build-Test-Release Working Group creates a new branch for each new named release of
edx-platform, translation releases will also be kept in branches corresponding to
edx-platform releases.

Rejected Alternatives
*********************

Rewriting the Current Tooling for the New API
=============================================

The source code for the edx-transifex-bot can be found in `ecommerce-scripts`_. We could

Contributor

Can we deprecate the edx-transifex-bot as part of this OEP too?

Contributor Author

OEP-58 will limit the actions of the edx-transifex-bot, but we cannot fully deprecate the bot because it will instead be used as a virtual user that acts as the point of contact for the GitHub Action that generates translation source files and for the Transifex GitHub App.

Contributor

@brian-smith-tcril brian-smith-tcril Nov 9, 2022

just to add context here, if you look through https://github.com/openedx/openedx-translations/commits/main you'll see commits by edx-transifex-bot with messages that start with chore: add extracted translation source files

rewrite the current tooling to try to solve the problems encountered in the last two Open
edX releases and upgrade to the new API, but this approach is a patch-up job that will
not address several other issues mentioned and would have to be undertaken by the
community member with exclusive access to the legacy infrastructure currently running the
edx-transifex-bot.

Making a Transifex Project for Each Repository

Contributor

Would the administrative overhead behind making a Transifex Project for each repository be relieved by creating tooling around user and project management features of the Transifex API?

On the other hand, I'd guess an argument to add is that Transifex translation memory groups are not available at the open source tier of Transifex?

Contributor Author

Possibly, but then we have to consider that a mostly non-technical Translation WG has to learn how to use the Transifex API. And then Translators may have to be part of 100+ teams to translate files for each repository, and learn to navigate the hundreds of projects to see their progress.

Contributor

@pshiu for this and some other comments, it would be useful to spell out your use cases (both ones you currently follow and new ones you're proposing - and for the latter, justification to the need)

Contributor

@sarina Sorry for the confusion! My comment was meant to address the rejected alternative of making a Transifex project for each repository. The suggestion was that perhaps tooling could be created to remedy the concerns that caused this alternative to be rejected. (For example, perhaps a bot that auto-adds and removes translators from all relevant projects.)

@Carlos-Muniz I thought that even in a consolidated repository scenario, translators still have to navigate one Transifex Resource per original source repository (resulting in the same problem)?

Contributor Author

@pshiu Translators are capable of navigating between Transifex Resources (translation files) within one Transifex Project, as they already do with the current translation project "edx-platform". File structures are flattened within a Transifex project, so they will not have to search through directories to find the resources they are to translate. But having translators look through 100+ Transifex projects for 1-2 resources in each, and then expecting a Transifex admin to track the progress of each language in each project, is a far more labor-intensive alternative.

Tooling + a dashboard to track progress would have to be made at the very least. And with all those extra moving parts, we may find ourselves with a pile of spaghetti that would be very difficult to untangle.

Contributor

@Carlos-Muniz After diving into Transifex's UI, here's what I now understand:

That last bullet means (like you said) that to translate or review a string across projects, a translator will either have to:

  1. Click on the editor links in the string search, which loads an editor in a new tab, or
  2. Know which project a string belongs to and switch to that project in the editor's project dropdown menu.

Both are tedious for the translator and a bad user experience. The fact that the Transifex editor does not support editing strings across multiple projects is (at least for me) the winning argument justifying the consolidation of all translation strings into a single Transifex project.

I didn't understand this when I first read the OEP, so I really appreciate your time in enlightening me. Thanks for your help.

==============================================

As translation support is provided for more repos, the effort to maintain the
translations infrastructure increases. A Transifex Project houses the content to be
translated and needs to be created before any content can be added for translation.
Transifex Projects can only support one GitHub repository each and need to be maintained
separately. Maintaining a Transifex Project involves adjusting configuration files,
adding new languages, assigning translators to projects, and other irregular tasks that
would be time-consuming at a larger scale. If we add a Transifex Project for every
repository, each one will need to be maintained separately, making it time-consuming to
debug issues or track the progress of each project.

.. _Transifex GitHub App: https://github.com/apps/transifex-integration
.. _Transifex GitHub Integration configuration file: https://docs.transifex.com/transifex-github-integrations/github-tx-ui#linking-a-specific-project-with-a-github-repository