Apertus: Swiss Open Source LLM

EPFL, ETH Zurich, and the Swiss National Supercomputing Centre (CSCS) have released Apertus, Switzerland’s first large-scale open, multilingual language model:

Apertus is designed with transparency at its core, thereby ensuring full reproducibility of the training process. Alongside the models, the research team has published a range of resources: comprehensive documentation and source code of the training process and datasets used, model weights including intermediate checkpoints – all released under a permissive open-source license, which also allows for commercial use. The terms and conditions are available via Hugging Face.

Apertus was developed with due consideration to Swiss data protection laws, Swiss copyright laws, and the transparency obligations under the EU AI Act. Particular attention has been paid to data integrity and ethical standards: the training corpus builds only on data which is publicly available. It is filtered to respect machine-readable opt-out requests from websites, even retroactively, and to remove personal data, and other undesired content before training begins.
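To make the opt-out filtering a bit more concrete, here is a minimal sketch of what honoring machine-readable opt-outs could look like. It assumes the opt-out is expressed through `robots.txt` (the announcement does not say which mechanism is used) and that "retroactively" means checking already-crawled URLs against a site's current `robots.txt`. The function names and the `ExampleBot` user agent are made up for illustration; this is not the Apertus pipeline.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def filter_corpus(urls, robots_by_host, user_agent):
    """Keep only URLs whose host does not opt out for user_agent.

    robots_by_host is a hypothetical mapping from host name to the text of
    that host's current robots.txt. Hosts without an entry are kept.
    """
    kept = []
    for url in urls:
        host = urlsplit(url).netloc
        robots_txt = robots_by_host.get(host)
        if robots_txt is None or allowed_by_robots(robots_txt, user_agent, url):
            kept.append(url)
    return kept


if __name__ == "__main__":
    robots = {"a.example": "User-agent: *\nDisallow: /\n"}
    crawled = ["https://a.example/x", "https://b.example/y"]
    print(filter_corpus(crawled, robots, "ExampleBot"))
```

Running the filter against the latest `robots.txt` rather than the one seen at crawl time is what would make it retroactive under this reading.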

Apertus is available on Hugging Face:

The performance isn’t really there yet, but this is a great step forward. It would be interesting to see if this system meets the OSAID!

Edit: it looks like they have an acceptable use policy :frowning:

The AUP contains obligations that are not present in the Apache-2.0 license. However, even if someone agrees to the AUP on Hugging Face to download the Apertus model, if they subsequently redistribute the unmodified model under Apache-2.0, downstream recipients need only comply with Apache-2.0 and do not need to agree to the AUP (the redistributor’s own AUP obligations may still remain).

I do find this part of the use policy interesting:

The training data and the Apertus LLM may contain or generate information that directly or indirectly refers to an identifiable individual (Personal Data). You process Personal Data as independent controller in accordance with applicable data protection law. SNAI will regularly provide a file with hash values for download which you can apply as an output filter to your use of our Apertus LLM. The file reflects data protection deletion requests which have been addressed to SNAI as the developer of the Apertus LLM. It allows you to remove Personal Data contained in the model output. We strongly advise downloading and applying this output filter from SNAI every six months following the release of the model.
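For anyone wondering how a hash-based output filter like this might be applied in practice, here is a rough sketch. The policy does not describe the file format, so everything below is an assumption: that the file holds one SHA-256 hex digest per line, that each digest is computed over a lowercased, whitespace-normalized string with surrounding punctuation stripped, and that matching is done on short word sequences. The function names are hypothetical and are not SNAI's actual tooling.

```python
import hashlib
import string


def sha256_hex(text: str) -> str:
    """Hash a normalized form of text (assumed normalization, see lead-in)."""
    normalized = " ".join(text.lower().split()).strip(string.punctuation)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def load_blocklist(lines) -> set[str]:
    """Read one hex digest per line, ignoring blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def redact(output: str, blocked: set[str], max_ngram: int = 3,
           mask: str = "[REDACTED]") -> str:
    """Replace any run of 1..max_ngram words whose hash is in the blocklist."""
    tokens = output.split()
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so full names win over single words.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if sha256_hex(candidate) in blocked:
                result.append(mask)
                i += n
                break
        else:
            # No candidate starting at this position matched; keep the word.
            result.append(tokens[i])
            i += 1
    return " ".join(result)


if __name__ == "__main__":
    blocked = load_blocklist([sha256_hex("Jane Doe")])
    print(redact("Contact Jane Doe for details", blocked))
```

Because only hashes are distributed, the file itself would not reveal the personal data it covers, but the filter only catches strings that normalize to exactly what was hashed.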

In particular, they ask you to:

process Personal Data as independent controller in accordance with applicable data protection law.

I wonder if use policies that require you to “comply with all applicable law” violate the OSD. It seems to me that such a clause is redundant, since the law would apply to you anyway.

As another example, I like how gpt-oss has a simple, straightforward use policy:

We aim for our tools to be used safely, responsibly, and democratically,
while maximizing your control over how you use them.
By using OpenAI gpt-oss-120b, you agree to comply with all applicable law.

I think that it goes without saying that people have to comply with applicable law, because it’s the law. I don’t think we actually need to proactively mention it tbh!

In the Apertus AUP, I believe the following clause could constitute an additional restriction:

By using the Apertus LLM you agree to indemnify, defend, and hold harmless ETH Zurich and EPFL against any third-party claims arising from your use of Apertus LLM.