NASK has published a new variant of the Polish PLLuM artificial intelligence model

The new PLLuM variant published on Wednesday was trained on a larger dataset, fine-tuned for new tasks, including official ones, and better secured against attacks, the NASK Institute announced in a press release. The updated Polish model was released in three versions.
According to NASK-PIB, the new variant of the Polish AI model, PLLuM-12B-nc-250715, was trained on a larger and "significantly better" prepared dataset; it was also fine-tuned for new tasks, including official (public-administration) tasks.
The model was trained on texts from the gov.pl domain, the Public Information Bulletin, and the Science Library, among others. "The data is collected in full compliance with Polish and European law," emphasized Dr. Agnieszka Karlińska from NASK's Linguistic Engineering and Text Analysis Department in a press release.
As reported, the updated PLLuM is available in three variants: a basic one; an instructional one, adapted to a range of tasks; and a "trained" one, meaning protected against abuse. This last version is "the most advanced," NASK emphasized. It is based on sets of prompts (queries - PAP) and responses rated by humans as better or worse, which helps it respond more precisely and securely during conversations.
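To illustrate what "prompts and responses rated by humans as better or worse" can look like as data, here is a minimal Python sketch. NASK has not described its data format in this release, so the class names, the numeric score, and the pairing rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class RatedResponse:
    text: str
    score: int  # hypothetical human rating; higher means better


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str    # response the annotators rated higher
    rejected: str  # response the annotators rated lower


def build_pairs(prompt: str, responses: list[RatedResponse]) -> list[PreferencePair]:
    """Turn human-rated responses to one prompt into better/worse pairs."""
    pairs = []
    for a, b in combinations(responses, 2):
        if a.score == b.score:
            continue  # a tie carries no preference signal
        better, worse = (a, b) if a.score > b.score else (b, a)
        pairs.append(PreferencePair(prompt, better.text, worse.text))
    return pairs


# Invented example data, not taken from any PLLuM dataset.
example = build_pairs(
    "How do I renew my ID card?",
    [
        RatedResponse("Step-by-step answer", 5),
        RatedResponse("Vague answer", 2),
        RatedResponse("Off-topic answer", 2),
    ],
)
print(len(example))  # 2
```

Pairs of this kind are a common input for preference-based fine-tuning methods; which method was actually used for PLLuM is not stated here.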
The models in this series are available on the Hugging Face platform, where any user can download them after completing a form, NASK added.
"From the outset, we have taken the position that mass copying of ready-made AI models, so-called strong LLMs, is associated with a number of risks. Therefore, we are developing a methodology for the controlled generation of synthetic data, i.e., data that is created using other models but is verified and validated by humans," said Dr. Piotr Pęzik, professor at the University of Lodz, operational manager of the HIVE AI project, responsible for the Polish model. This allows PLLuM to better understand the Polish cultural context, respond more precisely, and generate fewer unnecessary and random words, the authors explained.
The model was also further secured against attacks, NASK assured. Vulnerability tests showed that the success rate of prompt attacks was reduced to 2-3 cases per 100 attempts. This is significantly lower than for other open models, NASK noted. A prompt injection attack is a technique in which an attacker sends "malicious" instructions to bypass the model's security measures, which can result in, for example, the generation of harmful responses or the disclosure of confidential information.
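The "2-3 cases per 100 attempts" figure corresponds to an attack success rate of about 2-3%. The sketch below shows how such a rate is computed from a list of test outcomes. The function name and the sample outcomes are hypothetical, and NASK's actual test procedure is not described in the release.

```python
def attack_success_rate(outcomes: list[bool]) -> float:
    """Fraction of attack attempts that bypassed the model's safeguards.

    Each entry in `outcomes` is True if that attempt succeeded.
    """
    if not outcomes:
        raise ValueError("at least one attempt is required")
    return sum(outcomes) / len(outcomes)


# Hypothetical run: 3 successful attacks out of 100 attempts.
outcomes = [True] * 3 + [False] * 97
print(f"{attack_success_rate(outcomes):.0%}")  # 3%
```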
Dr. Karlińska announced that the HIVE consortium will "soon" present the second product from the PLLuM family: a prototype citizen assistant (chatbot) that researchers will use to collect prompts for deploying PLLuM models in the mObywatel application.
NASK said that further releases from the HIVE AI consortium will follow in the coming weeks.
PLLuM is a language model created for government, businesses, and researchers, as well as for citizens – in the form of a chatbot. It premiered at the end of February this year. At that time, the Ministry of Digital Affairs announced the establishment of HIVE AI, a consortium of Polish research centers and institutions focused on digital services, led by NASK-PIB. The consortium is developing new Polish-language PLLuM language models and implementing them in public administration units. (PAP)
mbl/ mick/
naukawpolsce.pl