---
license: cc-by-nc-4.0
---
# WSC-ASR-eval

To address the unique linguistic characteristics of Sichuanese in speech recognition, we propose **WSC-ASR-eval**, a benchmark specifically designed for evaluating Sichuanese ASR systems. It is tailored to assess model performance across diverse utterance lengths, domains, and linguistic phenomena of Sichuanese speech.

The test set annotations are provided by Beijing AISHELL Technology Co., Ltd.

## Annotation Rules
- Annotate the text, gender, age and emotional tone;
- For multi-speaker audio (where speakers are distinguishable): annotate the text in chronological order, enclose the text of non-primary speakers in parentheses (); paralinguistic information shall be annotated based on the primary speaker;
- For English content: spell out uppercase letters with spaces between them and use lowercase for words; no spaces between English and Chinese text;
- `()` indicates inaudible speech from non-primary speakers;
- `(Text content)` indicates audible speech from non-primary speakers;
- `*` indicates that some characters or words are inaudible or that their exact content is uncertain.

## Subset Division
- Short/Long: audio of 0~10 s is classified as short, and audio of 10~30 s is classified as long. The two are merged into the full version of WenetSpeech-Chuan-Eval-ASR.
- Easy/Hard: classified by the difficulty of the recognition scenario. The Easy scenario mainly comprises quiet settings (audiobooks, recitations), while the Hard scenario features noisy environments (live streams, short videos, etc.). These two parts are merged into the full version of WenetSpeech-Chuan-Eval-ASR, which is consistent with the subset division presented in the paper.
- NOTE: The Long subset provides the original long audio clips, while the Easy and Hard subsets use the short audio clips obtained by segmenting those long clips.
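
The duration rule above can be sketched as a small helper. This is only an illustration, not the tooling used to build the benchmark: the function name `duration_subset` is hypothetical, and the handling of the boundaries (a clip of exactly 10 s counted as long, clips over 30 s left unassigned) is an assumption, since the ranges above do not say which side the endpoints fall on.

```python
from typing import Optional


def duration_subset(seconds: float) -> Optional[str]:
    """Map a clip duration in seconds to "short" or "long".

    Assumption: exactly 10 s is treated as long, and clips longer
    than 30 s fall outside both subsets (returns None).
    """
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    if seconds < 10.0:
        return "short"
    if seconds <= 30.0:
        return "long"
    return None


if __name__ == "__main__":
    for d in (3.2, 10.0, 27.5, 45.0):
        print(d, duration_subset(d))
```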

## Standardized Processing
- Remove all punctuation;
- Delete `*`;
- Keep the text inside `()`, and discard the parentheses themselves.
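
As a rough illustration of these rules, the following sketch normalizes a single annotated transcript before scoring. It is not the official scoring script: the function name `normalize_transcript` is hypothetical, and treating full-width parentheses like ASCII ones, defining punctuation through Unicode categories, and collapsing leftover whitespace are all assumptions.

```python
import unicodedata


def normalize_transcript(text: str) -> str:
    """Apply the three processing rules to one annotated transcript."""
    kept = []
    for ch in text:
        # '*' marks inaudible or uncertain content and is deleted.
        if ch == "*":
            continue
        # Parentheses are dropped, but the text between them is kept.
        # Handling full-width parentheses is an assumption.
        if ch in "()（）":
            continue
        # Unicode categories starting with "P" cover punctuation,
        # including Chinese marks such as "，" and "。".
        if unicodedata.category(ch).startswith("P"):
            continue
        kept.append(ch)
    # Collapse whitespace left behind by removed symbols (assumption).
    return " ".join("".join(kept).split())


if __name__ == "__main__":
    print(normalize_transcript("你好，(嗯)今天*天气不错。()"))
```

Spaces between spelled-out English letters are preserved, so a reference such as `A S R模型` keeps its letter spacing after normalization.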

