      Introduction to the American National Corpus (ANC)
      Author: admin  Source: original to this site  Updated: 2011-11-16  Entered by: admin  Editor: admin




       


       

      ANC = The American National Corpus

      http://www.anc.org/ 

       

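      The American National Corpus (ANC) is currently the largest corpus documenting contemporary American English usage. It covers written materials of all kinds, together with transcripts of spoken material, produced from 1990 onward. Two releases of the ANC have been published: the first contains 10 million words of spoken and written American English, and the second contains 22 million words.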

      The First Release of the ANC

      The First Release of the ANC is a beta version. It contains over 10,000,000 words of written and spoken American English, annotated for lemma and part of speech. It is available for research and education for a nominal licensing fee from the Linguistic Data Consortium. Commercial users can obtain the corpus and gain rights to use it in commercial products by joining the ANC Consortium.

      The texts included in the first 10 million words of the ANC are those that were first received. Therefore the corpus is not balanced. There has been no hand-validation of the XML tagging or the part of speech annotation tags. Headers are minimal, although they contain fairly complete information concerning domain, subdomain, subject, audience, and medium. Check the list of known bugs and caveats for a description of the limitations we are currently aware of.

      One of the aims of releasing this first 10 million words is to get feedback from the community about its structure and annotation, so that modifications can be made, if necessary, for the final release of the full 100 million words. We therefore invite comments and bug reports from the community of ANC users. Please contact anc@cs.vassar.edu.

      The Second Release of the ANC

      The Second Release of the American National Corpus contains over 22,000,000 words of written and spoken American English, annotated for lemma, part of speech, noun chunks, and verb chunks. Part of speech tags using the Penn tagset are included for all data in the Second Release, and many documents are also PoS-tagged using the Biber tagset.

      The ANC Second Release is available for research and education for a nominal licensing fee from the Linguistic Data Consortium. Commercial users can obtain the corpus and gain rights to use it in commercial products by joining the ANC Consortium. Please consult the LDC Catalog entry for the ANC Second Release.

      The First and Second Releases of the ANC include materials which have been acquired to date, and therefore the current release of the ANC is not balanced. There has been no hand-validation of the XML tagging or the annotation. Headers are typically minimal, although most contain complete information concerning domain, subdomain, subject, audience, and medium. Check the list of known bugs and caveats for a description of the limitations we are currently aware of.

      One of the aims of the Second Release is to get feedback from the community about its structure and annotation, so that modifications can be made, if necessary, for the final release of the full 100 million words. We therefore invite comments and bug reports from the community of ANC users. Please contact anc@cs.vassar.edu.

      ANC address:

      http://www.anc.org/

      More corpus addresses:

      /Article/201111/2702.html 

       
