      Introduction to the American National Corpus (ANC)
      Author: admin  Source: original to this site  Updated: 2011-11-16  Entered by: admin  Editor: admin




       

      ANC = The American National Corpus

      http://www.anc.org/ 

       

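      The American National Corpus (ANC) is currently the largest corpus of contemporary American English usage. It covers written texts of all kinds, and transcripts of spoken material, produced from 1990 onward. Two releases of the ANC have been published so far: the first contains over 10 million words of spoken and written American English, and the second contains over 22 million words.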

      The First Release of the ANC

      The First Release of the ANC is a beta version. It contains over 10,000,000 words of written and spoken American English, annotated for lemma and part of speech. It is available for research and education for a nominal licensing fee from the Linguistic Data Consortium. Commercial users can obtain the corpus and gain rights to use it in commercial products by joining the ANC Consortium.

      The texts included in the first 10 million words of the ANC are those that were received first, so the corpus is not balanced. Neither the XML tagging nor the part-of-speech annotation has been hand-validated. Headers are minimal, although they contain fairly complete information concerning domain, subdomain, subject, audience, and medium. Check the list of known bugs and caveats for a description of the limitations we are currently aware of.

      One of the aims of releasing this first 10 million words is to get feedback from the community about its structure and annotation, so that modifications can be made, if necessary, for the final release of the full 100 million words. We therefore invite comments and bug reports from the community of ANC users. Please contact anc@cs.vassar.edu.

      The Second Release of the ANC

      The Second Release of the American National Corpus contains over 22,000,000 words of written and spoken American English, annotated for lemma, part of speech, noun chunks, and verb chunks. Part-of-speech tags using the Penn tagset are included for all data in the Second Release, and many documents are also PoS-tagged using the Biber tagset.
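
      To make the annotation layers described above more concrete, here is a minimal Python sketch that reads lemma and Penn-style part-of-speech values from token markup and tallies the tags. The element and attribute names (`tok`, `lemma`, `pos`) and the sample sentence are hypothetical, chosen only for illustration. They are not the ANC's actual file format, which should be checked against the corpus documentation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample markup: the names tok, lemma and pos are
# illustrative assumptions, not the ANC's actual schema.
SAMPLE = """
<s>
  <tok lemma="the" pos="DT">The</tok>
  <tok lemma="corpus" pos="NN">corpus</tok>
  <tok lemma="contain" pos="VBZ">contains</tok>
  <tok lemma="text" pos="NNS">texts</tok>
</s>
"""

def pos_counts(xml_text):
    """Count how often each part-of-speech tag occurs."""
    root = ET.fromstring(xml_text.strip())
    return Counter(tok.get("pos") for tok in root.iter("tok"))

def lemmas(xml_text):
    """Return the lemma of every token in document order."""
    root = ET.fromstring(xml_text.strip())
    return [tok.get("lemma") for tok in root.iter("tok")]

if __name__ == "__main__":
    print(pos_counts(SAMPLE))
    print(lemmas(SAMPLE))
```

      The same pattern would apply to chunk annotations, given the real element names from the corpus files.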

      The ANC Second Release is available for research and education for a nominal licensing fee from the Linguistic Data Consortium. Commercial users can obtain the corpus and gain rights to use it in commercial products by joining the ANC Consortium. Please consult the LDC Catalog entry for the ANC Second Release.

      The First and Second Releases of the ANC include only the materials acquired to date, so the current release of the ANC is not balanced. Neither the XML tagging nor the annotation has been hand-validated. Headers are typically minimal, although most contain complete information concerning domain, subdomain, subject, audience, and medium. Check the list of known bugs and caveats for a description of the limitations we are currently aware of.

      One of the aims of the Second Release is to get feedback from the community about its structure and annotation, so that modifications can be made, if necessary, for the final release of the full 100 million words. We therefore invite comments and bug reports from the community of ANC users. Please contact anc@cs.vassar.edu.

      ANC address:

      http://www.anc.org/


       
