• text corpora
  • given names
    • Census.gov
    • Given Name Frequency Project
    • data.govt.nz
    • Wiktionary (available as a downloadable dump)
  • CommonCrawl.org: web crawl data
  • more at Wikipedia's Category:Open_data