3 years ago · 318907ad3d
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,5 @@
 .DS_Store
 config.env
 config.env.prod
 data
 web/app/__pycache__/
--- a/LICENSE
+++ b/LICENSE
@@ -0,0 +1,21 @@
 MIT License

 Copyright (c) 2021 Simon Bowie <ad7588@coventry.ac.uk>

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is furnished
 to do so, subject to the following conditions:

 The above copyright notice and this permission notice (including the next
 paragraph) shall be included in all copies or substantial portions of the
 Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
 FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS
 OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
 OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--- a/README.md
+++ b/README.md
@@ -0,0 +1,53 @@
 # Archival Conversations patents data search engine

 This repository contains the Docker Compose, Nginx, Python, and Solr config files for deploying the development environment for the Archival Conversations patents data search engine site.

 ## to deploy environment

 ### config.env

 To deploy this environment, first copy config.env.template to a new file, config.env. Fill in the appropriate environment variables.

 Note that on Mac the Python container has to communicate with the Solr container using the hostname 'host.docker.internal' rather than 'localhost' or '127.0.0.1': https://stackoverflow.com/questions/24319662/from-inside-of-a-docker-container-how-do-i-connect-to-the-localhost-of-the-mach

 On Linux, you can use the container name e.g. 'solr' as the Solr hostname in config.env.
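
As a rough illustration of how these variables fit together, the sketch below builds a Solr core URL from the values in config.env. The function name `solr_core_url` and the fallback values are assumptions made for this example (the fallbacks follow the "usually ..." hints in config.env.template and the Linux container name mentioned above); the application's actual code may differ.

```python
import os


def solr_core_url(env=os.environ):
    """Build the base URL of a Solr core from config.env-style variables.

    Hypothetical helper for illustration only; not taken from the app code.
    """
    # Assumed fallbacks: container name 'solr', port 8983, core 'all'.
    host = env.get("SOLR_HOSTNAME") or "solr"
    port = env.get("SOLR_PORT") or "8983"
    core = env.get("SOLR_CORE") or "all"
    return f"http://{host}:{port}/solr/{core}"
```

On Mac, setting SOLR_HOSTNAME to 'host.docker.internal' would make this helper return a URL on that host instead of the 'solr' container name.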

 ### Docker Compose

 In the command line, navigate to the directory where this repository is stored on your local machine and run:

 `docker-compose up -d --build`

 Docker should build the application environment, comprising a Python container (including ImageMagick), an Apache Solr container (set up for .rtf indexing following the instructions at https://github.com/docker-solr/docker-solr), and an Nginx web server to serve the website.

 The website should then be available in the browser at 'localhost:5000'.

 To take down the environment, run:

 `docker-compose down`

 ## populating Apache Solr

 In order to fill the site with documents, you will have to populate the Apache Solr search engine. There is a solr_import.sh script to help with this. Place whatever files you want indexed in a directory called 'data' within the main directory.

 In solr_import.sh, change the directory to point to the main directory and, if necessary, change the location parameters for the various cores.

 We use different Solr cores for the different themes on the site: 'all' is a core containing all documents while 'active', 'expanding', etc. contain only documents for that theme.
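
To make the one-core-per-theme layout concrete, here is a minimal sketch of how a search request could be routed to the right core. The helper name `select_url` and its parameters are hypothetical; it only assumes Solr's standard `/solr/<core>/select` endpoint with the `q` and `rows` query parameters.

```python
from urllib.parse import urlencode


def select_url(base, theme=None, query="*:*", rows=10):
    """Return a Solr select URL for a theme core, or for 'all' if no theme.

    Hypothetical helper for illustration only; not taken from the app code.
    """
    # A theme such as 'active' or 'expanding' maps directly to a core name.
    core = theme or "all"
    params = urlencode({"q": query, "rows": rows})
    return f"{base}/solr/{core}/select?{params}"
```

Calling `select_url("http://localhost:8983", "active", "engine")` targets only the 'active' core, while omitting the theme searches every document through the 'all' core.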

 ### legacy Solr commands

 This section should be fully superseded by solr_import.sh and by the Solr config now included in the repository. The commands are left here for reference.

 Created core using:

 `docker exec -it solr solr create_core -c epo_data`

 Note this fix to ensure that .rtf files can be indexed using Apache Tika: https://gitmemory.com/issue/docker-solr/docker-solr/341/682877640. Once you've created the core, run these commands:

 `docker exec -ti --user=solr solr bash -c 'cp -r /opt/solr/example/files/conf/* /var/solr/data/{CORE_NAME}/conf/'`

 `docker restart solr`

 Add files to Solr using:

 `docker run --rm -v "/Users/ad7588/Downloads/2018 (10381):/2018" --network=host solr:latest post -c epo_data /2018`
--- a/config.env.template
+++ b/config.env.template
@@ -0,0 +1,23 @@
 # This config file contains the environment variables for the application

 # Flask variables
 FLASK_APP=app/__init__.py
 FLASK_RUN_HOST=0.0.0.0
 FLASK_DEBUG=1

 # Solr variables
 # Hostname for Solr
 SOLR_HOSTNAME=
 # Solr port, usually 8983
 SOLR_PORT=
 # Solr core, usually all
 SOLR_CORE=

 # OPS API variables
 # Hostname for OPS API, usually https://ops.epo.org
 OPS_URL=
 # Hostname for OPS API image requests (for some reason different from the one above), usually http://ops.epo.org
 OPS_URL_IMAGES=
 # API credentials from OPS https://developers.epo.org/
 CONSUMER_KEY=
 CONSUMER_SECRET=
--- a/docker-compose.prod.yml
+++ b/docker-compose.prod.yml
@@ -0,0 +1,37 @@
 version: '3.9'

 services:

  python:
    build: ./web
    container_name: python
    expose:
      - 5000
    env_file:
      - ./config.env.prod
    volumes:
      - ./web:/code
    command: gunicorn --bind 0.0.0.0:5000 "app:create_app()"

  nginx:
    image: nginx:latest
    container_name: nginx
    restart: unless-stopped
    ports:
      - "1337:80"
    volumes:
      - ./nginx-conf:/etc/nginx/conf.d
    depends_on:
      - python

  solr:
    container_name: solr
    image: solr:latest
    ports:
      - '8983:8983'
    volumes:
      - solrdata:/var/solr
      - ./solr_config:/opt/solr/server/solr/configsets/custom

 volumes:
  solrdata:
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -0,0 +1,25 @@
 version: '3.9'

 services:

  python:
    build: ./web
    container_name: python
    ports:
      - "5000:5000"
    volumes:
      - ./web:/code
    env_file:
      - ./config.env

  solr:
    container_name: solr
    image: solr:latest
    ports:
      - '8983:8983'
    volumes:
      - solrdata:/var/solr
      - ./solr_config:/opt/solr/server/solr/configsets/custom

 volumes:
  solrdata:
--- a/nginx-conf/patents.conf
+++ b/nginx-conf/patents.conf
@@ -0,0 +1,16 @@
 upstream patents {
    server python:5000;
 }

 server {

    listen 80;

    location / {
        proxy_pass http://patents;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
    }

 }
--- a/solr_config/currency.xml
+++ b/solr_config/currency.xml
@@ -0,0 +1,67 @@
 <?xml version="1.0" ?>
 <!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->

 <!-- Example exchange rates file for CurrencyField type named "currency" in example schema -->

 <currencyConfig version="1.0">
  <rates>
    <!-- Updated from http://www.exchangerate.com/ at 2011-09-27 -->
    <rate from="USD" to="ARS" rate="4.333871" comment="ARGENTINA Peso" />
    <rate from="USD" to="AUD" rate="1.025768" comment="AUSTRALIA Dollar" />
    <rate from="USD" to="EUR" rate="0.743676" comment="European Euro" />
    <rate from="USD" to="BRL" rate="1.881093" comment="BRAZIL Real" />
    <rate from="USD" to="CAD" rate="1.030815" comment="CANADA Dollar" />
    <rate from="USD" to="CLP" rate="519.0996" comment="CHILE Peso" />
    <rate from="USD" to="CNY" rate="6.387310" comment="CHINA Yuan" />
    <rate from="USD" to="CZK" rate="18.47134" comment="CZECH REP. Koruna" />
    <rate from="USD" to="DKK" rate="5.515436" comment="DENMARK Krone" />
    <rate from="USD" to="HKD" rate="7.801922" comment="HONG KONG Dollar" />
    <rate from="USD" to="HUF" rate="215.6169" comment="HUNGARY Forint" />
    <rate from="USD" to="ISK" rate="118.1280" comment="ICELAND Krona" />
    <rate from="USD" to="INR" rate="49.49088" comment="INDIA Rupee" />
    <rate from="USD" to="XDR" rate="0.641358" comment="INTNL MON. FUND SDR" />
    <rate from="USD" to="ILS" rate="3.709739" comment="ISRAEL Sheqel" />
    <rate from="USD" to="JPY" rate="76.32419" comment="JAPAN Yen" />
    <rate from="USD" to="KRW" rate="1169.173" comment="KOREA (SOUTH) Won" />
    <rate from="USD" to="KWD" rate="0.275142" comment="KUWAIT Dinar" />
    <rate from="USD" to="MXN" rate="13.85895" comment="MEXICO Peso" />
    <rate from="USD" to="NZD" rate="1.285159" comment="NEW ZEALAND Dollar" />
    <rate from="USD" to="NOK" rate="5.859035" comment="NORWAY Krone" />
    <rate from="USD" to="PKR" rate="87.57007" comment="PAKISTAN Rupee" />
    <rate from="USD" to="PEN" rate="2.730683" comment="PERU Sol" />
    <rate from="USD" to="PHP" rate="43.62039" comment="PHILIPPINES Peso" />
    <rate from="USD" to="PLN" rate="3.310139" comment="POLAND Zloty" />
    <rate from="USD" to="RON" rate="3.100932" comment="ROMANIA Leu" />
    <rate from="USD" to="RUB" rate="32.14663" comment="RUSSIA Ruble" />
    <rate from="USD" to="SAR" rate="3.750465" comment="SAUDI ARABIA Riyal" />
    <rate from="USD" to="SGD" rate="1.299352" comment="SINGAPORE Dollar" />
    <rate from="USD" to="ZAR" rate="8.329761" comment="SOUTH AFRICA Rand" />
    <rate from="USD" to="SEK" rate="6.883442" comment="SWEDEN Krona" />
    <rate from="USD" to="CHF" rate="0.906035" comment="SWITZERLAND Franc" />
    <rate from="USD" to="TWD" rate="30.40283" comment="TAIWAN Dollar" />
    <rate from="USD" to="THB" rate="30.89487" comment="THAILAND Baht" />
    <rate from="USD" to="AED" rate="3.672955" comment="U.A.E. Dirham" />
    <rate from="USD" to="UAH" rate="7.988582" comment="UKRAINE Hryvnia" />
    <rate from="USD" to="GBP" rate="0.647910" comment="UNITED KINGDOM Pound" />
    
    <!-- Cross-rates for some common currencies -->
    <rate from="EUR" to="GBP" rate="0.869914" />  
    <rate from="EUR" to="NOK" rate="7.800095" />  
    <rate from="GBP" to="NOK" rate="8.966508" />  
  </rates>
 </currencyConfig>
--- a/solr_config/elevate.xml
+++ b/solr_config/elevate.xml
@@ -0,0 +1,42 @@
 <?xml version="1.0" encoding="UTF-8" ?>
 <!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->

 <!-- If this file is found in the config directory, it will only be
     loaded once at startup.  If it is found in Solr's data
     directory, it will be re-loaded every commit.

   See http://wiki.apache.org/solr/QueryElevationComponent for more info

 -->
 <elevate>
 <!-- Query elevation examples
  <query text="foo bar">
    <doc id="1" />
    <doc id="2" />
    <doc id="3" />
  </query>

 for use with techproducts example
 
  <query text="ipod">
    <doc id="MA147LL/A" />  put the actual ipod at the top 
    <doc id="IW-02" exclude="true" /> exclude this cable
  </query>
 -->

 </elevate>
--- a/solr_config/email_url_types.txt
+++ b/solr_config/email_url_types.txt
@@ -0,0 +1,2 @@
 <URL>
 <EMAIL>
--- a/solr_config/lang/contractions_ca.txt
+++ b/solr_config/lang/contractions_ca.txt
@@ -0,0 +1,8 @@
 # Set of Catalan contractions for ElisionFilter
 # TODO: load this as a resource from the analyzer and sync it in build.xml
 d
 l
 m
 n
 s
 t
--- a/solr_config/lang/contractions_fr.txt
+++ b/solr_config/lang/contractions_fr.txt
@@ -0,0 +1,15 @@
 # Set of French contractions for ElisionFilter
 # TODO: load this as a resource from the analyzer and sync it in build.xml
 l
 m
 t
 qu
 n
 s
 j
 d
 c
 jusqu
 quoiqu
 lorsqu
 puisqu
--- a/solr_config/lang/contractions_ga.txt
+++ b/solr_config/lang/contractions_ga.txt
@@ -0,0 +1,5 @@
 # Set of Irish contractions for ElisionFilter
 # TODO: load this as a resource from the analyzer and sync it in build.xml
 d
 m
 b
--- a/solr_config/lang/contractions_it.txt
+++ b/solr_config/lang/contractions_it.txt
@@ -0,0 +1,23 @@
 # Set of Italian contractions for ElisionFilter
 # TODO: load this as a resource from the analyzer and sync it in build.xml
 c
 l 
 all 
 dall 
 dell 
 nell 
 sull 
 coll 
 pell 
 gl 
 agl 
 dagl 
 degl 
 negl 
 sugl 
 un 
 m 
 t 
 s 
 v 
 d
--- a/solr_config/lang/hyphenations_ga.txt
+++ b/solr_config/lang/hyphenations_ga.txt
@@ -0,0 +1,5 @@
 # Set of Irish hyphenations for StopFilter
 # TODO: load this as a resource from the analyzer and sync it in build.xml
 h
 n
 t
--- a/solr_config/lang/stemdict_nl.txt
+++ b/solr_config/lang/stemdict_nl.txt
@@ -0,0 +1,6 @@
 # Set of overrides for the dutch stemmer
 # TODO: load this as a resource from the analyzer and sync it in build.xml
 fiets	fiets
 bromfiets	bromfiets
 ei	eier
 kind	kinder
--- a/solr_config/lang/stoptags_ja.txt
+++ b/solr_config/lang/stoptags_ja.txt
@@ -0,0 +1,420 @@
 #
 # This file defines a Japanese stoptag set for JapanesePartOfSpeechStopFilter.
 #
 # Any token with a part-of-speech tag that exactly matches one of those defined in
 # this file is removed from the token stream.
 #
 # Set your own stoptags by uncommenting the lines below.  Note that comments are
 # not allowed on the same line as a stoptag.  See LUCENE-3745 for frequency lists,
 # etc. that can be useful for building your own stoptag set.
 #
 # The entire possible tagset is provided below for convenience.
 #
 #####
 #  noun: unclassified nouns
 #名詞
 #
 #  noun-common: Common nouns or nouns where the sub-classification is undefined
 #名詞-一般
 #
 #  noun-proper: Proper nouns where the sub-classification is undefined 
 #名詞-固有名詞
 #
 #  noun-proper-misc: miscellaneous proper nouns
 #名詞-固有名詞-一般
 #
 #  noun-proper-person: Personal names where the sub-classification is undefined
 #名詞-固有名詞-人名
 #
 #  noun-proper-person-misc: names that cannot be divided into surname and 
 #  given name; foreign names; names where the surname or given name is unknown.
 #  e.g. お市の方
 #名詞-固有名詞-人名-一般
 #
 #  noun-proper-person-surname: Mainly Japanese surnames.
 #  e.g. 山田
 #名詞-固有名詞-人名-姓
 #
 #  noun-proper-person-given_name: Mainly Japanese given names.
 #  e.g. 太郎
 #名詞-固有名詞-人名-名
 #
 #  noun-proper-organization: Names representing organizations.
 #  e.g. 通産省, NHK
 #名詞-固有名詞-組織
 #
 #  noun-proper-place: Place names where the sub-classification is undefined
 #名詞-固有名詞-地域
 #
 #  noun-proper-place-misc: Place names excluding countries.
 #  e.g. アジア, バルセロナ, 京都
 #名詞-固有名詞-地域-一般
 #
 #  noun-proper-place-country: Country names. 
 #  e.g. 日本, オーストラリア
 #名詞-固有名詞-地域-国
 #
 #  noun-pronoun: Pronouns where the sub-classification is undefined
 #名詞-代名詞
 #
 #  noun-pronoun-misc: miscellaneous pronouns: 
 #  e.g. それ, ここ, あいつ, あなた, あちこち, いくつ, どこか, なに, みなさん, みんな, わたくし, われわれ
 #名詞-代名詞-一般
 #
 #  noun-pronoun-contraction: Spoken language contraction made by combining a 
 #  pronoun and the particle 'wa'.
 #  e.g. ありゃ, こりゃ, こりゃあ, そりゃ, そりゃあ 
 #名詞-代名詞-縮約
 #
 #  noun-adverbial: Temporal nouns such as names of days or months that behave 
 #  like adverbs. Nouns that represent amount or ratios and can be used adverbially,
 #  e.g. 金曜, 一月, 午後, 少量
 #名詞-副詞可能
 #
 #  noun-verbal: Nouns that take arguments with case and can appear followed by 
 #  'suru' and related verbs (する, できる, なさる, くださる)
 #  e.g. インプット, 愛着, 悪化, 悪戦苦闘, 一安心, 下取り
 #名詞-サ変接続
 #
 #  noun-adjective-base: The base form of adjectives, words that appear before な ("na")
 #  e.g. 健康, 安易, 駄目, だめ
 #名詞-形容動詞語幹
 #
 #  noun-numeric: Arabic numbers, Chinese numerals, and counters like 何 (回), 数.
 #  e.g. 0, 1, 2, 何, 数, 幾
 #名詞-数
 #
 #  noun-affix: noun affixes where the sub-classification is undefined
 #名詞-非自立
 #
 #  noun-affix-misc: Of adnominalizers, the case-marker の ("no"), and words that 
 #  attach to the base form of inflectional words, words that cannot be classified 
 #  into any of the other categories below. This category includes indefinite nouns.
 #  e.g. あかつき, 暁, かい, 甲斐, 気, きらい, 嫌い, くせ, 癖, こと, 事, ごと, 毎, しだい, 次第, 
 #       順, せい, 所為, ついで, 序で, つもり, 積もり, 点, どころ, の, はず, 筈, はずみ, 弾み, 
 #       拍子, ふう, ふり, 振り, ほう, 方, 旨, もの, 物, 者, ゆえ, 故, ゆえん, 所以, わけ, 訳,
 #       わり, 割り, 割, ん-口語/, もん-口語/
 #名詞-非自立-一般
 #
 #  noun-affix-adverbial: noun affixes that can behave as adverbs.
 #  e.g. あいだ, 間, あげく, 挙げ句, あと, 後, 余り, 以外, 以降, 以後, 以上, 以前, 一方, うえ, 
 #       上, うち, 内, おり, 折り, かぎり, 限り, きり, っきり, 結果, ころ, 頃, さい, 際, 最中, さなか, 
 #       最中, じたい, 自体, たび, 度, ため, 為, つど, 都度, とおり, 通り, とき, 時, ところ, 所, 
 #       とたん, 途端, なか, 中, のち, 後, ばあい, 場合, 日, ぶん, 分, ほか, 他, まえ, 前, まま, 
 #       儘, 侭, みぎり, 矢先
 #名詞-非自立-副詞可能
 #
 #  noun-affix-aux: noun affixes treated as 助動詞 ("auxiliary verb") in school grammars 
 #  with the stem よう(だ) ("you(da)").
 #  e.g.  よう, やう, 様 (よう)
 #名詞-非自立-助動詞語幹
 #  
 #  noun-affix-adjective-base: noun affixes that can connect to the indeclinable
 #  connection form な (aux "da").
 #  e.g. みたい, ふう
 #名詞-非自立-形容動詞語幹
 #
 #  noun-special: special nouns where the sub-classification is undefined.
 #名詞-特殊
 #
 #  noun-special-aux: The そうだ ("souda") stem form that is used for reporting news, is 
 #  treated as 助動詞 ("auxiliary verb") in school grammars, and attach to the base 
 #  form of inflectional words.
 #  e.g. そう
 #名詞-特殊-助動詞語幹
 #
 #  noun-suffix: noun suffixes where the sub-classification is undefined.
 #名詞-接尾
 #
 #  noun-suffix-misc: Of the nouns or stem forms of other parts of speech that connect 
 #  to ガル or タイ and can combine into compound nouns, words that cannot be classified into
 #  any of the other categories below. In general, this category is more inclusive than 
 #  接尾語 ("suffix") and is usually the last element in a compound noun.
 #  e.g. おき, かた, 方, 甲斐 (がい), がかり, ぎみ, 気味, ぐるみ, (～した) さ, 次第, 済 (ず) み,
 #       よう, (でき)っこ, 感, 観, 性, 学, 類, 面, 用
 #名詞-接尾-一般
 #
 #  noun-suffix-person: Suffixes that form nouns and attach to person names more often
 #  than other nouns.
 #  e.g. 君, 様, 著
 #名詞-接尾-人名
 #
 #  noun-suffix-place: Suffixes that form nouns and attach to place names more often 
 #  than other nouns.
 #  e.g. 町, 市, 県
 #名詞-接尾-地域
 #
 #  noun-suffix-verbal: Of the suffixes that attach to nouns and form nouns, those that 
 #  can appear before スル ("suru").
 #  e.g. 化, 視, 分け, 入り, 落ち, 買い
 #名詞-接尾-サ変接続
 #
 #  noun-suffix-aux: The stem form of そうだ (様態) that is used to indicate conditions, 
 #  is treated as 助動詞 ("auxiliary verb") in school grammars, and attach to the 
 #  conjunctive form of inflectional words.
 #  e.g. そう
 #名詞-接尾-助動詞語幹
 #
 #  noun-suffix-adjective-base: Suffixes that attach to other nouns or the conjunctive 
 #  form of inflectional words and appear before the copula だ ("da").
 #  e.g. 的, げ, がち
 #名詞-接尾-形容動詞語幹
 #
 #  noun-suffix-adverbial: Suffixes that attach to other nouns and can behave as adverbs.
 #  e.g. 後 (ご), 以後, 以降, 以前, 前後, 中, 末, 上, 時 (じ)
 #名詞-接尾-副詞可能
 #
 #  noun-suffix-classifier: Suffixes that attach to numbers and form nouns. This category 
 #  is more inclusive than 助数詞 ("classifier") and includes common nouns that attach 
 #  to numbers.
 #  e.g. 個, つ, 本, 冊, パーセント, cm, kg, カ月, か国, 区画, 時間, 時半
 #名詞-接尾-助数詞
 #
 #  noun-suffix-special: Special suffixes that mainly attach to inflecting words.
 #  e.g. (楽し) さ, (考え) 方
 #名詞-接尾-特殊
 #
 #  noun-suffix-conjunctive: Nouns that behave like conjunctions and join two words 
 #  together.
 #  e.g. (日本) 対 (アメリカ), 対 (アメリカ), (3) 対 (5), (女優) 兼 (主婦)
 #名詞-接続詞的
 #
 #  noun-verbal_aux: Nouns that attach to the conjunctive particle て ("te") and are 
 #  semantically verb-like.
 #  e.g. ごらん, ご覧, 御覧, 頂戴
 #名詞-動詞非自立的
 #
 #  noun-quotation: text that cannot be segmented into words, proverbs, Chinese poetry, 
 #  dialects, English, etc. Currently, the only entry for 名詞 引用文字列 ("noun quotation") 
 #  is いわく ("iwaku").
 #名詞-引用文字列
 #
 #  noun-nai_adjective: Words that appear before the auxiliary verb ない ("nai") and
 #  behave like an adjective.
 #  e.g. 申し訳, 仕方, とんでも, 違い
 #名詞-ナイ形容詞語幹
 #
 #####
 #  prefix: unclassified prefixes
 #接頭詞
 #
 #  prefix-nominal: Prefixes that attach to nouns (including adjective stem forms) 
 #  excluding numerical expressions.
 #  e.g. お (水), 某 (氏), 同 (社), 故 (～氏), 高 (品質), お (見事), ご (立派)
 #接頭詞-名詞接続
 #
 #  prefix-verbal: Prefixes that attach to the imperative form of a verb or a verb
 #  in conjunctive form followed by なる/なさる/くださる.
 #  e.g. お (読みなさい), お (座り)
 #接頭詞-動詞接続
 #
 #  prefix-adjectival: Prefixes that attach to adjectives.
 #  e.g. お (寒いですねえ), バカ (でかい)
 #接頭詞-形容詞接続
 #
 #  prefix-numerical: Prefixes that attach to numerical expressions.
 #  e.g. 約, およそ, 毎時
 #接頭詞-数接続
 #
 #####
 #  verb: unclassified verbs
 #動詞
 #
 #  verb-main:
 #動詞-自立
 #
 #  verb-auxiliary:
 #動詞-非自立
 #
 #  verb-suffix:
 #動詞-接尾
 #
 #####
 #  adjective: unclassified adjectives
 #形容詞
 #
 #  adjective-main:
 #形容詞-自立
 #
 #  adjective-auxiliary:
 #形容詞-非自立
 #
 #  adjective-suffix:
 #形容詞-接尾
 #
 #####
 #  adverb: unclassified adverbs
 #副詞
 #
 #  adverb-misc: Words that can be segmented into one unit and where adnominal 
 #  modification is not possible.
 #  e.g. あいかわらず, 多分
 #副詞-一般
 #
 #  adverb-particle_conjunction: Adverbs that can be followed by の, は, に, 
 #  な, する, だ, etc.
 #  e.g. こんなに, そんなに, あんなに, なにか, なんでも
 #副詞-助詞類接続
 #
 #####
 #  adnominal: Words that only have noun-modifying forms.
 #  e.g. この, その, あの, どの, いわゆる, なんらかの, 何らかの, いろんな, こういう, そういう, ああいう, 
 #       どういう, こんな, そんな, あんな, どんな, 大きな, 小さな, おかしな, ほんの, たいした, 
 #       「(, も) さる (ことながら)」, 微々たる, 堂々たる, 単なる, いかなる, 我が」「同じ, 亡き
 #連体詞
 #
 #####
 #  conjunction: Conjunctions that can occur independently.
 #  e.g. が, けれども, そして, じゃあ, それどころか
 接続詞
 #
 #####
 #  particle: unclassified particles.
 助詞
 #
 #  particle-case: case particles where the subclassification is undefined.
 助詞-格助詞
 #
 #  particle-case-misc: Case particles.
 #  e.g. から, が, で, と, に, へ, より, を, の, にて
 助詞-格助詞-一般
 #
 #  particle-case-quote: the "to" that appears after nouns, a person’s speech, 
 #  quotation marks, expressions of decisions from a meeting, reasons, judgements,
 #  conjectures, etc.
 #  e.g. ( だ) と (述べた.), ( である) と (して執行猶予...)
 助詞-格助詞-引用
 #
 #  particle-case-compound: Compounds of particles and verbs that mainly behave 
 #  like case particles.
 #  e.g. という, といった, とかいう, として, とともに, と共に, でもって, にあたって, に当たって, に当って,
 #       にあたり, に当たり, に当り, に当たる, にあたる, において, に於いて,に於て, における, に於ける, 
 #       にかけ, にかけて, にかんし, に関し, にかんして, に関して, にかんする, に関する, に際し, 
 #       に際して, にしたがい, に従い, に従う, にしたがって, に従って, にたいし, に対し, にたいして, 
 #       に対して, にたいする, に対する, について, につき, につけ, につけて, につれ, につれて, にとって,
 #       にとり, にまつわる, によって, に依って, に因って, により, に依り, に因り, による, に依る, に因る, 
 #       にわたって, にわたる, をもって, を以って, を通じ, を通じて, を通して, をめぐって, をめぐり, をめぐる,
 #       って-口語/, ちゅう-関西弁「という」/, (何) ていう (人)-口語/, っていう-口語/, といふ, とかいふ
 助詞-格助詞-連語
 #
 #  particle-conjunctive:
 #  e.g. から, からには, が, けれど, けれども, けど, し, つつ, て, で, と, ところが, どころか, とも, ども, 
 #       ながら, なり, ので, のに, ば, ものの, や ( した), やいなや, (ころん) じゃ(いけない)-口語/, 
 #       (行っ) ちゃ(いけない)-口語/, (言っ) たって (しかたがない)-口語/, (それがなく)ったって (平気)-口語/
 助詞-接続助詞
 #
 #  particle-dependency:
 #  e.g. こそ, さえ, しか, すら, は, も, ぞ
 助詞-係助詞
 #
 #  particle-adverbial:
 #  e.g. がてら, かも, くらい, 位, ぐらい, しも, (学校) じゃ(これが流行っている)-口語/, 
 #       (それ)じゃあ (よくない)-口語/, ずつ, (私) なぞ, など, (私) なり (に), (先生) なんか (大嫌い)-口語/,
 #       (私) なんぞ, (先生) なんて (大嫌い)-口語/, のみ, だけ, (私) だって-口語/, だに, 
 #       (彼)ったら-口語/, (お茶) でも (いかが), 等 (とう), (今後) とも, ばかり, ばっか-口語/, ばっかり-口語/,
 #       ほど, 程, まで, 迄, (誰) も (が)([助詞-格助詞] および [助詞-係助詞] の前に位置する「も」)
 助詞-副助詞
 #
 #  particle-interjective: particles with interjective grammatical roles.
 #  e.g. (松島) や
 助詞-間投助詞
 #
 #  particle-coordinate:
 #  e.g. と, たり, だの, だり, とか, なり, や, やら
 助詞-並立助詞
 #
 #  particle-final:
 #  e.g. かい, かしら, さ, ぜ, (だ)っけ-口語/, (とまってる) で-方言/, な, ナ, なあ-口語/, ぞ, ね, ネ, 
 #       ねぇ-口語/, ねえ-口語/, ねん-方言/, の, のう-口語/, や, よ, ヨ, よぉ-口語/, わ, わい-口語/
 助詞-終助詞
 #
 #  particle-adverbial/conjunctive/final: The particle "ka" when unknown whether it is 
 #  adverbial, conjunctive, or sentence final. For example:
 #       (a) 「A か B か」. Ex:「(国内で運用する) か,(海外で運用する) か (.)」
 #       (b) Inside an adverb phrase. Ex:「(幸いという) か (, 死者はいなかった.)」
 #           「(祈りが届いたせい) か (, 試験に合格した.)」
 #       (c) 「かのように」. Ex:「(何もなかった) か (のように振る舞った.)」
 #  e.g. か
 助詞-副助詞／並立助詞／終助詞
 #
 #  particle-adnominalizer: The "no" that attaches to nouns and modifies 
 #  non-inflectional words.
 助詞-連体化
 #
 #  particle-adverbializer: The "ni" and "to" that appear following nouns and adverbs 
 #  that are giongo, giseigo, or gitaigo.
 #  e.g. に, と
 助詞-副詞化
 #
 #  particle-special: A particle that does not fit into one of the above classifications. 
 #  This includes particles that are used in Tanka, Haiku, and other poetry.
 #  e.g. かな, けむ, ( しただろう) に, (あんた) にゃ(わからん), (俺) ん (家)
 助詞-特殊
 #
 #####
 #  auxiliary-verb:
 助動詞
 #
 #####
 #  interjection: Greetings and other exclamations.
 #  e.g. おはよう, おはようございます, こんにちは, こんばんは, ありがとう, どうもありがとう, ありがとうございます, 
 #       いただきます, ごちそうさま, さよなら, さようなら, はい, いいえ, ごめん, ごめんなさい
 #感動詞
 #
 #####
 #  symbol: unclassified Symbols.
 記号
 #
 #  symbol-misc: A general symbol not in one of the categories below.
 #  e.g. [○◎@$〒→+]
 記号-一般
 #
 #  symbol-comma: Commas
 #  e.g. [,、]
 記号-読点
 #
 #  symbol-period: Periods and full stops.
 #  e.g. [.．。]
 記号-句点
 #
 #  symbol-space: Full-width whitespace.
 記号-空白
 #
 #  symbol-open_bracket:
 #  e.g. [({‘“『【]
 記号-括弧開
 #
 #  symbol-close_bracket:
 #  e.g. [)}’”』」】]
 記号-括弧閉
 #
 #  symbol-alphabetic:
 #記号-アルファベット
 #
 #####
 #  other: unclassified other
 #その他
 #
 #  other-interjection: Words that are hard to classify as noun-suffixes or 
 #  sentence-final particles.
 #  e.g. (だ)ァ
 その他-間投
 #
 #####
 #  filler: Aizuchi that occurs during a conversation or sounds inserted as filler.
 #  e.g. あの, うんと, えと
 フィラー
 #
 #####
 #  non-verbal: non-verbal sound.
 非言語音
 #
 #####
 #  fragment:
 #語断片
 #
 #####
 #  unknown: unknown part of speech.
 #未知語
 #
 ##### End of file
--- a/solr_config/lang/stopwords_ar.txt
+++ b/solr_config/lang/stopwords_ar.txt
@@ -0,0 +1,125 @@
 # This file was created by Jacques Savoy and is distributed under the BSD license.
 # See http://members.unine.ch/jacques.savoy/clef/index.html.
 # Also see http://www.opensource.org/licenses/bsd-license.html
 # Cleaned on October 11, 2009 (not normalized, so use before normalization)
 # This means that when modifying this list, you might need to add some 
 # redundant entries, for example containing forms with both أ and ا
 من
 ومن
 منها
 منه
 في
 وفي
 فيها
 فيه
 و
 ف
 ثم
 او
 أو
 ب
 بها
 به
 ا
 أ
 اى
 اي
 أي
 أى
 لا
 ولا
 الا
 ألا
 إلا
 لكن
 ما
 وما
 كما
 فما
 عن
 مع
 اذا
 إذا
 ان
 أن
 إن
 انها
 أنها
 إنها
 انه
 أنه
 إنه
 بان
 بأن
 فان
 فأن
 وان
 وأن
 وإن
 التى
 التي
 الذى
 الذي
 الذين
 الى
 الي
 إلى
 إلي
 على
 عليها
 عليه
 اما
 أما
 إما
 ايضا
 أيضا
 كل
 وكل
 لم
 ولم
 لن
 ولن
 هى
 هي
 هو
 وهى
 وهي
 وهو
 فهى
 فهي
 فهو
 انت
 أنت
 لك
 لها
 له
 هذه
 هذا
 تلك
 ذلك
 هناك
 كانت
 كان
 يكون
 تكون
 وكانت
 وكان
 غير
 بعض
 قد
 نحو
 بين
 بينما
 منذ
 ضمن
 حيث
 الان
 الآن
 خلال
 بعد
 قبل
 حتى
 عند
 عندما
 لدى
 جميع
--- a/solr_config/lang/stopwords_bg.txt
+++ b/solr_config/lang/stopwords_bg.txt
@@ -0,0 +1,193 @@
 # This file was created by Jacques Savoy and is distributed under the BSD license.
 # See http://members.unine.ch/jacques.savoy/clef/index.html.
 # Also see http://www.opensource.org/licenses/bsd-license.html
 а
 аз
 ако
 ала
 бе
 без
 беше
 би
 бил
 била
 били
 било
 близо
 бъдат
 бъде
 бяха
 в
 вас
 ваш
 ваша
 вероятно
 вече
 взема
 ви
 вие
 винаги
 все
 всеки
 всички
 всичко
 всяка
 във
 въпреки
 върху
 г
 ги
 главно
 го
 д
 да
 дали
 до
 докато
 докога
 дори
 досега
 доста
 е
 едва
 един
 ето
 за
 зад
 заедно
 заради
 засега
 затова
 защо
 защото
 и
 из
 или
 им
 има
 имат
 иска
 й
 каза
 как
 каква
 какво
 както
 какъв
 като
 кога
 когато
 което
 които
 кой
 който
 колко
 която
 къде
 където
 към
 ли
 м
 ме
 между
 мен
 ми
 мнозина
 мога
 могат
 може
 моля
 момента
 му
 н
 на
 над
 назад
 най
 направи
 напред
 например
 нас
 не
 него
 нея
 ни
 ние
 никой
 нито
 но
 някои
 някой
 няма
 обаче
 около
 освен
 особено
 от
 отгоре
 отново
 още
 пак
 по
 повече
 повечето
 под
 поне
 поради
 после
 почти
 прави
 пред
 преди
 през
 при
 пък
 първо
 с
 са
 само
 се
 сега
 си
 скоро
 след
 сме
 според
 сред
 срещу
 сте
 съм
 със
 също
 т
 тази
 така
 такива
 такъв
 там
 твой
 те
 тези
 ти
 тн
 то
 това
 тогава
 този
 той
 толкова
 точно
 трябва
 тук
 тъй
 тя
 тях
 у
 харесва
 ч
 че
 често
 чрез
 ще
 щом
 я
--- a/solr_config/lang/stopwords_ca.txt
+++ b/solr_config/lang/stopwords_ca.txt
@@ -0,0 +1,220 @@
 # Catalan stopwords from http://github.com/vcl/cue.language (Apache 2 Licensed)
 a
 abans
 ací
 ah
 així
 això
 al
 als
 aleshores
 algun
 alguna
 algunes
 alguns
 alhora
 allà
 allí
 allò
 altra
 altre
 altres
 amb
 ambdós
 ambdues
 apa
 aquell
 aquella
 aquelles
 aquells
 aquest
 aquesta
 aquestes
 aquests
 aquí
 baix
 cada
 cadascú
 cadascuna
 cadascunes
 cadascuns
 com
 contra
 d'un
 d'una
 d'unes
 d'uns
 dalt
 de
 del
 dels
 des
 després
 dins
 dintre
 donat
 doncs
 durant
 e
 eh
 el
 els
 em
 en
 encara
 ens
 entre
 érem
 eren
 éreu
 es
 és
 esta
 està
 estàvem
 estaven
 estàveu
 esteu
 et
 etc
 ets
 fins
 fora
 gairebé
 ha
 han
 has
 havia
 he
 hem
 heu
 hi 
 ho
 i
 igual
 iguals
 ja
 l'hi
 la
 les
 li
 li'n
 llavors
 m'he
 ma
 mal
 malgrat
 mateix
 mateixa
 mateixes
 mateixos
 me
 mentre
 més
 meu
 meus
 meva
 meves
 molt
 molta
 moltes
 molts
 mon
 mons
 n'he
 n'hi
 ne
 ni
 no
 nogensmenys
 només
 nosaltres
 nostra
 nostre
 nostres
 o
 oh
 oi
 on
 pas
 pel
 pels
 per
 però
 perquè
 poc 
 poca
 pocs
 poques
 potser
 propi
 qual
 quals
 quan
 quant 
 que
 què
 quelcom
 qui
 quin
 quina
 quines
 quins
 s'ha
 s'han
 sa
 semblant
 semblants
 ses
 seu 
 seus
 seva
 seva
 seves
 si
 sobre
 sobretot
 sóc
 solament
 sols
 son 
 són
 sons 
 sota
 sou
 t'ha
 t'han
 t'he
 ta
 tal
 també
 tampoc
 tan
 tant
 tanta
 tantes
 teu
 teus
 teva
 teves
 ton
 tons
 tot
 tota
 totes
 tots
 un
 una
 unes
 uns
 us
 va
 vaig
 vam
 van
 vas
 veu
 vosaltres
 vostra
 vostre
 vostres
--- a/solr_config/lang/stopwords_cz.txt
+++ b/solr_config/lang/stopwords_cz.txt
@@ -0,0 +1,172 @@
 a
 s
 k
 o
 i
 u
 v
 z
 dnes
 cz
 tímto
 budeš
 budem
 byli
 jseš
 můj
 svým
 ta
 tomto
 tohle
 tuto
 tyto
 jej
 zda
 proč
 máte
 tato
 kam
 tohoto
 kdo
 kteří
 mi
 nám
 tom
 tomuto
 mít
 nic
 proto
 kterou
 byla
 toho
 protože
 asi
 ho
 naši
 napište
 re
 což
 tím
 takže
 svých
 její
 svými
 jste
 aj
 tu
 tedy
 teto
 bylo
 kde
 ke
 pravé
 ji
 nad
 nejsou
 či
 pod
 téma
 mezi
 přes
 ty
 pak
 vám
 ani
 když
 však
 neg
 jsem
 tento
 článku
 články
 aby
 jsme
 před
 pta
 jejich
 byl
 ještě
 až
 bez
 také
 pouze
 první
 vaše
 která
 nás
 nový
 tipy
 pokud
 může
 strana
 jeho
 své
 jiné
 zprávy
 nové
 není
 vás
 jen
 podle
 zde
 už
 být
 více
 bude
 již
 než
 který
 by
 které
 co
 nebo
 ten
 tak
 má
 při
 od
 po
 jsou
 jak
 další
 ale
 si
 se
 ve
 to
 jako
 za
 zpět
 ze
 do
 pro
 je
 na
 atd
 atp
 jakmile
 přičemž
 já
 on
 ona
 ono
 oni
 ony
 my
 vy
 jí
 ji
 mě
 mne
 jemu
 tomu
 těm
 těmu
 němu
 němuž
 jehož
 jíž
 jelikož
 jež
 jakož
 načež
--- a/solr_config/lang/stopwords_da.txt
+++ b/solr_config/lang/stopwords_da.txt
@@ -0,0 +1,110 @@
 | From svn.tartarus.org/snowball/trunk/website/algorithms/danish/stop.txt
 | This file is distributed under the BSD License.
 | See http://snowball.tartarus.org/license.php
 | Also see http://www.opensource.org/licenses/bsd-license.html
 |  - Encoding was converted to UTF-8.
 |  - This notice was added.
 |
 | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

 | A Danish stop word list. Comments begin with vertical bar. Each stop
 | word is at the start of a line.

 | This is a ranked list (commonest to rarest) of stopwords derived from
 | a large text sample.


 og           | and
 i            | in
 jeg          | I
 det          | that (dem. pronoun)/it (pers. pronoun)
 at           | that (in front of a sentence)/to (with infinitive)
 en           | a/an
 den          | it (pers. pronoun)/that (dem. pronoun)
 til          | to/at/for/until/against/by/of/into, more
 er           | present tense of "to be"
 som          | who, as
 på           | on/upon/in/on/at/to/after/of/with/for, on
 de           | they
 med          | with/by/in, along
 han          | he
 af           | of/by/from/off/for/in/with/on, off
 for          | at/for/to/from/by/of/ago, in front/before, because
 ikke         | not
 der          | who/which, there/those
 var          | past tense of "to be"
 mig          | me/myself
 sig          | oneself/himself/herself/itself/themselves
 men          | but
 et           | a/an/one, one (number), someone/somebody/one
 har          | present tense of "to have"
 om           | round/about/for/in/a, about/around/down, if
 vi           | we
 min          | my
 havde        | past tense of "to have"
 ham          | him
 hun          | she
 nu           | now
 over         | over/above/across/by/beyond/past/on/about, over/past
 da           | then, when/as/since
 fra          | from/off/since, off, since
 du           | you
 ud           | out
 sin          | his/her/its/one's
 dem          | them
 os           | us/ourselves
 op           | up
 man          | you/one
 hans         | his
 hvor         | where
 eller        | or
 hvad         | what
 skal         | must/shall etc.
 selv         | myself/yourself/herself/ourselves etc., even
 her          | here
 alle         | all/everyone/everybody etc.
 vil          | will (verb)
 blev         | past tense of "to stay/to remain/to get/to become"
 kunne        | could
 ind          | in
 når          | when
 være         | present tense of "to be"
 dog          | however/yet/after all
 noget        | something
 ville        | would
 jo           | you know/you see (adv), yes
 deres        | their/theirs
 efter        | after/behind/according to/for/by/from, later/afterwards
 ned          | down
 skulle       | should
 denne        | this
 end          | than
 dette        | this
 mit          | my/mine
 også         | also
 under        | under/beneath/below/during, below/underneath
 have         | have
 dig          | you
 anden        | other
 hende        | her
 mine         | my
 alt          | everything
 meget        | much/very, plenty of
 sit          | his, her, its, one's
 sine         | his, her, its, one's
 vor          | our
 mod          | against
 disse        | these
 hvis         | if
 din          | your/yours
 nogle        | some
 hos          | by/at
 blive        | be/become
 mange        | many
 ad           | by/through
 bliver       | present tense of "to be/to become"
 hendes       | her/hers
 været        | be
 thi          | for (conj)
 jer          | you
 sådan        | such, like this/like that
--- a/solr_config/lang/stopwords_de.txt
+++ b/solr_config/lang/stopwords_de.txt
@@ -0,0 +1,294 @@
 | From svn.tartarus.org/snowball/trunk/website/algorithms/german/stop.txt
 | This file is distributed under the BSD License.
 | See http://snowball.tartarus.org/license.php
 | Also see http://www.opensource.org/licenses/bsd-license.html
 |  - Encoding was converted to UTF-8.
 |  - This notice was added.
 |
 | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

 | A German stop word list. Comments begin with vertical bar. Each stop
 | word is at the start of a line.

 | The number of forms in this list is reduced significantly by passing it
 | through the German stemmer.


 aber           |  but

 alle           |  all
 allem
 allen
 aller
 alles

 als            |  than, as
 also           |  so
 am             |  an + dem
 an             |  at

 ander          |  other
 andere
 anderem
 anderen
 anderer
 anderes
 anderm
 andern
 anderr
 anders

 auch           |  also
 auf            |  on
 aus            |  out of
 bei            |  by
 bin            |  am
 bis            |  until
 bist           |  art
 da             |  there
 damit          |  with it
 dann           |  then

 der            |  the
 den
 des
 dem
 die
 das

 daß            |  that

 derselbe       |  the same
 derselben
 denselben
 desselben
 demselben
 dieselbe
 dieselben
 dasselbe

 dazu           |  to that

 dein           |  thy
 deine
 deinem
 deinen
 deiner
 deines

 denn           |  because

 derer          |  of those
 dessen         |  of him

 dich           |  thee
 dir            |  to thee
 du             |  thou

 dies           |  this
 diese
 diesem
 diesen
 dieser
 dieses


 doch           |  (several meanings)
 dort           |  (over) there


 durch          |  through

 ein            |  a
 eine
 einem
 einen
 einer
 eines

 einig          |  some
 einige
 einigem
 einigen
 einiger
 einiges

 einmal         |  once

 er             |  he
 ihn            |  him
 ihm            |  to him

 es             |  it
 etwas          |  something

 euer           |  your
 eure
 eurem
 euren
 eurer
 eures

 für            |  for
 gegen          |  towards
 gewesen        |  p.p. of sein
 hab            |  have
 habe           |  have
 haben          |  have
 hat            |  has
 hatte          |  had
 hatten         |  had
 hier           |  here
 hin            |  there
 hinter         |  behind

 ich            |  I
 mich           |  me
 mir            |  to me


 ihr            |  you, to her
 ihre
 ihrem
 ihren
 ihrer
 ihres
 euch           |  to you

 im             |  in + dem
 in             |  in
 indem          |  while
 ins            |  in + das
 ist            |  is

 jede           |  each, every
 jedem
 jeden
 jeder
 jedes

 jene           |  that
 jenem
 jenen
 jener
 jenes

 jetzt          |  now
 kann           |  can

 kein           |  no
 keine
 keinem
 keinen
 keiner
 keines

 können         |  can
 könnte         |  could
 machen         |  do
 man            |  one

 manche         |  some, many a
 manchem
 manchen
 mancher
 manches

 mein           |  my
 meine
 meinem
 meinen
 meiner
 meines

 mit            |  with
 muss           |  must
 musste         |  had to
 nach           |  to(wards)
 nicht          |  not
 nichts         |  nothing
 noch           |  still, yet
 nun            |  now
 nur            |  only
 ob             |  whether
 oder           |  or
 ohne           |  without
 sehr           |  very

 sein           |  his
 seine
 seinem
 seinen
 seiner
 seines

 selbst         |  self
 sich           |  herself

 sie            |  they, she
 ihnen          |  to them

 sind           |  are
 so             |  so

 solche         |  such
 solchem
 solchen
 solcher
 solches

 soll           |  shall
 sollte         |  should
 sondern        |  but
 sonst          |  else
 über           |  over
 um             |  about, around
 und            |  and

 uns            |  us
 unse
 unsem
 unsen
 unser
 unses

 unter          |  under
 viel           |  much
 vom            |  von + dem
 von            |  from
 vor            |  before
 während        |  while
 war            |  was
 waren          |  were
 warst          |  wast
 was            |  what
 weg            |  away, off
 weil           |  because
 weiter         |  further

 welche         |  which
 welchem
 welchen
 welcher
 welches

 wenn           |  when
 werde          |  will
 werden         |  will
 wie            |  how
 wieder         |  again
 will           |  want
 wir            |  we
 wird           |  will
 wirst          |  willst
 wo             |  where
 wollen         |  want
 wollte         |  wanted
 würde          |  would
 würden         |  would
 zu             |  to
 zum            |  zu + dem
 zur            |  zu + der
 zwar           |  indeed
 zwischen       |  between

--- a/solr_config/lang/stopwords_el.txt
+++ b/solr_config/lang/stopwords_el.txt
@@ -0,0 +1,78 @@
 # Lucene Greek Stopwords list
 # Note: by default this file is used after GreekLowerCaseFilter,
 # so when modifying this file use 'σ' instead of 'ς' 
 ο
 η
 το
 οι
 τα
 του
 τησ
 των
 τον
 την
 και 
 κι
 κ
 ειμαι
 εισαι
 ειναι
 ειμαστε
 ειστε
 στο
 στον
 στη
 στην
 μα
 αλλα
 απο
 για
 προσ
 με
 σε
 ωσ
 παρα
 αντι
 κατα
 μετα
 θα
 να
 δε
 δεν
 μη
 μην
 επι
 ενω
 εαν
 αν
 τοτε
 που
 πωσ
 ποιοσ
 ποια
 ποιο
 ποιοι
 ποιεσ
 ποιων
 ποιουσ
 αυτοσ
 αυτη
 αυτο
 αυτοι
 αυτων
 αυτουσ
 αυτεσ
 αυτα
 εκεινοσ
 εκεινη
 εκεινο
 εκεινοι
 εκεινεσ
 εκεινα
 εκεινων
 εκεινουσ
 οπωσ
 ομωσ
 ισωσ
 οσο
 οτι
--- a/solr_config/lang/stopwords_en.txt
+++ b/solr_config/lang/stopwords_en.txt
@@ -0,0 +1,54 @@
 # Licensed to the Apache Software Foundation (ASF) under one or more
 # contributor license agreements.  See the NOTICE file distributed with
 # this work for additional information regarding copyright ownership.
 # The ASF licenses this file to You under the Apache License, Version 2.0
 # (the "License"); you may not use this file except in compliance with
 # the License.  You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.

 # a couple of test stopwords to test that the words are really being
 # configured from this file:
 stopworda
 stopwordb

 # Standard English stop words taken from Lucene's StopAnalyzer
 a
 an
 and
 are
 as
 at
 be
 but
 by
 for
 if
 in
 into
 is
 it
 no
 not
 of
 on
 or
 such
 that
 the
 their
 then
 there
 these
 they
 this
 to
 was
 will
 with
--- a/solr_config/lang/stopwords_es.txt
+++ b/solr_config/lang/stopwords_es.txt
@@ -0,0 +1,356 @@
 | From svn.tartarus.org/snowball/trunk/website/algorithms/spanish/stop.txt
 | This file is distributed under the BSD License.
 | See http://snowball.tartarus.org/license.php
 | Also see http://www.opensource.org/licenses/bsd-license.html
 |  - Encoding was converted to UTF-8.
 |  - This notice was added.
 |
 | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

 | A Spanish stop word list. Comments begin with vertical bar. Each stop
 | word is at the start of a line.


 | The following is a ranked list (commonest to rarest) of stopwords
 | deriving from a large sample of text.

 | Extra words have been added at the end.

 de             |  from, of
 la             |  the, her
 que            |  who, that
 el             |  the
 en             |  in
 y              |  and
 a              |  to
 los            |  the, them
 del            |  de + el
 se             |  himself, from him etc
 las            |  the, them
 por            |  for, by, etc
 un             |  a
 para           |  for
 con            |  with
 no             |  no
 una            |  a
 su             |  his, her
 al             |  a + el
  | es         from SER
 lo             |  him
 como           |  how
 más            |  more
 pero           |  but
 sus            |  su plural
 le             |  to him, her
 ya             |  already
 o              |  or
  | fue        from SER
 este           |  this
  | ha         from HABER
 sí             |  himself etc
 porque         |  because
 esta           |  this
  | son        from SER
 entre          |  between
  | está     from ESTAR
 cuando         |  when
 muy            |  very
 sin            |  without
 sobre          |  on
  | ser        from SER
  | tiene      from TENER
 también        |  also
 me             |  me
 hasta          |  until
 hay            |  there is/are
 donde          |  where
  | han        from HABER
 quien          |  whom, that
  | están      from ESTAR
  | estado     from ESTAR
 desde          |  from
 todo           |  all
 nos            |  us
 durante        |  during
  | estados    from ESTAR
 todos          |  all
 uno            |  a
 les            |  to them
 ni             |  nor
 contra         |  against
 otros          |  other
  | fueron     from SER
 ese            |  that
 eso            |  that
  | había      from HABER
 ante           |  before
 ellos          |  they
 e              |  and (variant of y)
 esto           |  this
 mí             |  me
 antes          |  before
 algunos        |  some
 qué            |  what?
 unos           |  a
 yo             |  I
 otro           |  other
 otras          |  other
 otra           |  other
 él             |  he
 tanto          |  so much, many
 esa            |  that
 estos          |  these
 mucho          |  much, many
 quienes        |  who
 nada           |  nothing
 muchos         |  many
 cual           |  who
  | sea        from SER
 poco           |  few
 ella           |  she
 estar          |  to be
  | haber      from HABER
 estas          |  these
  | estaba     from ESTAR
  | estamos    from ESTAR
 algunas        |  some
 algo           |  something
 nosotros       |  we

      | other forms

 mi             |  me
 mis            |  mi plural
 tú             |  thou
 te             |  thee
 ti             |  thee
 tu             |  thy
 tus            |  tu plural
 ellas          |  they
 nosotras       |  we
 vosotros       |  you
 vosotras       |  you
 os             |  you
 mío            |  mine
 mía            |
 míos           |
 mías           |
 tuyo           |  thine
 tuya           |
 tuyos          |
 tuyas          |
 suyo           |  his, hers, theirs
 suya           |
 suyos          |
 suyas          |
 nuestro        |  ours
 nuestra        |
 nuestros       |
 nuestras       |
 vuestro        |  yours
 vuestra        |
 vuestros       |
 vuestras       |
 esos           |  those
 esas           |  those

               | forms of estar, to be (not including the infinitive):
 estoy
 estás
 está
 estamos
 estáis
 están
 esté
 estés
 estemos
 estéis
 estén
 estaré
 estarás
 estará
 estaremos
 estaréis
 estarán
 estaría
 estarías
 estaríamos
 estaríais
 estarían
 estaba
 estabas
 estábamos
 estabais
 estaban
 estuve
 estuviste
 estuvo
 estuvimos
 estuvisteis
 estuvieron
 estuviera
 estuvieras
 estuviéramos
 estuvierais
 estuvieran
 estuviese
 estuvieses
 estuviésemos
 estuvieseis
 estuviesen
 estando
 estado
 estada
 estados
 estadas
 estad

               | forms of haber, to have (not including the infinitive):
 he
 has
 ha
 hemos
 habéis
 han
 haya
 hayas
 hayamos
 hayáis
 hayan
 habré
 habrás
 habrá
 habremos
 habréis
 habrán
 habría
 habrías
 habríamos
 habríais
 habrían
 había
 habías
 habíamos
 habíais
 habían
 hube
 hubiste
 hubo
 hubimos
 hubisteis
 hubieron
 hubiera
 hubieras
 hubiéramos
 hubierais
 hubieran
 hubiese
 hubieses
 hubiésemos
 hubieseis
 hubiesen
 habiendo
 habido
 habida
 habidos
 habidas

               | forms of ser, to be (not including the infinitive):
 soy
 eres
 es
 somos
 sois
 son
 sea
 seas
 seamos
 seáis
 sean
 seré
 serás
 será
 seremos
 seréis
 serán
 sería
 serías
 seríamos
 seríais
 serían
 era
 eras
 éramos
 erais
 eran
 fui
 fuiste
 fue
 fuimos
 fuisteis
 fueron
 fuera
 fueras
 fuéramos
 fuerais
 fueran
 fuese
 fueses
 fuésemos
 fueseis
 fuesen
 siendo
 sido
  |  sed also means 'thirst'

               | forms of tener, to have (not including the infinitive):
 tengo
 tienes
 tiene
 tenemos
 tenéis
 tienen
 tenga
 tengas
 tengamos
 tengáis
 tengan
 tendré
 tendrás
 tendrá
 tendremos
 tendréis
 tendrán
 tendría
 tendrías
 tendríamos
 tendríais
 tendrían
 tenía
 tenías
 teníamos
 teníais
 tenían
 tuve
 tuviste
 tuvo
 tuvimos
 tuvisteis
 tuvieron
 tuviera
 tuvieras
 tuviéramos
 tuvierais
 tuvieran
 tuviese
 tuvieses
 tuviésemos
 tuvieseis
 tuviesen
 teniendo
 tenido
 tenida
 tenidos
 tenidas
 tened

--- a/solr_config/lang/stopwords_eu.txt
+++ b/solr_config/lang/stopwords_eu.txt
@@ -0,0 +1,99 @@
 # example set of basque stopwords
 al
 anitz
 arabera
 asko
 baina
 bat
 batean
 batek
 bati
 batzuei
 batzuek
 batzuetan
 batzuk
 bera
 beraiek
 berau
 berauek
 bere
 berori
 beroriek
 beste
 bezala
 da
 dago
 dira
 ditu
 du
 dute
 edo
 egin
 ere
 eta
 eurak
 ez
 gainera
 gu
 gutxi
 guzti
 haiei
 haiek
 haietan
 hainbeste
 hala
 han
 handik
 hango
 hara
 hari
 hark
 hartan
 hau
 hauei
 hauek
 hauetan
 hemen
 hemendik
 hemengo
 hi
 hona
 honek
 honela
 honetan
 honi
 hor
 hori
 horiei
 horiek
 horietan
 horko
 horra
 horrek
 horrela
 horretan
 horri
 hortik
 hura
 izan
 ni
 noiz
 nola
 non
 nondik
 nongo
 nor
 nora
 ze
 zein
 zen
 zenbait
 zenbat
 zer
 zergatik
 ziren
 zituen
 zu
 zuek
 zuen
 zuten
--- a/solr_config/lang/stopwords_fa.txt
+++ b/solr_config/lang/stopwords_fa.txt
@@ -0,0 +1,313 @@
 # This file was created by Jacques Savoy and is distributed under the BSD license.
 # See http://members.unine.ch/jacques.savoy/clef/index.html.
 # Also see http://www.opensource.org/licenses/bsd-license.html
 # Note: by default this file is used after normalization, so when adding entries
 # to this file, use the arabic 'ي' instead of 'ی'
 انان
 نداشته
 سراسر
 خياه
 ايشان
 وي
 تاكنون
 بيشتري
 دوم
 پس
 ناشي
 وگو
 يا
 داشتند
 سپس
 هنگام
 هرگز
 پنج
 نشان
 امسال
 ديگر
 گروهي
 شدند
 چطور
 ده
 و
 دو
 نخستين
 ولي
 چرا
 چه
 وسط
 ه
 كدام
 قابل
 يك
 رفت
 هفت
 همچنين
 در
 هزار
 بله
 بلي
 شايد
 اما
 شناسي
 گرفته
 دهد
 داشته
 دانست
 داشتن
 خواهيم
 ميليارد
 وقتيكه
 امد
 خواهد
 جز
 اورده
 شده
 بلكه
 خدمات
 شدن
 برخي
 نبود
 بسياري
 جلوگيري
 حق
 كردند
 نوعي
 بعري
 نكرده
 نظير
 نبايد
 بوده
 بودن
 داد
 اورد
 هست
 جايي
 شود
 دنبال
 داده
 بايد
 سابق
 هيچ
 همان
 انجا
 كمتر
 كجاست
 گردد
 كسي
 تر
 مردم
 تان
 دادن
 بودند
 سري
 جدا
 ندارند
 مگر
 يكديگر
 دارد
 دهند
 بنابراين
 هنگامي
 سمت
 جا
 انچه
 خود
 دادند
 زياد
 دارند
 اثر
 بدون
 بهترين
 بيشتر
 البته
 به
 براساس
 بيرون
 كرد
 بعضي
 گرفت
 توي
 اي
 ميليون
 او
 جريان
 تول
 بر
 مانند
 برابر
 باشيم
 مدتي
 گويند
 اكنون
 تا
 تنها
 جديد
 چند
 بي
 نشده
 كردن
 كردم
 گويد
 كرده
 كنيم
 نمي
 نزد
 روي
 قصد
 فقط
 بالاي
 ديگران
 اين
 ديروز
 توسط
 سوم
 ايم
 دانند
 سوي
 استفاده
 شما
 كنار
 داريم
 ساخته
 طور
 امده
 رفته
 نخست
 بيست
 نزديك
 طي
 كنيد
 از
 انها
 تمامي
 داشت
 يكي
 طريق
 اش
 چيست
 روب
 نمايد
 گفت
 چندين
 چيزي
 تواند
 ام
 ايا
 با
 ان
 ايد
 ترين
 اينكه
 ديگري
 راه
 هايي
 بروز
 همچنان
 پاعين
 كس
 حدود
 مختلف
 مقابل
 چيز
 گيرد
 ندارد
 ضد
 همچون
 سازي
 شان
 مورد
 باره
 مرسي
 خويش
 برخوردار
 چون
 خارج
 شش
 هنوز
 تحت
 ضمن
 هستيم
 گفته
 فكر
 بسيار
 پيش
 براي
 روزهاي
 انكه
 نخواهد
 بالا
 كل
 وقتي
 كي
 چنين
 كه
 گيري
 نيست
 است
 كجا
 كند
 نيز
 يابد
 بندي
 حتي
 توانند
 عقب
 خواست
 كنند
 بين
 تمام
 همه
 ما
 باشند
 مثل
 شد
 اري
 باشد
 اره
 طبق
 بعد
 اگر
 صورت
 غير
 جاي
 بيش
 ريزي
 اند
 زيرا
 چگونه
 بار
 لطفا
 مي
 درباره
 من
 ديده
 همين
 گذاري
 برداري
 علت
 گذاشته
 هم
 فوق
 نه
 ها
 شوند
 اباد
 همواره
 هر
 اول
 خواهند
 چهار
 نام
 امروز
 مان
 هاي
 قبل
 كنم
 سعي
 تازه
 را
 هستند
 زير
 جلوي
 عنوان
 بود
--- a/solr_config/lang/stopwords_fi.txt
+++ b/solr_config/lang/stopwords_fi.txt
@@ -0,0 +1,97 @@
 | From svn.tartarus.org/snowball/trunk/website/algorithms/finnish/stop.txt
 | This file is distributed under the BSD License.
 | See http://snowball.tartarus.org/license.php
 | Also see http://www.opensource.org/licenses/bsd-license.html
 |  - Encoding was converted to UTF-8.
 |  - This notice was added.
 |
 | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
 
 | forms of BE

 olla
 olen
 olet
 on
 olemme
 olette
 ovat
 ole        | negative form

 oli
 olisi
 olisit
 olisin
 olisimme
 olisitte
 olisivat
 olit
 olin
 olimme
 olitte
 olivat
 ollut
 olleet

 en         | negation
 et
 ei
 emme
 ette
 eivät

 |Nom   Gen    Acc    Part   Iness   Elat    Illat  Adess   Ablat   Allat   Ess    Trans
 minä   minun  minut  minua  minussa minusta minuun minulla minulta minulle               | I
 sinä   sinun  sinut  sinua  sinussa sinusta sinuun sinulla sinulta sinulle               | you
 hän    hänen  hänet  häntä  hänessä hänestä häneen hänellä häneltä hänelle               | he she
 me     meidän meidät meitä  meissä  meistä  meihin meillä  meiltä  meille                | we
 te     teidän teidät teitä  teissä  teistä  teihin teillä  teiltä  teille                | you
 he     heidän heidät heitä  heissä  heistä  heihin heillä  heiltä  heille                | they

 tämä   tämän         tätä   tässä   tästä   tähän  tallä   tältä   tälle   tänä   täksi  | this
 tuo    tuon          tuotä  tuossa  tuosta  tuohon tuolla  tuolta  tuolle  tuona  tuoksi | that
 se     sen           sitä   siinä   siitä   siihen sillä   siltä   sille   sinä   siksi  | it
 nämä   näiden        näitä  näissä  näistä  näihin näillä  näiltä  näille  näinä  näiksi | these
 nuo    noiden        noita  noissa  noista  noihin noilla  noilta  noille  noina  noiksi | those
 ne     niiden        niitä  niissä  niistä  niihin niillä  niiltä  niille  niinä  niiksi | they

 kuka   kenen kenet   ketä   kenessä kenestä keneen kenellä keneltä kenelle kenenä keneksi| who
 ketkä  keiden ketkä  keitä  keissä  keistä  keihin keillä  keiltä  keille  keinä  keiksi | (pl)
 mikä   minkä minkä   mitä   missä   mistä   mihin  millä   miltä   mille   minä   miksi  | which what
 mitkä                                                                                    | (pl)

 joka   jonka         jota   jossa   josta   johon  jolla   jolta   jolle   jona   joksi  | who which
 jotka  joiden        joita  joissa  joista  joihin joilla  joilta  joille  joina  joiksi | (pl)

 | conjunctions

 että   | that
 ja     | and
 jos    | if
 koska  | because
 kuin   | than
 mutta  | but
 niin   | so
 sekä   | and
 sillä  | for
 tai    | or
 vaan   | but
 vai    | or
 vaikka | although


 | prepositions

 kanssa  | with
 mukaan  | according to
 noin    | about
 poikki  | across
 yli     | over, across

 | other

 kun    | when
 niin   | so
 nyt    | now
 itse   | self

--- a/solr_config/lang/stopwords_fr.txt
+++ b/solr_config/lang/stopwords_fr.txt
@@ -0,0 +1,186 @@
 | From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
 | This file is distributed under the BSD License.
 | See http://snowball.tartarus.org/license.php
 | Also see http://www.opensource.org/licenses/bsd-license.html
 |  - Encoding was converted to UTF-8.
 |  - This notice was added.
 |
 | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

 | A French stop word list. Comments begin with vertical bar. Each stop
 | word is at the start of a line.

 au             |  a + le
 aux            |  a + les
 avec           |  with
 ce             |  this
 ces            |  these
 dans           |  in
 de             |  of
 des            |  de + les
 du             |  de + le
 elle           |  she
 en             |  `of them' etc
 et             |  and
 eux            |  them
 il             |  he
 je             |  I
 la             |  the
 le             |  the
 leur           |  their
 lui            |  him
 ma             |  my (fem)
 mais           |  but
 me             |  me
 même           |  same; as in moi-même (myself) etc
 mes            |  my (pl)
 moi            |  me
 mon            |  my (masc)
 ne             |  not
 nos            |  our (pl)
 notre          |  our
 nous           |  we
 on             |  one
 ou             |  where
 par            |  by
 pas            |  not
 pour           |  for
 qu             |  que before vowel
 que            |  that
 qui            |  who
 sa             |  his, her (fem)
 se             |  oneself
 ses            |  his (pl)
 son            |  his, her (masc)
 sur            |  on
 ta             |  thy (fem)
 te             |  thee
 tes            |  thy (pl)
 toi            |  thee
 ton            |  thy (masc)
 tu             |  thou
 un             |  a
 une            |  a
 vos            |  your (pl)
 votre          |  your
 vous           |  you

               |  single letter forms

 c              |  c'
 d              |  d'
 j              |  j'
 l              |  l'
 à              |  to, at
 m              |  m'
 n              |  n'
 s              |  s'
 t              |  t'
 y              |  there

               | forms of être (not including the infinitive):
 été
 étée
 étées
 étés
 étant
 suis
 es
 est
 sommes
 êtes
 sont
 serai
 seras
 sera
 serons
 serez
 seront
 serais
 serait
 serions
 seriez
 seraient
 étais
 était
 étions
 étiez
 étaient
 fus
 fut
 fûmes
 fûtes
 furent
 sois
 soit
 soyons
 soyez
 soient
 fusse
 fusses
 fût
 fussions
 fussiez
 fussent

               | forms of avoir (not including the infinitive):
 ayant
 eu
 eue
 eues
 eus
 ai
 as
 avons
 avez
 ont
 aurai
 auras
 aura
 aurons
 aurez
 auront
 aurais
 aurait
 aurions
 auriez
 auraient
 avais
 avait
 avions
 aviez
 avaient
 eut
 eûmes
 eûtes
 eurent
 aie
 aies
 ait
 ayons
 ayez
 aient
 eusse
 eusses
 eût
 eussions
 eussiez
 eussent

               | Later additions (from Jean-Christophe Deschamps)
 ceci           |  this
 cela           |  that
 celà           |  that
 cet            |  this
 cette          |  this
 ici            |  here
 ils            |  they
 les            |  the (pl)
 leurs          |  their (pl)
 quel           |  which
 quels          |  which
 quelle         |  which
 quelles        |  which
 sans           |  without
 soi            |  oneself

--- a/solr_config/lang/stopwords_ga.txt
+++ b/solr_config/lang/stopwords_ga.txt
@@ -0,0 +1,110 @@

 a
 ach
 ag
 agus
 an
 aon
 ar
 arna
 as
 b'
 ba
 beirt
 bhúr
 caoga
 ceathair
 ceathrar
 chomh
 chtó
 chuig
 chun
 cois
 céad
 cúig
 cúigear
 d'
 daichead
 dar
 de
 deich
 deichniúr
 den
 dhá
 do
 don
 dtí
 dá
 dár
 dó
 faoi
 faoin
 faoina
 faoinár
 fara
 fiche
 gach
 gan
 go
 gur
 haon
 hocht
 i
 iad
 idir
 in
 ina
 ins
 inár
 is
 le
 leis
 lena
 lenár
 m'
 mar
 mo
 mé
 na
 nach
 naoi
 naonúr
 ná
 ní
 níor
 nó
 nócha
 ocht
 ochtar
 os
 roimh
 sa
 seacht
 seachtar
 seachtó
 seasca
 seisear
 siad
 sibh
 sinn
 sna
 sé
 sí
 tar
 thar
 thú
 triúr
 trí
 trína
 trínár
 tríocha
 tú
 um
 ár
 é
 éis
 í
 ó
 ón
 óna
 ónár
--- a/solr_config/lang/stopwords_gl.txt
+++ b/solr_config/lang/stopwords_gl.txt
@@ -0,0 +1,161 @@
 # Galician stopwords
 a
 aínda
 alí
 aquel
 aquela
 aquelas
 aqueles
 aquilo
 aquí
 ao
 aos
 as
 así
 á
 ben
 cando
 che
 co
 coa
 comigo
 con
 connosco
 contigo
 convosco
 coas
 cos
 cun
 cuns
 cunha
 cunhas
 da
 dalgunha
 dalgunhas
 dalgún
 dalgúns
 das
 de
 del
 dela
 delas
 deles
 desde
 deste
 do
 dos
 dun
 duns
 dunha
 dunhas
 e
 el
 ela
 elas
 eles
 en
 era
 eran
 esa
 esas
 ese
 eses
 esta
 estar
 estaba
 está
 están
 este
 estes
 estiven
 estou
 eu
 é
 facer
 foi
 foron
 fun
 había
 hai
 iso
 isto
 la
 las
 lle
 lles
 lo
 los
 mais
 me
 meu
 meus
 min
 miña
 miñas
 moi
 na
 nas
 neste
 nin
 no
 non
 nos
 nosa
 nosas
 noso
 nosos
 nós
 nun
 nunha
 nuns
 nunhas
 o
 os
 ou
 ó
 ós
 para
 pero
 pode
 pois
 pola
 polas
 polo
 polos
 por
 que
 se
 senón
 ser
 seu
 seus
 sexa
 sido
 sobre
 súa
 súas
 tamén
 tan
 te
 ten
 teñen
 teño
 ter
 teu
 teus
 ti
 tido
 tiña
 tiven
 túa
 túas
 un
 unha
 unhas
 uns
 vos
 vosa
 vosas
 voso
 vosos
 vós
--- a/solr_config/lang/stopwords_hi.txt
+++ b/solr_config/lang/stopwords_hi.txt
@@ -0,0 +1,235 @@
 # This file was created by Jacques Savoy and is distributed under the BSD license.
 # See http://members.unine.ch/jacques.savoy/clef/index.html.
 # Also see http://www.opensource.org/licenses/bsd-license.html
 # Note: by default this file also contains forms normalized by HindiNormalizer 
 # for spelling variation (see section below), such that it can be used whether or 
 # not you enable that feature. When adding additional entries to this list,
 # please add the normalized form as well. 
 अंदर
 अत
 अपना
 अपनी
 अपने
 अभी
 आदि
 आप
 इत्यादि
 इन 
 इनका
 इन्हीं
 इन्हें
 इन्हों
 इस
 इसका
 इसकी
 इसके
 इसमें
 इसी
 इसे
 उन
 उनका
 उनकी
 उनके
 उनको
 उन्हीं
 उन्हें
 उन्हों
 उस
 उसके
 उसी
 उसे
 एक
 एवं
 एस
 ऐसे
 और
 कई
 कर
 करता
 करते
 करना
 करने
 करें
 कहते
 कहा
 का
 काफ़ी
 कि
 कितना
 किन्हें
 किन्हों
 किया
 किर
 किस
 किसी
 किसे
 की
 कुछ
 कुल
 के
 को
 कोई
 कौन
 कौनसा
 गया
 घर
 जब
 जहाँ
 जा
 जितना
 जिन
 जिन्हें
 जिन्हों
 जिस
 जिसे
 जीधर
 जैसा
 जैसे
 जो
 तक
 तब
 तरह
 तिन
 तिन्हें
 तिन्हों
 तिस
 तिसे
 तो
 था
 थी
 थे
 दबारा
 दिया
 दुसरा
 दूसरे
 दो
 द्वारा
 न
 नहीं
 ना
 निहायत
 नीचे
 ने
 पर
 पर  
 पहले
 पूरा
 पे
 फिर
 बनी
 बही
 बहुत
 बाद
 बाला
 बिलकुल
 भी
 भीतर
 मगर
 मानो
 मे
 में
 यदि
 यह
 यहाँ
 यही
 या
 यिह 
 ये
 रखें
 रहा
 रहे
 ऱ्वासा
 लिए
 लिये
 लेकिन
 व
 वर्ग
 वह
 वह 
 वहाँ
 वहीं
 वाले
 वुह 
 वे
 वग़ैरह
 संग
 सकता
 सकते
 सबसे
 सभी
 साथ
 साबुत
 साभ
 सारा
 से
 सो
 ही
 हुआ
 हुई
 हुए
 है
 हैं
 हो
 होता
 होती
 होते
 होना
 होने
 # additional normalized forms of the above
 अपनि
 जेसे
 होति
 सभि
 तिंहों
 इंहों
 दवारा
 इसि
 किंहें
 थि
 उंहों
 ओर
 जिंहें
 वहिं
 अभि
 बनि
 हि
 उंहिं
 उंहें
 हें
 वगेरह
 एसे
 रवासा
 कोन
 निचे
 काफि
 उसि
 पुरा
 भितर
 हे
 बहि
 वहां
 कोइ
 यहां
 जिंहों
 तिंहें
 किसि
 कइ
 यहि
 इंहिं
 जिधर
 इंहें
 अदि
 इतयादि
 हुइ
 कोनसा
 इसकि
 दुसरे
 जहां
 अप
 किंहों
 उनकि
 भि
 वरग
 हुअ
 जेसा
 नहिं
--- a/solr_config/lang/stopwords_hu.txt
+++ b/solr_config/lang/stopwords_hu.txt
@@ -0,0 +1,211 @@
 | From svn.tartarus.org/snowball/trunk/website/algorithms/hungarian/stop.txt
 | This file is distributed under the BSD License.
 | See http://snowball.tartarus.org/license.php
 | Also see http://www.opensource.org/licenses/bsd-license.html
 |  - Encoding was converted to UTF-8.
 |  - This notice was added.
 |
 | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
 
 | Hungarian stop word list
 | prepared by Anna Tordai

 a
 ahogy
 ahol
 aki
 akik
 akkor
 alatt
 által
 általában
 amely
 amelyek
 amelyekben
 amelyeket
 amelyet
 amelynek
 ami
 amit
 amolyan
 amíg
 amikor
 át
 abban
 ahhoz
 annak
 arra
 arról
 az
 azok
 azon
 azt
 azzal
 azért
 aztán
 azután
 azonban
 bár
 be
 belül
 benne
 cikk
 cikkek
 cikkeket
 csak
 de
 e
 eddig
 egész
 egy
 egyes
 egyetlen
 egyéb
 egyik
 egyre
 ekkor
 el
 elég
 ellen
 elő
 először
 előtt
 első
 én
 éppen
 ebben
 ehhez
 emilyen
 ennek
 erre
 ez
 ezt
 ezek
 ezen
 ezzel
 ezért
 és
 fel
 felé
 hanem
 hiszen
 hogy
 hogyan
 igen
 így
 illetve
 ill.
 ill
 ilyen
 ilyenkor
 ison
 ismét
 itt
 jó
 jól
 jobban
 kell
 kellett
 keresztül
 keressünk
 ki
 kívül
 között
 közül
 legalább
 lehet
 lehetett
 legyen
 lenne
 lenni
 lesz
 lett
 maga
 magát
 majd
 majd
 már
 más
 másik
 meg
 még
 mellett
 mert
 mely
 melyek
 mi
 mit
 míg
 miért
 milyen
 mikor
 minden
 mindent
 mindenki
 mindig
 mint
 mintha
 mivel
 most
 nagy
 nagyobb
 nagyon
 ne
 néha
 nekem
 neki
 nem
 néhány
 nélkül
 nincs
 olyan
 ott
 össze
 ő
 ők
 őket
 pedig
 persze
 rá
 s
 saját
 sem
 semmi
 sok
 sokat
 sokkal
 számára
 szemben
 szerint
 szinte
 talán
 tehát
 teljes
 tovább
 továbbá
 több
 úgy
 ugyanis
 új
 újabb
 újra
 után
 utána
 utolsó
 vagy
 vagyis
 valaki
 valami
 valamint
 való
 vagyok
 van
 vannak
 volt
 voltam
 voltak
 voltunk
 vissza
 vele
 viszont
 volna
--- a/solr_config/lang/stopwords_hy.txt
+++ b/solr_config/lang/stopwords_hy.txt
@@ -0,0 +1,46 @@
 # example set of Armenian stopwords.
 այդ
 այլ
 այն
 այս
 դու
 դուք
 եմ
 են
 ենք
 ես
 եք
 է
 էի
 էին
 էինք
 էիր
 էիք
 էր
 ըստ
 թ
 ի
 ին
 իսկ
 իր
 կամ
 համար
 հետ
 հետո
 մենք
 մեջ
 մի
 ն
 նա
 նաև
 նրա
 նրանք
 որ
 որը
 որոնք
 որպես
 ու
 ում
 պիտի
 վրա
 և
--- a/solr_config/lang/stopwords_id.txt
+++ b/solr_config/lang/stopwords_id.txt
@@ -0,0 +1,359 @@
 # from appendix D of: A Study of Stemming Effects on Information
 # Retrieval in Bahasa Indonesia
 ada
 adanya
 adalah
 adapun
 agak
 agaknya
 agar
 akan
 akankah
 akhirnya
 aku
 akulah
 amat
 amatlah
 anda
 andalah
 antar
 diantaranya
 antara
 antaranya
 diantara
 apa
 apaan
 mengapa
 apabila
 apakah
 apalagi
 apatah
 atau
 ataukah
 ataupun
 bagai
 bagaikan
 sebagai
 sebagainya
 bagaimana
 bagaimanapun
 sebagaimana
 bagaimanakah
 bagi
 bahkan
 bahwa
 bahwasanya
 sebaliknya
 banyak
 sebanyak
 beberapa
 seberapa
 begini
 beginian
 beginikah
 beginilah
 sebegini
 begitu
 begitukah
 begitulah
 begitupun
 sebegitu
 belum
 belumlah
 sebelum
 sebelumnya
 sebenarnya
 berapa
 berapakah
 berapalah
 berapapun
 betulkah
 sebetulnya
 biasa
 biasanya
 bila
 bilakah
 bisa
 bisakah
 sebisanya
 boleh
 bolehkah
 bolehlah
 buat
 bukan
 bukankah
 bukanlah
 bukannya
 cuma
 percuma
 dahulu
 dalam
 dan
 dapat
 dari
 daripada
 dekat
 demi
 demikian
 demikianlah
 sedemikian
 dengan
 depan
 di
 dia
 dialah
 dini
 diri
 dirinya
 terdiri
 dong
 dulu
 enggak
 enggaknya
 entah
 entahlah
 terhadap
 terhadapnya
 hal
 hampir
 hanya
 hanyalah
 harus
 haruslah
 harusnya
 seharusnya
 hendak
 hendaklah
 hendaknya
 hingga
 sehingga
 ia
 ialah
 ibarat
 ingin
 inginkah
 inginkan
 ini
 inikah
 inilah
 itu
 itukah
 itulah
 jangan
 jangankan
 janganlah
 jika
 jikalau
 juga
 justru
 kala
 kalau
 kalaulah
 kalaupun
 kalian
 kami
 kamilah
 kamu
 kamulah
 kan
 kapan
 kapankah
 kapanpun
 dikarenakan
 karena
 karenanya
 ke
 kecil
 kemudian
 kenapa
 kepada
 kepadanya
 ketika
 seketika
 khususnya
 kini
 kinilah
 kiranya
 sekiranya
 kita
 kitalah
 kok
 lagi
 lagian
 selagi
 lah
 lain
 lainnya
 melainkan
 selaku
 lalu
 melalui
 terlalu
 lama
 lamanya
 selama
 selama
 selamanya
 lebih
 terlebih
 bermacam
 macam
 semacam
 maka
 makanya
 makin
 malah
 malahan
 mampu
 mampukah
 mana
 manakala
 manalagi
 masih
 masihkah
 semasih
 masing
 mau
 maupun
 semaunya
 memang
 mereka
 merekalah
 meski
 meskipun
 semula
 mungkin
 mungkinkah
 nah
 namun
 nanti
 nantinya
 nyaris
 oleh
 olehnya
 seorang
 seseorang
 pada
 padanya
 padahal
 paling
 sepanjang
 pantas
 sepantasnya
 sepantasnyalah
 para
 pasti
 pastilah
 per
 pernah
 pula
 pun
 merupakan
 rupanya
 serupa
 saat
 saatnya
 sesaat
 saja
 sajalah
 saling
 bersama
 sama
 sesama
 sambil
 sampai
 sana
 sangat
 sangatlah
 saya
 sayalah
 se
 sebab
 sebabnya
 sebuah
 tersebut
 tersebutlah
 sedang
 sedangkan
 sedikit
 sedikitnya
 segala
 segalanya
 segera
 sesegera
 sejak
 sejenak
 sekali
 sekalian
 sekalipun
 sesekali
 sekaligus
 sekarang
 sekarang
 sekitar
 sekitarnya
 sela
 selain
 selalu
 seluruh
 seluruhnya
 semakin
 sementara
 sempat
 semua
 semuanya
 sendiri
 sendirinya
 seolah
 seperti
 sepertinya
 sering
 seringnya
 serta
 siapa
 siapakah
 siapapun
 disini
 disinilah
 sini
 sinilah
 sesuatu
 sesuatunya
 suatu
 sesudah
 sesudahnya
 sudah
 sudahkah
 sudahlah
 supaya
 tadi
 tadinya
 tak
 tanpa
 setelah
 telah
 tentang
 tentu
 tentulah
 tentunya
 tertentu
 seterusnya
 tapi
 tetapi
 setiap
 tiap
 setidaknya
 tidak
 tidakkah
 tidaklah
 toh
 waduh
 wah
 wahai
 sewaktu
 walau
 walaupun
 wong
 yaitu
 yakni
 yang
--- a/solr_config/lang/stopwords_it.txt
+++ b/solr_config/lang/stopwords_it.txt
@@ -0,0 +1,303 @@
 | From svn.tartarus.org/snowball/trunk/website/algorithms/italian/stop.txt
 | This file is distributed under the BSD License.
 | See http://snowball.tartarus.org/license.php
 | Also see http://www.opensource.org/licenses/bsd-license.html
 |  - Encoding was converted to UTF-8.
 |  - This notice was added.
 |
 | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

 | An Italian stop word list. Comments begin with vertical bar. Each stop
 | word is at the start of a line.

 ad             |  a (to) before vowel
 al             |  a + il
 allo           |  a + lo
 ai             |  a + i
 agli           |  a + gli
 all            |  a + l'
 agl            |  a + gl'
 alla           |  a + la
 alle           |  a + le
 con            |  with
 col            |  con + il
 coi            |  con + i (forms collo, cogli etc are now very rare)
 da             |  from
 dal            |  da + il
 dallo          |  da + lo
 dai            |  da + i
 dagli          |  da + gli
 dall           |  da + l'
 dagl           |  da + gl'
 dalla          |  da + la
 dalle          |  da + le
 di             |  of
 del            |  di + il
 dello          |  di + lo
 dei            |  di + i
 degli          |  di + gli
 dell           |  di + l'
 degl           |  di + gl'
 della          |  di + la
 delle          |  di + le
 in             |  in
 nel            |  in + il
 nello          |  in + lo
 nei            |  in + i
 negli          |  in + gli
 nell           |  in + l'
 negl           |  in + gl'
 nella          |  in + la
 nelle          |  in + le
 su             |  on
 sul            |  su + il
 sullo          |  su + lo
 sui            |  su + i
 sugli          |  su + gli
 sull           |  su + l'
 sugl           |  su + gl'
 sulla          |  su + la
 sulle          |  su + le
 per            |  through, by
 tra            |  among
 contro         |  against
 io             |  I
 tu             |  thou
 lui            |  he
 lei            |  she
 noi            |  we
 voi            |  you
 loro           |  they
 mio            |  my
 mia            |
 miei           |
 mie            |
 tuo            |
 tua            |
 tuoi           |  thy
 tue            |
 suo            |
 sua            |
 suoi           |  his, her
 sue            |
 nostro         |  our
 nostra         |
 nostri         |
 nostre         |
 vostro         |  your
 vostra         |
 vostri         |
 vostre         |
 mi             |  me
 ti             |  thee
 ci             |  us, there
 vi             |  you, there
 lo             |  him, the
 la             |  her, the
 li             |  them
 le             |  them, the
 gli            |  to him, the
 ne             |  from there etc
 il             |  the
 un             |  a
 uno            |  a
 una            |  a
 ma             |  but
 ed             |  and
 se             |  if
 perché         |  why, because
 anche          |  also
 come           |  how
 dov            |  where (as dov')
 dove           |  where
 che            |  who, that
 chi            |  who
 cui            |  whom
 non            |  not
 più            |  more
 quale          |  who, that
 quanto         |  how much
 quanti         |
 quanta         |
 quante         |
 quello         |  that
 quelli         |
 quella         |
 quelle         |
 questo         |  this
 questi         |
 questa         |
 queste         |
 si             |  yes
 tutto          |  all
 tutti          |  all

               |  single letter forms:

 a              |  at
 c              |  as c' for ce or ci
 e              |  and
 i              |  the
 l              |  as l'
 o              |  or

               | forms of avere, to have (not including the infinitive):

 ho
 hai
 ha
 abbiamo
 avete
 hanno
 abbia
 abbiate
 abbiano
 avrò
 avrai
 avrà
 avremo
 avrete
 avranno
 avrei
 avresti
 avrebbe
 avremmo
 avreste
 avrebbero
 avevo
 avevi
 aveva
 avevamo
 avevate
 avevano
 ebbi
 avesti
 ebbe
 avemmo
 aveste
 ebbero
 avessi
 avesse
 avessimo
 avessero
 avendo
 avuto
 avuta
 avuti
 avute

               | forms of essere, to be (not including the infinitive):
 sono
 sei
 è
 siamo
 siete
 sia
 siate
 siano
 sarò
 sarai
 sarà
 saremo
 sarete
 saranno
 sarei
 saresti
 sarebbe
 saremmo
 sareste
 sarebbero
 ero
 eri
 era
 eravamo
 eravate
 erano
 fui
 fosti
 fu
 fummo
 foste
 furono
 fossi
 fosse
 fossimo
 fossero
 essendo

               | forms of fare, to do (not including the infinitive, fa, fat-):
 faccio
 fai
 facciamo
 fanno
 faccia
 facciate
 facciano
 farò
 farai
 farà
 faremo
 farete
 faranno
 farei
 faresti
 farebbe
 faremmo
 fareste
 farebbero
 facevo
 facevi
 faceva
 facevamo
 facevate
 facevano
 feci
 facesti
 fece
 facemmo
 faceste
 fecero
 facessi
 facesse
 facessimo
 facessero
 facendo

               | forms of stare, to be (not including the infinitive):
 sto
 stai
 sta
 stiamo
 stanno
 stia
 stiate
 stiano
 starò
 starai
 starà
 staremo
 starete
 staranno
 starei
 staresti
 starebbe
 staremmo
 stareste
 starebbero
 stavo
 stavi
 stava
 stavamo
 stavate
 stavano
 stetti
 stesti
 stette
 stemmo
 steste
 stettero
 stessi
 stesse
 stessimo
 stessero
 stando
--- a/solr_config/lang/stopwords_ja.txt
+++ b/solr_config/lang/stopwords_ja.txt
@@ -0,0 +1,127 @@
 #
 # This file defines a stopword set for Japanese.
 #
 # This set is made up of hand-picked frequent terms from segmented Japanese Wikipedia.
 # Punctuation characters and frequent kanji have mostly been left out.  See LUCENE-3745
 # for frequency lists, etc. that can be useful for making your own set (if desired)
 #
 # Note that there is an overlap between these stopwords and the terms stopped when used
 # in combination with the JapanesePartOfSpeechStopFilter.  When editing this file, note
 # that comments are not allowed on the same line as stopwords.
 #
 # Also note that stopping is done in a case-insensitive manner.  Change your StopFilter
 # configuration if you need case-sensitive stopping.  Lastly, note that stopping is done
 # using the same character width as the entries in this file.  Since this StopFilter is
 # normally done after a CJKWidthFilter in your chain, you would usually want your romaji
 # entries to be in half-width and your kana entries to be in full-width.
 #
 の
 に
 は
 を
 た
 が
 で
 て
 と
 し
 れ
 さ
 ある
 いる
 も
 する
 から
 な
 こと
 として
 い
 や
 れる
 など
 なっ
 ない
 この
 ため
 その
 あっ
 よう
 また
 もの
 という
 あり
 まで
 られ
 なる
 へ
 か
 だ
 これ
 によって
 により
 おり
 より
 による
 ず
 なり
 られる
 において
 ば
 なかっ
 なく
 しかし
 について
 せ
 だっ
 その後
 できる
 それ
 う
 ので
 なお
 のみ
 でき
 き
 つ
 における
 および
 いう
 さらに
 でも
 ら
 たり
 その他
 に関する
 たち
 ます
 ん
 なら
 に対して
 特に
 せる
 及び
 これら
 とき
 では
 にて
 ほか
 ながら
 うち
 そして
 とともに
 ただし
 かつて
 それぞれ
 または
 お
 ほど
 ものの
 に対する
 ほとんど
 と共に
 といった
 です
 とも
 ところ
 ここ
 ##### End of file
--- a/solr_config/lang/stopwords_lv.txt
+++ b/solr_config/lang/stopwords_lv.txt
@@ -0,0 +1,172 @@
 # Set of Latvian stopwords from A Stemming Algorithm for Latvian, Karlis Kreslins
 # the original list of over 800 forms was refined: 
 #   pronouns, adverbs, interjections were removed
 # 
 # prepositions
 aiz
 ap
 ar
 apakš
 ārpus
 augšpus
 bez
 caur
 dēļ
 gar
 iekš
 iz
 kopš
 labad
 lejpus
 līdz
 no
 otrpus
 pa
 par
 pār
 pēc
 pie
 pirms
 pret
 priekš
 starp
 šaipus
 uz
 viņpus
 virs
 virspus
 zem
 apakšpus
 # Conjunctions
 un
 bet
 jo
 ja
 ka
 lai
 tomēr
 tikko
 turpretī
 arī
 kaut
 gan
 tādēļ
 tā
 ne
 tikvien
 vien
 kā
 ir
 te
 vai
 kamēr
 # Particles
 ar
 diezin
 droši
 diemžēl
 nebūt
 ik
 it
 taču
 nu
 pat
 tiklab
 iekšpus
 nedz
 tik
 nevis
 turpretim
 jeb
 iekam
 iekām
 iekāms
 kolīdz
 līdzko
 tiklīdz
 jebšu
 tālab
 tāpēc
 nekā
 itin
 jā
 jau
 jel
 nē
 nezin
 tad
 tikai
 vis
 tak
 iekams
 vien
 # modal verbs
 būt
 biju
 biji
 bija
 bijām
 bijāt
 esmu
 esi
 esam
 esat
 būšu
 būsi
 būs
 būsim
 būsiet
 tikt
 tiku
 tiki
 tika
 tikām
 tikāt
 tieku
 tiec
 tiek
 tiekam
 tiekat
 tikšu
 tiks
 tiksim
 tiksiet
 tapt
 tapi
 tapāt
 topat
 tapšu
 tapsi
 taps
 tapsim
 tapsiet
 kļūt
 kļuvu
 kļuvi
 kļuva
 kļuvām
 kļuvāt
 kļūstu
 kļūsti
 kļūst
 kļūstam
 kļūstat
 kļūšu
 kļūsi
 kļūs
 kļūsim
 kļūsiet
 # verbs
 varēt
 varēju
 varējām
 varēšu
 varēsim
 var
 varēji
 varējāt
 varēsi
 varēsiet
 varat
 varēja
 varēs
--- a/solr_config/lang/stopwords_nl.txt
+++ b/solr_config/lang/stopwords_nl.txt
@@ -0,0 +1,119 @@
 | From svn.tartarus.org/snowball/trunk/website/algorithms/dutch/stop.txt
 | This file is distributed under the BSD License.
 | See http://snowball.tartarus.org/license.php
 | Also see http://www.opensource.org/licenses/bsd-license.html
 |  - Encoding was converted to UTF-8.
 |  - This notice was added.
 |
 | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

 | A Dutch stop word list. Comments begin with vertical bar. Each stop
 | word is at the start of a line.

 | This is a ranked list (commonest to rarest) of stopwords derived from
 | a large sample of Dutch text.

 | Dutch stop words frequently exhibit homonym clashes. These are indicated
 | clearly below.

 de             |  the
 en             |  and
 van            |  of, from
 ik             |  I, the ego
 te             |  (1) chez, at etc, (2) to, (3) too
 dat            |  that, which
 die            |  that, those, who, which
 in             |  in, inside
 een            |  a, an, one
 hij            |  he
 het            |  the, it
 niet           |  not, nothing, naught
 zijn           |  (1) to be, being, (2) his, one's, its
 is             |  is
 was            |  (1) was, past tense of all persons sing. of 'zijn' (to be) (2) wax, (3) the washing, (4) rise of river
 op             |  on, upon, at, in, up, used up
 aan            |  on, upon, to (as dative)
 met            |  with, by
 als            |  like, such as, when
 voor           |  (1) before, in front of, (2) furrow
 had            |  had, past tense all persons sing. of 'hebben' (have)
 er             |  there
 maar           |  but, only
 om             |  round, about, for etc
 hem            |  him
 dan            |  then
 zou            |  should/would, past tense all persons sing. of 'zullen'
 of             |  or, whether, if
 wat            |  what, something, anything
 mijn           |  possessive and noun 'mine'
 men            |  people, 'one'
 dit            |  this
 zo             |  so, thus, in this way
 door           |  through, by
 over           |  over, across
 ze             |  she, her, they, them
 zich           |  oneself
 bij            |  (1) a bee, (2) by, near, at
 ook            |  also, too
 tot            |  till, until
 je             |  you
 mij            |  me
 uit            |  out of, from
 der            |  Old Dutch form of 'van der' still found in surnames
 daar           |  (1) there, (2) because
 haar           |  (1) her, their, them, (2) hair
 naar           |  (1) unpleasant, unwell etc, (2) towards, (3) as
 heb            |  present first person sing. of 'to have'
 hoe            |  how, why
 heeft          |  present third person sing. of 'to have'
 hebben         |  'to have' and various parts thereof
 deze           |  this
 u              |  you
 want           |  (1) for, (2) mitten, (3) rigging
 nog            |  yet, still
 zal            |  'shall', first and third person sing. of verb 'zullen' (will)
 me             |  me
 zij            |  she, they
 nu             |  now
 ge             |  'thou', still used in Belgium and south Netherlands
 geen           |  none
 omdat          |  because
 iets           |  something, somewhat
 worden         |  to become, grow, get
 toch           |  yet, still
 al             |  all, every, each
 waren          |  (1) 'were', (2) to wander, (3) wares
 veel           |  much, many
 meer           |  (1) more, (2) lake
 doen           |  to do, to make
 toen           |  then, when
 moet           |  noun 'spot/mote' and present form of 'to must'
 ben            |  (1) am, (2) 'are' in interrogative second person singular of 'to be'
 zonder         |  without
 kan            |  noun 'can' and present form of 'to be able'
 hun            |  their, them
 dus            |  so, consequently
 alles          |  all, everything, anything
 onder          |  under, beneath
 ja             |  yes, of course
 eens           |  once, one day
 hier           |  here
 wie            |  who
 werd           |  imperfect third person sing. of 'become'
 altijd         |  always
 doch           |  yet, but etc
 wordt          |  present third person sing. of 'become'
 wezen          |  (1) to be, (2) 'been' as in 'been fishing', (3) orphans
 kunnen         |  to be able
 ons            |  us/our
 zelf           |  self
 tegen          |  against, towards, at
 na             |  after, near
 reeds          |  already
 wil            |  (1) present tense of 'want', (2) 'will', noun, (3) fender
 kon            |  could; past tense of 'to be able'
 niets          |  nothing
 uw             |  your
 iemand         |  somebody
 geweest        |  been; past participle of 'be'
 andere         |  other
--- a/solr_config/lang/stopwords_no.txt
+++ b/solr_config/lang/stopwords_no.txt
@@ -0,0 +1,194 @@
 | From svn.tartarus.org/snowball/trunk/website/algorithms/norwegian/stop.txt
 | This file is distributed under the BSD License.
 | See http://snowball.tartarus.org/license.php
 | Also see http://www.opensource.org/licenses/bsd-license.html
 |  - Encoding was converted to UTF-8.
 |  - This notice was added.
 |
 | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

 | A Norwegian stop word list. Comments begin with vertical bar. Each stop
 | word is at the start of a line.

 | This stop word list is for the dominant bokmål dialect. Words unique
 | to nynorsk are marked *.

 | Revised by Jan Bruusgaard <Jan.Bruusgaard@ssb.no>, Jan 2005

 og             | and
 i              | in
 jeg            | I
 det            | it/this/that
 at             | to (w. inf.)
 en             | a/an
 et             | a/an
 den            | it/this/that
 til            | to
 er             | is/am/are
 som            | who/that
 på             | on
 de             | they / you(formal)
 med            | with
 han            | he
 av             | of
 ikke           | not
 ikkje          | not *
 der            | there
 så             | so
 var            | was/were
 meg            | me
 seg            | oneself
 men            | but
 ett            | one
 har            | have
 om             | about
 vi             | we
 min            | my
 mitt           | my
 ha             | have
 hadde          | had
 hun            | she
 nå             | now
 over           | over
 da             | when/as
 ved            | by/know
 fra            | from
 du             | you
 ut             | out
 sin            | your
 dem            | them
 oss            | us
 opp            | up
 man            | you/one
 kan            | can
 hans           | his
 hvor           | where
 eller          | or
 hva            | what
 skal           | shall/must
 selv           | self (reflexive)
 sjøl           | self (reflexive)
 her            | here
 alle           | all
 vil            | will
 bli            | become
 ble            | became
 blei           | became *
 blitt          | have become
 kunne          | could
 inn            | in
 når            | when
 være           | be
 kom            | come
 noen           | some
 noe            | some
 ville          | would
 dere           | you
 som            | who/which/that
 deres          | their/theirs
 kun            | only/just
 ja             | yes
 etter          | after
 ned            | down
 skulle         | should
 denne          | this
 for            | for/because
 deg            | you
 si             | hers/his
 sine           | hers/his
 sitt           | hers/his
 mot            | against
 å              | to
 meget          | much
 hvorfor        | why
 dette          | this
 disse          | these/those
 uten           | without
 hvordan        | how
 ingen          | none
 din            | your
 ditt           | your
 blir           | become
 samme          | same
 hvilken        | which
 hvilke         | which (plural)
 sånn           | such a
 inni           | inside/within
 mellom         | between
 vår            | our
 hver           | each
 hvem           | who
 vors           | us/ours
 hvis           | whose
 både           | both
 bare           | only/just
 enn            | than
 fordi          | as/because
 før            | before
 mange          | many
 også           | also
 slik           | just
 vært           | been
 være           | to be
 båe            | both *
 begge          | both
 siden          | since
 dykk           | your *
 dykkar         | yours *
 dei            | they *
 deira          | them *
 deires         | theirs *
 deim           | them *
 di             | your (fem.) *
 då             | as/when *
 eg             | I *
 ein            | a/an *
 eit            | a/an *
 eitt           | a/an *
 elles          | or *
 honom          | him *
 hjå            | at *
 ho             | she *
 hoe            | she *
 henne          | her
 hennar         | her/hers
 hennes         | hers
 hoss           | how *
 hossen         | how *
 ikkje          | not *
 ingi           | no one *
 inkje          | no one *
 korleis        | how *
 korso          | how *
 kva            | what/which *
 kvar           | where *
 kvarhelst      | where *
 kven           | who/whom *
 kvi            | why *
 kvifor         | why *
 me             | we *
 medan          | while *
 mi             | my *
 mine           | my *
 mykje          | much *
 no             | now *
 nokon          | some (masc./neut.) *
 noka           | some (fem.) *
 nokor          | some *
 noko           | some *
 nokre          | some *
 si             | his/hers *
 sia            | since *
 sidan          | since *
 so             | so *
 somt           | some *
 somme          | some *
 um             | about *
 upp            | up *
 vere           | be *
 vore           | was *
 verte          | become *
 vort           | become *
 varte          | became *
 vart           | became *

--- a/solr_config/lang/stopwords_pt.txt
+++ b/solr_config/lang/stopwords_pt.txt
@@ -0,0 +1,253 @@
 | From svn.tartarus.org/snowball/trunk/website/algorithms/portuguese/stop.txt
 | This file is distributed under the BSD License.
 | See http://snowball.tartarus.org/license.php
 | Also see http://www.opensource.org/licenses/bsd-license.html
 |  - Encoding was converted to UTF-8.
 |  - This notice was added.
 |
 | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

 | A Portuguese stop word list. Comments begin with vertical bar. Each stop
 | word is at the start of a line.


 | The following is a ranked list (commonest to rarest) of stopwords
 | deriving from a large sample of text.

 | Extra words have been added at the end.

 de             |  of, from
 a              |  the; to, at; her
 o              |  the; him
 que            |  who, that
 e              |  and
 do             |  de + o
 da             |  de + a
 em             |  in
 um             |  a
 para           |  for
  | é          from SER
 com            |  with
 não            |  not, no
 uma            |  a
 os             |  the; them
 no             |  em + o
 se             |  himself etc
 na             |  em + a
 por            |  for
 mais           |  more
 as             |  the; them
 dos            |  de + os
 como           |  as, like
 mas            |  but
  | foi        from SER
 ao             |  a + o
 ele            |  he
 das            |  de + as
  | tem        from TER
 à              |  a + a
 seu            |  his
 sua            |  her
 ou             |  or
  | ser        from SER
 quando         |  when
 muito          |  much
  | há         from HAV
 nos            |  em + os; us
 já             |  already, now
  | está       from EST
 eu             |  I
 também         |  also
 só             |  only, just
 pelo           |  per + o
 pela           |  per + a
 até            |  up to
 isso           |  that
 ela            |  she
 entre          |  between
  | era        from SER
 depois         |  after
 sem            |  without
 mesmo          |  same
 aos            |  a + os
  | ter        from TER
 seus           |  his
 quem           |  whom
 nas            |  em + as
 me             |  me
 esse           |  that
 eles           |  they
  | estão      from EST
 você           |  you
  | tinha      from TER
  | foram      from SER
 essa           |  that
 num            |  em + um
 nem            |  nor
 suas           |  her
 meu            |  my
 às             |  a + as
 minha          |  my
  | têm        from TER
 numa           |  em + uma
 pelos          |  per + os
 elas           |  they
  | havia      from HAV
  | seja       from SER
 qual           |  which
  | será       from SER
 nós            |  we
  | tenho      from TER
 lhe            |  to him, her
 deles          |  of them
 essas          |  those
 esses          |  those
 pelas          |  per + as
 este           |  this
  | fosse      from SER
 dele           |  of him

 | other words. There are many contractions such as naquele = em+aquele,
 | mo = me+o, but they are rare.
 | Indefinite article plural forms are also rare.

 tu             |  thou
 te             |  thee
 vocês          |  you (plural)
 vos            |  you
 lhes           |  to them
 meus           |  my
 minhas
 teu            |  thy
 tua
 teus
 tuas
 nosso          | our
 nossa
 nossos
 nossas

 dela           |  of her
 delas          |  of them

 esta           |  this
 estes          |  these
 estas          |  these
 aquele         |  that
 aquela         |  that
 aqueles        |  those
 aquelas        |  those
 isto           |  this
 aquilo         |  that

               | forms of estar, to be (not including the infinitive):
 estou
 está
 estamos
 estão
 estive
 esteve
 estivemos
 estiveram
 estava
 estávamos
 estavam
 estivera
 estivéramos
 esteja
 estejamos
 estejam
 estivesse
 estivéssemos
 estivessem
 estiver
 estivermos
 estiverem

               | forms of haver, to have (not including the infinitive):
 hei
 há
 havemos
 hão
 houve
 houvemos
 houveram
 houvera
 houvéramos
 haja
 hajamos
 hajam
 houvesse
 houvéssemos
 houvessem
 houver
 houvermos
 houverem
 houverei
 houverá
 houveremos
 houverão
 houveria
 houveríamos
 houveriam

               | forms of ser, to be (not including the infinitive):
 sou
 somos
 são
 era
 éramos
 eram
 fui
 foi
 fomos
 foram
 fora
 fôramos
 seja
 sejamos
 sejam
 fosse
 fôssemos
 fossem
 for
 formos
 forem
 serei
 será
 seremos
 serão
 seria
 seríamos
 seriam

               | forms of ter, to have (not including the infinitive):
 tenho
 tem
 temos
 tém
 tinha
 tínhamos
 tinham
 tive
 teve
 tivemos
 tiveram
 tivera
 tivéramos
 tenha
 tenhamos
 tenham
 tivesse
 tivéssemos
 tivessem
 tiver
 tivermos
 tiverem
 terei
 terá
 teremos
 terão
 teria
 teríamos
 teriam
--- a/solr_config/lang/stopwords_ro.txt
+++ b/solr_config/lang/stopwords_ro.txt
@@ -0,0 +1,233 @@
 # This file was created by Jacques Savoy and is distributed under the BSD license.
 # See http://members.unine.ch/jacques.savoy/clef/index.html.
 # Also see http://www.opensource.org/licenses/bsd-license.html
 acea
 aceasta
 această
 aceea
 acei
 aceia
 acel
 acela
 acele
 acelea
 acest
 acesta
 aceste
 acestea
 aceşti
 aceştia
 acolo
 acum
 ai
 aia
 aibă
 aici
 al
 ăla
 ale
 alea
 ălea
 altceva
 altcineva
 am
 ar
 are
 aş
 aşadar
 asemenea
 asta
 ăsta
 astăzi
 astea
 ăstea
 ăştia
 asupra
 aţi
 au
 avea
 avem
 aveţi
 azi
 bine
 bucur
 bună
 ca
 că
 căci
 când
 care
 cărei
 căror
 cărui
 cât
 câte
 câţi
 către
 câtva
 ce
 cel
 ceva
 chiar
 cînd
 cine
 cineva
 cît
 cîte
 cîţi
 cîtva
 contra
 cu
 cum
 cumva
 curând
 curînd
 da
 dă
 dacă
 dar
 datorită
 de
 deci
 deja
 deoarece
 departe
 deşi
 din
 dinaintea
 dintr
 dintre
 drept
 după
 ea
 ei
 el
 ele
 eram
 este
 eşti
 eu
 face
 fără
 fi
 fie
 fiecare
 fii
 fim
 fiţi
 iar
 ieri
 îi
 îl
 îmi
 împotriva
 în
 înainte
 înaintea
 încât
 încît
 încotro
 între
 întrucât
 întrucît
 îţi
 la
 lângă
 le
 li
 lîngă
 lor
 lui
 mă
 mâine
 mea
 mei
 mele
 mereu
 meu
 mi
 mine
 mult
 multă
 mulţi
 ne
 nicăieri
 nici
 nimeni
 nişte
 noastră
 noastre
 noi
 noştri
 nostru
 nu
 ori
 oricând
 oricare
 oricât
 orice
 oricînd
 oricine
 oricît
 oricum
 oriunde
 până
 pe
 pentru
 peste
 pînă
 poate
 pot
 prea
 prima
 primul
 prin
 printr
 sa
 să
 săi
 sale
 sau
 său
 se
 şi
 sînt
 sîntem
 sînteţi
 spre
 sub
 sunt
 suntem
 sunteţi
 ta
 tăi
 tale
 tău
 te
 ţi
 ţie
 tine
 toată
 toate
 tot
 toţi
 totuşi
 tu
 un
 una
 unde
 undeva
 unei
 unele
 uneori
 unor
 vă
 vi
 voastră
 voastre
 voi
 voştri
 vostru
 vouă
 vreo
 vreun
--- a/solr_config/lang/stopwords_ru.txt
+++ b/solr_config/lang/stopwords_ru.txt
@@ -0,0 +1,243 @@
 | From svn.tartarus.org/snowball/trunk/website/algorithms/russian/stop.txt
 | This file is distributed under the BSD License.
 | See http://snowball.tartarus.org/license.php
 | Also see http://www.opensource.org/licenses/bsd-license.html
 |  - Encoding was converted to UTF-8.
 |  - This notice was added.
 |
 | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

 | A Russian stop word list. Comments begin with vertical bar. Each stop
 | word is at the start of a line.

 | This is a ranked list (commonest to rarest) of stopwords derived from
 | a large text sample.

 | Letter `ё' is translated to `е'.

 и              | and
 в              | in/into
 во             | alternative form
 не             | not
 что            | what/that
 он             | he
 на             | on/onto
 я              | I
 с              | from
 со             | alternative form
 как            | how
 а              | milder form of `no' (but)
 то             | conjunction and form of `that'
 все            | all
 она            | she
 так            | so, thus
 его            | him
 но             | but
 да             | yes/and
 ты             | thou
 к              | towards, by
 у              | around, chez
 же             | intensifier particle
 вы             | you
 за             | beyond, behind
 бы             | conditional/subj. particle
 по             | up to, along
 только         | only
 ее             | her
 мне            | to me
 было           | it was
 вот            | here is/are, particle
 от             | away from
 меня           | me
 еще            | still, yet, more
 нет            | no, there isn't/aren't
 о              | about
 из             | out of
 ему            | to him
 теперь         | now
 когда          | when
 даже           | even
 ну             | so, well
 вдруг          | suddenly
 ли             | interrogative particle
 если           | if
 уже            | already, but homonym of `narrower'
 или            | or
 ни             | neither
 быть           | to be
 был            | he was
 него           | prepositional form of его
 до             | up to
 вас            | you accusative
 нибудь         | indef. suffix preceded by hyphen
 опять          | again
 уж             | already, but homonym of `adder'
 вам            | to you
 сказал         | he said
 ведь           | particle `after all'
 там            | there
 потом          | then
 себя           | oneself
 ничего         | nothing
 ей             | to her
 может          | usually with `быть' as `maybe'
 они            | they
 тут            | here
 где            | where
 есть           | there is/are
 надо           | got to, must
 ней            | prepositional form of  ей
 для            | for
 мы             | we
 тебя           | thee
 их             | them, their
 чем            | than
 была           | she was
 сам            | self
 чтоб           | in order to
 без            | without
 будто          | as if
 человек        | man, person, one
 чего           | genitive form of `what'
 раз            | once
 тоже           | also
 себе           | to oneself
 под            | beneath
 жизнь          | life
 будет          | will be
 ж              | short form of intensifier particle `же'
 тогда          | then
 кто            | who
 этот           | this
 говорил        | was saying
 того           | genitive form of `that'
 потому         | for that reason
 этого          | genitive form of `this'
 какой          | which
 совсем         | altogether
 ним            | prepositional form of `его', `они'
 здесь          | here
 этом           | prepositional form of `этот'
 один           | one
 почти          | almost
 мой            | my
 тем            | instrumental/dative plural of `тот', `то'
 чтобы          | full form of `in order that'
 нее            | her (acc.)
 кажется        | it seems
 сейчас         | now
 были           | they were
 куда           | where to
 зачем          | why
 сказать        | to say
 всех           | all (acc., gen. preposn. plural)
 никогда        | never
 сегодня        | today
 можно          | possible, one can
 при            | by
 наконец        | finally
 два            | two
 об             | alternative form of `о', about
 другой         | another
 хоть           | even
 после          | after
 над            | above
 больше         | more
 тот            | that one (masc.)
 через          | across, in
 эти            | these
 нас            | us
 про            | about
 всего          | in all, only, of all
 них            | prepositional form of `они' (they)
 какая          | which, feminine
 много          | lots
 разве          | interrogative particle
 сказала        | she said
 три            | three
 эту            | this, acc. fem. sing.
 моя            | my, feminine
 впрочем        | moreover, besides
 хорошо         | good
 свою           | one's own, acc. fem. sing.
 этой           | oblique form of `эта', fem. `this'
 перед          | in front of
 иногда         | sometimes
 лучше          | better
 чуть           | a little
 том            | preposn. form of `that one'
 нельзя         | one must not
 такой          | such a one
 им             | to them
 более          | more
 всегда         | always
 конечно        | of course
 всю            | acc. fem. sing of `all'
 между          | between


  | b: some paradigms
  |
  | personal pronouns
  |
  | я  меня  мне  мной  [мною]
  | ты  тебя  тебе  тобой  [тобою]
  | он  его  ему  им  [него, нему, ним]
  | она  ее  ей  ею  [нее, ней, нею]
  | оно  его  ему  им  [него, нему, ним]
  |
  | мы  нас  нам  нами
  | вы  вас  вам  вами
  | они  их  им  ими  [них, ним, ними]
  |
  |   себя  себе  собой   [собою]
  |
  | demonstrative pronouns: этот (this), тот (that)
  |
  | этот  эта  это  эти
  | этого  эту  это  эти
  | этого  этой  этого  этих
  | этому  этой  этому  этим
  | этим  этой  этим  [этою]  этими
  | этом  этой  этом  этих
  |
  | тот  та  то  те
  | того  ту  то  те
  | того  той  того  тех
  | тому  той  тому  тем
  | тем  той  тем  [тою]  теми
  | том  той  том  тех
  |
  | determinative pronouns
  |
  | (a) весь (all)
  |
  | весь  вся  все  все
  | всего  всю  все  все
  | всего  всей  всего  всех
  | всему  всей  всему  всем
  | всем  всей  всем  [всею]  всеми
  | всем  всей  всем  всех
  |
  | (b) сам (himself etc)
  |
  | сам  сама  само  сами
  | самого саму  само  самих
  | самого самой самого  самих
  | самому самой самому  самим
  | самим  самой  самим  [самою]  самими
  | самом самой самом  самих
  |
  | stems of verbs `to be', `to have', `to do' and modal
  |
  | быть  бы  буд  быв  есть  суть
  | име
  | дел
  | мог   мож  мочь
  | уме
  | хоч  хот
  | долж
  | можн
  | нужн
  | нельзя

--- a/solr_config/lang/stopwords_sv.txt
+++ b/solr_config/lang/stopwords_sv.txt
@@ -0,0 +1,133 @@
 | From svn.tartarus.org/snowball/trunk/website/algorithms/swedish/stop.txt
 | This file is distributed under the BSD License.
 | See http://snowball.tartarus.org/license.php
 | Also see http://www.opensource.org/licenses/bsd-license.html
 |  - Encoding was converted to UTF-8.
 |  - This notice was added.
 |
 | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

 | A Swedish stop word list. Comments begin with vertical bar. Each stop
 | word is at the start of a line.

 | This is a ranked list (commonest to rarest) of stopwords derived from
 | a large text sample.

 | Swedish stop words occasionally exhibit homonym clashes. For example
 |  så = so, but also seed. These are indicated clearly below.

 och            | and
 det            | it, this/that
 att            | to (with infinitive)
 i              | in, at
 en             | a
 jag            | I
 hon            | she
 som            | who, that
 han            | he
 på             | on
 den            | it, this/that
 med            | with
 var            | where, each
 sig            | him(self) etc
 för            | for
 så             | so (also: seed)
 till           | to
 är             | is
 men            | but
 ett            | a
 om             | if; around, about
 hade           | had
 de             | they, these/those
 av             | of
 icke           | not, no
 mig            | me
 du             | you
 henne          | her
 då             | then, when
 sin            | his
 nu             | now
 har            | have
 inte           | inte någon = no one
 hans           | his
 honom          | him
 skulle         | would, should
 hennes         | her
 där            | there
 min            | my
 man            | one (pronoun)
 ej             | nor
 vid            | at, by, on (also: vast)
 kunde          | could
 något          | some etc
 från           | from, off
 ut             | out
 när            | when
 efter          | after, behind
 upp            | up
 vi             | we
 dem            | them
 vara           | be
 vad            | what
 över           | over
 än             | than
 dig            | you
 kan            | can
 sina           | his
 här            | here
 ha             | have
 mot            | towards
 alla           | all
 under          | under (also: wonder)
 någon          | some etc
 eller          | or (else)
 allt           | all
 mycket         | much
 sedan          | since
 ju             | why
 denna          | this/that
 själv          | myself, yourself etc
 detta          | this/that
 åt             | to
 utan           | without
 varit          | was
 hur            | how
 ingen          | no
 mitt           | my
 ni             | you
 bli            | to be, become
 blev           | from bli
 oss            | us
 din            | thy
 dessa          | these/those
 några          | some etc
 deras          | their
 blir           | from bli
 mina           | my
 samma          | (the) same
 vilken         | who, that
 er             | you, your
 sådan          | such a
 vår            | our
 blivit         | from bli
 dess           | its
 inom           | within
 mellan         | between
 sådant         | such a
 varför         | why
 varje          | each
 vilka          | who, that
 ditt           | thy
 vem            | who
 vilket         | who, that
 sitta          | his
 sådana         | such a
 vart           | each
 dina           | thy
 vars           | whose
 vårt           | our
 våra           | our
 ert            | your
 era            | your
 vilkas         | whose

--- a/solr_config/lang/stopwords_th.txt
+++ b/solr_config/lang/stopwords_th.txt
@@ -0,0 +1,119 @@
 # Thai stopwords from:
 # "Opinion Detection in Thai Political News Columns
 # Based on Subjectivity Analysis"
 # Khampol Sukhum, Supot Nitsuwat, and Choochart Haruechaiyasak
 ไว้
 ไม่
 ไป
 ได้
 ให้
 ใน
 โดย
 แห่ง
 แล้ว
 และ
 แรก
 แบบ
 แต่
 เอง
 เห็น
 เลย
 เริ่ม
 เรา
 เมื่อ
 เพื่อ
 เพราะ
 เป็นการ
 เป็น
 เปิดเผย
 เปิด
 เนื่องจาก
 เดียวกัน
 เดียว
 เช่น
 เฉพาะ
 เคย
 เข้า
 เขา
 อีก
 อาจ
 อะไร
 ออก
 อย่าง
 อยู่
 อยาก
 หาก
 หลาย
 หลังจาก
 หลัง
 หรือ
 หนึ่ง
 ส่วน
 ส่ง
 สุด
 สําหรับ
 ว่า
 วัน
 ลง
 ร่วม
 ราย
 รับ
 ระหว่าง
 รวม
 ยัง
 มี
 มาก
 มา
 พร้อม
 พบ
 ผ่าน
 ผล
 บาง
 น่า
 นี้
 นํา
 นั้น
 นัก
 นอกจาก
 ทุก
 ที่สุด
 ที่
 ทําให้
 ทํา
 ทาง
 ทั้งนี้
 ทั้ง
 ถ้า
 ถูก
 ถึง
 ต้อง
 ต่างๆ
 ต่าง
 ต่อ
 ตาม
 ตั้งแต่
 ตั้ง
 ด้าน
 ด้วย
 ดัง
 ซึ่ง
 ช่วง
 จึง
 จาก
 จัด
 จะ
 คือ
 ความ
 ครั้ง
 คง
 ขึ้น
 ของ
 ขอ
 ขณะ
 ก่อน
 ก็
 การ
 กับ
 กัน
 กว่า
 กล่าว
--- a/solr_config/lang/stopwords_tr.txt
+++ b/solr_config/lang/stopwords_tr.txt
@@ -0,0 +1,212 @@
 # Turkish stopwords from LUCENE-559
 # merged with the list from "Information Retrieval on Turkish Texts"
 #   (http://www.users.muohio.edu/canf/papers/JASIST2008offPrint.pdf)
 acaba
 altmış
 altı
 ama
 ancak
 arada
 aslında
 ayrıca
 bana
 bazı
 belki
 ben
 benden
 beni
 benim
 beri
 beş
 bile
 bin
 bir
 birçok
 biri
 birkaç
 birkez
 birşey
 birşeyi
 biz
 bize
 bizden
 bizi
 bizim
 böyle
 böylece
 bu
 buna
 bunda
 bundan
 bunlar
 bunları
 bunların
 bunu
 bunun
 burada
 çok
 çünkü
 da
 daha
 dahi
 de
 defa
 değil
 diğer
 diye
 doksan
 dokuz
 dolayı
 dolayısıyla
 dört
 edecek
 eden
 ederek
 edilecek
 ediliyor
 edilmesi
 ediyor
 eğer
 elli
 en
 etmesi
 etti
 ettiği
 ettiğini
 gibi
 göre
 halen
 hangi
 hatta
 hem
 henüz
 hep
 hepsi
 her
 herhangi
 herkesin
 hiç
 hiçbir
 için
 iki
 ile
 ilgili
 ise
 işte
 itibaren
 itibariyle
 kadar
 karşın
 katrilyon
 kendi
 kendilerine
 kendini
 kendisi
 kendisine
 kendisini
 kez
 ki
 kim
 kimden
 kime
 kimi
 kimse
 kırk
 milyar
 milyon
 mu
 mü
 mı
 nasıl
 ne
 neden
 nedenle
 nerde
 nerede
 nereye
 niye
 niçin
 o
 olan
 olarak
 oldu
 olduğu
 olduğunu
 olduklarını
 olmadı
 olmadığı
 olmak
 olması
 olmayan
 olmaz
 olsa
 olsun
 olup
 olur
 olursa
 oluyor
 on
 ona
 ondan
 onlar
 onlardan
 onları
 onların
 onu
 onun
 otuz
 oysa
 öyle
 pek
 rağmen
 sadece
 sanki
 sekiz
 seksen
 sen
 senden
 seni
 senin
 siz
 sizden
 sizi
 sizin
 şey
 şeyden
 şeyi
 şeyler
 şöyle
 şu
 şuna
 şunda
 şundan
 şunları
 şunu
 tarafından
 trilyon
 tüm
 üç
 üzere
 var
 vardı
 ve
 veya
 ya
 yani
 yapacak
 yapılan
 yapılması
 yapıyor
 yapmak
 yaptı
 yaptığı
 yaptığını
 yaptıkları
 yedi
 yerine
 yetmiş
 yine
 yirmi
 yoksa
 yüz
 zaten
--- a/solr_config/lang/userdict_ja.txt
+++ b/solr_config/lang/userdict_ja.txt
@@ -0,0 +1,29 @@
 #
 # This is a sample user dictionary for Kuromoji (JapaneseTokenizer)
 #
 # Add entries to this file in order to override the statistical model in terms
 # of segmentation, readings and part-of-speech tags.  Notice that entries do
 # not have weights since they are always used when found.  This is by-design
 # in order to maximize ease-of-use.
 #
 # Entries are defined using the following CSV format:
 #  <text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
 #
 # Notice that a single half-width space separates tokens and readings, and
 # that the number of tokens and readings must match exactly.
 #
 # Also notice that the behavior of multiple entries with the same <text> is undefined.
 #
 # Whitespace only lines are ignored.  Comments are not allowed on entry lines.
 #

 # Custom segmentation for kanji compounds
 日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞
 関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞

 # Custom segmentation for compound katakana
 トートバッグ,トート バッグ,トート バッグ,かずカナ名詞
 ショルダーバッグ,ショルダー バッグ,ショルダー バッグ,かずカナ名詞

 # Custom reading for former sumo wrestler
 朝青龍,朝青龍,アサショウリュウ,カスタム人名
--- a/solr_config/params.json
+++ b/solr_config/params.json
@@ -0,0 +1,34 @@
 {"params":{
  "query":{
    "defType":"edismax",
    "q.alt":"*:*",
    "rows":"10",
    "fl":"*,score",
    "":{"v":0}},
  "facets":{
    "facet":"on",
    "facet.mincount":"1",
    "f.doc_type.facet.mincount":"0",
    "facet.field":["text_shingles","{!ex=type}doc_type", "language"],
    "f.text_shingles.facet.limit":10,
    "facet.query":"{!ex=type key=all_types}*:*",
    "f.doc_type.facet.missing":true,
    "":{"v":0}},
  "browse":{
    "type_fq":"{!field f=doc_type v=$type}",
    "hl":"on",
    "hl.fl":"content",
    "v.locale":"${locale}",
    "debug":"true",
    "hl.simple.pre":"HL_START",
    "hl.simple.post":"HL_END",
    "echoParams": "explicit",
    "_appends_": {
      "fq": "{!switch v=$type tag=type case='*:*' case.all='*:*' case.unknown='-doc_type:[* TO *]' default=$type_fq}"
    },
    "":{"v":0}},
  "velocity":{
    "wt":"velocity",
    "v.template":"browse",
    "v.layout":"layout",
    "":{"v":0}}}}
--- a/solr_config/protwords.txt
+++ b/solr_config/protwords.txt
@@ -0,0 +1,21 @@
 # The ASF licenses this file to You under the Apache License, Version 2.0
 # (the "License"); you may not use this file except in compliance with
 # the License.  You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.

 #-----------------------------------------------------------------------
 # Use a protected word file to protect against the stemmer reducing two
 # unrelated words to the same base word.

 # Some non-words that normally won't be encountered,
 # just to test that they won't be stemmed.
 dontstems
 zwhacky

--- a/solr_config/schema.xml
+++ b/solr_config/schema.xml
@@ -0,0 +1,530 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <!-- Solr managed schema - automatically generated - DO NOT EDIT -->
 <schema name="example-data-driven-schema" version="1.6">
  <uniqueKey>id</uniqueKey>
  <fieldType name="ancestor_path" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
    </analyzer>
  </fieldType>
  <fieldType name="binary" class="solr.BinaryField"/>
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
  <fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>
  <fieldType name="currency" class="solr.CurrencyFieldType" amountLongSuffix="_l_ns" codeStrSuffix="_s_ns" defaultCurrency="USD" currencyConfig="currency.xml" />
  <fieldType name="descendent_path" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true"/>
  <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
  <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" geo="true" maxDistErr="0.001" distErrPct="0.025" distanceUnits="kilometers"/>
  <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="phonetic_en" class="solr.TextField" indexed="true" stored="false">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
    </analyzer>
  </fieldType>
  <fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
  <fieldType name="pdates" class="solr.DatePointField" docValues="true" multiValued="true"/>
  <fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>
  <fieldType name="pdoubles" class="solr.DoublePointField" docValues="true" multiValued="true"/>
  <fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>
  <fieldType name="pfloats" class="solr.FloatPointField" docValues="true" multiValued="true"/>
  <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
  <fieldType name="pints" class="solr.IntPointField" docValues="true" multiValued="true"/>
  <fieldType name="plong" class="solr.LongPointField" docValues="true"/>
  <fieldType name="plongs" class="solr.LongPointField" docValues="true" multiValued="true"/>
  <fieldType name="point" class="solr.PointType" subFieldSuffix="_d" dimension="2"/>
  <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true"/>
  <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_ar.txt" ignoreCase="true"/>
      <filter class="solr.ArabicNormalizationFilterFactory"/>
      <filter class="solr.ArabicStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_bg" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_bg.txt" ignoreCase="true"/>
      <filter class="solr.BulgarianStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_ca" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.ElisionFilterFactory" articles="lang/contractions_ca.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_ca.txt" ignoreCase="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Catalan"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.CJKWidthFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.CJKBigramFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_cz" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_cz.txt" ignoreCase="true"/>
      <filter class="solr.CzechStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_da" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_da.txt" ignoreCase="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Danish"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_de.txt" ignoreCase="true"/>
      <filter class="solr.GermanNormalizationFilterFactory"/>
      <filter class="solr.GermanLightStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_el" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.GreekLowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_el.txt" ignoreCase="false"/>
      <filter class="solr.GreekStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_en_splitting" class="solr.TextField" autoGeneratePhraseQueries="true" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" catenateAll="0" catenateWords="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.FlattenGraphFilterFactory" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="0" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" catenateAll="0" catenateWords="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_en_splitting_tight" class="solr.TextField" autoGeneratePhraseQueries="true" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="false" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="0" generateWordParts="0" catenateAll="0" catenateWords="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      <filter class="solr.FlattenGraphFilterFactory" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="false" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="0" generateWordParts="0" catenateAll="0" catenateWords="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_es.txt" ignoreCase="true"/>
      <filter class="solr.SpanishLightStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_eu" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_eu.txt" ignoreCase="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Basque"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <charFilter class="solr.PersianCharFilterFactory"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ArabicNormalizationFilterFactory"/>
      <filter class="solr.PersianNormalizationFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_fa.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_fi" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_fi.txt" ignoreCase="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.ElisionFilterFactory" articles="lang/contractions_fr.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_fr.txt" ignoreCase="true"/>
      <filter class="solr.FrenchLightStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_ga" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.ElisionFilterFactory" articles="lang/contractions_ga.txt" ignoreCase="true"/>
      <filter class="solr.StopFilterFactory" words="lang/hyphenations_ga.txt" ignoreCase="true"/>
      <filter class="solr.IrishLowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_ga.txt" ignoreCase="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Irish"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="2" maxFractionAsterisk="0.33" maxPosAsterisk="3" withOriginal="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_gl" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_gl.txt" ignoreCase="true"/>
      <filter class="solr.GalicianStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_hi" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.IndicNormalizationFilterFactory"/>
      <filter class="solr.HindiNormalizationFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_hi.txt" ignoreCase="true"/>
      <filter class="solr.HindiStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_hu" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_hu.txt" ignoreCase="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Hungarian"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_hy" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_hy.txt" ignoreCase="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Armenian"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_id" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_id.txt" ignoreCase="true"/>
      <filter class="solr.IndonesianStemFilterFactory" stemDerivational="true"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_it" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.ElisionFilterFactory" articles="lang/contractions_it.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_it.txt" ignoreCase="true"/>
      <filter class="solr.ItalianLightStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_ja" class="solr.TextField" autoGeneratePhraseQueries="false" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
      <filter class="solr.JapaneseBaseFormFilterFactory"/>
      <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt"/>
      <filter class="solr.CJKWidthFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_ja.txt" ignoreCase="true"/>
      <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_ko" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.KoreanTokenizerFactory" decompoundMode="discard" outputUnknownUnigrams="false"/>
      <filter class="solr.KoreanPartOfSpeechStopFilterFactory" />
      <filter class="solr.KoreanReadingFormFilterFactory" />
      <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
  </fieldType>
  <fieldType name="text_lv" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_lv.txt" ignoreCase="true"/>
      <filter class="solr.LatvianStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_nl.txt" ignoreCase="true"/>
      <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_nl.txt" ignoreCase="false"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_no" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_no.txt" ignoreCase="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Norwegian"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_pt.txt" ignoreCase="true"/>
      <filter class="solr.PortugueseLightStemFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_ro" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_ro.txt" ignoreCase="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Romanian"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_ru.txt" ignoreCase="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_sv" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_sv.txt" ignoreCase="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Swedish"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_th" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.ThaiTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_th.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.TurkishLowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_tr.txt" ignoreCase="false"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="text_email_url" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
      <filter class="solr.TypeTokenFilterFactory" types="email_url_types.txt" useWhitelist="true"/>
    </analyzer>
  </fieldType>

  <fieldType name="text_shingles" class="solr.TextField" positionIncrementGap="100" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="false" /> -->
      <filter class="solr.LengthFilterFactory" min="2" max="18"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="(^[^a-z]+$)" replacement="" replace="all"/>
      <filter class="solr.ShingleFilterFactory" minShingleSize="3"  maxShingleSize="3"
             outputUnigrams="false" outputUnigramsIfNoShingles="false" tokenSeparator=" " fillerToken="*"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="(.*[\*].*)"  replacement=""/>
      <filter class="solr.TrimFilterFactory"/>

      <!-- PRFF could have removed everything down to an empty string, remove if so -->
      <filter class="solr.LengthFilterFactory" min="1" max="100"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
  <field name="_version_" type="plong" indexed="true" stored="true"/>
  <field name="content_type" type="string" indexed="true" stored="true"/>
  <field name="doc_type" type="string" indexed="true" stored="true"/>
  <field name="title" type="string" indexed="true" stored="true"/>
  <field name="language" type="string" indexed="true" stored="true"/>
  <field name="content" type="text_general" multiValued="false" indexed="true" stored="true"/>
  <field name="text_shingles" type="text_shingles" indexed="true" stored="false"/>
  <field name="_text_" type="text_general" multiValued="true" indexed="true" stored="false"/>

  <dynamicField name="*_txt_en_split_tight" type="text_en_splitting_tight" indexed="true" stored="true"/>
  <dynamicField name="*_descendent_path" type="descendent_path" indexed="true" stored="true"/>
  <dynamicField name="*_ancestor_path" type="ancestor_path" indexed="true" stored="true"/>
  <dynamicField name="*_txt_en_split" type="text_en_splitting" indexed="true" stored="true"/>
  <dynamicField name="*_coordinate" type="pdouble" indexed="true" stored="false"/>
  <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
  <dynamicField name="*_txt_rev" type="text_general_rev" indexed="true" stored="true"/>
  <dynamicField name="*_phon_en" type="phonetic_en" indexed="true" stored="true"/>
  <dynamicField name="*_s_lower" type="lowercase" indexed="true" stored="true"/>
  <dynamicField name="*_txt_cjk" type="text_cjk" indexed="true" stored="true"/>
  <dynamicField name="random_*" type="random"/>
  <dynamicField name="*_txt_en" type="text_en" indexed="true" stored="true"/>
  <dynamicField name="*_txt_ar" type="text_ar" indexed="true" stored="true"/>
  <dynamicField name="*_txt_bg" type="text_bg" indexed="true" stored="true"/>
  <dynamicField name="*_txt_ca" type="text_ca" indexed="true" stored="true"/>
  <dynamicField name="*_txt_cz" type="text_cz" indexed="true" stored="true"/>
  <dynamicField name="*_txt_da" type="text_da" indexed="true" stored="true"/>
  <dynamicField name="*_txt_de" type="text_de" indexed="true" stored="true"/>
  <dynamicField name="*_txt_el" type="text_el" indexed="true" stored="true"/>
  <dynamicField name="*_txt_es" type="text_es" indexed="true" stored="true"/>
  <dynamicField name="*_txt_eu" type="text_eu" indexed="true" stored="true"/>
  <dynamicField name="*_txt_fa" type="text_fa" indexed="true" stored="true"/>
  <dynamicField name="*_txt_fi" type="text_fi" indexed="true" stored="true"/>
  <dynamicField name="*_txt_fr" type="text_fr" indexed="true" stored="true"/>
  <dynamicField name="*_txt_ga" type="text_ga" indexed="true" stored="true"/>
  <dynamicField name="*_txt_gl" type="text_gl" indexed="true" stored="true"/>
  <dynamicField name="*_txt_hi" type="text_hi" indexed="true" stored="true"/>
  <dynamicField name="*_txt_hu" type="text_hu" indexed="true" stored="true"/>
  <dynamicField name="*_txt_hy" type="text_hy" indexed="true" stored="true"/>
  <dynamicField name="*_txt_id" type="text_id" indexed="true" stored="true"/>
  <dynamicField name="*_txt_it" type="text_it" indexed="true" stored="true"/>
  <dynamicField name="*_txt_ja" type="text_ja" indexed="true" stored="true"/>
  <dynamicField name="*_txt_ko" type="text_ko" indexed="true" stored="true"/>
  <dynamicField name="*_txt_lv" type="text_lv" indexed="true" stored="true"/>
  <dynamicField name="*_txt_nl" type="text_nl" indexed="true" stored="true"/>
  <dynamicField name="*_txt_no" type="text_no" indexed="true" stored="true"/>
  <dynamicField name="*_txt_pt" type="text_pt" indexed="true" stored="true"/>
  <dynamicField name="*_txt_ro" type="text_ro" indexed="true" stored="true"/>
  <dynamicField name="*_txt_ru" type="text_ru" indexed="true" stored="true"/>
  <dynamicField name="*_txt_sv" type="text_sv" indexed="true" stored="true"/>
  <dynamicField name="*_txt_th" type="text_th" indexed="true" stored="true"/>
  <dynamicField name="*_txt_tr" type="text_tr" indexed="true" stored="true"/>
  <dynamicField name="*_point" type="point" indexed="true" stored="true"/>
  <dynamicField name="*_srpt" type="location_rpt" indexed="true" stored="true"/>
  <dynamicField name="attr_*" type="text_general" multiValued="true" indexed="true" stored="true"/>
  <dynamicField name="*_l_ns" type="plong" indexed="true" stored="false"/>
  <dynamicField name="*_s_ns" type="string" indexed="true" stored="false"/>
  <dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
  <dynamicField name="*_dts" type="pdate" multiValued="true" indexed="true" stored="true"/>
  <dynamicField name="*_is" type="pints" indexed="true" stored="true"/>
  <dynamicField name="*_ss" type="strings" indexed="true" stored="true"/>
  <dynamicField name="*_ls" type="plongs" indexed="true" stored="true"/>
  <dynamicField name="*_bs" type="booleans" indexed="true" stored="true"/>
  <dynamicField name="*_fs" type="pfloats" indexed="true" stored="true"/>
  <dynamicField name="*_ds" type="pdoubles" indexed="true" stored="true"/>
  <dynamicField name="*_dt" type="pdate" indexed="true" stored="true"/>
  <dynamicField name="*_ws" type="text_ws" indexed="true" stored="true"/>
  <dynamicField name="*_i" type="pint" indexed="true" stored="true"/>
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  <dynamicField name="*_l" type="plong" indexed="true" stored="true"/>
  <dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
  <dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
  <dynamicField name="*_f" type="pfloat" indexed="true" stored="true"/>
  <dynamicField name="*_d" type="pdouble" indexed="true" stored="true"/>
  <dynamicField name="*_p" type="location" indexed="true" stored="true"/>
  <dynamicField name="*_c" type="currency" indexed="true" stored="true"/>

  <copyField source="content" dest="text_shingles"/>
  <copyField source="*" dest="_text_"/>

  <!-- ADDED BY SIMON BOWIE 2022-04-04 -->
  <copyField source="content" dest="year"/>
  <field name="year" type="year" indexed="true" stored="true"/>

  <fieldType name="year" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern="=D[^\s]*\s[^\s]*\s[^\s]*\s[^\s]*\s(\d{4})" group="1" />
    </analyzer>
  </fieldType>
  <!-- END -->

 </schema>
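
The `year` field type added at the end of the schema tokenizes `content` with a regular expression and keeps only capture group 1 of each match. As a rough illustration of what that pattern picks out, the Python sketch below applies the same expression outside Solr. The `extract_years` helper and the sample record are hypothetical: the real layout of the patent records is not shown in this commit, and Python's `re` module only approximates Java's regex engine (the `re.ASCII` flag is used to stay close to Java's default character classes).

```python
import re

# Same pattern as the PatternTokenizerFactory in the `year` fieldType.
# re.ASCII keeps \s and \d limited to ASCII, as in Java's default mode.
YEAR_PATTERN = re.compile(
    r"=D[^\s]*\s[^\s]*\s[^\s]*\s[^\s]*\s(\d{4})", re.ASCII
)


def extract_years(content):
    """Return capture group 1 of every match, mimicking group="1"."""
    return [match.group(1) for match in YEAR_PATTERN.finditer(content)]


# Hypothetical record line, assumed only for illustration.
sample = "=D Mon Jan 12 1998 rest of record"
print(extract_years(sample))          # ['1998']
print(extract_years("no date here"))  # []
```

The pattern expects `=D`, then four whitespace-separated gaps, then a four-digit number, so a record whose date line has a different number of tokens would produce no `year` value at all.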
--- a/solr_config/solrconfig.xml
+++ b/solr_config/solrconfig.xml
--- a/solr_config/stopwords.txt
+++ b/solr_config/stopwords.txt
@@ -0,0 +1,14 @@
 # Licensed to the Apache Software Foundation (ASF) under one or more
 # contributor license agreements.  See the NOTICE file distributed with
 # this work for additional information regarding copyright ownership.
 # The ASF licenses this file to You under the Apache License, Version 2.0
 # (the "License"); you may not use this file except in compliance with
 # the License.  You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
--- a/solr_config/synonyms.txt
+++ b/solr_config/synonyms.txt
@@ -0,0 +1,29 @@
 # The ASF licenses this file to You under the Apache License, Version 2.0
 # (the "License"); you may not use this file except in compliance with
 # the License.  You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.

 #-----------------------------------------------------------------------
 #some test synonym mappings unlikely to appear in real input text
 aaafoo => aaabar
 bbbfoo => bbbfoo bbbbar
 cccfoo => cccbar cccbaz
 fooaaa,baraaa,bazaaa

 # Some synonym groups specific to this example
 GB,gib,gigabyte,gigabytes
 MB,mib,megabyte,megabytes
 Television, Televisions, TV, TVs
 #notice we use "gib" instead of "GiB" so any WordDelimiterGraphFilter coming
 #after us won't split it into two words.

 # Synonym mappings can be used for spelling correction too
 pixima => pixma
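
The file above mixes two rule shapes: explicit mappings (`a => b`) and comma-separated equivalence groups. The following Python sketch is one way to read those two shapes into a dictionary. It is an illustration only, not Solr's own parser: `parse_synonyms` is a hypothetical helper, case handling is ignored, and it assumes that every member of a comma group expands to the whole group.

```python
def parse_synonyms(lines):
    """Map each source term to the list of terms or phrases it expands to."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: the left-hand terms are replaced by the right-hand side.
            left, right = line.split("=>", 1)
            sources = [term.strip() for term in left.split(",")]
            targets = [term.strip() for term in right.split(",")]
        else:
            # Equivalence group: every term expands to all terms in the group.
            sources = targets = [term.strip() for term in line.split(",")]
        for source in sources:
            mapping.setdefault(source, []).extend(targets)
    return mapping


rules = parse_synonyms([
    "# comment",
    "pixima => pixma",
    "bbbfoo => bbbfoo bbbbar",
    "Television, Televisions, TV, TVs",
])
print(rules["pixima"])
print(rules["TV"])
```

Note that a right-hand side without commas, such as `bbbfoo bbbbar`, is kept as a single multi-word phrase rather than split into two alternatives.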

--- a/solr_config/update-script.js
+++ b/solr_config/update-script.js
@@ -0,0 +1,115 @@
 function get_class(name) {
  var clazz;
  try {
    // Java8 Nashorn
    clazz = eval("Java.type(name).class");
  } catch(e) {
    // Java7 Rhino
    clazz = eval("Packages."+name);
  }

  return clazz;
 }

 function processAdd(cmd) {

  var doc = cmd.solrDoc;  // org.apache.solr.common.SolrInputDocument
  var id = doc.getFieldValue("id");
  logger.info("update-script#processAdd: id=" + id);

  // Map the file's content_type onto a small set of user-friendly doc_type
  // values, so that, say, image/jpeg and image/tiff both end up under the
  // same "image" facet value.

  var ct = doc.getFieldValue("content_type");
  if (ct) {
    // strip off semicolon onward
    var semicolon_index = ct.indexOf(';');
    if (semicolon_index != -1) {
      ct = ct.substring(0,semicolon_index);
    }
    // and split type/subtype
    var ct_type = ct.substring(0,ct.indexOf('/'));
    var ct_subtype = ct.substring(ct.indexOf('/')+1);

    var doc_type;
    switch(true) {
      case /^application\/rtf/.test(ct) || /wordprocessing/.test(ct):
        doc_type = "doc";
        break;

      case /html/.test(ct):
        doc_type = "html";
        break;

      case /^image\/.*/.test(ct):
        doc_type = "image";
        break;

      case /presentation|powerpoint/.test(ct):
        doc_type = "presentation";
        break;

      case /spreadsheet|excel/.test(ct):
        doc_type = "spreadsheet";
        break;

      case /^application\/pdf/.test(ct):
        doc_type = "pdf";
        break;

      case /^text\/plain/.test(ct):
        doc_type = "text";
        break;

      default:
        break;
    }

    // TODO: error handling needed?   What if there is no slash?
    if(doc_type) { doc.setField("doc_type", doc_type); }
    doc.setField("content_type_type_s", ct_type);
    doc.setField("content_type_subtype_s", ct_subtype);
  }

  var content = doc.getFieldValue("content");
  if (!content) {
    return; //No content found, so we are done here
  }

  var analyzer =
       req.getCore().getLatestSchema()
       .getFieldTypeByName("text_email_url")
       .getIndexAnalyzer();

  var token_stream =
       analyzer.tokenStream("content", content);
  var term_att = token_stream.getAttribute(get_class("org.apache.lucene.analysis.tokenattributes.CharTermAttribute"));
  var type_att = token_stream.getAttribute(get_class("org.apache.lucene.analysis.tokenattributes.TypeAttribute"));
  token_stream.reset();
  while (token_stream.incrementToken()) {
    doc.addField(type_att.type().replace(/\<|\>/g,'').toLowerCase()+"_ss", term_att.toString());
  }
  token_stream.end();
  token_stream.close();
 }

 function processDelete(cmd) {
  // no-op
 }

 function processMergeIndexes(cmd) {
  // no-op
 }

 function processCommit(cmd) {
  // no-op
 }

 function processRollback(cmd) {
  // no-op
 }

 function finish() {
  // no-op
 }
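
The `processAdd` function in the script above reduces a MIME `content_type` to a coarse `doc_type` by testing an ordered list of regular expressions. The Python sketch below mirrors that ordering so the mapping can be tried outside Solr. `classify` and `DOC_TYPE_RULES` are hypothetical names, and the no-slash behaviour is an assumption of this sketch rather than what the script does: here a value without a slash yields the whole string as the type and an empty subtype.

```python
import re

# Ordered (pattern, label) pairs, in the same order as the switch in processAdd.
DOC_TYPE_RULES = [
    (re.compile(r"^application/rtf|wordprocessing"), "doc"),
    (re.compile(r"html"), "html"),
    (re.compile(r"^image/"), "image"),
    (re.compile(r"presentation|powerpoint"), "presentation"),
    (re.compile(r"spreadsheet|excel"), "spreadsheet"),
    (re.compile(r"^application/pdf"), "pdf"),
    (re.compile(r"^text/plain"), "text"),
]


def classify(content_type):
    """Return (doc_type, type, subtype); doc_type is None when no rule matches."""
    # Strip parameters such as "; charset=UTF-8".
    ct = content_type.split(";", 1)[0]
    # partition() never raises: without a slash the subtype is simply empty.
    ct_type, _, ct_subtype = ct.partition("/")
    doc_type = next(
        (label for pattern, label in DOC_TYPE_RULES if pattern.search(ct)),
        None,
    )
    return doc_type, ct_type, ct_subtype


print(classify("text/plain; charset=UTF-8"))  # ('text', 'text', 'plain')
print(classify("image/tiff"))                 # ('image', 'image', 'tiff')
print(classify("application/zip"))            # (None, 'application', 'zip')
```

Because the first matching rule wins, the order of the list matters: a type containing both `html` and `presentation`, for example, would be labelled `html`.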
--- a/solr_config/velocity/browse.vm
+++ b/solr_config/velocity/browse.vm
@@ -0,0 +1,32 @@
 <div id="query-box">
  <form id="query-form" action="#{url_for_home}" method="GET">
    $resource.find:
    <input type="text" id="q" name="q" style="width: 50%" value="$!esc.html($request.params.get('q'))"/>
    <input type="submit" value="$resource.submit"/>
    <div id="debug_query" class="debug">
      <span id="parsed_query">$esc.html($response.response.debug.parsedquery)</span>
    </div>

    <input type="hidden" name="type" value="#current_type"/>
    #if("#current_locale"!="")<input type="hidden" name="locale" value="#current_locale"/>#end
    #foreach($fq in $response.responseHeader.params.getAll("fq"))
      <input type="hidden" name="fq" id="allFQs" value="$esc.html($fq)"/>
    #end
  </form>

  <div id="constraints">
    #foreach($fq in $response.responseHeader.params.getAll("fq"))
      #set($previous_fq_count=$velocityCount - 1)
      #if($fq != '')
      &gt; $fq<a href="#url_for_filters($response.responseHeader.params.fq.subList(0,$previous_fq_count))">x</a>
      #end
    #end
  </div>

 </div>


 <div id="browse_results">
  #parse("results.vm")
 </div>

--- a/solr_config/velocity/dropit.js
+++ b/solr_config/velocity/dropit.js
--- a/solr_config/velocity/facet_doc_type.vm
+++ b/solr_config/velocity/facet_doc_type.vm
@@ -0,0 +1,2 @@
 ## intentionally empty

--- a/solr_config/velocity/facet_text_shingles.vm
+++ b/solr_config/velocity/facet_text_shingles.vm
@@ -0,0 +1,12 @@
 <div id="facet_$field.name">
  <span class="facet-field">$resource.facet.top_phrases</span><br/>

  <ul id="tagcloud">
    #foreach($facet in $sort.sort($field.values,"name"))
    <li data-weight="$math.mul($facet.count,1)">
      <a href="#url_for_facet_filter($field.name, $facet.name)">$facet.name</a>
    </li>

    #end
  </ul>
 </div>
--- a/solr_config/velocity/facets.vm
+++ b/solr_config/velocity/facets.vm
@@ -0,0 +1,24 @@
 #if($response.facetFields.size() > 0)
  #foreach($field in $response.facetFields)
    #if($field.values.size() > 0)
        #if($engine.resourceExists("facet_${field.name}.vm"))
          #parse("facet_${field.name}.vm")
        #else
          <div id="facet_$field.name" class="facet_field">
            <span class="facet-field">#label("facet.${field.name}",$field.name)</span><br/>

            <ul>
              #foreach($facet in $field.values)
                <li><a href="#url_for_facet_filter($field.name, $facet.name)">#if($facet.name!=$null)#label("${field.name}.${facet.name}","${field.name}.${facet.name}")#else<em>missing</em>#end</a> ($facet.count)</li>
              #end
            </ul>
          </div>
        #end
    #end ## end if field.values > 0
  #end ## end foreach field
 #end  ## end if facetFields > 0





--- a/solr_config/velocity/footer.vm
+++ b/solr_config/velocity/footer.vm
@@ -0,0 +1,29 @@
 <hr/>

 <div>

  <div id="admin"><a href="#url_root/index.html#/#{core_name}">Solr Admin</a></div>

  <a href="#" onclick='jQuery(".debug").toggle(); return false;'>toggle debug mode</a>
  <a href="#url_for_lens&wt=xml#if($debug)&debug=true#end">XML results</a> ## TODO: Add links for other formats, maybe dynamically?

 </div>

 <div>
  <a href="http://lucene.apache.org/solr">Solr Home Page</a>
 </div>


 <div class="debug">
  <hr/>
  Request:
  <pre>
    $esc.html($request)
  </pre>

  <hr/>
  Debug:
  <pre>
    $esc.html($response.response.debug)
  </pre>
 </div>
--- a/solr_config/velocity/head.vm
+++ b/solr_config/velocity/head.vm
@@ -0,0 +1,290 @@
 <title>Solr browse: #core_name</title>

 <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>

 <link rel="icon" type="image/x-icon" href="#{url_root}/img/favicon.ico"/>
 <link rel="shortcut icon" type="image/x-icon" href="#{url_root}/img/favicon.ico"/>

 <script type="text/javascript" src="#{url_root}/libs/jquery-3.4.1.min.js"></script>
 <script type="text/javascript" src="#{url_for_solr}/admin/file?file=/velocity/js/jquery.tx3-tag-cloud.js&contentType=text/javascript"></script>
 <script type="text/javascript" src="#{url_for_solr}/admin/file?file=/velocity/js/dropit.js&contentType=text/javascript"></script>
 <script type="text/javascript" src="#{url_for_solr}/admin/file?file=/velocity/js/jquery.autocomplete.js&contentType=text/javascript"></script>

 <script type="text/javascript">
  $(document).ready(function() {

    $("#tagcloud").tx3TagCloud({
      multiplier: 1
    });

    $('.menu').dropit();

    $( document ).ajaxComplete(function() {
      $("#tagcloud").tx3TagCloud({
        multiplier: 5
      });
    });

    $('\#q').keyup(function() {
      $('#browse_results').load('#{url_for_home}?#lensNoQ&v.layout.enabled=false&v.template=results&q='+encodeURI($('\#q').val()));

      $("\#q").autocomplete('#{url_for_solr}/suggest', {
        extraParams: {
          'suggest.q': function() { return $("\#q").val();},
          'suggest.build': 'true',
          'wt': 'json',
        }
      }).keydown(function(e) {
        if (e.keyCode === 13){
          $("#query-form").trigger('submit');
        }
      });
    });

  });
 </script>

 <style>

  html {
    background-color: #F0F8FF;
  }

  body {
    font-family: Helvetica, Arial, sans-serif;
    font-size: 10pt;
  }

  #header {
    width: 100%;
    font-size: 20pt;
  }

  #header2 {
    margin-left:1200px;
  }

  #logo {
    width: 115px;
    margin: 0px 0px 0px 0px;
    border-style: none;
  }

  a {
    color: #305CB3;
  }

  a.hidden {
    display:none;
  }

  em {
    color: #FF833D;
  }

  .error {
    color: white;
    background-color: red;
    left: 210px;
    width:80%;
    position: relative;
  }

  .debug { display: none; font-size: 10pt}
  #debug_query {
    font-family: Helvetica, Arial, sans-serif;
    font-size: 10pt;
    font-weight: bold;
  }
  #parsed_query {
    font-family: Courier, "Courier New", monospace;
    font-size: 10pt;
    font-weight: normal;
  }

  #admin {
    text-align: right;
    vertical-align: top;
  }

  #query-form {
    width: 90%;
  }

  #query-box {
    padding: 5px;
    margin: 5px;
    font-weight: normal;
    font-size: 24px;
    letter-spacing: 0.08em;
  }
  #constraints {
    margin: 10px;
  }

  #tabs {  }
  #tabs li { display: inline; font-size: 10px;}
  #tabs li a { border-radius: 20px; border: 2px solid #C1CDCD; padding: 10px;color: #42454a; background-color: #dedbde;}
  #tabs li a:hover { background-color: #f1f0ee; }
  #tabs li a.selected { color: #000; background-color: #f1f0ee; font-weight: bold; padding: 5px }
  #tabs li a.no_results { color: #000; background-color: #838B8B; font-style: italic; padding: 5px; pointer-events: none;
  cursor: default; text-decoration: none;}

  .pagination {
    width: 305px;
    border-radius: 25px;
    border: 2px solid #C1CDCD;
    padding: 20px;
    padding-left: 10%;
    background: #eee;
    margin-left: 190px;
    margin-top : 42px;
    padding-top: 5px;
    padding-bottom: 5px;
    text-align:left;
  }

  #results_list { width: 70%; }
  .result-document {
    border-radius: 25px;
    border: 2px solid #C1CDCD;
    padding: 10px;
    /* width: 800px; */
    /* height: 120px; */
    margin: 5px;
    /* margin-left: 60px; */
    /* margin-right: 210px; */
    /* margin-bottom: 15px; */
    transition: 1s ease;
  }
  .result-document:hover
  {
    -webkit-transform: scale(1.1);
    -ms-transform: scale(1.1);
    transform: scale(1.1);
    transition: 1s ease;
  }
  .result-document div {
    padding: 5px;
  }
  .result-title {
    width:60%;
  }
  .result-body {
    background: #ddd;
  }
  .result-document:nth-child(2n+1) {
    background-color: #FFFFFD;
  }

  #facets {
    margin: 5px;
    margin-top: 0px;
    padding: 5px;
    top: -20px;
    position: relative;
    float: right;
    width: 25%;
  }
  .facet-field {
    font-weight: bold;
  }
  #facets ul {
    list-style: none;
    margin: 0;
    margin-bottom: 5px;
    margin-top: 5px;
    padding-left: 10px;
  }
  #facets ul li {
    color: #999;
    padding: 2px;
  }

  div.facet_field {
    clear: left;
  }

  ul.tx3-tag-cloud { }
  ul.tx3-tag-cloud li {
    display: block;
    float: left;
    list-style: none;
    margin-right: 4px;
  }
  ul.tx3-tag-cloud li a {
    display: block;
    text-decoration: none;
    color: #c9c9c9;
    padding: 3px 10px;
  }
  ul.tx3-tag-cloud li a:hover {
    color: #000000;
    -webkit-transition: color 250ms linear;
    -moz-transition: color 250ms linear;
    -o-transition: color 250ms linear;
    -ms-transition: color 250ms linear;
    transition: color 250ms linear;
  }

  .dropit {
    list-style: none;
    padding: 0;
    margin: 0;
  }
  .dropit .dropit-trigger { position: relative; }
  .dropit .dropit-submenu {
    position: absolute;
    top: 100%;
    left: 0; /* dropdown left or right */
    z-index: 1000;
    display: none;
    min-width: 150px;
    list-style: none;
    padding: 0;
    margin: 0;
  }
  .dropit .dropit-open .dropit-submenu { display: block; }


  /* autocomplete css */
  .ac_results {
    padding: 0px;
    border: 1px solid black;
    background-color: white;
    overflow: hidden;
    z-index: 99999;
  }

  .ac_results ul {
    width: 100%;
    list-style-position: outside;
    list-style: none;
    padding: 0;
    margin: 0;
  }

  .ac_results li {
    margin: 0px;
    padding: 2px 5px;
    cursor: default;
    display: block;
    font: menu;
    font-size: 12px;
    line-height: 16px;
    overflow: hidden;
  }

  .ac_loading {
    /* background: white url('˜indicator.gif') right center no-repeat; */
  }

  .ac_odd {
    background-color: #eee;
  }

  .ac_over {
    background-color: #0A246A;
    color: white;
  }
 </style>
--- a/solr_config/velocity/hit.vm
+++ b/solr_config/velocity/hit.vm
@@ -0,0 +1,77 @@

 #set($docId = $doc.getFirstValue($request.schema.uniqueKeyField.name))

 ## Load Mime-Type List and Mapping
 #parse('mime_type_lists.vm')

 ## Title
 #if($doc.getFieldValue('title'))
  #set($title = $esc.html($doc.getFirstValue('title')))
 #else
  #set($title = "$doc.getFirstValue('id').substring($math.add(1,$doc.getFirstValue('id').lastIndexOf('/')))")
 #end

 ## Date
 #if($doc.getFieldValue('attr_meta_creation_date'))
  #set($date = $esc.html($doc.getFirstValue('attr_meta_creation_date')))
 #else
  #set($date = "No date found")
 #end



 ## URL
 #if($doc.getFieldValue('url'))
  #set($url = $doc.getFieldValue('url'))
 #elseif($doc.getFieldValue('resourcename'))
  #set($url = "file:///$doc.getFirstValue('resourcename')")
 #else
  #set($url = "$doc.getFieldValue('id')")
 #end

 ## Sort out Mime-Type
 #set($ct = $doc.getFirstValue('content_type').split(";").get(0))
 #set($filename = $doc.getFirstValue('resourcename'))
 #set($filetype = false)
 #set($filetype = $mimeExtensionsMap.get($ct))
 #if(!$filetype)
  #set($filetype = $filename.substring($filename.lastIndexOf(".")).substring(1))
 #end
 #if(!$filetype)
  #set($filetype = "file")
 #end
 #if(!$supportedMimeTypes.contains($filetype))
  #set($filetype = "file")
 #end

 <div class="result-document">
  <span class="result-title">
    <img src="#{url_root}/img/filetypes/${filetype}.png" align="center">
    <b>$title</b>
  </span>

  <div>
    id: $docId <br/>
  </div>

  #set($pad = "")
  #foreach($v in $response.response.highlighting.get($docId).get("content"))
    $pad$esc.html($v).replace("HL_START","<em>").replace("HL_END","</em>")
    #set($pad = " ... ")
  #end

 </div>

 <a href="#" class="debug" onclick='jQuery(this).next().toggle(); return false;'>toggle explain</a>
 <pre style="display: none;">
    $esc.html($response.getExplainMap().get($doc.getFirstValue('id')))
 </pre>

 <a href="#" class="debug" onclick='jQuery(this).next().toggle(); return false;'>show all fields</a>
 <pre style="display:none;">
  #foreach($fieldname in $doc.fieldNames)
    <span>$fieldname :</span>
    <span>#foreach($value in $doc.getFieldValues($fieldname))$esc.html($value)#end</span>
  #end
 </pre>

--- a/solr_config/velocity/img/english_640.png
+++ b/solr_config/velocity/img/english_640.png
--- a/solr_config/velocity/img/france_640.png
+++ b/solr_config/velocity/img/france_640.png
--- a/solr_config/velocity/img/germany_640.png
+++ b/solr_config/velocity/img/germany_640.png
--- a/solr_config/velocity/img/globe_256.png
+++ b/solr_config/velocity/img/globe_256.png
--- a/solr_config/velocity/jquery.tx3-tag-cloud.js
+++ b/solr_config/velocity/jquery.tx3-tag-cloud.js
--- a/solr_config/velocity/js/dropit.js
+++ b/solr_config/velocity/js/dropit.js
@@ -0,0 +1,97 @@
 /*
 * Dropit v1.1.0
 * http://dev7studios.com/dropit
 *
 * Copyright 2012, Dev7studios
 * Free to use and abuse under the MIT license.
 * http://www.opensource.org/licenses/mit-license.php
 */

 ;(function($) {

    $.fn.dropit = function(method) {

        var methods = {

            init : function(options) {
                this.dropit.settings = $.extend({}, this.dropit.defaults, options);
                return this.each(function() {
                    var $el = $(this),
                         el = this,
                         settings = $.fn.dropit.settings;

                    // Hide initial submenus
                    $el.addClass('dropit')
                    .find('>'+ settings.triggerParentEl +':has('+ settings.submenuEl +')').addClass('dropit-trigger')
                    .find(settings.submenuEl).addClass('dropit-submenu').hide();

                    // Open on click
                    $el.off(settings.action).on(settings.action, settings.triggerParentEl +':has('+ settings.submenuEl +') > '+ settings.triggerEl +'', function(){
                        // Close click menus if clicked again
                        if(settings.action == 'click' && $(this).parents(settings.triggerParentEl).hasClass('dropit-open')){
                            settings.beforeHide.call(this);
                            $(this).parents(settings.triggerParentEl).removeClass('dropit-open').find(settings.submenuEl).hide();
                            settings.afterHide.call(this);
                            return false;
                        }

                        // Hide open menus
                        settings.beforeHide.call(this);
                        $('.dropit-open').removeClass('dropit-open').find('.dropit-submenu').hide();
                        settings.afterHide.call(this);

                        // Open this menu
                        settings.beforeShow.call(this);
                        $(this).parents(settings.triggerParentEl).addClass('dropit-open').find(settings.submenuEl).show();
                        settings.afterShow.call(this);

                        return false;
                    });

                    // Close if outside click
                    $(document).on('click', function(){
                        settings.beforeHide.call(this);
                        $('.dropit-open').removeClass('dropit-open').find('.dropit-submenu').hide();
                        settings.afterHide.call(this);
                    });

                    // If hover
                    if(settings.action == 'mouseenter'){
                        $el.on('mouseleave', '.dropit-open', function(){
                            settings.beforeHide.call(this);
                            $(this).removeClass('dropit-open').find(settings.submenuEl).hide();
                            settings.afterHide.call(this);
                        });
                    }

                    settings.afterLoad.call(this);
                });
            }

        };

        if (methods[method]) {
            return methods[method].apply(this, Array.prototype.slice.call(arguments, 1));
        } else if (typeof method === 'object' || !method) {
            return methods.init.apply(this, arguments);
        } else {
            $.error( 'Method "' +  method + '" does not exist in dropit plugin!');
        }

    };

    $.fn.dropit.defaults = {
        action: 'mouseenter', // The open action for the trigger
        submenuEl: 'ul', // The submenu element
        triggerEl: 'a', // The trigger element
        triggerParentEl: 'li', // The trigger parent element
        afterLoad: function(){}, // Triggers when plugin has loaded
        beforeShow: function(){}, // Triggers before submenu is shown
        afterShow: function(){}, // Triggers after submenu is shown
        beforeHide: function(){}, // Triggers before submenu is hidden
        afterHide: function(){} // Triggers after submenu is hidden
    };

    $.fn.dropit.settings = {};

 })(jQuery);
--- a/solr_config/velocity/js/jquery.autocomplete.js
+++ b/solr_config/velocity/js/jquery.autocomplete.js
@@ -0,0 +1,763 @@
 /*
 * Autocomplete - jQuery plugin 1.1pre
 *
 * Copyright (c) 2007 Dylan Verheul, Dan G. Switzer, Anjesh Tuladhar, Jörn Zaefferer
 *
 * Dual licensed under the MIT and GPL licenses:
 *   http://www.opensource.org/licenses/mit-license.php
 *   http://www.gnu.org/licenses/gpl.html
 *
 * Revision: Id: jquery.autocomplete.js 5785 2008-07-12 10:37:33Z joern.zaefferer $
 *
 */

 ;(function($) {
  
 $.fn.extend({
  autocomplete: function(urlOrData, options) {
    var isUrl = typeof urlOrData == "string";
    options = $.extend({}, $.Autocompleter.defaults, {
      url: isUrl ? urlOrData : null,
      data: isUrl ? null : urlOrData,
      delay: isUrl ? $.Autocompleter.defaults.delay : 10,
      max: options && !options.scroll ? 10 : 150
    }, options);
    
    // if highlight is set to false, replace it with a do-nothing function
    options.highlight = options.highlight || function(value) { return value; };
    
    // if the formatMatch option is not specified, then use formatItem for backwards compatibility
    options.formatMatch = options.formatMatch || options.formatItem;
    
    return this.each(function() {
      new $.Autocompleter(this, options);
    });
  },
  result: function(handler) {
    return this.bind("result", handler);
  },
  search: function(handler) {
    return this.trigger("search", [handler]);
  },
  flushCache: function() {
    return this.trigger("flushCache");
  },
  setOptions: function(options){
    return this.trigger("setOptions", [options]);
  },
  unautocomplete: function() {
    return this.trigger("unautocomplete");
  }
 });

 $.Autocompleter = function(input, options) {

  var KEY = {
    UP: 38,
    DOWN: 40,
    DEL: 46,
    TAB: 9,
    RETURN: 13,
    ESC: 27,
    COMMA: 188,
    PAGEUP: 33,
    PAGEDOWN: 34,
    BACKSPACE: 8
  };

  // Create $ object for input element
  var $input = $(input).attr("autocomplete", "off").addClass(options.inputClass);

  var timeout;
  var previousValue = "";
  var cache = $.Autocompleter.Cache(options);
  var hasFocus = 0;
  var lastKeyPressCode;
  var config = {
    mouseDownOnSelect: false
  };
  var select = $.Autocompleter.Select(options, input, selectCurrent, config);
  
  var blockSubmit;
  
  // prevent form submit in opera when selecting with return key
  $.browser.opera && $(input.form).bind("submit.autocomplete", function() {
    if (blockSubmit) {
      blockSubmit = false;
      return false;
    }
  });
  
  // only opera doesn't trigger keydown multiple times while pressed, others don't work with keypress at all
  $input.bind(($.browser.opera ? "keypress" : "keydown") + ".autocomplete", function(event) {
    // track last key pressed
    lastKeyPressCode = event.keyCode;
    switch(event.keyCode) {
    
      case KEY.UP:
        event.preventDefault();
        if ( select.visible() ) {
          select.prev();
        } else {
          onChange(0, true);
        }
        break;
        
      case KEY.DOWN:
        event.preventDefault();
        if ( select.visible() ) {
          select.next();
        } else {
          onChange(0, true);
        }
        break;
        
      case KEY.PAGEUP:
        event.preventDefault();
        if ( select.visible() ) {
          select.pageUp();
        } else {
          onChange(0, true);
        }
        break;
        
      case KEY.PAGEDOWN:
        event.preventDefault();
        if ( select.visible() ) {
          select.pageDown();
        } else {
          onChange(0, true);
        }
        break;
      
      // matches also semicolon
      case options.multiple && $.trim(options.multipleSeparator) == "," && KEY.COMMA:
      case KEY.TAB:
      case KEY.RETURN:
        if( selectCurrent() ) {
          // stop default to prevent a form submit, Opera needs special handling
          event.preventDefault();
          blockSubmit = true;
          return false;
        }
        break;
        
      case KEY.ESC:
        select.hide();
        break;
        
      default:
        clearTimeout(timeout);
        timeout = setTimeout(onChange, options.delay);
        break;
    }
  }).focus(function(){
    // track whether the field has focus, we shouldn't process any
    // results if the field no longer has focus
    hasFocus++;
  }).blur(function() {
    hasFocus = 0;
    if (!config.mouseDownOnSelect) {
      hideResults();
    }
  }).click(function() {
    // show select when clicking in a focused field
    if ( hasFocus++ > 1 && !select.visible() ) {
      onChange(0, true);
    }
  }).bind("search", function() {
    // TODO why not just specifying both arguments?
    var fn = (arguments.length > 1) ? arguments[1] : null;
    function findValueCallback(q, data) {
      var result;
      if( data && data.length ) {
        for (var i=0; i < data.length; i++) {
          if( data[i].result.toLowerCase() == q.toLowerCase() ) {
            result = data[i];
            break;
          }
        }
      }
      if( typeof fn == "function" ) fn(result);
      else $input.trigger("result", result && [result.data, result.value]);
    }
    $.each(trimWords($input.val()), function(i, value) {
      request(value, findValueCallback, findValueCallback);
    });
  }).bind("flushCache", function() {
    cache.flush();
  }).bind("setOptions", function() {
    $.extend(options, arguments[1]);
    // if we've updated the data, repopulate
    if ( "data" in arguments[1] )
      cache.populate();
  }).bind("unautocomplete", function() {
    select.unbind();
    $input.unbind();
    $(input.form).unbind(".autocomplete");
  });
  
  
  function selectCurrent() {
    var selected = select.selected();
    if( !selected )
      return false;
    
    var v = selected.result;
    previousValue = v;
    
    if ( options.multiple ) {
      var words = trimWords($input.val());
      if ( words.length > 1 ) {
        v = words.slice(0, words.length - 1).join( options.multipleSeparator ) + options.multipleSeparator + v;
      }
      v += options.multipleSeparator;
    }
    
    $input.val(v);
    hideResultsNow();
    $input.trigger("result", [selected.data, selected.value]);
    return true;
  }
  
  function onChange(crap, skipPrevCheck) {
    if( lastKeyPressCode == KEY.DEL ) {
      select.hide();
      return;
    }
    
    var currentValue = $input.val();
    
    if ( !skipPrevCheck && currentValue == previousValue )
      return;
    
    previousValue = currentValue;
    
    currentValue = lastWord(currentValue);
    if ( currentValue.length >= options.minChars) {
      $input.addClass(options.loadingClass);
      if (!options.matchCase)
        currentValue = currentValue.toLowerCase();
      request(currentValue, receiveData, hideResultsNow);
    } else {
      stopLoading();
      select.hide();
    }
  };
  
  function trimWords(value) {
    if ( !value ) {
      return [""];
    }
    var words = value.split( options.multipleSeparator );
    var result = [];
    $.each(words, function(i, value) {
      if ( $.trim(value) )
        result[i] = $.trim(value);
    });
    return result;
  }
  
  function lastWord(value) {
    if ( !options.multiple )
      return value;
    var words = trimWords(value);
    return words[words.length - 1];
  }
  
  // fills in the input box w/the first match (assumed to be the best match)
  // q: the term entered
  // sValue: the first matching result
  function autoFill(q, sValue){
    // autofill in the complete box w/the first match as long as the user hasn't entered in more data
    // if the last user key pressed was backspace, don't autofill
    if( options.autoFill && (lastWord($input.val()).toLowerCase() == q.toLowerCase()) && lastKeyPressCode != KEY.BACKSPACE ) {
      // fill in the value (keep the case the user has typed)
      $input.val($input.val() + sValue.substring(lastWord(previousValue).length));
      // select the portion of the value not typed by the user (so the next character will erase)
      $.Autocompleter.Selection(input, previousValue.length, previousValue.length + sValue.length);
    }
  };

  function hideResults() {
    clearTimeout(timeout);
    timeout = setTimeout(hideResultsNow, 200);
  };

  function hideResultsNow() {
    var wasVisible = select.visible();
    select.hide();
    clearTimeout(timeout);
    stopLoading();
    if (options.mustMatch) {
      // call search and run callback
      $input.search(
        function (result){
          // if no value found, clear the input box
          if( !result ) {
            if (options.multiple) {
              var words = trimWords($input.val()).slice(0, -1);
              $input.val( words.join(options.multipleSeparator) + (words.length ? options.multipleSeparator : "") );
            }
            else
              $input.val( "" );
          }
        }
      );
    }
    if (wasVisible)
      // position cursor at end of input field
      $.Autocompleter.Selection(input, input.value.length, input.value.length);
  };

  function receiveData(q, data) {
    if ( data && data.length && hasFocus ) {
      stopLoading();
      select.display(data, q);
      autoFill(q, data[0].value);
      select.show();
    } else {
      hideResultsNow();
    }
  };

  function request(term, success, failure) {
    if (!options.matchCase)
      term = term.toLowerCase();
    var data = cache.load(term);
    data = null; // Avoid buggy cache and go to Solr every time 
    // receive the cached data
    if (data && data.length) {
      success(term, data);
    // if an AJAX url has been supplied, try loading the data now
    } else if( (typeof options.url == "string") && (options.url.length > 0) ){
      
      var extraParams = {
        timestamp: +new Date()
      };
      $.each(options.extraParams, function(key, param) {
        extraParams[key] = typeof param == "function" ? param() : param;
      });
      
      $.ajax({
        // try to leverage ajaxQueue plugin to abort previous requests
        mode: "abort",
        // limit abortion to this input
        port: "autocomplete" + input.name,
        dataType: options.dataType,
        url: options.url,
        data: $.extend({
          q: lastWord(term),
          limit: options.max
        }, extraParams),
        success: function(data) {
          var parsed = options.parse && options.parse(data) || parse(data);
          cache.add(term, parsed);
          success(term, parsed);
        }
      });
    } else {
      // if we have a failure, we need to empty the list -- this prevents the [TAB] key from selecting the last successful match
      select.emptyList();
      failure(term);
    }
  };
  
  function parse(data) {
    var parsed = [];
    var rows = data.split("\n");
    for (var i=0; i < rows.length; i++) {
      var row = $.trim(rows[i]);
      if (row) {
        row = row.split("|");
        parsed[parsed.length] = {
          data: row,
          value: row[0],
          result: options.formatResult && options.formatResult(row, row[0]) || row[0]
        };
      }
    }
    return parsed;
  };

  function stopLoading() {
    $input.removeClass(options.loadingClass);
  };

 };

 $.Autocompleter.defaults = {
  inputClass: "ac_input",
  resultsClass: "ac_results",
  loadingClass: "ac_loading",
  minChars: 1,
  delay: 400,
  matchCase: false,
  matchSubset: true,
  matchContains: false,
  cacheLength: 10,
  max: 100,
  mustMatch: false,
  extraParams: {},
  selectFirst: false,
  formatItem: function(row) { return row[0]; },
  formatMatch: null,
  autoFill: false,
  width: 0,
  multiple: false,
  multipleSeparator: ", ",
  highlight: function(value, term) {
    return value.replace(new RegExp("(?![^&;]+;)(?!<[^<>]*)(" + term.replace(/([\^\$\(\)\[\]\{\}\*\.\+\?\|\\])/gi, "\\$1") + ")(?![^<>]*>)(?![^&;]+;)", "gi"), "<strong>$1</strong>");
  },
    scroll: true,
    scrollHeight: 180
 };

 $.Autocompleter.Cache = function(options) {

  var data = {};
  var length = 0;
  
  function matchSubset(s, sub) {
    if (!options.matchCase) 
      s = s.toLowerCase();
    var i = s.indexOf(sub);
    if (options.matchContains == "word"){
      i = s.toLowerCase().search("\\b" + sub.toLowerCase());
    }
    if (i == -1) return false;
    return i == 0 || options.matchContains;
  };
  
  function add(q, value) {
    if (length > options.cacheLength){
      flush();
    }
    if (!data[q]){ 
      length++;
    }
    data[q] = value;
  }
  
  function populate(){
    if( !options.data ) return false;
    // track the matches
    var stMatchSets = {},
      nullData = 0;

    // no url was specified, we need to adjust the cache length to make sure it fits the local data store
    if( !options.url ) options.cacheLength = 1;
    
    // track all options for minChars = 0
    stMatchSets[""] = [];
    
    // loop through the array and create a lookup structure
    for ( var i = 0, ol = options.data.length; i < ol; i++ ) {
      var rawValue = options.data[i];
      // if rawValue is a string, make an array otherwise just reference the array
      rawValue = (typeof rawValue == "string") ? [rawValue] : rawValue;
      
      var value = options.formatMatch(rawValue, i+1, options.data.length);
      if ( value === false )
        continue;
        
      var firstChar = value.charAt(0).toLowerCase();
      // if no lookup array for this character exists, create it now
      if( !stMatchSets[firstChar] ) 
        stMatchSets[firstChar] = [];

      // if the match is a string
      var row = {
        value: value,
        data: rawValue,
        result: options.formatResult && options.formatResult(rawValue) || value
      };
      
      // push the current match into the set list
      stMatchSets[firstChar].push(row);

      // keep track of minChars zero items
      if ( nullData++ < options.max ) {
        stMatchSets[""].push(row);
      }
    };

    // add the data items to the cache
    $.each(stMatchSets, function(i, value) {
      // increase the cache size
      options.cacheLength++;
      // add to the cache
      add(i, value);
    });
  }
  
  // populate any existing data
  setTimeout(populate, 25);
  
  function flush(){
    data = {};
    length = 0;
  }
  
  return {
    flush: flush,
    add: add,
    populate: populate,
    load: function(q) {
      if (!options.cacheLength || !length)
        return null;
      /* 
       * if dealing w/local data and matchContains then we must make sure
       * to loop through all the data collections looking for matches
       */
      if( !options.url && options.matchContains ){
        // track all matches
        var csub = [];
        // loop through all the data grids for matches
        for( var k in data ){
          // don't search through the stMatchSets[""] (minChars: 0) cache
          // this prevents duplicates
          if( k.length > 0 ){
            var c = data[k];
            $.each(c, function(i, x) {
              // if we've got a match, add it to the array
              if (matchSubset(x.value, q)) {
                csub.push(x);
              }
            });
          }
        }        
        return csub;
      } else 
      // if the exact item exists, use it
      if (data[q]){
        return data[q];
      } else
      if (options.matchSubset) {
        for (var i = q.length - 1; i >= options.minChars; i--) {
          var c = data[q.substr(0, i)];
          if (c) {
            var csub = [];
            $.each(c, function(i, x) {
              if (matchSubset(x.value, q)) {
                csub[csub.length] = x;
              }
            });
            return csub;
          }
        }
      }
      return null;
    }
  };
 };

 $.Autocompleter.Select = function (options, input, select, config) {
  var CLASSES = {
    ACTIVE: "ac_over"
  };
  
  var listItems,
    active = -1,
    data,
    term = "",
    needsInit = true,
    element,
    list;
  
  // Create results
  function init() {
    if (!needsInit)
      return;
    element = $("<div/>")
    .hide()
    .addClass(options.resultsClass)
    .css("position", "absolute")
    .appendTo(document.body);
  
    list = $("<ul/>").appendTo(element).mouseover( function(event) {
      if(target(event).nodeName && target(event).nodeName.toUpperCase() == 'LI') {
              active = $("li", list).removeClass(CLASSES.ACTIVE).index(target(event));
          $(target(event)).addClass(CLASSES.ACTIVE);            
          }
    }).click(function(event) {
      $(target(event)).addClass(CLASSES.ACTIVE);
      select();
      // TODO provide option to avoid setting focus again after selection? useful for cleanup-on-focus
      input.focus();
      return false;
    }).mousedown(function() {
      config.mouseDownOnSelect = true;
    }).mouseup(function() {
      config.mouseDownOnSelect = false;
    });
    
    if( options.width > 0 )
      element.css("width", options.width);
      
    needsInit = false;
  } 
  
  function target(event) {
    var element = event.target;
    while(element && element.tagName != "LI")
      element = element.parentNode;
    // more fun with IE, sometimes event.target is empty, just ignore it then
    if(!element)
      return [];
    return element;
  }

  function moveSelect(step) {
    listItems.slice(active, active + 1).removeClass(CLASSES.ACTIVE);
    movePosition(step);
        var activeItem = listItems.slice(active, active + 1).addClass(CLASSES.ACTIVE);
        if(options.scroll) {
            var offset = 0;
            listItems.slice(0, active).each(function() {
        offset += this.offsetHeight;
      });
            if((offset + activeItem[0].offsetHeight - list.scrollTop()) > list[0].clientHeight) {
                list.scrollTop(offset + activeItem[0].offsetHeight - list.innerHeight());
            } else if(offset < list.scrollTop()) {
                list.scrollTop(offset);
            }
        }
  };
  
  function movePosition(step) {
    active += step;
    if (active < 0) {
      active = listItems.size() - 1;
    } else if (active >= listItems.size()) {
      active = 0;
    }
  }
  
  function limitNumberOfItems(available) {
    return options.max && options.max < available
      ? options.max
      : available;
  }
  
  function fillList() {
    list.empty();
    var max = limitNumberOfItems(data.length);
    for (var i=0; i < max; i++) {
      if (!data[i])
        continue;
      var formatted = options.formatItem(data[i].data, i+1, max, data[i].value, term);
      if ( formatted === false )
        continue;
      var li = $("<li/>").html( options.highlight(formatted, term) ).addClass(i%2 == 0 ? "ac_even" : "ac_odd").appendTo(list)[0];
      $.data(li, "ac_data", data[i]);
    }
    listItems = list.find("li");
    if ( options.selectFirst ) {
      listItems.slice(0, 1).addClass(CLASSES.ACTIVE);
      active = 0;
    }
    // apply bgiframe if available
    if ( $.fn.bgiframe )
      list.bgiframe();
  }
  
  return {
    display: function(d, q) {
      init();
      data = d;
      term = q;
      fillList();
    },
    next: function() {
      moveSelect(1);
    },
    prev: function() {
      moveSelect(-1);
    },
    pageUp: function() {
      if (active != 0 && active - 8 < 0) {
        moveSelect( -active );
      } else {
        moveSelect(-8);
      }
    },
    pageDown: function() {
      if (active != listItems.size() - 1 && active + 8 > listItems.size()) {
        moveSelect( listItems.size() - 1 - active );
      } else {
        moveSelect(8);
      }
    },
    hide: function() {
      element && element.hide();
      listItems && listItems.removeClass(CLASSES.ACTIVE);
      active = -1;
    },
    visible : function() {
      return element && element.is(":visible");
    },
    current: function() {
      return this.visible() && (listItems.filter("." + CLASSES.ACTIVE)[0] || options.selectFirst && listItems[0]);
    },
    show: function() {
      var offset = $(input).offset();
      element.css({
        width: typeof options.width == "string" || options.width > 0 ? options.width : $(input).width(),
        top: offset.top + input.offsetHeight,
        left: offset.left
      }).show();
            if(options.scroll) {
                list.scrollTop(0);
                list.css({
          maxHeight: options.scrollHeight,
          overflow: 'auto'
        });
        
                if($.browser.msie && typeof document.body.style.maxHeight === "undefined") {
          var listHeight = 0;
          listItems.each(function() {
            listHeight += this.offsetHeight;
          });
          var scrollbarsVisible = listHeight > options.scrollHeight;
                    list.css('height', scrollbarsVisible ? options.scrollHeight : listHeight );
          if (!scrollbarsVisible) {
            // IE doesn't recalculate width when scrollbar disappears
            listItems.width( list.width() - parseInt(listItems.css("padding-left")) - parseInt(listItems.css("padding-right")) );
          }
                }
                
            }
    },
    selected: function() {
      var selected = listItems && listItems.filter("." + CLASSES.ACTIVE).removeClass(CLASSES.ACTIVE);
      return selected && selected.length && $.data(selected[0], "ac_data");
    },
    emptyList: function (){
      list && list.empty();
    },
    unbind: function() {
      element && element.remove();
    }
  };
 };

 $.Autocompleter.Selection = function(field, start, end) {
  if( field.createTextRange ){
    var selRange = field.createTextRange();
    selRange.collapse(true);
    selRange.moveStart("character", start);
    selRange.moveEnd("character", end);
    selRange.select();
  } else if( field.setSelectionRange ){
    field.setSelectionRange(start, end);
  } else {
    if( field.selectionStart ){
      field.selectionStart = start;
      field.selectionEnd = end;
    }
  }
  field.focus();
 };

 })(jQuery);
--- a/solr_config/velocity/js/jquery.tx3-tag-cloud.js
+++ b/solr_config/velocity/js/jquery.tx3-tag-cloud.js
@@ -0,0 +1,70 @@
 /*
 * ----------------------------------------------------------------------------
 * "THE BEER-WARE LICENSE" (Revision 42):
 * Tuxes3 wrote this file. As long as you retain this notice you
 * can do whatever you want with this stuff. If we meet some day, and you think
 * this stuff is worth it, you can buy me a beer in return Tuxes3
 * ----------------------------------------------------------------------------
 */
 (function($)
 {
  var settings;
    $.fn.tx3TagCloud = function(options)
    {

      //
      // DEFAULT SETTINGS
      //
      settings = $.extend({
        multiplier    : 1
      }, options);
      main(this);

    }

    function main(element)
    {
      // adding style attr
      element.addClass("tx3-tag-cloud");
      addListElementFontSize(element);
    }

    /**
     * calculates the font size on each li element 
     * according to their data-weight attribute
     */
    function addListElementFontSize(element)
    {
      var hDataWeight = -9007199254740992;
      var lDataWeight = 9007199254740992;
      $.each(element.find("li"), function(){
        var cDataWeight = getDataWeight(this);
        if (isNaN(cDataWeight))
        {
          logWarning("No \"data-weight\" attribute defined on <li> element");
        }
        else
        {
          hDataWeight = cDataWeight > hDataWeight ? cDataWeight : hDataWeight;
          lDataWeight = cDataWeight < lDataWeight ? cDataWeight : lDataWeight;
        }
      });
      $.each(element.find("li"), function(){
        var dataWeight = getDataWeight(this);
        var percent = (hDataWeight > lDataWeight) ? (dataWeight - lDataWeight)/(hDataWeight - lDataWeight) : 0;
        $(this).css('font-size', (1 + (percent * settings['multiplier'])) + "em");
      });

    }

    function getDataWeight(element)
    {
      return parseInt($(element).attr("data-weight"), 10);
    }

    function logWarning(message)
    {
      console.log("[WARNING] " + Date.now() + " : " + message);
    }

 }(jQuery));
--- a/solr_config/velocity/layout.vm
+++ b/solr_config/velocity/layout.vm
@@ -0,0 +1,42 @@
 <html>
 <head>
  #parse("head.vm")
 </head>
  <body>
    <div id="header">
      <a href="#url_for_home"><img src="#{url_root}/img/solr.svg" id="logo" title="Solr"/></a> $resource.powered_file_search
    </div>

    <div id="header2" onclick="javascript:locale_select()">
      <ul class="menu">

        <li>
          <a href="#"><img src="#{url_for_solr}/admin/file?file=/velocity/img/globe_256.png&contentType=image/png" id="locale_pic" title="locale_select" width="30px" height="27px"/></a>
          <ul>
            <li><a href="#url_for_locale('fr_FR')" #if("#current_locale"=="fr_FR")class="hidden"#end>
              <img src="#{url_for_solr}/admin/file?file=/velocity/img/france_640.png&contentType=image/png" id="french_flag"  width="40px" height="40px"/>Fran&ccedil;ais</a></li>
            <li><a href="#url_for_locale('de_DE')" #if("#current_locale"=="de_DE")class="hidden"#end>
              <img src="#{url_for_solr}/admin/file?file=/velocity/img/germany_640.png&contentType=image/png" id="german_flag"  width="40px" height="40px"/>Deutsch</a></li>
            <li><a href="#url_for_locale('')" #if("#current_locale"=="")class="hidden"#end>
              <img src="#{url_for_solr}/admin/file?file=/velocity/img/english_640.png&contentType=image/png" id="english_flag"  width="40px" height="40px"/>English</a></li>
          </ul>
        </li>
      </ul>
    </div>

    #if($response.response.error.code)
      <div class="error">
        <h1>ERROR $response.response.error.code</h1>
        $response.response.error.msg
      </div>
    #else
      <div id="content">
        $content
      </div>
    #end

    <div id="footer">
      #parse("footer.vm")
    </div>
  </body>
 </html>
--- a/solr_config/velocity/macros.vm
+++ b/solr_config/velocity/macros.vm
@@ -0,0 +1,16 @@
 #macro(lensFilterSortOnly)?#if($response.responseHeader.params.getAll("fq").size() > 0)&#fqs($response.responseHeader.params.getAll("fq"))#end#sort($request.params.getParams('sort'))#end
 #macro(lensNoQ)#lensFilterSortOnly&type=#current_type#if("#current_locale"!="")&locale=#current_locale#end#end
 #macro(lensNoType)#lensFilterSortOnly#q#if("#current_locale"!="")&locale=#current_locale#end#end
 #macro(lensNoLocale)#lensFilterSortOnly#q&type=#current_type#end

 ## lens modified for example/files - to use fq from responseHeader rather than request, and #debug removed too as it is built into browse params now, also added type to lens
 #macro(lens)#lensNoQ#q#end

 ## Macros defined custom for the "files" example
 #macro(url_for_type $type)#url_for_home#lensNoType&type=$type#end
 #macro(current_type)#if($response.responseHeader.params.type)${response.responseHeader.params.type}#{else}all#end#end
 #macro(url_for_locale $locale)#url_for_home#lensNoLocale#if($locale!="")&locale=$locale#end&start=$page.start#end
 #macro(current_locale)$!{response.responseHeader.params.locale}#end

 ## Usage: #label(resource_key[, default_value]) - resource_key is used as label if no default value specified and no resource exists
 #macro(label $key $default)#if($resource.get($key).exists)${resource.get($key)}#else#if($default)$default#else${key}#end#end#end
--- a/solr_config/velocity/mime_type_lists.vm
+++ b/solr_config/velocity/mime_type_lists.vm
@@ -0,0 +1,68 @@
 #**
 *  Define some Mime-Types, short and long form
 *#

 ## MimeType to extension map for detecting file type
 ## and showing proper icon
 ## List of types matches the icons in /solr/img/filetypes

 ## Short MimeType Names
 ## Was called $supportedtypes
 #set($supportedMimeTypes = "7z;ai;aiff;asc;audio;bin;bz2;c;cfc;cfm;chm;class;conf;cpp;cs;css;csv;deb;divx;doc;dot;eml;enc;file;gif;gz;hlp;htm;html;image;iso;jar;java;jpeg;jpg;js;lua;m;mm;mov;mp3;mpg;odc;odf;odg;odi;odp;ods;odt;ogg;pdf;pgp;php;pl;png;ppt;ps;py;ram;rar;rb;rm;rpm;rtf;sig;sql;swf;sxc;sxd;sxi;sxw;tar;tex;tgz;txt;vcf;video;vsd;wav;wma;wmv;xls;xml;xpi;xvid;zip")

 ## Long Form: map MimeType headers to our Short names
 ## Was called $extMap
 #set( $mimeExtensionsMap = {
   "application/x-7z-compressed": "7z",
   "application/postscript": "ai",
   "application/pgp-signature": "asc",
   "application/octet-stream": "bin",
   "application/x-bzip2": "bz2",
   "text/x-c": "c",
   "application/vnd.ms-htmlhelp": "chm",
   "application/java-vm": "class",
   "text/css": "css",
   "text/csv": "csv",
   "application/x-debian-package": "deb",
   "application/msword": "doc",
   "message/rfc822": "eml",
   "image/gif": "gif",
   "application/winhlp": "hlp",
   "text/html": "html",
   "application/java-archive": "jar",
   "text/x-java-source": "java",
   "image/jpeg": "jpeg",
   "application/javascript": "js",
   "application/vnd.oasis.opendocument.chart": "odc",
   "application/vnd.oasis.opendocument.formula": "odf",
   "application/vnd.oasis.opendocument.graphics": "odg",
   "application/vnd.oasis.opendocument.image": "odi",
   "application/vnd.oasis.opendocument.presentation": "odp",
   "application/vnd.oasis.opendocument.spreadsheet": "ods",
   "application/vnd.oasis.opendocument.text": "odt",
   "application/pdf": "pdf",
   "application/pgp-encrypted": "pgp",
   "image/png": "png",
   "application/vnd.ms-powerpoint": "ppt",
   "audio/x-pn-realaudio": "ram",
   "application/x-rar-compressed": "rar",
   "application/vnd.rn-realmedia": "rm",
   "application/rtf": "rtf",
   "application/x-shockwave-flash": "swf",
   "application/vnd.sun.xml.calc": "sxc",
   "application/vnd.sun.xml.draw": "sxd",
   "application/vnd.sun.xml.impress": "sxi",
   "application/vnd.sun.xml.writer": "sxw",
   "application/x-tar": "tar",
   "application/x-tex": "tex",
   "text/plain": "txt",
   "text/x-vcard": "vcf",
   "application/vnd.visio": "vsd",
   "audio/x-wav": "wav",
   "audio/x-ms-wma": "wma",
   "video/x-ms-wmv": "wmv",
   "application/vnd.ms-excel": "xls",
   "application/xml": "xml",
   "application/x-xpinstall": "xpi",
   "application/zip": "zip"
 })
--- a/solr_config/velocity/results.vm
+++ b/solr_config/velocity/results.vm
@@ -0,0 +1,20 @@
 <div id="facets">
  #parse("facets.vm")
 </div>


 <div id="results_list">
  <div class="pagination">
    <span class="results-found">$page.results_found</span> $resource.results_found_in.insert(${response.responseHeader.QTime})
    $resource.page_of.insert($page.current_page_number,$page.page_count)
  </div>

  #parse("results_list.vm")

  <div class="pagination">
    #link_to_previous_page
    <span class="results-found">$page.results_found</span> $resource.results_found.
    $resource.page_of.insert($page.current_page_number,$page.page_count)
    #link_to_next_page
  </div>
 </div>
--- a/solr_config/velocity/results_list.vm
+++ b/solr_config/velocity/results_list.vm
@@ -0,0 +1,21 @@
 <ul id="tabs">
  <li><a href="#url_for_type('all')" #if("#current_type"=="all")class="selected"#end>$resource.type.all ($response.response.facet_counts.facet_queries.all_types)</a></li>
  #foreach($type in $response.response.facet_counts.facet_fields.doc_type)
    #if($type.key)
      <li><a href="#url_for_type($type.key)" #if($type.value=="0")class="no_results"#end #if("#current_type"==$type.key)class="selected"#end> #label("type.${type.key}.label", $type.key) ($type.value)</a></li>
    #else
      #if($type.value > 0)
        <li><a href="#url_for_type('unknown')" #if("#current_type"=="unknown")class="selected"#end>$resource.type.unknown ($type.value)</a></li>
      #end
    #end
  #end
 </ul>


 <div id="results">
  #foreach($doc in $response.results)
    #parse("hit.vm")
  #end
 </div>


--- a/solr_import.sh
+++ b/solr_import.sh
@@ -0,0 +1,144 @@
 #!/bin/bash
 # @name: solr_import.sh
 # @version: 0.1
 # @creation_date: 2022-03-11
 # @license: The MIT License <https://opensource.org/licenses/MIT>
 # @author: Simon Bowie <ad7588@coventry.ac.uk>
 # @purpose: Runs imports of files into Solr indexes
 # @acknowledgements:
 # https://www.redhat.com/sysadmin/arguments-options-bash-scripts

 ############################################################
 # Subprograms                                              #
 ############################################################
 License()
 {
  echo 'Copyright 2022 Simon Bowie <ad7588@coventry.ac.uk>'
  echo
  echo 'Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:'
  echo
  echo 'The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.'
  echo
  echo 'THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.'
 }

 Help()
 {
   # Display Help
   echo "This script performs Solr import functions for different cores."
   echo
   echo "Syntax: solr_import.sh [-l|h|z|a|e|i|m|p|x|d|s|w]"
   echo "options:"
   echo "l     Print the MIT License notification."
   echo "h     Print this Help."
   echo "z     Index all."
   echo "a     Index ACTIVE folder."
   echo "e     Index EXPANDING folder."
   echo "i     Index INVISIBLE folder."
   echo "m     Index MULTI-SPECIES folder."
   echo "p     Index PISSING & LEAKING folder."
   echo "x     Index SECRET folder."
   echo "d     Index SELF-DEFENDING folder."
   echo "s     Index SURVIVING folder."
   echo "w     Index WORKING folder."
   echo
 }

 Import()
 {
  docker exec -it solr bin/solr delete -c $core

  docker exec -it solr solr create_core -c $core -d custom

  #docker exec -ti --user=solr solr bash -c "cp -r /opt/solr/example/files/conf/* /var/solr/data/$core/conf/"

  #docker restart solr

  sleep 30

  docker run --rm -v "$directory/$location:/$core" --network=host solr:latest post -c $core /$core
 }

 Import_recursive()
 {
  docker run --rm -v "$directory/$subdirectory:/$core" --network=host solr:latest post -c $core /$core
 }
 ############################################################
 ############################################################
 # Main program                                             #
 ############################################################
 ############################################################

 # Set variables
 directory="/Users/ad7588/projects/patent_site"

 # Get the options
 while getopts ":hlimzaespxdw" option; do
   case $option in
      l) # display License
        License
        exit;;
      h) # display Help
        Help
        exit;;
      z) # index all
        core="all"
        docker exec -it solr bin/solr delete -c $core
        docker exec -it solr solr create_core -c $core -d custom
        location="data/POP_Dataset_2022"
          for subdirectory in $location/*/
          do
            subdirectory=${subdirectory%*/}      # remove the trailing "/"
            Import_recursive
          done
        exit;;
      a) # index ACTIVE folder
        core="active"
        location="data/pop_rtfs/ACTIVE (160)"
        Import
        exit;;
      e) # index EXPANDING folder
        core="expanding"
        location="data/pop_rtfs/EXPANDING (169)"
        Import
        exit;;
      i) # index INVISIBLE folder
        core="invisible"
        location="data/pop_rtfs/IN.VISIBLE (204)"
        Import
        exit;;
      m) # index MULTI-SPECIES folder
        core="multispecies"
        location="data/pop_rtfs/MULTI-SPECIES (180)"
        Import
        exit;;
      p) # index PISSING & LEAKING folder
        core="pissing"
        location="data/pop_rtfs/PISSING & LEAKING (168)"
        Import
        exit;;
      x) # index SECRET folder
        core="secret"
        location="data/pop_rtfs/SECRET (92)"
        Import
        exit;;
      d) # index SELF-DEFENDING folder
        core="defending"
        location="data/pop_rtfs/SELF-DEFENDING (115)"
        Import
        exit;;
      s) # index SURVIVING folder
        core="surviving"
        location="data/pop_rtfs/SURVIVING (166)"
        Import
        exit;;
      w) # index WORKING folder
        core="working"
        location="data/pop_rtfs/WORKING (101)"
        Import
        exit;;
      \?) # Invalid option
         echo "Error: Invalid option" >&2
         exit 1;;
   esac
 done
--- a/web/Dockerfile
+++ b/web/Dockerfile
@@ -0,0 +1,10 @@
 # syntax=docker/dockerfile:1
 FROM python:3.10.7-slim-buster

 RUN apt-get update -y && apt-get install -y imagemagick

 WORKDIR /code
 COPY requirements.txt requirements.txt
 RUN pip install -r requirements.txt
 COPY . .
 CMD ["python3", "-m", "flask", "run"]
--- a/web/app/__init__.py
+++ b/web/app/__init__.py
@@ -0,0 +1,34 @@
 # @name: __init__.py
 # @creation_date: 2022-09-07
 # @license: The MIT License <https://opensource.org/licenses/MIT>
 # @author: Simon Bowie <ad7588@coventry.ac.uk>
 # @purpose: Initialises the app, Flask-Moment, and the main, search, and random blueprints
 # @acknowledgements:
 # https://www.digitalocean.com/community/tutorials/how-to-add-authentication-to-your-app-with-flask-login
 # Config stuff adapted from https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-iii-web-forms

 from flask import Flask
 from flask_moment import Moment
 import os

 # initiate Moment for datetime functions
 moment = Moment()

 def create_app():
    app = Flask(__name__)

    moment.init_app(app)

    # blueprint for main parts of app
    from .main import main as main_blueprint
    app.register_blueprint(main_blueprint)

    # blueprint for search parts of app
    from .search import search as search_blueprint
    app.register_blueprint(search_blueprint)

    # blueprint for random parts of app
    from .random import random as random_blueprint
    app.register_blueprint(random_blueprint)

    return app
--- a/web/app/main.py
+++ b/web/app/main.py
@@ -0,0 +1,16 @@
 # @name: main.py
 # @creation_date: 2022-09-07
 # @license: The MIT License <https://opensource.org/licenses/MIT>
 # @author: Simon Bowie <ad7588@coventry.ac.uk>
 # @purpose: Main route for index and other pages
 # @acknowledgements:
 # https://www.digitalocean.com/community/tutorials/how-to-add-authentication-to-your-app-with-flask-login

 from flask import Blueprint, render_template

 main = Blueprint('main', __name__)

 # route for index page
@main.route('/')
 def index():
    return render_template('index.html')
--- a/web/app/ops.py
+++ b/web/app/ops.py
@@ -0,0 +1,153 @@
 # @name: ops.py
 # @version: 0.1
 # @creation_date: 2022-09-08
 # @license: The MIT License <https://opensource.org/licenses/MIT>
 # @author: Simon Bowie <simon.bowie.19@gmail.com>
 # @purpose: Performs functions against the European Patent Office's Open Patent Services (OPS) API
 # @acknowledgements:
 # OPS documented at https://www.epo.org/searching-for-patents/data/web-services/ops.html
 # OPS RESTful API specification at http://documents.epo.org/projects/babylon/eponet.nsf/0/F3ECDCC915C9BCD8C1258060003AA712/$File/ops_v3.2_documentation_-_version_1.3.18_en.pdf
 # OPS API functions list at https://developers.epo.org/ops-v3-2/apis

 import os
 import requests
 import base64
 from wand.image import Image

 # get config variables from OS environment variables: set in env file passed through Docker Compose
 ops_url = os.environ.get('OPS_URL')
 ops_url_images = os.environ.get('OPS_URL_IMAGES')
 consumer_key = os.environ.get('CONSUMER_KEY')
 consumer_secret = os.environ.get('CONSUMER_SECRET')

 def get_access_token():

    # OPS API credentials (details at http://documents.epo.org/projects/babylon/eponet.nsf/0/F3ECDCC915C9BCD8C1258060003AA712/$File/ops_v3.2_documentation_-_version_1.3.18_en.pdf)
    endpoint_url = ops_url + '3.2/auth/accesstoken'
    auth = consumer_key + ":" + consumer_secret
    auth_bytes = auth.encode("ascii")
    base64_bytes = base64.b64encode(auth_bytes)
    base64_string = base64_bytes.decode("ascii")

    # set up API call
    headers = {"Authorization": "Basic " + base64_string, "Content-Type": "application/x-www-form-urlencoded"}
    data = "grant_type=client_credentials"

    # give back result
    response = requests.post(endpoint_url, headers=headers, data=data)

     # fail with a clear HTTP error if the OPS token request is rejected
     response.raise_for_status()

     # turn the API response into useful JSON
     json = response.json()
     return json['access_token']

 def get_publication_details(doc_ref):

    access_token = get_access_token()

    # OPS API credentials (details at http://documents.epo.org/projects/babylon/eponet.nsf/0/F3ECDCC915C9BCD8C1258060003AA712/$File/ops_v3.2_documentation_-_version_1.3.16_en.pdf)
    endpoint_url = ops_url + 'rest-services/published-data/publication/docdb/' + doc_ref + '/biblio'

    # set up API call
    headers = {"Authorization": "Bearer " + access_token, "Accept": "application/json"}

    # get result
    response = requests.get(endpoint_url, headers=headers)

    output = {}

    if response.status_code == 200:
        # turn the API response into useful Json
        json = response.json()

         # for each invention title, check if it's in the original language
         try:
             invention_titles = json['ops:world-patent-data']['exchange-documents']['exchange-document']['bibliographic-data']['invention-title']
             # a single title arrives as one object rather than a list, so wrap it
             if not isinstance(invention_titles, list):
                 invention_titles = [invention_titles]
             for invention_title in invention_titles:
                 if invention_title['@lang'] is not None and invention_title['@lang'] != 'en':
                     output['original_title'] = invention_title['$']
         except KeyError:
             pass

         # for each abstract, check if it's in the original language
         try:
             abstracts = json['ops:world-patent-data']['exchange-documents']['exchange-document']['abstract']
             # a single abstract arrives as one object rather than a list, so wrap it
             if not isinstance(abstracts, list):
                 abstracts = [abstracts]
             for abstract in abstracts:
                 if abstract['@lang'] is not None and abstract['@lang'] != 'en':
                     output['original_abstract'] = abstract['p']['$']
         except KeyError:
             pass

    return output

 def get_images(doc_ref):

    access_token = get_access_token()

    # OPS API credentials (details at http://documents.epo.org/projects/babylon/eponet.nsf/0/F3ECDCC915C9BCD8C1258060003AA712/$File/ops_v3.2_documentation_-_version_1.3.16_en.pdf)
    endpoint_url = ops_url + 'rest-services/published-data/publication/docdb/' + doc_ref + '/images'

    # set up API call
    headers = {"Authorization": "Bearer " + access_token, "Accept": "application/json"}

    # give back result
    response = requests.get(endpoint_url, headers=headers)

    if response.status_code == 200:

        output = {}
         # start with no URL so the FullDocument fallback and the download guard below can test for None
         drawings_url = None

        # turn the API response into useful Json
        json = response.json()

        try:
            json['ops:world-patent-data']['ops:document-inquiry']['ops:inquiry-result']['ops:document-instance']
            document_instances = json['ops:world-patent-data']['ops:document-inquiry']['ops:inquiry-result']['ops:document-instance']
            try:
                document_instances[1]
                for document_instance in document_instances:
                    if document_instance['@desc'] == 'Drawing':
                        drawings_url = ops_url_images + '3.2/rest-services/' +  document_instance['@link'] + '?Range=1'
                if drawings_url is None:
                    for document_instance in document_instances:
                        if document_instance['@desc'] == 'FullDocument':
                            drawings_url = ops_url_images + '3.2/rest-services/' +  document_instance['@link'] + '?Range=1'
            except KeyError:
                pass

            if drawings_url is not None:

                # set up API call
                headers = {"Authorization": "Bearer " + access_token, "Accept": "application/tiff"}

                # give back result
                response = requests.get(drawings_url, headers=headers)

                if response.status_code == 200:
                    with Image(blob = response.content) as image:
                        png_blob = image.make_blob('png')
                        base64_bytes = base64.b64encode(png_blob)
                        output['image'] = base64_bytes.decode("ascii")

        except KeyError:
            pass

    else:
        output = False

    return output
--- a/web/app/random.py
+++ b/web/app/random.py
@@ -0,0 +1,62 @@
 # @name: random.py
 # @creation_date: 2022-09-09
 # @license: The MIT License <https://opensource.org/licenses/MIT>
 # @author: Simon Bowie <ad7588@coventry.ac.uk>
 # @purpose: random route for random
 # @acknowledgements:

 from flask import Blueprint, render_template, request
 from . import solr
 from . import ops

 random = Blueprint('random', __name__)

 # route for random page
@random.route('/random/')
 def random_record():
    core = 'all'
    results = solr.get_random_record(core)
    for result in results:
        publication_details = ops.get_publication_details(result['doc_ref'])
        result.update(publication_details)
         # fetch the image once: each call makes several requests to the OPS API
         image = ops.get_images(result['doc_ref'])
         if image:
             result.update(image)
    return render_template('record.html', results=results)

 # route for comparing two random records
@random.route('/random/two/')
 def two_random_records():
    core = 'all'
    results_list = []
    i = 0
    while i <= 1:
        results = solr.get_random_record(core)
        for result in results:
            publication_details = ops.get_publication_details(result['doc_ref'])
            result.update(publication_details)
             # fetch the image once: each call makes several requests to the OPS API
             image = ops.get_images(result['doc_ref'])
             if image:
                 result.update(image)
             results_list.append(result)
        i += 1
    return render_template('compare.html', results=results_list)

 # route for getting ten random titles
@random.route('/random/titles/')
 def ten_random_titles():
    titles = solr.get_ten_random_elements('title')
    additional_titles = solr.get_ten_random_elements('title')
    return render_template('titles.html', titles=titles, additional_titles=additional_titles)

 # route for getting ten random abstracts
@random.route('/random/abstracts/')
 def ten_random_abstracts():
    abstracts = solr.get_ten_random_elements('abstract')
    return render_template('abstracts.html', abstracts=abstracts)

 # route for getting ten random images
@random.route('/random/images/')
 def ten_random_images():
    results = solr.get_ten_random_images()
    return render_template('images.html', results=results)
--- a/web/app/search.py
+++ b/web/app/search.py
@@ -0,0 +1,51 @@
 # @name: search.py
 # @creation_date: 2022-09-07
 # @license: The MIT License <https://opensource.org/licenses/MIT>
 # @author: Simon Bowie <ad7588@coventry.ac.uk>
 # @purpose: search route for search
 # @acknowledgements:
 # https://www.digitalocean.com/community/tutorials/how-to-add-authentication-to-your-app-with-flask-login

 from flask import Blueprint, render_template, request
 from . import solr
 from . import ops

 search = Blueprint('search', __name__)

 # route for search page
@search.route('/search/', methods=['POST'])
 def basic_search():
    search = request.form.get('search')
    if request.form.get('core') is not None:
        core = request.form.get('core')
    else:
        core = 'all'
    if request.form.get('sort') is not None:
        sort = request.form.get('sort')
    else:
        sort = 'relevance'
    results = solr.solr_search(core, sort, search)
    return render_template('search.html', results=results, search=search, core=core, sort=sort)

 # route for id_search page
@search.route('/search/id/')
 def id_search():
    if request.args.get('core') is not None:
        core = request.args.get('core')
    else:
        core = 'all'
    if request.args.get('sort') is not None:
        sort = request.args.get('sort')
    else:
        sort = 'relevance'
    id = request.args.get('id')
     # pass the ID by keyword: the name `search` here refers to the blueprint, not a query string
     results = solr.solr_search(core, sort, id=id)

    for result in results:
        publication_details = ops.get_publication_details(result['doc_ref'])
        result.update(publication_details)
         # fetch the image once: each call makes several requests to the OPS API
         image = ops.get_images(result['doc_ref'])
         if image:
             result.update(image)

    return render_template('record.html', results=results)
--- a/web/app/solr.py
+++ b/web/app/solr.py
@@ -0,0 +1,145 @@
 # @name: solr.py
 # @version: 0.1
 # @creation_date: 2022-09-07
 # @license: The MIT License <https://opensource.org/licenses/MIT>
 # @author: Simon Bowie <simon.bowie.19@gmail.com>
 # @purpose: Performs Solr functions
 # @acknowledgements:

 import os
 import requests
 import re
 import urllib.parse
 import random
 from . import ops

 # get config variables from OS environment variables: set in env file passed through Docker Compose
 solr_hostname = os.environ.get('SOLR_HOSTNAME')
 solr_port = os.environ.get('SOLR_PORT')

 def solr_search(core, sort, search=None, id=None):

    # Assemble a query string to send to Solr. This uses the Solr hostname from config.env. Solr's query syntax can be found at many sites including https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html
    if id is not None:
         solrurl = 'http://' + solr_hostname + ':' + solr_port + '/solr/' + core + '/select?q.op=OR&q=id%3A"' + urllib.parse.quote_plus(id) + '"&wt=json'
    else:
        if (sort == 'relevance'):
            solrurl = 'http://' + solr_hostname + ':' + solr_port + '/solr/' + core + '/select?q.op=OR&q=content%3A' + urllib.parse.quote_plus(search) + '&wt=json'
        else:
            solrurl = 'http://' + solr_hostname + ':' + solr_port + '/solr/' + core + '/select?q.op=OR&q=content%3A' + urllib.parse.quote_plus(search) + '&wt=json&sort=' + sort

    # get result
    request = requests.get(solrurl)
    # turn the API response into useful Json
    json = request.json()

    if (json['response']['numFound'] == 0):
        output = 'no results found'
    else:
        output = []
        for result in json['response']['docs']:
            # set ID variable
            id = result['id']
            # set content variable
            content = result['content']
            # parse result
            result_output = parse_result(id, content)
            output.append(result_output)
    return output

 def parse_result(id, input):

    output = {}

    output['id'] = id

     # set document reference number (used for OPS API)
     doc_ref = re.search(r'=D\s(([^\s]*)\s([^\s]*)\s([^\s]*))', input)
     if doc_ref is None:
         doc_ref = re.search(r'=D&locale=en_EP\s(([^\s]*)\s([^\s]*)\s([^\s]*))', input)
     output['doc_ref'] = doc_ref.group(1).replace(" ","")

     # search for the application ID in the content element and display it
     application_id = re.search(r'Application.*\n(.*)\n', input)
     output['application_id'] = application_id.group(1)

     # search for the EPO publication URL in the content element and display it
     epo_publication = re.search(r'Publication.*\n(.*)\n', input)
     output['epo_publication_url'] = epo_publication.group(1)

     # search for the IPC publication URL in the content element and display it
     ipc_publication = re.search(r'IPC.*\n(.*)\n', input)
     output['ipc_publication_url'] = ipc_publication.group(1)

     # search for the title in the content element and display it
     title = re.search(r'Title.*\n(.*)\n', input)
     if title is not None:
         output['title'] = title.group(1)

     # search for the abstract in the content element and display it
     abstract = re.search(r'Abstract.*\n(.*)\n', input)
     if abstract is None:
         abstract = re.search(r'\(.\) \n\n(.*)\n', input)
     if abstract is not None:
         output['abstract'] = abstract.group(1)

     # search for the year in the content element and display it
     year = re.search(r'=D[^\s]*\s[^\s]*\s[^\s]*\s[^\s]*\s(\d{4})', input)
     if year is not None:
         output['year'] = year.group(1)
    return output

 def get_random_record(core):

    rand = str(random.randint(0, 9999999))

    # Assemble a query string to send to Solr. This uses the Solr hostname from config.env. Solr's query syntax can be found at many sites including https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html
    solrurl = 'http://' + solr_hostname + ':' + solr_port + '/solr/' + core + '/select?q.op=OR&q=*%3A*&wt=json&sort=random_' + rand + '%20asc&rows=1'

    # get result
    request = requests.get(solrurl)
    # turn the API response into useful Json
    json = request.json()

    if (json['response']['numFound'] == 0):
        output = 'no results found'
    else:
        output = []
        for result in json['response']['docs']:
            # set ID variables
            id = result['id']
            # set content variable
            content = result['content']
            # parse result
            result_output = parse_result(id, content)
            output.append(result_output)
    return output

 def get_ten_random_elements(field):
    core = 'all'
    output = []
    i = 0
    while i <= 9:
        results = get_random_record(core)
        for result in results:
            if field in result:
                 element = {'id': result['id'], field: result[field]}
                 output.append(element)
                i += 1
    return output

 def get_ten_random_images():
    core = 'all'
    output = []
    i = 0
    while i <= 9:
        results = get_random_record(core)
        for result in results:
             # fetch the image once: each call makes several requests to the OPS API
             image = ops.get_images(result['doc_ref'])
             if image:
                 result.update(image)
                output.append(result)
                i += 1
    return output
--- a/web/app/static/js/main.js
+++ b/web/app/static/js/main.js
@@ -0,0 +1,9 @@
 /*
 # @name: main.js
 # @version: 0.1
 # @creation_date: 2022-09-07
 # @license: The MIT License <https://opensource.org/licenses/MIT>
 # @author: Simon Bowie <ad7588@coventry.ac.uk>
 # @purpose: JavaScript functions for various functions
 # @acknowledgements:
 */
--- a/web/app/static/styles/custom.css
+++ b/web/app/static/styles/custom.css
@@ -0,0 +1,10 @@
 /*
 # @name: custom.css
 # @version: 0.1
 # @creation_date: 2022-09-07
 # @license: The MIT License <https://opensource.org/licenses/MIT>
 # @author: Simon Bowie <ad7588@coventry.ac.uk>
 # @purpose: Custom CSS to override Bootstrap 5 defaults
 # @acknowledgements:
 # Bootstrap 5.1.3: https://getbootstrap.com/
 */
--- a/web/app/templates/abstracts.html
+++ b/web/app/templates/abstracts.html
@@ -0,0 +1,15 @@
 {% extends "base.html" %}

 {% block content %}

  {% for abstract in abstracts %}

    {{ abstract['abstract'] }}

    <br><br>

    <hr>

  {% endfor %}

 {% endblock %}
--- a/web/app/templates/base.html
+++ b/web/app/templates/base.html
@@ -0,0 +1,51 @@
 <!--
 # @name: base.html
 # @version: 0.1
 # @creation_date: 2022-09-07
 # @license: The MIT License <https://opensource.org/licenses/MIT>
 # @author: Simon Bowie <ad7588@coventry.ac.uk>
 # @purpose: Basic layout for all pages
 # @acknowledgements:
 # https://www.digitalocean.com/community/tutorials/how-to-make-a-web-application-using-flask-in-python-3
 # Bootstrap 5.1.3: https://getbootstrap.com/
 # Flask-Moment: https://flask-moment.readthedocs.io/en/latest/
 # Bootstrap select: https://stackoverflow.com/questions/67942546/bootstrap-5-select-dropdown-with-the-multiple-attribute-collapsed

 -->

 <!DOCTYPE html>
 <html>
 <head>
  <title>Performing Patents Otherwise: Archival conversations with 320,000 clothing inventions</title>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <!-- Bootstrap CSS -->
  <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-1BmE4kWBq78iYhFldvKuhfTAU6auU8tT94WrHftjDbrCEXSU1oBoqyl2QvZ6jIW3" crossorigin="anonymous">
  <link href="{{ url_for('static',filename='styles/custom.css') }}" rel="stylesheet">
  <!-- JavaScript -->
  <script src="https://code.jquery.com/jquery-3.6.0.js" integrity="sha256-H+K7U5CnXl1h5ywQfKtSj8PCmoN9aaq30gDh27Xc0jk=" crossorigin="anonymous"></script>
  <script src="{{ url_for('static',filename='js/main.js') }}"></script>
  <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js" integrity="sha384-ka7Sk0Gln4gmtz2MlQnikT1wXgYsOg+OMhuP+IlRH9sENBO0LRn5q+8nbTov4+1p" crossorigin="anonymous"></script>
 </head>

 <body class="d-flex flex-column min-vh-100">

  <main class="flex-shrink-0">
    <div class="container-fluid p-5 my-5 border">

          {% block content %}
          {% endblock %}

      </div>
    </main>

    <footer class="footer py-3 mt-auto bg-light">
        <div class="container">
            <span class="text-muted">Data from the <a href="https://www.epo.org/">European Patent Office's</a> <a href="https://worldwide.espacenet.com/">Espacenet patent search engine</a> and reconfigured by Goldsmiths, University of London's Archival Conversations project.</span>
        </div>
    </footer>

 </body>

 </html>
--- a/web/app/templates/compare.html
+++ b/web/app/templates/compare.html
@@ -0,0 +1,71 @@
 {% extends "base.html" %}

 {% block content %}

  <div class="row">
  {% for result in results %}
    <div class="col-6 text-center">

    Application ID:

    <a href="/search/id?id={{ result['id'] }}&core=all">
      {{ result['application_id'] }}
    </a>

    <br><br>

    Year:

    {{ result['year'] }}

    <br><br>

    EPO publication:

    <a href="{{ result['epo_publication_url'] }}">
      {{ result['epo_publication_url'] }}
    </a>

    <br><br>

    IPC publication:

    <a href="{{ result['ipc_publication_url'] }}">
      {{ result['ipc_publication_url'] }}
    </a>

    <br><br>

    {% if result['title'] is defined %}
      Title:
      {{ result['title'] }}
      <br><br>
    {% endif %}

    {% if result['original_title'] is defined %}
      Original language title:
      {{ result['original_title'] }}
      <br><br>
    {% endif %}

    {% if result['abstract'] is defined %}
      Abstract:
      {{ result['abstract'] }}
      <br><br>
    {% endif %}

    {% if result['original_abstract'] is defined %}
      Original language abstract:
      {{ result['original_abstract'] }}
      <br><br>
    {% endif %}

    {% if result['image'] is defined %}
       <img class="img-fluid" src="data:image/png;base64,{{ result['image'] }}" alt="Drawing of patent" />
    {% endif %}
    
    </div>
  {% endfor %}
  </div>

 {% endblock %}
--- a/web/app/templates/images.html
+++ b/web/app/templates/images.html
@@ -0,0 +1,11 @@
 {% extends "base.html" %}

 {% block content %}

  {% for result in results %}

     <img class="img-fluid" src="data:image/png;base64,{{ result['image'] }}" alt="Drawing accompanying patent for {{ result['title'] }}" />

  {% endfor %}

 {% endblock %}
--- a/web/app/templates/index.html
+++ b/web/app/templates/index.html
@@ -0,0 +1,46 @@
 {% extends "base.html" %}

 {% block content %}

 <div class="row">
  <div class="col text-center">
    <p class="h1">Performing Patents Otherwise</p>
    <p class="h2">Archival conversations with 320,000 clothing inventions</p>
  </div>
 </div>

 <div class="row justify-content-center p-3">
  <div class="col-sm-6 text-center">
    <form action="/search" method="POST">
      <input type="text" name="search" placeholder="search for a patent record">
      <input type="submit" id="submit" value="search">
    </form>
  </div>
 </div>


 <div class="row p-3">
  <div class="col text-center">
    <a href="/random">show a random record</a>
  </div>

  <div class="col text-center">
    <a href="/random/two">compare two random records</a>
  </div>
 </div>

 <div class="row p-3">
  <div class="col text-center">
    <a href="/random/titles">ten random titles</a>
  </div>

  <div class="col text-center">
    <a href="/random/abstracts">ten random abstracts</a>
  </div>

  <div class="col text-center">
    <a href="/random/images">ten random images (takes a long time to load)</a>
  </div>
 </div>

 {% endblock %}
--- a/web/app/templates/record.html
+++ b/web/app/templates/record.html
@@ -0,0 +1,69 @@
 {% extends "base.html" %}

 {% block content %}

  {% for result in results %}
  <div id="result">

    Application ID:

    <a href="/search/id?id={{ result['id'] }}&core=all">
      {{ result['application_id'] }}
    </a>

    <br><br>

    Year:

    {{ result['year'] }}

    <br><br>

    EPO publication:

    <a href="{{ result['epo_publication_url'] }}">
      {{ result['epo_publication_url'] }}
    </a>

    <br><br>

    IPC publication:

    <a href="{{ result['ipc_publication_url'] }}">
      {{ result['ipc_publication_url'] }}
    </a>

    <br><br>

    {% if result['title'] is defined %}
      Title:
      {{ result['title'] }}
      <br><br>
    {% endif %}

    {% if result['original_title'] is defined %}
      Original language title:
      {{ result['original_title'] }}
      <br><br>
    {% endif %}

    {% if result['abstract'] is defined %}
      Abstract:
      {{ result['abstract'] }}
      <br><br>
    {% endif %}

    {% if result['original_abstract'] is defined %}
      Original language abstract:
      {{ result['original_abstract'] }}
      <br><br>
    {% endif %}

    {% if result['image'] is defined %}
       <img class="img-fluid" src="data:image/png;base64,{{ result['image'] }}" alt="Drawing of patent" />
    {% endif %}

  </div>
  {% endfor %}

 {% endblock %}
--- a/web/app/templates/search.html
+++ b/web/app/templates/search.html
@@ -0,0 +1,95 @@
 {% extends "base.html" %}

 {% block content %}

 <div class="row p-3">
  <form action="/search" method="POST">
    <input type="hidden" name="search" value="{{ search }}">
     <input type="hidden" name="core" value="{{ core }}">
    sort by:
    <select name="sort" id="sort" onchange="this.form.submit()">
      <option value="relevance" {% if sort == 'relevance' %} selected {% endif %}>relevance</option>
      <option value="year desc" {% if sort == 'year desc' %} selected {% endif %}>year descending</option>
      <option value="year asc" {% if sort == 'year asc' %} selected {% endif %}>year ascending</option>
    </select>
    <noscript>
      <input type="submit" class="btn btn-default" value="Set" />
    </noscript>
  </form>
 </div>

 {% if results == 'no results found' %}

  {{ results }}

 {% else %}

  {% for result in results %}

    Application ID:

    <a href="/search/id?id={{ result['id'] }}&core=all">
      <span class="result-entry">
        {{ result['application_id'] }}
      </span>
    </a>

    <br><br>

    Year:

    {{ result['year'] }}

    <br><br>

    EPO publication:

    <a href="{{ result['epo_publication_url'] }}">
        {{ result['epo_publication_url'] }}
    </a>

    <br><br>

    IPC publication:

    <a href="{{ result['ipc_publication_url'] }}">
        {{ result['ipc_publication_url'] }}
    </a>

    <br><br>

    {% if result['title'] is defined %}
      Title:
      <span class="result-entry">
        {{ result['title'] }}
      </span>
      <br><br>
    {% endif %}

    {% if result['abstract'] is defined %}
      Abstract:
      <span class="result-entry">
        {{ result['abstract'] }}
      </span>
      <br><br>
    {% endif %}

      <hr>

  {% endfor %}

 {% endif %}

 <script>
  const search_string = {{ search|tojson }};  // tojson emits a safely quoted JS string
  const search_array = search_string.split(" ").filter(Boolean);  // drop empty terms
  for (const term of search_array){
    const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");  // escape regex metacharacters
    const re = new RegExp("(" + escaped + ")", "g");
    $("span.result-entry").html(function(_, html) {
      return html.replace(re, '<span style="color:orange">$1</span>');
    });
  }
 </script>

 {% endblock %}
--- a/web/app/templates/titles.html
+++ b/web/app/templates/titles.html
@@ -0,0 +1,42 @@
 {% extends "base.html" %}

 {% block content %}

  <button class="float-end btn btn-danger" onclick="removeRandomTitle()">-</button>
  <button class="float-end btn btn-danger" onclick="addRandomTitle()">+</button>

  {% for title in titles %}

    <span class="title">
      <a href="/search/id?id={{ title['id'] }}&core=all">
        {{ title['title'] }}
      </a>
    </span>

    <br><br>

    <hr>

  {% endfor %}

 <script>
  var titles = {{ additional_titles|tojson }};

  var x = 0;

  function addRandomTitle(){
    if (x >= titles.length) { return; } // no additional titles left to add
    var record_array = titles[x++];
    document.querySelector('.container-fluid').innerHTML += "<span class='title'><a href='/search/id?id=" + record_array['id'] + "&core=all'>" + record_array['title'] + "</a></span><br><br><hr>";
  }

  function removeRandomTitle() {
    var elts = document.getElementsByClassName("title");
    var RandomSpan = elts[Math.floor(Math.random() * elts.length)];
    if (!RandomSpan) { return; } // no titles on the page to redact
    var TextReplacement = RandomSpan.textContent.replace(/\w/g,"-");
    RandomSpan.innerHTML = TextReplacement; // replacing the span's content also drops its link
  }
 </script>

 {% endblock %}
--- a/web/content/about.md
+++ b/web/content/about.md
@@ -0,0 +1,23 @@
 Version 0.1 of the Experimental Publishing Compendium was edited by members of COPIM’s experimental publishing group, formerly known as Work Package Six. Since then, a great many contributors, from tool and technology makers to authors, designers and publishers, have contributed, slowly transforming the Compendium from an edited volume to a collective resource.

 The Compendium is © 2022–2022 [COPIM](https://copim.ac.uk) and licensed under a [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/) to make it open for reuse and disappropriation.

 ... List all contributors....

 The Compendium is designed to be periodically updated, growing with the practices it aims to catalogue and support. Keeping the Compendium updated takes labour, care and attention and like any processual book it will die at some point. Currently, the Compendium is hosted by the Centre for Postdigital Cultures, Coventry University and is version 0.1.

 ## Preface

 We — the editors of this compendium — do not wish to impose one version of experimental publishing, yet we recognise that a collection such as this is necessarily biased and thus political. In this preface to the first version, we are sharing how this particular version of the compendium came about, in the hope that this will open the compendium for amendments by those who maintain and use it.

 The COPIM experimental publishing group, formerly known as work package six, worked for three and a half years on experimental publishing, in the context of the largely Anglo-American Community-led Open Publication Infrastructures for Monographs Project, COPIM. At a time when commercial consolidation threatened to monopolise the emerging scholarly Open Access publishing landscape, COPIM gathered publishers, libraries and infrastructure providers to develop community-owned infrastructure that can support small and large players. We proposed open infrastructure as an alternative to proprietary platforms that extract value and control access. Under the banner of scaling small, COPIM worked towards a diverse publishing landscape characterised by community ownership, collective production and governance, scholar-led publishing, and the sharing of resources and open infrastructures amongst diverse institutions. COPIM’s work packages were largely dedicated to serious infrastructure building, with the exception of the experimental publishing group, which raises the question: how does experimental publishing contribute to the ambition to establish infrastructures that allow diverse small initiatives to proliferate at scale?

 The closely related metaphors of publishing landscape, ecology, ecosystems or bibliodiversity shaped COPIM’s work. Staying with these images of lively and abundant interdependence allows us to locate experimental publishing’s place in scholarly knowledge production. Speaking of publishing ecologies implies that scholarly publishing cannot be separated from the wider academic landscape. How scholarly work is published cannot be separated from how it is funded, conceptualised, written, valued, reviewed, rewarded, read and taught. In this metaphor, scholarly works, like all specimens, coevolve with the environment they inhabit.

 Many things can be said about this environment: the contemporary academy. For starters, there isn’t one academy. Opinions and politics differ, as do the stakes and subject positions of the beholders.

 We, invested in feminist techno-politics, yearn for more collective, inclusive, embodied, situated and caring modes of knowledge production. But the notion that changes in publishing affect the entire scholarly landscape applies just as neatly to those, for example, who pursue scholarly excellence through competition and streamlining. Our point here is that scholarly publishing ecologies reflect and materialise the wider scholarly landscape. Scholarly books, in this ecological view, are not containers of knowledge but relational nodes that materialise what does and doesn’t count as valuable practices, sites, labour, and subjects of knowledge.

 The flow of water is commonly used to model the flow of knowledge, taking us further into the question of which forces shape the metaphorical scholarly landscape. Bureaucratic fantasies, enshrined in grant applications, project timetables, scholarly self-understanding and career paths, imagine scholarly publishing at the end of an orderly pipeline of knowledge. The way that institutions such as libraries, universities, publishers, funders, and intellectual property regimes are organised tends to reinforce the notion of a manageable flow from funding, to research question, to investigation, to publication, to evaluation. The metaphor of channeled flow and the premise of contained stages provide structure. Channeling the flow of valuable knowledge gives publishing a place and a form: the book, at the end of the pipe. But… you see it coming… where there are pipes there is breakage, spillage and blockage. And… without overflow and contamination… there won’t be much to be piped. A sanitised scholarly landscape of industrial pipage is a nightmare that evokes the very real nightmare streamlined industrial production has brought upon very real ecologies, leaving but scraplands for diversity, which alone can ensure life. And… also… despite all efforts to establish well-irrigated, drip-fed academies, the flow of scholarly knowledge is not easily channeled. Swamps, oceans, ice shields, underground currents, floods and drought-prone rivers evoke alternative models of flow that might inspire a diverse knowledge-scape that cannot be contained within the academy or otherwise.

 Coming back to experimental publishing, new forms of publication might create new kinds of pipes or spill over into more relational circulation. Either way, we posit that experimental publishing is one of the sites where the shape of scholarly landscapes, and their relationship to other ecologies of knowledge and power, is negotiated and materialised in practice. How we publish matters. Experimenting with scholarly books is to experiment with scholarly modes of knowledge production. This labour of love, like other experimental practice, takes place at the growing edges and in the cracks of established practices, where, by steady corrosion, underground commotion or capital-intensive incubation, forms of writing, making, sharing, reviewing, discovering, reading and cataloguing books come into being that will change what counts as scholarly work.
--- a/web/content/home.md
+++ b/web/content/home.md
@@ -0,0 +1,15 @@
 # ExPub Compendium

 The Experimental Publishing Compendium is for authors, designers, publishers, institutions and technologists who challenge, push and redefine the shape, form and rationale of scholarly works. The compendium offers a catalogue of tools, practices, publishers, and books to inspire experimental scholarly works.

 ## How to use the compendium

 The compendium catalogues potential ingredients for the making of experimental publications.

 - Under [Tools](/tools) you’ll find mostly software that supports experimental publication from collective writing to the annotation and remix of published texts.
 - [Practices](/practices) provide inspiration for experimental book making.
 - [Books](/books) and [publishers](/publishers) provide examples of experimental books and those who publish them.

 Each item is cross-linked so that a practice will take you to relevant tools or examples and vice versa.

 The selection is tentative, reflecting the knowledge and biases of the contributors of current and previous versions. If you want to submit or edit anything in the catalogue please do so by doing XXthisXXX.
--- a/web/requirements.txt
+++ b/web/requirements.txt
@@ -0,0 +1,6 @@
 flask
 flask-moment
 gunicorn
 markdown
 requests
 Wand