ARC Cache IndeX - allows publishing of cache contents from several sites to an
index, which can be queried for data-aware brokering. It consists of two
components: a cache scanner which runs alongside A-REX and gathers cache content
information, and a cache index to which the scanner publishes the content using
a Bloom filter to reduce the data volume. Several cache servers can publish to
one index.

Required software:

  * Python. Only 2.6 and 2.7 have been tested and are supported.
  * Twisted Core and twisted web (python-twisted-core and python-twisted-web)
  * pyOpenSSL (package name python-openssl in Ubuntu)

ACIX Cache Scanner:
-----------------

This is the component which runs on each CE collecting cache information.
Usually no configuration is necessary, but it is possible to specify a custom
logfile location by setting the logfile parameter in arc.conf, like this:

---
[acix/cachescanner]
logfile="/tmp/arc-cacheserver.log"
---


Starting instructions:

/etc/init.d/acix-cache start

Update your rc* catalogs accordingly.

You can stop the daemon with:
$ /etc/init.d/acix-cache stop

You can inspect the log file to check that everything is running. It is located
at /var/log/arc/acix-cachescanner.log. An initial warning about the creation of zombie
process is typically generated (no zombie processes from the program has been
observed). If any zombie processes are observed, please file a bug report.

Send the URL at which your cache filter is located at, to the index admins(s).
Unless you changed anything in the configuration, this will be:
https://HOST_FQDN:5443/data/cache

This is important as the index server pulls the cache filter from your site
(the filter doesn't get registered automatically).

If you have both arex_mount_point and at least one cacheaccess rule defined in
arc.conf then the URL for remote cache access will be sent to the index,
otherwise just the hostname is used.


ACIX Index Server:
-----------------

This is the index of registered caches which is queried by users to discover
locations of cached files. To configure, edit /etc/arc.conf to include cache
server URLs corresponding to the sites to be indexed.

---
[acix/indexserver]
cachescanner="https://myhost:5443/data/cache"
cachescanner="https://anotherhost:5443/data/cache"
---

Starting instructions.

$ /etc/init.d/acix-index start

Update your rc* catalogs accordingly.

You can stop the daemon with:
$ /etc/init.d/acix-index stop

A log file is at /var/log/arc/acix-index.log. By default the index server will
listen on port 6443 (ssl+http) so you need to open this port (or the
configured port) in the firewall.

It is possible to configure port, use of ssl, and the index refresh interval.
See the indexsetup.py file (a bit of Python understanding is required).


Clients:
-------

To query an index server, construct a URL, like this:

https://cacheindex.ndgf.org:6443/data/index?url=http://www.nordugrid.org:80/data/echo.sh

Here you ask the index services located at https://cacheindex.ndgf.org:6443/data/index
for the location(s) of the file http://www.nordugrid.org:80/data/echo.sh

It is possible to query for multiple files by comma-seperating the files, e.g.:

index?url=http://www.nordugrid.org:80/data/echo.sh,http://www.nordugrid.org:80/data/echo.sh

Remember to quote/urlencode the strings when performing the get (wget and curl
will do this automatically, but most http libraries won't)

The result is a JSON encoded datastructure with the top level structure being a
dictionary/hash-table with the mapping: url -> [machines], where [machines] is
a list of the machines on which the files are cached. You should always use a
JSON parser to decode the result (the string might be escaped).

If a machine has enabled remote cache access then a URL at which cache files
may be accessed is shown, otherwise just the hostname is used. To access a
cached file remotely, simply append the URL of the original file to the cache
access endpoint and call HTTP GET (or use wget, curl, arccp,...), eg

https://arex.host/arex/cache/http://www.nordugrid.org:80/data/echo.sh

Some encoding of the original URL may be necessary depending on the tool you
use.
