Ephorus Fetch Federation
                The Ephorus Fetch Federation project created a
                    federation of web spidering services, coordinated by a centralized
                    controller. The project processes long lists of web sites and
                    builds an index, using a theoretically unlimited number of machines to
                    retrieve, analyse and index text data. The controller distributes
                    work over many machines, assigning the next site to be fetched to
                    machines that are currently idle. The fetching
                    process is self-sustaining, yet it allows operator
                    intervention, for instance to add filters dynamically. Because of
                    the sheer size of the indexed data, persistence is completely
                    distributed and based on NoSQL storage. This project was created using core Java
                    for the actual worker services and Servlet API 2.5 / Jersey REST for
                    the front-end. The controller uses a multicast-based service lookup
                    borrowed from the Jini infrastructure; all other
                    intra-service communication, including starting and stopping, is
                    done using REST APIs. All data and indexes are stored in the Hadoop
                    HBase NoSQL database. The Fetch Federation went live in the last
                    week of June 2011.
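
As an illustration of the idle-worker dispatching described above, here is a minimal in-memory sketch. It is not the project's code: the class name `FetchController`, its methods and the worker IDs are hypothetical, and the real controller discovers workers through Jini-style lookup and talks to them over REST, which this sketch leaves out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch: hands queued sites to workers that are currently idle. */
final class FetchController {
    // Worker id -> busy flag; insertion order is kept so dispatching is predictable.
    private final Map<String, Boolean> busy = new LinkedHashMap<>();
    private final Queue<String> pendingSites = new ArrayDeque<>();

    void registerWorker(String workerId) { busy.put(workerId, false); }

    void enqueueSite(String site) { pendingSites.add(site); }

    /** Called when a worker reports that it has finished its site. */
    void markIdle(String workerId) { busy.put(workerId, false); }

    /** Assigns pending sites to idle workers and returns "worker <- site" entries. */
    List<String> dispatch() {
        List<String> assignments = new ArrayList<>();
        for (Map.Entry<String, Boolean> worker : busy.entrySet()) {
            if (pendingSites.isEmpty()) {
                break; // nothing left to hand out
            }
            if (!worker.getValue()) {
                worker.setValue(true); // the worker is now busy with this site
                assignments.add(worker.getKey() + " <- " + pendingSites.poll());
            }
        }
        return assignments;
    }
}
```

Sites that cannot be assigned stay in the queue until a worker is marked idle again, which is one simple way to keep the process self-sustaining.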
            
            
                Ephorus Teacher-UI
                The Ephorus Teacher-UI is the inbox for a
                    teacher to monitor the plagiarism detection process for documents
                    submitted by students. It connects to a distributed document
                    processing pipeline. Because of the massive number of documents
                    uploaded and scanned for plagiarism, the system uses a technique
                    called 'sharding': it handles millions of uploads and downloads and
                    distributes document metadata over several database servers in a
                    deterministic and intuitive manner. It uses cloud storage for actual
                    document content. Besides being a web UI, it generates reports in PDF
                    and statistics in Excel format. This project was built on
                    Servlet API 2.5, MySQL, a modified version of EclipseLink JPA to
                    support sharding, plain SQL for performance-sensitive operations,
                    REST web services using Jersey, and plain JavaScript and jQuery. The project
                    has been handed over to the Ephorus product development group for
                    further integration with the Ephorus pipeline and to start the migration
                    and ramp-up process.
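
To illustrate what deterministic sharding can look like, the sketch below maps a document ID to one of a fixed number of database shards. The actual sharding key and scheme used by the Teacher-UI are not described here, so the class `ShardRouter`, the use of a CRC32 hash and the string document ID are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch: picks a database shard for a document in a repeatable way. */
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** Returns a shard index in [0, shardCount); the same ID always yields the same shard. */
    int shardFor(String documentId) {
        CRC32 crc = new CRC32();
        crc.update(documentId.getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is a non-negative long, so the modulo is non-negative too.
        return (int) (crc.getValue() % shardCount);
    }
}
```

Because the mapping depends only on the ID and the shard count, any server can compute where a document's metadata lives without a central lookup table. Changing the shard count would move most documents, so a real deployment needs a migration strategy.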
            
            
                Ephorus Search Component
                This component is a simple and small Atom feed
                    / OpenSearch-compliant search aggregator. The project supports
                    Ephorus' plagiarism detection process: it searches content using
                    many different search engines, both internal ones and ones
                    external to the organisation. The component is a simple Servlet API
                    2.5 web application using no database at all. In order to have
                    maximum control, the component controls all threading concerns
                    itself and does not rely on any type of container-supplied resource
                    apart from the URL of the configuration. Its initial configuration
                    isolates threading issues per search engine, with each search engine
                    configured for 1000 threads. Configuration can be managed
                    centrally and changes are picked up with a 10-second delay without
                    restarting the service. The search component went live on the 21st of
                    September 2011.
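
One way to isolate threading per search engine is to give every engine its own fixed-size thread pool, so a slow or failing engine can only exhaust its own threads. The sketch below shows that idea with the standard `java.util.concurrent` executors; the class `EnginePools` and the engine names are hypothetical, and the component's real implementation is not shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: one dedicated, bounded thread pool per search engine. */
final class EnginePools {
    private final int threadsPerEngine;
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();

    EnginePools(int threadsPerEngine) {
        if (threadsPerEngine <= 0) {
            throw new IllegalArgumentException("threadsPerEngine must be positive");
        }
        this.threadsPerEngine = threadsPerEngine;
    }

    /** Returns the pool for an engine, creating it on first use. */
    ExecutorService poolFor(String engineName) {
        return pools.computeIfAbsent(
                engineName, name -> Executors.newFixedThreadPool(threadsPerEngine));
    }

    /** Stops all pools, for example when the web application is undeployed. */
    void shutdown() {
        pools.values().forEach(ExecutorService::shutdownNow);
    }
}
```

A caller would submit each query with something like `pools.poolFor("internal-index").submit(task)`. A fixed pool creates its threads lazily, so a limit of 1000 threads per engine costs nothing until the load actually requires them.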