Hightail’s move into the world of creative collaboration with Spaces made having a high-performance search feature a major priority. Unlike our traditional file sharing service, Spaces is not focused on simple A to B transfers (though it can be used for this purpose). Instead, it’s all about the less linear back-and-forth of creative collaboration.
When you collaborate using Spaces, you no longer just send a file and move on to the next one. You usually need to access your files again and again to view and reply to comments and to add new versions. You may also have more than one Space (which is like a visual folder for hosting files) on the go and be working on multiple projects at any one time.
As we began developing and using Spaces internally and our ALL SPACES views filled up fast, we knew it was time to add search functionality. One of the principles behind Spaces is that it is self-organizing – we’re trying to reduce the amount of administration that creative professionals need to do – so our search system needed to be smart, flexible and utilize existing information without requiring users to tag or otherwise categorize their files.
We decided to implement Elasticsearch, a technology built on Lucene that provides a distributed, scalable full-text search service. It lets users search using terms found in an individual Space name or description, a filename or even comments. Elasticsearch’s resilient clusters and data sharding make it manageable to operate and fast to query.
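As an illustration of searching several fields at once, here is a minimal sketch of a request body using Elasticsearch's `multi_match` query. The field names (`space_name`, `space_description`, `file_name`, `comment_text`) and the helper `build_search_body` are hypothetical and chosen only for this example; they are not taken from the actual Spaces index.

```python
import json


def build_search_body(terms: str) -> dict:
    """Build a search request body that matches `terms` against several fields.

    The field names below are assumptions made for illustration only.
    """
    return {
        "query": {
            "multi_match": {
                "query": terms,
                "fields": [
                    "space_name",
                    "space_description",
                    "file_name",
                    "comment_text",
                ],
            }
        }
    }


# Print the JSON that would be sent to a search endpoint.
print(json.dumps(build_search_body("storyboard"), indent=2))
```

A single query like this is what allows one search box to cover Space names, descriptions, filenames and comments without the user choosing where to look.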
To connect Elasticsearch to our service, we used a two-layer approach. The first layer, which we’ll call the batch layer, periodically scans the entire database looking for Spaces, Files, Comments, and Contacts in chronological order by their last updated timestamp. The results are collected and sent over to Elasticsearch for indexing. The timestamp of the most recently collected data is remembered for the next iteration. This gives us the ability to index all our historical data as well as the means to recreate all our searchable data in the event of a code change or system failure.
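The batch pass described above can be sketched roughly as follows. This is a simplified illustration, not the production implementation: `Record`, `run_batch_pass` and the `index_batch` callback are hypothetical names, and an in-memory list stands in for the database.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Record:
    kind: str          # e.g. "space", "file", "comment" or "contact"
    id: str
    updated_at: float  # last-updated timestamp (seconds since the epoch)
    text: str


def run_batch_pass(
    records: Iterable[Record],
    watermark: float,
    index_batch: Callable[[List[Record]], None],
) -> float:
    """Index everything updated after `watermark` and return the new watermark."""
    # Collect changed records in chronological order by last-updated timestamp.
    changed = sorted(
        (r for r in records if r.updated_at > watermark),
        key=lambda r: r.updated_at,
    )
    if not changed:
        return watermark  # nothing new; keep the old watermark
    index_batch(changed)  # hypothetical stand-in for sending data to the search index
    # Remember the most recent timestamp for the next iteration.
    return changed[-1].updated_at
```

Starting from a watermark of `0.0` re-indexes everything, which is how a pass like this can rebuild the searchable data after a code change or failure.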
The second layer, which we call the speed layer, leverages what has become one of the most innovative and useful features of our API server: the audit event bus. By listening for specific types of events, such as Space creation, File uploads or Comment additions, we can capture the data in real time and send it over to Elasticsearch for indexing. This approach has the advantage of making your data immediately searchable.
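The listening pattern can be sketched with a tiny in-process publish/subscribe bus. Everything here is an assumption made for illustration: the `AuditEventBus` class, the event names such as `file.uploaded`, and the dictionary that stands in for the search index do not reflect the real API server.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class AuditEventBus:
    """Minimal publish/subscribe bus (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Deliver the event only to handlers registered for this type.
        for handler in self._handlers.get(event_type, []):
            handler(payload)


search_index: Dict[str, dict] = {}  # stand-in for the real search index


def index_document(payload: dict) -> None:
    """Index (or re-index) one document as soon as its event arrives."""
    search_index[payload["id"]] = payload


bus = AuditEventBus()
for event_type in ("space.created", "file.uploaded", "comment.added"):
    bus.subscribe(event_type, index_document)

bus.publish("file.uploaded", {"id": "f1", "name": "storyboard.pdf"})
bus.publish("user.logged_in", {"id": "u1"})  # no subscriber, so it is ignored
```

Only the subscribed event types reach the indexer, so unrelated activity on the bus costs the search layer nothing.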
A great feature of our new search service is auto-complete, which suggests search terms to the user based on their unique context. To accomplish this, we leveraged two Elasticsearch features: edge n-gram tokenizers and completion suggesters. Which of the two fits best depends on the particular use case and context.
An edge n-gram tokenizer breaks searchable phrases such as “quick brown” into the terms “q”, “qu”, “qui”, “quic”, “quick”, “b”, “br”, “bro”, “brow” and “brown”. Because every term is anchored at the start of a word, whatever prefix the user has typed so far matches an indexed term directly, which enables rapid search-as-you-type operations.
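The expansion is easy to reproduce in a few lines of plain Python. This is only an illustration of the idea, not Elasticsearch's implementation; the function name `edge_ngrams` is hypothetical, and its `min_gram` and `max_gram` parameters simply mirror the names of the tokenizer's settings.

```python
from typing import List


def edge_ngrams(text: str, min_gram: int = 1, max_gram: int = 10) -> List[str]:
    """Return the prefixes of every whitespace-separated word in `text`."""
    grams: List[str] = []
    for word in text.lower().split():
        longest = min(len(word), max_gram)
        # Each gram starts at the first character of the word (the "edge").
        for size in range(min_gram, longest + 1):
            grams.append(word[:size])
    return grams


print(edge_ngrams("quick brown"))
# ['q', 'qu', 'qui', 'quic', 'quick', 'b', 'br', 'bro', 'brow', 'brown']
```

The trade-off is index size: every word is stored once per prefix, in exchange for prefix matching at query time being a plain term lookup.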
Completion suggesters take a different approach. They are fed a list of all possible completions and build them into a finite state transducer, an optimized data structure resembling a big graph. To find auto-completion suggestions, Elasticsearch starts at the beginning of the graph and follows it character by character along the matching path. Once it reaches the end of the input term, it returns all possible endings of the current path as a list of suggestions. This data structure lives in memory, which makes prefix lookups very fast, faster in fact than edge n-gram lookups.
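The character-by-character walk can be illustrated with a plain prefix tree (trie). Note the substitution: a trie is a simpler relative of a finite state transducer and is used here only to show the lookup pattern; it is not the structure Elasticsearch builds. The `PrefixTrie` class and its methods are hypothetical names.

```python
from typing import Dict, List

_END = ""  # sentinel key marking a complete term; edge labels are single characters


class PrefixTrie:
    """In-memory prefix tree used as a simplified stand-in for an FST."""

    def __init__(self) -> None:
        self._root: Dict = {}

    def insert(self, term: str) -> None:
        node = self._root
        for ch in term:
            node = node.setdefault(ch, {})
        node[_END] = term  # store the full completion at the end of its path

    def suggest(self, prefix: str, limit: int = 5) -> List[str]:
        # Follow the matching path one character at a time.
        node = self._root
        for ch in prefix:
            if ch not in node:
                return []  # no completion starts with this prefix
            node = node[ch]
        # Collect every possible ending below the point we reached.
        found: List[str] = []
        self._collect(node, found)
        return sorted(found)[:limit]

    def _collect(self, node: Dict, found: List[str]) -> None:
        for key, child in node.items():
            if key == _END:
                found.append(child)
            else:
                self._collect(child, found)


trie = PrefixTrie()
for term in ("brown", "brownie", "brow", "quick"):
    trie.insert(term)

print(trie.suggest("bro"))  # ['brow', 'brown', 'brownie']
```

The cost of finding the starting point depends only on the length of the typed prefix, not on how many completions are stored, which is why an in-memory structure of this kind answers prefix queries so quickly.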
With the recent launch of our new Search function, Hightail is continuing to make creative collaboration more effective. With the help of Elasticsearch (along with fantastic front-end work by Ben Ortel of Comedia Design), we’ve built a smart way for our users to find what they’re looking for quickly and with minimal effort.
Try the new Search bar and all our other great creative collaboration features at www.hightail.com.