EGF2 Edges Vs Search

Relations between objects can be retrieved in at least two ways in EGF2: by getting edge objects through the graph API, or by using the search endpoint.

Edge-based retrieval is fairly limited:

  • Sorting is fixed. Objects are ordered by the created_at timestamp of the edge, descending, and there is no way to specify any other sort option.
  • There is no way to filter the objects that are returned.
  • There is no way to search for an object on an edge.

While in some cases getting edge objects this way is good enough, sometimes it is not really efficient. Consider a use case where we need to display a list of products for an e-commerce website. Such listings usually let customers sort the list by different fields, filter it, and run a free-form text search. Graph API edge retrieval does not fit this use case at all.

The search endpoint, on the other hand, provides the desired flexibility in abundance. An index can be added in ElasticSearch (ES) for the Product object that supports any sorting and/or filtering scenario, along with free-form text search.
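To make the contrast concrete, here is a minimal sketch of how a client might build the two kinds of requests. The base URL, the paths and every query parameter name (count, object, q, filters, sort) are assumptions made for illustration, not a statement of the actual EGF2 API. The point is only that an edge request has nothing to tune beyond paging, while a search request can carry text, filters and a sort field.

```python
from urllib.parse import urlencode

# Hypothetical base URL, paths and parameter names, for illustration only.
BASE_URL = "https://api.example.com/v1"


def edge_url(object_id, edge_name, count=25):
    """Edge retrieval: the caller controls paging, nothing else."""
    return f"{BASE_URL}/graph/{object_id}/{edge_name}?{urlencode({'count': count})}"


def search_url(object_type, text=None, filters=None, sort=None, count=25):
    """Search retrieval: free-form text, filters and sort field are optional."""
    params = {"object": object_type, "count": count}
    if text:
        params["q"] = text
    if filters:
        # Encode filters as "field:value" pairs in a stable order.
        params["filters"] = ",".join(f"{k}:{v}" for k, v in sorted(filters.items()))
    if sort:
        params["sort"] = sort
    return f"{BASE_URL}/search?{urlencode(params)}"
```

For a product listing, a call such as search_url("product", text="red shoes", filters={"brand": "acme"}, sort="price") expresses in one request what edge retrieval cannot express at all.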

There are a couple of finer points about using ES and the search endpoint to work with object relations that should be mentioned. ElasticSearch is a search engine and, consequently, is very efficient at reads. Things are a bit different when you modify ES data. Modifying requests return quickly with a 200 result, but that doesn’t mean the changes have already propagated to all nodes of the cluster. There can be a significant delay (on the scale of seconds) before the changes are applied.

For example, let’s see what happens when a web app displays a list of objects using the search endpoint and tries to remove an item from the list. The delete request completes quickly, and the data in the DB layer is updated within the request (this may not hold depending on the consistency levels selected for Cassandra; I will address this question in a later post). The ES index is updated asynchronously by the sync service, which listens to the system event queue. The sync service is usually able to send the update to ES very quickly, but applying it across the ES cluster may (and does) take quite some time.

Suppose the web app wants to reload the list of objects immediately after the deletion. In many cases the GET request will return a list that still contains the deleted object.

A similar problem arises when a new object is added to a list and the list is reloaded in order to place the new object according to the list's sort order. In many cases, if the request is sent immediately after the addition, the search endpoint will return a list without the added object.
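Both situations can be reproduced with a toy model. The sketch below is not EGF2 code. It assumes a backend where writes hit the DB at once and only queue an event, the search index learns about the change when a sync step runs, and search results are read from the DB by the ids the index knows. The class and method names are hypothetical.

```python
class ToyBackend:
    """Toy model: the DB is updated at once, the search index lags behind."""

    def __init__(self):
        self.db = {}              # object id -> object, the source of truth
        self.indexed_ids = set()  # ids the search index currently knows about
        self.events = []          # events waiting for the sync step

    def create(self, obj_id, **fields):
        self.db[obj_id] = {"id": obj_id, **fields}
        self.events.append(("created", obj_id))

    def delete(self, obj_id, deleted_at):
        # The DB record is marked right away; the index is untouched for now.
        self.db[obj_id]["deleted_at"] = deleted_at
        self.events.append(("deleted", obj_id))

    def search(self):
        # Ids come from the (possibly stale) index, objects from the DB.
        return [self.db[i] for i in sorted(self.indexed_ids)]

    def sync(self):
        # Stand-in for the asynchronous index update.
        for kind, obj_id in self.events:
            if kind == "created":
                self.indexed_ids.add(obj_id)
            else:
                self.indexed_ids.discard(obj_id)
        self.events.clear()


backend = ToyBackend()
backend.create("p1", name="Kettle")
backend.create("p2", name="Toaster")
backend.sync()

backend.delete("p1", deleted_at="2024-01-01T00:00:00Z")
stale = backend.search()   # reload before the index catches up: p1 is still listed
backend.sync()
fresh = backend.search()   # reload after the index catches up: p1 is gone
```

In this model the stale result still lists p1, but the object already carries deleted_at because it was read from the DB. A newly created object behaves the other way around: it is missing from search results until the sync step has run.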

We recommend dealing with these situations as follows:

  • Deletion is easy. The object can either be shown in the list with a deleted status for the time it takes ES to update the cluster, or be removed locally without reloading the list. From the UX perspective, this means the deleted object disappears the next time the list is updated. Even while the object is still listed, it will have the deleted_at field present (the DB layer is updated immediately), which lets us distinguish deleted objects.
  • Addition is a bit trickier. If the list is sorted by creation time, the new object can be added to the top of the list locally, with no need to reload the list. If the sorting is different, adding objects becomes interesting. The new object may not belong to the currently displayed portion of the whole list. Generally speaking, the client app may not be able to position the new object correctly at all (sorting objects locally is not a good idea when the client has access to only partial data). One way to deal with this is to display the added object in a separate area and let the user refresh the list manually at will. Admittedly, the addition issue is not caused by the search endpoint alone, but ES update delays make it somewhat worse.
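The two recommendations above can be sketched as a pair of client-side helpers. This is a minimal illustration that assumes objects are plain dictionaries. Apart from the deleted_at field mentioned above, the function names and the returned shape are hypothetical.

```python
def visible_items(search_results):
    """Hide objects that the DB layer already marks as deleted."""
    return [obj for obj in search_results if not obj.get("deleted_at")]


def place_new_item(items, new_item, sorted_by_created_desc):
    """Return (list_items, pending_items) after a local addition.

    With newest-first sorting the new object goes on top of the list.
    With any other sorting the client cannot know the right position
    from partial data, so the object goes to a separate pending area
    until the user refreshes the list.
    """
    if sorted_by_created_desc:
        return [new_item] + items, []
    return items, [new_item]
```

An app that prefers to keep deleted objects visible with a deleted status can skip visible_items and check deleted_at while rendering instead.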

Note that these addition and deletion delays do not affect edge retrieval using the graph API.

See Next - File Operations
