REST can be considered a standard; however, the general concepts leave a lot of room for interpretation. In this article I am going to share some experiences and opinions gathered over a decade of implementing REST APIs. A lot of this might seem like common sense to many people, but you would be surprised how often you run into anti-patterns and generally bad implementations.
The cleanest way I have seen for a client to indicate its preferred format (provided multiple formats are supported) is to pass it along in the headers, specifically the ‘Accept’ and ‘Content-Type’ elements. As a personal preference, passing this along in the URL seems unnecessary; I tend to keep URLs as clean as possible.
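For example, a client built on Java's standard HttpClient might request JSON along these lines (the URL here is just a placeholder):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class ContentNegotiationExample {
        public static void main(String[] args) throws Exception {
            // Ask the server for JSON via the Accept header rather than the URL
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://api.example.com/customers/100"))
                    .header("Accept", "application/json")
                    .GET()
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }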
Exception handling is an area where I have seen a certain degree of variance. On more than one occasion, I have observed implementations where an HTTP 200 status code is returned for a failed request, rather than an HTTP 500 or HTTP 400 as appropriate. Instead, a field in the response is used to indicate the actual status of the request, as in the following example.
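A sketch of the anti-pattern, assuming Spring Boot and a hypothetical customer lookup service:

    import java.util.HashMap;
    import java.util.Map;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    // Anti-pattern: the HTTP status is always 200; callers must inspect the
    // body to discover that the request actually failed.
    @RestController
    public class CustomerController {

        private final CustomerService customerService;   // hypothetical lookup service

        public CustomerController(CustomerService customerService) {
            this.customerService = customerService;
        }

        @GetMapping("/customer/{id}")
        public Map<String, Object> getCustomer(@PathVariable long id) {
            Map<String, Object> body = new HashMap<>();
            Customer customer = customerService.findById(id);
            if (customer == null) {
                body.put("status", "ERROR");
                body.put("message", "Customer " + id + " not found");
            } else {
                body.put("status", "OK");
                body.put("data", customer);
            }
            return body;   // serialized with HTTP 200 regardless of the outcome
        }
    }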
The above is bad, as consumers expect the HTTP status code to reflect the state of the request. The fetch API and various libraries such as Axios can natively handle non-200 response codes when they are properly returned.
My preferred approach is to return the appropriate status code along with additional information in the response payload. Note that in the example below, both failure cases have the same 404 response code; however, the message element in the response body allows the client to further understand the true nature of the response.
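A sketch of that approach, again assuming Spring Boot and hypothetical not-found exceptions: both handlers return a 404, while the message field distinguishes the two failure modes.

    import java.util.Map;
    import org.springframework.http.HttpStatus;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.ExceptionHandler;
    import org.springframework.web.bind.annotation.RestControllerAdvice;

    // Both handlers return 404, but the message tells the client exactly
    // what was missing: the customer itself, or one of its orders.
    @RestControllerAdvice
    public class NotFoundHandler {

        @ExceptionHandler(CustomerNotFoundException.class)   // hypothetical exception
        public ResponseEntity<Map<String, String>> customerNotFound(CustomerNotFoundException ex) {
            return ResponseEntity.status(HttpStatus.NOT_FOUND)
                    .body(Map.of("message", "Customer " + ex.getCustomerId() + " does not exist"));
        }

        @ExceptionHandler(OrderNotFoundException.class)      // hypothetical exception
        public ResponseEntity<Map<String, String>> orderNotFound(OrderNotFoundException ex) {
            return ResponseEntity.status(HttpStatus.NOT_FOUND)
                    .body(Map.of("message", "Order " + ex.getOrderId() + " does not exist for this customer"));
        }
    }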
When it comes to versioning, there are two general approaches, and each works fairly well. The first approach involves embedding versioning information in the URL itself, for example ‘/v1/customers’. I find that this tends to make the request URL a bit more verbose. The second approach has the client optionally specify a minimum required version in the request headers. Not only is the request URL no longer concerned with version-related aspects, but new clients can simply make the request and adopt the latest version of the API if no version is provided.
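A rough sketch of the two options side by side, assuming Spring and a made-up ‘X-API-Version’ header name:

    import java.util.List;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestHeader;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class CustomerVersioningController {

        // Option 1: the version is embedded in the URL itself.
        @GetMapping("/v1/customers")
        public List<String> customersV1() {
            return List.of("v1 representation");
        }

        // Option 2: the URL stays clean; clients may pass a version header,
        // and those that omit it simply get the latest behavior.
        @GetMapping("/customers")
        public List<String> customers(
                @RequestHeader(name = "X-API-Version", defaultValue = "2") int version) {
            return version < 2 ? List.of("legacy representation") : List.of("current representation");
        }
    }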
Both approaches ‘work’, and you should choose whichever best fits your particular needs.
The important conversation that needs to occur is making sure a versioning strategy is in place, for which semantic (major.minor.patch) versioning appears to be the no-brainer, and that expectations are set around the support lifecycles associated with each release. For external-facing APIs this can be tricky, as you may have a large number of clients that need to be coordinated with. Internal-facing APIs can be equally challenging: even though you may all be working for the same organization, different groups work against different project schedules and different priorities. Again, it is really important to drive these conversations early in the design process. So if you are asked to create a new greenfield API, drive that conversation if needed, as it may otherwise fall short of attention, and do so in the spirit of long-term sustainability.
Validation is often an overlooked aspect of API implementations. While it is true that if you are working with an RDBMS there will inherently be some enforced data constraints, those should be treated as a last resort.
Validation can be performed in many places: on the client, immediately when an API request is received, at some further point in the business logic, and, ultimately, at some level within the RDBMS itself. Where should one implement it? I would argue that it should occur in several of these places. Ideally on the client as well, however you may not have control over that.
In my opinion, performing validation on the request as soon as it is received makes the most sense. The argument here is that this aspect of validation is performed right up front and not overlooked. Field-level validation is a prerequisite to any additional logic that might be attempted, and it ensures no assumptions are made about the state of the data submitted by the client before any additional work is performed. In the Java world this is pretty easily accomplished via annotations, which I find clean, concise, and easy to understand. In the NodeJS world, JavaScript does not natively support annotations (referred to there as decorators), but TypeScript does, and it compiles down into compatible JavaScript. The following example illustrates the idea in Java. Basically, the annotations are evaluated at runtime, and the process of identifying invalid data is largely automated.
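A sketch along those lines, using Jakarta Bean Validation annotations with Spring (the field names and messages are made up):

    import jakarta.validation.Valid;
    import jakarta.validation.constraints.Email;
    import jakarta.validation.constraints.Min;
    import jakarta.validation.constraints.NotBlank;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RestController;

    // The constraints live on the request payload itself and are evaluated
    // by the framework before any business logic runs.
    record CreateCustomerRequest(
            @NotBlank(message = "name is required") String name,
            @Email(message = "email must be a valid address") String email,
            @Min(value = 18, message = "age must be at least 18") int age) { }

    @RestController
    class CustomerRegistrationController {

        // @Valid triggers the annotations above; an invalid payload is rejected
        // with a 400 before this method body ever executes.
        @PostMapping("/customers")
        public ResponseEntity<Void> create(@Valid @RequestBody CreateCustomerRequest request) {
            return ResponseEntity.accepted().build();
        }
    }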
It’s tedious, and perhaps redundant if you are also validating on the client end, but I find it quite necessary.
By default, REST-based APIs imply that there is a one-to-one relationship between an entity and an API endpoint, or at least between an endpoint and the parent of an object graph. This is not necessarily the case, however; there are cases when you should create additional API endpoints to expose data. A golden rule of thumb to remember is that the more APIs you expose, the more effort is required and the more code is introduced.
Let’s consider the following. Given an object graph with the concept of a customer, orders, and a potential shopping cart, a call such as ‘/customer/[customer id]’ which returns the entire object graph would be the simplest solution, but not an effective one. In my case, if this were directed against Amazon for my own personal customer record, it would return an enormous amount of data, as, let’s face it, I shop there all too often. A more effective approach would be to expose the data through more granular endpoints such as ‘/customer/[customer id]/orders’ and ‘/customer/[customer id]/cart’. This implies that a lazy loading strategy might be best, though that is not always the case; it is very use case specific.
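A sketch of what those more granular endpoints might look like in Spring (the service layer and domain types are hypothetical):

    import java.util.List;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class CustomerGraphController {

        private final CustomerService customerService;   // hypothetical service layer

        public CustomerGraphController(CustomerService customerService) {
            this.customerService = customerService;
        }

        // Core customer record only, no orders or cart attached
        @GetMapping("/customer/{customerId}")
        public Customer customer(@PathVariable long customerId) {
            return customerService.findCustomer(customerId);
        }

        // Orders fetched only when the client actually needs them
        @GetMapping("/customer/{customerId}/orders")
        public List<Order> orders(@PathVariable long customerId) {
            return customerService.findOrders(customerId);
        }

        // Likewise for the shopping cart
        @GetMapping("/customer/{customerId}/cart")
        public Cart cart(@PathVariable long customerId) {
            return customerService.findCart(customerId);
        }
    }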
Special attention should be given to data where use cases exist for list operations. Think of the ‘master detail’ scenario, where the initial screen displays a subset of a given entity and perhaps some associated data. This might be an opportunity to create a dedicated API, or perhaps to optimize the fetching strategy for the data driving that call. For example, a call to ‘/orders’ might return a subset of the order entities' attributes, along with some associated data; there may be no expectation that it simply returns all attributes for all orders.
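For instance, a dedicated list endpoint might return a purpose-built projection rather than the full entity; a sketch, with hypothetical fields and service:

    import java.math.BigDecimal;
    import java.time.LocalDate;
    import java.util.List;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RestController;

    // A trimmed-down projection for the master list: just enough to render
    // the initial screen, not the full order graph.
    record OrderSummary(long id, LocalDate placedOn, String status, BigDecimal total) { }

    @RestController
    class OrderListController {

        private final OrderService orderService;   // hypothetical service layer

        OrderListController(OrderService orderService) {
            this.orderService = orderService;
        }

        @GetMapping("/orders")
        public List<OrderSummary> orders() {
            // The service (or its query) is tuned to fetch only these columns
            return orderService.findOrderSummaries();
        }
    }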
This is another important talking point. When designing an endpoint, an initial consideration should be whether or not to implement support for pagination. It is easy to head down the wrong path here when working on a greenfield project where no data is available, or when not working with a realistically sized data set. The point is that this question should be on your checklist when implementing the endpoint. It is far better to bake this in now than to address it once a production-level performance issue is encountered.
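A sketch of baking pagination in from the start, extending the ‘/orders’ sketch above with simple ‘page’ and ‘size’ query parameters (the parameter names and the cap are just conventions I tend to reach for):

    import java.util.List;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    class PagedOrderController {

        private final OrderService orderService;   // hypothetical service layer

        PagedOrderController(OrderService orderService) {
            this.orderService = orderService;
        }

        // Sensible defaults mean existing clients keep working, while large
        // result sets can no longer be requested in a single call.
        @GetMapping("/orders")
        public List<OrderSummary> orders(
                @RequestParam(defaultValue = "0") int page,
                @RequestParam(defaultValue = "25") int size) {
            return orderService.findOrders(page, Math.min(size, 100));   // cap the page size
        }
    }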
So, let’s take a look at a use case where a user changes the status on a Work Order entity. The implementation might be as simple as an attribute such as ‘status’ on that entity. While it would be possible to support this operation with an HTTP PUT, or better yet a PATCH operation, it might make more sense to abstract the details and provide an endpoint such as ‘/workOrder/100/complete’, which indicates the action while separating the concern of which actual attributes will be affected as a result.
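A sketch of that action-style endpoint, again assuming Spring, with a hypothetical service call behind it:

    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    class WorkOrderController {

        private final WorkOrderService workOrderService;   // hypothetical service layer

        WorkOrderController(WorkOrderService workOrderService) {
            this.workOrderService = workOrderService;
        }

        // The client expresses intent ("complete this work order") without
        // knowing which attribute, or attributes, change behind the scenes.
        @PostMapping("/workOrder/{id}/complete")
        public ResponseEntity<Void> complete(@PathVariable long id) {
            workOrderService.complete(id);
            return ResponseEntity.noContent().build();
        }
    }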
Caching is a popular option for improving performance. Personally, I have found that in a disproportionate number of cases, performance issues are related to poor interaction with the database, whether a missing index or an inefficient query, so I would start my troubleshooting there. When and where to cache is a question with no single exact answer or strategy to follow; there are many different places to implement caching.
An often overlooked approach is to perform this on the client itself. This is relatively easy to do if the client is under your control, especially if it is an HTML client with a concept of client-side state management.
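Even when the client is not under your control, the server can at least invite client-side caching through standard HTTP cache headers; a sketch using Spring's CacheControl helper, with a hypothetical reference-data service:

    import java.util.concurrent.TimeUnit;
    import org.springframework.http.CacheControl;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    class CountryController {

        private final CountryService countryService;   // hypothetical reference-data service

        CountryController(CountryService countryService) {
            this.countryService = countryService;
        }

        // Reference data that rarely changes: tell browsers and HTTP caches
        // they may reuse the response for an hour before asking again.
        @GetMapping("/countries/{code}")
        public ResponseEntity<Country> country(@PathVariable String code) {
            return ResponseEntity.ok()
                    .cacheControl(CacheControl.maxAge(1, TimeUnit.HOURS))
                    .body(countryService.findByCode(code));
        }
    }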
Any attempt at caching introduces additional complexity, including a requirement to invalidate stale data, so it should be approached with a great deal of thought, with an emphasis on not tuning or caching prematurely.
As for identifying caching candidates, prime candidates are highly accessed objects that don’t frequently change. The next focus should be objects that are more costly to retrieve, again weighted by frequency of access.
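A sketch of caching such a candidate with Spring's caching abstraction (assuming caching is enabled via @EnableCaching; the cache name and repository are made up), including the eviction needed to keep it from going stale:

    import org.springframework.cache.annotation.CacheEvict;
    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.stereotype.Service;

    // A highly read, rarely updated lookup: a classic caching candidate.
    @Service
    class ProductCatalogService {

        private final ProductRepository productRepository;   // hypothetical repository

        ProductCatalogService(ProductRepository productRepository) {
            this.productRepository = productRepository;
        }

        // The expensive lookup is cached by product id after the first call.
        @Cacheable(value = "products", key = "#productId")
        public Product findProduct(long productId) {
            return productRepository.findById(productId);
        }

        // Updates must invalidate the stale entry, which is exactly the added
        // complexity that caching brings along.
        @CacheEvict(value = "products", key = "#product.id")
        public Product updateProduct(Product product) {
            return productRepository.save(product);
        }
    }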
Wiring metrics into your implementation provides value, both from a performance and a usage perspective. I would strongly suggest requiring a unique client identifier when possible; this will allow you to gather metrics and identify usage patterns. Let’s say, for example, you expose an endpoint that allows a client to locate an entity by a unique identifier, a common use case. If a client has a requirement to make 10,000 requests to access data in bulk, that might be a sign that you need an additional endpoint to support a bulk lookup operation.
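A sketch of tagging a usage metric with the caller's identity, assuming Micrometer and a made-up ‘X-Client-Id’ header (the domain types are hypothetical):

    import io.micrometer.core.instrument.MeterRegistry;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestHeader;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    class ItemLookupController {

        private final MeterRegistry meterRegistry;
        private final ItemService itemService;   // hypothetical service layer

        ItemLookupController(MeterRegistry meterRegistry, ItemService itemService) {
            this.meterRegistry = meterRegistry;
            this.itemService = itemService;
        }

        // Counting lookups per client makes patterns like "10,000 calls in a
        // row from one consumer" visible, hinting that a bulk endpoint is needed.
        @GetMapping("/items/{id}")
        public Item find(@PathVariable long id,
                         @RequestHeader("X-Client-Id") String clientId) {
            meterRegistry.counter("item.lookup", "client", clientId).increment();
            return itemService.findById(id);
        }
    }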
Additionally, this gives you visibility into performance issues as they occur in production environments. This is especially important when there is potential for data to grow, during which performance appears ‘fine’ until a certain threshold is hit.