Data Management

Basic rules for building and managing distributed data services

Distributed systems require developers to move away from the traditional three-tier approach. Take a look at what the considerations are for building an application that's composed of multiple services and how to manage data efficiently in this scenario.


By Tim Landgrave

The real benefit of distributed systems is the ability to architect applications that reuse both the data and the application logic from other systems. To design systems for reuse, you need to move away from the classic three-tier notion of a presentation, business, and data tier as the basis for an application and, instead, begin considering applications as the integration of multiple services, each of which may have its own presentation (or interface), business, and data logic. You should think of a tier as a deployment unit of a service and as a functional module that can process a discrete set of information and return results to other services based on the contract defined in its public interface (presentation).

Designing services
Where you used to have one service per tier in a classic three-tier system, you now design systems that consume multiple services (e.g., a customer management service, a credit card service, an inventory management service, and an order processing service). Each of these services has its own public interface, business logic, and data access logic. The public interface may be a GUI, HTML, COM, or Web service interface. When used in an application, these services may return scalar values, objects, or datasets that can be consumed by other services or by the user interface of an application.

The core business logic in a service is composed of a set of .NET classes that implement the functions declared in the public interface and manage the business rules that govern access and manage processes. The data access logic aggregates the data required by the business logic from multiple underlying data sources including SQL databases, message queues, Web services, legacy applications, or COM-based systems.

Services should talk to each other only via contracts defined by the public interfaces they expose. Once instantiated, service instances shouldn't have direct access to each other’s state, but, instead, should have their state managed by other services or by the application that’s utilizing them. But managing service and/or application state efficiently in a distributed system also requires that the architect take into account the location of the data required by each service.
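The contract rule can be sketched in a few lines. The example below uses Python purely for illustration (the article itself assumes .NET classes), and every name in it (CreditService, CreditCheck, InMemoryCreditService, can_place_order) is hypothetical. It is a minimal sketch, not a prescribed design: the caller depends only on the declared interface, and the implementation keeps its state private.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CreditCheck:
    """Result returned across the service boundary (hypothetical shape)."""
    customer_id: str
    approved: bool


class CreditService(Protocol):
    """Public contract: the only thing callers are allowed to depend on."""

    def check_credit(self, customer_id: str, amount: float) -> CreditCheck: ...


class InMemoryCreditService:
    """One possible implementation; its internal state stays private."""

    def __init__(self, limits: dict[str, float]) -> None:
        self._limits = dict(limits)  # private copy, never handed to callers

    def check_credit(self, customer_id: str, amount: float) -> CreditCheck:
        # Unknown customers get a limit of 0.0, so any positive amount is refused.
        limit = self._limits.get(customer_id, 0.0)
        return CreditCheck(customer_id, amount <= limit)


def can_place_order(credit: CreditService, customer_id: str, total: float) -> bool:
    """An order-side caller that sees only the contract, not the implementation."""
    return credit.check_credit(customer_id, total).approved
```

Because can_place_order accepts anything that satisfies the contract, the in-memory implementation could be swapped for a remote proxy without changing the caller.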

When dealing with multiple services, many of which may be located in other application domains or even on other machines connected by slower links, better UI responsiveness or reduced local processing times will require some kind of local data caching. But unless you cache intelligently, you’ll spend more time compensating for transactions that fail because of stale data in the cache than you would have spent querying a remote data service on every call.
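One common way to limit stale reads is to give each cached entry a time-to-live (TTL) and refetch once it expires. The article does not prescribe a specific caching strategy, so treat the following as one illustrative option. The TTLCache class, its fetch callback, and the injectable clock are all assumptions made for this sketch, written in Python for brevity.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Local cache that refetches an entry once it is older than ttl_seconds."""

    def __init__(self, fetch: Callable[[str], object], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # call to the (possibly remote) data service
        self._ttl = ttl_seconds  # maximum age before an entry counts as stale
        self._clock = clock      # injectable so the behavior can be tested
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no remote call
        value = self._fetch(key)  # missing or stale: go back to the source
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry early, e.g. after a transaction fails on stale data."""
        self._entries.pop(key, None)
```

A short TTL trades more remote calls for fresher data; a long TTL does the opposite. Which side to favor depends on how often the underlying data changes.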

The basics of data access in services
There are some basic rules that you should follow when consuming data from data services. They relate to both the data’s location and how frequently it is updated. In addition to considering the location of the data, you must also consider whether the data is primarily read-only (RO) or read-write (RW).

Data such as product categories, sales territories, and even customers and products are mostly RO (unless the system you’re writing happens to be a customer or product management system). On the other hand, a system’s transactional data (e.g., invoices or orders) is mostly RW. To minimize data concurrency issues and provide the best UI response, you should cache and process RO data locally and send RW data to be processed by a remote data service that’s closest to its underlying data source.

In a distributed ordering system, your user interface may interact with several services. These may include:
  • A customer service that provides customer names, addresses, and credit information.
  • An inventory service that provides item IDs, product descriptions, and current inventory status.
  • An order service that takes customer and product information and creates an order.

To use these distributed services efficiently, you may keep a local cache of customer and product information while you’re walking through the order creation process locally. This is primarily RO data that’s unlikely to change during the creation of a single order. But when it’s time to submit the order, you should push the data to the remote order service. Because it’s closer to the actual data stores it uses, a remote order service can complete a single unit of work faster using whatever transaction and locking mechanisms are required to ensure the consistency of the underlying data.
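The flow described above might look roughly like the following sketch. It is written in Python for illustration only, and all names (OrderDraft, RemoteOrderService, build_order) are hypothetical. The in-process RemoteOrderService merely stands in for a service that would really run next to its data store, and its all-or-nothing stock check is a simplified substitute for real transaction and locking mechanisms.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class OrderDraft:
    """Order assembled locally before it is submitted."""
    customer_id: str
    lines: List[Tuple[str, int]] = field(default_factory=list)  # (item_id, quantity)


class RemoteOrderService:
    """Stand-in for the remote order service that owns the RW data."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self._stock = dict(stock)  # authoritative inventory levels
        self.orders: List[OrderDraft] = []

    def submit(self, draft: OrderDraft) -> bool:
        # Total the requested quantity per item, check it against the
        # authoritative stock, and only then apply every line, so the
        # order is handled as one all-or-nothing unit of work.
        needed: Dict[str, int] = {}
        for item_id, qty in draft.lines:
            needed[item_id] = needed.get(item_id, 0) + qty
        if any(self._stock.get(item, 0) < qty for item, qty in needed.items()):
            return False
        for item, qty in needed.items():
            self._stock[item] -= qty
        self.orders.append(draft)
        return True


def build_order(customer_id: str, wanted: List[Tuple[str, int]],
                product_cache: Dict[str, str]) -> OrderDraft:
    """Assemble a draft using only locally cached RO product data."""
    draft = OrderDraft(customer_id)
    for item_id, qty in wanted:
        if item_id not in product_cache:
            raise KeyError(f"unknown item: {item_id}")
        draft.lines.append((item_id, qty))
    return draft
```

Note that the local cache is used only to build the draft. The stock check happens inside the service that owns the data, so a stale cache can cause a rejected order but not an inconsistent one.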

 
