2nd International Workshop on
Data Management in Peer-to-Peer Systems
Sunday, March 22, 2009, Saint Petersburg, Russia
Workshop Program
9:15 - 9:30    Registration & Welcome
9:30 - 10:30    Session 1
Database Replication in Large Scale Systems: Optimizing the Number of Replicas
Modou Gueye, Idrissa Sarr, Samba Ndiaye
In distributed systems, replication is used to ensure availability and increase performance. However, the heavy workload of distributed systems such as Web 2.0 applications or Global Distribution Systems limits the benefit of replication if its degree (i.e., the number of replicas) is not controlled. Since every replica must eventually perform all updates, there is a point beyond which adding more replicas does not increase throughput, because every replica is saturated by applying updates. Moreover, if the replication degree exceeds the optimal threshold, the useless replicas generate overhead due to extra communication messages. In this paper, we propose a replication management solution that avoids useless replicas. To this end, we define two mathematical models that approximate the number of replicas needed to achieve a given level of performance. We demonstrate the feasibility of our replication management model through simulation; the results show the effectiveness and accuracy of our models.
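The saturation argument in this abstract can be illustrated with a toy capacity model (this is an assumption of ours, not the paper's actual models): each replica spends part of its unit capacity applying every write, and only the remainder serves reads, so there is a smallest replica count that sustains a given read load.

```python
import math

def optimal_replicas(read_rate, write_rate, read_cost, write_cost):
    """Estimate the smallest replica count that sustains the workload.

    Assumes full replication: every replica eventually applies all writes,
    spending write_rate * write_cost of its unit capacity on updates; the
    remainder serves reads. Hypothetical simplified model for illustration.
    """
    spare = 1.0 - write_rate * write_cost   # capacity left for reads per replica
    if spare <= 0:
        raise ValueError("replicas are saturated by updates alone")
    return math.ceil(read_rate * read_cost / spare)
```

Beyond this count, extra replicas add coordination messages without adding usable read capacity, which is the overhead the abstract warns about.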
Towards Access Control Aware P2P Data Management Systems
Rammohan Narendula, Zoltan Miklos
P2P data management systems provide a scalable alternative to centralized architectures. Their adoption, however, is limited by the lack of mechanisms to control access to the resources stored in the system. We address this problem for structured P2P networks, in particular when the system is used in a collaborative working environment. We analyze the problem under a simple threat model and systematically explore the solution space. We design and compare access control enforcement techniques that realize the desired functionality by constructing independent networks, or by enforcing access control at query time or at response time.
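One of the enforcement points the abstract mentions, response-time checking, can be sketched as follows: the peer holding a resource consults an access list before answering, and denies without revealing whether the key exists. All names (`AccessList`, `answer_query`) are illustrative, not the paper's API.

```python
class AccessList:
    """Per-resource set of authorised peer ids (illustrative sketch)."""
    def __init__(self):
        self._acl = {}                      # resource key -> set of peer ids

    def grant(self, key, peer):
        self._acl.setdefault(key, set()).add(peer)

    def allows(self, key, peer):
        return peer in self._acl.get(key, set())

def answer_query(store, acl, key, requester):
    """Return the stored value only if the requester is authorised."""
    if key in store and acl.allows(key, requester):
        return store[key]
    return None   # deny, revealing nothing about the key's existence
```

Query-time enforcement would instead perform the same check before the query is routed, trading earlier rejection for trusting the forwarding peers.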
10:30 - 10:45    Coffee break
10:45 - 11:45    Session 2
BlobSeer: How to Enable Efficient Versioning for Large Object Storage under Heavy Access Concurrency - Bogdan Nicolae, Gabriel Antoniu, Luc Bougé
To accommodate the needs of large-scale distributed P2P systems, scalable data management strategies are required, allowing applications to efficiently cope with continuously growing, highly distributed data. This paper addresses the problem of efficiently storing and accessing very large binary data objects (BLOBs). It proposes an efficient versioning scheme allowing a large number of clients to concurrently read, write and append data to huge BLOBs that are fragmented and distributed at a very large scale. Scalability under heavy concurrency is achieved thanks to an original metadata scheme, based on a distributed segment tree built on top of a Distributed Hash Table (DHT). Our approach has been implemented and evaluated with our BlobSeer prototype on the Grid'5000 testbed, using up to 175 nodes.
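The distributed segment tree behind this metadata scheme can be sketched in miniature: tree nodes are immutable and keyed by `(version, offset, length)` in a key-value store standing in for the DHT, and each new version rewrites only the path from the root to the modified leaf, sharing every other subtree with its predecessor. This is an illustrative reconstruction under our own assumptions, not BlobSeer's actual code; `PAGE`, `BlobMeta` and all names are ours.

```python
PAGE = 4   # bytes per leaf; tiny on purpose for the example

class BlobMeta:
    def __init__(self, size):
        self.dht = {}            # (version, offset, length) -> node record
        self.size = size
        self.version = 0
        self._init(0, size)

    def _init(self, off, length):
        # materialise version 0 of the tree
        if length == PAGE:
            self.dht[(0, off, length)] = ("leaf", 0)
        else:
            half = length // 2
            self._init(off, half)
            self._init(off + half, half)
            self.dht[(0, off, length)] = (
                "node", ((0, off, half), (0, off + half, half)))

    def write_page(self, off):
        """Publish a new version in which the page at `off` is rewritten."""
        self.version += 1
        self._cow(self.version, (self.version - 1, 0, self.size),
                  0, self.size, off)

    def _cow(self, v, prev_key, off, length, target):
        # copy-on-write: rebuild only the path covering `target`
        key = (v, off, length)
        if length == PAGE:
            self.dht[key] = ("leaf", v)
            return key
        _, (lkey, rkey) = self.dht[prev_key]
        half = length // 2
        if target < off + half:
            lkey = self._cow(v, lkey, off, half, target)
        else:
            rkey = self._cow(v, rkey, off + half, half, target)
        self.dht[key] = ("node", (lkey, rkey))
        return key

    def read_page(self, version, off):
        """Find which version last wrote the page at `off`, as of `version`."""
        key = (version, 0, self.size)
        while True:
            kind, payload = self.dht[key]
            if kind == "leaf":
                return payload
            lkey, rkey = payload
            _, coff, clen = lkey   # descend into the child covering `off`
            key = lkey if off < coff + clen else rkey
```

Because old nodes are never mutated, concurrent readers of older versions proceed without locks while writers publish new roots, which is what makes the scheme friendly to heavy concurrency.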
Optimizing Peer-to-Peer Backup using Lifetime Estimations
Samuel Bernard, Fabrice Le Fessant
In this paper, we study the viability of a peer-to-peer backup system over today's Internet connections. In particular, we show that peer lifetime estimation can be used to reduce the maintenance cost of peer-to-peer backup. Previous studies have shown that lifetimes in a peer-to-peer system follow a Pareto distribution. Consequently, peers can be ranked by expected lifetime based solely on the length of their history in the system. By carefully selecting the peers on which backup data is stored, the repair cost can be greatly reduced for long-term backup users, while remaining acceptable for new users. The efficiency of this technique is evaluated through simulations of a state-of-the-art peer-to-peer backup system.
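The reason history length alone suffices is a property of the Pareto distribution: with shape parameter alpha > 1, a peer already alive for `age` has expected remaining lifetime `age / (alpha - 1)`, so older peers are expected to stay longer. A minimal selection sketch (our own illustration, with an assumed alpha, not the paper's algorithm):

```python
def select_backup_peers(peer_ages, k, alpha=1.5):
    """Pick the k peers expected to stay longest.

    Under a Pareto(alpha) lifetime distribution with alpha > 1, expected
    remaining lifetime is age / (alpha - 1), so ranking by observed age
    suffices. Illustrative sketch; alpha=1.5 is an assumed value.
    """
    if alpha <= 1:
        raise ValueError("expected remaining lifetime is infinite for alpha <= 1")
    ranked = sorted(peer_ages.items(),
                    key=lambda p: p[1] / (alpha - 1), reverse=True)
    return [peer for peer, _ in ranked[:k]]
```

Placing backup fragments on the top-ranked peers means fewer fragments are lost per unit time, which is exactly the maintenance (repair) cost the abstract targets.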
11:45 - 13:00    Invited talk
Leveraging Communities in Social Content Sites - Sihem Amer-Yahia
Invited talk in conjunction with the DataX Workshop. Social content sites, which integrate traditional content sites (e.g., Yahoo! Travel) with social networks (e.g., Facebook), have recently emerged as popular Web destinations for creating and sharing content and social links. We discuss new challenges in searching content on those sites and expand the discussion to community-driven information exploration. In particular, we present SocialScope, a new architecture that harnesses information from multiple social content sites, and Jelly, a language to help developers build scalable information exploration applications. At the core of our architecture and language are user communities and topics, which model users' interests. Finally, we examine how XML technologies can help in modeling and processing social information.
13:00 - 13:30    Lunch
13:30 - 15:00    Session 3
DHTJoin: Processing Continuous Join Queries using DHT Networks
Wenceslao Palma, Reza Akbarinia, Esther Pacitti, Patrick Valduriez
This paper addresses the problem of computing approximate answers to continuous join queries. We present a new method, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) and dissemination of queries exploiting the trees formed by the underlying DHT links. DHTJoin distributes the query workload across multiple DHT nodes and provides a mechanism that avoids indexing tuples that cannot contribute to join results. We provide a performance evaluation which shows that DHTJoin can achieve significant performance gains in terms of network traffic.
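The hash-based placement idea can be sketched as follows: tuples from both relations are hashed on the join attribute, so matching tuples land on the same node, which joins them locally as they arrive. All names are our own illustration; the real DHTJoin additionally disseminates queries along the trees formed by DHT links and filters non-contributing tuples.

```python
from collections import defaultdict

NODES = 8

def node_for(join_val):
    # stand-in for the DHT's consistent hashing of the join attribute
    return hash(join_val) % NODES

class JoinNode:
    def __init__(self):
        self.store = {"R": defaultdict(list), "S": defaultdict(list)}

    def index(self, relation, tup, join_val):
        """Store a tuple and emit any join results it completes."""
        other = "S" if relation == "R" else "R"
        self.store[relation][join_val].append(tup)
        return [(tup, m) if relation == "R" else (m, tup)
                for m in self.store[other][join_val]]

nodes = [JoinNode() for _ in range(NODES)]

def publish(relation, tup, join_val):
    """Hash-based placement: matching R and S tuples meet on one node."""
    return nodes[node_for(join_val)].index(relation, tup, join_val)
```

Because each join value is owned by exactly one node, the join workload spreads across the overlay without any tuple being indexed more than once per query.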
Locaware: Index Caching in Unstructured P2P-file Sharing Systems
Manal El Dick, Esther Pacitti
Though widely deployed for file sharing, unstructured P2P systems aggressively consume network resources as they grow in popularity. P2P traffic is the leading consumer of bandwidth, mainly due to search inefficiency as well as large data transfers over long distances. This critical issue may compromise the benefits of such systems by drastically limiting their scalability. To reduce redundant P2P traffic, we propose Locaware, which performs index caching while supporting keyword search. Locaware aims to reduce the network load by directing queries to available nearby results. For this purpose, Locaware leverages natural file replication and uses topological information about the physical distribution of files.
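The index-caching idea can be sketched in a few lines: a peer remembers, for each keyword seen in passing responses, where matching files live and how far away each copy is, and later answers queries from the cache with the nearest known copy instead of forwarding them. This is our own hypothetical sketch, not Locaware's data structures.

```python
class IndexCache:
    def __init__(self):
        self.entries = {}   # keyword -> list of (peer_id, file_id, distance)

    def observe(self, keyword, peer_id, file_id, distance):
        """Record a response seen in transit: file_id is held by peer_id."""
        self.entries.setdefault(keyword, []).append((peer_id, file_id, distance))

    def lookup(self, keyword):
        hits = self.entries.get(keyword)
        if not hits:
            return None                          # cache miss: forward as usual
        return min(hits, key=lambda e: e[2])     # nearest known replica
```

Preferring low-distance entries is what turns natural file replication into shorter transfer paths and, in aggregate, less wide-area traffic.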
P2P based Hosting System for Scalable Replicated Databases - Mesaac Makpangou
We propose a large-scale storage system capable of hosting and managing partially replicated databases. The overall solution is threefold: support for transparent replication and management of database replicas distributed over a peer-to-peer network; a replica control middleware that guarantees the 1-copy snapshot isolation correctness criterion for the replicas of a database distributed worldwide; and a distributed transaction processing substrate that enables transactions to access data spread over several database replicas, while preserving the consistency of the distributed state accessed by each distributed transaction. This paper presents how the proposed system can be deployed over a P2P network and discusses our proposed certification protocol. We anticipate that this storage system will boost the performance of database-intensive applications and services accessed by clients distributed worldwide.
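Certification for 1-copy snapshot isolation typically follows a first-committer-wins rule: a transaction commits only if no transaction that committed after its snapshot wrote an overlapping item. A minimal single-site sketch of that rule (our own illustration, not the paper's distributed protocol):

```python
class Certifier:
    def __init__(self):
        self.log = []            # (commit_version, write_set) of committed txns
        self.version = 0

    def certify(self, snapshot_version, write_set):
        """Commit if no concurrent committed txn wrote an overlapping item."""
        write_set = set(write_set)
        for commit_v, ws in self.log:
            if commit_v > snapshot_version and ws & write_set:
                return None      # write-write conflict: abort
        self.version += 1
        self.log.append((self.version, write_set))
        return self.version      # new commit version visible to later snapshots
```

In the distributed setting the same check must be applied in a single global order across replicas, which is where the certification protocol discussed in the paper comes in.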

DAMAP 2009 EDBT 2009 EDBT Workshops