Partitioning Functions for Stateful Data Parallelism in Stream Processing
--- The VLDB Journal
skewed, desirable, associated, exhibit, superior, accordingly, necessitate, prominent, tractable, exploit, effectively, efficiently, transparent, elastically, amenable, conflicting, concretely, exemplify, depict, a deluge of
in the form of continuous streams, large volumes of, necessitate doing, as an example, for instance, in this scenario
Accordingly, there is an increasing need to gather and analyze data streams in near real-time to extract insights and detect emerging patterns and outliers.
The increased affordability of distributed and parallel computing, thanks to advances in cloud computing and multi-core chip design, has made this problem tractable.
However, in the presence of skew in the distribution of the partitioning key, the balance properties cannot be maintained by the consistent hash.
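The point about skew can be illustrated with a small sketch. The ring below is a generic consistent-hash ring, not the partitioning function from the paper; the node names, the virtual-node count, and the key frequencies are all made-up assumptions chosen only to show the effect.

```python
import bisect
import hashlib
from collections import Counter


def stable_hash(text: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    return int(hashlib.md5(text.encode()).hexdigest(), 16)


class Ring:
    """A minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Each node owns `vnodes` points on the ring.
        self.points = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.hashes = [point for point, _ in self.points]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect_right(self.hashes, stable_hash(key)) % len(self.points)
        return self.points[index][1]


def tuple_load(ring, key_counts):
    """Count tuples per node when every tuple of a key goes to the same node."""
    load = Counter()
    for key, count in key_counts.items():
        load[ring.node_for(key)] += count
    return load


# Hypothetical skewed workload: one hot key dominates 99 cold keys.
ring = Ring(["node0", "node1", "node2", "node3"])
key_counts = {"hot": 1000, **{f"k{i}": 10 for i in range(99)}}
load = tuple_load(ring, key_counts)
print(load)
```

Because every tuple of the hot key lands on one node, that node receives at least 1000 of the 1990 tuples, while a perfectly balanced split over four nodes would give about 498 each. The ring spreads keys evenly, but it cannot spread the tuples of a single key.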
MORM: A Multi-objective Optimized Replication Management strategy for cloud storage cluster
--- Journal of Systems Architecture
issue, achieve, latency, entail, consumption, article, propose, candidate, conclusively, demonstrate, outperform, nowadays, huge, currently, crucial, significantly, adopt, observe, collectively, previously, holistic, thus, tradeoff, primary, therefore, aforementioned, capture, layout, remainder, formulate, present, enormous, drawback, infrastructure, chunk, nonetheless, moreover, duration, substantially, wherein, overall, collision, shortcoming, affect, further, address, motivate, explicitly, suppose, assume, entire, invariably, compromise, inherently, pursue, handle, denote, utilize, constraint, accordingly, infeasible, violate, respectively, guarantee, satisfaction, indicate, hence, worst-case, synthetic, assess, rarely, throughout, diversity, preference, illustrate, imply, additionally, is an important issue, a series of, in terms of
in a distributed manner, in order to, by default
be referred to as
take a holistic view of, conflict with, a variety of
is highly in demand
given the aforementioned issue and trend, take into account, yield close to, as follows
take into consideration, with respect to, a research hot spot, call for
according to, depend upon/on
meet ... requirement, focus on
is sensitive to, is composed of, consist of
from the latency minimization perspective, a certain number of
is defined as (follows) / can be expressed as (follows) / can be calculated/computed by / is given by the following, at hand
corresponding to
has nothing to do with, in addition to
as depicted in Fig. 1, et al.
The volume of data is measured in terabytes, and sometimes in petabytes, in many fields.
Data replication allows speeding up data access, reducing access latency and increasing data availability.
How many suitable replicas of each data should be created in the cloud to meet a reasonable system requirement is an important issue for further research.
Where should these replicas be placed to meet the system task fast execution rate and load balancing requirements is another important issue to be thoroughly investigated.
As the system maintenance cost increases significantly with the number of replicas, keeping too many replicas or a fixed number of replicas is not a good choice.
We build up five objectives for optimization which provides us with the advantage that we can search for solutions that yield close to optimal values for these objectives.
Their shortcoming is that they consider only a restricted set of parameters affecting the replication decision. Further, they focus only on improving system performance and do not address the energy efficiency issue in data centers.
Data node load variance is the standard deviation of data node load of all data nodes in the cloud storage cluster which can be used to represent the degree of load balancing of the system.
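The definition above can be written out as a short calculation. This is a minimal sketch that assumes each data node's load is already available as a single number; the function name and the sample loads are hypothetical, and the population form of the standard deviation is an assumption, since the note does not say which form is meant.

```python
import statistics


def data_node_load_variance(loads):
    """Standard deviation of per-node load across the cluster.

    Uses the population standard deviation (an assumption); a value of 0
    means every data node carries exactly the same load.
    """
    return statistics.pstdev(loads)


# Hypothetical per-node loads for two small clusters.
balanced = [10, 10, 10, 10]
unbalanced = [2, 4, 4, 4, 5, 5, 7, 9]

print(data_node_load_variance(balanced))
print(data_node_load_variance(unbalanced))
```

A smaller value indicates a better balanced cluster, which is why the measure can serve as a load-balancing objective.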
The advantage of using simulation is that we can easily vary parameters to understand their individual impact on system performance.
Throughout the simulation, we assumed \include the consistency or write and update propagations costs in the study.
Distributed replica placement algorithms for correlated data
--- The Journal of Supercomputing
yield, potential, congestion, prolonged, malicious, overhead, conventional, present, propose, numerous, tackle, pervasive, valid, utilize, develop a ... algorithm, suffer from
in a distributed manner, be denoted as M, converge to
so on and so forth
With the advances in Internet technologies, applications are all moving toward serving widely distributed users.
Replication techniques have been commonly used to minimize the communication latency by bringing the data close to the clients and improve data availability.
Thus, data needs to be carefully placed to avoid unnecessary overhead. These correlations have a significant impact on data access patterns.
For structured data, data correlated due to the structural relations may be frequently accessed together.
Assume that data objects can be clustered into different classes due to user accesses, and whenever a client issues an access request, it will only access data in a single class.
One challenge for using centralized replica placement algorithms in a widely distributed system is that a server site has to know the (logical) network topology and the resident set of all structured data sets to make replication decisions.
We assume that the data objects accessed by most of the transactions follow certain patterns, which will be stable for some time periods.
Locality-aware allocation of multi-dimensional correlated files on the cloud platform
--- Distributed and Parallel Databases
enormous, retrieve, prevailing, commonly, correlated, booming, massive, exploit, crucial, fundamental, heuristic, deterministic, duplication, compromised, brute-force, sacrifice, sophisticated, investigate, abundant, notation, as a matter of fact, in various ways
with ... taken into consideration, play a vital role in, it turns out that, in terms of, vice versa
a.k.a. = also known as
The effective management of enormous data volumes on the Cloud platform has attracted considerable research effort.
Currently, most prevailing Cloud file systems allocate data following the principles of fault tolerance and availability, while inter-file correlations, i.e. files correlated with each other, are often neglected.
There is a trade-off between data locality and the scale of job parallelism.
Although distributing data randomly is expected to achieve the best parallelism, such a method may degrade user experience by introducing the extra cost of a large volume of remote accesses, especially for the many applications that feature data locality, e.g., context-aware search and subspace-oriented aggregation queries.
However, there must be several application-dependent hot subspaces, under which files are frequently being processed.
The problem is how to find a compromise partitioning solution that serves the file correlations of different feature subspaces as well as possible.
If too many files are grouped together, the imbalance cost would rise and degrade the scale of job parallelism; if files are partitioned into too many small groups, data copying traffic across storage nodes would increase.
Instead, our solution is to start from a sub-optimal solution and employ some heuristics to derive a near-optimal partition with as little cost as possible.
By allocating correlated files together, significant I/O savings can be achieved by reducing the huge cost of random data access over the entire distributed storage network.