JAVA - ABSTRACTS


Transaction Management in Service-Oriented Systems: Requirements and a Proposal

Abstract

Service-Oriented Computing (SOC) is becoming the mainstream development paradigm for applications over the Internet, taking advantage of remote, independent functionalities. The cornerstone of SOC's success lies in the potential advantage of composing services on the fly. When control over the communication and the elements of the information system is low, developing solid systems is challenging. In particular, developing reliable web service compositions frequently requires the integration of both composition languages, such as the Business Process Execution Language (BPEL), and coordination protocols, such as WS-AtomicTransaction and WS-BusinessActivity. Unfortunately, the composition and coordination of web services currently have separate languages and specifications. The goal of this paper is twofold. First, we identify the major requirements of transaction management in service-oriented systems and survey the relevant standards. Second, we propose a semi-automatic approach to integrate BPEL specifications and web service coordination protocols, that is, implementing transaction management within service composition processes, and thus overcoming the limitations of current technologies.
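
To make the coordination side concrete, the sketch below shows the two-phase commit pattern that WS-AtomicTransaction-style protocols implement and that the proposed approach would weave into BPEL-described compositions. It is a minimal illustration in plain Java; the Participant interface and class names are ours, not part of any WS-* toolkit.

import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a WS-AtomicTransaction-style coordinator.
// All class and method names are illustrative, not from any WS-* library.
interface Participant {
    boolean prepare();   // vote in phase one
    void commit();
    void rollback();
}

public class AtomicCoordinator {
    private final List<Participant> participants = new ArrayList<>();

    public void register(Participant p) { participants.add(p); }

    // Classic two-phase commit: all participants must vote yes,
    // otherwise every participant is rolled back.
    public boolean runTransaction() {
        for (Participant p : participants) {
            if (!p.prepare()) {
                participants.forEach(Participant::rollback);
                return false;
            }
        }
        participants.forEach(Participant::commit);
        return true;
    }
}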

A Machine Learning Approach for Identifying Disease-Treatment Relations in Short Texts

Abstract

The Machine Learning (ML) field has gained momentum in almost every domain of research and has recently become a reliable tool in the medical domain. The empirical domain of automatic learning is used in tasks such as medical decision support, medical imaging, protein-protein interaction, extraction of medical knowledge, and overall patient management care. ML is envisioned as a tool by which computer-based systems can be integrated into the healthcare field in order to provide better, more efficient medical care. This paper describes an ML-based methodology for building an application that is capable of identifying and disseminating healthcare information. It extracts sentences from published medical papers that mention diseases and treatments, and identifies semantic relations that exist between diseases and treatments. Our evaluation results for these tasks show that the proposed methodology obtains reliable outcomes that could be integrated in an application to be used in the medical care domain. The potential value of this paper stands in the ML settings that we propose and in the fact that we outperform previous results on the same data set.
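
As a rough illustration of the kind of ML setting involved, the following is a toy bag-of-words Naive Bayes classifier for labeling a sentence with a disease-treatment relation. The labels and features are placeholders; the paper's actual models and feature representations differ.

import java.util.*;

// Toy bag-of-words Naive Bayes for labeling a sentence with a
// disease-treatment relation (e.g., CURE, PREVENT, SIDE_EFFECT).
public class RelationClassifier {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> labelCounts = new HashMap<>();
    private final Set<String> vocab = new HashSet<>();

    public void train(String sentence, String label) {
        labelCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
            wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : sentence.toLowerCase().split("\\W+")) {
            counts.merge(w, 1, Integer::sum);
            vocab.add(w);
        }
    }

    public String classify(String sentence) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int total = labelCounts.values().stream().mapToInt(Integer::intValue).sum();
        for (String label : labelCounts.keySet()) {
            double score = Math.log(labelCounts.get(label) / (double) total);
            Map<String, Integer> counts = wordCounts.get(label);
            int labelTotal = counts.values().stream().mapToInt(Integer::intValue).sum();
            for (String w : sentence.toLowerCase().split("\\W+")) {
                // Laplace smoothing avoids zero probabilities for unseen words.
                int c = counts.getOrDefault(w, 0);
                score += Math.log((c + 1.0) / (labelTotal + vocab.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }
}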
Dynamics of Malware Spread in Decentralized Peer-to-Peer Networks

Abstract
In this project, we formulate an analytical model to characterize the spread of malware in decentralized, Gnutella-type peer-to-peer (P2P) networks and study the dynamics associated with the spread of malware. Using a compartmental model, we derive the system parameters or network conditions under which the P2P network may reach a malware-free equilibrium. The model also evaluates the effect of control strategies, such as node quarantine, on stifling the spread of malware. The model is then extended to consider the impact of P2P networks on malware spread in networks of smart cell phones. We formulate our model as a compartmental model, with the peers divided into compartments, each signifying a peer's state at a given time instant.
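
A minimal discrete-time version of such a compartmental model can be simulated directly; the sketch below uses SIR-style susceptible/infected/recovered compartments with purely hypothetical rate parameters.

// Illustrative discrete-time compartmental (SIR-style) simulation of
// malware spread among peers; all parameter values are hypothetical.
public class MalwareSpreadModel {
    public static void main(String[] args) {
        double s = 0.99, i = 0.01, r = 0.0; // susceptible, infected, recovered fractions
        double beta = 0.4;   // infection rate per contact
        double gamma = 0.1;  // recovery (cleanup/quarantine) rate
        double dt = 0.1;
        for (int step = 0; step <= 500; step++) {
            if (step % 50 == 0)
                System.out.printf("t=%.1f  S=%.3f  I=%.3f  R=%.3f%n", step * dt, s, i, r);
            double newInfections = beta * s * i * dt;
            double newRecoveries = gamma * i * dt;
            s -= newInfections;
            i += newInfections - newRecoveries;
            r += newRecoveries;
        }
        // A malware-free equilibrium is reached when beta/gamma (roughly the
        // basic reproduction number) is below 1; here it is 4, so the
        // outbreak grows before burning out.
    }
}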



A Generic Multilevel Architecture for Time Series Prediction
Abstract
Rapidly evolving businesses generate massive amounts of time-stamped data sequences and create demand for both univariate and multivariate time series forecasting. For such data, traditional predictive models based on autoregression are often not sufficient to capture complex nonlinear relationships between multidimensional features and the time series outputs. In order to exploit these relationships for improved time series forecasting, while also dealing with a wider variety of prediction scenarios, a forecasting system requires a flexible and generic architecture that can accommodate and tune various individual predictors as well as combination methods. In reply to this challenge, an architecture for combined, multilevel time series prediction is proposed, which is suitable for many different universal regressors and combination methods. The key strength of this architecture is its ability to build a diversified ensemble of individual predictors that form an input to a multilevel selection and fusion process before the final optimized output is obtained. Excellent generalization ability is achieved due to the highly boosted complementarities of the individual models, further enforced through cross-validation-linked training on exclusive data subsets and ensemble output postprocessing. A sample configuration with basic neural network predictors and a mean combiner illustrates the approach.
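
A minimal sketch of the two-level idea, with placeholder base predictors and a mean combiner standing in for the architecture's configurable components:

import java.util.List;
import java.util.function.Function;

// Sketch of a two-level forecasting ensemble: several base predictors
// produce forecasts that a combiner fuses into the final output.
// The predictor implementations are placeholders, not the paper's models.
public class MultilevelForecaster {
    interface Predictor { double predict(double[] history); }

    private final List<Predictor> basePredictors;
    private final Function<double[], Double> combiner;

    MultilevelForecaster(List<Predictor> basePredictors,
                         Function<double[], Double> combiner) {
        this.basePredictors = basePredictors;
        this.combiner = combiner;
    }

    double forecast(double[] history) {
        double[] level1 = new double[basePredictors.size()];
        for (int k = 0; k < level1.length; k++)
            level1[k] = basePredictors.get(k).predict(history);
        return combiner.apply(level1); // e.g., a simple mean combiner
    }

    public static void main(String[] args) {
        Predictor lastValue = h -> h[h.length - 1];                   // naive forecast
        Predictor drift = h -> 2 * h[h.length - 1] - h[h.length - 2]; // linear drift
        MultilevelForecaster f = new MultilevelForecaster(
            List.of(lastValue, drift),
            preds -> java.util.Arrays.stream(preds).average().orElse(0));
        System.out.println(f.forecast(new double[]{1.0, 2.0, 3.0, 4.0})); // 4.5
    }
}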

Dual Framework and Algorithms for Targeted Online Data Delivery

Abstract
A variety of emerging online data delivery applications challenge existing techniques for data delivery to human users, applications, or middleware that access data from multiple autonomous servers. In this project, we develop a framework for formalizing and comparing pull-based solutions and present dual optimization approaches. The first approach, most commonly used nowadays, maximizes user utility under the strict setting of meeting a priori constraints on the usage of system resources. We present an alternative, more flexible approach that satisfies all user profiles while minimizing the usage of system resources. We discuss the benefits of this latter approach and develop an adaptive monitoring solution, Satisfy User Profiles (SUP). Through formal analysis, we identify sufficient optimality conditions for SUP. Using real (RSS feeds) and synthetic traces, we empirically analyze the behavior of SUP under varying conditions. The proposed framework aims at providing a scalable online data delivery solution. We identify three types of entities, namely servers, clients, and brokers. We propose a dual formulation, OptMon2, which reverses the roles of user utility and system constraints, setting the fulfillment of user needs as the hard constraint. OptMon2 assumes that the system resources consumed to satisfy user profiles should be determined by the specific profiles and the environment, e.g., the model of updates, and does not assume an a priori limitation of system resources.
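
The following sketch illustrates the flavor of an adaptive pull monitor in the spirit of SUP: each profile sits in a priority queue and its probe interval adapts to observed updates. The adaptation rule and field names are illustrative, not the paper's algorithm.

import java.util.*;

// Sketch of an adaptive pull scheduler: each profile is probed just often
// enough to satisfy its freshness requirement, and the probe interval adapts
// to the observed update rate.
public class AdaptiveMonitor {
    static class Profile {
        final String feedUrl;
        double estimatedUpdateIntervalSec; // learned from past probes
        double nextProbeTimeSec;
        Profile(String feedUrl, double initialEstimate) {
            this.feedUrl = feedUrl;
            this.estimatedUpdateIntervalSec = initialEstimate;
        }
    }

    private final PriorityQueue<Profile> queue =
        new PriorityQueue<>(Comparator.comparingDouble(p -> p.nextProbeTimeSec));

    void add(Profile p) { queue.add(p); }

    // Probe the most urgent profile, then reschedule it based on whether
    // an update was actually seen (simple multiplicative adaptation).
    void step(double nowSec, boolean updateSeen) {
        Profile p = queue.poll();
        if (p == null) return;
        p.estimatedUpdateIntervalSec *= updateSeen ? 0.8 : 1.25;
        p.nextProbeTimeSec = nowSec + p.estimatedUpdateIntervalSec;
        queue.add(p);
    }
}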
    

Decision Trees for Uncertain Data


ABSTRACT
Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the “complete information” of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted which show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
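
The core change to tree construction can be illustrated by how split entropy is computed: each uncertain tuple contributes fractional probability mass to each side of a split. A minimal sketch, with each pdf represented by weighted samples:

import java.util.*;

// Sketch of entropy computation when each training tuple carries a pdf:
// a tuple contributes fractional mass to each side of a split according
// to the probability that its (uncertain) attribute falls on that side.
public class UncertainSplit {
    // One sampled point of a tuple's pdf: attribute value, probability mass, class label.
    record Sample(double value, double weight, int label) {}

    static double entropy(Map<Integer, Double> classMass) {
        double total = classMass.values().stream().mapToDouble(Double::doubleValue).sum();
        double h = 0;
        for (double m : classMass.values()) {
            if (m <= 0) continue;
            double p = m / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Weighted entropy of splitting the pdf samples at a threshold.
    static double splitEntropy(List<Sample> samples, double threshold) {
        Map<Integer, Double> left = new HashMap<>(), right = new HashMap<>();
        double leftMass = 0, rightMass = 0, totalMass = 0;
        for (Sample s : samples) {
            totalMass += s.weight();
            if (s.value() <= threshold) {
                left.merge(s.label(), s.weight(), Double::sum);
                leftMass += s.weight();
            } else {
                right.merge(s.label(), s.weight(), Double::sum);
                rightMass += s.weight();
            }
        }
        return (leftMass / totalMass) * entropy(left)
             + (rightMass / totalMass) * entropy(right);
    }
}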

Missing Value Estimation for Mixed-Attribute Data Sets

Abstract

Missing data imputation is a key issue in learning from incomplete data. Various techniques have been developed with great success in dealing with missing values in data sets with homogeneous attributes (their independent attributes are all either continuous or discrete). This paper studies a new setting of missing data imputation, i.e., imputing missing data in data sets with heterogeneous attributes (their independent attributes are of different types), referred to as imputing mixed-attribute data sets. Although many real applications are in this setting, no estimator has been designed for imputing mixed-attribute data sets. This paper first proposes two consistent estimators for discrete and continuous missing target values, respectively. Then, a mixture-kernel-based iterative estimator is advocated to impute mixed-attribute data sets. The proposed method is evaluated with extensive experiments against some typical algorithms, and the results demonstrate that the proposed approach outperforms the existing imputation methods in terms of classification accuracy and root mean square error (RMSE) at different missing ratios.
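
For intuition, a kernel-regression (Nadaraya-Watson) imputation of a single continuous value is sketched below; the paper's mixture-kernel estimator additionally handles discrete attributes and iterates, which this toy version omits.

// Sketch of kernel-regression imputation of one missing continuous value
// from complete cases, using a Gaussian kernel.
public class KernelImputer {
    static double gaussian(double u) {
        return Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
    }

    // x[i] = observed predictor, y[i] = observed target; impute y at xQuery.
    static double impute(double[] x, double[] y, double xQuery, double bandwidth) {
        double num = 0, den = 0;
        for (int i = 0; i < x.length; i++) {
            double w = gaussian((xQuery - x[i]) / bandwidth);
            num += w * y[i];
            den += w;
        }
        return den == 0 ? Double.NaN : num / den;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2.1, 3.9, 6.2, 8.1, 9.8};
        System.out.println(impute(x, y, 2.5, 1.0)); // roughly 5
    }
}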


Green Cloud Computing: Balancing Energy in Processing, Storage, and Transport

Abstract
Network-based cloud computing is rapidly expanding as an alternative to conventional office-based computing. As cloud computing becomes more widespread, the energy consumption of the network and computing resources that underpin the cloud will grow. This is happening at a time of increasing attention to the need to manage energy consumption across the entire information and communications technology (ICT) sector. While data center energy use has received much attention recently, there has been less attention paid to the energy consumption of the transmission and switching networks that are key to connecting users to the cloud. In this paper, we present an analysis of energy consumption in cloud computing. The analysis considers both public and private clouds, and includes energy consumption in switching and transmission as well as data processing and data storage. We show that energy consumption in transport and switching can be a significant percentage of total energy consumption in cloud computing. Cloud computing can enable more energy-efficient use of computing power, especially when the computing tasks are of low intensity or infrequent. However, under some circumstances cloud computing can consume more energy than conventional computing, where each user performs all computing on their own personal computer (PC).
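
The trade-off can be made concrete with back-of-the-envelope arithmetic: cloud execution wins only while its processing and storage savings exceed the transport energy. All joule figures below are hypothetical placeholders, chosen only to illustrate how transport energy can dominate.

// Back-of-the-envelope comparison of per-task energy for local vs. cloud
// execution; every figure here is a hypothetical placeholder.
public class CloudEnergyBalance {
    public static void main(String[] args) {
        double localProcessingJ = 50;   // PC executes the task itself
        double cloudProcessingJ = 20;   // shared servers assumed more efficient
        double cloudStorageJ = 5;
        double transportJPerMB = 2;     // switching + transmission energy
        for (double mb : new double[]{1, 10, 50, 100}) {
            double cloudTotal = cloudProcessingJ + cloudStorageJ + transportJPerMB * mb;
            System.out.printf("%.0f MB moved: local=%.0f J, cloud=%.0f J -> %s wins%n",
                mb, localProcessingJ, cloudTotal,
                cloudTotal < localProcessingJ ? "cloud" : "local");
        }
    }
}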




Self-Organizing Agents for Service Composition in Cloud Computing

Abstract

In Cloud service composition, collaboration between brokers and service providers is essential to promptly satisfy incoming Cloud consumer requirements. These requirements should be mapped to Cloud resources, which are accessed via web services, in an automated manner. However, distributed and constantly changing Cloud-computing environments pose new challenges to automated service composition, such as: (i) dynamically contracting service providers, which set service fees on a supply-and-demand basis, and (ii) dealing with incomplete information regarding Cloud resources (e.g., location and providers). To address these issues, this work presents an agent-based Cloud service composition approach. Cloud participants and resources are implemented and instantiated by agents. These agents sustain a three-layered self-organizing multi-agent system that establishes a Cloud service composition framework and an experimental test bed. The self-organizing agents make use of acquaintance networks and the contract net protocol to evolve and adapt Cloud service compositions. The experimental results indicate that service composition is efficiently achieved despite dealing with incomplete information as well as coping with dynamic service fees.
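
A minimal contract-net round, the negotiation primitive the agents use, might look as follows; the agent interfaces are illustrative and not drawn from any FIPA library.

import java.util.*;

// Minimal contract-net round: a broker agent issues a call for proposals,
// provider agents bid a service fee, and the cheapest bidder is awarded
// the task.
public class ContractNet {
    interface ProviderAgent {
        double bid(String task);   // propose a fee (supply-and-demand based)
        void award(String task);
    }

    static void callForProposals(String task, List<ProviderAgent> acquaintances) {
        ProviderAgent winner = null;
        double bestFee = Double.POSITIVE_INFINITY;
        for (ProviderAgent p : acquaintances) {
            double fee = p.bid(task);
            if (fee < bestFee) { bestFee = fee; winner = p; }
        }
        if (winner != null) winner.award(task); // losers are implicitly rejected
    }
}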


Secure and Practical Outsourcing of Linear Programming in Cloud Computing

Abstract

Cloud Computing has great potential for providing robust computational power to society at reduced cost. It enables customers with limited computational resources to outsource their large computation workloads to the cloud, and economically enjoy the massive computational power, bandwidth, storage, and even appropriate software that can be shared in a pay-per-use manner. Despite the tremendous benefits, security is the primary obstacle that prevents the wide adoption of this promising computing model, especially for customers whose confidential data are consumed and produced during the computation. Treating the cloud as an intrinsically insecure computing platform from the viewpoint of the cloud customers, we must design mechanisms that not only protect sensitive information by enabling computations with encrypted data, but also protect customers from malicious behaviors by enabling the validation of the computation result. Such a mechanism of general secure computation outsourcing was recently shown to be feasible in theory, but designing mechanisms that are practically efficient remains a very challenging problem. Focusing on engineering computing and optimization tasks, this paper investigates secure outsourcing of widely applicable linear programming (LP) computations. In order to achieve practical efficiency, our mechanism design explicitly decomposes the LP computation outsourcing into public LP solvers running on the cloud and private LP parameters owned by the customer. The resulting flexibility allows us to explore an appropriate security/efficiency tradeoff via a higher-level abstraction of LP computations than the general circuit representation. In particular, by formulating the private data owned by the customer for the LP problem as a set of matrices and vectors, we are able to develop a set of efficient privacy-preserving problem transformation techniques, which allow customers to transform the original LP problem into an arbitrary one while protecting sensitive input/output information. To validate the computation result, we further explore the fundamental duality theorem of LP computation and derive the necessary and sufficient conditions that a correct result must satisfy. Such a result verification mechanism is extremely efficient and incurs close-to-zero additional cost on both the cloud server and customers. Extensive security analysis and experiment results show the immediate practicability of our mechanism design.
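
The duality-based check is simple to state in code: a claimed optimum x* is accepted only if it is primal feasible, the accompanying dual certificate y* is dual feasible, and the duality gap is zero. A minimal sketch for a standard-form LP (minimize c^T x subject to Ax = b, x >= 0):

// Sketch of a duality-based LP result check; a tolerance eps absorbs
// floating-point error.
public class LpResultVerifier {
    static boolean verify(double[][] A, double[] b, double[] c,
                          double[] x, double[] y, double eps) {
        int m = A.length, n = c.length;
        for (double xi : x) if (xi < -eps) return false;        // x >= 0
        for (int i = 0; i < m; i++) {                           // Ax = b
            double row = 0;
            for (int j = 0; j < n; j++) row += A[i][j] * x[j];
            if (Math.abs(row - b[i]) > eps) return false;
        }
        for (int j = 0; j < n; j++) {                           // A^T y <= c
            double col = 0;
            for (int i = 0; i < m; i++) col += A[i][j] * y[i];
            if (col - c[j] > eps) return false;
        }
        double primal = 0, dual = 0;
        for (int j = 0; j < n; j++) primal += c[j] * x[j];
        for (int i = 0; i < m; i++) dual += b[i] * y[i];
        return Math.abs(primal - dual) <= eps;                  // zero duality gap
    }
}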


Nymble: Blocking Misbehaving Users in Anonymizing Networks

Abstract

Anonymizing networks such as Tor allow users to access Internet services privately by using a series of routers to hide the client's IP address from the server. The success of such networks, however, has been limited by users employing this anonymity for abusive purposes such as defacing popular Web sites. Web site administrators routinely rely on IP-address blocking for disabling access to misbehaving users, but blocking IP addresses is not practical if the abuser routes through an anonymizing network. As a result, administrators block all known exit nodes of anonymizing networks, denying anonymous access to misbehaving and behaving users alike. To address this problem, we present Nymble, a system in which servers can "blacklist" misbehaving users, thereby blocking users without compromising their anonymity. Our system is thus agnostic to different servers' definitions of misbehavior: servers can blacklist users for whatever reason, and the privacy of blacklisted users is maintained.
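
The linkability mechanism rests on hash chains; the sketch below shows the bare idea of deriving one pseudonym ("nymble") per time window from a per-user, per-server seed. It is a drastic simplification of the actual construction.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Simplified hash-chain idea behind Nymble tickets: nymble_1 = H(seed),
// nymble_{i+1} = H(nymble_i). Revealing a later chain element lets a server
// recognize (and block) the same user in subsequent windows without
// learning the user's identity.
public class NymbleChain {
    static byte[] sha256(byte[] in) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(in);
    }

    static byte[][] chain(byte[] seed, int windows) throws Exception {
        byte[][] nymbles = new byte[windows][];
        byte[] cur = seed;
        for (int i = 0; i < windows; i++) {
            cur = sha256(cur);
            nymbles[i] = cur; // one pseudonym per time window
        }
        return nymbles;
    }

    public static void main(String[] args) throws Exception {
        byte[] seed = "per-user-per-server-seed".getBytes(StandardCharsets.UTF_8);
        byte[][] tickets = chain(seed, 3);
        System.out.println(tickets.length + " nymbles derived");
    }
}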

A CLOUD COMPUTING SOLUTION FOR PATIENT’S DATA COLLECTION IN HEALTH CARE INSTITUTIONS

Abstract
Existing processes for collecting patients' vital data require a great deal of manual work to collect, input, and analyze the information. These processes are usually slow and error-prone, introducing a latency that prevents real-time data accessibility. This scenario restrains clinical diagnostics and monitoring capabilities. We propose a solution to automate this process by using "sensors" attached to existing medical equipment that are interconnected to exchange service data. The proposal is based on the concepts of utility computing and wireless sensor networks. The information becomes available in the "cloud", from where it can be processed by expert systems and/or distributed to medical staff. The proof-of-concept design applies commodity computing integrated with legacy medical devices, ensuring cost effectiveness and simple integration.
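
A sensor-side agent in such a design could be as simple as the sketch below, which posts one vitals reading to a cloud endpoint as JSON; the URL and payload fields are hypothetical.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch of a sensor agent pushing one vital-signs reading to the cloud.
public class VitalsPublisher {
    public static void main(String[] args) throws Exception {
        String json = "{\"patientId\":\"p-001\",\"heartRate\":72,\"spo2\":98}";
        URL url = new URL("https://example-health-cloud.invalid/vitals"); // hypothetical endpoint
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);
        try (OutputStream os = con.getOutputStream()) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("cloud responded: " + con.getResponseCode());
    }
}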

Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing
Abstract
Cloud Computing is the long-dreamed vision of computing as a utility, where users can remotely store their data in the cloud so as to enjoy on-demand high-quality applications and services from a shared pool of configurable computing resources. By outsourcing data, users can be relieved from the burden of local data storage and maintenance. However, the fact that users no longer have physical possession of the possibly large amount of outsourced data makes data integrity protection in Cloud Computing a very challenging and potentially formidable task, especially for users with constrained computing resources and capabilities. Thus, enabling public auditability for cloud data storage security is of critical importance so that users can resort to an external audit party to check the integrity of outsourced data when needed. To securely introduce an effective third party auditor (TPA), the following two fundamental requirements have to be met: 1) the TPA should be able to efficiently audit the cloud data storage without demanding a local copy of the data, and introduce no additional online burden to the cloud user; 2) the third party auditing process should bring in no new vulnerabilities towards user data privacy. In this paper, we utilize and uniquely combine the public-key-based homomorphic authenticator with random masking to achieve a privacy-preserving public cloud data auditing system, which meets all the above requirements. To support efficient handling of multiple auditing tasks, we further explore the technique of bilinear aggregate signatures to extend our main result into a multi-user setting, where the TPA can perform multiple auditing tasks simultaneously. Extensive security and performance analysis shows the proposed schemes are provably secure and highly efficient.
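
The following toy sketch conveys the challenge-response shape of the audit: random block indices and coefficients, answered by a randomly masked linear combination so the auditor never sees raw data. The real verification step relies on pairing-based homomorphic authenticators and is deliberately omitted here.

import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Random;

// Toy audit flow: the TPA samples random block indices with random
// coefficients; the server answers with a masked linear combination.
public class AuditChallengeSketch {
    // 2^127 - 1, a Mersenne prime, used as a toy modulus.
    static final BigInteger P =
        new BigInteger("170141183460469231731687303715884105727");

    record Challenge(int[] indices, BigInteger[] coefficients) {}

    static Challenge newChallenge(int totalBlocks, int sampleSize, Random rnd) {
        int[] idx = new int[sampleSize];
        BigInteger[] v = new BigInteger[sampleSize];
        for (int k = 0; k < sampleSize; k++) {
            idx[k] = rnd.nextInt(totalBlocks);
            v[k] = new BigInteger(64, rnd);
        }
        return new Challenge(idx, v);
    }

    // Server: mu = r + sum_k v_k * m_{i_k} mod P, where r is fresh masking
    // randomness that blinds the combination of challenged blocks.
    static BigInteger respond(BigInteger[] blocks, Challenge ch, SecureRandom sr) {
        BigInteger mu = new BigInteger(64, sr); // masking term r
        for (int k = 0; k < ch.indices().length; k++)
            mu = mu.add(ch.coefficients()[k]
                   .multiply(blocks[ch.indices()[k]])).mod(P);
        return mu;
    }
}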

Online Intrusion Alert Aggregation with Generative Data Stream Modeling

ABSTRACT

Alert aggregation is an important subtask of intrusion detection. The goal is to identify and cluster different alerts (produced by low-level intrusion detection systems, firewalls, etc.) belonging to a specific attack instance which has been initiated by an attacker at a certain point in time. Thus, meta-alerts can be generated for the clusters that contain all the relevant information, whereas the amount of data (i.e., alerts) can be reduced substantially. Meta-alerts may then be the basis for reporting to security experts or for communication within a distributed intrusion detection system. We propose a novel technique for online alert aggregation which is based on a dynamic, probabilistic model of the current attack situation. Basically, it can be regarded as a data stream version of a maximum likelihood approach for the estimation of the model parameters. With three benchmark data sets, we demonstrate that it is possible to achieve reduction rates of up to 99.96 percent while the number of missing meta-alerts is extremely low. In addition, meta-alerts are generated with a delay of typically only a few seconds after observing the first alert belonging to a new attack instance.
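
The stream-oriented clustering step can be pictured with the toy aggregator below, where each incoming alert joins an open cluster or starts a new one; the crude similarity test stands in for the paper's probabilistic model.

import java.util.*;

// Toy online alert aggregation: each incoming alert either joins an existing
// cluster (candidate meta-alert) or opens a new one.
public class AlertAggregator {
    record Alert(String srcIp, String dstIp, long timestampMs) {}

    static class Cluster {
        final List<Alert> alerts = new ArrayList<>();
        Alert last;
        void add(Alert a) { alerts.add(a); last = a; }
    }

    private final List<Cluster> clusters = new ArrayList<>();

    void observe(Alert a) {
        Cluster best = null;
        for (Cluster c : clusters) {
            boolean sameSource = c.last.srcIp().equals(a.srcIp());
            boolean recent = a.timestampMs() - c.last.timestampMs() < 60_000;
            if (sameSource && recent) { best = c; break; }
        }
        if (best == null) { best = new Cluster(); clusters.add(best); }
        best.add(a);
        // One meta-alert per cluster summarizes all of its member alerts.
    }
}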

Aggregating Software Services to Discover Enterprise Mashups

ABSTRACT

Service mashup is the act of integrating the resulting data of two complementary software services into a common picture. Such an approach is promising with respect to the discovery of new types of knowledge. However, before service mashup routines can be executed, it is necessary to predict which services (of an open repository) are viable candidates. Similar to Knowledge Discovery in Databases (KDD), we introduce the Knowledge Discovery in Services (KDS) process that identifies mashup candidates. In this work, the KDS process is specialized to address a repository of open services that do not contain semantic annotations. In these situations, specialized techniques are required to determine equivalences among open services with reasonable precision. This paper introduces a bottom-up process for KDS that adapts to the environment of services for which it operates. Detailed experiments are discussed that evaluate KDS techniques on an open repository of services from the Internet and on a repository of services created in a controlled environment.
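
A purely syntactic equivalence test, one ingredient such a KDS process might use when semantic annotations are absent, can be as simple as a Jaccard score over signature tokens:

import java.util.*;

// Sketch of a syntactic service-matching test: compare the token sets of two
// operation signatures and treat high-scoring pairs as mashup candidates.
public class ServiceMatcher {
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0 : inter.size() / (double) union.size();
    }

    public static void main(String[] args) {
        Set<String> s1 = Set.of("get", "weather", "city", "date");
        Set<String> s2 = Set.of("get", "forecast", "city", "date");
        System.out.printf("similarity = %.2f%n", jaccard(s1, s2)); // 0.60
    }
}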

A Generic Framework for Three-Factor Authentication: Preserving Security and Privacy in Distributed Systems

ABSTRACT

As part of the security within distributed systems, various services and resources need protection from unauthorized use. Remote authentication is the most commonly used method to determine the identity of a remote client. This paper investigates a systematic approach for authenticating clients by three factors, namely password, smart card, and biometrics. A generic and secure framework is proposed to upgrade two-factor authentication to three-factor authentication. The conversion not only significantly improves the information assurance at low cost but also protects client privacy in distributed systems. In addition, our framework retains several practice-friendly properties of the underlying two-factor authentication, which we believe is of independent interest.
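
Schematically, a three-factor check accepts only when all three verifications pass, as in the sketch below; the biometric key would come from a fuzzy extractor in the framework, and every name here is illustrative.

import java.security.MessageDigest;

// Sketch of a three-factor check: password digest, smart-card response to a
// nonce, and a biometric-derived key must all verify. The biometric key is
// abstracted as a byte array standing in for a fuzzy-extractor output.
public class ThreeFactorVerifier {
    static byte[] sha256(byte[]... parts) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        for (byte[] p : parts) md.update(p);
        return md.digest();
    }

    static boolean verify(byte[] storedPwdDigest, byte[] password,
                          byte[] cardSecret, byte[] nonce, byte[] cardResponse,
                          byte[] enrolledBioKey, byte[] presentedBioKey) throws Exception {
        // MessageDigest.isEqual performs a constant-time comparison.
        boolean pwdOk = MessageDigest.isEqual(storedPwdDigest, sha256(password));
        boolean cardOk = MessageDigest.isEqual(cardResponse, sha256(cardSecret, nonce));
        boolean bioOk = MessageDigest.isEqual(enrolledBioKey, presentedBioKey);
        return pwdOk && cardOk && bioOk; // all three factors must pass
    }
}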

Data Leakage Detection

ABSTRACT

We study the following problem: A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data are leaked and found in an unauthorized place (e.g., on the web or somebody's laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases, we can also inject “realistic but fake” data records to further improve our chances of detecting leakage and identifying the guilty party.
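
A simplified guilt score in the spirit of this model: for each leaked object an agent holds, suspicion grows inversely with the number of agents who also received it and with the probability p of independent acquisition. The exact formula below is a stand-in, not the paper's.

import java.util.*;

// Sketch of a guilt score per agent, given the set of leaked objects and
// each agent's holdings; p is the probability an object could have been
// gathered by other means.
public class GuiltScorer {
    static Map<String, Double> score(Set<String> leaked,
                                     Map<String, Set<String>> holdings, double p) {
        Map<String, Double> scores = new HashMap<>();
        for (var e : holdings.entrySet()) {
            double notGuilty = 1.0;
            for (String obj : leaked) {
                if (!e.getValue().contains(obj)) continue;
                long holders = holdings.values().stream()
                        .filter(h -> h.contains(obj)).count();
                // Chance this particular agent did NOT leak this object.
                notGuilty *= 1.0 - (1.0 - p) / holders;
            }
            scores.put(e.getKey(), 1.0 - notGuilty);
        }
        return scores;
    }
}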

The Significance of Instant Messaging at Work

ABSTRACT

Instant messaging has become increasingly prevalent in social life. However, whether to use IM at work remains controversial, due to its unquantified benefits for organizations. In this study we employ the suggestive metaphor of social network theory to examine IM's impact on organizational performance. Specifically, we propose that IM has the potential to enhance organizational agility by enabling quality communication, building interlocutors' mutual trust and establishing relationship networks in the workplace. The conceptual model is validated by 253 survey responses collected from employees of Chinese organizations. The data indicates that IM supports social networks, which contribute substantially to organizational agility. The theoretical and practical implications of the findings are discussed.

Improving Utilization of Infrastructure Clouds

ABSTRACT

A key advantage of infrastructure-as-a-service (IaaS) clouds is providing users on-demand access to resources. To provide on-demand access, however, cloud providers must either significantly overprovision their infrastructure (and pay a high price for operating resources with low utilization) or reject a large proportion of user requests (in which case the access is no longer on-demand). At the same time, not all users require truly on-demand access to resources. Many applications and workflows are designed for recoverable systems where interruptions in service are expected. For instance, many scientists utilize high-throughput computing (HTC)-enabled resources, such as Condor, where jobs are dispatched to available resources and terminated when the resource is no longer available. We propose a cloud infrastructure that combines on-demand allocation of resources with opportunistic provisioning of cycles from idle cloud nodes to other processes by deploying backfill virtual machines (VMs). For demonstration and experimental evaluation, we extend the Nimbus cloud computing toolkit to deploy backfill VMs on idle cloud nodes for processing an HTC workload. Initial tests show an increase in IaaS cloud utilization from 37.5% to 100% during a portion of the evaluation trace but only 6.39% overhead cost for processing the HTC workload. We demonstrate that a shared infrastructure between IaaS cloud providers and an HTC job management system can be highly beneficial to both the IaaS cloud provider and HTC users by increasing the utilization of the cloud infrastructure (thereby decreasing the overall cost) and contributing cycles that would otherwise be idle to processing HTC jobs.
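
The backfill policy reduces to a small scheduling loop: idle nodes run preemptible backfill VMs, and an arriving on-demand request evicts just enough of them. The sketch below is illustrative and not the Nimbus toolkit API.

import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of backfill scheduling: idle nodes run preemptible backfill VMs for
// HTC jobs; an on-demand request terminates backfill VMs to free nodes.
public class BackfillScheduler {
    private int idleNodes;
    private final Deque<String> backfillVms = new ArrayDeque<>();

    BackfillScheduler(int nodes) { this.idleNodes = nodes; }

    void fillIdleCapacity() {
        while (idleNodes > 0) {
            backfillVms.push("backfill-vm-" + backfillVms.size());
            idleNodes--; // every idle node now contributes HTC cycles
        }
    }

    boolean onDemandRequest(int nodesNeeded) {
        while (idleNodes < nodesNeeded && !backfillVms.isEmpty()) {
            System.out.println("preempting " + backfillVms.pop()); // HTC job is requeued
            idleNodes++;
        }
        if (idleNodes < nodesNeeded) return false; // reject: cloud is truly full
        idleNodes -= nodesNeeded;
        return true;
    }
}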

Towards Secure and Dependable Storage Services in Cloud Computing

ABSTRACT

As one of the emerging services in the cloud paradigm, cloud storage enables users to remotely store their data in the cloud so as to enjoy on-demand high-quality applications and services from a shared pool of configurable computing resources. While cloud storage relieves users from the burden of local storage management and maintenance, it also relinquishes users' ultimate control over the fate of their data, which may put the correctness of the outsourced data at risk. In order to regain assurances of cloud data integrity and availability and enforce the quality of cloud storage service for users, we propose a highly efficient and flexible distributed storage verification scheme with two salient features compared with its predecessors. By utilizing the homomorphic token with distributed erasure-coded data, our scheme achieves the integration of storage correctness insurance and data error localization, i.e., the identification of misbehaving server(s). Unlike most prior works, the new scheme further supports secure and efficient dynamic operations on outsourced data, including block modification, deletion, and append. Extensive security and performance analysis shows the proposed scheme is highly efficient and resilient against Byzantine failure, malicious data modification attack, and even server colluding attacks.
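
The error-localization idea can be sketched with precomputed per-server tokens: the user hashes selected block positions before outsourcing, and a server whose recomputed token mismatches at audit time is flagged as misbehaving. The homomorphic and erasure-coding machinery is omitted here.

import java.security.MessageDigest;

// Sketch of challenge-based error localization via precomputed tokens.
public class StorageVerifier {
    static byte[] token(byte[][] blocks, int[] positions) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        for (int pos : positions) md.update(blocks[pos]);
        return md.digest();
    }

    // Returns true if the server's stored blocks still match the token the
    // user precomputed before outsourcing; a mismatch localizes the fault
    // to this particular server.
    static boolean audit(byte[] precomputedToken, byte[][] serverBlocks,
                         int[] positions) throws Exception {
        return MessageDigest.isEqual(precomputedToken, token(serverBlocks, positions));
    }
}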

Securing Cloud from DDOS Attacks Using Intrusion Detection System in Virtual Machine

ABSTRACT

Innovation is necessary to ride the inevitable tide of change. The buzzword of 2009 seems to be "cloud computing", a futuristic platform that provides dynamic resource pools, virtualization, and high availability, and enables the sharing, selection, and aggregation of geographically distributed heterogeneous resources for solving large-scale problems in science and engineering. But as the cloud concept develops, problems are arising from this "golden solution" in the enterprise arena. Preventing intruders from attacking the cloud infrastructure is the only realistic measure that staff, management, and planners can foresee. Regardless of company size or the volume and magnitude of the cloud, this paper explains how a maneuvering IT virtualization strategy could be used in responding to a denial of service attack. After picking up a grossly abnormal spike in inbound traffic, targeted applications could be immediately transferred to virtual machines hosted in another data center. We are not reinventing the wheel. We have plenty of technology and standardized solutions that we can already engineer into the stack. We are just introducing them in the way least expected.
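
The reactive defense described above boils down to spike detection triggering migration; a minimal sketch, with the migration call as a placeholder for whatever virtualization API is actually in use:

// Sketch of the reactive defense: a moving average of inbound request rates
// flags a grossly abnormal spike, which triggers migration of the targeted
// application to a VM in another data center.
public class DdosResponder {
    private double avgRate = 0;
    private static final double SMOOTHING = 0.9;
    private static final double SPIKE_FACTOR = 10.0;

    void observe(double requestsPerSec) {
        if (avgRate > 0 && requestsPerSec > SPIKE_FACTOR * avgRate) {
            migrateToRemoteDataCenter(); // move the targeted app out of the blast radius
        }
        avgRate = SMOOTHING * avgRate + (1 - SMOOTHING) * requestsPerSec;
    }

    private void migrateToRemoteDataCenter() {
        System.out.println("spike detected: transferring application VM offsite");
    }
}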

Service-Centric Framework for a Digital Government Application

ABSTRACT

This paper presents a service-oriented digital government infrastructure focused on efficiently providing customized services to senior citizens. We designed and developed a Web Service Management System (WSMS), called WebSenior, which provides a service-centric framework to deliver government services to senior citizens. The proposed WSMS manages the entire life cycle of third-party web services. These act as proxies for real government services. Due to the specific requirements of our digital government application, we focus on the following key components of WebSenior: service composition, service optimization, and service privacy preservation. These components form the nucleus that achieves seamless cooperation among government agencies to provide prompt and customized services to senior citizens.