Questions for the Close Reading of an Essay

Title:
Author:

Name of Original Publication:
Date of Publication:

1. What is the subject of the essay (one word)?

2. What is the author’s purpose (to inform, persuade, entertain, or something else)?

3. Who is the intended reader (audience): the general public, or a specific type of reader (e.g., a scientist, artist, student, or gamer)?

4. Is the subject of the essay timeless (always important) or stale (dated, too old to be relevant anymore)?

5. Name three keywords the author uses in the essay:

6. Write one sentence that names the subject from #1 and uses the keywords from #5 to describe the essay.

7. What are two critical points that the author makes?

8. What is the author’s point of view on the topic (their stance: for, against, or neutral)? Is it obvious from the tone of their writing (informative/neutral or opinionated)?

9. Name at least three rhetorical strategies the author uses (narrative, description, process, examples, definition, analysis, classification, compare and contrast, analogy, cause and effect) and cite the page number in the text:


10. Is the first sentence a ‘hook’ that makes you want to read more? Does the last sentence leave you with a good impression that the essay was written well?

11. If there is a thesis, what is it?

12. Is the author persuasive? Are the claims supported by evidence, such as facts and personal experience, rather than by a crowd-sourced site like Wikipedia?

13. Is the author credible (an expert, a journalist, or someone writing about their own true personal experience)?

14. What content do you agree with? What content do you disagree with?

15. Did you learn anything from the content (what)?

16. Is the essay easy or difficult to read and understand? What vocabulary words are not familiar?

17. Is it well written, demonstrating the author’s skill as a writer (it flows, keeps your interest, informs)? Why or why not?

18. Would you recommend this essay to a friend? Why?

19. Is this essay memorable? Two years from now, will you recall anything about this essay?

20. What is your one word impression of the essay?

Extracting Authoring Information Based on Keywords and Semantic Search

Faisal Alkhateeb, Amal Alzubi, Iyad Abu Doush

Computer Sciences Department

Yarmouk University, Irbid, Jordan

{alkhateebf,iyad.doush}@yu.edu.jo

Shadi Aljawarneh
Faculty of Science and Information Technology
Al-Isra University, Amman, Jordan
[email protected]

Eslam Al Maghayreh
Computer Sciences Department
Yarmouk University, Irbid, Jordan
[email protected]

ABSTRACT
Many people, in particular researchers, are interested in searching and retrieving authoring information from online authoring databases to be cited in their research projects. In this paper, we propose a novel approach for retrieving authoring information that combines keyword-based and semantic-based approaches. In this approach, the user is interested only in retrieving authoring information considering some specified keywords, and ignores how the internal semantic search is processed. Additionally, this approach exploits the semantics and relationships between different resources for better knowledge-based inference.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Search process

Keywords
Semantic web, RDF, SPARQL, Authoring Information, Keyword Search, Semantic Search

1. INTRODUCTION
The world wide web (or simply the web) has become the first source of knowledge for all life domains. It can be seen as an extensive information system that allows exchanging resources as well as documents. The semantic web is an evolving extension of the web aiming at giving well-defined forms and semantics to web resources (e.g., the content of an HTML web page) [4].

Due to the growth of the semantic web, semantic search has become an attractive approach. The term refers to methods of searching web documents beyond the syntactic level of matching keywords. Exposing metadata is an essential point for a semantic search approach associated with the semantic web. The most important recent development is in the area of embedding metadata directly into web documents. RDF (Resource Description Framework) [15] is a knowledge representation language dedicated to the annotation of resources within the semantic web. Currently, many documents are annotated via RDF due to its simple data model and its formal semantics. For example, it is embedded in (X)HTML web pages using the RDFa language [1], in SMIL documents [7] using RDF/XML [3], etc. SPARQL [17] is a W3C recommendation language developed to query RDF knowledge bases, e.g., retrieving nodes from RDF graphs.

Another approach, found in search engines, is based on using keywords. More precisely, both queries and documents are typically treated at a word or gram level (similar to information retrieval). The search engine misses a semantic-level understanding of the query and can only understand the content of a document by picking out documents with the most commonly occurring keywords.

The objective of this paper is to provide a novel approach for retrieving authoring information that combines keyword-based and semantic-based approaches. In this approach, the user is interested only in retrieving authoring information considering some specified keywords, and ignores how the internal semantic search is processed. In particular, the user is interested in searching authoring information from online authoring information portals (such as DBLP1, ACM2, IEEE3, etc.). For instance: show me all documents of the author "faisal alkhateeb" or the author "jerome euzenat" with a title containing "SPARQL". In the proposed approach, keywords are used for collecting authoring information about the authors, which is then filtered with semantic search (using RDF and SPARQL) based on the semantic relations of the query.

The remainder of the paper is organized as follows: we introduce the research background in Section 2. The combined approach is presented in Section 3, together with a test case illustrating the proposed approach. A review of related work is given in Section 4. Discussion issues drawn from this study are presented in Section 5.

2. RESEARCH BACKGROUND
This section provides an overview of the elements that are necessary for presenting the proposed approach, namely: BibTeX, RDF, and SPARQL.

1 http://www.informatik.uni-trier.de/~ley/db/
2 http://portal.acm.org/portal.cfm
3 http://www.ieee.org/portal/site

2.1 BibTeX
BibTeX4 [16, 10] is a tool and a file format used to describe and process lists of references, mostly in conjunction with LaTeX documents. BibTeX makes it easy to cite sources in a consistent manner, by separating bibliographic information from the presentation of this information. BibTeX uses a style-independent, text-based file format for lists of bibliography items, such as articles, books, and theses. Each bibliography entry contains some subset of standard data entries: author, booktitle, number, organization, pages, title, type, volume, year, institution, and others. Bibliography entries included in a .bib file are split by type. The following types are understood by virtually all BibTeX styles: article, book, booklet, conference, inproceedings, phdthesis, etc.

4 http://www.bibtex.org/

Example 1. The following is an instance of a BibTeX element:

@article{DBLP:AlkhateebBE09,
  author  = {Faisal Alkhateeb and Jean-Francois
             Baget and Jerome Euzenat},
  title   = {Extending SPARQL with regular expression
             patterns (for querying RDF)},
  journal = {J. Web Sem.},
  volume  = {7},
  number  = {2},
  year    = {2009},
  pages   = {57-73},
}

2.2 RDF
RDF is a language for describing resources. In its abstract syntax, an RDF document is a set of triples of the form <subject, predicate, object>.

Example 2. The assertion of the following RDF triples:

{ <ex:person1 foaf:name "Faisal Alkhateeb">,
  <ex:document1 BibTeX:author ex:person1>,
  <ex:document1 rdf:type BibTeX:inproceedings>,
  <ex:document1 BibTeX:title "PSPARQL">,
  <ex:person1 foaf:knows ex:person2>,
  <ex:person2 foaf:name "Jerome Euzenat">,
  <ex:document1 BibTeX:author ex:person2> }

means that there exists an inproceedings document, coauthored by two persons named "Faisal Alkhateeb" and "Jerome Euzenat", whose title is "PSPARQL".

An RDF document can be represented by a directed labeled graph, as shown in Figure 1, where the set of nodes is the set of terms appearing as a subject or object in a triple and the set of arcs is the set of predicates (i.e., if <s, p, o> is a triple, then there is an arc labeled p from s to o).

2.3 SPARQL
SPARQL is the query language developed by the W3C for querying RDF graphs. A simple SPARQL query is expressed using a form resembling the SQL SELECT query:

Figure 1: An RDF graph (its nodes and labeled arcs correspond to the triples of Example 2).

SELECT B FROM u WHERE P

where u is the URL of an RDF graph G to be queried, P is a SPARQL graph pattern (i.e., a pattern constructed over RDF graphs with variables), and B is a tuple of variables appearing in P. Intuitively, an answer to a SPARQL query is an instantiation of the variables of B by the terms of the RDF graph G such that substituting the values for the variables of P yields a subset of the graph G.5

Example 3. Consider the RDF graph of Figure 1, representing some possible authoring information. For instance, the existence of the triples {<ex:document1, rdf:type, BibTeX:inproceedings>, <ex:document1, BibTeX:title, "PSPARQL">} asserts that there exists an inproceedings document whose title is "PSPARQL".

The following SPARQL query modeling this information:

SELECT * FROM <Figure1> WHERE {
  ?document BibTeX:author ?author .
  ?document BibTeX:title "PSPARQL" .
  ?author foaf:name ?name .
}

could be used, when evaluated against the RDF graph of Figure 1, to return the following answers:

#   ?document      ?author      ?name
1   ex:document1   ex:person1   "Faisal Alkhateeb"
2   ex:document1   ex:person2   "Jerome Euzenat"

In RDF there exists a set of reserved words (called RDF Schema, or simply RDFS [6]) designed to describe the relationships between resources and properties, e.g., classA subClassOf classB. It adds additional constraints to the resources associated with the RDFS terms, thus permitting more consequences (reasoning).

Example 4. Using the RDF graph presented in Figure 1, we can deduce the triple <ex:document1 rdf:type BibTeX:publications> from the triples <ex:document1 rdf:type BibTeX:inproceedings> and <BibTeX:inproceedings rdfs:subClassOf BibTeX:publications>. Hence, the following SPARQL query:

SELECT * FROM <Figure1> WHERE {
  ?document rdf:type BibTeX:publications .
  ?document BibTeX:author ?author .
  ?document BibTeX:title "PSPARQL" .
  ?author foaf:name ?name .
}

returns the same set of answers described in Example 3, because an inproceedings is a subclass of publications.

5 When using RDFS semantics [6], this intuitive definition is irrelevant and one could apply RDFS reasoning rules to calculate answers over RDFS documents.

SPARQL provides several result forms other than SELECT that can be used for formatting the query results. For example, a CONSTRUCT query can be used for building an RDF graph from the set of answers to the query. More precisely, an RDF graph pattern (i.e., an RDF graph involving variables) is specified in the CONSTRUCT clause. For each answer to the query, the variable values are substituted in the RDF graph pattern, and the merge of the resulting RDF graphs is computed.6 This feature can be viewed as rules over RDF permitting new relations to be built from the linked data.

Example 5. The following CONSTRUCT query:

CONSTRUCT { ?author BibTeX:coauthorof ?document . }
FROM <Figure1> WHERE {
  ?document BibTeX:author ?author .
  ?document BibTeX:title "PSPARQL" .
  ?author foaf:name ?name .
}

constructs the RDF graph (containing the coauthor relation) by substituting, for each answer found, the values of the variables ?author and ?document, yielding the following graph (as done for SPARQL, we encode the resulting graph in the Turtle language7):

@prefix ex: <http://ex.org/> .

ex:person1 BibTeX:coauthorof ex:document1 .
ex:person2 BibTeX:coauthorof ex:document1 .

6 A definition of the RDF merge operation can be found at http://www.w3.org/TR/2001/WD-rdf-mt-20010925/#merging.
7 http://www.dajobe.org/2004/01/turtle/.

3. METHODOLOGY
The Extracting Authoring Information system that we have implemented is used to achieve the following:

Given: a user query in the form of textual keywords.
Find: a set of BibTeX elements that are relevant to the query.

The proposed methodology consists of the following major phases: connecting to the Google search engine; connecting to the DBLP page and extracting BibTeX elements; converting BibTeX to RDF and keywords to a SPARQL query; and then evaluating the SPARQL query against the RDF document. The first two phases deal with extracting author information based on keyword search, while the third and fourth represent the semantic search. In the following, we present the basic workflow of the system as well as its main components.

3.1 System Workflow
As shown in Figure 2, the system works as follows: the user first enters the keywords to be searched, such as keywords from the author name, the title of the paper, the year of publication, etc. The Google search engine is then used to correct misspelled keywords (in particular, names of the authors) and to find the pages corresponding to the corrected keywords (for instance, the DBLP pages of the author). After that, BibTeX elements are extracted and converted to an RDF document. The corrected keywords are transformed into a SPARQL query that is used to query the RDF document corresponding to the extracted BibTeX elements.

Figure 2: The Basic Flow of the System.

3.2 System Components
The following are the main components of the system:

• Google Search: after the keywords are entered in the corresponding positions, they are passed to a component that connects to the Google engine. That is, the "magic URL" "http://www.google.com/search?hl=ar&q=" + "searchParameters" of the Google search engine is used to search for the specified keywords. This search returns one of two cases:

– correct author name; or

– misspelled author name.

In the second case, the "did you mean" structure is used to reconnect to the Google search engine. This process is repeated until the corresponding author page in the specified authoring database (DBLP, ACM, IEEE, etc.) is found.
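As an illustration only (not the authors' implementation), a minimal Java sketch of this step using the standard java.net.http client might look as follows; extracting the "did you mean" suggestion from the returned HTML depends on Google's page structure and is therefore only stubbed here.

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class GoogleSearch {
    // The "magic URL" prefix quoted above.
    private static final String SEARCH_URL = "http://www.google.com/search?hl=ar&q=";

    /** Fetches the result page for the given search parameters (e.g., an author name). */
    static String search(String searchParameters) throws Exception {
        String url = SEARCH_URL + URLEncoder.encode(searchParameters, StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        return HttpClient.newHttpClient()
                         .send(request, HttpResponse.BodyHandlers.ofString())
                         .body();
    }

    /** Stub: would parse a "did you mean" suggestion out of the HTML, or return null. */
    static String didYouMean(String html) {
        return null; // page-structure dependent; omitted in this sketch
    }
}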

• BibTeX extractor: this component is responsible for extracting the BibTeX elements and saving them in a file for later use. It should be noted that this component contains several methods, each specific to a bibliography database. This is due to the fact that each bibliography database has its own style of including BibTeX elements in authoring web pages. Therefore, we suggest including BibTeX elements in web pages as RDFa annotations8.

8 http://www.w3.org/TR/xhtml-rdfa-primer/

Figure 3: The user interface of the system and the results found.

• BibTeX parser: BibTeX elements are then converted to RDF documents using results from the BibTeX parser that we have implemented in the system. Note that if RDFa is used to annotate BibTeX elements, then there is no need for this parser. In this case, the online RDF distiller9 could be used to extract the RDF documents corresponding to the annotated BibTeX elements from web pages. In addition to the RDF triples that correspond to the BibTeX entries, RDF triples corresponding to RDFS relationships (such as <BibTeX:inproceedings rdfs:subClassOf BibTeX:proceedings> and <BibTeX:booklet rdfs:subClassOf BibTeX:book>) are added to the RDF document to allow reasoning to derive more results.

9 http://www.w3.org/2007/08/pyRDFa/
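To make this conversion step concrete, the following minimal sketch (ours, not the paper's parser; it uses the modern Apache Jena API, and the BibTeX namespace URI is an assumption) emits the RDF triples of Example 2 for one parsed entry, together with an RDFS subclass triple so that queries over superclasses succeed (Example 4):

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

public class BibtexToRdf {
    static final String BIB  = "http://example.org/BibTeX#"; // assumed namespace
    static final String FOAF = "http://xmlns.com/foaf/0.1/";
    static final String EX   = "http://ex.org/";

    public static Model convert() {
        Model m = ModelFactory.createDefaultModel();
        Resource doc    = m.createResource(EX + "document1");
        Resource author = m.createResource(EX + "person1");
        // One triple per BibTeX field, mirroring Example 2.
        doc.addProperty(RDF.type, m.createResource(BIB + "inproceedings"));
        doc.addProperty(m.createProperty(BIB, "title"), "PSPARQL");
        doc.addProperty(m.createProperty(BIB, "author"), author);
        author.addProperty(m.createProperty(FOAF, "name"), "Faisal Alkhateeb");
        // RDFS relationship added to enable the reasoning of Example 4.
        m.createResource(BIB + "inproceedings")
         .addProperty(RDFS.subClassOf, m.createResource(BIB + "publications"));
        return m;
    }
}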

• Keywords to SPARQL query: the entered keywords are also used to build a SPARQL query automatically. The query is then used to filter the results obtained from the keyword-based search. More precisely, when entering keywords, the user selects the type of the data entry to be entered, such as "Title", "Author", "Publication", "Pages", and so on. Note that the user can enter multiple authors. If a keyword begins with an underscore "_", the entered keyword is treated as part of the BibTeX data entry; in this case, the "regex" function can be used in a FILTER constraint when building the SPARQL query. Otherwise, it is considered an exact search for that keyword. Moreover, the user can specify the relationship between the entered keywords (i.e., "or" or "and"). When building the SPARQL query, these relationships correspond to the "UNION" and "AND" SPARQL graph patterns, respectively.
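The mapping just described can be sketched as follows (our illustration with hypothetical helper names, not the system's actual code): an exact keyword becomes a plain triple pattern, a keyword with a leading underscore becomes a regex FILTER, and the "or" connective becomes a UNION of group graph patterns.

public class KeywordToSparql {
    /** Builds the graph pattern for one author keyword, per the underscore rule above. */
    static String authorPattern(String keyword) {
        if (keyword.startsWith("_")) { // partial match: regex in a FILTER constraint
            return "?doc BibTeX:author ?a . FILTER regex(str(?a), \""
                   + keyword.substring(1) + "\", \"i\")";
        }
        return "?doc BibTeX:author \"" + keyword + "\" ."; // exact match
    }

    /** Combines two patterns with the user-selected connective ("or" maps to UNION). */
    static String combine(String p1, String p2, boolean or) {
        return or ? "{ " + p1 + " } UNION { " + p2 + " }" : p1 + " " + p2;
    }

    public static void main(String[] args) {
        String where = combine(authorPattern("faisal alkhateeb"),
                               authorPattern("jerome euzenat"), true);
        System.out.println("SELECT * WHERE { " + where + " }");
    }
}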

• Query evaluator: this component is used to evaluate the SPARQL query (i.e., the query obtained from the entered keywords) against the RDF document (i.e., the RDF document obtained from the file containing the BibTeX elements) to find and construct the precise results. Any query evaluator could be used at this stage10, but we have used Jena11.

10 http://esw.w3.org/topic/SparqlImplementations
11 http://jena.sourceforge.net/
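Since the paper states that Jena is used, the evaluation step could look roughly as follows (a sketch against the modern Apache Jena API; the file name and the BibTeX namespace URI are placeholders):

import org.apache.jena.query.*;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class QueryEvaluator {
    public static void main(String[] args) {
        // RDF document produced from the extracted BibTeX elements (placeholder file).
        Model model = ModelFactory.createDefaultModel().read("bibtex.rdf");
        // SPARQL query produced from the corrected keywords.
        Query query = QueryFactory.create(
            "PREFIX BibTeX: <http://example.org/BibTeX#> "
            + "SELECT * WHERE { ?doc BibTeX:author ?author . ?doc BibTeX:title ?title }");
        try (QueryExecution exec = QueryExecutionFactory.create(query, model)) {
            ResultSet results = exec.execSelect();
            ResultSetFormatter.out(System.out, results, query); // print the answer table
        }
    }
}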

It should be noticed that DBLP provides search capability by allowing users to pose keyword-based queries over its bibliography dataset only. For instance, one can pose the query "alkhateeb|jerome euzenat", which searches for documents matching the keyword "alkhateeb" or "jerome euzenat". The search process in DBLP offers good features, such as a search triggered after each keystroke with instant response times (if the network connection is not slow) and case-insensitive search [2]. However, a misspelled keyword such as "alkhateb" has no hits, while "alkhateeb" returns five documents. Additionally, the semantic relations are neither fully preserved nor well defined. In particular, one can pose the query "alkhateeb|euzenat", yielding 79 documents, while putting a space after the pipe ("alkhateeb| euzenat") yields only 2 documents. Semantic reasoning is not provided (see Example 4). We avoid these limitations in the proposed methodology.

3.3 Test Case

Suppose that the user has entered "faisal alkhateb" as an author, "jerome euzenat" as another author, and "_sparql" as a title in the interface shown in Figure 3, selected DBLP as the search database, and chosen "or" and "and" as the connections between the authors and the title keywords, respectively. Then the query equation will be: ((Author1 or Author2) and Title) = ((faisal alkhateeb or jerome euzenat) and sparql).

The search is done in Google to check whether the author name exists in DBLP or not. In this test case, the Google engine corrects the misspelled author name "faisal alkhateb" and uses "faisal alkhateeb" instead to connect to DBLP with the correct name. Then the BibTeX elements corresponding to the keywords "faisal alkhateeb", "jerome euzenat", and "sparql" are extracted from DBLP:

@article{DBLP:AlkhateebBE09,
  author  = {Faisal Alkhateeb and Jean-Francois
             Baget and Jerome Euzenat},
  title   = {Extending SPARQL with regular expression
             patterns (for querying RDF)},
  journal = {J. Web Sem.},
  volume  = {7},
  number  = {2},
  year    = {2009},
  pages   = {57-73},
}
...

The BibTeX elements are then converted to an RDF document such as the one in Example 2. The corrected keywords are also used to build the following SPARQL query, used to filter the results:

CONSTRUCT {
  ?doc BibTeX:author "Faisal Alkhateeb" .
  ?doc BibTeX:author "Jerome Euzenat" .
  ...
}
FROM <RDF document corresponding to the BibTeX elements>
WHERE {
  { { ?doc BibTeX:author "Faisal Alkhateeb" .
      ?doc BibTeX:title ?title .
      ?doc BibTeX:year ?year .
      ?doc BibTeX:pages ?pages . }
    UNION
    { ?doc BibTeX:author "Jerome Euzenat" .
      ?doc BibTeX:title ?title .
      ?doc BibTeX:year ?year .
      ?doc BibTeX:pages ?pages . } }
  { ?doc BibTeX:title ?title }
  FILTER (regex(?title, "^sparql"))
}

Note that the keyword "_sparql" begins with an underscore "_" and so it is considered to be part of the title, while other keywords such as "faisal alkhateeb" do not, and are considered to be full author names. Note also that the user can specify a range for the publication years; for instance: show me the authoring information between "2004" and "2008". In this case, he/she can enter "2004-2008" in the year field, which is in turn converted to the following part of a SPARQL query:

?document BibTeX:hasyear ?year .
FILTER ((?year >= 2004) && (?year <= 2008))
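A minimal sketch of this year-range rule (our hypothetical helper, not the system's code):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YearRange {
    /** Converts input like "2004-2008" into the SPARQL fragment shown above. */
    static String toFilter(String range) {
        Matcher m = Pattern.compile("(\\d{4})-(\\d{4})").matcher(range);
        if (!m.matches()) {
            return "?document BibTeX:hasyear " + range + " ."; // single exact year
        }
        return "?document BibTeX:hasyear ?year .\n"
             + "FILTER ((?year >= " + m.group(1) + ") && (?year <= " + m.group(2) + "))";
    }
}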

4. RELATED WORK
The literature on combining keyword search with semantic search is rich; in this section we provide a brief overview of some relevant proposals.

Semantic web languages (i.e., RDF and OWL) can be used for knowledge encoding and can be used by services, tools, and applications [11]. The semantic web will enable not only humans but also machines to process web contents. This can help in creating intelligent services, customizing the web, and building more powerful search engines [9].

Traditional search engines use keywords as their search basis. Semantic search applies semantic processing to keywords for better retrieval. Hybrid search utilizes the keyword search of regular search engines along with the ability of semantic search to query and reason using metadata. Using ontologies, search engines can find pages that have different syntax but similar semantics [9].

Hybrid search provides users with more capabilities for searching and reasoning to get better results. According to Bhagdev et al. [5], there are three types of queries that are possible using hybrid search:

• Semantic search using the defined metadata and the relations between instances.
• Regular search using keywords.
• Search for keywords within specific contents.

Kiryakov et al. [14] proposed a system in which the user can select between keyword-based search and ontology-based search, but cannot merge them to obtain search results using the two approaches together.

Another work by Bhagdev et al. [5] introduced a search method that combines ontology-based and keyword-based methods. Their results show that hybrid search gives better performance than keyword search or semantic search alone in real-world cases.

Rocha et al. [18] combined ontology-based information retrieval with regular search in a semantic search technique. They used a spreading-activation algorithm to compute an activation value for the relevance of search results to keywords. The links in the ontology are given weights according to certain properties. The proposed method does not promptly identify the unique concepts and relations.

In another work, Gilardoni et al. [12] provided an integration of keyword-based search with ontology search, but with no capability for Boolean queries.

Hybrid search is implemented by some large companies in industry. Google Product Search12 is a semantic search service from Google that searches for products by linking between different attributes in the knowledge base to retrieve a product. Sheth et al. [19] use keyword queries to apply multi-domain search by automatically classifying and extracting information along with ontology and metadata information.

12 http://www.google.com/products

Guha et al. [13] used a semantic search approach that combines traditional search with other data from distributed sources to answer the user query in more detail. In the work of Davies et al. [8], QuizRDF is introduced: a system that combines the traditional search method with the ability to query and navigate RDF. The system falls short when there is chaining in the query.

5. DISCUSSION
We have presented in this paper an approach for searching and extracting authoring information. The approach is based on a combination of keyword and semantic search. In the keyword-search part, the entered keywords are used to collect authoring information; the Google search engine is used to correct misspelled keywords, in particular the author's name, which allows more results to be obtained. Additionally, ad-hoc routines are used to extract bibliography elements from online databases, so we suggest including BibTeX elements in web pages as RDFa annotations so that standard methods can be exploited. In the semantic part, the SPARQL query obtained from the entered keywords is evaluated against the metadata corresponding to the authoring information, which allows more precise results to be obtained.

6. REFERENCES
[1] Adida, B., and Birbeck, M. RDFa primer: Bridging the human and data webs. Working draft, W3C, 2008. http://www.w3.org/TR/xhtml-rdfa-primer/.
[2] Bast, H., Mortensen, C. W., and Weber, I. Output-sensitive autocompletion search. Inf. Retr. 11, 4 (2008), 269–286.
[3] Beckett, D., and McBride, B. RDF/XML syntax specification (revised). Recommendation, W3C, 2004. http://www.w3.org/TR/rdf-syntax-grammar/.
[4] Berners-Lee, T., Hendler, J., and Lassila, O. The semantic web, 2001. http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21.
[5] Bhagdev, R., Chapman, S., Ciravegna, F., Lanfranchi, V., and Petrelli, D. Hybrid search: Effectively combining keywords and semantic searches. In ESWC (2008), pp. 554–568.
[6] Brickley, D., and Guha, R. RDF vocabulary description language 1.0: RDF Schema. Recommendation, W3C, 2004. http://www.w3.org/TR/rdf-schema/.
[7] Bulterman, D., Grassel, G., Jansen, J., Koivisto, A., Layaïda, N., Michel, T., Mullender, S., and Zucker, D. Synchronized Multimedia Integration Language (SMIL 2.1). Recommendation, W3C, 2005. http://www.w3.org/TR/SMIL/.
[8] Davies, J., and Weeks, R. QuizRDF: Search technology for the semantic web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04), Track 4 (Washington, DC, USA, 2004), IEEE Computer Society, p. 40112.
[9] Decker, S., Melnik, S., van Harmelen, F., Fensel, D., Klein, M., Broekstra, J., Erdmann, M., and Horrocks, I. The semantic web: The roles of XML and RDF. IEEE Internet Computing 4, 5 (2000), 63–73.
[10] Fenn, J. Managing citations and your bibliography with BibTeX. The PracTeX Journal 4 (2006). http://www.tug.org/pracjourn/2006-4/fenn/.
[11] Finin, T., and Ding, L. Search engines for semantic web knowledge. In Proceedings of XTech 2006: Building Web 2.0 (May 2006).
[12] Gilardoni, L., Biasuzzi, C., Ferraro, M., Fonti, R., and Slavazza, P. LKMS: A legal knowledge management system exploiting semantic web technologies. In International Semantic Web Conference (2005), Y. Gil, E. Motta, V. R. Benjamins, and M. A. Musen, Eds., vol. 3729 of Lecture Notes in Computer Science, Springer, pp. 872–886.
[13] Guha, R., McCool, R., and Miller, E. Semantic search. In WWW '03: Proceedings of the 12th International Conference on World Wide Web (New York, NY, USA, 2003), ACM, pp. 700–709.
[14] Kiryakov, A., Popov, B., Terziev, I., Manov, D., and Ognyanoff, D. Semantic annotation, indexing, and retrieval. Web Semantics: Science, Services and Agents on the World Wide Web 2, 1 (2004), 49–79.
[15] Manola, F., and Miller, E. RDF primer. Recommendation, W3C, 2004. http://www.w3.org/TR/rdf-primer/.
[16] Patashnik, O. BibTeXing, 1988. http://ftp.ntua.gr/mirror/ctan/biblio/bibtex/contrib/doc/btxdoc.pdf.
[17] Prud'hommeaux, E., and Seaborne, A. SPARQL query language for RDF. Recommendation, W3C, January 2008. http://www.w3.org/TR/rdf-sparql-query/.
[18] Rocha, C., Schwabe, D., and Aragao, M. P. A hybrid approach for searching in the semantic web. In WWW '04: Proceedings of the 13th International Conference on World Wide Web (New York, NY, USA, 2004), ACM, pp. 374–383.
[19] Sheth, A., Bertram, C., Avant, D., Hammond, B., Kochut, K., and Warke, Y. Managing semantic content for the web. IEEE Internet Computing 6, 4 (2002), 80–87.

Business Continuity and the Banking Industry

By Fabio Arduini and Vincenzo Morabito

DOI: 10.1145/1666420.1666452

Since the September 11th attacks on the World Trade Center,8 the tsunami disaster, and Hurricane Katrina, there has been renewed interest in emergency planning in both the private and public sectors. In particular, as managers realize the size of potential exposure to unmanaged risk, ensuring "business continuity" (BC) is becoming a key task within all industrial and financial sectors (Figure 1).

Aside from terrorism and natural disasters, two main reasons for developing the BC approach in the finance sector have been identified as unique to it: regulations and business specificities.

Regulatory norms are key factors for all financial sectors in every country. Every organization is required to comply with federal/national law in addition to national and international governing bodies. Referring to business decisions, more and more organizations recognize that business continuity could be, and should be, strategic for the good of the business. The finance sector is, as a matter of fact, a sector in which the development of information technology (IT) and information systems (IS) has had a dramatic effect upon competitiveness. In this sector, organizations have become dependent upon technologies that they do not fully comprehend. In fact, banking industry IT and IS are considered production, not support, technologies. As such, IT and IS have supported massive changes in the ways in which business is conducted with consumers at the retail level. Innovations in direct banking would have been unthinkable without appropriate IS. As a consequence, business continuity planning at banks is essential as the industry develops, in order to safeguard consumers and to comply with international regulatory norms. Furthermore, in the banking industry, BC planning is important and at the same time different from other industries, for three other specific reasons highlighted by the Bank of Japan in 2003:

• Maintaining the economic activity of residents in disaster areas,2 by enabling the continuation of financial services during and after disasters, thereby sustaining business activities in the damaged area;

• Preventing widespread payment and settlement disorder,2 or preventing systemic risks, by bounding the inability of financial institutions in a disaster area to execute payment transactions;

• Reducing managerial risks,2 for example, by limiting the difficulties for banks in taking profit opportunities and the damage to their customer reputation.

Business specificities, rather than regulatory considerations, should be the primary drivers of all processes. Even if European (EU) and US markets differ, BC is closing the gap. Progressive EU market consolidation necessitates common rules and is forcing major institutions to share common knowledge on both organizational and technological issues.

The financial sector sees business continuity not only as a technical or risk management issue, but as a driver for any discussion on mergers and acquisitions; the ability to manage BC should also be considered a strategic weapon to reduce the acquisition timeframe and shorten the data center merge, often considered one of the top issues in quick wins and information and communication technology (ICT) budget savings.

Business Continuity Concepts
The evolution of IT and IS has challenged the traditional ways of conducting business within the finance sector. These changes have largely represented improvements to business processes and efficiency, but are not without their flaws, in as much as business disruption can occur due to IT and IS sources. The greater complexity of new IT and IS operating environments requires that organizations continually reassess how best they may keep abreast of changes and exploit them for organizational advantage. In particular, this paper seeks to investigate how companies in the financial sector understand and manage their business continuity problems.

BC has become one of the most important issues in the banking industry. Furthermore, there still appears to be some discrepancy as to the formal definitions of what precisely constitutes a disaster, and there are difficulties in assessing the size of claims in the crises and disaster areas.

One definition of what constitutes a disaster is an incident that leads to the formal invocation of contingency/continuity plans, or any incident which leads to a loss of revenue; in other words, it is any accidental, natural, or malicious event which threatens or disrupts normal operations or services for long enough to significantly cause the failure of the enterprise. It follows then that when referring to the size of claims in the area of organizational crises and disasters, the degree to which a company has been affected by such interruptions is the defining factor.

The definition of these concepts is important because 80% of those organizations which face a significant crisis without either a contingency/recovery or a business continuity plan fail to survive a further year (Business Continuity Institute estimate). Moreover, the BCI believes that only a small number of organizations have disaster and recovery plans and, of those, few have been renewed to reflect the changing nature of the organization.

In observing Italian banking industry practices, there seem to be major differences in preparing and implementing strategies that enhance business process security. Two approaches seem to be prevalent. Firstly, there are those disaster recovery (DR) strategies that are internally and hardware-focused9 and, secondly, there are those strategies that treat the issues of IT and IS security within a wider internal-external, hardware-software framework. The latter deals with IS as an integrating business function rather than as a stand-alone operation. We have labeled this second type the business continuity approach (BCA).

As a consequence, we define BCA as a framework of disciplines, processes, and techniques aiming to provide continuous operation for "essential business functions" under all circumstances.

More specifically, business continuity planning (BCP) can be defined as "a collection of procedures and information" that have been "developed, compiled and maintained" and are "ready to use in the event of an emergency or disaster."6 BCP has been addressed by different contributions to the literature. Noteworthy studies include Julia Allen's contribution on CERT's OCTAVE method,a1 the activities of the Business Continuity Institute (BCI) in defining certification standards and practice guidelines, the EDS white paper on Business Continuity Management,4 and finally, referring to banking, Business Continuity Planning at Financial Institutions by the Bank of Japan.2 This last study illustrates the process and activities for successful business continuity planning in the following steps:

1. Formulating a framework for robust project management, where banks should:
   a. develop basic policy and guidelines for BC planning (basic policy);
   b. develop a study of firm-wide aspects (firm-wide control section);
   c. implement appropriate progress control (project management procedures).
2. Identifying assumptions and conditions for business continuity planning, where banks should:
   a. recognize and identify the potential threats, analyze the frequency of potential threats, and identify the specific scenarios with material risk (disaster scenarios);
   b. focus on continuing prioritized critical operations (critical operations);
   c. target times for the resumption of operations (recovery time objectives).
3. Introducing action plans, where banks should:
   a. study specific measures for business continuity planning (BC measures);
   b. acquire and maintain back-up data (robust back-up data);
   c. determine the managerial resources and infrastructure availability capacity required (procurement of managerial resources);
   d. determine strong time constraints, a contact list, and a means of communication for emergency decisions (decision-making procedures and communication arrangements);
   e. realize practical operational procedures for each department and level (practical manual).
4. Implement a test/training program on a regular basis (testing and reviewing).

Figure 1. 2004 top business priorities in industrial and financial sectors (source: Gartner).

a The Operationally Critical Threat, Asset, and Vulnerability Evaluation Method of CERT. CERT is a center of Internet security expertise, located at the Software Engineering Institute, a federally funded research and development center operated by Carnegie Mellon University.



Business Continuity Aspects
The business continuity approach has three fundamental aspects that can be viewed in a systemic way: technology, people, and process.

Firstly, technology refers to the recovery of mission-critical data and applications contained in the disaster recovery plan (DRP). It establishes technical and organizational measures to face events or incidents with potentially huge impact that, in a worst-case scenario, could lead to the unavailability of data centers. Its development ought to ensure that IT emergency procedures intervene and protect the data in question at company facilities. In the past, this was, whenever it even existed, the only part of the BCP.

Secondly, people refers to the recovery of the employees and physical workspace. In particular, BCP teams should be drawn from a variety of company departments, including personnel, marketing, and internal consultants. Also, the managers of these teams should possess general skills, and they should be partially drawn from business areas other than IT departments. Nowadays this is perceived as essential to real survival, with more emphasis on human assets and value rather than on those hardware and software resources that in most cases are probably protected by backup systems.

Finally, the term process here refers to the development of a strategy for the deployment, testing, and maintenance of the plan. All BCPs should be regularly updated and modified in order to take into consideration the latest kinds of threats, both physical and technological.

Whereas a simple DR approach aims at salvaging those facilities that are salvageable, a BCP approach should have different foci. One of these ought to be treating IT and IS security within a wider internal-external, hardware-software framework where all processes are neither in-house nor subcontracted out, but are a mix of the two, so as to be an integrating business function rather than a stand-alone operation. From this point of view the BCP constitutes a dual approach where management and technology function together.

In addition, the BCP as a global approach must also consider all existing relationships, thus giving value to clients and suppliers, considering the total value chain of the business, and protecting the business both in-house and out.

The BCP proper incorporates the disaster recovery (DR) approach but rejects its exclusive focus upon facilities. It defines the process as essentially business-wide and one which enables competitive and/or organizational advantages.

IT Focus Versus Business Focus as a Starting Point
The starting point for planning processes that an organization will use as its BCP must include an assessment of the likely impact different types of 'incidents' will/would make on the business. As far as financial companies are concerned, IT focus is critical since, as mentioned, new technologies continue to become more and more integral to ongoing financial activities. In addition to assessing the likely impact upon the entire organization, banks must consider the likely effects upon their different business areas. The "vulnerability & business impact matrix" (Figure 2) is a tool that can be used to summarize the inter-linkages between the various information system services, their vulnerability, and the impact on business activities. It is useful in different ways.

To start, the BC approach doesn't focus solely upon IT problems but rather uses a business-wide approach. Given the strategic focus of BCP, an understanding of the relationships between value-creating activities is a key determinant of the effectiveness of any such process. In this way we can define the correct BC perimeter (Figure 2) by trying to extract the maximum value from BCP within a context of bounded rationality and limited resources. What the BCP teams in these organizations have done is focus upon how resources were utilized and how they added to value creation, rather than merely being "support activity" which consumes financial resources unproductively. In addition, the convergence of customer with client technologies also demands that those managing the BCP process are aware of the need to "... expand the contingency role to not merely looking inward but actually looking out." Such a dual focus uncovers the linkages between customer and client which create competitive advantage. Indeed, in cases where clients' business fundamentally depends upon information exchange (for instance, many banks today provide online equity brokerage services), it might be argued that there is a 'virtual value chain' which the BCP team protects, thereby providing the 'market-space' for value creation to take place. Finally, another benefit is that vulnerability and business impact can aid the prioritization of particular key areas.

Figure 2. Vulnerability & business impact matrix.


New and Obsolete Technologies
Today's approach to BCP is focused on well-structured process management and business-driven paradigms. Even if some technology systems seem to be "business as usual," some considerations must be made to avoid any misleading conjecture from an analytical side.

When considering large institutions with systemic impact (not only on their own but on clients' businesses as well), two key objectives need to be considered when facing an event. These have been named RPO (Recovery Point Objective) and RTO (Recovery Time Objective), as shown in Figure 3. RPO deals with how far in the past you have to go to resume a consistent situation; RTO considers how long it takes to resume a standard or regular situation. The definitions of RPO and RTO can change according to data center organization and to how high a level of security and continuity a company wants.

For instance, a dual-site recovery system organization must consider and evaluate three points of view (Figure 3): application availability, the BC process, and the data perspective.

Data are first impacted (RPO) before the crisis event (CE), due to the closest "consistent point" from which to restart. The crisis opening (CO), or declaration, occurs after the crisis event (CE).

"RTO_s," or computing environment restored point, considers the length of time the computing environment needs in order to be restored (for example, when servers, network, etc. are once again available); "RTO_rc," or mission-critical applications restarted point, indicates when the "critical or vital applications" (in rank order) are working once again; "RTO_r," or applications and data restored point, is the point from which all applications and data are restored; but (and it is a big but) "RTO_end," or previous environment restored point, is the true end point, when the previous environment is fully restored (all BC solutions are properly working). Of the utmost importance is that during the period between "RTO_r" and "RTO_end" a second disaster event could be fatal!

Natural risks are also increasing in scope and frequency, both in terms of floods (central Europe, 2002) and hurricanes (U.S., 2005); thus the coining of an actual geographical recovery distance, today considered more than 500 miles. Such distance is forcing businesses and institutions alike to consider a new technological approach and to undertake critical discussion on synchronous-asynchronous data replication: their intervals and quality. Therefore, more complex analysis of RPO and RTO is required.

However, the most important issue from a business point of view, when faced with an imminent and unforeseen disaster, is how to reduce restore or restart time, trying to shrink this window to mere seconds or less. New pushing technologies such as SATA (Serial ATA) and MAID (Massive Array of Idle Disks) are beginning to make some progress in reducing the time problem.

Figure 3. RPO & RTO.

Business Focus Versus Value Chain Focus
The business area selected by the "vulnerability and business impact analysis matrix" should be treated in accordance with the value chain and value system. In addition to assessing the likely disaster impact upon IT departments, organizations should consider disaster impacts over all company departments and their likely effects upon customers. Organizations should avoid the so-called Soccer Star Syndrome.6 In drawing an analogy with the football industry, one recognizes that greater management attention is often focused on the playing field rather than on the unglamorous, but very necessary, locker room and stadium management support activities. Defenders and goalkeepers, let alone the stadium manager, do not get paid at the same level as the star player, yet their functions are just as vital to achieving the overall objectives of the football team. The value chain provides an opportunity to examine the connection between the exciting and the humdrum links that deliver customer value. The evolution of crisis preparations from the IT-focused disaster recovery (DR) solutions towards the BC approach reflects a growing understanding that business continuity depends upon the maintenance of all elements which provide organizational efficiency-effectiveness and customer value, whether directly or indirectly.

Prevention Focus of Business Continuity
A final key characteristic of the BC approach concerns its primary role in prevention. A number of authors have identified that the potential for crises is normal for organizations.7,11 Crisis avoidance requires a strategic approach and a good understanding of both the organization's operating processes and systems and the environment in which it operates.

In the BC approach, a practice organization should develop a BCP culture to eliminate the barriers to the development of crisis prevention strategies. In particular, these organizations should recognize that incidents, such as the New York terrorist attack or the City of London bombings, are merely triggered by external technical causes, and that their effects are largely determined by internal factors that were within the control of their organizations. In these cases a cluster of crises should be identified. Such clusters should be categorized along the axes of internal-external and human/social-technical/economic causes and effects. By adopting a strategic approach, decisions could be made about the extent of exposure in particular product markets or geographical sites. An ongoing change management program could contribute to real commitment from middle managers who, from our first investigation, emerged as key determinants of the success of the BC approach.

Management Support and Sponsorship
BCP success requires the commitment of middle managers. Hence, managers need to avoid considering BCP a costly administrative inconvenience that diverts time away from money-making activities. All organizational levels should be aware of the fact that BCP was developed in partnership between the BCP team and front-line operatives. As a result, strategic business units should own BCP plans. In addition, CEO involvement is key in rallying support for the BCP process.

Two other key elements support the BC approach. Firstly, there is the recognition that responsibility for the process rests with business managers, and this is reinforced through formal appraisal and other reward systems. Secondly, peer pressure is deemed important in getting laggards to assume responsibility and so effect a more receptive culture.

Finally, BCP teams need to regard BCP as a process rather than as a specific end-point.

Conclusion
Although the risk of terrorism and regulations are identified as two key factors for developing a business continuity perspective, we see that organizations need to adopt the BC approach for strategic reasons. The trend to adopt a BC approach is also a proxy for organizational change in terms of culture, structure, and communications. The BC approach is increasingly viewed as a driver to generate competitive advantage in the form of resilient information systems, and as an important marketing characteristic to attract and maintain customers.

Referring to organizational change and culture, the BC approach should be a business-wide approach and not an IT-focused one. It needs supportive measures to be introduced to encourage managers to adhere to the BC idea. Management as a whole should also be confident that the BC approach is an ongoing process and not only an end point that remains static upon completion. It requires changes of key assumptions and values within the organizational structure and culture that lead to a real cultural and organizational shift. This has implications for the role that the BC approach has to play within the strategic management processes of the organization, as well as within the levels of strategic risk that an organization may wish to undertake in its efforts to secure a sustainable competitive or so-called first-mover advantage.

References
1. Allen, J.H. CERT® Guide to System and Network Security Practices. Addison-Wesley Professional, 2001.
2. Bank of Japan. Business Continuity Planning at Financial Institutions, July 2003. http://www.boj.or.jp/en/type/release/zuiji/kako03/fsk0307a.htm
3. Cerullo, V. and Cerullo, J. Business continuity planning: A comprehensive approach. Information Systems Management Journal (Summer 2004).
4. Decker, A. Business continuity management: A model for survival. EDS White Paper, 2004.
5. Dhillon, G. The challenge of managing information security. International Journal of Information Management 1, 1 (2004), 243–244.
6. Elliott, D. and Swartz, E. Just waiting for the next big bang: Business continuity planning in the UK finance sector. Journal of Applied Management Studies 8, 1 (1999), 45–60.
7. Greiner, L. Evolution and revolution as organisations grow. Harvard Business Review (July/August); reprinted in Asch, D. and Bowman, C., Eds., Readings in Strategic Management (London, Macmillan, 1989), 373–387.
8. Lam, W. Ensuring business continuity. IT Professional 4, 3 (2002), 19–25.
9. Lewis, W., Watson, R.T., and Pickren, A. An empirical assessment of IT disaster risk. Comm. ACM 46, 9 (2003), 201–206.
10. McAdams, A.C. Security and risk management: A fundamental business issue. Information Management Journal 38, 4 (2004), 36–44.
11. Pauchant, T.C. and Mitroff, I. Crisis prone versus crisis avoiding organisations: Is your company's culture its own worst enemy in creating crises? Industrial Crisis Quarterly 2, 4 (1998), 53–63.
12. Quirchmayr, G. Survivability and business continuity management. In Proceedings of the 2nd Workshop on Australasian Information Security, Data Mining and Web Intelligence, and Software Internationalisation, ACSW Frontiers (2004).

Vincenzo Morabito ([email protected]) is assistant professor of Organization and Information System at the Bocconi University in Milan where he teaches management information system, information management and organization. He is also Director of the Master of Management Information System System at the Bocconi University.

Fabio Arduini ([email protected]) is responsible for IT architecture and Business Continuity for defining the technological and business continuity statements for the Group according to the ICT department.

© 2010 ACM 0001-0782/10/0300 $10.00

The Anti-Forensics Challenge

Kamal Dahbur [email protected]

Bassil Mohammad [email protected]

School of Engineering and Computing Sciences New York Institute of Technology

Amman, Jordan

ABSTRACT
Computer and network forensics has emerged as a new field in IT that is aimed at acquiring and analyzing digital evidence for the purpose of solving cases that involve the use, or more accurately misuse, of computer systems. Many scientific techniques, procedures, and technological tools have evolved and been effectively applied in this field. On the opposite side, anti-forensics has recently surfaced as a field that aims at circumventing the efforts and objectives of the field of computer and network forensics. The purpose of this paper is to highlight the challenges introduced by anti-forensics, explore the various anti-forensics mechanisms, tools, and techniques, provide a coherent classification for them, and discuss their effectiveness thoroughly. Moreover, this paper highlights the challenges in implementing effective countermeasures against these techniques. Finally, a set of recommendations is presented, together with identified opportunities for further research.

Categories and Subject Descriptors
K.6.1 [Management of Computing and Information Systems]: Projects and People Management – System Analysis and Design, System Development.

General Terms
Management, Security, Standardization.

Keywords
Computer Forensics (CF), Computer Anti-Forensics (CAF), Digital Evidence, Data Hiding.

1. INTRODUCTION
The use of technology is increasingly spreading, covering various aspects of our daily lives. An equal, if not greater, increase is seen in the methods and techniques created with the intention of misusing these technologies, serving varying objectives, be they political, personal, or otherwise. This has clearly been reflected in our terminology as well, where new terms like cyber warfare, cyber security, and cyber crime, among others, were introduced. It is also noticeable that such attacks are getting increasingly more sophisticated, and are utilizing novel methodologies and techniques. Fortunately, these attacks leave traces on the victim systems that, if successfully recovered and analyzed, might help identify the offenders and consequently resolve the case(s) justly and in accordance with applicable laws. For this purpose, new areas of research emerged addressing network forensics and computer forensics, in order to define the foundation, practices, and acceptable frameworks for scientifically acquiring and analyzing digital evidence to be presented in support of filed cases. In response to forensics efforts, anti-forensics tools and techniques were created with the main objective of frustrating forensics efforts and taunting their credibility and reliability.

This paper attempts to provide a clear definition for computer anti-forensics and consolidates various aspects of the topic. It also presents a clear listing of the observed challenges and the possible countermeasures that can be used. The lack of a clear and comprehensive classification for existing techniques and technologies is highlighted, and a consolidation of all current classifications is presented.

Please note that the scope of this paper is limited to computer forensics. Even though it is a related field, network forensics is not discussed in this paper and can be tackled in future work. Also, this paper is not intended to cover specific anti-forensics tools; however, several tools are mentioned to clarify the concepts.

After this brief introduction, the remainder of this paper is organized as follows: section 2 describes the problem space, introduces computer forensics and computer anti-forensics, and provides an overview of current issues in this field; section 3 provides an overview of related work with emphasis on Anti-Forensics goals and classifications; section 4 provides a detailed discussion of Anti-Forensics challenges and recommendations; section 5 presents our conclusions and suggested future work.

2. THE PROBLEM SPACE Rapid changes and advances in technology are impacting every aspect of our lives because of our increased dependence on such systems to perform many of our daily tasks. Achievements in computer technology, in terms of increased machine capabilities, high-speed communication channels, and reduced costs, have made it attainable to the general public. The popularity of the Internet, and consequently of the technology associated with it, has skyrocketed in the last decade (see Table 1 and Figure 1). Internet usage statistics for 2010 clearly show the huge increase in Internet users, who may not necessarily be computer experts or even technology savvy [1].


WORLD INTERNET USAGE AND POPULATION STATISTICS

World Regions            Population (2010 Est.)   Internet Users (Dec. 31, 2000)   Internet Users (Latest Data)   Growth (2000-2010)
Africa                   1,013,779,050            4,514,400                        110,931,700                    2357%
Asia                     3,834,792,852            114,304,000                      825,094,396                    622%
Europe                   813,319,511              105,096,093                      475,069,448                    352%
Middle East              212,336,924              3,284,800                        63,240,946                     1825%
North America            344,124,450              108,096,800                      266,224,500                    146%
Latin America/Caribbean  592,556,972              18,068,919                       204,689,836                    1033%
Oceania/Australia        34,700,201               7,620,480                        21,263,990                     179%
WORLD TOTAL              6,845,609,960            360,985,492                      1,966,514,816                  445%

Table 1. World Internet Usage – 2010 (Reproduced from [1]).

Figure 1. World Internet Usage–2010 (Based on Data from [1])

Unfortunately, some technology users will not use it in a legitimate manner; instead, some may deliberately misuse it. Such misuse can result in many harmful consequences including, but not limited to, major damage to others' systems or denial of service to legitimate users. Regardless of the objectives such "bad guys" might pursue through such misuse (e.g. personal, financial, political or religious purposes), one common goal is the need to avoid detection (i.e. source determination). Therefore, these offenders will exert thought and effort to cover their tracks and avoid any liability or accountability for their damaging actions. Illegal actions (or crimes) that involve a computing system, either as a means to carry out the attack or as a target, are referred to as Cybercrimes [2]; the terms computer crime and cybercrime are used interchangeably to refer to the same thing. A Distributed Denial of Service (DDoS) attack is a good example of a computer crime in which the computing system is used both as a means and as a target.

Fortunately, cybercrimes leave fingerprints that investigators can collect, correlate and analyze to understand what, why, when and how a crime was committed, and consequently, and most importantly, build a good case that can bring the criminals to justice. In this sense, computers can be seen as a great source of evidence. For this purpose, Computer Forensics (CF) emerged as a major area of interest, research and development, driven by the legislative need for a scientifically reliable framework, practices, guidelines, and techniques for forensics activities, from evidence acquisition and preservation through analysis and finally presentation. Computer Forensics can be defined as the process of scientifically obtaining, examining and analyzing digital information so that it can be used as evidence in civil, criminal or administrative cases [2]. A more formal definition of Computer Forensics is "the discipline that combines elements of law and computer science to collect and analyse data from computer systems, networks, wireless communications, and storage devices in a way that is admissible as evidence in a court of law" [3].

To hinder the efforts of Computer Forensics, criminals work doggedly to instigate, develop and promote counter techniques and methodologies, commonly referred to as Anti-Forensics. If we adopt the definition of Computer Forensics (CF) as scientifically obtaining, examining, and analysing digital information to be used as evidence in a court of law, then Anti-Forensics can be defined similarly but in the opposite direction. In Computer Anti-Forensics (CAF), scientific methods are used to frustrate forensics efforts at every stage. This includes preventing, impeding, and/or corrupting the acquisition of the needed evidence, its examination, its analysis, or its credibility; in other words, whatever is necessary to ensure that computer evidence cannot get to, or will not be admissible in, a court of law.

The use of Computer Anti-Forensics tools and techniques is evident and far from being an illusion; criminals' reliance on technology to cover their tracks is not a mere claim, as clearly reflected in recent research on reported and investigated incidents. Based on the 2009-2010 Data Breach Investigations Reports [4][5], investigators found signs of anti-forensics usage in over one third of cases in both 2009 and 2010, with the most common forms being the same in both years. The results show that the overall use of anti-forensics remained relatively flat, with slight movement among the techniques themselves. Figure 2 below shows the types of anti-forensics techniques used (data wiping, data hiding and data corruption) by percentage of breaches. As shown, data wiping is still the most common, because it is supported by many commercial off-the-shelf products, some available even as freeware, that are easy to install, learn and use; data hiding and data corruption remain a distant second and third.

Figure 2. Types of Anti-Forensics – 2010 (Reproduced from [5])

It is important to note that a lack of understanding of what CAF is and what it is capable of may lead to underestimating, or probably overlooking, CAF's impact on the legitimate efforts of CF. Therefore, when dealing with computer forensics, it is important that we address the following questions, among others, related to CAF: Do we really have everything? Is the collected evidence really what was left behind, or only what was intentionally left for us to find? How do we know the CF tool used was not misleading us due to weaknesses in the tool itself? Were these CF tools developed according to proper secure software engineering methodologies? Are these CF tools immune to attacks? What are the recent CAF methods and techniques? This paper attempts to provide some answers to such questions to help develop a proper understanding of the issue.

3. RELATED WORK, CAF GOALS AND CLASSIFICATIONS Even though computer forensics and computer anti-forensics are tightly related, as if they were two faces of the same coin, they have not received the same amount of research. CF received more focus over the past ten years or so because of its relation to other areas like data recovery, incident management and information systems risk assessment. CF is a little older, and therefore more mature, than CAF: it has a consistent definition, a well-defined systematic approach, and a complete set of leading best practices and technology.

CAF, on the other hand, is still a new field, and is expected to mature over time and draw closer to CF. To this end, recent research papers have attempted to introduce several definitions and various classifications, and to suggest some solutions and countermeasures. Some researchers have concentrated more on the technical aspects of CF and CAF software in terms of vulnerabilities and coding techniques, while others have focused primarily on understanding file systems, hardware capabilities, and operating systems. A few other researchers chose to address the issue from an ethical or social angle, such as privacy concerns. Despite the criticality of CAF, it is hard to find a comprehensive study that addresses the subject in a holistic manner by providing a consistent definition, structured taxonomies, and an inclusive view of CAF.

3.1. CAF Goals As stated in the previous section, CAF is a collection of tools and techniques intended to frustrate CF tools and CF investigators' efforts. This field is receiving growing interest and attention as it continues to expose the limitations of currently available computer forensics techniques and to challenge the presumed reliability of common CF tools. We believe, along with other researchers, that advancements in the CAF field will eventually put the necessary pressure on CF developers and vendors to be more proactive in identifying possible vulnerabilities or weaknesses in their products, which consequently should lead to enhanced and more reliable tools.

CAF can have a broad range of goals, including: avoiding detection of event(s); disrupting the collection of information; increasing the time an examiner needs to spend on a case; and casting doubt on a forensic report or testimony. These goals may also include forcing the forensic tool to reveal its presence, using the forensic tool to attack the organization in which it is running, and leaving no evidence that an anti-forensic tool has been run [6].

3.2. CAF Classifications Several classifications for CAF have been introduced in the literature. These taxonomies differ in the criteria used for classification. The following are the most common approaches:

1. Categories Based on the Attacked Target

• Attacking Data: The acquisition of evidentiary data in the forensics process is a primary goal. In this category, CAF techniques seek to complicate this step by wiping, hiding or corrupting evidentiary data.

• Attacking CF Tools: The major focus of this category is the examination step of the forensics process. The objective is to make the examination results questionable, untrustworthy, and/or misleading by manipulating essential information like hashes and timestamps.

• Attacking the Investigator: This category is aimed at exhausting the investigator’s time and resources, leading eventually to the termination of the investigation.

2. CAF Techniques vs. Tactics

This categorization makes a clear distinction between the terms anti-forensics and counter-forensics [7], even though the two terms have been used interchangeably by many others, as the emphasis is usually on technology rather than on tactics.

• Counter-Forensics: This category includes all techniques that target the forensics tools directly to cause them to crash, erase collected evidence, and/or break completely (thus preventing the investigator from using them). Compression bombs are a good example of this category.

• Anti-Forensics: This category includes all technology-related techniques, including encryption, steganography, and alternate data streams (ADS).

3. Traditional vs. Non-Traditional

• Traditional Techniques: This category includes techniques involving data overwriting, cryptography, steganography, and other generic data-hiding approaches.

• Non-Traditional Techniques: As opposed to traditional techniques, these techniques are more creative and pose more risk because they are harder to detect. They include:

o Memory injection, where all malicious activities are carried out in volatile memory.

o Anonymous storage, which utilizes available web-based storage to hide data so that it is not found on local machines.

o Exploitation of CF software bugs, including Denial of Service (DoS) attacks and crashers, amongst others.

4. Categories Based on Functionality

This categorization includes data hiding, data wiping and obfuscation. Attacks against CF processes and tools are considered a separate category under this scheme.

4. CAF CHALLENGES Because Computer Anti-Forensics (CAF) is a relatively new discipline, the field faces many challenges that need to be considered and addressed. In this section, we attempt to identify the most pressing challenges surrounding this area, highlight the research needed to address them, and provide perceptive answers to some of the concerns.

4.1. Ambiguity Aside from having no industry-accepted definition for CAF, studies in this area view anti-forensics differently; this leads to the absence of a clear set of standards or frameworks for this critical area. Consequently, misunderstanding may be an unavoidable end result that could lead to improperly addressing the associated concerns. The current classification schemes, stated above, which mostly reflect each author's viewpoint and probably background, both confirm and contribute to the ambiguity in this field. A classification can only be beneficial if it has clear criteria that assist not only in categorizing the currently known techniques and methodologies but also in properly understanding and categorizing new ones. The attempt to distinguish between the two terms, anti-forensics and counter-forensics, based on technology and tactics is a good initiative but requires more elaboration to avoid unnecessary confusion.

To address the definition issue, we suggest adopting a definition for CAF that is built from our clear understanding of CF. The classification issue can be addressed by narrowing the gaps amongst the different viewpoints in the current classifications and excluding the outliers.

4.2. Investigation Constraints A CF investigation has three main constraints/challenges, namely time, cost and resources. Every CF investigation should be approached as a separate project that requires proper planning, scoping, budgeting and resourcing. If these elements are not properly accounted for, the investigation will eventually fail, with most effort up to the point of failure being wasted. In this regard, CAF techniques and methodologies attempt to attack the time, cost and resource constraints of an investigation project. An investigator may not be able to afford the additional costs or allocate the additional necessary resources. Most importantly, the time factor can play a critical role in the investigation, as evidentiary data might lose value with time and/or allow the suspect(s) the opportunity to cover their tracks or escape. Most, if not all, CAF techniques and methodologies (including data wiping, data hiding, and data corruption) attempt to exploit this weakness. Therefore, proper project management is imperative before and during every CF investigation.

4.3. Integration of Anti-Forensics into Other Attacks Recent research shows an increased adoption of CAF techniques into other typical attacks. The primary purposes of integrating CAF into other attacks are undetectability and deletion of evidence. Two major areas for this threatening integration are malware and botnets [8][9]. Malware and botnets armed with these techniques make investigative efforts labour- and time-intensive, which can lead to overlooking critical evidence, if not abandoning the entire investigation.

4.4. Breaking the Forensics Software CF tools are, of course, created by humans, just like other software systems. Rushing to release their products to market before their competition, companies may unintentionally introduce vulnerabilities into their products. In such cases, software development best practices, which are intended to ensure product quality, might be overlooked, leaving the end product exposed to many known vulnerabilities, such as buffer overflow and code injection. Because CF software is ultimately used to present evidence in courts, the existence of such weaknesses is not tolerable. Hence, all CF software, before being used, must be subjected to thorough security testing that focuses on robustness against data hiding and accurate reproduction of evidence.

The Common Vulnerabilities and Exposures (CVE) database is a great source for updates on vulnerabilities in existing products [10]. Some studies have reported several weaknesses that may result in crashes during runtime, leaving no chance to interpret the evidence [11]. Although some of these weaknesses are still being disputed [12], it is important to be aware that CF tools are not immune to vulnerabilities, and that CAF tools will most likely take advantage of such weaknesses. A good example of a common technique that can cause a CF tool to fail or crash is the "compression bomb", where a file is compressed hundreds of times such that, when a CF tool tries to decompress it, it consumes so many resources that the computer or the tool hangs or crashes.
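To make the compression-bomb risk concrete from the defensive side, here is a minimal Python sketch (our own illustration, not taken from any CF product) that caps the total decompressed size instead of trusting the archive's own headers:

    # Hypothetical guard against compression bombs. The 1 GiB budget is
    # an assumed policy value, not a standard.
    import zipfile

    MAX_TOTAL_BYTES = 1 << 30   # assumed decompression budget (1 GiB)
    CHUNK = 64 * 1024

    def safe_extract_sizes(path):
        """Stream-decompress each member, aborting once the budget is spent."""
        total = 0
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                with zf.open(info) as member:
                    while True:
                        chunk = member.read(CHUNK)
                        if not chunk:
                            break
                        total += len(chunk)
                        if total > MAX_TOTAL_BYTES:
                            raise ValueError(
                                "decompressed size exceeds budget; "
                                "possible compression bomb: " + info.filename)
        return total

A tool that streams against a budget in this way degrades gracefully on a bomb instead of exhausting memory or disk.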

4.5. Privacy Concerns Increasingly, users are becoming aware that merely deleting a file does not make it really disappear from the computer and that it can be retrieved by several means. This awareness is driving the market for software solutions that provide safe and secure means of file deletion. Such tools are marketed as "privacy protection" software and claim the ability to completely remove all traces of information concerning a user's activity on a system: websites, images and downloaded files. Some of these tools do not only provide protection through secure deletion but also offer encryption and compression. Moreover, these tools are easy to use, and some can even be downloaded for free. WinZip is a popular tool that offers encryption, password protection, and compression. Such tools will most definitely complicate the search for, and acquisition of, evidence in any CF investigation because they make the whole process more time- and resource-consuming.
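The core overwrite-then-delete idea behind such secure-deletion tools can be sketched in a few lines of Python (a simplified illustration, not any vendor's algorithm; journaling file systems and SSD wear leveling can retain copies that a simple overwrite never touches, which is one reason these tools are later reported to underperform):

    # Simplified sketch of "secure deletion": overwrite, then unlink.
    import os

    CHUNK = 1024 * 1024

    def overwrite_and_delete(path, passes=3):
        """Overwrite a file's contents with random bytes, then remove it."""
        size = os.path.getsize(path)
        with open(path, "r+b") as f:
            for _ in range(passes):
                f.seek(0)
                remaining = size
                while remaining > 0:
                    n = min(CHUNK, remaining)
                    f.write(os.urandom(n))
                    remaining -= n
                f.flush()
                os.fsync(f.fileno())   # push the overwrite to disk
        os.remove(path)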

Privacy issues in relation to CF have been the subject of detailed research in an attempt to define appropriate policies and procedures that would maintain users' privacy when excessive data is acquired for forensics purposes [13].

4.6. Nature of Digital Evidence CF investigations rely on two main assumptions to be successful: (1) the data can be acquired and used as evidence, and (2) the results of the CF tools are authentic, reliable, and believable. The first assumption highlights the importance of digital evidence as the basis for any CF investigation; while the second assumption highlights the critical role of the trustworthiness of the CF tools in order for the results to stand solid in courts.

Digital evidence is more challenging than physical evidence because it is more susceptible to being altered, hidden, removed, or simply made unreadable. Several techniques can be utilized to achieve such undesirable objectives, complicating the acquisition of evidentiary digital data and thus compromising the first assumption.

CF tools rely on many techniques that attest to their trustworthiness, including but not limited to hashing, timestamps, and signatures during the examination, analysis and inspection of source files. CAF tools can in turn utilize new advances in technology to break such authentication measures, and thus compromise the second assumption.

The following is a brief explanation of some of the techniques that are used to compromise these two assumptions:

• Encryption is used to make data unreadable. This is one of the most challenging techniques, as advances in encryption algorithms and tools have enabled it to be applied to an entire hard drive, selected partitions, or specific directories and files. In all cases, an encryption key is needed to reverse the process and decrypt the desired data, and that key is usually unknown to the investigator. To complicate matters, decryption using brute-force techniques becomes infeasible when long keys are used (a back-of-the-envelope estimate follows this list). More success in this regard might be achieved with keyloggers or volatile memory acquisition.

• Steganography aims at hiding data by embedding it into another digital form, such as images or videos. Commercial steganalysis tools, which can detect hidden data, exist and can be utilized to counter steganography. Encryption and steganography can be combined to both obscure data and make it unreadable, which can extremely complicate a CF investigation.

• Secure-Deletion removes the target data completely from the source system by overwriting it with random data, rendering the target data unrecoverable (a minimal overwrite sketch appears in Section 4.5 above). Fortunately, most of the available commercial secure-deletion tools tend to underperform and thus miss some data [14]. More research is needed in this area to understand the weaknesses and identify the signatures of such tools; such information is needed to detect their operation and minimize their impact.

• Hashing is used by CF tools to validate the integrity of data. A hashing algorithm accepts a variable-size input, such as a file, and generates a fixed-size value that corresponds to the given input; this output serves as a fingerprint of the input file. Any change in the original file, no matter how minor, results in a considerable change in the hash value produced. A key feature of hashing algorithms is irreversibility: having the hash value in hand does not allow recovery of the original input. Another key feature is uniqueness, meaning that, for practical purposes, the hash values of two files are equal only if the files are identical. Many hashing algorithms have been developed, and some, most notably MD5, have already been broken; other algorithms, such as MD6, SHA-1, and SHA-2, are harder to break, though all remain potentially vulnerable as technology and research advance [15]. Research is also necessary in the other direction, to enhance the capabilities of CF tools in this regard and maintain their credibility (a short fingerprinting sketch follows this list).

• Timestamps are associated with files and are critical for establishing the chain of events during a CF investigation; the timeline of events is contingent on their accuracy. CAF tools provide the capability to modify the timestamps of files or logs, which can mislead an investigation and skew its conclusions. Many tools currently exist on the market, some freely available, that make it easy to manipulate timestamps, such as Timestamp Modifier and SKTimeStamp [16] (a manipulation sketch follows this list).

• File Signatures, also known as magic numbers, are constant known values at the beginning of each file that identify the file type (e.g. image file, word document, etc.). Hexadecimal editors, such as WinHex, can be used to view and inspect these values. Forensics investigators rely on them to search for evidence of a certain type. When a file extension is changed, the actual file type is not changed, and thus the file signature remains the same. CAF tools intentionally change file signatures in an attempt to mislead investigations, causing some evidence files to be overlooked or dismissed. A complete listing of file signatures can be found on the web [17] (a signature-sniffing sketch follows this list).

• CF Detection is simply the capability of CAF tools to detect the presence of CF software and its activities or functionality. The Self-Monitoring, Analysis and Reporting Technology (SMART) built into most hard drives reports the total number of power cycles (Power_Cycle_Count), the total time a hard drive has been in use (Power_On_Hours or Power_On_Minutes), a log of high temperatures the drive has reached, and other manufacturer-determined attributes. These counters can be reliably read by user programs and cannot be reset. Although the SMART specification defines a DISABLE command (SMART 96), experimentation indicates that the few drives that actually implement it continue to keep track of the time-in-use and power-cycle counts and make this information available after the next power cycle. CAF tools can read SMART counters to detect attempts at forensic analysis and alter their behavior accordingly. For example, a dramatic increase in Power_On_Minutes might indicate that the computer's hard drive has been imaged [18].

• Business Needs: Cloud Computing (CC) is a business model typically suited to small and medium enterprises (SMEs) that do not have enough resources to invest in building their own IT infrastructure; they tend to outsource this to third parties who in turn lease their infrastructure, and probably applications, as services. This new model introduces more challenges to CF investigations, mainly because the data is on the cloud (i.e. hosted somewhere in the Internet space), may be transferred across countries with different regulations, and, most importantly, might reside on a machine that hosts data instances of other enterprises. In some instances, the data for the same enterprise might even be stored across multiple data centres [19][20]. These issues make the CF's primary functions (i.e. data acquisition, examination, and analysis) needed to build a good case extremely difficult.
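Following up on the Encryption bullet above, a back-of-the-envelope Python sketch (our illustration; the guess rate of 10**12 keys per second is an assumed figure) of why brute-forcing long keys is infeasible:

    # Hypothetical arithmetic only; the guess rate is an assumption.
    SECONDS_PER_YEAR = 31_557_600  # Julian year

    def years_to_exhaust(key_bits, guesses_per_second=10**12):
        """Worst-case years needed to try every key of the given length."""
        return 2 ** key_bits / guesses_per_second / SECONDS_PER_YEAR

    # years_to_exhaust(56)  -> about 0.002 years (roughly 20 hours)
    # years_to_exhaust(128) -> about 1.1e19 years, i.e., infeasible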
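Following up on the Hashing bullet, a minimal Python sketch of how a CF tool might fingerprint an evidence file. SHA-256 is our example choice; the discussion above does not prescribe a particular algorithm:

    import hashlib

    def file_fingerprint(path, algo="sha256"):
        """Stream a file through a hash to produce an integrity fingerprint."""
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(65536), b""):
                h.update(block)
        return h.hexdigest()

    # Flipping a single bit in the file yields a completely different
    # digest, which is the property CF tools rely on to detect tampering.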
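Following up on the Timestamps bullet, a hedged sketch, using only the Python standard library, of how trivially access and modification times can be altered (creation times and full NTFS MACE manipulation require lower-level, platform-specific APIs):

    import os
    import time

    def backdate(path, days=365):
        """Set a file's atime and mtime to a point in the past."""
        past = time.time() - days * 86400
        os.utime(path, (past, past))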
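Following up on the File Signatures bullet, a small sketch that identifies a file by its magic number rather than its extension. The four signatures shown (JPEG, PNG, PDF, ZIP) are well known; real tools carry far larger tables, such as the listing cited in [17]:

    # Illustrative sketch; real CF tools use much larger signature sets.
    MAGIC = {
        b"\xff\xd8\xff": "jpeg",
        b"\x89PNG\r\n\x1a\n": "png",
        b"%PDF": "pdf",
        b"PK\x03\x04": "zip",
    }

    def sniff_type(path):
        """Guess a file's real type from its leading bytes."""
        with open(path, "rb") as f:
            header = f.read(16)
        for signature, kind in MAGIC.items():
            if header.startswith(signature):
                return kind
        return "unknown"

    # A JPEG renamed to report.txt still returns "jpeg", which is why
    # investigators trust signatures over extensions.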

4.7 Recommendations Based on our findings, we see room for improvement in the field of CAF that can address some of the issues surrounding it. We believe that these recommendations, when adopted and/or implemented properly, can add value and consolidate the efforts to advance the field. Below is a list and brief explanation of the recommendations:

a) Spend More Effort to Understand CAF More effort should be spent to reach an agreed-upon, comprehensive definition for CAF that would assist in developing a better understanding of the concepts in the field. These efforts should also extend to developing acceptable best practices, procedures and processes that constitute a proper framework, or standard, that professionals can use and build upon. CAF classifications also need to be integrated, clarified, and formulated on well-defined criteria. Such foundational efforts would eventually assist researchers and experts in addressing the issues and mitigating the associated risks.

Awareness of CAF techniques and their capabilities will prevent, or at least reduce, their success and consequently their impact on CF investigations. Knowledge in this area should encompass both techniques and tactics. Continued education and research are necessary to stay abreast of the latest developments in the field and to be ready with appropriate countermeasures when necessary.

b) Define Laws that Prohibit Unjustified Use of CAF The existence of strict and clear laws that detail the obligations and the consequences of violations can play a key deterrent role against the destructive use of these tools. When someone knows in advance that having certain CAF tools on one's machine might be questioned and could pose liabilities, one would probably have second thoughts about installing such tools.

Commercial non-specialized CAF tools, which are the more commonly used, typically leave easily detectable fingerprints and signatures. They also sometimes fail to fulfil their developers' promises of deleting all traces of data. This can later be used as evidence against a suspected criminal and can lead to an indictment; the proven unjustified use of CAF tools can be used as supporting incriminatory evidence in court in some countries [21].

To address privacy concerns, such as users' need to protect personal data like family pictures or videos, an approved list of authorized software could be compiled with known fingerprints, signatures and special recovery keys. Such information, especially the recovery keys, would then be safeguarded in the possession of the proper authorities and used strictly to reverse the effects of CAF tools through the appropriate judicial processes.

c) Utilize Weaknesses of CAF Software In some cases, digital evidence can still be recovered if a data-wiping tool is poorly used or functions improperly. Hence, each CAF software package must be carefully examined and continuously analyzed in order to fully understand its exact behaviour and determine its weaknesses and vulnerabilities [14][22]. This helps in developing the appropriate course of action for different possible scenarios and circumstances, and can prove valuable in saving time and resources during an investigation.

d) Harden CF Software CAF and CF thrive on each other's weaknesses. To ensure justice, CF must always strive to be more advanced than its counterpart. This can be achieved by conducting security and penetration tests to verify that the software is immune to external attacks. Also, it is imperative not to submit to market pressure by rapidly releasing products without proper validation; the best practices of software development must not be overlooked under any circumstances. When vulnerabilities are identified, proper fixes and patches must be tested, verified and deployed promptly in order to avoid zero-day attacks.

5. CONCLUSION AND FUTURE WORK

5.1. Conclusion Computer Anti-Forensics (CAF) is an important developing area of technology. Because CAF success means that digital evidence will not be admissible in courts, Computer Forensics (CF) must evaluate its techniques and tactics very carefully. Also, CF efforts must be integrated and expedited to narrow the currently existing gap with CAF. It is important to agree on an acceptable definition and classification for CAF, which will assist in implementing proper countermeasures. Current definitions and classifications all seem to concentrate on specific aspects of CAF without truly providing the needed holistic view.

It is very important to realize that CAF is not only about tools used to delete, corrupt, or hide evidence. CAF is a blend of techniques and tactics that utilize technological advancements in areas like encryption and data overwriting, amongst others, to obstruct investigators' efforts.

Many challenges exist and need to be carefully analyzed and addressed. In this paper we attempted to identify some of these challenges and to suggest some recommendations that, if applied properly, might mitigate the risks.

5.2. Future Work This paper provides a solid foundation for future work that can further elaborate on the various highlighted areas. It suggests a definition for CAF that is closely aligned with CF and presents several classifications that we deem acceptable. It also discusses several challenges that can be addressed in future research. CAF technologies, techniques, and tactics need to receive more attention in research, especially in the areas of debate around hashes, timestamps, and file signatures.

Research opportunities in Computer Forensics, Network Forensics, and Anti-Forensics can use the work presented in this paper as a base. Privacy concerns and other issues related to the forensics field constitute a largely unexplored domain that requires serious consideration and analysis. Cloud computing, virtualization, and related laws and regulations are topics that can be considered in future research.

6. REFERENCES
[1] Corey Thuen, University of Idaho, "Understanding Counter-Forensics to Ensure a Successful Investigation". http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138.2196
[2] Internet Usage Statistics, "The Internet Big Picture: World Internet Users and Population Stats". http://www.internetworldstats.com/stats.htm
[3] Bill Nelson, Amelia Phillips, and Steuart, "Guide to Computer Forensics and Investigations", 4th Edition, pp. 2-3.
[4] US-Computer Emergency Readiness Team (US-CERT), "Computer Forensics", 2008.
[5] Verizon Business, "2009 Data Breach Investigations Report", a study conducted by the Verizon RISK Team in cooperation with the United States Secret Service. http://www.verizonbusiness.com/about/news/podcasts/1008a1a3-111=129947--Verizon+Business+2009+Data+Breach+Investigations+Report.xml
[6] Verizon Business, "2010 Data Breach Investigations Report", a study conducted by the Verizon RISK Team in cooperation with the United States Secret Service. http://www.verizonbusiness.com/resources/reports/rp_2010-data-breach-report_en_xg.pdf?&src=/worldwide/resources/index.xml&id=
[7] Simson Garfinkel, "Anti-Forensics: Techniques, Detection and Countermeasures", 2nd International Conference on i-Warfare and Security, p. 77, 2007.
[8] W. Matthew Hartley, "Current and Future Threats to Digital Forensics", ISSA Journal, August 2007.
[9] Murray Brand, "Forensics Analysis Avoidance Techniques of Malware", Edith Cowan University, Australia, 2007.
[10] "Security 101: Botnets". http://www.secureworks.com/research/newsletter/2008/05/
[11] Common Vulnerabilities and Exposures (CVE) database. http://cve.mitre.org/
[12] Tim Newsham, Chris Palmer, and Alex Stamos, "Breaking Forensics Software: Weaknesses in Critical Evidence Collection", iSEC Partners (http://www.isecpartners.com), 2007.
[13] Guidance Software, Computer Forensics Solutions and Digital Investigations. http://www.guidancesoftware.com/
[14] S. Srinivasan, "Security and Privacy vs. Computer Forensics Capabilities", ISACA Online Journal, 2007.
[15] Matthew Geiger, Carnegie Mellon University, "Evaluating Commercial Counter-Forensic Tools", Digital Forensic Research Workshop (DFRWS), 2005.
[16] Xiaoyun Wang and Hongbo Yu, Shandong University, China, "How to Break MD5 and Other Hash Functions", EUROCRYPT 2005, pp. 19-35, May 2005.
[17] "How to Change the Timestamp of a File in Windows". http://www.trickyways.com/2009/08/how-to-change-timestamp-of-a-file-in-windows-file-created-modified-and-accessed/
[18] File Signature Table. http://www.garykessler.net/library/file_sigs.html
[19] S. McLeod, "SMART Anti-Forensics". http://www.forensicfocus.com/smart-anti-forensics
[20] Stephen Biggs and Stilianos, "Cloud Computing Storms", International Journal of Intelligent Computing Research (IJICR), Volume 1, Issue 1, March 2010.
[21] U. Gurav and R. Shaikh, "Virtualization – A Key Feature of Cloud Computing", International Conference and Workshop on Emerging Trends in Technology (ICWET 2010), Mumbai, India.
[22] U.S. v. Robert Johnson – Child Pornography Indictment. http://news.findlaw.com/hdocs/docs/chldprn/usjhnsn62805ind.pdf
[23] United States of America v. H. Marc Watzman. http://www.justice.gov/usao/iln/.../2003/watzman.pdf
[24] Mark Whitteker, "Anti-Forensics: Breaking the Forensics Process", ISSA Journal, November 2008.
[25] Gary C. Kessler, "Anti-Forensics and the Digital Investigator", Champlain College, USA.
[26] Ryan Harris, "Arriving at an anti-forensics consensus: Examining how to define and control the anti-forensics problem". www.elsevier.com/locate/diin

Appendix A: Anti-Forensics Tools

The following is a list of some commercial CAF software packages available on the market. The tools listed below are intended as examples; none of these tools were purchased or tested as part of this work.

Category                     Tool Name
Privacy and Secure Deletion  Privacy Expert; SecureClean; PrivacyProtection; Evidence Eliminator; Internet Cleaner
File and Disk Encryption     TrueCrypt; PointSec; WinZip 14
Timestamp Modifiers          SKTimeStamp; Timestamp Modifier; Timestomp
Others                       The Defiler's Toolkit (Necrofile and Klismafile); Metasploit Anti-Forensic Investigation Arsenal (known affectionately as MAFIA)

Download and read the following articles available in the ACM Digital Library:

Arduini, F., & Morabito, V. (2010, March). Business continuity and the banking industry. Communications of the ACM, 53(3), 121-125.

Dahbur, K., & Mohammad, B. (2011). The anti-forensics challenge. Proceedings from ISWSA '11: International Conference on Intelligent Semantic Web-Services and Applications. Amman, Jordan. 

Write a five to seven (5-7) page paper in which you:

1. Consider that Data Security and Policy Assurance methods are important to the overall success of IT and Corporate data security.

     a. Determine how defined roles of technology, people, and processes are necessary to ensure resource allocation for business continuity.

     b. Explain how computer security policies and data retention policies help maintain user expectations of levels of business continuity that could be achieved.

     c. Determine how acceptable use policies, remote access policies, and email policies could help minimize any anti-forensics efforts. Give an example with your response.

2. Suggest at least two (2) models that could be used to ensure business continuity and ensure the integrity of corporate forensic efforts. Describe how these could be implemented.

3. Explain the essentials of defining a digital forensics process and provide two (2) examples of how a forensic recovery and analysis plan could assist in improving the Recovery Time Objective (RTO) as described in the first article.

4. Provide a step-by-step process that could be used to develop and sustain an enterprise continuity process.

5. Describe the role of incident response teams and how these accommodate business continuity.

6. There are several awareness and training efforts that could be adopted in order to prevent anti-forensic efforts.

     a. Suggest two (2) awareness and training efforts that could assist in preventing anti-forensic efforts.

     b. Determine how having a knowledgeable workforce could provide a greater level of secure behavior. Provide a rationale with your response.

     c. Outline the steps that could be performed to ensure continuous effectiveness.

7. Use at least three (3) quality resources in this assignment. Note: Wikipedia and similar Websites do not qualify as quality resources.

Your assignment must follow these formatting requirements:

· Be typed, double spaced, using Times New Roman font (size 12), with one-inch margins on all sides; citations and references must follow APA or school-specific format. Check with your professor for any additional instructions.

· Include a cover page containing the title of the assignment, the student’s name, the professor’s name, the course title, and the date. The cover page and the reference page are not included in the required assignment page length. 

The specific course learning outcomes associated with this assignment are:

· Describe and apply the 14 areas of common practice in the Department of Homeland Security (DHS) Essential Body of Knowledge.

· Describe best practices in cybersecurity.

· Explain data security competencies to include turning policy into practice.

· Describe digital forensics and process management.

· Evaluate the ethical concerns inherent in cybersecurity and how these concerns affect organizational policies.

· Create an enterprise continuity plan.

· Describe and create an incident management and response plan.

· Describe system, application, network, and telecommunications security policies and response.

· Use technology and information resources to research issues in cybersecurity.

· Write clearly and concisely about topics associated with cybersecurity using proper writing mechanics and technical style conventions.

Analyzing Texts Essay Two

1. Purpose

In 50 Essays, in the "Table of Contents by Theme," select one of the readings listed under the "Identity" category (but not the Brent Staples reading). (I have selected one and took photos for you.) Write a critique, similar to a book review, recommending whether the reading should or should not be included in the next edition (the 5th) of 50 Essays. It is best to select a reading about which you feel strongly positive or negative.

2. Requirements – This assignment has three parts: 4a, 4b and 4c.

A. Preliminary. Download and save the form "Questions for the Close Reading of an Essay". Complete the form for your selected reading.

B. Essay Content. The first paragraph of your essay must clearly state whether the reading should or should not be included in the next edition and the primary reason why. This is your thesis. Do not research the topic or the author. Instead, analyze and critique the reading on its own merit. Use the answers from the “Questions for the Close Reading of an Essay” form to build your essay. Use at least two quotations from the reading. Is the reading appropriate for college students in an English composition and rhetoric class? Look at the syllabus for our class. Does the reading support some of the ‘Course Objectives and Learning Outcomes’?

C. Essay Format. Use MLA format to write at least three pages: two pages of textual analysis plus the Works Cited page, which contains only your selected reading, using the MLA format for a 'work in an anthology' (#7) on page 123 of The Little Seagull Handbook. Do not write in the first person 'I'.

D. Submit your draft.

E. Make revisions to your essay based on comments from other students in the Peer Workshop. Upload your final version to 4c by the due date. Note that your essay will be submitted to “Turnitin.”

3. Resources

A. Use book reviews (NY Times, GoodReads, Kirkus) as examples. See also the Seagull website wwnorton.com/write/little-seagull-handbook under "Model Student Papers." Look at the example under the "Evaluation" category, which is a movie review and similar to a book or essay review.

B. Consider my comments and any problems that were identified on essay #1. I will consider progress and improvement when grading.

E. After submitting your essay, look at your Turnitin report and its color. It should be blue or green and less than 10%. If not, revise your document and resubmit by the due date.
