
Structural Knowledge Discovery Used to Analyze Earthquake Ac(2)


The Subdue structural discovery system is being used as the Data Mining tool to study the "Orizaba Fault" located in Mexico, as part of a research project of the geologist Dr. Burke Burkart. We analyze the information of the Earthquake Database

hierarchical description of the input data where later substructures are defined in terms of substructures discovered on previous iterations.

There are other components that make Subdue more powerful. We can specify predefined substructures that Subdue looks for in the data. This allows Subdue to use previous knowledge as a starting point and guide the discovery process. Subdue uses an inexact graph match technique so that instances of substructures that are slightly different can be matched. We can also iterate Subdue’s discovery process in order to find more substructures in new iterations that might contain substructures found in previous iterations. Figure 1 shows a simple example of Subdue’s operation. Subdue finds four instances of the triangle-on-square substructure in the geometric figure. The graph representation used to describe the substructure, as well as the input graph, is shown in the middle.

Figure 1: Subdue’s Example (legend: vertices are objects or attributes; edges are relationships; 4 instances of the substructure are marked)
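The graph representation described above can be illustrated with a minimal sketch. The vertex names, the "shape" and "on" edge labels, and the extra circle and rectangle are assumptions made for illustration, not Subdue's input format. The matcher below checks exact matches only; it does not reproduce Subdue's search or its inexact graph match.

```python
from typing import List, Tuple

# A labeled edge: (source vertex, edge label, target vertex).
Edge = Tuple[str, str, str]

# Hypothetical toy input: four triangle-on-square pairs plus two other shapes.
edges: List[Edge] = []
for i in range(1, 5):
    edges.append((f"t{i}", "shape", "triangle"))  # attribute vertex via a "shape" edge
    edges.append((f"s{i}", "shape", "square"))
    edges.append((f"t{i}", "on", f"s{i}"))        # relationship between two objects
edges += [
    ("c1", "shape", "circle"),
    ("r1", "shape", "rectangle"),
    ("c1", "on", "r1"),
    ("s1", "on", "r1"),
]


def find_instances(graph: List[Edge], upper: str, lower: str) -> List[Tuple[str, str]]:
    """Return object pairs (a, b) where a has shape `upper`, b has shape `lower`, and a is on b."""
    shape_of = {src: dst for src, label, dst in graph if label == "shape"}
    return [
        (src, dst)
        for src, label, dst in graph
        if label == "on" and shape_of.get(src) == upper and shape_of.get(dst) == lower
    ]


instances = find_instances(edges, "triangle", "square")
print(len(instances))  # 4
```

Storing attributes as vertices reached through labeled edges keeps objects and their properties in one uniform structure, which is what lets a substructure such as triangle-on-square be expressed as a small graph of its own.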

The Earthquake Database

The earthquake database contains information collected from several catalogs (gs.gov). These catalogs were provided by sources like the National Geophysical Data Center of the National Oceanic and Atmospheric Administration (NOAA). The database has records of earthquakes from 2000 B.C. through the current week. An earthquake record consists of 35 fields: source catalog, date, time, latitude, longitude, magnitude, intensity, and seismic-related information such as cultural effects, isoseismal map, geographic region, and stations used for the computations. Earthquakes of magnitude below 1.0 are not stored in the database; most of the magnitudes of earthquakes range from 2.5 to 9.5.

There are some differences between catalogs, e.g. it is possible to find the same earthquake with a slightly different epicenter or magnitude in two catalogs. This is due to the methods and instruments used to compute the data. For example, epicenters and magnitudes are currently calculated with computer programs using seismographic data. The problem is that the computer programs contain assumptions about the earth in the formulae they use. If those assumptions are violated, then the results can be different.

The size of the Earthquake database is extremely large (e.g. 2.2 MB only for 1995 data), so we could not use all the information in our experiments; we just used subsets of information corresponding to periods of time between 6 months and 1 year. We created a relational database containing the earthquake information (the 35 fields). This eased the extraction of information for the experiments, because we can use SQL queries to extract the desired subset of the database. We use the Data Mining approach instead of queries because we do not pre-set the information to be included in the result. This means that we cannot prepare a query that can uncover novel structural patterns in the same way as the Subdue system.
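As a rough illustration of the extraction step, the sketch below selects a six-month subset with a SQL query. The table name `earthquakes`, its five columns, and the sample rows are assumptions made for this example; the actual schema has 35 fields and is not given in the text.

```python
import sqlite3

# In-memory database with a hypothetical, heavily reduced schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE earthquakes ("
    "year INTEGER, month INTEGER, latitude REAL, longitude REAL, magnitude REAL)"
)

# Made-up rows, for illustration only.
rows = [
    (1995, 1, 18.9, -97.1, 4.2),
    (1995, 5, 19.0, -97.3, 3.8),
    (1995, 9, 18.7, -96.9, 5.1),
    (1994, 3, 18.8, -97.0, 4.0),
]
conn.executemany("INSERT INTO earthquakes VALUES (?, ?, ?, ?, ?)", rows)


def extract_subset(db: sqlite3.Connection, year: int, first_month: int, last_month: int):
    """Return the events of one year whose month lies in [first_month, last_month]."""
    cursor = db.execute(
        "SELECT year, month, latitude, longitude, magnitude "
        "FROM earthquakes "
        "WHERE year = ? AND month BETWEEN ? AND ? "
        "ORDER BY month",
        (year, first_month, last_month),
    )
    return cursor.fetchall()


subset = extract_subset(conn, 1995, 1, 6)
print(len(subset))  # 2
```

A query like this only narrows the data by conditions fixed in advance; the pattern search itself is left to the discovery system.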

Earthquake Database Knowledge Representation

Every record in the database represents an earthquake event. In this domain we used two kinds of edges to connect the events (earthquakes). The first type of edge is the “near_in_distance” edge, which is set between two events if the distance between them is less than or equal to 75 kilometers. The second type of edge is the “near_in_time” edge, which is set between two events if they happened within 36 hours of each other. We chose those parameters for two reasons. First, they were a good combination: they generate enough edges for the system to find substructures that involve them, but not so many that the graph is overloaded and those edges become the only substructures found. Second, our geology specialist told us that 75 kilometers was reasonable for the size of the area of study and that the effects between one earthquake and another are usually shown within 36 hours. An earthquake event in graph form is shown in Figure 2. All the fields of the Earthquake database are included except for the empty fields, which would bias the system because there are so many of them.
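The two edge rules can be sketched as follows. The 75 km and 36 hour thresholds come from the text; the great-circle (haversine) distance, the event dictionary layout, and the sample events are assumptions, since the text does not say how distance was computed or how events were stored.

```python
import math
from datetime import datetime, timedelta
from itertools import combinations

MAX_DISTANCE_KM = 75.0              # threshold stated in the text
MAX_TIME_GAP = timedelta(hours=36)  # threshold stated in the text
EARTH_RADIUS_KM = 6371.0            # mean Earth radius (assumed distance model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def build_edges(events):
    """Compare every pair of events once and emit the labeled edges that apply."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(events), 2):
        if haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= MAX_DISTANCE_KM:
            edges.append((i, "near_in_distance", j))
        if abs(a["time"] - b["time"]) <= MAX_TIME_GAP:
            edges.append((i, "near_in_time", j))
    return edges


# Hypothetical events, for illustration only.
events = [
    {"lat": 18.9, "lon": -97.1, "time": datetime(1995, 3, 1, 6, 0)},
    {"lat": 19.0, "lon": -97.3, "time": datetime(1995, 3, 2, 12, 0)},  # close by, 30 h later
    {"lat": 16.0, "lon": -93.0, "time": datetime(1995, 3, 2, 0, 0)},   # far away, 18 h later
]
edges = build_edges(events)
for edge in edges:
    print(edge)
```

The pairwise comparison is quadratic in the number of events, which is one practical reason to work on subsets covering six months to a year rather than the whole database.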

Figure 2: Earthquake Knowledge Representation

Earthquake Database Experimental Results

We chose only a subset of the database to run the experiments. For example, we took 6 months of information and ran Subdue on it, so the query to extract the information from the database included the year and month of the earthquakes that we wanted. We started using all the fields of the database, but the year field affected our results because the values were all the same, so we decided to exclude that field.

We wanted to take a random sample from the database (from the 5 years of information and keeping the same graph size) but that would affect the “near_in_time” edges,
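One way to spot fields like the year field before building the graph is to check which fields actually vary within the chosen subset. This is a sketch under assumptions: the text describes the exclusion as a manual decision, and the record layout and field names here are hypothetical.

```python
def informative_fields(records):
    """Return the fields that take more than one distinct non-empty value."""
    keep = []
    for field in records[0].keys():
        # Ignore empty entries, then count the distinct values that remain.
        values = {r[field] for r in records if r[field] not in (None, "")}
        if len(values) > 1:
            keep.append(field)
    return keep


# Hypothetical six-month subset: every event shares the same year,
# and the intensity field is empty throughout.
records = [
    {"year": 1995, "month": 1, "magnitude": 4.2, "intensity": ""},
    {"year": 1995, "month": 2, "magnitude": 3.8, "intensity": ""},
    {"year": 1995, "month": 2, "magnitude": 5.1, "intensity": ""},
]
fields = informative_fields(records)
print(fields)  # ['month', 'magnitude']
```

A field with a single value across the subset appears identically in every event, so it can dominate the discovered substructures without carrying any information.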
