Structural Knowledge Discovery Used to Analyze Earthquake Ac(2)

The Subdue structural discovery system is being used as the Data Mining tool to study the “Orizaba Fault” in Mexico, as part of a research project of geologist Dr. Burke Burkart. We analyze the information of the Earthquake Database

hierarchical description of the input data, where later substructures are defined in terms of substructures discovered in previous iterations.

Other features make Subdue more powerful. We can specify predefined substructures that Subdue looks for in the data, which allows Subdue to use prior knowledge as a starting point and to guide the discovery process. Subdue uses an inexact graph match technique, so instances of a substructure that differ slightly can still be matched. We can also iterate Subdue’s discovery process so that later iterations find substructures that may contain substructures found in earlier iterations. Figure 1 shows a simple example of Subdue’s operation: Subdue finds four instances of the triangle-on-square substructure in the geometric figure. The graph representation used to describe the substructure, as well as the input graph, is shown in the middle.
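The core idea of finding a repeated substructure can be illustrated on a toy version of Figure 1. This is a minimal sketch, not Subdue’s implementation: the vertex and edge labels are hypothetical, only one-edge substructures are considered, matching is exact rather than inexact, and candidates are ranked by raw instance count, whereas Subdue itself evaluates substructures with a compression-based (Minimum Description Length) measure.

```python
from collections import Counter

# Hypothetical toy graph loosely modeled on Figure 1 (illustrative only).
# Vertices are objects labeled by shape; each "on" edge points from an
# object to the object it rests on.
VERTICES = {
    1: "triangle", 2: "square",
    3: "triangle", 4: "square",
    5: "triangle", 6: "square",
    7: "triangle", 8: "square",
    9: "circle", 10: "rectangle", 11: "square",
}
EDGES = [
    (1, 2, "on"), (3, 4, "on"), (5, 6, "on"), (7, 8, "on"),
    (9, 10, "on"), (11, 10, "on"),
]


def one_edge_substructures(vertices, edges):
    """Count instances of each one-edge pattern (source label, edge label, target label)."""
    counts = Counter()
    for src, dst, label in edges:
        counts[(vertices[src], label, vertices[dst])] += 1
    return counts


def instances_of(pattern, vertices, edges):
    """Return the (source, target) vertex pairs that exactly match a one-edge pattern."""
    src_label, edge_label, dst_label = pattern
    return [
        (src, dst)
        for src, dst, label in edges
        if label == edge_label
        and vertices[src] == src_label
        and vertices[dst] == dst_label
    ]


if __name__ == "__main__":
    best, count = one_edge_substructures(VERTICES, EDGES).most_common(1)[0]
    print(best, count)  # ('triangle', 'on', 'square') 4
    print(instances_of(best, VERTICES, EDGES))  # [(1, 2), (3, 4), (5, 6), (7, 8)]
```

Subdue grows larger candidates by extending substructures one edge at a time; this sketch stops at the first step.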

Figure 1: Subdue’s Example (graph representation: vertices are objects or attributes, edges are relationships; 4 instances of the substructure found)

The Earthquake Database

The earthquake database contains information collected from several catalogs (gs.gov). These catalogs were provided by sources such as the National Geophysical Data Center of the National Oceanic and Atmospheric Administration (NOAA). The database has records of earthquakes from 2000 B.C. through the current week. An earthquake record consists of 35 fields, including source catalog, date, time, latitude, longitude, magnitude, intensity, and seismic-related information such as cultural effects, isoseismal map, geographic region, and the stations used for the computations. Earthquakes of magnitude below 1.0 are not stored in the database; most magnitudes range from 2.5 to 9.5.

Catalogs differ somewhat; for example, the same earthquake may appear in two catalogs with a slightly different epicenter or magnitude. This is due to the methods and instruments used to compute the data. Epicenters and magnitudes, for instance, are currently calculated by computer programs from seismographic data, and those programs build assumptions about the earth into the formulae they use. If those assumptions are violated, the results can differ.
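One way to flag records that probably describe the same earthquake in two catalogs is to compare origin time, epicenter, and magnitude within tolerances. This is a minimal sketch under stated assumptions: the `Event` fields, the tolerance values, and the example records are hypothetical, and the text does not say whether or how such duplicates were reconciled.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0

# Hypothetical matching tolerances, chosen only for illustration.
MAX_SECONDS = 30.0
MAX_KM = 25.0
MAX_MAG_DIFF = 0.3


@dataclass
class Event:
    catalog: str
    origin: datetime
    lat: float  # decimal degrees
    lon: float  # decimal degrees
    magnitude: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def likely_same_event(a: Event, b: Event) -> bool:
    """True if two records from different catalogs agree within all tolerances."""
    if a.catalog == b.catalog:
        return False
    dt = abs((a.origin - b.origin).total_seconds())
    return (
        dt <= MAX_SECONDS
        and haversine_km(a.lat, a.lon, b.lat, b.lon) <= MAX_KM
        and abs(a.magnitude - b.magnitude) <= MAX_MAG_DIFF
    )


if __name__ == "__main__":
    # Made-up records, not taken from any real catalog.
    a = Event("CATALOG_A", datetime(1995, 3, 1, 10, 0, 0), 18.80, -97.20, 4.5)
    b = Event("CATALOG_B", datetime(1995, 3, 1, 10, 0, 7), 18.85, -97.15, 4.7)
    print(likely_same_event(a, b))  # True
```

Tight tolerances miss genuine duplicates whose computed epicenters diverge; loose ones merge distinct aftershocks, so any real thresholds would need a domain expert’s input.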

The Earthquake database is very large (2.2 MB for the 1995 data alone), so we could not use all of the information in our experiments; we used only subsets corresponding to periods of between 6 months and 1 year. We created a relational database containing the earthquake information (the 35 fields). This eased the extraction of information for the experiments, because we can use SQL queries to extract the desired subset of the database. We use the Data Mining approach instead of queries because we do not pre-set the information to be included in the result: a query returns only what we specify in advance, whereas Subdue can uncover novel structural patterns that we did not anticipate.
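Extracting a time-bounded subset with SQL might look like the following. The table name `earthquakes`, its columns, and the use of SQLite are assumptions made to keep the example self-contained; the paper does not name the database system or its schema, and only a few of the 35 fields are shown.

```python
import sqlite3

# Hypothetical schema: a handful of the 35 fields, for illustration only.
SCHEMA = """
CREATE TABLE earthquakes (
    catalog     TEXT,
    year        INTEGER,
    month       INTEGER,
    day         INTEGER,
    origin_time TEXT,
    latitude    REAL,
    longitude   REAL,
    magnitude   REAL
)
"""


def extract_period(conn, year, first_month, last_month):
    """Return the events of `year` whose month lies in [first_month, last_month]."""
    cur = conn.execute(
        "SELECT * FROM earthquakes "
        "WHERE year = ? AND month BETWEEN ? AND ? "
        "ORDER BY month, day, origin_time",
        (year, first_month, last_month),
    )
    return cur.fetchall()


def demo_connection():
    """In-memory database with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO earthquakes VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            ("CATALOG_A", 1995, 2, 14, "03:12:00", 18.9, -97.3, 3.1),
            ("CATALOG_A", 1995, 6, 30, "22:45:10", 18.7, -97.0, 4.2),
            ("CATALOG_B", 1995, 7, 1, "01:05:00", 18.8, -97.1, 2.7),
            ("CATALOG_A", 1996, 3, 3, "12:00:00", 19.0, -97.2, 3.5),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    # First half of 1995: a 6-month window like those used in the experiments.
    for row in extract_period(conn, 1995, 1, 6):
        print(row)
```

The query only selects the subset; the patterns themselves are left for Subdue to discover.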

Earthquake Database Knowledge Representation

Every record in the database represents an earthquake event. In this domain we used two kinds of edges to connect the events (earthquakes). The first is the “near_in_distance” edge, which is placed between two events if the distance between them is less than or equal to 75 kilometers. The second is the “near_in_time” edge, which is placed between two events if they occurred no more than 36 hours apart. We chose these thresholds for two reasons. First, they generate enough edges for the system to find patterns involving them, but not so many that they overload the graph and become the only substructures found. Second, our geology specialist told us that 75 kilometers was reasonable for the size of the study area and that the effects of one earthquake on another usually appear within 36 hours. An earthquake event in graph form is shown in Figure 2. All the fields of the Earthquake database are included except the empty fields, which would bias the system because there are so many of them.
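The edge rules above can be sketched as follows. The thresholds (75 km, 36 hours) come from the text; everything else is an assumption: the field names, the haversine great-circle distance, attribute vertices labeled with field values and linked to the event by an edge named after the field, and one edge per unordered pair of events. Empty fields are skipped, as the text describes.

```python
import math
from datetime import datetime, timedelta
from itertools import combinations

NEAR_KM = 75.0                   # "near_in_distance" threshold from the text
NEAR_TIME = timedelta(hours=36)  # "near_in_time" threshold from the text
EARTH_RADIUS_KM = 6371.0

# Made-up events with hypothetical field names (illustrative only).
EXAMPLE_EVENTS = [
    {"date": "1995-03-01", "time": "10:00:00", "latitude": 18.80,
     "longitude": -97.20, "magnitude": 3.2, "intensity": ""},
    {"date": "1995-03-02", "time": "08:00:00", "latitude": 19.00,
     "longitude": -97.00, "magnitude": 2.9, "intensity": ""},
    {"date": "1995-03-10", "time": "09:00:00", "latitude": 20.50,
     "longitude": -97.20, "magnitude": 4.0, "intensity": ""},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def origin(event):
    """Combine the hypothetical 'date' and 'time' fields into a datetime."""
    return datetime.fromisoformat(f"{event['date']}T{event['time']}")


def build_graph(events):
    """Return (vertices, edges): vertices maps id -> label, edges are (src, dst, label)."""
    vertices, edges, event_ids = {}, [], []
    for event in events:
        eid = len(vertices) + 1
        vertices[eid] = "earthquake"
        event_ids.append(eid)
        for field, value in event.items():
            if value in (None, ""):
                continue  # empty fields are left out of the graph
            aid = len(vertices) + 1
            vertices[aid] = str(value)
            edges.append((eid, aid, field))
    # Relationship edges, at most one of each kind per unordered pair of events.
    for (i, a), (j, b) in combinations(enumerate(events), 2):
        if haversine_km(a["latitude"], a["longitude"], b["latitude"], b["longitude"]) <= NEAR_KM:
            edges.append((event_ids[i], event_ids[j], "near_in_distance"))
        if abs(origin(a) - origin(b)) <= NEAR_TIME:
            edges.append((event_ids[i], event_ids[j], "near_in_time"))
    return vertices, edges


if __name__ == "__main__":
    vertices, edges = build_graph(EXAMPLE_EVENTS)
    for edge in edges:
        if edge[2].startswith("near"):
            print(edge)
```

The pairwise loop is quadratic in the number of events, which is manageable for extracts of 6 months to 1 year but would need spatial or temporal indexing for much larger subsets.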

Figure 2: Earthquake Knowledge Representation

Earthquake Database Experimental Results

We ran the experiments on only a subset of the database. For example, we took 6 months of information and ran Subdue on it, so the query that extracted the information from the database specified the year and month of the earthquakes we wanted. We started by using all the fields of the database, but the year field affected our results because its values were all the same, so we decided to exclude that field.
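Within a 6-month extract the year field adds the same attribute to every event. A small helper can detect such fields before the graph is built. Generalizing from the year field to any field that is constant over the extract is an assumption of this sketch; the text only reports excluding the year.

```python
def constant_fields(records):
    """Names of fields that have the same value in every record."""
    if not records:
        return set()
    first = records[0]
    return {f for f in first if all(r.get(f) == first[f] for r in records)}


def drop_fields(records, fields):
    """Copy of the records without the given fields."""
    return [{k: v for k, v in r.items() if k not in fields} for r in records]


if __name__ == "__main__":
    # Made-up records from a single 6-month extract.
    records = [
        {"year": 1995, "month": 2, "magnitude": 3.1},
        {"year": 1995, "month": 6, "magnitude": 4.2},
        {"year": 1995, "month": 6, "magnitude": 2.7},
    ]
    fixed = constant_fields(records)
    print(fixed)  # {'year'}
    print(drop_fields(records, fixed)[0])  # {'month': 2, 'magnitude': 3.1}
```

Over a short extract, other fields (for example, a single source catalog) could also turn out constant, so it seems prudent to inspect the detected set before dropping anything.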

We wanted to take a random sample from the database (from the 5 years of information, keeping the same graph size), but that would affect the “near_in_time” edges,
