在做推荐系统的时候想查看原始数据集中自然存在的类别有多少种,即找到一些子集,这些子集属于原始数据集,子集之间没有任何关联,而子集内部所有数据都有直接或间接的关联。 首先考虑的是由于数据规模,读入内存是不可能的,所以要借助硬盘(虽然很不情愿)
在做推荐系统的时候想查看原始数据集中自然存在的类别有多少种,即找到一些子集,这些子集属于原始数据集,子集之间没有任何关联,而子集内部所有数据都有直接或间接的关联。
首先考虑的是由于数据规模,读入内存是不可能的,所以要借助硬盘(虽然很不情愿)。既然是借助硬盘,那就要文件存取。而又由于在处理过程中需要快速的查找数据是否存在于某个集合内和将数据集合关联等操作,选择使用并查集。
这样选择之后算是有一个解决方案了,但是还需要最后一个关键的部分,就是需要建立文件索引和缓存机制以便快速进行合并和查询过程。这里选择使用的工具还是最趁手的hbase,很好的解决这两个问题。
这个类主要解决的问题就是原始数据的聚类,有关联的聚在一起。核心的两个方法是:
public byte[] findSet(byte[] pos);
public void union(byte[] pos1, byte[] pos2);
其中还有一个
public byte[] findSet(byte[] pos)
是递归实现。两个方法都使用了路径压缩进行优化。union()方法的两个参数有顺序要求,其作用是后者集合连接到前者集合的根节点。
最后,计算的并行是使用MapReduce计算框架。
package recommendsystem; ? import java.io.IOException; import java.lang.reflect.Array; ? import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.hbase.HBaseConfiguration; import org.apache.hadoop.hbase.HColumnDescriptor; import org.apache.hadoop.hbase.HTableDescriptor; import org.apache.hadoop.hbase.client.Get; import org.apache.hadoop.hbase.client.HBaseAdmin; import org.apache.hadoop.hbase.client.HTable; import org.apache.hadoop.hbase.client.Put; import org.apache.hadoop.hbase.client.Result; import org.apache.hadoop.hbase.client.ResultScanner; import org.apache.hadoop.hbase.client.Scan; import org.apache.hadoop.hbase.util.Bytes; ? public class UnionFindSet { private Configuration _conf; private HBaseAdmin _hbAdmin; private HTable _unionTable; ? public static void main(String[] args) throws IOException { UnionFindSet ufs = new UnionFindSet("test"); ufs.union(Bytes.toBytes("7"), Bytes.toBytes("8")); ufs.union(Bytes.toBytes("5"), Bytes.toBytes("9")); ufs.union(Bytes.toBytes("3"), Bytes.toBytes("7")); ufs.union(Bytes.toBytes("4"), Bytes.toBytes("6")); ufs.union(Bytes.toBytes("1"), Bytes.toBytes("7")); for (int i = 1; i <p class="copyright"> 原文地址:用于大数据的并查集(基于HBase)的java类, 感谢原作者分享。 </p>

MySQL is suitable for beginners to learn database skills. 1. Install MySQL server and client tools. 2. Understand basic SQL queries, such as SELECT. 3. Master data operations: create tables, insert, update, and delete data. 4. Learn advanced skills: subquery and window functions. 5. Debugging and optimization: Check syntax, use indexes, avoid SELECT*, and use LIMIT.

MySQL efficiently manages structured data through table structure and SQL query, and implements inter-table relationships through foreign keys. 1. Define the data format and type when creating a table. 2. Use foreign keys to establish relationships between tables. 3. Improve performance through indexing and query optimization. 4. Regularly backup and monitor databases to ensure data security and performance optimization.

MySQL is an open source relational database management system that is widely used in Web development. Its key features include: 1. Supports multiple storage engines, such as InnoDB and MyISAM, suitable for different scenarios; 2. Provides master-slave replication functions to facilitate load balancing and data backup; 3. Improve query efficiency through query optimization and index use.

SQL is used to interact with MySQL database to realize data addition, deletion, modification, inspection and database design. 1) SQL performs data operations through SELECT, INSERT, UPDATE, DELETE statements; 2) Use CREATE, ALTER, DROP statements for database design and management; 3) Complex queries and data analysis are implemented through SQL to improve business decision-making efficiency.

The basic operations of MySQL include creating databases, tables, and using SQL to perform CRUD operations on data. 1. Create a database: CREATEDATABASEmy_first_db; 2. Create a table: CREATETABLEbooks(idINTAUTO_INCREMENTPRIMARYKEY, titleVARCHAR(100)NOTNULL, authorVARCHAR(100)NOTNULL, published_yearINT); 3. Insert data: INSERTINTObooks(title, author, published_year)VA

The main role of MySQL in web applications is to store and manage data. 1.MySQL efficiently processes user information, product catalogs, transaction records and other data. 2. Through SQL query, developers can extract information from the database to generate dynamic content. 3.MySQL works based on the client-server model to ensure acceptable query speed.

The steps to build a MySQL database include: 1. Create a database and table, 2. Insert data, and 3. Conduct queries. First, use the CREATEDATABASE and CREATETABLE statements to create the database and table, then use the INSERTINTO statement to insert the data, and finally use the SELECT statement to query the data.

MySQL is suitable for beginners because it is easy to use and powerful. 1.MySQL is a relational database, and uses SQL for CRUD operations. 2. It is simple to install and requires the root user password to be configured. 3. Use INSERT, UPDATE, DELETE, and SELECT to perform data operations. 4. ORDERBY, WHERE and JOIN can be used for complex queries. 5. Debugging requires checking the syntax and use EXPLAIN to analyze the query. 6. Optimization suggestions include using indexes, choosing the right data type and good programming habits.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Dreamweaver CS6
Visual web development tools

Atom editor mac version download
The most popular open source editor

Zend Studio 13.0.1
Powerful PHP integrated development environment

SublimeText3 Mac version
God-level code editing software (SublimeText3)

DVWA
Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software