0%

leetcode hashtable

本题解包括:

  • hashtable简洁即实现
  • 3 Longest Substring Without Repeating Characters
  • 160 Intersection of Two Linked Lists
  • 202 Happy Number
  • 242 Valid Anagram

一、hashtable简介

散列表(Hash table,也叫哈希表),它通过把关键码值映射到表中一个位置来访问记录,以加快查找的速度。这个映射函数叫做散列函数,存放记录的数组叫做散列表。

  • 若关键字为k,则其值存放在f(k)的存储位置上。由此,要查找k的位置直接去f(k)查找即可。

  • 对不同的关键字可能得到同一散列地址,即k1≠k2,而f(k1)=f(k2),这种现象称为冲突/碰撞(Collision)。可以说collision是不可避免的。但是有一些处理冲突的办法:

    • 开散列:相同的key 存于链中
    • 闭散列:相同key存于 不同的槽中
    • 多个hashtable : 如果一个冲突的概率是0.1.。那么我们用两个不同的散列函数,可以大大冲突概率。(暴雪好像就是这么做的,用了三个散列表,使得冲突概率为几亿分之一)

对于数字,常见的散列函数可以为 % 一个数(经常是2的倍数-1 ,可以用位与来优化速度!)

一个开散列的C实现如下(使用了静态的邻接表):

(C++可以把define改为const int N = 1<<7; )

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
#define N (1<<7)
int head[N], len = 0, mod = N - 1;
struct node{
int val;
int next;
}data[10000];

void insert(int n){
int pos = n & mod;
data[len].val = n;
data[len].next = head[pos]; //插入第一个位置
head[pos] = len++;
}

bool found(int n){
int pos = n & mod;
for (int i = head[pos]; i != -1; i = data[i].next)
if (data[i].val == n) return true;
return false;
}

什么你问python?乖乖的去用set吧(其实c++是用unordered_set,不要重复发明轮子)

leetcode 202 Happy Number为例,来演示一下他们的用法:

二、题解

3. Longest Substring Without Repeating Characters

Given a string, find the length of the longest substring without repeating characters.

Examples:

Given "abcabcbb", the answer is "abc", which the length is 3.

Given "bbbbb", the answer is "b", with the length of 1.

Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.

题目地址:leetcode Longest Substring Without Repeating Characters

题意:给定一个字符串s,求它最长的不重复字母的子串长度。

思路:最naive的方法是,枚举起点和终点,O(n^2),但是有很多重复的工作。

进一步可以改进为双指针+哈希表。

首先i指针指向字符串开头,而j指向结尾。每次扫描到一个字符s[j],有:

  1. 若该字符在hash表中,则说明s[j]在之前出现过,我们需要删除hash表中的字符s[i....vis[s[j]]]中的所有信息。并且将i 设置为 vis[s[j]] += 1(因为vis[s[j]]保存了上一个s[j]的下标,必须+1的位置才可能作为新的起点)
  2. 将该字符的下表写进hash表。 就是vis[s[j]] = j
  3. 看看能否更新最大长度。

Python

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
class Solution(object):
def lengthOfLongestSubstring(self, s):
"""
:type s: str
:rtype: int
"""
if not s: return 0
vis = {}
i = ans = 0
for j in range(len(s)):
if s[j] in vis:
for x in range(i,vis[s[j]]):
del vis[s[x]]
i = vis[s[j]] + 1
ans = max(ans, j - i + 1)
vis[s[j]] = j
return ans

当然,也可以直接用一个长度为128的数组

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
class Solution(object):
def lengthOfLongestSubstring(self, s):
"""
:type s: str
:rtype: int
"""
if not s: return 0
vis = [-1] * 128
i = ans = 0
for j in range(len(s)):
if vis[ord(s[j])] != - 1:
for x in range(i, vis[ord(s[j])]):
vis[ord(s[x])] = -1
i = vis[ord(s[j])] + 1
ans = max(ans, j - i + 1)
vis[ord(s[j])] = j
return ans

或者直接Sliding window也行(其实都差不多,只是不记录下标,只记录访问过没有),复杂度也是O(n)

C++

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
class Solution {
public:
int lengthOfLongestSubstring(string s) {
vector<int> vis(128, 0);
int start = 0, max_len = 0;
for(int i = 0; i < s.size(); i++){
if (vis[s[i]]) {
for (; s[i] != s[start]; start++)
vis[s[start]] = 0;
vis[s[start++]] = 0;
}
vis[s[i]] = 1;
max_len = max(max_len, i - start + 1);
}
return max_len;
}
};

Java

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
class Solution {
public int lengthOfLongestSubstring(String s) {
boolean[] vis = new boolean[128];
int start = 0, maxLen = 0;
for (int i = 0; i < s.length(); i++) {
if (vis[s.charAt(i)]) {
for (; s.charAt(i) != s.charAt(start); start++)
vis[s.charAt(start)] = false;
vis[s.charAt(start++)] = false;
}
vis[s.charAt(i)] = true;
if (i - start + 1 > maxLen)
maxLen = i - start + 1;
}
return maxLen;
}
}

Python

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class Solution(object):
def lengthOfLongestSubstring(self, s):
"""
:type s: str
:rtype: int
"""

vis = [False] * 128
start = max_len = 0
for i in range(len(s)):
if vis[ord(s[i])]:
while s[i] != s[start]:
vis[ord(s[start])] = False
start += 1
vis[ord(s[start])] = False
start += 1
vis[ord(s[i])] = True
max_len = max(max_len, i - start + 1)
return max_len

 

方法3: 动态规划

dp[j]为以s[j]结尾不重复字符的长度, i = vis[s[j]]则为s[j]字符上一次出现的位置。

则有:

  • j - i < dp[j - 1]: 说明上一次出现的位置在j - 1的子串里,只能重新开始,dp[j] = j - i
  • j - i > dp[j - 1]: 说明上一次出现的位置不在j - 1的子串里面,因此dp[j] = dp[j - 1] + 1

考虑一个特殊的情况,若s[j]第一次出现(i < 0)必然有上面第二个式子成立。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
class Solution {
public:
int lengthOfLongestSubstring(string s) {
vector<int> vis(128, -1);
int last_max = 0, ans = 0;
for (int j = 0; j < s.size(); ++j) {
int i = vis[s[j]];
if (last_max < j - i) {
last_max += 1;
} else {
last_max = j - i;
}
vis[s[j]] = j;
ans = max(ans, last_max);
}
return ans;
}
};

160. leetcode Intersection of Two Linked Lists

Write a program to find the node at which the intersection of two singly linked lists begins.

For example, the following two linked lists:

1
2
3
4
5
6
A:          a1 → a2

c1 → c2 → c3

B: b1 → b2 → b3

begin to intersect at node c1.

Notes:

  • If the two linked lists have no intersection at all, return null.
  • The linked lists must retain their original structure after the function returns.
  • You may assume there are no cycles anywhere in the entire linked structure.
  • Your code should preferably run in O(n) time and use only O(1) memory.

题目地址:leetcode Intersection of Two Linked Lists

题意:给定两个链表,返回其重复的第一个位置。

思路:

  • 用哈希表O(n+m) 空间为O(m)或者O(n)
  • 由于两个链表可能一长一短,故对其同时进行遍历时一定有一个先到达结尾。如果链表A先到结尾,则指向B的开头,如果B先到达结尾,则指向A的开头。然后继续进行遍历,这样两个链表一定同时到达结尾。当到达结尾时,如果之前最早出现过两个指针指向地址相同的,那么就是所求解。

 

这里只给出c的hash表代码,其他python什么的,见https://www.hrwhisper.me/?p=466

c 的hash表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#define N (1<<14)
int head[N], len = 0, mod = N - 1;
struct node{
struct ListNode * val;
int next;
}data[100000];

void insert(struct ListNode *n)
{
int pos = ((int)n) & mod;
data[len].val = n;
data[len].next = head[pos];
head[pos] = len++;
}

bool found(struct ListNode *n){
int pos = ((int)n) & mod;
for (int i = head[pos]; i != -1;){
if (data[i].val == n)
return true;
i = data[i].next;
}
return false;
}

struct ListNode *getIntersectionNode(struct ListNode *headA, struct ListNode *headB) {
memset(head, -1, sizeof(head));
for (; headA; headA = headA->next)
insert(headA);
for (; headB; headB = headB->next)
if (found(headB)) return headB;
return NULL;
}

 

202 leetcode Happy Number

Write an algorithm to determine if a number is "happy".

A happy number is a number defined by the following process: Starting with any positive integer, replace the number by the sum of the squares of its digits, and repeat the process until the number equals 1 (where it will stay), or it loops endlessly in a cycle which does not include 1. Those numbers for which this process ends in 1 are happy numbers.

Example: 19 is a happy number

  • 12 + 92 = 82
  • 82 + 22 = 68
  • 62 + 82 = 100
  • 12 + 02 + 02 = 1

题目地址: leetcode Happy Number

题意:给你一个数,判断是否为Happy number (对n每个字母求平方和,成为下一个n~若n为1,则为happy number)

思路:

  1. 用hash表判断是否循环,若循环则不是happy number
  2. 4为非happy number循环必经点。所以可以判断n是否为4即可。(这个是看discuss的)

C

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#define N (1<<7)
int head[N], len = 0, mod = N - 1;
struct node{
int val;
int next;
}data[10000];

void insert(int n){
int pos = n & mod;
data[len].val = n;
data[len].next = head[pos];
head[pos] = len++;
}

bool found(int n){
int pos = n & mod;
for (int i = head[pos]; i != -1; i = data[i].next)
if (data[i].val == n) return true;
return false;
}
int cal(int n){
int ans = 0;
while (n)
{
int t = n % 10;
n /= 10;
ans += t * t;
}
return ans;
}

bool isHappy(int n) {
memset(head, -1, sizeof(head));
while (true){
n = cal(n);
if (n == 1) return true;
if (found(n)) return false;
insert(n);
}
}

 

C++

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
class Solution {
public:
bool isHappy(int n) {
unordered_set<int> table;
table.insert(n);
while (true){
n = cal(n);
if (n == 1) return true;
if (table.find(n) != table.end()) return false;
table.insert(n);
}
}
int cal(int n){
int ans = 0;
while (n)
{
int t = n % 10;
n /= 10;
ans += t * t;
}
return ans;
}
};

 

Python

1
2
3
4
5
6
7
8
9
10
11
class Solution:
# @param {integer} n
# @return {boolean}
def isHappy(self, n):
table = set()
while True:
squere_sum = sum(int(i) **2 for i in str(n))
if squere_sum == 1: return True
elif squere_sum in table: return False
table.add(squere_sum)
n = squere_sum

 

242. Valid Anagram

Given two strings s and t, write a function to determine if t is an anagram of s.

For example, s = "anagram", t = "nagaram", return true. s = "rat", t = "car", return false.

Note: You may assume the string contains only lowercase alphabets.

Follow up: What if the inputs contain unicode characters? How would you adapt your solution to such case?

题目地址:leetcode Valid Anagram

题意:给定两个字符串,判断他们是否是相同字母异序的串

题意:

看到这个,排序嘛!

1
2
3
4
5
6
7
8
class Solution(object):
def isAnagram(self, s, t):
"""
:type s: str
:type t: str
:rtype: bool
"""
return sorted(s) == sorted(t)

更快的呢?用hash表计算出现的次数

1
2
3
4
5
6
7
8
9
10
11
12
13
14
class Solution(object):
def isAnagram(self, s, t):
"""
:type s: str
:type t: str
:rtype: bool
"""
if len(s) != len(t): return False
n = len(s)
vis = [0] * 26
for i in xrange(n):
vis[ord(s[i]) - 97] += 1
vis[ord(t[i]) - 97] -= 1
return filter(lambda x: x != 0, vis) == []

当然可以用Python的Counter类开挂- - 继续oneline

1
2
3
4
5
6
7
8
class Solution(object):
def isAnagram(self, s, t):
"""
:type s: str
:type t: str
:rtype: bool
"""
return collections.Counter(s) == collections.Counter(t)
请我喝杯咖啡吧~