Roundup of discussions about 0x80 - 话题女王

c*******e
Posts: 373

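// If the last byte has its high bit set, it cannot be a single-byte character, so it must belong to a double-byte character
if ((lastByte & 0x80) != 0) return DoubleByteEncoding;
// If the second-to-last byte has its high bit clear, it cannot pair with the last byte, so the last byte must be a single-byte character
if ((last2ndByte & 0x80) == 0) return SingleByteEncoding;
// Last byte high bit 0 and second-to-last byte high bit 1: undetermined, so check the third-to-last byte
if ((last3rdByte & 0x80) == 0)
    return DoubleByteEncoding; // third-to-last has high bit 0, so it does not pair with the second-to-last; the last two bytes form a double-byte character
if ((last4thByte & 0x80) == 0) // fourth-to-last does not pair with third-to-last; third- and second-to-last form a double-byte character, leaving the last byte on its own
    return SingleByteEncoding;
No need to reason any further at this point; the pattern can be summarized:
if (n == 1) return 1;
// Last byte has high bit 1: must be a double-byte character
if ((bytes[n-1] & 0x80) !=... Read full post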
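The case analysis above seems to generalize to a parity rule: if the last byte has its high bit clear, count the run of high-bit bytes immediately before it; an even run leaves the last byte as a single-byte character, an odd run makes it the trail byte of a double-byte character. A minimal sketch of that rule follows. The names `LastChar` and `classifyLastChar` are made up for illustration, and the code assumes the buffer holds valid text in an encoding where single-byte characters have the high bit clear and lead bytes have it set.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for this sketch.
enum class LastChar { SingleByte, DoubleByte };

// Assumes `bytes` holds n >= 1 bytes of valid mixed single/double-byte text
// (single-byte characters have the high bit clear, lead bytes have it set).
LastChar classifyLastChar(const unsigned char* bytes, std::size_t n) {
    assert(bytes != nullptr && n > 0);
    if (n == 1) return LastChar::SingleByte;
    // A byte with the high bit set can never stand alone.
    if (bytes[n - 1] & 0x80) return LastChar::DoubleByte;
    // Count the consecutive high-bit bytes directly before the last byte.
    std::size_t run = 0;
    for (std::size_t i = n - 1; i > 0 && (bytes[i - 1] & 0x80); --i) ++run;
    // An even run pairs up among itself; an odd run leaves one lead byte
    // that has to pair with the last byte.
    return (run % 2 == 0) ? LastChar::SingleByte : LastChar::DoubleByte;
}
```

For example, the bytes `81 82 41` would classify as `SingleByte` (the two high-bit bytes pair up), while `81 41` would classify as `DoubleByte`.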

D****t
Posts: 17

From thread: JobHunting board - Is this G question DFS or DP

It's not that simple, so you're the weak one :-)
bool IsLastOneByteChar(unsigned char * bytes, int len)
{
    int leadCount = 0;
    len--;
    bool lastByteIsLead = bytes[len] & 0x80;
    // Count the run of high-bit bytes at the end of the buffer
    while(len > 0)
    {
        if (bytes[len] & 0x80) leadCount++;
        if (!(bytes[len] & 0x80)) break;
        len--;
    }
    if (leadCount % 2 == 0)
    {
        if (lastByteIsLead)
            return true; //second byte of a 2-byte char
        else
            return false; //single byte char;
    }
    else
    {
        if (lastByteIsLead)
... Read full post

n**p
Posts: 1150

From thread: JobHunting board - A Google interview question

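while (s != NULL && *s != '\0')
{
    if ((*s & 0x80) == 0) { s++; continue; }   // single-byte (ASCII) character
    if ((*s & 0xC0) == 0x80) return false;      // continuation byte where a lead byte is expected
    char* t = s + 1;
    // Each 1 bit after the top bit of the lead byte calls for one continuation byte
    for (int bitmask = 0x40;
         bitmask > 0 && (*s & bitmask) != 0;
         bitmask /= 2, t++)
        if (*t == '\0' || (*t & 0xC0) != 0x80) return false;
    s = t;   // t already points at the start of the next character
}
return true;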
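For comparison, here is a rough table-style version of the same idea: classify the lead byte, then require the matching number of `10xxxxxx` continuation bytes. The function name `hasValidUtf8Structure` is made up for this sketch. It only checks the byte structure; it does not reject overlong forms, surrogates, or code points above U+10FFFF, and it assumes a NUL-terminated input.

```cpp
// Structural UTF-8 check on a NUL-terminated string (sketch only).
bool hasValidUtf8Structure(const char* s) {
    if (s == nullptr) return true;  // mirrors the loop above: nothing to reject
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    while (*p != 0) {
        int continuation;
        if ((*p & 0x80) == 0x00) continuation = 0;       // 0xxxxxxx
        else if ((*p & 0xE0) == 0xC0) continuation = 1;  // 110xxxxx
        else if ((*p & 0xF0) == 0xE0) continuation = 2;  // 1110xxxx
        else if ((*p & 0xF8) == 0xF0) continuation = 3;  // 11110xxx
        else return false;  // stray continuation byte or invalid lead byte
        ++p;
        for (int i = 0; i < continuation; ++i, ++p) {
            // A NUL terminator also fails this test, so truncated
            // sequences are rejected without reading past the end.
            if ((*p & 0xC0) != 0x80) return false;
        }
    }
    return true;
}
```

Note the parentheses around every `&` expression: in C and C++ `==` binds tighter than `&`, so `*s & 0x80 == 0` would be parsed as `*s & (0x80 == 0)`.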

s****e
Posts: 638

From thread: JobHunting board - An interview question: in-place array shuffle

The one below is O(n); not sure whether doing it this way is acceptable.
#include <algorithm>
#include <cstring>
#include <iostream>
using namespace std;
void shuffle(char* A)
{
    int size = strlen(A);
    int i=1;
    while(i<size) {
        if (A[i] & 0x80) {   // already moved to its final position
            i++;
            continue;
        }
        int j=i;
        int j2;
        while(true){
            if ( j < size/2 ) j2 = j*2;
            else j2 = 2*(j-size/2)+1;
            if (j2<=i) break;
            swap(A[i], A[j2]);
            A[j2] |= 0x80;   // mark as placed
            j=j2;
        }
        i++;
    }
    for (int i=0; i<size; i++) A[i] &= 0x7f;   // clear the marks
}
int ... Read full post

n*****t
Posts: 22014

From thread: BuildingWeb board - A simple jquery/table/mysql website template

// ajax.php
// http://myhost.com/ajax.php?user=admin&pass=passwd&query=select * from table
function pre_encode(&$item, $key) {
    if (is_string ( $item ))
        $item = mb_encode_numericentity ( $item, array ( 0x80, 0xffff, 0, 0xffff ), 'UTF-8' );
}
$user = $_REQUEST ['user'];
$pass = $_REQUEST ['pass'];
if ($user != 'admin' || $pass != 'passwd')
    die ( "incorrect passwd" );
$ret = mysql_connect ( "localhost", "root", "passwd" ) && mysql_select_db ( "database" ) && mysql_set_charset ... Read full post

S*A
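Posts: 7142

From thread: Hardware board - TP-Link TL_WDR4300 vs Asus RT-N66U: which is better? (reposted)

I mean the Linux b43 driver for Broadcom wifi. Did you think b43 was developed
from documentation provided by Broadcom and targets their newer wifi chips?
There are two choices of Broadcom driver on MIPS: b43 was reverse-engineered
independently and has source code; wl is supplied by Broadcom, has no source,
and only supports the 2.4 MIPS kernel. If OpenWrt is built with a 2.6 kernel,
it uses the b43 driver. Others use b43 as well; I won't list them all.
All I want to point out is that many people hype Broadcom wifi chips as
high-end, but in practice they are not that good. You say Asus uses Broadcom
chips and its firmware is high-end too. Then how do you explain that several
IDs in the neighboring thread say that Asus flagship router acts up frequently?
That does not fit the premise of stability.
Here is another example of the binary wl driver being unstable.
This is on an MBA with Broadcom's latest official wl driver (binary) compiled
against the latest FC20 kernel. It acts up frequently; I really can't praise
Broadcom wifi chips as high-end.
[289269.848254] Modules linked in: wl(PO... Read full post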

S*A
Posts: 7142

From thread: Hardware board - nighthawk ac-1900 $130

Which chipset is more high-end is a matter of opinion.
I am not much of a fan of Broadcom chips, especially those that use the wl driver.
Leaving aside the lack of an open-source driver, Broadcom cannot even write a
decent driver for its own chips: with the latest official Broadcom driver I
repeatedly get kernel stack dumps and often have to rmmod/modprobe and reboot.
By contrast, I have had no problems with Atheros chips, and they have a
complete open-source driver.
[154474.425321] Hardware name: Apple Inc. MacBookAir6,2/Mac-7DF21CB3ED6977E5, BIOS MBA61.88Z.0099.B01.1307121317 07/12/2013
[154474.425322] 0000000000000000 000000003e2a4b3c ffff88008ca23dc8 ffffffff81707091
[154474.425325] 0000000000000000 ffff88008ca23e00 ffffffff8108d0ad f... Read full post

S*A
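Posts: 7142

From thread: Hardware board - The TPLINK+OPENWRT combo that brother SSA recommended is really awesome!

If your routers are stable, how can the wifi speed between two routers over WDS be unstable?
Tomato uses the closed-source wl driver and a very old kernel. I have no interest in playing with it.
The wl driver has plenty of problems. You want me to post a kernel dump right now?
Here is one from your beloved closed-source wl driver, the latest one downloaded
from Broadcom's official site. Take a look?
Oct 31 17:15:09 mba.localdomain kernel: WARNING: CPU: 2 PID: 642 at net/wireless/sme.c:791 cfg80211_roamed+0x89/0x90 [cfg80211]()
Oct 31 17:15:09 mba.localdomain kernel: Modules linked in: arc4 ppp_mppe ppp_async crc_ccitt ppp_generic slhc rfcomm fuse nf_conntrack_netbios_ns
Oct 31 17:15:09 mba.localdomain kernel: lpc_ich i2c_i801 mfd_core ... Read full post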

m**********r
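Posts: 122

From thread: Programming board - I have a folder with about 1000 files. After running the following Python statements (reposted)

[The following text is reposted from the DataSciences board]
Posted by: milkrootbeer (milkbeer), Board: DataSciences
Title: I have a folder with about 1000 files. After running the Python statements below I get the following error. It is probably a special-character issue; I have tried other approaches, but none of them solve the problem.
Posted from: BBS 未名空间站 (Sat May 2 20:09:17 2015, US Eastern)
I have a folder with about 1000 files. After running the Python statements below I get
the following error. It is probably a special-character issue; I have tried other
approaches, but none of them solve the problem.
DIR = r'C:\Users\Desktop\data\rec.sport.hockey'
posts = [open(os.path.join(DIR,f)).read() for f in os.listdir(DIR)]
x_train = vectorizer.fit_transform(posts)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in positio... Read full post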

m**********r
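Posts: 122

From thread: DataSciences board - I have a folder with about 1000 files. After running the Python statements below I get the following error. It is probably a special-character issue; I have tried other approaches, but none of them solve the problem.

I have a folder with about 1000 files. After running the Python statements below I get
the following error. It is probably a special-character issue; I have tried other
approaches, but none of them solve the problem.
DIR = r'C:\Users\Desktop\data\rec.sport.hockey'
posts = [open(os.path.join(DIR,f)).read() for f in os.listdir(DIR)]
x_train = vectorizer.fit_transform(posts)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 240: invalid start byte
Traceback (most recent call last):
File "C:/Users/PycharmProjects/Project3/demo10.py", line 16, in <module>
x_train = vectorizer.fit_transform(posts)
File "C:UsersAppDa... Read full post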

e***s
Posts: 799

From thread: JobHunting board - How to quickly check whether a given bit of a binary number is 0

Check whether bit 7 of a is 1 or 0.
(a & 0x80) > 0 ? 1 : 0
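The same test can be written for an arbitrary bit position by shifting instead of masking. A small sketch, where the helper name `bitAt` is made up here and `k` is assumed to be less than 32:

```cpp
#include <cstdint>

// Returns 1 if bit k of a is set, else 0 (k = 0 is the least significant bit).
// Assumes k < 32; shifting by 32 or more would be undefined behavior.
constexpr int bitAt(std::uint32_t a, unsigned k) {
    return static_cast<int>((a >> k) & 1u);
}
```

With this, `bitAt(a, 7)` gives the same answer as `(a & 0x80) > 0 ? 1 : 0`. Working on an unsigned value and comparing with `!= 0` (or shifting, as here) avoids surprises if the tested bit ever happens to be the sign bit of a signed type.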

h*****f
Posts: 248

From thread: JobHunting board - How to tell whether an assignment to a char overflows

Hmm...a char can hold 0x0-0xFF. 128=0x80...so no overflow..
I guess the question is to detect whether the input occupies more than 1
byte?
You probably could *if* it were foo(char*) or foo(char&).
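One caveat worth spelling out: whether 128 fits depends on whether plain `char` is signed on the platform. An `unsigned char` holds 0x00-0xFF, but a signed 8-bit `char` tops out at 127. A hedged sketch of a range check that works either way (the helper name `fitsInChar` is made up for illustration):

```cpp
#include <limits>

// True if v can be stored in a plain char without changing its value.
// The range of plain char is implementation-defined (signed or unsigned),
// so the limits are queried instead of hard-coded.
bool fitsInChar(int v) {
    return v >= std::numeric_limits<char>::min() &&
           v <= std::numeric_limits<char>::max();
}
```

On a platform where `char` is signed, `fitsInChar(128)` would be false; where `char` is unsigned, it would be true.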

w**5
Posts: 34

From thread: JobHunting board - Asking for help with a Google interview question about strings

Would this work?
string serialize(string &str1, string &str2)
{
    string str = str1;
    str.append(str2);
    str[str1.length()] |= 0x80; // set the high bit of str2's first char (negative if char is signed); assumes str2 is non-empty
    return str;
}
void deserialize(string &str, string &str1, string &str2)
{
    for (int i=0; i<str.length(); i++) {
        if (str[i] & 0x80) {
            str[i] &= 0x7f;
            str1 = str.substr(0, i);
            str2 = str.substr(i, str.length() - i);
            break;
        }
    }
}

s********y
Posts: 28

From thread: JobHunting board - utf-8

How do you count the most frequent character in UTF-8?
while (*p != 0)
{
    if ((*p & 0x80) == 0 || (*p & 0xc0) == 0xc0)
        ++count;
    ++p;
}
This counts the total number of characters.
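One possible way to get from the total count to the most frequent character is to cut the string at every byte that is not a `10xxxxxx` continuation byte and tally the pieces. This is only a sketch: the function name `mostFrequentUtf8Char` is made up, the input is assumed to be well-formed UTF-8, and ties go to the smallest byte sequence simply because `std::map` iterates in sorted order.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Returns the byte sequence of the most frequent character in `text`,
// or an empty string if `text` is empty. Assumes well-formed UTF-8.
std::string mostFrequentUtf8Char(const std::string& text) {
    std::map<std::string, std::size_t> counts;
    std::size_t i = 0;
    while (i < text.size()) {
        // A character runs from its first byte up to (not including)
        // the next byte that is not a continuation byte.
        std::size_t j = i + 1;
        while (j < text.size() &&
               (static_cast<unsigned char>(text[j]) & 0xC0) == 0x80) {
            ++j;
        }
        ++counts[text.substr(i, j - i)];
        i = j;
    }
    std::string best;
    std::size_t bestCount = 0;
    for (const auto& entry : counts) {
        if (entry.second > bestCount) {
            best = entry.first;
            bestCount = entry.second;
        }
    }
    return best;
}
```

A hash map would likely be faster for large inputs; the ordered map is used here only to keep tie-breaking deterministic.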

w**z
Posts: 8232

From thread: JobHunting board - How do you turn a twitter ID into a uuid?

Convert long to byte array, then construct UUID from the byte array?
From Java Source code:
/**
 * Static factory to retrieve a type 3 (name based) {@code UUID} based on
 * the specified byte array.
 *
 * @param name
 *        A byte array to be used to construct a {@code UUID}
 *
 * @return A {@code UUID} generated from the specified array
 */
public static UUID nameUUIDFromBytes(byte[] name) {
... Read full post

a***g
Posts: 70

From thread: Programming board - A question about compiling and running under AIX

The program compiles and runs fine on AIX 5.3 Patch Level 0. On AIX 5.3 Patch
Level 1 it compiles fine but crashes at run time; the core file shows the following:
[using memory image in /tmp/core]
reading symbolic information ...
Trace/BPT trap in pth_pthread._internal_error [/usr/lib/libpthreads.a] at 0xd00558a4 ($t1)
0xd00558a4 (_internal_error+0x80) 80410014 lwz r2,0x14(r1)
(dbx) where
pth_pthread._internal_error(??, ??, ??) at 0xd00558a4
pth_vp._vp_start(??, ??, ??) at 0xd0061e50
pth_pthread.pth_create_common(??, ??, ??, ??, ??) at 0xd0054b54

D****A
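Posts: 360

From thread: Programming board - How do you convert an integer to multi-byte integer format?

Interesting. My guess is that one bit of each byte serves as a continuation flag: if it is 1, look at the next byte, otherwise stop.
Each byte has only 7 significant bits, so two bytes should be able to represent the low 14 bits of an integer.
0xA0=10100000
0x81=10000001
0x20=00100000
Since one byte can only encode 7 bits, the top bit (1) of 0xA0 should be the lowest significant bit of 0x81,
and the low 7 bits of 0xA0 should be in 0x20. 0xA0 and 0x20 have the same low seven bits, which shows that this encoding
uses the top bit of each byte as the continuation flag.
Putting it together, the conversion formula should be do x = (x << 7)|(b[i] & 0x7f) until (b[i] & 0x80) == 0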
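The do/until formula can be sketched in C++ as follows. The function name `decodeMultiByte` is made up for illustration, and the sketch assumes the value fits in 32 bits (there is no overflow check) and that the most significant 7-bit group comes first, as in the 0x81 0x20 example.

```cpp
#include <cstdint>
#include <vector>

// Decodes a big-endian base-128 integer: every byte carries 7 payload bits,
// and a set top bit means another byte follows.
std::uint32_t decodeMultiByte(const std::vector<std::uint8_t>& b) {
    std::uint32_t x = 0;
    for (std::uint8_t byte : b) {
        x = (x << 7) | (byte & 0x7Fu);   // append the 7 payload bits
        if ((byte & 0x80u) == 0) break;  // top bit clear: last byte
    }
    return x;
}
```

Feeding it the bytes 0x81 0x20 gives (1 << 7) | 0x20 = 0xA0, matching the worked example.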

M*******8
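Posts: 85

From thread: Programming board - How do you convert an integer to multi-byte integer format?

Thanks a lot for the pointers!
This
do x = (x << 7)|(b[i] & 0x7f) until (b[i] & 0x80) == 0
converts from the multi-byte integer format to a regular integer, right?
How do I convert from an integer to the multi-byte integer format? Thanks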
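Going the other way is roughly the reverse: peel off 7 bits at a time from the low end, set the top bit on every byte except the last one, and emit the bytes most significant group first. A sketch under the same assumptions as the quoted formula; the function name `encodeMultiByte` is made up for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Encodes x as a big-endian base-128 integer with a continuation bit.
std::vector<std::uint8_t> encodeMultiByte(std::uint32_t x) {
    std::vector<std::uint8_t> out;
    // The lowest 7 bits form the final byte, which has its top bit clear.
    out.push_back(static_cast<std::uint8_t>(x & 0x7Fu));
    x >>= 7;
    // Every remaining 7-bit group gets the continuation bit set.
    while (x != 0) {
        out.push_back(static_cast<std::uint8_t>((x & 0x7Fu) | 0x80u));
        x >>= 7;
    }
    // Groups were collected low-to-high; the format wants high-to-low.
    std::reverse(out.begin(), out.end());
    return out;
}
```

For 0xA0 this produces 0x81 0x20: the low 7 bits are 0x20, and the remaining bit 1 becomes 0x01 | 0x80 = 0x81.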

j*a
Posts: 14423

From thread: Programming board - C10M exercise step 1: 10M sockets

kernel: Mem-Info:
kernel: [] system_call_fastpath+0x1a/0x1f
kernel: [] ? page_fault+0x28/0x30
kernel: [] SyS_socket+0x5c/0xa0
kernel: [] sock_alloc_file+0x52/0x130
kernel: [] d_alloc_pseudo+0xe/0x20
kernel: [] __d_alloc+0x25/0x180
kernel: [] ? tcp_v4_init_sock+0x12/0x30
kernel: [] kmem_cache_alloc+0x227/0x280
kernel: [] ? inet_... Read full post

w*****s
Posts: 122

From thread: XML board - Why does software xxx work with Big5: it not work

Why does software xxx work with Big5: the documentation says it does not?
Big5 is a "7-bit unsafe" "ASCII-family" coded character set.
"ASCII-family" coded character sets (ASCII, ISO646, ISO8859-*, UTF-8, EUC, Big5, GB2312) are all the sets
which have the ASCII characters at the ASCII codepoints (e.g., where "A" has the codepoint 65 (0x41)). All ASCII
characters have a value less than decimal 128 (0x80).
An "8-bit safe" character encoding is one in which, if a byte appears
