> Parsing feature names

Hi,

I noticed that it would be useful to parse the feature descriptions from the codebook files, so I wrote the code below. With it you can search for all the features whose description contains 'education'.

def parse_codebook_data(fname):
    # Map each feature name to its description, read from one codebook file.
    fNames = dict()
    with open(fname, 'r') as fl:
        lstate = False  # True when the previous line was a '-----' separator
        for line in fl:
            if line.startswith('-----'):
                lstate = True
            elif lstate:
                # split() with no argument collapses runs of whitespace
                temp = line.split()
                if temp:
                    fNames[temp[0]] = ' '.join(temp[1:])
                lstate = False
    return fNames


import glob

featureNames = dict()
# Collect the descriptions from every codebook file under this path
for fname in glob.glob('data/codebooks/ff*.txt'):
    featureNames.update(parse_codebook_data(fname))

# Print every feature whose description mentions 'education'
for f in featureNames:
    if 'education' in featureNames[f]:
        print('[{}]: {}'.format(f, featureNames[f]))
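
A possible variation, not part of the gist: the search above is case-sensitive, so it would miss a description that writes 'Education' with a capital letter. Below is a minimal sketch of a case-insensitive lookup over a dict shaped like featureNames. The helper name search_features and the sample entries are made up for illustration and are not taken from the real codebooks.

```python
def search_features(feature_names, keyword):
    """Return sorted (name, description) pairs whose description mentions keyword."""
    needle = keyword.lower()
    return sorted(
        (name, desc)
        for name, desc in feature_names.items()
        if needle in desc.lower()
    )


# Placeholder entries for illustration only; not real codebook content.
sample = {
    'x1': 'Highest level of Education completed',
    'x2': 'Household income',
    'x3': 'education of partner',
}

for name, desc in search_features(sample, 'education'):
    print('[{}]: {}'.format(name, desc))
```

With the real data you would pass featureNames instead of sample. Returning a sorted list rather than printing inside the helper keeps the output order stable and lets you reuse the matches elsewhere.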

Posted by: ovarol @ April 6, 2017, 10:05 p.m.

I noticed that the code lost its formatting when I pasted it into the forum. Here is a gist with the same code: https://gist.github.com/onurvarol/ba101ebed3c9cd966d0e58f15932e999

Posted by: ovarol @ April 6, 2017, 10:07 p.m.