Monday, August 22, 2011

Pretty code (3)

I worked a bit more on the Python parser that I described last time (here). Actually, I started over from scratch. The first approach broke the code into words for processing, but I realized that a "stream" approach, handling one character at a time, works better. The only limitation I'm aware of is that it still doesn't handle triple-quoted strings, but I think that's doable in the future.
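For what it's worth, one way triple-quoted strings might fit into a character stream is to look ahead two characters whenever a quote shows up. This is only a rough sketch, not what the parser in this post does: `mark_strings` and its default tags are made-up names, and escaped quotes and comments are ignored entirely.

```python
def mark_strings(text, start='<str>', stop='</str>'):
    """Wrap string literals, including triple-quoted ones, in start/stop tags."""
    out = []
    i = 0
    delim = None  # the quote sequence that opened the current string, if any
    while i < len(text):
        if delim is None:
            c = text[i]
            if c in ('"', "'"):
                # look ahead: three identical quotes open a triple-quoted string
                delim = c * 3 if text.startswith(c * 3, i) else c
                out.append(start + delim)
                i += len(delim)
            else:
                out.append(c)
                i += 1
        elif text.startswith(delim, i):
            # the same quote sequence that opened the string closes it
            out.append(delim + stop)
            i += len(delim)
            delim = None
        else:
            out.append(text[i])
            i += 1
    return ''.join(out)

print(mark_strings("s = '''it's'''"))
# prints: s = <str>'''it's'''</str>
```

Remembering the opening delimiter means a lone quote inside a triple-quoted string passes through as ordinary text.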

As I mentioned before, I'm using <br /> tags in the code now rather than newlines.

The instructions for doing this said to modify the blog template, but I'm afraid of breaking the formatting for old posts. I put the following just below the <body> tag instead, as shown. It seems to work.
<body>

<style>
  cd { font-size:120%; }
  cm { color: green; }
  kw { color: blue; }
  str { color: red; }
</style>


Then, to be totally meta about it, I ran the parser on itself. It reads its HTML tags from a separate file, the html_tags module (otherwise the parser chokes on the tags in its own source when it is run on itself). The tags are the ones styled above, plus a head and a tail to make a standalone .html document. For pasting into the blog, you don't need those two. I invoked it like this:

python simple_parser2.py simple_parser2.py > example.html
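The helper modules utils and html_tags are not reproduced in this post. A minimal stand-in might look like the following. Only the names come from the parser's imports and attribute lookups; every value here is an assumption.

```python
# utils.py (assumed): return the whole file as one string, so that
# list(load_data(fn)) gives a list of single characters
def load_data(fn):
    with open(fn) as fh:
        return fh.read()

# html_tags.py (assumed values; the names are the ones the parser uses)
cm_start, cm_stop = '<cm>', '</cm>'        # comments
str_start, str_stop = '<str>', '</str>'    # string literals
kw_start, kw_stop = '<kw>', '</kw>'        # keywords
br = '<br />'                              # joins the lines instead of a newline
hr = '<hr />'                              # rule above and below the code
head = '<html>\n<body>\n<cd>'              # opens a standalone document
tail = '</cd>\n</body>\n</html>'           # closes it
```

Keeping the tag strings out of the parser's own source means the parser never sees a literal tag when it is run on itself.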

And here is the code:



import sys
from keyword import iskeyword
from utils import load_data
import html_tags as H

try:
    fn = sys.argv[1]
except IndexError:
    fn = 'example.py'
data = list(load_data(fn))

D = {'is_cm':False,
     'is_str_1':False,'is_str_2':False }

L = list()
for c in data:
    # comments first
    if c == '#':
        # a '#' inside a string or an open comment does not start a comment
        if not (D['is_str_1'] or D['is_str_2'] or D['is_cm']):
            L.extend(list(H.cm_start))
            D['is_cm'] = True
    if c == "\n" and D['is_cm']:
        L.extend(list(H.cm_stop))
        D['is_cm'] = False
    L.append(c)

    # single-quoted strings (a quote inside a comment is plain text)
    if c == "'" and not D['is_cm']:
        if not D['is_str_1']:
            if not D['is_str_2']:
                # start a str_1
                L.pop()
                L.extend(list(H.str_start))
                L.append(c)
                D['is_str_1'] = True
            else:
                # already in str_2
                pass
        else:
            # terminate str_1
            L.extend(list(H.str_stop))
            D['is_str_1'] = False
    # double-quoted strings (a quote inside a comment is plain text)
    if c == '"' and not D['is_cm']:
        if not D['is_str_2']:
            if not D['is_str_1']:
                # start a str_2
                L.pop()
                L.extend(list(H.str_start))
                L.append(c)
                D['is_str_2'] = True
            else:
                # already in str_1
                pass
        else:
            # terminate str_2
            L.extend(list(H.str_stop))
            D['is_str_2'] = False
s = ''.join(L)

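# keywords last
pL = list()
for line in s.split('\n'):
    D['is_cm'] = False
    # split on single spaces so the line can be rebuilt exactly;
    # this assumes the code is indented with spaces, not tabs
    words = line.split(' ')
    for i, w in enumerate(words):
        # no kw highlighting in comments
        if w.startswith(H.cm_start):
            D['is_cm'] = True
        if not D['is_cm'] and iskeyword(w):
            # wrap just this word, so 'in' inside 'line' is left alone
            words[i] = H.kw_start + w + H.kw_stop
    pL.append(' '.join(words))
s = H.br.join(pL)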

pL = [H.head, H.hr, s, H.hr, H.tail]
s = '\n'.join(pL)

try:
    fn = sys.argv[2]
except IndexError:
    fn = 'example.html'
with open(fn,'w') as FH:
    FH.write(s + '\n')
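
The keyword pass only looks at whitespace-delimited words, so a keyword with punctuation attached, like else: or try:, slips past iskeyword. A word-boundary regular expression is one possible way around that. The sketch below is an alternative, not the code used above; `mark_keywords` is a made-up name and the tag strings are assumed values standing in for the ones in html_tags. Like the parser, it makes no attempt to skip keywords inside string literals.

```python
import re
from keyword import kwlist

# Assumed tag strings; the real ones live in html_tags
KW_START, KW_STOP = '<kw>', '</kw>'
CM_START = '<cm>'

# One pattern that matches any Python keyword as a whole word
KW_RE = re.compile(r'\b(' + '|'.join(kwlist) + r')\b')

def mark_keywords(line):
    """Wrap whole-word keywords in kw tags, leaving any comment untouched."""
    # everything from the comment tag onward is passed through as-is
    code, sep, comment = line.partition(CM_START)
    code = KW_RE.sub(lambda m: KW_START + m.group(1) + KW_STOP, code)
    return code + sep + comment

print(mark_keywords("for line in data:"))
# prints: <kw>for</kw> line <kw>in</kw> data:
```

Because the substitution happens in a single pass, a keyword that appears twice on a line is wrapped once per occurrence.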