Google Python Class: A WordCount Implementation with Command-Line Arguments

野小喵 · Published 2015-09-02 13:42:25

#!/usr/bin/python -tt
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

"""Wordcount exercise
Google's Python class

The main() below is already defined and complete. It calls print_words()
and print_top() functions which you write.

1. For the --count flag, implement a print_words(filename) function that counts
how often each word appears in the text and prints:
word1 count1
word2 count2
...

Print the above list in order sorted by word (python will sort punctuation to
come before letters -- that's fine). Store all the words as lowercase,
so 'The' and 'the' count as the same word.

2. For the --topcount flag, implement a print_top(filename) which is similar
to print_words() but which prints just the top 20 most common words sorted
so the most common word is first, then the next most common, and so on.

Use str.split() (no arguments) to split on all whitespace.

Workflow: don't build the whole program at once. Get it to an intermediate
milestone and print your data structure and sys.exit(0).
When that's working, try for the next milestone.

Optional: define a helper function to avoid code duplication inside
print_words() and print_top().

"""

import sys
from operator import itemgetter

# +++your code here+++
# Define print_words(filename) and print_top(filename) functions.
# You could write a helper utility function that reads a file
# and builds and returns a word/count dict for it.
# Then print_words() and print_top() can just call the utility function.

###


def print_words(filename):
    """Prints each word and its count, sorted alphabetically by word."""
    dict_words = {}
    # Plain 'r' mode; split() below already treats any line ending as whitespace.
    f = open(filename, 'r')
    for line in f:
        list_word = line.split()
        for a in list_word:
            a = a.lower()  # normalize case so 'The' and 'the' count as one word
            if a in dict_words:
                dict_words[a] += 1
            else:
                dict_words[a] = 1
    f.close()
    # Sort the (word, count) pairs by word.
    sorted_dict_words = sorted(dict_words.items(), key=itemgetter(0))

    for key in sorted_dict_words:
        print(key[0] + " " + str(key[1]))


def print_top(filename):
    """Prints the 20 most common words, most common first."""
    dict_words = {}
    # Plain 'r' mode; split() below already treats any line ending as whitespace.
    f = open(filename, 'r')
    for line in f:
        list_word = line.split()
        for a in list_word:
            a = a.lower()  # normalize case so 'The' and 'the' count as one word
            if a in dict_words:
                dict_words[a] += 1
            else:
                dict_words[a] = 1
    f.close()
    # Sort the (word, count) pairs by count, largest first.
    sorted_dict_words = sorted(dict_words.items(), key=itemgetter(1), reverse=True)

    for key in sorted_dict_words[:20]:
        print(key[0] + " " + str(key[1]))


# This basic command line argument parsing code is provided and
# calls the print_words() and print_top() functions which you must define.
def main():
    if len(sys.argv) != 3:
        print('usage: ./wordcount.py {--count | --topcount} file')
        sys.exit(1)

    option = sys.argv[1]
    filename = sys.argv[2]
    if option == '--count':
        print_words(filename)
    elif option == '--topcount':
        print_top(filename)
    else:
        print('unknown option: ' + option)
        sys.exit(1)

if __name__ == '__main__':
    main()
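
The exercise text suggests an optional helper so that print_words() and print_top() do not each repeat the counting loop. A minimal sketch of that refactor follows, assuming a hypothetical helper named word_count_dict (the name is not from the original post) and written to run on both Python 2.7 and Python 3:

```python
from operator import itemgetter


def word_count_dict(filename):
    """Hypothetical helper: returns a {word: count} dict for the given file."""
    counts = {}
    with open(filename, 'r') as f:  # the with-block closes the file on exit
        for line in f:
            for word in line.split():  # no arguments: split on all whitespace
                word = word.lower()
                counts[word] = counts.get(word, 0) + 1
    return counts


def print_words(filename):
    """Prints 'word count' lines sorted by word."""
    for word, count in sorted(word_count_dict(filename).items()):
        print(word + ' ' + str(count))


def print_top(filename):
    """Prints the 20 most common words, most common first."""
    items = sorted(word_count_dict(filename).items(),
                   key=itemgetter(1), reverse=True)
    for word, count in items[:20]:
        print(word + ' ' + str(count))
```

The main() function from the script can call these two functions unchanged, since their names and signatures match the ones it expects.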

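As an alternative to the hand-written dict counting above, the standard library's collections.Counter can do the tallying and the top-N selection. This is a swapped-in technique, not what the post uses; the names count_words and the parameter n are hypothetical, chosen only for illustration:

```python
from collections import Counter


def count_words(filename):
    """Hypothetical helper: returns a Counter mapping lowercase word -> count."""
    counts = Counter()
    with open(filename, 'r') as f:
        for line in f:
            # update() adds one to the tally for every word it is given
            counts.update(word.lower() for word in line.split())
    return counts


def print_top(filename, n=20):
    """Prints the n most common words, most common first."""
    for word, count in count_words(filename).most_common(n):
        print(word + ' ' + str(count))
```

The exercise does not say how words with equal counts should be ordered, so this sketch leaves tie order to most_common().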