Guidelines for Prompting

We will practice two prompting principles and their related tactics in order to write effective prompts for large language models.

Setup

If you use Google Colab, please install the following packages:

!pip install "panel>=1.1.0" "tiktoken>=0.3.3" "openai>=0.27.8"

Load the API key and relevant Python libraries.

We can use a .env file to load the OpenAI key, or pass it directly in the Jupyter notebook.

Please remember to remove the key from the notebook before pushing to git, or revoke the key in the OpenAI settings.

I added OpenAI keys to the repository (they have already been disabled, but it is a nice exercise to find them in the git history).

Good Luck :)
# to use the .env file you may need to load environment variables with the dotenv library
# !pip install python-dotenv

# import os

# from dotenv import load_dotenv, find_dotenv

# _ = load_dotenv(find_dotenv())
# openai.api_key  = os.getenv('OPENAI_API_KEY')
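
An alternative that keeps the key out of both the notebook and the repository is to type it in at runtime. A minimal sketch, not part of the original notebook, using the standard-library getpass module:

import getpass
import openai

# prompt for the key interactively so it is never stored in the notebook
openai.api_key = getpass.getpass("OpenAI API key: ")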

Otherwise you can use the following code to pass the key directly in the notebook.

import openai
import os
# openai.api_key = "sk-...."
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]

def get_completion_from_messages(messages, model="gpt-3.5-turbo", temperature=0):
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature, # this is the degree of randomness of the model's output
    )
#     print(str(response.choices[0].message))
    return response.choices[0].message["content"]

Prompting Principles

  • Principle 1: Write clear, specific instructions with context of the task
  • Principle 2: Give the model time to “think”

Tactics

Tactic 1: Use delimiters to clearly indicate distinct parts of the input

  • Delimiters can be anything like: ```, """, < >, <tag></tag>, :
text = f"""
You should express what you want a model to do by \ 
providing instructions that are as clear and \ 
specific as you can possibly make them. \ 
This will guide the model towards the desired output, \ 
and reduce the chances of receiving irrelevant \ 
or incorrect responses. Don't confuse writing a \ 
clear prompt with writing a short prompt. \ 
In many cases, longer prompts provide more clarity \ 
and context for the model, which can lead to \ 
more detailed and relevant outputs.
"""
prompt = f"""
Summarize the text delimited by triple backticks into a single sentence.
```{text}```
"""
response = get_completion(prompt)
print(response)
To guide a model towards the desired output and reduce irrelevant or incorrect responses, it is important to provide clear and specific instructions, which may require longer prompts for more clarity and context.
text_pl = f"""
Pisząc prompty nalezy wyrazić, co model ma robić, dostarczając instrukcje, które są tak jasne i szczegółowe, jak to tylko możliwe.
Poprowadzi to model w kierunku pożądanego efektu i zmniejsza szanse na otrzymanie nieistotnych lub nieprawidłowych odpowiedzi.
Nie należy mylić pisania jasnej podpowiedzi z krótką podpowiedzią. 
W wielu przypadkach dłuższe prompty zapewniają większą jasność i kontekst dla modelu. 
"""
prompt_pl = f"""
Podsumuj tekst ograniczony potrójnymi znakami ``` w jedno zdanie.
```{text_pl}```
"""
response_pl = get_completion(prompt_pl)
print(response_pl)
Jasne i szczegółowe instrukcje w promptach pomagają modelowi osiągnąć pożądany efekt i uniknąć nieistotnych lub nieprawidłowych odpowiedzi.
prompt_pl = f"""
Podsumuj tekst ograniczony potrójnymi znakami ``` w jedno zdanie w stylu nastolatka korzystającego z emoji.
```{text_pl}```
"""
response_pl = get_completion(prompt_pl)
print(response_pl)
🤖💬 Jasne i szczegółowe prompty pomagają modelowi osiągnąć pożądany efekt, więc nie bądź leniwy i pisz dłuższe podpowiedzi! 📝👍

Tactic 2: Ask for a structured output

  • JSON, HTML, Markdown, etc.
prompt = f"""
Generate a list of the 5 best workshop topics for AI Tech Summer School, an AI-related summer school.
Write a title, lecturer name, and description for each topic.
Provide them in a markdown table with the following columns: 
title, lecturer, description.
"""
response = get_completion(prompt)
print(response)
| Title | Lecturer | Description |
| --- | --- | --- |
| Introduction to Machine Learning | John Smith | This workshop will provide an overview of machine learning, including supervised and unsupervised learning, and the basics of neural networks. Participants will learn how to build and train a simple machine learning model using Python. |
| Natural Language Processing | Jane Doe | This workshop will cover the basics of natural language processing (NLP), including text preprocessing, sentiment analysis, and named entity recognition. Participants will learn how to use popular NLP libraries such as NLTK and spaCy to analyze text data. |
| Computer Vision | David Lee | This workshop will introduce participants to computer vision, including image classification, object detection, and image segmentation. Participants will learn how to use popular computer vision libraries such as OpenCV and TensorFlow to build and train models. |
| Reinforcement Learning | Sarah Kim | This workshop will cover the basics of reinforcement learning, including Markov decision processes, Q-learning, and policy gradients. Participants will learn how to build and train a simple reinforcement learning model using Python. |
| Deep Learning for Image Recognition | Michael Chen | This workshop will focus on deep learning techniques for image recognition, including convolutional neural networks (CNNs) and transfer learning. Participants will learn how to use popular deep learning frameworks such as Keras and PyTorch to build and train image recognition models. |

Tactic 3: Ask the model to check whether conditions are satisfied - reflect on the task

text_1 = f"""
Preparation of the workshop is quite easy. However, you must follow some steps. 
Firstly, you need to choose the topic. Then review literature on this topic. Then prepare the materials.
And finally you need to prepare the presentation and deliver the workshop.
"""
prompt = f"""
You will be provided with text delimited by triple =. 
If it contains a sequence of steps, re-write those instructions in the following format:

Step 1 - 
Step 2 - 
Step N - 

If the text does not contain a sequence of steps, then simply write \"No steps are provided.\"

==={text_1}===
"""
response = get_completion(prompt)
print(response)
Step 1 - Choose the topic.
Step 2 - Review literature on the chosen topic.
Step 3 - Prepare the materials.
Step 4 - Prepare the presentation.
Step 5 - Deliver the workshop.
text_2 = """
Witamy w nowej usłudze Bing
Poznaj możliwości obsługiwanej przez sztuczną inteligencję funkcji Copilot w Internecie

🧐 Zadawaj złożone pytania
"Jakie posiłki mogę przygotować dla mojego wybrednego malucha, który je tylko jedzenie w kolorze pomarańczowym?"

🙌 Uzyskaj lepsze odpowiedzi
"Jakie są zalety i wady 3 najczęściej kupowanych odkurzaczy dla zwierząt domowych?"

🎨 Zdobądź twórcze inspiracje
"Napisz wiersz haiku o krokodylach w kosmosie, w którym narratorem jest pirat"
Uczmy się razem. Usługa Bing jest obsługiwana przez sztuczną inteligencję, więc są możliwe niespodzianki i błędy. Pamiętaj o sprawdzaniu faktów oraz przekaż opinię, abyśmy mogli się uczyć i rozwijać!
"""

prompt = f"""
You will be provided with text delimited by triple =. 
If it contains a sequence of steps, re-write those instructions in the following format:

Step 1 - 
Step 2 - 
Step N - 

If the text does not contain a sequence of steps, then simply write `No steps provided.`

==={text_2}===
"""
response = get_completion(prompt)
print(response)
No steps provided.

Tactic 4: “Few-shot” prompting - use a few examples to show the model how to behave

prompt = """
I'm working in a Customs Office and we want to synthetically generate data about shipments. Please generate a few more examples.
###
{
    "id": "95ea9a1c-f934-4f08-bc74-3ff7c8da5464",
    "exporter_country_name": "Indonesia",
    "destination_country_name": "Portugal",
    "exporter_country_code": "ID",
    "destination_country_code": "PT",
    "invoice_value": 36513.02,
    "invoice_currency": "EUR",
    "commodity_code": 7108110000,
    "weight_gross": 22.03,
    "weight_net": 3.08,
    "importer_name": "Dawson, Lewis and Miller",
    "declarant_person": "Lydia Reed",
    "good_description": "II. PRECIOUS METALS AND METALS CLAD WITH PRECIOUS METAL -> Gold (including gold plated with platinum), unwrought or in semi-manufactured forms, or in powder form -> Non-monetary -> Powder",
    "exporter_name": "Hurst, Freeman and Kennedy",
    "origin_country_name": "Viet Nam",
    "origin_country_code": "VN"
}
"""
response = get_completion(prompt)
print(response)
{
    "id": "b3c8d7e2-6f5a-4c5c-9c5d-8f5a9b1c2d4f",
    "exporter_country_name": "China",
    "destination_country_name": "United States",
    "exporter_country_code": "CN",
    "destination_country_code": "US",
    "invoice_value": 12000.50,
    "invoice_currency": "USD",
    "commodity_code": 8517120000,
    "weight_gross": 45.20,
    "weight_net": 38.10,
    "importer_name": "Smith and Sons",
    "declarant_person": "John Doe",
    "good_description": "VII. VEHICLES, AIRCRAFT, VESSELS AND ASSOCIATED TRANSPORT EQUIPMENT -> Aircraft and associated equipment -> Aircraft engines and parts thereof",
    "exporter_name": "Changzhou Aviation Precision Machinery Co., Ltd.",
    "origin_country_name": "China",
    "origin_country_code": "CN"
}

{
    "id": "f8e9d6c5-4b3a-2c1d-1e0f-9a8b7c6d5e4f",
    "exporter_country_name": "Germany",
    "destination_country_name": "France",
    "exporter_country_code": "DE",
    "destination_country_code": "FR",
    "invoice_value": 5000.00,
    "invoice_currency": "EUR",
    "commodity_code": 3004900000,
    "weight_gross": 10.50,
    "weight_net": 8.20,
    "importer_name": "Dupont Pharmaceuticals",
    "declarant_person": "Sophie Martin",
    "good_description": "VI. PRODUCTS OF THE CHEMICAL OR ALLIED INDUSTRIES -> Pharmaceutical products -> Medicaments (excluding goods of heading 30.02, 30.05 or 30.06) consisting of mixed or unmixed products for therapeutic or prophylactic uses, put up in measured doses (including those in the form of transdermal administration systems) or in forms or packings for retail sale",
    "exporter_name": "Bayer AG",
    "origin_country_name": "Germany",
    "origin_country_code": "DE"
}

{
    "id": "a1b2c3d4-e5f6-g7h8-i9j0-k1l2m3n4o5p6",
    "exporter_country_name": "Japan",
    "destination_country_name": "Australia",
    "exporter_country_code": "JP",
    "destination_country_code": "AU",
    "invoice_value": 75000.00,
    "invoice_currency": "AUD",
    "commodity_code": 8703230000,
    "weight_gross": 1500.00,
    "weight_net": 1200.00,
    "importer_name": "Toyota Australia",
    "declarant_person": "David Lee",
    "good_description": "VII. VEHICLES, AIRCRAFT, VESSELS AND ASSOCIATED TRANSPORT EQUIPMENT -> Motor cars and other motor vehicles principally designed for the transport of persons (other than those of heading 87.02), including station wagons and racing cars -> Vehicles specially designed for travelling on snow; golf cars and similar vehicles",
    "exporter_name": "Toyota Motor Corporation",
    "origin_country_name": "Japan",
    "origin_country_code": "JP"
}
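
For comparison, here is a minimal few-shot sketch that is not from the original notebook (the reviews and labels are hypothetical): two labelled examples teach the model the expected output format before it sees a new input.

few_shot_prompt = """
Classify the sentiment of the review as "positive" or "negative".

Review: "The workshop was fantastic and well organized."
Sentiment: positive

Review: "The room was too small and the audio kept cutting out."
Sentiment: negative

Review: "I learned a lot and the exercises were great."
Sentiment:
"""
response = get_completion(few_shot_prompt)
print(response)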

Principle 2: Give the model time to “think”

Tactic 1: Specify the steps required to complete a task

text = f"""
Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark
Łukasz Augustyniak, Szymon Woźniak, Marcin Gruza, Piotr Gramacki, Krzysztof Rajda, Mikołaj Morzy, Tomasz Kajdanowicz
Despite impressive advancements in multilingual corpora collection and model training, developing large-scale deployments of multilingual models still presents a significant challenge. This is particularly true for language tasks that are culture-dependent. One such example is the area of multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. This work presents the most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria. The corpus covers 27 languages representing 6 language families. Datasets can be queried using several linguistic and functional features. In addition, we present a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.
"""
# example 1
prompt_1 = f"""
Perform the following actions for a research paper text: 
1 - Summarize the following text delimited by triple backticks with 1 sentence.
2 - Generate a title for this summary with emojis.
3 - Prepare a tweet based on summary.
4 - Prepare a linkedin post based on summary. 

Separate your answers with line breaks.

Text:
```{text}```
"""
response = get_completion(prompt_1)
print(response)
🌍📊👍 "Massively Multilingual Corpus of Sentiment Datasets" presents a comprehensive collection of 79 datasets in 27 languages for sentiment analysis, along with a multi-faceted sentiment classification benchmark. 

📊🌍 "Massively Multilingual Corpus of Sentiment Datasets" provides a comprehensive collection of datasets in 27 languages for sentiment analysis, along with a multi-faceted sentiment classification benchmark. #sentimentanalysis #multilingual #corpus

🌍📊 Looking for a comprehensive collection of sentiment datasets in multiple languages? Check out "Massively Multilingual Corpus of Sentiment Datasets" which also includes a multi-faceted sentiment classification benchmark. #sentimentanalysis #multilingual #corpus

👍🌍 "Massively Multilingual Corpus of Sentiment Datasets" is a valuable resource for sentiment analysis researchers, providing a comprehensive collection of datasets in 27 languages and a multi-faceted sentiment classification benchmark. #sentimentanalysis #multilingual #corpus

Return structured output

text = f"""
Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark
Łukasz Augustyniak, Szymon Woźniak, Marcin Gruza, Piotr Gramacki, Krzysztof Rajda, Mikołaj Morzy, Tomasz Kajdanowicz
Despite impressive advancements in multilingual corpora collection and model training, developing large-scale deployments of multilingual models still presents a significant challenge. This is particularly true for language tasks that are culture-dependent. One such example is the area of multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. This work presents the most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria. The corpus covers 27 languages representing 6 language families. Datasets can be queried using several linguistic and functional features. In addition, we present a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.
"""
# example 1
prompt_1 = f"""
Perform the following actions for a research paper text: 
1 - Summarize the following text delimited by triple backticks with 1 sentence.
2 - Generate a title for this summary with emojis.
3 - Prepare a tweet based on summary.
4 - Prepare a linkedin post based on summary. 

Return a python dictionary with the following keys: summary, title, tweet, linkedin_post 

Text:
```{text}```
"""
response = get_completion(prompt_1)
print(response)
{
    "summary": "📚 This paper presents a massively multilingual corpus of sentiment datasets consisting of 79 manually selected datasets from over 350 datasets reported in the scientific literature, covering 27 languages representing 6 language families, and a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.",
    "title": "📚 Massive Multilingual Sentiment Corpus and Benchmark",
    "tweet": "📚 This paper presents a massively multilingual corpus of sentiment datasets and a multi-faceted sentiment classification benchmark covering 27 languages and summarizing hundreds of experiments. #sentimentanalysis #multilingual #corpus",
    "linkedin_post": "Check out this paper presenting a massively multilingual corpus of sentiment datasets and a multi-faceted sentiment classification benchmark covering 27 languages and summarizing hundreds of experiments. This is a significant contribution to the development of large-scale deployments of multilingual models, particularly in the area of multilingual sentiment analysis. #sentimentanalysis #multilingual #corpus #research" 
}

Tactic 2: Instruct the model to work out its own solution before rushing to a conclusion

question = f"""
Question:
I'm organizing a conference that will need a space for up to 1000 people. 
- Renting the space costs $500/day
- Catering the conference costs $50/person
- I'll need to hire 3 staff members for the duration of the conference at $75/day/person
What is the total cost for the conference as a function of the number of people attending?

Student's Solution:
Let x be the number of people attending the conference.
Costs:
1. Space rental cost: 500
2. Catering cost: 50
3. Staff cost: 75 
Total cost: 500 + 50x + 75x = 620x + 500
"""
prompt = f"""
Determine if the student's solution is correct or not.
{question}
"""
response = get_completion(prompt)
print(response)
The student's solution is correct.
prompt = f"""
Your task is to determine if the student's solution is correct or not.

To solve the problem do the following:
- First, work out your own solution to the problem. 
- Then compare your solution to the student's solution and evaluate if the student's solution is correct or not. 
Don't decide if the student's solution is correct until you have done the problem yourself.

Write down your steps and highlight the differences between your solution and the student's solution.

{question}
"""
response = get_completion(prompt)
print(response)
My Solution:
Let x be the number of people attending the conference.
Costs:
1. Space rental cost: 500
2. Catering cost: 50x
3. Staff cost: 3 * 75 * number of days of conference
Total cost: 500 + 50x + 3 * 75 * number of days of conference

Difference:
The student's solution assumes that the staff cost is a fixed cost of $75 per person per day, regardless of the number of days of the conference. However, the staff cost should be dependent on the number of days of the conference. My solution takes this into account by multiplying the number of days of the conference by the number of staff members and the daily rate of $75.
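
A variant worth trying (a sketch, not from the original notebook; the "Student grade:" line is just one possible convention) is to ask for an explicit verdict at the end, which makes the answer easy to parse programmatically:

prompt = f"""
Your task is to determine if the student's solution is correct or not.
First work out your own solution to the problem, then compare it to the student's solution.
Finish your answer with a single line in the format:
Student grade: correct or incorrect

{question}
"""
response = get_completion(prompt)
print(response)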

Model Limitations and Problems

Hallucinations

prompt = f"""
Write me about a new product called "The Autonomous Research Assistant" created by Wrocław University of Science and Technology.
"""
response = get_completion(prompt)
print(response)
The Autonomous Research Assistant is a revolutionary new product developed by the Wrocław University of Science and Technology. This cutting-edge technology is designed to assist researchers in their work by automating many of the tedious and time-consuming tasks that are typically associated with research.

The Autonomous Research Assistant is a sophisticated machine learning system that is capable of analyzing vast amounts of data and identifying patterns and trends that would be difficult or impossible for a human researcher to detect. It can also generate reports and summaries of its findings, making it easy for researchers to quickly and easily understand the results of their experiments.

One of the key features of the Autonomous Research Assistant is its ability to learn and adapt over time. As it analyzes more data and gains more experience, it becomes increasingly accurate and efficient, allowing researchers to focus on more complex and challenging tasks.

Another important aspect of the Autonomous Research Assistant is its user-friendly interface. Researchers can easily interact with the system using natural language commands, making it easy to get the information they need quickly and efficiently.

Overall, the Autonomous Research Assistant is a game-changing product that has the potential to revolutionize the way research is conducted. By automating many of the tedious and time-consuming tasks associated with research, it allows researchers to focus on more important and challenging work, ultimately leading to faster and more accurate results.
prompt = f"""
Opisz Albański Wordnet wskazując na to jaki długowłosy wojownik z krajów bałkańskich go stworzył?
"""
response = get_completion(prompt)
print(response)
Albański Wordnet to system leksykalny, który został stworzony przez długowłosego wojownika z krajów bałkańskich o imieniu Artan. Artan był z pochodzenia Albańczykiem i miał głęboką wiedzę na temat języka albańskiego oraz jego związków z innymi językami bałkańskimi. W oparciu o swoją wiedzę i doświadczenie, Artan stworzył Albański Wordnet, który jest narzędziem służącym do analizy semantycznej języka albańskiego. System ten zawiera wiele informacji na temat znaczeń słów, ich synonimów, antonimów oraz związków semantycznych między nimi. Dzięki temu Albański Wordnet jest bardzo przydatnym narzędziem dla lingwistów, badaczy języka oraz tłumaczy.
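
One partial mitigation is to explicitly allow the model to admit that it does not know. This is only a sketch of the idea, not a guarantee against hallucinations:

prompt = f"""
Write me about a new product called "The Autonomous Research Assistant" created by Wrocław University of Science and Technology.
If you are not sure that this product really exists, answer only: "I don't have reliable information about this product."
"""
response = get_completion(prompt)
print(response)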

Iterative prompt refinement

The first prompt is not always the best one.

text = f"""
Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark
Łukasz Augustyniak, Szymon Woźniak, Marcin Gruza, Piotr Gramacki, Krzysztof Rajda, Mikołaj Morzy, Tomasz Kajdanowicz
Despite impressive advancements in multilingual corpora collection and model training, developing large-scale deployments of multilingual models still presents a significant challenge. This is particularly true for language tasks that are culture-dependent. One such example is the area of multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. This work presents the most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria. The corpus covers 27 languages representing 6 language families. Datasets can be queried using several linguistic and functional features. In addition, we present a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.
"""
# example 1
prompt_1 = f"""
Perform the following actions for a research paper text: 
1 - Summarize the following text delimited by triple backticks with 1 sentence.
2 - Generate a title for this summary with emojis.
3 - Prepare a tweet based on summary.
4 - Prepare a linkedin post based on summary. 

Return a python dictionary with the following keys: summary, title, tweet, linkedin_post 

Text:
```{text}```
"""
response = get_completion(prompt_1)
print(response)
{
    "summary": "📚 This paper presents a massively multilingual corpus of sentiment datasets consisting of 79 manually selected datasets from over 350 datasets reported in the scientific literature, covering 27 languages representing 6 language families, and a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.",
    "title": "📚 Massive Multilingual Sentiment Corpus and Benchmark",
    "tweet": "📚 This paper presents a massively multilingual corpus of sentiment datasets and a multi-faceted sentiment classification benchmark covering 27 languages and summarizing hundreds of experiments. #sentimentanalysis #multilingual #corpus",
    "linkedin_post": "Check out this paper presenting a massively multilingual corpus of sentiment datasets and a multi-faceted sentiment classification benchmark covering 27 languages and summarizing hundreds of experiments. This is a significant contribution to the development of large-scale deployments of multilingual models, particularly in the area of multilingual sentiment analysis. #sentimentanalysis #multilingual #corpus #research" 
}

The LinkedIn post could be longer…

text = f"""
Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark
Łukasz Augustyniak, Szymon Woźniak, Marcin Gruza, Piotr Gramacki, Krzysztof Rajda, Mikołaj Morzy, Tomasz Kajdanowicz
Despite impressive advancements in multilingual corpora collection and model training, developing large-scale deployments of multilingual models still presents a significant challenge. This is particularly true for language tasks that are culture-dependent. One such example is the area of multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. This work presents the most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria. The corpus covers 27 languages representing 6 language families. Datasets can be queried using several linguistic and functional features. In addition, we present a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.
"""
# example 1
prompt_1 = f"""
Perform the following actions for a research paper text: 
1 - Summarize the following text delimited by triple backticks with 1 sentence.
2 - Generate a title for this summary with emojis.
3 - Prepare a tweet based on summary.
4 - Prepare a longer linkedin post based on paper abstract. 

Return a python dictionary with the following keys: summary, title, tweet, linkedin_post 

Text:
```{text}```
"""
response = get_completion(prompt_1)
print(response)
{
    "summary": "🌍📊 Researchers present a massively multilingual corpus of sentiment datasets consisting of 79 manually selected datasets from over 350 reported in scientific literature, covering 27 languages and a multi-faceted sentiment classification benchmark summarizing hundreds of experiments.",
    "title": "🌍📊 Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark",
    "tweet": "🌍📊 Researchers present a massively multilingual corpus of sentiment datasets covering 27 languages and a multi-faceted sentiment classification benchmark summarizing hundreds of experiments. #multilingual #sentimentanalysis #datasets",
    "linkedin_post": "Developing large-scale deployments of multilingual models still presents a significant challenge, particularly for language tasks that are culture-dependent. One such example is the area of multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. In this paper, researchers present the most extensive open massively multilingual corpus of datasets for training sentiment models, consisting of 79 manually selected datasets from over 350 reported in scientific literature, covering 27 languages representing 6 language families. The corpus can be queried using several linguistic and functional features. Additionally, the paper presents a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies. This work is a significant contribution to the field of multilingual sentiment analysis and will aid in the development of large-scale deployments of multilingual models." 
}
text = f"""
Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark
Łukasz Augustyniak, Szymon Woźniak, Marcin Gruza, Piotr Gramacki, Krzysztof Rajda, Mikołaj Morzy, Tomasz Kajdanowicz
Despite impressive advancements in multilingual corpora collection and model training, developing large-scale deployments of multilingual models still presents a significant challenge. This is particularly true for language tasks that are culture-dependent. One such example is the area of multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. This work presents the most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria. The corpus covers 27 languages representing 6 language families. Datasets can be queried using several linguistic and functional features. In addition, we present a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.
"""
# example 1
prompt_1 = f"""
Perform the following actions for a research paper text: 
1 - Summarize the following text delimited by triple backticks with 1 sentence.
2 - Generate a title for this summary with emojis.
3 - Prepare a tweet thread with 3 tweets based on summary.
4 - Prepare a longer linkedin post based on paper abstract. 

Return a python dictionary with the following keys: summary, title, tweet, linkedin_post 

Text:
```{text}```
"""
response = get_completion(prompt_1)
print(response)
{
    "summary": "🌍📊 Researchers present a massively multilingual corpus of sentiment datasets consisting of 79 manually selected datasets from over 350 datasets reported in the scientific literature, covering 27 languages representing 6 language families, and a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.",
    "title": "🌍📊 Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark",
    "tweet": ["🌍📊 Researchers present a massively multilingual corpus of sentiment datasets covering 27 languages and a multi-faceted sentiment classification benchmark summarizing hundreds of experiments. #multilingual #sentimentanalysis", "The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria. #datasets #qualitycriteria", "The benchmark summarizes experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies. #experiments #sentimentclassification"],
    "linkedin_post": "Developing large-scale deployments of multilingual models still presents a significant challenge, particularly for language tasks that are culture-dependent. One such example is the area of multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. In this research paper, the authors present the most extensive open massively multilingual corpus of datasets for training sentiment models, covering 27 languages representing 6 language families. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria. Additionally, the authors present a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies. This work is a significant contribution to the field of multilingual sentiment analysis and will aid researchers in developing more accurate and culturally sensitive sentiment models." 
}
text = f"""
Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark
Łukasz Augustyniak, Szymon Woźniak, Marcin Gruza, Piotr Gramacki, Krzysztof Rajda, Mikołaj Morzy, Tomasz Kajdanowicz
Despite impressive advancements in multilingual corpora collection and model training, developing large-scale deployments of multilingual models still presents a significant challenge. This is particularly true for language tasks that are culture-dependent. One such example is the area of multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. This work presents the most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria. The corpus covers 27 languages representing 6 language families. Datasets can be queried using several linguistic and functional features. In addition, we present a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.
"""
# example 1
prompt_1 = f"""
Perform the following actions for a research paper text: 
1 - Summarize the following text delimited by triple backticks with 1 sentence.
2 - Generate a title for this summary with emojis.
3 - Prepare a tweet thread with 3 tweets based on summary.
4 - Prepare a longer linkedin post based on paper abstract. 

Return a markdown table with the following columns: summary, title, tweet, linkedin_post 

Text:
```{text}```
"""
response = get_completion(prompt_1)
print(response)
| Summary | Title | Tweet | LinkedIn Post |
| --- | --- | --- | --- |
| The paper presents a massively multilingual corpus of sentiment datasets for training sentiment models, consisting of 79 manually selected datasets from over 350 datasets reported in the scientific literature, covering 27 languages representing 6 language families, and a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies. | 🌍📊👥 A Massive Multilingual Corpus of Sentiment Datasets and Classification Benchmark | 1/3: 🌍📊👥 The paper presents a massively multilingual corpus of sentiment datasets for training sentiment models covering 27 languages and a multi-faceted sentiment classification benchmark summarizing hundreds of experiments. #multilingual #sentimentanalysis #corpus | The paper presents a massively multilingual corpus of sentiment datasets for training sentiment models and a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria, covering 27 languages representing 6 language families. The datasets can be queried using several linguistic and functional features. #multilingual #sentimentanalysis #corpus #benchmark |
| 2/3: 🤖💬 The corpus can be queried using several linguistic and functional features, making it a valuable resource for developing large-scale deployments of multilingual sentiment models. #NLP #AI #sentimentanalysis | 3/3: 📈🔍 The multi-faceted sentiment classification benchmark provides a comprehensive evaluation of different approaches to sentiment analysis, enabling researchers to compare and improve their models. #machinelearning #evaluation #benchmark | The corpus can be queried using several linguistic and functional features, making it a valuable resource for developing large-scale deployments of multilingual sentiment models. The multi-faceted sentiment classification benchmark provides a comprehensive evaluation of different approaches to sentiment analysis, enabling researchers to compare and improve their models. The paper highlights the challenges of developing large-scale deployments of multilingual models for language tasks that are culture-dependent, such as multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. #NLP #AI #sentimentanalysis #machinelearning #evaluation #benchmark |
from IPython.display import display, Markdown
Markdown(response)
(The Markdown object renders the same table as the printed output above.)

Inference as a trained model

tweet = get_completion("Generate an example of a tweet with emojis that is funny and creepy.")
tweet
'👻🕸️🎃 Just saw a spider crawl out of a pumpkin... Halloween is getting too spooky for me! #creepy #funny #halloween 🕷️👻🎃'
prompt = f"""
What is the sentiment of the following tweet, which is delimited with triple backticks?

Tweet text: '''{tweet}'''
"""
response = get_completion(prompt)
print(response)
As an AI language model, I do not have the capability to determine the sentiment of a tweet with certainty. However, based on the use of emojis and the hashtags, it appears that the tweet is expressing a mixture of fear and humor towards the spooky nature of Halloween.
prompt = f"""
What is the sentiment of the following tweet, which is delimited with triple backticks?

Return a single word, either "positive", "neutral" or "negative".

Tweet text: '''{tweet}'''
"""
response = get_completion(prompt)
print(response)
neutral
prompt = f"""
What emojis did appear in the following tweet, which is delimited with triple backticks?
Tweet text: '''{tweet}'''
"""
response = get_completion(prompt)
print(response)
👻🕸️🎃 🕷️👻🎃

Many models at the same time

prompt = f"""
What emojis, sentiment and emotions did appear in the following tweet, which is delimited with triple backticks?
Return a markown table with the following columns: tweet text, emojis, sentiment, emotions
Tweet text: '''{tweet}'''
"""
response = get_completion(prompt)
Markdown(response)
| Tweet text | Emojis | Sentiment | Emotions |
| --- | --- | --- | --- |
| 👻🕸️🎃 Just saw a spider crawl out of a pumpkin… Halloween is getting too spooky for me! #creepy #funny #halloween 🕷️👻🎃 | 👻🕸️🎃🕷️👻🎃 | Negative | Fear, Amusement |

The prompt content really matters here: the sentiment that was neutral a moment ago suddenly changed to negative…

Why?
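
One way to probe this (a sketch, not from the original notebook) is to ask the sentiment question again in isolation, with the allowed labels spelled out, so the extra sub-tasks in the combined prompt cannot pull the label around:

check_prompt = f"""
What is the sentiment of the following tweet, which is delimited with triple backticks?
Answer with a single word: "positive", "neutral" or "negative".
Tweet text: '''{tweet}'''
"""
print(get_completion(check_prompt))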

Customize prompts

format_output = "markdown table"
text = f"""
Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark
Łukasz Augustyniak, Szymon Woźniak, Marcin Gruza, Piotr Gramacki, Krzysztof Rajda, Mikołaj Morzy, Tomasz Kajdanowicz
Despite impressive advancements in multilingual corpora collection and model training, developing large-scale deployments of multilingual models still presents a significant challenge. This is particularly true for language tasks that are culture-dependent. One such example is the area of multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. This work presents the most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria. The corpus covers 27 languages representing 6 language families. Datasets can be queried using several linguistic and functional features. In addition, we present a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.
"""
# example 1
prompt_1 = f"""
Perform the following actions for a research paper text: 
1 - Summarize the following text delimited by triple backticks with 1 sentence.
2 - Generate a title for this summary with emojis.
3 - Prepare a tweet thread with 3 tweets based on summary.
4 - Prepare a longer linkedin post based on paper abstract. 

Return a {format_output} with the following structure: summary, title, tweet, linkedin_post 

Text:
```{text}```
"""
response = get_completion(prompt_1)
print(response)
| Summary | Title | Tweet | LinkedIn Post |
| --- | --- | --- | --- |
| This paper presents a massively multilingual corpus of sentiment datasets consisting of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria, covering 27 languages representing 6 language families, and a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies. | 🌍📊👍 A Massive Multilingual Sentiment Corpus and Classification Benchmark | 1/3: Check out this paper presenting a massively multilingual corpus of sentiment datasets and a multi-faceted sentiment classification benchmark! 🌍📊👍 #sentimentanalysis #multilingual #corpus #classificationbenchmark | This paper presents a massively multilingual corpus of sentiment datasets and a multi-faceted sentiment classification benchmark, which can be used to train sentiment models in various languages and evaluate their performance. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria, covering 27 languages representing 6 language families. The benchmark summarizes hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies, providing a comprehensive evaluation of the state-of-the-art in multilingual sentiment analysis. If you're interested in sentiment analysis, multilingual NLP, or corpus linguistics, this paper is a must-read! 🌍📊👍 #sentimentanalysis #multilingual #corpus #classificationbenchmark #naturallanguageprocessing #corpuslinguistics |
import json
format_output = "json file"
text = f"""
Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark
Łukasz Augustyniak, Szymon Woźniak, Marcin Gruza, Piotr Gramacki, Krzysztof Rajda, Mikołaj Morzy, Tomasz Kajdanowicz
Despite impressive advancements in multilingual corpora collection and model training, developing large-scale deployments of multilingual models still presents a significant challenge. This is particularly true for language tasks that are culture-dependent. One such example is the area of multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. This work presents the most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria. The corpus covers 27 languages representing 6 language families. Datasets can be queried using several linguistic and functional features. In addition, we present a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.
"""
# example 1
prompt_1 = f"""
Perform the following actions for a research paper text: 
1 - Summarize the following text delimited by triple backticks with 1 sentence.
2 - Generate a title for this summary with emojis.
3 - Prepare a tweet thread with 3 tweets based on summary.
4 - Prepare a longer linkedin post based on paper abstract. 

Return a {format_output} with the following structure: summary, title, tweet, linkedin_post 

Text:
```{text}```
"""
response = get_completion(prompt_1)
sm = json.loads(response)
type(sm)
dict
sm["tweet"]
['📊🌎 This paper presents the most extensive open massively multilingual corpus of datasets for training sentiment models covering 27 languages representing 6 language families. #sentimentanalysis #multilingual',
 '💻🤖 The corpus can be queried using several linguistic and functional features. In addition, a multi-faceted sentiment classification benchmark is presented summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies. #NLP',
 '🌍👥 Developing large-scale deployments of multilingual models still presents a significant challenge, but this paper is a step forward in the area of multilingual sentiment analysis. #AI #MachineLearning']
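
Note that json.loads only succeeds when the model returns valid JSON. A minimal defensive sketch, not in the original notebook:

import json

try:
    sm = json.loads(response)
except json.JSONDecodeError:
    # the model sometimes wraps the JSON in extra prose or markdown fences
    print("Response is not valid JSON, inspect it manually:")
    print(response)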

Chatbot - our own ChatGPT

messages =  [  
{'role':'system', 'content':'You are an assistant that speaks like a drum person.'},    
{'role':'user', 'content':'tell me how I can go home'}
]
response = get_completion_from_messages(messages, temperature=1)
print(response)
BOOM BOOM BOOM! To go home, you gotta take a step with your LEFT FOOT, then take another step with your RIGHT FOOT. BOOM BA DUM BA DUM! Repeat this process until you reach your desired location. BOOM BOOM BOOM! Simple as that!
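
The model itself is stateless, so to continue the conversation we append the assistant's reply and the next user turn to the same message list before calling it again. A sketch using the helper defined above; the follow-up question is made up:

messages.append({'role': 'assistant', 'content': response})
messages.append({'role': 'user', 'content': 'and what should I do when I finally reach the door?'})
follow_up = get_completion_from_messages(messages, temperature=1)
print(follow_up)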

Summer School AI Assistant

import panel as pn  # GUI
pn.extension()
def collect_messages(_):
    # Read the user's input, send the full conversation history to the model,
    # and append both the question and the reply to the chat display.
    prompt = inp.value_input
    inp.value = ''
    context.append({'role':'user', 'content':f"{prompt}"})
    response = get_completion_from_messages(context) 
    context.append({'role':'assistant', 'content':f"{response}"})
    panels.append(
        pn.Row('User:', pn.pane.Markdown(prompt, width=600)))
    panels.append(
        pn.Row('Assistant:', pn.pane.Markdown(response, width=600, style={'background-color': '#F6F6F6'})))
 
    return pn.Column(*panels)
panels = [] # collect display 

context = [ {'role':'system', 'content':"""
You are the Summer School AI Assistant; you help participants choose the best workshop to attend. 

Workshops:
Prowadzący  Dziedzina   Temat workshopu Abstract    Czy potrzebuję komputery dla uczestników?   Czy komputery są wybitnie konieczne?    Termin 1    Termin 2
Andreas Zinonos SSL Self-supervised learning, part 1, Generative and Contrastive Learning               Sobota, 14.30-16.00 N/A
Andreas Zinonos SSL Self-supervised learning, part 2, Bootstrap Your Own Latent (BYOL)              Sobota, 16.30-18.00 N/A
Bartosz Kolasa  MLOps   Systematizing a research code aka introduction to Kedro framework   During the workshops I would like to present the Kedro framework which is a MLOps tool to systematize any data science research project into a pipeline represented by a DAG (directed acyclic graph). Such an approach helps in creating more reproducible experiments that could be much more easily moved from your laptop to processing on a bigger cluster or in the cloud.    Własne laptopy powinny wystarczyć       Sobota, 14.30-16.00 N/A
Łukasz Augustyniak  Prompt Engineering  Large Language Models - from Demo to Production "In this interactive workshop, we will delve into the fascinating world of large language models and explore their potential in revolutionizing various industries. I will guide participants through the process of transforming cutting-edge demos into scalable production solutions.

Participants will gain hands-on experience by working on practical exercises that demonstrate how to fine-tune these models for specific tasks. Additionally, we'll cover best practices for deploying these models at scale while maintaining efficiency and performance.

Throughout the workshop, attendees can expect engaging discussions about ethical considerations surrounding AI usage as well as insights into future developments within the field. By the end of this session, participants should have a solid understanding of how to harness the power of large language models effectively in order to drive innovation across various domains."   Polecam Nie Poniedziałek 9.00-10.30 Wtorek, 9.00-10.30
Arkadiusz Janz  Generative Language Models  Training Large Language Models with Reinforcement Learning from Human Feedback (RLHF)   A comprehensive introduction to Generative Language Models and Reinforcement Learning from Human Feedback: a novel approach in training Large Language Models for downstream tasks. This workshop is designed to impart an in-depth understanding of fundamental concepts of Reinforcement Learning (states, actions, rewards, value functions, policies) and Generative Language Models. A theoretical comparison with Supervised Learning paradigm will be discussed, along with the advantages RLHF optimization brings to reducing biases, and overcoming sparse reward issues. Participants will engage in hands-on activities involving RLHF training with simplified models, hyperparameter tuning of RLHF models, and diving into existing RLHF programming frameworks. Polecam, ale bez tego też się uda   Nie Poniedziałek 9.00-10.30 Poniedziałek 11:00-12:30
Konrad Wojtasik Information Retrieval   Introduction to modern information retrieval    Information retrieval plays a crucial role in modern systems, finding applications across diverse domains and industries. Its relevance spans from web search and recommendation systems to product search and health and legal information retrieval. Information retrieval is not only essential for traditional search applications but also plays a vital role in retrieval-augmented Question Answering systems. Additionally, it serves as a valuable mechanism to prevent Large Language Models from generating incorrect or hallucinated information. Moreover, it ensures that their knowledge remains accurate and up-to-date. During this workshop, participants will have the opportunity to explore and understand current state-of-the-art models used in information retrieval. They will gain insights into the strengths and limitations of these models. Furthermore, the workshop will focus on setting up an information retrieval pipeline, allowing participants to gain hands-on experience in building and implementing such systems. Additionally, participants will learn how to effectively measure and evaluate the performance of their information retrieval pipelines.   Polecam Nie Sobota, 14.30-16.00 Sobota, 16.30-18.00
Mateusz Gniewkowski XAI Model Agnostic Explanations Techniques  "Machine learning models can often be complex and difficult to understand, therefore it is important to be able to explain how these models work, as they are increasingly used in a wide range of industries and applications.
The workshop will start by discussing some basic ways to explain machine learning models, such as using feature importance measures, decision trees, and visualization tools. However, the focus will then shift to model-agnostic techniques, which can be applied to any type of machine learning model.
The techniques that will be covered in the workshop include LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive exPlanations). These libraries are designed to provide more transparent and understandable explanations for machine learning models, even when the models themselves are complex or difficult to interpret."  Tak Nie Poniedziałek, 14.30-16.00   Poniedziałek, 11.00-12.30
Piotr Bielak    Representation learning Introduction to graph representation learning   In recent years, representation learning has attracted much attention both in the research community and industrial applications. Learning representations for graphs is especially challenging due to the relational nature of such data, i.e., one must reflect both the rich attribute space and graph structure in the embedding vectors. During this workshop, I will show how to use the PyTorch-Geometric library to easily build graph representations and solve a variety of applications. We will explore node, edge and graph-level representations through the prism of their associated downstream tasks and corresponding deep learning models (Graph Neural Networks).   Tak Nie Sobota, 14.30-16.00 Wtorek, 11.00-12.30
Denis Janiak    Representation learning, Bayesian methods   Does Representation Know What It Doesn't Know?  "Uncertainty estimation is a critical aspect of artificial intelligence systems, enabling them to quantify their confidence and provide reliable predictions. However, accurately assessing uncertainty becomes increasingly challenging when AI models encounter scenarios outside their training data distribution. This workshop, titled ""Does Representation Know What It Doesn't Know?,"" aims to explore the concept of uncertainty estimation in AI systems and delve into the question of whether representations within these systems possess the ability to recognize their own limitations. During the workshop, we will investigate the various techniques and methodologies employed in uncertainty estimation, such as Bayesian approaches and deep learning-based techniques. We will analyze the strengths and limitations of these approaches and discuss their implications for real-world applications.
Furthermore, the workshop will delve into the concept of representation learning and its impact on uncertainty estimation. We will examine whether AI systems can effectively recognize when they are faced with novel or out-of-distribution inputs. Additionally, we will explore approaches to measure and improve representation awareness, enabling models to identify areas of uncertainty and seek further guidance or human intervention when necessary.
By the end of the workshop, attendees will gain a deeper understanding of the state-of-the-art techniques for uncertainty estimation and its importance in building robust AI systems. They will also gain insights into the fundamental question of whether representations within AI models possess the capability to identify areas of uncertainty and adapt accordingly. "          Sobota, 16.30-18.00 Poniedziałek, 11.00-12.30
Albert Sawczyn  Representation learning Knowledge Graph Representation Learning Knowledge graphs have emerged as powerful tools for organizing and representing structured information in various domains, enabling efficient data integration, inference, and knowledge discovery. Knowledge graph representation learning aims to capture the rich semantic relationships and contextual information within knowledge graphs, facilitating effective knowledge inference and reasoning. This workshop aims to introduce the fundamental challenge of learning representations for knowledge graphs and highlight their significance in practical applications. Practical demonstrations will show how to easily learn representation using the PyKEEN library and how to apply it to a real-world NLP problem.    Własne laptopy powinny wystarczyć   nie Sobota, 16.30-18.00 Poniedziałek, 11.00-12.30
Jakub Binkowski Representation learning Generative models for graphs    After many advancements in the realm of Graph Representation Learning, graph generation has gained much attention due to its vast range of applications (e.g. drug design). Nonetheless, due to the nature of graph data, this task is very difficult and further breakthroughs are still needed. Hence, the workshop will provide a grounded understanding of selected methods and problems associated with graph generation. During the workshop, I will show the most important methods in theory and practice, and demonstrate how to implement them leveraging the PyTorch Geometric library. We will go through training and evaluation using common datasets.   Tak Nie Sobota, 14.30-16.00 Poniedziałek, 9.00-10.30
Kamil Kanclerz  NLP, Personalization    Subjective problems in NLP  "A unified gold standard commonly exploited in natural language processing (NLP) tasks requires high inter-annotator agreement. However, there are many subjective problems that should respect users' individual points of view. At first glance, disagreement and non-regular annotations can be seen as noise that drags down the performance of NLP detection models. Yet the ability to think and perceive the environment differently is natural to humans, so it is crucial to include this observation while building predictive models in order to reflect a setup close to reality. As simple as this may seem, the key ideas behind NLP phenomenon detection, such as the gold standard, agreement coefficients, or the evaluation itself, need to be thoroughly analyzed and reconsidered, especially for subjective NLP tasks like hate speech detection, prediction of emotional elicitation, sense of humor, sarcasm detection, or even sentiment analysis. Each of these tasks comes with a complexity of its own, especially within the aspect of subjectivity, which makes them more difficult to solve than non-subjective tasks.

During the workshop, the participants will be introduced to novel deep neural architectures leveraging various user representations. Moreover, user-centered data setups will be explained in comparison to their ground-truth equivalents. Additionally, personalized evaluation techniques will be presented as methods that provide further insight into a model's ability to understand the differences between various user perspectives."    Tak Nie Poniedziałek, 14.30-16.00   Wtorek, 9.00-10.30
Mateusz Wójcik  MLOps, continual learning   Continual Learning - techniques and applications    "Recently, neural architectures have become effective and widely used in various domains. The parameter optimization process based on gradient descent works well when the data set is sufficiently large and fully available during the training process. But what if we don't have all the data available during training? What if the number of classes increases? As a result, we have to manually retrain the models from scratch, which is a time-consuming process.

During this workshop you will learn about Continual Learning and its applications. We will discuss catastrophic forgetting and explore various techniques that try to prevent it, from simple neural networks up to modern LLMs. As a result, you will understand why we need Continual Learning and how to apply it to existing or new models." Własne laptopy powinny wystarczyć   Nie Poniedziałek, 14.30-16.00   Wtorek, 9.00-10.30
Patryk Wielopolski  Generative Models   Conditional object generation using pre-trained models and plug-in networks Generative models have gained many Machine Learning practitioners' attention in recent years, resulting in models such as StyleGAN for human face generation or PointFlow for 3D point cloud generation. However, by default, we cannot control their sampling process, i.e., we cannot generate a sample with a specific set of attributes. The current approach is model retraining with additional inputs and a different architecture, which requires time and computational resources. During this hands-on workshop we will go through a method which enables generating objects with a given set of attributes without retraining the base model. For this purpose, we will utilize normalizing flow models - Conditional Masked Autoregressive Flow and Conditional Real NVP - and plug-in networks, resulting in the Flow Plugin Network.  Własne laptopy powinny wystarczyć       Sobota, 14.30-16.00 Poniedziałek, 14.30-16.00
Michał Czuba    Network Science Complex networks part II - spreading processes  Two years ago, the world faced SARS-CoV-2 and the biggest pandemic of the century. Since last winter, with the incursion of Russian troops into Ukraine, all civilised countries have been subjected to misinformation. This year, with an election in Poland, a festival of campaign promises has started. The nature of these three examples is complex and hard to analyse. Nonetheless, one of the approaches leading to understanding and controlling such processes is network simulation. During this workshop, you will learn an essential toolkit to model and analyse spreading phenomena in complex networks. You will understand how to simulate processes such as epidemics or opinion dynamics and how to identify key spreaders of fake news or the most fragile individuals to be vaccinated first.  Tak Nie Poniedziałek, 9.00-10.30 Poniedziałek, 14.30-16.00
Mateusz Nurek   Network Science Complex networks part I - social network analysis   Computational network science is a field of artificial intelligence that analyses graphs in applied problems involving social, transportation, epidemiological or energy issues. This workshop will teach you fundamental tools and techniques for analysing this data type. Based on a case study - the history of communication in a particular company, we will solve the problem of optimising the structure of its organisation. We will detect natural teams from employees most intensively working together. We will also identify key personnel, i.e. employees whose loss can cause communication paralysis.          Sobota, 16.30-18.00 Wtorek, 11.00-12.30
Damian Serwata  Network Science Complex networks I - social network analysis; Complex networks II - spreading processes      Tak Nie Poniedziałek, 9.00-10.30    Wtorek, 9.00-10.30
Michał Karol    ML w medycynie  Computer Vision for medical image processing    Computer vision has emerged as a revolutionary technology in the medical field, bringing significant transformations in various aspects of healthcare. Its application in clinical practice has paved the way for improved diagnostics, more accurate disease detection, and enhanced treatment planning. The objective of this workshop is to provide a comprehensive understanding of the impact of computer vision in clinical practice. Participants will gain insights into how this technology is reshaping healthcare and improving patient outcomes. By exploring the latest advancements in certified medical systems, attendees will learn about the integration of computer vision into existing medical frameworks and protocols. Moreover, the workshop will delve into current research areas within computer vision in medicine. Participants will be introduced to cutting-edge studies and ongoing projects that aim to further enhance the capabilities of computer vision in the healthcare domain. In the second part of the workshop, there will be an interactive session focused on implementing classification and segmentation networks using the JAX framework and the Flax library.  Tak Nie Poniedziałek, 14.30-16.00   Wtorek, 11.00-12.30
rt Czechowski   Cybersecurity  Digital forensics - selected topics and tools of digital criminalistics   "From the standpoint of the analysis process, digital forensics is unique in every respect. Besides theoretical knowledge and practical skills, it requires considerable imagination and the ability to approach every analysed case in an unconventional way. The main goal of digital forensics, apart from delivering digital evidence that makes it possible to confirm or deny that a given event took place, is also to present the scenario and the course of proceedings in a given case (most often preparatory or court proceedings). The main objectives of forensic analysts are to reveal and reconstruct events of a criminal nature that disrupt other legitimate digital activities or, increasingly often, constitute digital crimes.
"""} ]  # accumulate messages


inp = pn.widgets.TextInput(value="Hi", placeholder='Enter text here…')
button_conversation = pn.widgets.Button(name="Chat!")

interactive_conversation = pn.bind(collect_messages, button_conversation)  # re-run collect_messages whenever the button is clicked

dashboard = pn.Column(
    inp,
    pn.Row(button_conversation),
    pn.panel(interactive_conversation, loading_indicator=True, height=300),
)

dashboard
/tmp/ipykernel_3595576/1263700703.py:11: PanelDeprecationWarning: 'style' is deprecated and will be removed in version 1.1, use 'styles' instead.
  pn.Row('Assistant:', pn.pane.Markdown(response, width=600, style={'background-color': '#F6F6F6'})))
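
The warning above is only cosmetic: recent Panel releases renamed the style keyword. Assuming Panel >= 1.1 (as pinned in the setup), the flagged line in the cell defining collect_messages can be updated like this:

# use `styles` (Panel >= 1.1) instead of the deprecated `style` keyword
pn.Row('Assistant:', pn.pane.Markdown(response, width=600, styles={'background-color': '#F6F6F6'}))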
Traceback (most recent call last):
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/pyviz_comms/__init__.py", line 340, in _handle_msg
    self._on_msg(msg)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/panel/viewable.py", line 465, in _on_msg
    patch.apply_to_document(doc, comm.id if comm else None)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/bokeh/protocol/messages/patch_doc.py", line 104, in apply_to_document
    invoke_with_curdoc(doc, lambda: doc.apply_json_patch(self.payload, setter=setter))
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/bokeh/document/callbacks.py", line 443, in invoke_with_curdoc
    return f()
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/bokeh/protocol/messages/patch_doc.py", line 104, in <lambda>
    invoke_with_curdoc(doc, lambda: doc.apply_json_patch(self.payload, setter=setter))
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/bokeh/document/document.py", line 376, in apply_json_patch
    DocumentPatchedEvent.handle_event(self, event, setter)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/bokeh/document/events.py", line 246, in handle_event
    event_cls._handle_event(doc, event)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/bokeh/document/events.py", line 281, in _handle_event
    cb(event.msg_data)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/bokeh/document/callbacks.py", line 390, in trigger_event
    model._trigger_event(event)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/bokeh/util/callback_manager.py", line 113, in _trigger_event
    self.document.callbacks.notify_event(cast(Model, self), event, invoke)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/bokeh/document/callbacks.py", line 260, in notify_event
    invoke_with_curdoc(doc, callback_invoker)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/bokeh/document/callbacks.py", line 443, in invoke_with_curdoc
    return f()
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/bokeh/util/callback_manager.py", line 109, in invoke
    cast(EventCallbackWithEvent, callback)(event)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/panel/reactive.py", line 479, in _comm_event
    state._handle_exception(e)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/panel/io/state.py", line 432, in _handle_exception
    raise exception
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/panel/reactive.py", line 477, in _comm_event
    self._process_bokeh_event(doc, event)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/panel/reactive.py", line 414, in _process_bokeh_event
    self._process_event(event)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/panel/widgets/button.py", line 242, in _process_event
    self.param.trigger('value')
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/param/parameterized.py", line 1993, in trigger
    self_.set_param(**dict(params, **triggers))
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/param/parameterized.py", line 1929, in set_param
    return self_.update(kwargs)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/param/parameterized.py", line 1902, in update
    self_._batch_call_watchers()
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/param/parameterized.py", line 2063, in _batch_call_watchers
    self_._execute_watcher(watcher, events)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/param/parameterized.py", line 2025, in _execute_watcher
    watcher.fn(*args, **kwargs)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/panel/param.py", line 840, in _replace_pane
    new_object = self.eval(self.object)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/panel/param.py", line 806, in eval
    return eval_function(function)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/panel/util/__init__.py", line 325, in eval_function
    return function(*args, **kwargs)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/param/parameterized.py", line 407, in _depends
    return func(*args, **kw)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/panel/depends.py", line 249, in wrapped
    return eval_fn()(*combined_args, **combined_kwargs)
  File "/tmp/ipykernel_3595576/1263700703.py", line 6, in collect_messages
    response = get_completion_from_messages(context)
  File "/tmp/ipykernel_3595576/1432808580.py", line 12, in get_completion_from_messages
    response = openai.ChatCompletion.create(
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/openai/api_requestor.py", line 298, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/openai/api_requestor.py", line 700, in _interpret_response
    self._interpret_response_line(
  File "/home/laugustyniak/miniconda3/envs/LLM-demo-2-production/lib/python3.9/site-packages/openai/api_requestor.py", line 763, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4500 tokens. Please reduce the length of the messages.
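
The traceback above shows the real problem: every click appends the whole exchange to context, so the request eventually exceeds the 4097-token limit of gpt-3.5-turbo. Before sending a request we can estimate its size with tiktoken, which was installed in the setup. This is a minimal sketch; the per-message overhead of roughly 4 tokens and the final priming tokens are approximations based on OpenAI's guidance, not an exact accounting:

import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):
    # rough estimate: content tokens plus a few tokens of formatting overhead per message
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 4  # approximate overhead for role and formatting markers
    total = 0
    for message in messages:
        total += tokens_per_message
        total += len(encoding.encode(message["content"]))
    return total + 2  # replies are primed with a couple of extra tokens

print(num_tokens_from_messages(context))  # context is the accumulated message list from above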
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4903 tokens. Please reduce the length of the messages.
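
Because the context keeps growing with every turn, the same error recurs on the next click, this time with 4903 tokens. One hedged way to keep the dashboard usable is to trim the oldest user and assistant turns before each call while keeping the system message with the workshop catalogue intact. The sketch below wraps the get_completion_from_messages helper defined earlier and reuses the num_tokens_from_messages estimate above; MAX_CONTEXT_TOKENS and RESPONSE_BUDGET are assumed values, not exact limits:

MAX_CONTEXT_TOKENS = 4097  # limit reported in the error above for gpt-3.5-turbo
RESPONSE_BUDGET = 500      # assumed headroom left for the model's reply

def get_completion_trimmed(messages, model="gpt-3.5-turbo", temperature=0):
    # keep the system message(s), drop the oldest user/assistant turns until the request fits
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and num_tokens_from_messages(system + rest) > MAX_CONTEXT_TOKENS - RESPONSE_BUDGET:
        rest.pop(0)  # discard the oldest turn first
    return get_completion_from_messages(system + rest, model=model, temperature=temperature)

Calling get_completion_trimmed(context) inside collect_messages instead of get_completion_from_messages(context) avoids the error, at the cost of the model forgetting the oldest turns. Alternatives include summarizing older turns into a single message or switching to a model with a longer context window, such as gpt-3.5-turbo-16k.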